Lysandre Debut
|
de5ca373ac
|
Responses API in transformers serve (#39155)
* Scaffolding
* Explicit content
* Naïve Responses API streaming implementation
* Cleanup
* Responses API (to be merged into #39155) (#39338)
* Scaffolding
* Explicit content
* Naïve Responses API streaming implementation
* Cleanup
* use openai
* validate request, including detecting unused fields
* dict indexing
* dict var access
* tmp commit (tests failing)
* add slow
* use oai output type in completions
* (little rebase errors)
* working spec?
* guard type hint
* type hints. fix state (CB can now load different models)
* type hints; fn names; error type
* add docstrings
* responses + kv cache
* metadata support; fix kv cache; error event
* add output_index and content_index
* docstrings
* add test_build_response_event
* docs/comments
* gate test requirements; terminate cb manager on model switch
* nasty type hints
* more type hints
* disable validation by default; enable force models
* todo
---------
Co-authored-by: Lysandre <hi@lysand.re>
* Slight bugfixes
* PR comments from #39338
* make fixup
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
|
2025-07-16 14:16:16 +02:00 |
|
Joao Gante
|
df49b399dc
|
[tests] tag serve tests as slow (#39343)
* maybe they need more cpu resources?
* add todo
|
2025-07-10 15:40:08 +00:00 |
|
Joao Gante
|
38c3931362
|
[server] add tests and fix passing a custom generation_config (#39230)
* add tests; fix passing a custom generation_config
* tool integration test
* add install step
* add accelerate as dep to serving
* add todo
|
2025-07-10 13:41:38 +00:00 |
|
Lysandre Debut
|
ed36f8490e
|
Licenses (#39127)
* Licenses
* Licenses
|
2025-06-30 15:25:36 +02:00 |
|
Lysandre Debut
|
e8f90b5397
|
Split transformers chat and transformers serve (#38443)
* Next token
* Split chat and serve
* Support both generation methods
* Style
* Generation Config
* temp
* temp
* Finalize serving.py
Co-authored-by: célina <hanouticelina@gmail.com>
* Finalize chat.py
* Update src/transformers/commands/serving.py
Co-authored-by: célina <hanouticelina@gmail.com>
* Lucain's comments
Co-authored-by: Lucain <lucain@huggingface.co>
* Update
* Last comments on PR
* Better error handling
* Better error handling
* CI errors
* CI errors
* Add tests
* Fix tests
* Fix tests
* [chat] Split chat/serve (built on top of lysandre's PR) (#39031)
* Next token
* Split chat and serve
* Support both generation methods
* Style
* Generation Config
* temp
* temp
* Finalize serving.py
Co-authored-by: célina <hanouticelina@gmail.com>
* Finalize chat.py
* Update src/transformers/commands/serving.py
Co-authored-by: célina <hanouticelina@gmail.com>
* Lucain's comments
Co-authored-by: Lucain <lucain@huggingface.co>
* Update
* Last comments on PR
* Better error handling
* Better error handling
* CI errors
* CI errors
* Add tests
* Fix tests
* Fix tests
* streaming tool call
* abstract tool state; set tool start as eos
* todos
* server working on models without tools
* rm chat's deprecated flags
* chat defaults
* kv cache persists across calls
* add server docs
* link
* Update src/transformers/commands/serving.py
* Apply suggestions from code review
* i love merge conflicts
* solve multi turn with tiny-agents
* On the fly switching of the models
* Remove required positional arg
---------
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: célina <hanouticelina@gmail.com>
Co-authored-by: Lucain <lucain@huggingface.co>
* Protect names
* Fix tests
---------
Co-authored-by: célina <hanouticelina@gmail.com>
Co-authored-by: Lucain <lucain@huggingface.co>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
|
2025-06-30 15:10:53 +02:00 |
|