Lysandre Debut
88ead3f518
Fix responses add tests (#39848)
* Quick responses fix
* [serve] Fix responses API and add tests
* Remove typo
* Remove typo
* Tests
2025-08-01 18:06:08 +02:00
Lysandre Debut
a0e5a7d34b
Transformers serve VLM (#39454)
* Add support for VLMs in Transformers Serve
* Raushan comments
* Update src/transformers/commands/serving.py
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
* Quick fix
* CPU -> Auto
* Update src/transformers/commands/serving.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Fixup
---------
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-07-23 17:03:18 +02:00
Joao Gante
bf6c997685
[serve] Add speech to text (/v1/audio/transcriptions) (#39434)
* Scaffolding
* Explicit content
* Naïve Responses API streaming implementation
* Cleanup
* Scaffolding
* Explicit content
* Naïve Responses API streaming implementation
* Cleanup
* use openai
* validate request, including detecting unused fields
* dict indexing
* dict var access
* tmp commit (tests failing)
* add slow
* use oai output type in completions
* (little rebase errors)
* working spec?
* guard type hint
* type hints. fix state (CB can now load different models)
* type hints; fn names; error type
* add docstrings
* responses + kv cache
* metadata support; fix kv cache; error event
* add output_index and content_index
* docstrings
* add test_build_response_event
* docs/comments
* gate test requirements; terminate cb manager on model switch
* nasty type hints
* more type hints
* disable validation by default; enable force models
* todo
* experiment: base model from typed dict
* audio working
* fix bad rebase
* load audio with librosa
* implement timed models
* almost working
* make fixup
* fix tests
* transcription request type
* tokenizer -> processor
* add example in docs
---------
Co-authored-by: Lysandre <hi@lysand.re>
2025-07-17 14:29:57 +00:00
Lysandre Debut
de5ca373ac
Responses API in transformers serve (#39155)
* Scaffolding
* Explicit content
* Naïve Responses API streaming implementation
* Cleanup
* Responses API (to be merged into #39155) (#39338)
* Scaffolding
* Explicit content
* Naïve Responses API streaming implementation
* Cleanup
* use openai
* validate request, including detecting unused fields
* dict indexing
* dict var access
* tmp commit (tests failing)
* add slow
* use oai output type in completions
* (little rebase errors)
* working spec?
* guard type hint
* type hints. fix state (CB can now load different models)
* type hints; fn names; error type
* add docstrings
* responses + kv cache
* metadata support; fix kv cache; error event
* add output_index and content_index
* docstrings
* add test_build_response_event
* docs/comments
* gate test requirements; terminate cb manager on model switch
* nasty type hints
* more type hints
* disable validation by default; enable force models
* todo
---------
Co-authored-by: Lysandre <hi@lysand.re>
* Slight bugfixes
* PR comments from #39338
* make fixup
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
2025-07-16 14:16:16 +02:00
Joao Gante
df49b399dc
[tests] tag serve tests as slow (#39343)
* maybe they need more cpu resources?
* add todo
2025-07-10 15:40:08 +00:00
Joao Gante
38c3931362
[server] add tests and fix passing a custom generation_config (#39230)
* add tests; fix passing a custom generation_config
* tool integration test
* add install step
* add accelerate as dep to serving
* add todo
2025-07-10 13:41:38 +00:00
Lysandre Debut
ed36f8490e
Licenses (#39127)
* Licenses
* Licenses
2025-06-30 15:25:36 +02:00
Lysandre Debut
e8f90b5397
Split transformers chat and transformers serve (#38443)
* Next token
* Split chat and serve
* Support both generation methods
* Style
* Generation Config
* temp
* temp
* Finalize serving.py
Co-authored-by: célina <hanouticelina@gmail.com>
* Finalize chat.py
* Update src/transformers/commands/serving.py
Co-authored-by: célina <hanouticelina@gmail.com>
* Lucain's comments
Co-authored-by: Lucain <lucain@huggingface.co>
* Update
* Last comments on PR
* Better error handling
* Better error handling
* CI errors
* CI errors
* Add tests
* Fix tests
* Fix tests
* [chat] Split chat/serve (built on top of lysandre's PR) (#39031)
* Next token
* Split chat and serve
* Support both generation methods
* Style
* Generation Config
* temp
* temp
* Finalize serving.py
Co-authored-by: célina <hanouticelina@gmail.com>
* Finalize chat.py
* Update src/transformers/commands/serving.py
Co-authored-by: célina <hanouticelina@gmail.com>
* Lucain's comments
Co-authored-by: Lucain <lucain@huggingface.co>
* Update
* Last comments on PR
* Better error handling
* Better error handling
* CI errors
* CI errors
* Add tests
* Fix tests
* Fix tests
* streaming tool call
* abstract tool state; set tool start as eos
* todos
* server working on models without tools
* rm chat's deprecated flags
* chat defaults
* kv cache persists across calls
* add server docs
* link
* Update src/transformers/commands/serving.py
* Apply suggestions from code review
* i love merge conflicts
* solve multi turn with tiny-agents
* On the fly switching of the models
* Remove required positional arg
---------
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: célina <hanouticelina@gmail.com>
Co-authored-by: Lucain <lucain@huggingface.co>
* Protect names
* Fix tests
---------
Co-authored-by: célina <hanouticelina@gmail.com>
Co-authored-by: Lucain <lucain@huggingface.co>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-06-30 15:10:53 +02:00