Joao Gante
295a90cb40
Generate: remove most decoder-only LLMs prepare_inputs_for_generation ( #33870 )
2024-10-09 12:15:48 +01:00
Joao Gante
38f9f10dd9
Cache: revert DynamicCache init for BC ( #33861 )
...
* tmp commit
* tmp commit
* make fixup
* missing removal
* fix condition
* fix end-to-end compilation
* if -> elif
* BC
* BC
* use @deprecate_kwarg("num_hidden_layers", version="4.47.0")
* wups the import
* 🥴
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2024-10-04 22:47:08 +02:00
pglorio
f319ba16fa
Add Zamba ( #30950 )
...
* Update index.md
* Rebase
* Rebase
* Updates from make fixup
* Update zamba.md
* Batched inference
* Update
* Fix tests
* Fix tests
* Fix tests
* Fix tests
* Update docs/source/en/model_doc/zamba.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update docs/source/en/model_doc/zamba.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update configuration_zamba.py
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update modeling_zamba.py
* Update modeling_zamba.py
* Update modeling_zamba.py
* Update configuration_zamba.py
* Update modeling_zamba.py
* Update modeling_zamba.py
* Merge branch 'main' of https://github.com/Zyphra/transformers_zamba
* Update ZambaForCausalLM
* Update ZambaForCausalLM
* Describe diffs with original mamba layer
* Moved mamba init into `_init_weights`
* Update index.md
* Rebase
* Rebase
* Updates from make fixup
* Update zamba.md
* Batched inference
* Update
* Fix tests
* Fix tests
* Fix tests
* Fix tests
* Update docs/source/en/model_doc/zamba.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update docs/source/en/model_doc/zamba.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update configuration_zamba.py
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update modeling_zamba.py
* Update modeling_zamba.py
* Update modeling_zamba.py
* Update configuration_zamba.py
* Update modeling_zamba.py
* Update modeling_zamba.py
* Merge branch 'main' of https://github.com/Zyphra/transformers_zamba
* Update ZambaForCausalLM
* Moved mamba init into `_init_weights`
* Update ZambaForCausalLM
* Describe diffs with original mamba layer
* make fixup fixes
* quality test fixes
* Fix Zamba model path
* circleci fixes
* circleci fixes
* circleci fixes
* circleci fixes
* circleci fixes
* circleci fixes
* circleci fixes
* circleci fixes
* circleci fixes
* Update
* circleci fixes
* fix zamba test from merge
* fix ValueError for disabling mamba kernels
* add HF copyright
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* shared_transf --> shared_transformer
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Fixes
* Move attention head dim to config
* Fix circle/ci tests
* Update modeling_zamba.py
* apply GenerationMixin inheritance change from upstream
* apply import ordering
* update needed transformers version for zamba
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* add contribution author
* add @slow to avoid CI
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Define attention_hidden_size
* Added doc for attention_head_size
* trigger CI
* Fix doc of attention_hidden_size
* [run-slow] zamba
* Fixed shared layer logic, swapped up<->gate in mlp
* shared_transformer -> shared_transf
* reformat HybridLayer __init__
* fix docstrings in zamba config
* added definition of _get_input_ids_and_config
* fixed formatting of _get_input_ids_and_config
---------
Co-authored-by: root <root@node-4.us-southcentral1-a.compute.internal>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: root <root@node-1.us-southcentral1-a.compute.internal>
Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
2024-10-04 22:28:05 +02:00
Joao Gante
d29738f5b4
Generate tests: modality-agnostic input preparation ( #33685 )
2024-10-03 14:01:24 +01:00
Marc Sun
cac4a4876b
[Quantization] Switch to optimum-quanto ( #31732 )
...
* switch to optimum-quanto rebase squash
* fix import check
* again
* test try-except
* style
2024-10-02 15:14:34 +02:00
Arthur
19d58d31f1
Add MLLama ( #33703 )
...
* current changes
* nit
* Add cross_attention_mask to processor
* multi-image fixed
* Add cross_attention_mask to processor
* cross attn works in all cases
* WIP refactoring function for image processor
* WIP refactoring image processor functions
* Refactor preprocess to use global loops instead of list nested list comps
* Docstrings
* Add channels unification
* fix dtype issues
* Update docstrings and format
* Consistent max_image_tiles
* current script
* updates
* Add convert to rgb
* Add image processor tests
* updates!
* update
* god damn it I am dumb sometimes
* Precompute aspect ratios
* now this works, full match
* fix 😉
* nits
* style
* fix model and conversion
* nit
* nit
* kinda works
* hack for sdpa non-contiguous bias
* nits here and there
* latest changes
* merge?
* run forward
* Add aspect_ratio_mask
* vision attention mask
* update script and config variable names
* nit
* nits
* be able to load
* style
* nits
* there
* nits
* make forward run
* small update
* enable generation multi-turn
* nit
* nit
* Clean up a bit for errors and typos
* A bit more constant fixes
* 90B keys and shapes match
* Fix for 11B model
* Fixup, remove debug part
* Docs
* Make max_aspect_ratio_id to be minimal
* Update image processing code to match new implementation
* Adjust conversion for final checkpoint state
* Change dim in repeat_interleave (according to meta code)
* tmp fix for num_tiles
* Fix for conversion (gate<->up, q/k_proj rope permute)
* nits
* codestyle
* Vision encoder fixes
* pass cross attn mask further
* Refactor aspect ratio mask
* Disable text-only generation
* Fix cross attention layers order, remove q/k norm rotation for cross attention layers
* Refactor gated position embeddings
* fix bugs but needs test with new weights
* rope scaling should be llama3
* Fix rope scaling name
* Remove debug for linear layer
* fix copies
* Make mask prepare private func
* Remove linear patch embed
* Make precomputed embeddings as nn.Embedding module
* MllamaPrecomputedAspectRatioEmbedding with config init
* Remove unused self.output_dim
* nit, intermediate layers
* Rename ln and pos_embed
* vision_chunk_size -> image_size
* return_intermediate -> intermediate_layers_indices
* vision_input_dim -> hidden_size
* Fix copied from statements
* fix most tests
* Fix more copied from
* layer_id->layer_idx
* Comment
* Fix tests for processor
* Copied from for _prepare_4d_causal_attention_mask_with_cache_position
* Style fix
* Add MllamaForCausalLM
* WIP fixing tests
* Remove duplicated layers
* Remove dummy file
* Fix style
* Fix consistency
* Fix some TODOs
* fix language_model instantiation, add docstring
* Move docstring, remove todos for precomputed embeds (we cannot init them properly)
* Add initial docstrings
* Fix
* fix some tests
* lets skip these
* nits, remove print, style
* Add one more copied from
* Improve test message
* Make validate func private
* Fix dummy objects
* Refactor `data_format` a bit + add comment
* typos/nits
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com >
* fix dummy objects and imports
* Add chat template config json
* remove num_kv_heads from vision attention
* fix
* move some commits and add more tests
* fix test
* Remove `update_key_name` from modeling utils
* remove num-kv-heads again
* some preliminary docs
* Update chat template + tests
* nit, conversion script max_num_tiles from params
* Fix warning for text-only generation
* Update conversion script for instruct models
* Update chat template in conversion + test
* add tests for CausalLM model
* model_max_length, avoid null chat_template
* Refactor conversion script
* Fix forward
* Fix integration tests
* Refactor vision config + docs
* Fix default
* Refactor text config
* Doc fixes
* Remove unused args, fix docs example
* Squashed commit of the following:
commit b51ce5a2efffbecdefbf6fc92ee87372ec9d8830
Author: qubvel <qubvel@gmail.com >
Date: Wed Sep 18 13:39:15 2024 +0000
Move model + add output hidden states and output attentions
* Fix num_channels
* Add mllama text and mllama vision models
* Fixing repo consistency
* Style fix
* Fixing repo consistency
* Fixing unused config params
* Fix failed tests after refactoring
* hidden_activation -> hidden_act for text mlp
* Remove from_pretrained from sub-configs
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update src/transformers/models/mllama/convert_mllama_weights_to_hf.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Reuse lambda in conversion script
* Remove run.py
* Update docs/source/en/model_doc/mllama.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update src/transformers/models/mllama/processing_mllama.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Remove unused LlamaTokenizerFast
* Fix logging
* Refactor gating
* Remove cycle for collecting intermediate states
* Refactor text-only check, add integration test for text-only
* Revert from pretrained to configs
* Fix example
* Add auto `bos_token` adding in processor
* Fix tips
* Update src/transformers/models/auto/tokenization_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Enable supports_gradient_checkpointing model flag
* add eager/sdpa options
* don't skip attn tests and bring back GC skips (did i really remove those?)
* Fix signature, but get error with None gradient
* Fix output attention tests
* Disable GC back
* Change no split modules
* Fix dropout
* Style
* Add Mllama to sdpa list
* Add post init for vision model
* Refine config for MllamaForCausalLMModelTest and skipped tests for CausalLM model
* if skipped, say it, don't pass
* Clean vision tester config
* Doc for args
* Update tests/models/mllama/test_modeling_mllama.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Add cross_attention_mask to test
* typehint
* Remove todo
* Enable gradient checkpointing
* Docstring
* Style
* Fixing and skipping some tests for new cache
* Mark flaky test
* Skip `test_sdpa_can_compile_dynamic` test
* Fixing some offload tests
* Add direct GenerationMixin inheritance
* Remove unused code
* Add initializer_range to vision config
* update the test to make sure we show if split
* fix gc?
* Fix repo consistency
* Undo modeling utils debug changes
* Fix link
* mllama -> Mllama
* [mllama] -> [Mllama]
* Enable compile test for CausalLM model (text-only)
* Fix TextModel prefix
* Update doc
* Docs for forward, type hints, and vision model prefix
* make sure to reset
* fix init
* small script refactor and styling
* nit
* updates!
* some nits
* Interpolate embeddings for 560 size and update integration tests
* nit
* does not support static cache!
* update
* fix
* nit2
* this?
* Fix conversion
* Style
* 4x memory improvement with image cache AFAIK
* Token decorator for tests
* Skip failing tests
* update processor errors
* fix split issues
* style
* weird
* style
* fix failing tests
* update
* nit fixing the whisper tests
* fix path
* update
---------
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: pavel <ubuntu@ip-10-90-0-11.ec2.internal>
Co-authored-by: qubvel <qubvel@gmail.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2024-09-25 19:56:25 +02:00
Jonathan Mamou
52daf4ec76
🚨 🚨 Setting default behavior of assisted decoding ( #33657 )
2024-09-25 09:39:09 +01:00
Joao Gante
a7734238ff
Generation tests: update imagegpt input name, remove unused functions ( #33663 )
2024-09-24 16:40:48 +01:00
Joao Gante
e15687fffe
Generation: deprecate PreTrainedModel inheriting from GenerationMixin ( #33203 )
2024-09-23 18:28:36 +01:00
Yih-Dar
077b552f07
Fix some missing tests in circleci ( #33559 )
...
* fix
* fix
* fix
* fix
* skip
* skip more
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
2024-09-20 20:58:51 +02:00
Duc-Viet Hoang
dc8b6eaeee
Fix contrastive search to correctly handle input with padding ( #33507 )
...
* fix: handle padding in contrastive search for decoder-only models
* fix: handle padding in contrastive search for encoder-decoder models
* tests: move padding contrastive test to test_util, add t5 test
* fix: handle if model_kwargs["decoder_attention_mask"] is None
* refactor: improve padding input contrastive search generation tests
* chore: _ranking_fast to use LongTensor for cosine_matrix_mask
2024-09-20 16:52:08 +01:00
Yih-Dar
31caf0b95f
Fix missing test in torch_job ( #33593 )
...
fix missing tests
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
2024-09-20 17:16:44 +02:00
Joao Gante
2fdb5e74cc
VLM generate: tests can't generate image/video tokens ( #33623 )
2024-09-20 15:43:27 +01:00
Joao Gante
266d0a6375
Generate: remove flakiness in test_generate_from_inputs_embeds_decoder_only ( #33602 )
...
almost zero is not zero
2024-09-20 14:50:42 +02:00
Vladislav Bronzov
162056a3f4
change sequence_bias type of SequenceBiasLogitsProcessor to list, add… ( #33375 )
...
* change sequence_bias type of SequenceBiasLogitsProcessor to list, add config tests for all processors
* fix format
* small fix for all_token_bias_pairs_are_valid internal func
* small typo fix in description
* improve test impl, some SequenceBiasLogitsProcessor refactoring
2024-09-19 17:35:44 +01:00
Raushan Turganbay
d7975a5874
VLMs: enable generation tests ( #33533 )
...
* add tests
* fix whisper
* update
* nit
* add qwen2-vl
* more updates!
* better this way
* fix this one
* fix more tests
* fix final tests, hope so
* fix led
* Update tests/generation/test_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
* pr comments
* not pass pixels and extra for low-mem tests, very flaky because of vision tower
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
2024-09-19 12:04:24 +02:00
Marc Sun
6cc4dfe3f1
Fix the initialization of the cache when we have multi gpu ( #33303 )
...
* init cache multi-gpu
* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
* switch to execution device map
* naming more consistent
* fix
* mutually exclusive device
* added an integration example
* remove useless check
* suggestion from joao + typing
* fix a couple of typos and add test
* revert check
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
2024-09-13 15:06:08 +02:00
Jonathan Mamou
7a51cbc65f
Dynamic number of speculative tokens in order to accelerate speculative decoding ( #33258 )
...
* optimal Speculation Lookahead based on probability
* update peer finished condition
* add support to do_sample True
* add stopping criteria
* gitignore
* add print
* remove prints
* minor
* minor
* git ignore
* adding test to stopping ConfidenceCriteria
* doc + format
* add doc
* Update .gitignore
* update docstring and default value of assistant_confidence_threshold
* add docstring
* Update src/transformers/generation/configuration_utils.py
implicit default value (None)
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
* style fix
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
2024-09-11 14:22:28 +02:00
Raushan Turganbay
1759bb9126
Fix: StaticCache & inputs_embeds ( #32932 )
...
squash commit
2024-09-06 12:56:59 +05:00
Raushan Turganbay
43df47d8e7
Llava Onevision: add model ( #32673 )
...
* working version
* fix copies
* update
* tests
* update docs
* codestyle
* add more tests
* add returns for docs
* clean up
* Update src/transformers/models/llava_onevision/processing_llava_onevision.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* updates
* codestyle
* style
* shouldn't be reversed
* [run-slow] llava_onevision
* [run-slow] llava_onevision
* add pooling in videos
* [run-slow] llava_onevision
* num-logits-to-keep
* [run-slow] llava_onevision
* [run-slow] llava_onevision
* Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* video matched orig impl
* fix tests
* chat template was modified
* Update docs/source/en/model_doc/llava_onevision.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* add more info in the doc page
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
2024-09-05 14:43:20 +05:00
Joao Gante
d750b509fc
Config: unified logic to retrieve text config ( #33219 )
2024-09-04 12:03:30 +01:00
Joao Gante
97c0f45b9c
Generate: fix assistant in different device ( #33257 )
2024-09-02 14:37:49 +01:00
Joao Gante
eb5b968c5d
Generate: throw warning when return_dict_in_generate is False but should be True ( #33146 )
2024-08-31 10:47:08 +01:00
Arthur
b017a9eb11
Refactor CI: more explicit ( #30674 )
...
* don't run custom when not needed?
* update test fetcher filtering
* fixup and updates
* update
* update
* reduce burden
* nit
* nit
* missing comma
* this?
* this?
* more parallelism
* more
* nit for real parallelism on tf and torch examples
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update to make it more custom
* update to make it more custom
* update to make it more custom
* update to make it more custom
* update
* update
* update
* update
* update
* update
* use correct path
* fix path to test files and examples
* filter-tests
* filter?
* filter?
* filter?
* nits
* fix naming of the artifacts to be pushed
* list vs files
* list vs files
* fixup
* fix list of all tests
* fix the install steps
* fix the install steps
* fix the config
* fix the config
* only split if needed
* only split if needed
* extend should fix it
* extend should fix it
* arg
* arg
* update
* update
* run tests
* run tests
* run tests
* more nits
* update
* update
* update
* update
* update
* update
* update
* simpler way to show the test, reduces the complexity of the generated config
* simpler way to show the test, reduces the complexity of the generated config
* style
* oups
* oups
* fix import errors
* skip some tests for now
* update doctestjob
* more parallelism
* fixup
* test only the test in examples
* test only the test in examples
* nits
* from Arthur
* fix generated config
* update
* update
* show tests
* oups
* oups
* fix torch job for now
* use single upload step
* oups
* fu**k
* fix
* nit
* update
* nit
* fix
* fixes
* [test-all]
* add generate marker and generate job
* oups
* torch job runs not generate tests
* let repo utils test all utils
* Update
* styling
* fix repo utils test
* more parallel please
* don't test
* update
* bit more verbose sir
* more
* hub were skipped
* split by classname
* revert
* maybe?
* Amazing catch
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com >
* fix
* update
* update
* maybe non capturing
* manual convert?
* pass artifacts as parameters as otherwise the config is too long
* artifact.json
* store output
* might not be safe?
* my token
* mmm?
* use CI job ID
* can't get a proper id?
* ups
* build num
* update
* echo url
* this?
* this!
* fix
* wget
* ish
* dang
* update
* there we go
* update
* update
* pass all
* not .txt
* update
* fetch
* fix naming
* fix
* up
* update
* update
* ??
* update
* more updates
* update
* more
* skip
* oups
* pr documentation tests are currently created differently
* update
* hmmmm
* oups
* curl -L
* update
* ????
* nit
* mmmm
* ish
* ouf
* update
* ish
* update
* update
* update
* nit
* nit
* up
* oups
* documentation_test fix
* test hub tests everything, just marker
* update
* fix
* test_hub is the only annoying one now
* tf threads?
* oups
* not sure what is happening?
* fix?
* just use folder for stating hub
* I am getting fucking annoyed
* fix the test?
* update
* update
* ?
* fixes
* add comment!
* nit
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2024-08-30 18:17:25 +02:00
Joao Gante
c6b23fda65
Llama: make slow tests green 🟢 ( #33138 )
2024-08-27 14:44:42 +01:00
Aya
7562366d4b
fix: multilingual model convert to tflite get wrong token ( #32079 )
...
* fix: multilingual model convert to tflite get wrong token
* fix: modify test_force_tokens_logits_processor the checking value as scores.dtype.min
---------
Co-authored-by: kent.sc.hung <kent.sc.hung@benq.com>
Co-authored-by: Aya <kent831217@gmail.com>
2024-08-27 11:44:09 +02:00
Joao Gante
970a16ec7f
Forbid PretrainedConfig from saving generate parameters; Update deprecations in generate-related code 🧹 ( #32659 )
...
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
2024-08-23 11:12:53 +01:00
Cyril Vallez
22e6f14525
Reducing memory usage: removing useless logits computation in generate() ( #31292 )
...
* Add .float() in all generation methods logit outputs
* Switch float-casting of logits to training only for main models
* Add `num_logits_to_keep` in Llama and add it by default in generate
* Apply style
* Add num_logits_to_keep as arg in prepare_inputs_for_generation
* Add support for Mistral
* Revert models except llama and mistral
* Fix default None value in _supports_num_logits_to_keep()
* Fix dimension of dummy input
* Add exception for prophetnet in _supports_num_logits_to_keep()
* Update _supports_num_logits_to_keep() to use inspect.signature()
* Add deprecation cycle + remove modification with pretraining_tp
* Apply style
* Add most used models
* Apply style
* Make `num_logits_to_keep` an int in all cases to remove if-else clause
* Add compile check for the warning
* Fix torch versions
* style
* Add gemma2
* Update warning version
* Add comment about .float operations in generation utils
* Add tests in GenerationTesterMixin and ModelTesterMixin
* Fix batch size for assisted decoding in tests
* fix small issues in test
* refactor test
* fix slicing removing dim issue
* Add nemotron support (should fix check-copy issue in CIs)
* Trigger new CIs
* Trigger new CIs
* Bump version
* Bump version in TODO
* Trigger CIs
* remove blank space
* Trigger CIs
2024-08-23 11:08:34 +01:00
Joao Gante
a26de15139
Generate: Deprecate returning legacy cache by default; Handle use_cache=False ( #32863 )
2024-08-22 20:01:52 +01:00
Joao Gante
70d5df6107
Generate: unify LogitsWarper and LogitsProcessor ( #32626 )
2024-08-16 11:20:41 +01:00
Raushan Turganbay
a30c865f99
Cache: new Cache format in decoder-only models ( #31421 )
...
* draft bart with new cache
* add cache for decoder-only models
* revert utils
* modify docstring
* revert bart
* minor fixes
* fix copies (not related)
* revert tests
* remove enc-dec related code
* remove bloom
* remove opt (enc-dec)
* update docstring
* git, codegen, gpt_neo, gpt_neox, gptj
* clean up
* copied from statements
* revert
* tmp
* update warning msg
* forgot git
* add more flags
* run-slow git,codegen,gpt_neo,gpt_neox,gptj
* add cache flag to VLMs
* remove files
* style
* video LLMs also need a flag
* style
* llava will go in another PR
* style
* [run-slow] codegen, falcon, git, gpt_neo, gpt_neox, gptj, idefics
* Update src/transformers/models/gpt_neo/modeling_gpt_neo.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* copy from
* deprecate until v4.45 and warn if not training
* nit
* fix test
* test static cache
* add more tests and fix models
* fix copies
* return sliding window mask
* run slow tests & fix + codestyle
* one more falcon fix for alibi
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
2024-08-07 10:02:16 +05:00
Joao Gante
7ffe25f2b9
Generate: end-to-end compilation ( #30788 )
...
* mvp
* added test (a few models need fixes)
* fix a few test cases
* test nits
* harder test 😈
* revert changes in stablelm
* test with improved condition
* add todo
* tmp commit
* merged with main
* nits
* add todo
* final corrections
* add docs for generation compilation
* docs nits
* add tip
* PR suggestions
* add more details to the compilation docs
* fix cache positions
* cache is now init in generate; update docs
* tag test as flaky
* docs
* post rebase make fixup and other nits
* remove unintended changes
* whisper (encoder-decoder) not supported
* move token default updates to ; add tests for token defaults
* push changes
* manual rebase
* chameleon doesn't support this
* fix test_static_cache_mha_mqa_gqa (broken in another PR)
* docs: dynamic is better with end-to-end compilation
2024-07-29 10:52:13 +01:00
Raushan Turganbay
f739687684
🚨 Bloom support for cache class ( #31445 )
...
* bloom dynamic cache
* bloom follows standard cache format
* no skips for bloom anymore
* use cache position when possible
* clean up
* codestyle
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* pr comments
* isinstance fix
* address comments
* make musicgen test happy
* [run-slow] bloom
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
2024-07-29 10:58:59 +05:00
Raushan Turganbay
4ab33c2d81
Generation: stop at eos for assisted decoding ( #31301 )
...
* fix
* move changes to prompt lookup
* add test
* set eos in assistant model
* style
* fix flakiness
* changes for new `main`
* Update tests/generation/test_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update tests/generation/test_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* add comment to explain
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
2024-07-26 10:16:06 +05:00
Yih-Dar
df6eee9201
Follow up for #31973 ( #32025 )
...
* fix
* [test_all] trigger full CI
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
2024-07-25 16:12:23 +02:00
Joao Gante
c38c55f4fb
Generate: store special token tensors under a unique variable name ( #31980 )
...
* rename stuff
* english; this one shouldn't be changed
* add a _ to the new var names
* musicgen
* derp
2024-07-22 14:06:49 +01:00
Yih-Dar
a1a34657d4
Avoid race condition ( #31973 )
...
* [test_all] hub
* remove delete
* remove delete
* remove delete
* remove delete
* remove delete
* remove delete
* [test_all]
* [test_all]
* [test_all]
* [test_all]
* [test_all]
* [test_all]
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
2024-07-15 17:56:24 +02:00
Yung-Sung Chuang
d094d8d9ec
Generate: Add new decoding strategy "DoLa" in .generate() ( #29619 )
...
Co-authored-by: Joao Gante <joao@huggingface.co >
2024-07-09 17:37:38 +01:00
jiqing-feng
7f91f168a1
fix assisted decoding ( #31401 )
...
* fix assisted decoding
* check None
* fix typo
* fix _prepare_special_tokens
* fix style
* fix lint
* add tests for assisted decoding
* fix style
* fix tests check
2024-07-03 09:22:56 +01:00
Joao Gante
82486e5995
🚨 🚨 TextGenerationPipeline: rely on the tokenizer default kwargs ( #31747 )
...
* rely on the tokenizer default kwargs
* fix a few tests
2024-07-02 16:17:42 +02:00
Sanchit Gandhi
a9701953ff
[whisper] static kv cache ( #31166 )
...
* make work with cache abstraction
* correct for static cache
* hacks for compile
* make fast
* fix
* fix pos ids
* generate
* fix sdpa
* fix sdpa cache pos
* fix fa2
* clean fa2
* integrate cache into generate
* make style
* copies
* more copies
* update eager
* update sdpa
* update fa2
* simplify
* use cache pos
* always compute cross-cache for debug
* avoid recompiles
Co-authored-by: Arthur Zucker <arthur@huggingface.co >
* fix fix
* fix fix fix
* more fix
* try encoder-decoder cache (too messy)
* revert encoder-decoder cache
* check cross-attn cache
* use enc-dec dataclass
* use richer enc-dec dataclass
* clean-up
* revert static cache changes
* small fixes
* revert to cpu flag
* fix copies
* add static slow test
* past k/v docstring
* more docstrings
* cache_position docstrings
* add to docs
* add enc-dec cache to docs
* make style
* fix after rebase
* fix beam
* style
* fix generation strategies
* fix most decoder-only tests
* style
* skip test
* more clean up
* small docstrings
* Apply suggestions from code review
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
* add todo
* only crop self-attn
* check cache in mixin
* style
* fix re-compile after rebase
* move `is_updated` logic to enc-dec wrapper
* revert back
* revert cache back
* finalise design
* fix
* fix fix
* style
* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* deprecate
* updates
* final updates
* style
* style
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-07-02 13:24:15 +01:00
amyeroberts
1de7dc7403
Skip tests properly ( #31308 )
...
* Skip tests properly
* [test_all]
* Add 'reason' as kwarg for skipTest
* [test_all] Fix up
* [test_all]
2024-06-26 21:59:08 +01:00
Joao Gante
1fd60fec75
RWKV: enable generation tests ( #31490 )
...
* add rwkv tests
* has_attentions set in individual tests
2024-06-20 14:15:01 +01:00
Joao Gante
83259e406d
Mamba: add generative tests ( #31478 )
2024-06-19 10:27:23 +01:00
Matt
28316d0e8b
Fix single letter stop strings ( #31448 )
...
* Fix single letter stop strings
* Change the 0 to a 1 to avoid potential empty vector headaches later
* Restructure for clarity
* Update tests/generation/test_stopping_criteria.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Add the unsqueeze
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
2024-06-18 14:07:16 +01:00
Raushan Turganbay
5fabd1e83b
Generation: fix handling of special tokens ( #31254 )
...
* fix special tokens in generation
* fix test
* add warning
* fix the check
* warn once
* fix
2024-06-06 15:21:32 +05:00
Raushan Turganbay
83238eeebc
Pass device in Logits Processor's init ( #29804 )
...
* add device in logits processor
* remove device when not needed
* codestyle
* tests
* forgot `melody` version
* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
* codestyle
* updates
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
2024-06-04 10:19:19 +05:00
Ahmed Moubtahij
39b2ff69d6
Token healing ( #30081 )
...
* token healing impl + trie with extensions
* make fixup
* prefix-robust space tokenization
* examples readme and requirements
* make fixup
* allow input prompt and model
* redundant defaults
* Specialized Trie
* make fixup
* updated tests with new inherited Tree
* input ids to auto device_map
* rm unused import
* Update src/transformers/generation/utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* naming convention
* Revert "naming convention"
This reverts commit dd39d9c5b7a969e2d8a8d2a8e54f121b82dc44f0.
* naming convention
* last -hopefully- changes
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
2024-06-03 10:53:15 +02:00
Raushan Turganbay
779bc360ff
Watermark: fix tests ( #30961 )
...
* fix tests
* style
* Update tests/generation/test_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
2024-05-28 17:07:42 +05:00
Raushan Turganbay
d583f1317b
Quantized KV Cache ( #30483 )
...
* clean-up
* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* fixup
* Update tests/quantization/quanto_integration/test_quanto.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
* Update src/transformers/generation/configuration_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* more suggestions
* mapping if torch available
* run tests & add 'support_quantized' flag
* fix jamba test
* revert, will be fixed by another PR
* codestyle
* HQQ and versatile cache classes
* final update
* typo
* make tests happy
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-05-23 17:25:20 +05:00