Gemma2: add cache warning (#32279)
* gemma2 fallback to dynamic cache * Update src/transformers/models/gemma2/modeling_gemma2.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/models/gemma2/modeling_gemma2.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * raise error and dont fallback to dynamic cache * prev will break most forward calls/tests * Update src/transformers/models/gemma2/modeling_gemma2.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * update * fix copies --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
This commit is contained in:
committed by
GitHub
parent
a30c865f99
commit
7ad784ae9d
@@ -398,6 +398,7 @@ A [`Constraint`] can be used to force the generation to include specific tokens
|
||||
|
||||
[[autodoc]] HybridCache
|
||||
- update
|
||||
- get_seq_length
|
||||
- reset
|
||||
|
||||
[[autodoc]] SlidingWindowCache
|
||||
|
||||
@@ -30,6 +30,12 @@ Tips:
|
||||
|
||||
- The original checkpoints can be converted using the conversion script `src/transformers/models/Gemma2/convert_Gemma2_weights_to_hf.py`
|
||||
|
||||
<Tip warning={true}>
|
||||
|
||||
- Gemma2 uses sliding window attention every second layer, which makes it unsuitable for typical kv caching with [`~DynamicCache`] or tuples of tensors. To enable caching in Gemma2 forward call, you must initialize a [`~HybridCache`] instance and pass it as `past_key_values` to the forward call. Note, that you also have to prepare `cache_position` if the `past_key_values` already contains previous keys and values.
|
||||
|
||||
</Tip>
|
||||
|
||||
This model was contributed by [Arthur Zucker](https://huggingface.co/ArthurZ), [Pedro Cuenca](https://huggingface.co/pcuenq) and [Tom Arsen]().
|
||||
|
||||
|
||||
|
||||
Reference in New Issue
Block a user