Docs: add more cross-references to the KV cache docs (#33323)

* add more cross-references

* nit

* import guard

* more import guards

* nit

* Update src/transformers/generation/configuration_utils.py
This commit is contained in:
Joao Gante
2024-09-06 10:22:00 +01:00
committed by GitHub
parent 1759bb9126
commit 2b789f27f3
29 changed files with 99 additions and 57 deletions

View File

@@ -662,7 +662,7 @@ Using the key-value cache has two advantages:
- Significant increase in computational efficiency as less computations are performed compared to computing the full \\( \mathbf{QK}^T \\) matrix. This leads to an increase in inference speed
- The maximum required memory is not increased quadratically with the number of generated tokens, but only increases linearly.
> One should *always* make use of the key-value cache as it leads to identical results and a significant speed-up for longer input sequences. Transformers has the key-value cache enabled by default when making use of the text pipeline or the [`generate` method](https://huggingface.co/docs/transformers/main_classes/text_generation).
> One should *always* make use of the key-value cache as it leads to identical results and a significant speed-up for longer input sequences. Transformers has the key-value cache enabled by default when making use of the text pipeline or the [`generate` method](https://huggingface.co/docs/transformers/main_classes/text_generation). We have an entire guide dedicated to caches [here](./kv_cache).
<Tip warning={true}>