Docs: add more cross-references to the KV cache docs (#33323)
* add more cross-references * nit * import guard * more import guards * nit * Update src/transformers/generation/configuration_utils.py
This commit is contained in:
@@ -662,7 +662,7 @@ Using the key-value cache has two advantages:
|
||||
- Significant increase in computational efficiency as less computations are performed compared to computing the full \\( \mathbf{QK}^T \\) matrix. This leads to an increase in inference speed
|
||||
- The maximum required memory is not increased quadratically with the number of generated tokens, but only increases linearly.
|
||||
|
||||
> One should *always* make use of the key-value cache as it leads to identical results and a significant speed-up for longer input sequences. Transformers has the key-value cache enabled by default when making use of the text pipeline or the [`generate` method](https://huggingface.co/docs/transformers/main_classes/text_generation).
|
||||
> One should *always* make use of the key-value cache as it leads to identical results and a significant speed-up for longer input sequences. Transformers has the key-value cache enabled by default when making use of the text pipeline or the [`generate` method](https://huggingface.co/docs/transformers/main_classes/text_generation). We have an entire guide dedicated to caches [here](./kv_cache).
|
||||
|
||||
<Tip warning={true}>
|
||||
|
||||
|
||||
Reference in New Issue
Block a user