HuggingFace_transformer/docs/source/en
Nikos Karampatziakis ca59d6f77c Offloaded KV Cache (#31325)
* Initial implementation of OffloadedCache

* enable usage via cache_implementation

* Address feedback, add tests, remove legacy methods

* Remove flash-attn, discover synchronization bugs, fix bugs

* Prevent usage in CPU only mode

* Add a section about offloaded KV cache to the docs

* Fix typos in docs

* Clarifications and better explanation of streams
2024-08-01 14:42:07 +02:00