Add a static cache that offloads to the CPU or other device (#32161)

* Add a static cache that offloads to the CPU or other device

* Fix PR comments, add unit-tests
Gerben van V
2024-08-29 11:51:09 +02:00
committed by GitHub
parent 92a75ff6b1
commit 5129671290
7 changed files with 350 additions and 19 deletions

@@ -390,6 +390,11 @@ A [`Constraint`] can be used to force the generation to include specific tokens
     - get_seq_length
     - reset
 
+[[autodoc]] OffloadedStaticCache
+    - update
+    - get_seq_length
+    - reset
+
 [[autodoc]] HybridCache
     - update
     - get_seq_length