Test: generate with torch.compile(model.forward) as a fast test (#34544)

2025-01-28 14:10:38 +00:00
parent f48ecd7608
commit ece8c42488
25 changed files with 105 additions and 53 deletions
--- a/docs/source/en/kv_cache.md
+++ b/docs/source/en/kv_cache.md
@@ -349,7 +349,7 @@ In case you are using Sink Cache, you have to crop your inputs to that maximum l
 >>> user_prompts = ["Hello, what's your name?", "Btw, yesterday I was on a rock concert."]

 >>> past_key_values = DynamicCache()
->>> max_cache_length = past_key_values.get_max_length()
+>>> max_cache_length = past_key_values.get_max_cache_shape()

 >>> messages = []
 >>> for prompt in user_prompts: