Adaptive dynamic number of speculative tokens (#34156)

* initial commit
* update strategy
* add tradeoff FPR TPR with cost
* all probs
* fix
* fix
* fix style
* Update src/transformers/generation/configuration_utils.py

  shorter docstring

  Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* import guard
* fix style
* add is_sklearn_available condition
* vectorizing to flatten the for-loop
* fix style
* disable adaptation for UAG
* update doc
* add TestAssistedCandidateGeneratorUpdateStrategy
* fix style
* protect import
* fix style

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
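The "add tradeoff FPR TPR with cost" step can be illustrated with a minimal sketch. This is not the code from this commit. It assumes that the assistant's per-token confidences and the main model's accept/reject outcomes for those drafted tokens have already been collected. The function name `pick_confidence_threshold` and the false-negative weight `fn_weight` are hypothetical. The sketch uses scikit-learn's `roc_curve` to sweep candidate confidence thresholds and returns the one with the lowest weighted FPR/FNR cost.

```python
import numpy as np
from sklearn.metrics import roc_curve


def pick_confidence_threshold(probs, matches, fn_weight=3.0):
    """Hypothetical helper: choose an assistant confidence threshold.

    probs:     assistant confidence for each drafted token
    matches:   1 if the main model accepted that token, else 0
    fn_weight: assumed relative cost of a false negative
    """
    # roc_curve sweeps every distinct observed probability as a threshold.
    # "Positive" means confidence >= threshold, i.e. keep drafting.
    fpr, tpr, thresholds = roc_curve(matches, probs)
    fnr = 1.0 - tpr
    # False positive: drafted a token that was rejected (wasted assistant step).
    # False negative: stopped drafting although the token would have been accepted.
    costs = fpr + fn_weight * fnr
    return float(thresholds[np.argmin(costs)])


# Accepted tokens had confidence 0.9 and 0.8, rejected ones 0.3 and 0.2.
best = pick_confidence_threshold([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])
print(best)
```

Raising `fn_weight` penalizes stopping the draft too early. This pushes the chosen threshold down and lets the assistant speculate more tokens per step.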
2024-12-05 18:07:33 +02:00
parent b0a51e5cff
commit e27465c801
4 changed files with 177 additions and 2 deletions
--- a/docs/source/en/generation_strategies.md
+++ b/docs/source/en/generation_strategies.md
@@ -456,6 +456,8 @@ just like in multinomial sampling. However, in assisted decoding, reducing the t
 ['Alice and Bob, a couple of friends of mine, who are both in the same office as']
 ```

+We recommend installing the `scikit-learn` library. When it is available, the candidate generation strategy adapts the assistant's number of speculative tokens dynamically, which can yield additional speedup.
+
 #### Universal Assisted Decoding

 Universal Assisted Decoding (UAD) adds support for main and assistant models with different tokenizers.