Adaptive dynamic number of speculative tokens (#34156)
* initial commit * update strategy * add tradeoff FPR TPR with cost * all probs * fix * fix * fix style * Update src/transformers/generation/configuration_utils.py shorter docstring Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * import guard * fix style * add is_sklearn_available condition * vectorizing to flatten the for-loop * fix style * disable adaptation for UAG * update doc * add TestAssistedCandidateGeneratorUpdateStrategy * fix style * protect import * fix style --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
This commit is contained in:
@@ -456,6 +456,8 @@ just like in multinomial sampling. However, in assisted decoding, reducing the t
|
||||
['Alice and Bob, a couple of friends of mine, who are both in the same office as']
|
||||
```
|
||||
|
||||
We recommend to install `scikit-learn` library to enhance the candidate generation strategy and achieve additional speedup.
|
||||
|
||||
#### Universal Assisted Decoding
|
||||
|
||||
Universal Assisted Decoding (UAD) adds support for main and assistant models with different tokenizers.
|
||||
|
||||
Reference in New Issue
Block a user