From 5b1ad0eb732a07ccc4ea406fb33dd21c590c80be Mon Sep 17 00:00:00 2001
From: Joao Gante
Date: Tue, 16 May 2023 18:54:34 +0100
Subject: [PATCH] Docs: add link to assisted generation blog post (#23397)

---
 docs/source/en/generation_strategies.mdx | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/docs/source/en/generation_strategies.mdx b/docs/source/en/generation_strategies.mdx
index 2b4f9880cf..b59649bae4 100644
--- a/docs/source/en/generation_strategies.mdx
+++ b/docs/source/en/generation_strategies.mdx
@@ -338,9 +338,8 @@ For the complete list of the available parameters, refer to the [API documentati
 Assisted decoding is a modification of the decoding strategies above that uses an assistant model with the same
 tokenizer (ideally a much smaller model) to greedily generate a few candidate tokens. The main model then validates
 the candidate tokens in a single forward pass, which speeds up the decoding process. Currently, only greedy search
-and sampling are supported with assisted decoding, and doesn't support batched inputs.
-
-
+and sampling are supported with assisted decoding, and doesn't support batched inputs. To learn more about assisted
+decoding, check [this blog post](https://huggingface.co/blog/assisted-generation).
 
 To enable assisted decoding, set the `assistant_model` argument with a model.
 
@@ -364,8 +363,6 @@ To enable assisted decoding, set the `assistant_model` argument with a model.
 When using assisted decoding with sampling methods, you can use the `temperarure` argument to control the randomness
 just like in multinomial sampling. However, in assisted decoding, reducing the temperature will help improving latency.
 
-
-
 ```python
 >>> from transformers import AutoModelForCausalLM, AutoTokenizer