From 5a42bb431e15d74a3ab53318d14d53a70dae02cc Mon Sep 17 00:00:00 2001
From: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Date: Mon, 21 Mar 2022 09:37:18 -0700
Subject: [PATCH] Update troubleshoot with more content (#16243)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* 📝 first draft

* 🖍 apply feedback
---
 docs/source/troubleshooting.mdx | 55 ++++++++++++++++++++++++++++++++-
 1 file changed, 54 insertions(+), 1 deletion(-)

diff --git a/docs/source/troubleshooting.mdx b/docs/source/troubleshooting.mdx
index 318a94228e..ea0724cd4e 100644
--- a/docs/source/troubleshooting.mdx
+++ b/docs/source/troubleshooting.mdx
@@ -120,4 +120,57 @@ Another option is to get a better traceback from the GPU. Add the following envi
 >>> import os
 
 >>> os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
-```
\ No newline at end of file
+```
+
+## Incorrect output when padding tokens aren't masked
+
+In some cases, the output `hidden_state` may be incorrect if the `input_ids` include padding tokens. To demonstrate, load a model and tokenizer. You can access a model's `pad_token_id` to see its value. The `pad_token_id` may be `None` for some models, but you can always manually set it.
+
+```py
+>>> from transformers import AutoModelForSequenceClassification
+>>> import torch
+
+>>> model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
+>>> model.config.pad_token_id
+0
+```
+
+The following example shows the output without masking the padding tokens:
+
+```py
+>>> input_ids = torch.tensor([[7592, 2057, 2097, 2393, 9611, 2115], [7592, 0, 0, 0, 0, 0]])
+>>> output = model(input_ids)
+>>> print(output.logits)
+tensor([[ 0.0082, -0.2307],
+        [ 0.1317, -0.1683]], grad_fn=<AddmmBackward0>)
+```
+
+Here is the actual output of the second sequence:
+
+```py
+>>> single_input_ids = torch.tensor([[7592]])
+>>> output = model(single_input_ids)
+>>> print(output.logits)
+tensor([[-0.1008, -0.4061]], grad_fn=<AddmmBackward0>)
+```
+
+Most of the time, you should provide an `attention_mask` to your model to ignore the padding tokens to avoid this silent error. Now the output of the second sequence matches its actual output:
+
+<Tip>
+
+By default, the tokenizer creates an `attention_mask` for you based on your specific tokenizer's defaults.
+
+</Tip>
+
+```py
+>>> attention_mask = torch.tensor([[1, 1, 1, 1, 1, 1], [1, 0, 0, 0, 0, 0]])
+>>> output = model(input_ids, attention_mask=attention_mask)
+>>> print(output.logits)
+tensor([[ 0.0082, -0.2307],
+        [-0.1008, -0.4061]], grad_fn=<AddmmBackward0>)
+```
+
+🤗 Transformers doesn't automatically create an `attention_mask` to mask a padding token if it is provided because:
+
+- Some models don't have a padding token.
+- For some use-cases, users want a model to attend to a padding token.
\ No newline at end of file
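
The hand-written `attention_mask` in the patch can also be derived from the padded ids. The following is a minimal sketch, not part of 🤗 Transformers: `build_attention_mask` is a hypothetical helper, and it assumes the pad id never appears as a real token in the batch. It uses plain Python lists so it runs without downloading a model.

```python
def build_attention_mask(input_ids, pad_token_id):
    """Return 1 for real tokens and 0 for padding tokens.

    Hypothetical helper for illustration only. Assumes pad_token_id does not
    also occur as a genuine token. If pad_token_id is None (the model has no
    padding token), no id matches it, so every position is kept as 1.
    """
    return [[0 if token == pad_token_id else 1 for token in sequence]
            for sequence in input_ids]


# Same padded batch as in the documentation example; 0 is the pad id of
# bert-base-uncased according to the patch.
input_ids = [[7592, 2057, 2097, 2393, 9611, 2115], [7592, 0, 0, 0, 0, 0]]
attention_mask = build_attention_mask(input_ids, pad_token_id=0)
print(attention_mask)
```

To pass the result to a model, wrap both lists with `torch.tensor(...)` as the documentation example does. Because some use cases want the model to attend to padding tokens, a helper like this is something you opt into rather than a safe default.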