From daf53241d6276c0cd932ee8ce3e5b0a403f392b7 Mon Sep 17 00:00:00 2001
From: Mayank Agarwal
Date: Fri, 14 Apr 2023 20:48:15 +0530
Subject: [PATCH] Fix word_ids hyperlink (#22765)

* Fix word_ids hyperlink

* Add suggested fix
---
 docs/source/en/tasks/token_classification.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/en/tasks/token_classification.mdx b/docs/source/en/tasks/token_classification.mdx
index 72045b4bcd..b3e1bdad62 100644
--- a/docs/source/en/tasks/token_classification.mdx
+++ b/docs/source/en/tasks/token_classification.mdx
@@ -121,7 +121,7 @@ As you saw in the example `tokens` field above, it looks like the input has alre

 However, this adds some special tokens `[CLS]` and `[SEP]` and the subword tokenization creates a mismatch between the input and labels. A single word corresponding to a single label may now be split into two subwords. You'll need to realign the tokens and labels by:

-1. Mapping all tokens to their corresponding word with the [`word_ids`](https://huggingface.co/docs/tokenizers/python/latest/api/reference.html#tokenizers.Encoding.word_ids) method.
+1. Mapping all tokens to their corresponding word with the [`word_ids`](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.BatchEncoding.word_ids) method.
 2. Assigning the label `-100` to the special tokens `[CLS]` and `[SEP]` so they're ignored by the PyTorch loss function.
 3. Only labeling the first token of a given word. Assign `-100` to other subtokens from the same word.
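
Note on the three realignment steps in the hunk above: they can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from the documentation page. The names `align_labels` and `IGNORE_INDEX` and the sample inputs are hypothetical, and the function assumes `word_ids` is the list that `BatchEncoding.word_ids()` returns for one example (`None` for special tokens, otherwise the index of the source word).

```python
from typing import List, Optional

# -100 is the default ignore_index of PyTorch's cross-entropy loss.
IGNORE_INDEX = -100


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Expand word-level labels to token-level labels (hypothetical helper)."""
    aligned = []
    previous_word = None
    for word_id in word_ids:
        if word_id is None:
            # Step 2: special tokens such as [CLS] and [SEP] are ignored.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous_word:
            # Steps 1 and 3: the first token of a word takes the word's label.
            aligned.append(word_labels[word_id])
        else:
            # Step 3: later subtokens of the same word are ignored.
            aligned.append(IGNORE_INDEX)
        previous_word = word_id
    return aligned


# Made-up input: [CLS], word 0, word 1 split into two subtokens, word 2, [SEP].
example_word_ids = [None, 0, 1, 1, 2, None]
example_labels = [3, 0, 7]  # one made-up label id per word
print(align_labels(example_word_ids, example_labels))
# prints [-100, 3, 0, -100, 7, -100]
```

Taking the `word_ids` list as a plain argument keeps the sketch independent of any particular tokenizer. In practice the list would come from a fast tokenizer called with `is_split_into_words=True`.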