Added missing code in exemplary notebook - custom datasets fine-tuning (#15300)
* Added missing code in exemplary notebook - custom datasets fine-tuning

  Added missing code in the tokenize_and_align_labels function in the exemplary notebook on custom datasets - token classification. The missing code concerns adding labels for all but the first token in a single word. The added code was taken directly from the official Hugging Face example - this [colab notebook](https://github.com/huggingface/notebooks/blob/master/transformers_doc/custom_datasets.ipynb).

* Changes requested in the review - keep the code as simple as possible
@@ -326,7 +326,9 @@ def tokenize_and_align_labels(examples):
                 label_ids.append(-100)
             elif word_idx != previous_word_idx:  # Only label the first token of a given word.
                 label_ids.append(label[word_idx])
+            else:
+                label_ids.append(-100)
+            previous_word_idx = word_idx
         labels.append(label_ids)
 
     tokenized_inputs["labels"] = labels
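To illustrate what the added branch does, here is a minimal, self-contained sketch of the inner loop. It is not the notebook's code: `align_labels` and `IGNORE_INDEX` are hypothetical names introduced for illustration, and the `word_ids` list is written by hand rather than produced by a tokenizer. The value -100 matches the default `ignore_index` of PyTorch's cross-entropy loss, so positions carrying it are skipped when the loss is computed.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # hypothetical constant; the diff appends the literal -100


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Hypothetical helper mirroring the inner loop of tokenize_and_align_labels."""
    label_ids = []
    previous_word_idx = None
    for word_idx in word_ids:
        if word_idx is None:
            # Special tokens have no source word, so they are ignored.
            label_ids.append(IGNORE_INDEX)
        elif word_idx != previous_word_idx:
            # First token of a word carries that word's label.
            label_ids.append(word_labels[word_idx])
        else:
            # Remaining tokens of the same word are ignored (the branch added here).
            label_ids.append(IGNORE_INDEX)
        # Remember the word index so the next token can be compared against it.
        previous_word_idx = word_idx
    return label_ids


# Hand-written example: three words, the second one split into two tokens,
# with one special token at each end.
word_ids = [None, 0, 1, 1, 2, None]
word_labels = [3, 0, 7]
aligned = align_labels(word_ids, word_labels)
print(aligned)  # [-100, 3, 0, -100, 7, -100]
```

The result has one entry per token, which is what a token classification model expects for its `labels` input.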