Fix seq2seq collator padding (#30556)

* fix seq2seq data collator to respect the given padding strategy

also added tests for the seq2seq data collator, modeled on the existing `data_collator_for_token_classification` tests (PyTorch, TensorFlow, NumPy)

* formatting, and replace `==` with `is` when comparing against booleans

* add missing return types in tests

* update NumPy test, since NumPy can handle unequal shapes, unlike PyTorch or TensorFlow
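The first bullet says the collator should respect the requested padding strategy. The sketch below shows what that means for the labels: it is a hypothetical standalone helper (`pad_labels` is an illustrative name), not the transformers `DataCollatorForSeq2Seq` code. It assumes labels are right-padded with -100, the index commonly ignored by the loss.

```python
from typing import List, Optional, Union

# Assumed label padding value; -100 is the index commonly ignored by the loss.
LABEL_PAD_TOKEN_ID = -100


def pad_labels(
    labels: List[List[int]],
    padding: Union[bool, str] = "longest",
    max_length: Optional[int] = None,
    pad_to_multiple_of: Optional[int] = None,
    label_pad_token_id: int = LABEL_PAD_TOKEN_ID,
) -> List[List[int]]:
    """Hypothetical helper: right-pad label sequences per the padding strategy.

    Illustration only; this is not the transformers implementation.
    """
    # False / "do_not_pad": leave the batch ragged, just copy each sequence.
    if padding is False or padding == "do_not_pad":
        return [list(seq) for seq in labels]

    # "max_length" with an explicit max_length pads to that fixed size;
    # True / "longest" (or "max_length" without max_length) pad to the longest sequence.
    if padding == "max_length" and max_length is not None:
        target = max_length
    else:
        target = max(len(seq) for seq in labels)

    # Optionally round the target length up to the next multiple.
    if pad_to_multiple_of is not None:
        target = -(-target // pad_to_multiple_of) * pad_to_multiple_of

    # Sequences already at or beyond the target are left as-is (no truncation here).
    return [list(seq) + [label_pad_token_id] * (target - len(seq)) for seq in labels]


batch = [[5, 6, 7], [8]]
print(pad_labels(batch, padding="longest"))                   # [[5, 6, 7], [8, -100, -100]]
print(pad_labels(batch, padding="max_length", max_length=5))  # [[5, 6, 7, -100, -100], [8, -100, -100, -100, -100]]
print(pad_labels(batch, padding="do_not_pad"))                # [[5, 6, 7], [8]]
```

The `do_not_pad` result stays ragged, so it cannot be stacked into one rectangular PyTorch or TensorFlow tensor; that is presumably why the last bullet singles out the NumPy test as able to accept unequal shapes.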
Author: Anton Vlasjuk
Date: 2024-04-30 19:32:30 +02:00
Committed by: GitHub
Parent: 78a57c5e1a
Commit: 9112520b15
3 changed files with 221 additions and 3 deletions


@@ -122,7 +122,8 @@ class ModelArguments:
         metadata={"help": "Deprecated. Please use the `language` and `task` arguments instead."},
     )
     suppress_tokens: List[int] = field(
-        default=None, metadata={
+        default=None,
+        metadata={
             "help": (
                 "Deprecated. The use of `suppress_tokens` should not be required for the majority of fine-tuning examples."
                 "Should you need to use `suppress_tokens`, please manually update them in the fine-tuning script directly."
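This hunk only splits `default=None, metadata={` across two lines, so behavior is unchanged. The `metadata` dict is ordinary `dataclasses.field` metadata; in the Transformers example scripts such argument dataclasses are typically parsed with `HfArgumentParser`, which forwards entries like `help` to `argparse`. A minimal standard-library sketch of reading that metadata back, using a hypothetical `ExampleArguments` class and `help_strings` helper:

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class ExampleArguments:
    # Hypothetical stand-in that mirrors the field layout in the hunk.
    suppress_tokens: Optional[List[int]] = field(
        default=None,
        metadata={"help": "Deprecated. Token ids to suppress during generation."},
    )


def help_strings(cls) -> dict:
    """Hypothetical helper: collect the "help" metadata entry of each field."""
    return {f.name: f.metadata.get("help", "") for f in fields(cls)}


print(help_strings(ExampleArguments))
```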