Fix seq2seq collator padding (#30556)

* fix seq2seq data collator to respect the given padding strategy
  further added tests for the seq2seq data collator in the style of the `data_collator_for_token_classification` (pt, tf, np)
* formatting and change bool equals "==" to "is"
* add missed return types in tests
* update numpy test as it can handle unequal shapes, not like pt or tf
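
For illustration only, here is a minimal sketch of what "respecting the given padding strategy" could mean for the label sequences of a seq2seq batch. It is not the code from this commit: `pad_labels` is a hypothetical helper written in plain Python, the strategy strings (`"longest"`, `"max_length"`, `"do_not_pad"`) and the `-100` label pad value are assumed to mirror the collator's usual conventions, and truncation is deliberately left out.

```python
from typing import List, Optional, Union


def pad_labels(
    labels: List[List[int]],
    padding: Union[bool, str] = True,
    max_length: Optional[int] = None,
    label_pad_token_id: int = -100,  # assumed default, ignored by the loss
) -> List[List[int]]:
    """Hypothetical helper: pad label sequences according to `padding`."""
    # No padding requested: return the sequences unchanged (ragged lengths).
    if padding is False or padding == "do_not_pad":
        return [list(seq) for seq in labels]

    # "max_length" pads to a fixed size; anything else pads to the longest
    # sequence in the batch.
    if padding == "max_length" and max_length is not None:
        target = max_length
    else:
        target = max(len(seq) for seq in labels)

    # Sequences already longer than `target` are left as-is (no truncation),
    # because multiplying a list by a negative count yields an empty list.
    return [list(seq) + [label_pad_token_id] * (target - len(seq)) for seq in labels]


batch = [[1, 2, 3], [4, 5]]
longest = pad_labels(batch, padding="longest")
fixed = pad_labels(batch, padding="max_length", max_length=5)
ragged = pad_labels(batch, padding=False)
```

With no padding the result stays ragged, which is consistent with the last bullet above: a list-of-arrays representation can hold unequal shapes, while a single rectangular tensor cannot.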
2024-04-30 19:32:30 +02:00
parent 78a57c5e1a
commit 9112520b15
3 changed files with 221 additions and 3 deletions
--- a/examples/pytorch/speech-recognition/run_speech_recognition_seq2seq.py
+++ b/examples/pytorch/speech-recognition/run_speech_recognition_seq2seq.py
@@ -122,7 +122,8 @@ class ModelArguments:
        metadata={"help": "Deprecated. Please use the `language` and `task` arguments instead."},
    )
    suppress_tokens: List[int] = field(
-        default=None, metadata={
+        default=None,
+        metadata={
            "help": (
                "Deprecated. The use of `suppress_tokens` should not be required for the majority of fine-tuning examples."
                "Should you need to use `suppress_tokens`, please manually update them in the fine-tuning script directly."