Terminator strings for generate() (#28932)

* stash commit (will discard all of this)
* stash commit
* First commit - needs a lot of testing!
* Add a test
* Fix imports and make the tests actually test something
* Tests pass!
* Rearrange test
* Add comments (but it's still a bit confusing)
* Stop storing the tokenizer
* Comment fixup
* Fix for input_ids with a single sequence
* Update tests to test single sequences
* make fixup
* Fix incorrect use of isin()
* Expand tests to catch more cases
* Expand tests to catch more cases
* make fixup
* Fix length calculation and update tests
* Handle Ġ as a space replacement too
* Update src/transformers/generation/stopping_criteria.py
  Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Add optimizations from Joao's suggestion
* Remove TODO
* Update src/transformers/generation/stopping_criteria.py
  Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/generation/test_stopping_criteria.py
  Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* make fixup
* Rename some variables and remove some debugging clauses for clarity
* Add tests for the sub-methods
* Clarify one test slightly
* Add stop_strings to GenerationConfig
* generate() supports stop_string arg, asks for tokenizer if not provided
* make fixup
* Cleanup code and rename variables for clarity
* Update tokenizer error
* Update tokenizer passing, handle generation on GPU
* Slightly more explanation cleanup
* More comment cleanup
* Factor out the token cleanup so it's more obvious what we're doing, and we can change it later
* Careful with that cleanup!
* Cleanup + optimizations to _get_matching_positions
* More minor performance tweaks
* Implement caching and eliminate some expensive ops (startup time: 200ms -> 9ms)
* Remove the pin_memory call
* Parallelize across all stop strings!
* Quick fix for tensor devices
* Update embeddings test for the new format
* Fix test imports
* Manual patching for BERT-like tokenizers
* Return a bool vector instead of a single True/False
* Better comment
* Better comment
* Add tests from @zucchini-nlp
* Amy's list creation nit
* tok_list -> token_list
* Push a big expanded docstring (should we put it somewhere else?)
* Expand docstrings
* Docstring fixups
* Rebase
* make fixup
* Make a properly general method for figuring out token strings
* Fix naming throughout the functions
* Move cache, refactor, fix tests
* Add comment
* Remove finished TODO
* Remove finished TODO
* make fixup
* Update src/transformers/generation/stopping_criteria.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update and shorten docstring
* Update tests to be shorter/clearer and test specific cases

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-04-22 14:13:04 +01:00
parent 0e9d44d7a1
commit 0d84901cb7
6 changed files with 529 additions and 4 deletions
--- a/tests/generation/test_utils.py
+++ b/tests/generation/test_utils.py
@@ -2330,6 +2330,43 @@ class GenerationIntegrationTests(unittest.TestCase, GenerationIntegrationTestsMi

        self.assertListEqual(outputs, ["Wie alt sind Sie?"])

+    @slow
+    def test_per_row_stopping_criteria(self):
+        text = [
+            "They completed the challenging puzzle, revealing the hidden",
+            "Today a dragon flew over France",
+            "The aroma of freshly baked pizza filled the kitchen",
+        ]
+        stop_strings = ["secrets"]
+
+        model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2").to(torch_device)
+        tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
+        tokenizer.padding_side = "left"
+        tokenizer.pad_token_id = tokenizer.eos_token_id
+        input_ids = tokenizer(text, return_tensors="pt", padding="longest", add_special_tokens=False).input_ids.to(
+            torch_device
+        )
+
+        # baseline generation: max_length is the only stopping criterion
+        out = model.generate(input_ids, max_length=15)
+        out_text = tokenizer.batch_decode(out)
+        expected_out = [
+            "They completed the challenging puzzle, revealing the hidden secrets of the world.\n",
+            "<|endoftext|><|endoftext|><|endoftext|>Today a dragon flew over France and the French government was forced",
+            "The aroma of freshly baked pizza filled the kitchen with a sense of freshness",
+        ]
+        self.assertListEqual(out_text, expected_out)
+
+        # generation should stop at "secrets" for the first row only; that row is then padded with eos tokens
+        out = model.generate(input_ids, max_length=15, stop_strings=stop_strings, tokenizer=tokenizer)
+        out_text = tokenizer.batch_decode(out)
+        expected_out = [
+            "They completed the challenging puzzle, revealing the hidden secrets<|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|>",
+            "<|endoftext|><|endoftext|><|endoftext|>Today a dragon flew over France and the French government was forced",
+            "The aroma of freshly baked pizza filled the kitchen with a sense of freshness",
+        ]
+        self.assertListEqual(out_text, expected_out)
+
    def test_constrained_beam_search_mixin_type_checks(self):
        # PT-only test: TF doesn't have constrained beam search
        tokenizer = AutoTokenizer.from_pretrained("patrickvonplaten/t5-tiny-random")