Misc. fixes for Pytorch QA examples: (#16958)

1. Fixes evaluation errors that pop up when you train/eval on SQuAD v2 (one newly encountered, and one previously reported in "Running SQuAD 1.0 sample command raises IndexError" #15401 but not completely fixed).
2. Removes boolean arguments that don't use store_true. Please don't use these: ANY non-empty string is converted to True in this case, which is clearly not the desired behavior (and it creates a LOT of confusion).
3. All no-trainer test scripts now save metric values in the same way (with the right prefix, eval_), which is consistent with the trainer-based versions.
4. Adds the forgotten model.eval() in the no-trainer versions. This improved some results, but not all of them (see the discussion at the end). Please see the F1 scores and the discussion below.
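The pitfall described in item 2 can be reproduced with a minimal sketch. It assumes the old flag was declared with something like `type=bool`; that declaration is not shown in this diff, so both parsers below are hypothetical, and only the flag name is taken from the test change.

```python
import argparse

# Hypothetical "before" parser: assumes the flag was declared with type=bool.
# argparse passes the raw string to bool(), and bool("False") is True because
# any non-empty string is truthy.
buggy = argparse.ArgumentParser()
buggy.add_argument("--version_2_with_negative", type=bool, default=False)

# Hypothetical "after" parser: store_true makes the flag a real on/off switch.
fixed = argparse.ArgumentParser()
fixed.add_argument("--version_2_with_negative", action="store_true")

# The old test passed "=False", which the buggy parser reads as True.
buggy_args = buggy.parse_args(["--version_2_with_negative=False"])

# With store_true, omitting the flag means False and passing it means True.
fixed_off = fixed.parse_args([])
fixed_on = fixed.parse_args(["--version_2_with_negative"])

print(buggy_args.version_2_with_negative)  # True, despite "=False"
print(fixed_off.version_2_with_negative)   # False
print(fixed_on.version_2_with_negative)    # True
```

Under that assumption, the old test argument `--version_2_with_negative=False` would already have enabled SQuAD v2 handling, which may be why the updated test passes the bare flag rather than dropping it.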
2022-04-27 12:51:39 +00:00
parent 49d5bcb0f3
commit c82e017aa9
6 changed files with 105 additions and 16 deletions
--- a/examples/pytorch/test_accelerate_examples.py
+++ b/examples/pytorch/test_accelerate_examples.py
@@ -200,7 +200,7 @@ class ExamplesTestsNoTrainer(TestCasePlus):
        testargs = f"""
            run_qa_no_trainer.py
            --model_name_or_path bert-base-uncased
-            --version_2_with_negative=False
+            --version_2_with_negative
            --train_file tests/fixtures/tests_samples/SQUAD/sample.json
            --validation_file tests/fixtures/tests_samples/SQUAD/sample.json
            --output_dir {tmp_dir}
@@ -216,6 +216,7 @@ class ExamplesTestsNoTrainer(TestCasePlus):
        with patch.object(sys, "argv", testargs):
            run_squad_no_trainer.main()
            result = get_results(tmp_dir)
+            # Because we use --version_2_with_negative the testing script uses SQuAD v2 metrics.
            self.assertGreaterEqual(result["eval_f1"], 30)
            self.assertGreaterEqual(result["eval_exact"], 30)
            self.assertTrue(os.path.exists(os.path.join(tmp_dir, "epoch_0")))