Misc. fixes for PyTorch QA examples (#16958)
1. Fixes evaluation errors that appear when training or evaluating on SQuAD v2. One was newly encountered; the other was previously reported in "Running SQuAD 1.0 sample command raises IndexError" (#15401) but was not completely fixed.
2. Removes boolean arguments that don't use store_true. Please don't use these: ANY non-empty string is converted to True in this case, which is clearly not the desired behavior (and it creates a LOT of confusion).
3. All no-trainer test scripts now save metric values in the same way (with the right prefix, eval_), consistent with the trainer-based versions.
4. Adds the forgotten model.eval() call in the no-trainer versions. This improved some results, but not all of them (see the discussion at the end).

Please see the F1 scores and the discussion below.
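The pitfall in item 2 can be reproduced with a minimal argparse sketch. It assumes the old arguments were declared with `type=bool`, which the commit message implies but does not show; `--legacy_flag` is a hypothetical name used only for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Pitfall (assumed old style): type=bool calls bool() on the raw string,
    # so any non-empty value, including "False", parses as True.
    # "--legacy_flag" is a hypothetical name for illustration only.
    parser.add_argument("--legacy_flag", type=bool, default=False)
    # Style used after this commit: a presence-only switch.
    parser.add_argument("--version_2_with_negative", action="store_true")
    return parser


parser = build_parser()

# Passing "False" to a type=bool argument still yields True.
buggy = parser.parse_args(["--legacy_flag=False"])
print(buggy.legacy_flag)

# With store_true, the value depends only on whether the flag is present.
flag_on = parser.parse_args(["--version_2_with_negative"])
flag_off = parser.parse_args([])
print(flag_on.version_2_with_negative, flag_off.version_2_with_negative)
```

With `store_true`, omitting the flag is the only way to turn the option off, which is why the test command in the diff changes from `--version_2_with_negative=False` to the bare `--version_2_with_negative`.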
@@ -200,7 +200,7 @@ class ExamplesTestsNoTrainer(TestCasePlus):
         testargs = f"""
             run_qa_no_trainer.py
             --model_name_or_path bert-base-uncased
-            --version_2_with_negative=False
+            --version_2_with_negative
             --train_file tests/fixtures/tests_samples/SQUAD/sample.json
             --validation_file tests/fixtures/tests_samples/SQUAD/sample.json
             --output_dir {tmp_dir}
@@ -216,6 +216,7 @@ class ExamplesTestsNoTrainer(TestCasePlus):
         with patch.object(sys, "argv", testargs):
             run_squad_no_trainer.main()
             result = get_results(tmp_dir)
+            # Because we use --version_2_with_negative the testing script uses SQuAD v2 metrics.
             self.assertGreaterEqual(result["eval_f1"], 30)
             self.assertGreaterEqual(result["eval_exact"], 30)
             self.assertTrue(os.path.exists(os.path.join(tmp_dir, "epoch_0")))
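The test above reads metrics under the keys `eval_f1` and `eval_exact`. As a rough sketch of the convention from item 3, the snippet below prefixes raw metric names with `eval_` before saving them and then reads them back. The helper names, the `all_results.json` file name, and the metric values are assumptions for illustration; they are not taken from the example scripts.

```python
import json
import os
import tempfile


def save_prefixed_metrics(metrics, output_dir, prefix="eval"):
    """Hypothetical helper: write metrics with a common prefix, e.g. f1 -> eval_f1."""
    prefixed = {f"{prefix}_{name}": value for name, value in metrics.items()}
    # The file name is an assumption made for this sketch.
    path = os.path.join(output_dir, "all_results.json")
    with open(path, "w") as f:
        json.dump(prefixed, f)
    return path


def load_results(output_dir):
    """Hypothetical counterpart to the test's get_results helper."""
    with open(os.path.join(output_dir, "all_results.json")) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp_dir:
    # Placeholder numbers, not real scores.
    save_prefixed_metrics({"f1": 42.0, "exact": 40.0}, tmp_dir)
    results = load_results(tmp_dir)

print(sorted(results))
```

Using one prefix everywhere lets the same assertions work against both the trainer-based and the no-trainer scripts.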