[Trainer] memory tracker metrics (#10225)

* memory tracker metrics
* go back to eval for somewhat consistency
* handle no-gpu case
* deal with stackable eval calls
* restore callback order
* style
* simplify the API
* add test
* docs
* consistently use eval_ prefix
* improve docs
* Update src/transformers/trainer_utils.py

  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* rename method
* style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-18 09:27:32 -08:00
parent d7f38c5d1d
commit 97e688bc22
7 changed files with 294 additions and 14 deletions
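The commit message mentions memory tracker metrics, a no-GPU case, and stackable eval calls. Below is a minimal sketch of how a per-stage tracker might behave. It assumes a hypothetical `MemoryTrackerSketch` class and illustrative metric key names. It substitutes the standard library's `tracemalloc`, which only sees Python-heap allocations, for whatever measurement the Trainer's own tracker uses. It records no GPU metrics, so it behaves as it would on a machine without a GPU. A nested `start()` for a different stage, for example evaluation triggered from inside training, is ignored so the outer measurement stays intact.

```python
import tracemalloc


class MemoryTrackerSketch:
    """Hypothetical illustration: record Python-heap memory deltas per stage.

    Uses stdlib tracemalloc as a stand-in for real CPU/GPU measurement.
    No GPU keys are produced, as on a machine without a GPU.
    """

    def __init__(self):
        self.cur_stage = None  # stage currently being tracked, if any
        self.metrics = {}
        self._begin = 0

    def start(self, stage):
        # "Stacked" calls (e.g. eval started while train is tracked) are
        # ignored so the outer stage's measurement is not disturbed.
        if self.cur_stage is not None:
            return
        self.cur_stage = stage
        tracemalloc.start()  # assumes tracemalloc was not already tracing
        self._begin, _ = tracemalloc.get_traced_memory()

    def stop(self, stage):
        # Only the stage that started tracking may stop it.
        if self.cur_stage != stage:
            return
        current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        # Memory still held at the end of the stage, relative to its start.
        self.metrics[f"{stage}_mem_cpu_alloc_delta"] = current - self._begin
        # Extra memory that was used temporarily above the final level.
        self.metrics[f"{stage}_mem_cpu_peaked_delta"] = max(0, peak - current)
        self.cur_stage = None


tracker = MemoryTrackerSketch()
tracker.start("train")
data = [bytearray(1024) for _ in range(100)]  # keep roughly 100 KiB alive
tracker.start("eval")  # nested call: ignored
tracker.stop("eval")   # ignored, because "train" started tracking
tracker.stop("train")
print(sorted(tracker.metrics))
# ['train_mem_cpu_alloc_delta', 'train_mem_cpu_peaked_delta']
```

Ignoring nested calls, rather than keeping a stack of stages, keeps the outer stage's numbers meaningful. The trade-off is that the inner stage gets no metrics of its own in this sketch.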
--- a/examples/tests/deepspeed/test_deepspeed.py
+++ b/examples/tests/deepspeed/test_deepspeed.py
@@ -88,8 +88,8 @@ class TestDeepSpeed(TestCasePlus):
            extra_args_str="--do_eval",
            remove_args_str="--do_train",
        )
-        val_metrics = load_json(os.path.join(output_dir, "val_results.json"))
-        assert "val_bleu" in val_metrics
+        val_metrics = load_json(os.path.join(output_dir, "eval_results.json"))
+        assert "eval_bleu" in val_metrics

    # XXX: need to do better validation beyond just that the run was successful
    def run_quick(self, distributed=True, extra_args_str=None, remove_args_str=None):
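The updated assertions expect evaluation metrics in `eval_results.json` with an `eval_` key prefix, where previously they were in `val_results.json` with a `val_` prefix. Below is a minimal sketch of that naming convention. It assumes hypothetical helpers `prefix_metrics` and `save_results`, which are not the Trainer's actual code, and the metric values are placeholders rather than real results.

```python
import json
import os
import tempfile


def prefix_metrics(metrics, prefix="eval"):
    """Hypothetical helper: ensure every metric key starts with f"{prefix}_"."""
    return {
        k if k.startswith(f"{prefix}_") else f"{prefix}_{k}": v
        for k, v in metrics.items()
    }


def save_results(output_dir, metrics, split="eval"):
    """Hypothetical helper: write metrics to <output_dir>/<split>_results.json."""
    path = os.path.join(output_dir, f"{split}_results.json")
    with open(path, "w") as f:
        json.dump(metrics, f, indent=4, sort_keys=True)
    return path


# Mirror the updated assertion in test_deepspeed.py using placeholder numbers.
with tempfile.TemporaryDirectory() as output_dir:
    raw = {"bleu": 12.5, "loss": 2.3}  # placeholder values, not real results
    save_results(output_dir, prefix_metrics(raw))
    with open(os.path.join(output_dir, "eval_results.json")) as f:
        val_metrics = json.load(f)
    assert "eval_bleu" in val_metrics
```

Keys that already carry the prefix are left unchanged, so applying the helper twice does not produce names like `eval_eval_bleu`.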