[WIP] Enable reproducibility for distributed trainings (#16907)

* add seed worker and set_deterministic_seed_for_cuda function to enforce reproducibility
* change function name to enable determinism, add docstrings, reproducibility support for tf
* change function name to enable_determinism_for_distributed_training
* revert changes in set_seed and call set_seed within enable_full_determinism
* add one positional argument for seed_worker function
* add full_determinism flag in training args and call enable_full_determinism when it is true
* add enable_full_determinism to documentation
* apply make fixup after the last commit
* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-05-11 15:37:13 +02:00
parent 5229744b26
commit c33f6046c3
6 changed files with 60 additions and 6 deletions
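
For readers skimming the commit message, the sketch below shows roughly what a "full determinism" switch of this kind tends to involve. It is an illustration under stated assumptions, not the code added by this commit. The `sketch_*` names are hypothetical, and the specific environment variables and PyTorch switches are the ones usually recommended for reproducible CUDA runs; the commit message itself only confirms that `enable_full_determinism` calls `set_seed` and that `seed_worker` takes one positional argument.

```python
import os
import random


def sketch_set_seed(seed: int) -> None:
    """Seed every RNG that is available (hypothetical stand-in for set_seed)."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional in this sketch
    try:
        import torch

        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
    except ImportError:
        pass  # PyTorch is optional in this sketch


def sketch_enable_full_determinism(seed: int) -> None:
    """Seed first, then ask the backend for deterministic kernels (assumed steps)."""
    sketch_set_seed(seed)
    # Assumption: these must be set before CUDA/cuBLAS is initialised to matter.
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":16:8"
    try:
        import torch
    except ImportError:
        return
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


def sketch_seed_worker(worker_id: int) -> None:
    """DataLoader worker_init_fn: re-seed each worker from its torch seed."""
    try:
        import torch
    except ImportError:
        return
    # worker_id is unused; DataLoader requires the positional argument.
    sketch_set_seed(torch.initial_seed() % 2**32)
```

Per the commit message, the intended entry point from the Trainer side is the `full_determinism` flag in the training arguments, which calls `enable_full_determinism` when it is true. Deterministic kernels generally trade speed for reproducibility, so a switch like this is more suited to debugging than to production runs.
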
--- a/docs/source/en/internal/trainer_utils.mdx
+++ b/docs/source/en/internal/trainer_utils.mdx
@@ -22,6 +22,8 @@ Most of those are only useful if you are studying the code of the Trainer in the

 [[autodoc]] IntervalStrategy

+[[autodoc]] enable_full_determinism
+
 [[autodoc]] set_seed

 [[autodoc]] torch_distributed_zero_first