[trainer] document resume randomness (#11588)
* document resume randomness
* fix link
* reword
* fix
* reword
* style
@@ -119,6 +119,20 @@ TFTrainingArguments
:members:

Randomness
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When resuming from a checkpoint generated by :class:`~transformers.Trainer`, all efforts are made to restore the
`python`, `numpy` and `pytorch` RNG states to the same states as they were at the moment of saving that checkpoint,
which should make the "stop and resume" style of training as close as possible to non-stop training.

However, due to various default non-deterministic pytorch settings, this might not fully work. If you want full
determinism, please refer to `Controlling sources of randomness
<https://pytorch.org/docs/stable/notes/randomness.html>`__. As explained in that document, some of the settings
that make things deterministic (e.g., ``torch.backends.cudnn.deterministic``) may slow things down. They are therefore
not enabled by default, but you can enable them yourself if needed.
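
The idea behind both paragraphs can be illustrated with a minimal sketch. This is not the actual
:class:`~transformers.Trainer` implementation: the helper names ``capture_rng_states``, ``restore_rng_states`` and
``enable_torch_determinism`` are hypothetical, the seed is arbitrary, and `numpy` and `pytorch` are only used if they
happen to be installed. It shows that restoring a saved RNG state makes the "resumed" random draws match the ones an
uninterrupted run would have produced.

```python
import random


def capture_rng_states():
    """Collect RNG states, as one might do when saving a checkpoint (hypothetical helper)."""
    states = {"python": random.getstate()}
    try:
        import numpy as np
        states["numpy"] = np.random.get_state()
    except ImportError:
        pass  # numpy not installed, nothing to save
    try:
        import torch
        states["torch"] = torch.get_rng_state()  # CPU generator only
    except ImportError:
        pass  # pytorch not installed, nothing to save
    return states


def restore_rng_states(states):
    """Put every saved RNG back into its saved state (hypothetical helper)."""
    random.setstate(states["python"])
    if "numpy" in states:
        import numpy as np
        np.random.set_state(states["numpy"])
    if "torch" in states:
        import torch
        torch.set_rng_state(states["torch"])


def enable_torch_determinism():
    """Opt in to the slower deterministic cuDNN settings; returns False without pytorch."""
    try:
        import torch
    except ImportError:
        return False
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    return True


random.seed(0)  # arbitrary seed, for illustration only
warmup = [random.random() for _ in range(3)]  # draws made before the "checkpoint"
checkpoint = capture_rng_states()

uninterrupted = [random.random() for _ in range(3)]  # what non-stop training would draw

restore_rng_states(checkpoint)  # simulate resuming from the checkpoint
resumed = [random.random() for _ in range(3)]

print(uninterrupted == resumed)
```

The sketch only covers the CPU generators. On GPUs the CUDA RNG states would need the same treatment (for example via
``torch.cuda.get_rng_state_all()`` and ``torch.cuda.set_rng_state_all()``), and even then non-deterministic kernels can
still cause small differences unless the settings described in the linked pytorch document are enabled.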

Trainer Integrations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~