From 79712e7e7a64b53ca21c0a8c63804c949d22adae Mon Sep 17 00:00:00 2001
From: Stas Bekman
Date: Tue, 1 Jun 2021 09:21:21 -0700
Subject: [PATCH] [deepspeed] docs (#11940)

* deepspeed docs

* cleanup

* cleanup
---
 docs/source/main_classes/trainer.rst | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/docs/source/main_classes/trainer.rst b/docs/source/main_classes/trainer.rst
index 9fc88a658a..674f2ce617 100644
--- a/docs/source/main_classes/trainer.rst
+++ b/docs/source/main_classes/trainer.rst
@@ -1627,6 +1627,34 @@ Here is the `documentation `__.
 
 
 
+Batch Size
+=======================================================================================================================
+
+To configure batch size, use:
+
+.. code-block:: json
+
+    {
+        "train_batch_size": "auto",
+        "train_micro_batch_size_per_gpu": "auto"
+    }
+
+and the :class:`~transformers.Trainer` will automatically set ``train_micro_batch_size_per_gpu`` to the value of
+``args.per_device_train_batch_size`` and ``train_batch_size`` to ``args.world_size * args.per_device_train_batch_size *
+args.gradient_accumulation_steps``.
+
+You can also set the values explicitly:
+
+.. code-block:: json
+
+    {
+        "train_batch_size": 12,
+        "train_micro_batch_size_per_gpu": 4
+    }
+
+But then you're on your own synchronizing the :class:`~transformers.Trainer` command line arguments and the DeepSpeed
+configuration.
+
 Gradient Accumulation
 =======================================================================================================================
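
Note on the batch size arithmetic: the patch says the ``"auto"`` entries resolve to ``args.per_device_train_batch_size`` and ``args.world_size * args.per_device_train_batch_size * args.gradient_accumulation_steps``. The sketch below only mirrors that arithmetic so that explicit values can be checked by hand. The helper names ``resolve_auto_batch_sizes`` and ``explicit_config_matches`` are hypothetical and are not part of transformers or DeepSpeed, and the code does not reproduce how the Trainer actually fills in the config.

```python
# Hypothetical helpers for illustration only; they are NOT transformers or
# DeepSpeed APIs. They restate the arithmetic described in the patch.


def resolve_auto_batch_sizes(world_size, per_device_train_batch_size, gradient_accumulation_steps):
    """Return the values the "auto" entries are described as resolving to."""
    micro_batch = per_device_train_batch_size
    total_batch = world_size * per_device_train_batch_size * gradient_accumulation_steps
    return {
        "train_batch_size": total_batch,
        "train_micro_batch_size_per_gpu": micro_batch,
    }


def explicit_config_matches(ds_config, world_size, per_device_train_batch_size, gradient_accumulation_steps):
    """Check that explicit DeepSpeed batch values agree with the Trainer-side arguments."""
    expected = resolve_auto_batch_sizes(
        world_size, per_device_train_batch_size, gradient_accumulation_steps
    )
    # Every expected key must be present in the config with the same value.
    return all(ds_config.get(key) == value for key, value in expected.items())
```

Under these assumptions, the explicit example in the patch (``train_batch_size`` 12, ``train_micro_batch_size_per_gpu`` 4) is consistent with, for instance, 3 processes, a per-device batch size of 4, and 1 gradient accumulation step. The same config would not match a run launched on 2 processes with the same arguments, which is the kind of mismatch the patch warns about when values are set by hand.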