diff --git a/docs/source/main_classes/trainer.rst b/docs/source/main_classes/trainer.rst
index 9fc88a658a..674f2ce617 100644
--- a/docs/source/main_classes/trainer.rst
+++ b/docs/source/main_classes/trainer.rst
@@ -1627,6 +1627,34 @@
 Here is the `documentation `__.
 
+Batch Size
+=======================================================================================================================
+
+To configure batch size, use:
+
+.. code-block:: json
+
+    {
+        "train_batch_size": "auto",
+        "train_micro_batch_size_per_gpu": "auto"
+    }
+
+and the :class:`~transformers.Trainer` will automatically set ``train_micro_batch_size_per_gpu`` to the value of
+``args.per_device_train_batch_size`` and ``train_batch_size`` to ``args.world_size * args.per_device_train_batch_size *
+args.gradient_accumulation_steps``.
+
+You can also set the values explicitly:
+
+.. code-block:: json
+
+    {
+        "train_batch_size": 12,
+        "train_micro_batch_size_per_gpu": 4
+    }
+
+But then you're on your own synchronizing the :class:`~transformers.Trainer` command line arguments and the DeepSpeed
+configuration.
+
 Gradient Accumulation
 =======================================================================================================================
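
The arithmetic behind the ``"auto"`` values described in the added section can be sketched in a few lines of Python. This is only an illustration of the stated formula, not the :class:`~transformers.Trainer` implementation: the helper names ``resolve_auto_batch_sizes`` and ``is_consistent`` are hypothetical, and the arguments are passed as plain integers rather than read from a real ``TrainingArguments`` object.

```python
def resolve_auto_batch_sizes(ds_config, per_device_train_batch_size,
                             gradient_accumulation_steps, world_size):
    """Hypothetical helper: fill in "auto" batch-size entries of a DeepSpeed-style config dict."""
    resolved = dict(ds_config)  # shallow copy so the caller's dict is not mutated

    # Micro batch size per GPU mirrors the per-device batch size.
    if resolved.get("train_micro_batch_size_per_gpu") == "auto":
        resolved["train_micro_batch_size_per_gpu"] = per_device_train_batch_size

    # Global batch size = world size * per-device batch size * accumulation steps.
    if resolved.get("train_batch_size") == "auto":
        resolved["train_batch_size"] = (
            world_size * per_device_train_batch_size * gradient_accumulation_steps
        )
    return resolved


def is_consistent(ds_config, per_device_train_batch_size,
                  gradient_accumulation_steps, world_size):
    """Hypothetical check: do explicit config values agree with the command line arguments?"""
    expected = resolve_auto_batch_sizes(
        {"train_batch_size": "auto", "train_micro_batch_size_per_gpu": "auto"},
        per_device_train_batch_size, gradient_accumulation_steps, world_size,
    )
    return all(ds_config.get(key) == value for key, value in expected.items())


auto_config = {"train_batch_size": "auto", "train_micro_batch_size_per_gpu": "auto"}
resolved = resolve_auto_batch_sizes(
    auto_config, per_device_train_batch_size=4,
    gradient_accumulation_steps=1, world_size=3,
)

explicit_config = {"train_batch_size": 12, "train_micro_batch_size_per_gpu": 4}
matches = is_consistent(explicit_config, 4, 1, 3)      # 3 * 4 * 1 == 12
mismatch = is_consistent(explicit_config, 4, 2, 3)     # 3 * 4 * 2 == 24, not 12
```

With three processes, a per-device batch size of 4, and no gradient accumulation, the ``"auto"`` entries resolve to the same ``12`` and ``4`` used in the explicit example. Changing any one of the command line arguments without updating the explicit config is the kind of mismatch the last paragraph of the section warns about.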