From fc8fc400e3944849f02ad15245482617040a8ba1 Mon Sep 17 00:00:00 2001
From: Stas Bekman
Date: Wed, 26 Jan 2022 11:23:32 -0800
Subject: [PATCH] [docs] post-PR merge fix (#15355)

* [docs] post-PR merge fix

* Update docs/source/main_classes/deepspeed.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
---
 docs/source/main_classes/deepspeed.mdx | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/main_classes/deepspeed.mdx b/docs/source/main_classes/deepspeed.mdx
index 4685e9acf3..78381264a0 100644
--- a/docs/source/main_classes/deepspeed.mdx
+++ b/docs/source/main_classes/deepspeed.mdx
@@ -31,7 +31,7 @@ won't be possible on a single GPU.
 
 🤗 Transformers integrates [DeepSpeed](https://github.com/microsoft/DeepSpeed) via 2 options:
 
-1. Integration of the core DeepSpeed features via [`Trainer`]. This is everything done for your type
+1. Integration of the core DeepSpeed features via [`Trainer`]. This is an everything-done-for-you type
    of integration - just supply your custom config file or use our template and you have nothing else to do. Most of
    this document is focused on this feature.
 2. If you don't use [`Trainer`] and want to use your own Trainer where you integrated DeepSpeed
@@ -604,7 +604,7 @@ The following is an example of configuration for ZeRO stage 2:
 **Performance tuning:**
 
 - enabling `offload_optimizer` should reduce GPU RAM usage (it requires `"stage": 2`)
-- `"overlap_comm": true` trade offs increased GPU RAM usage to lower all-reduce latency. `overlap_comm` uses 4.5x
+- `"overlap_comm": true` trades off increased GPU RAM usage to lower all-reduce latency. `overlap_comm` uses 4.5x
   the `allgather_bucket_size` and `reduce_bucket_size` values. So if they are set to 5e8, this requires a 9GB
   footprint (`5e8 x 2Bytes x 2 x 4.5`). Therefore, if you have a GPU with 8GB or less RAM, to avoid getting
   OOM-errors you will need to reduce those parameters to about `2e8`, which would require 3.6GB. You will want to do