diff --git a/docs/README.md b/docs/README.md
index 0d68b6abf9..b076c5de9d 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -86,6 +86,32 @@ It should build the static app that will be available under `/docs/_build/html`
Accepted files are reStructuredText (.rst) and Markdown (.md). Create a file with its extension and put it
in the source directory. You can then link it to the toc-tree by putting the filename without the extension.
+## Renaming section headers and moving sections
+
+It helps to keep the old links working when renaming section headers and/or moving sections from one document to another. Old links are likely to be used in Issues, Forums, and social media, and it makes for a much better user experience if users reading those months later can still easily navigate to the originally intended information.
+
+Therefore, we simply keep a little map of moved sections at the end of the document where the original section was. The key is to preserve the original anchor.
+
+So if you renamed a section from "Section A" to "Section B", you can add the following at the end of the file:
+
+```
+Sections that were moved:
+
+[ Section A ]
+```
+And if you moved it to another file, then:
+
+```
+Sections that were moved:
+
+[ Section A ]
+```
+
+Use the relative style to link to the new file so that the versioned docs continue to work.
+
+For an example of an extensive set of moved sections, see the very end of [the Trainer doc](https://github.com/huggingface/transformers/blob/master/docs/source/main_classes/trainer.mdx).
+
+
## Preview the documentation in a pull request
Once you have made your pull request, you can check what the documentation will look like after it's merged by
diff --git a/docs/source/main_classes/trainer.mdx b/docs/source/main_classes/trainer.mdx
index 19ee38c903..dabda44439 100644
--- a/docs/source/main_classes/trainer.mdx
+++ b/docs/source/main_classes/trainer.mdx
@@ -442,109 +442,29 @@ Known caveats:
doing this yourself: `--sharded_ddp "zero_dp_3 auto_wrap"`.
-### DeepSpeed
+Sections that were moved:
-
-Moved to [Trainer DeepSpeed integration](deepspeed#trainer-deepspeed-integration).
-
-
-#### Installation
-
-Moved to [Installation](deepspeed#deepspeed-installation).
-
-
-#### Deployment with multiple GPUs
-
-Moved to [Deployment with multiple GPUs](deepspeed#deepspeed-multi-gpu).
-
-
-#### Deployment with one GPU
-
-Moved to [Deployment with one GPU](deepspeed#deepspeed-one-gpu).
-
-
-#### Deployment in Notebooks
-
-Moved to [Deployment in Notebooks](deepspeed#deepspeed-notebook).
-
-
-#### Configuration
-
-Moved to [Configuration](deepspeed#deepspeed-config).
-
-
-#### Passing Configuration
-
-Moved to [Passing Configuration](deepspeed#deepspeed-config-passing).
-
-
-#### Shared Configuration
-
-Moved to [Shared Configuration](deepspeed#deepspeed-config-shared).
-
-#### ZeRO
-
-Moved to [ZeRO](deepspeed#deepspeed-zero).
-
-##### ZeRO-2 Config
-
-Moved to [ZeRO-2 Config](deepspeed#deepspeed-zero2-config).
-
-##### ZeRO-3 Config
-
-Moved to [ZeRO-3 Config](deepspeed#deepspeed-zero3-config).
-
-
-#### NVMe Support
-
-Moved to [NVMe Support](deepspeed#deepspeed-nvme).
-
-##### ZeRO-2 vs ZeRO-3 Performance
-
-Moved to [ZeRO-2 vs ZeRO-3 Performance](deepspeed#deepspeed-zero2-zero3-performance).
-
-##### ZeRO-2 Example
-
-Moved to [ZeRO-2 Example](deepspeed#deepspeed-zero2-example).
-
-##### ZeRO-3 Example
-
-Moved to [ZeRO-3 Example](deepspeed#deepspeed-zero3-example).
-
-
-#### Optimizer and Scheduler
-
-##### Optimizer
-
-Moved to [Optimizer](deepspeed#deepspeed-optimizer).
-
-
-##### Scheduler
-
-Moved to [Scheduler](deepspeed#deepspeed-scheduler).
-
-#### fp32 Precision
-
-Moved to [fp32 Precision](deepspeed#deepspeed-fp32).
-
-#### Automatic Mixed Precision
-
-Moved to [Automatic Mixed Precision](deepspeed#deepspeed-amp).
-
-#### Batch Size
-
-Moved to [Batch Size](deepspeed#deepspeed-bs).
-
-#### Gradient Accumulation
-
-Moved to [Gradient Accumulation](deepspeed#deepspeed-grad-acc).
-
-
-#### Gradient Clipping
-
-Moved to [Gradient Clipping](deepspeed#deepspeed-grad-clip).
-
-
-#### Getting The Model Weights Out
-
-Moved to [Getting The Model Weights Out](deepspeed#deepspeed-weight-extraction).
+[ DeepSpeed
+| Installation
+| Deployment with multiple GPUs
+| Deployment with one GPU
+| Deployment in Notebooks
+| Configuration
+| Passing Configuration
+| Shared Configuration
+| ZeRO
+| ZeRO-2 Config
+| ZeRO-3 Config
+| NVMe Support
+| ZeRO-2 vs ZeRO-3 Performance
+| ZeRO-2 Example
+| ZeRO-3 Example
+| Optimizer
+| Scheduler
+| fp32 Precision
+| Automatic Mixed Precision
+| Batch Size
+| Gradient Accumulation
+| Gradient Clipping
+| Getting The Model Weights Out
+]