[docs] Performance docs tidy up, part 1 (#23963)

* first pass at the single gpu doc
* overview: improved clarity and navigation
* WIP
* updated intro and deepspeed sections
* improved torch.compile section
* more improvements
* minor improvements
* make style
* Apply suggestions from code review
  Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* feedback addressed
* mdx -> md
* link fix
* feedback addressed
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-07-24 08:57:24 -04:00
parent 54ba8608d0
commit 75317aefb3
4 changed files with 607 additions and 595 deletions
--- a/docs/source/en/_toctree.yml
+++ b/docs/source/en/_toctree.yml
@@ -111,36 +111,40 @@
 - sections:
    - local: performance
      title: Overview
-    - local: perf_train_gpu_one
-      title: Training on one GPU
-    - local: perf_train_gpu_many
-      title: Training on many GPUs
-    - local: perf_train_cpu
-      title: Training on CPU
-    - local: perf_train_cpu_many
-      title: Training on many CPUs
-    - local: perf_train_tpu
-      title: Training on TPUs
-    - local: perf_train_tpu_tf
-      title: Training on TPU with TensorFlow
-    - local: perf_train_special
-      title: Training on Specialized Hardware
-    - local: perf_infer_cpu
-      title: Inference on CPU
-    - local: perf_infer_gpu_one
-      title: Inference on one GPU
-    - local: perf_infer_gpu_many
-      title: Inference on many GPUs
-    - local: perf_infer_special
-      title: Inference on Specialized Hardware
-    - local: perf_hardware
-      title: Custom hardware for training
+    - sections:
+        - local: perf_train_gpu_one
+          title: Methods and tools for efficient training on a single GPU
+        - local: perf_train_gpu_many
+          title: Multiple GPUs and parallelism
+        - local: perf_train_cpu
+          title: Efficient training on CPU
+        - local: perf_train_cpu_many
+          title: Distributed CPU training
+        - local: perf_train_tpu
+          title: Training on TPUs
+        - local: perf_train_tpu_tf
+          title: Training on TPU with TensorFlow
+        - local: perf_train_special
+          title: Training on Specialized Hardware
+        - local: perf_hardware
+          title: Custom hardware for training
+        - local: hpo_train
+          title: Hyperparameter Search using Trainer API
+      title: Efficient training techniques
+    - sections:
+        - local: perf_infer_cpu
+          title: Inference on CPU
+        - local: perf_infer_gpu_one
+          title: Inference on one GPU
+        - local: perf_infer_gpu_many
+          title: Inference on many GPUs
+        - local: perf_infer_special
+          title: Inference on Specialized Hardware
+      title: Optimizing inference
    - local: big_models
      title: Instantiating a big model
    - local: debugging
-      title: Debugging
-    - local: hpo_train
-      title: Hyperparameter Search using Trainer API
+      title: Troubleshooting
    - local: tf_xla
      title: XLA Integration for TensorFlow Models
  title: Performance and scalability
@@ -182,6 +186,8 @@
    title: Perplexity of fixed-length models
  - local: pipeline_webserver
    title: Pipelines for webserver inference
+  - local: model_memory_anatomy
+    title: Model training anatomy
  title: Conceptual guides
 - sections:
  - sections: