diff --git a/README.md b/README.md
index 13b239f4e8..06d5a18fba 100644
--- a/README.md
+++ b/README.md
@@ -152,10 +152,10 @@ Here is a detailed documentation of the classes in the package and how to use th
 
 | Sub-section | Description |
 |-|-|
-| [Loading Google AI's pre-trained weigths](#Loading-Google-AI's-pre-trained-weigths-and-PyTorch-dump) | How to load Google AI's pre-trained weight or a PyTorch saved instance |
+| [Loading Google AI's pre-trained weights](#Loading-Google-AIs-pre-trained-weigths-and-PyTorch-dump) | How to load Google AI's pre-trained weights or a PyTorch saved instance |
 | [PyTorch models](#PyTorch-models) | API of the six PyTorch model classes: `BertModel`, `BertForMaskedLM`, `BertForNextSentencePrediction`, `BertForPreTraining`, `BertForSequenceClassification` or `BertForQuestionAnswering` |
-| [Tokenizer: `BertTokenizer`](#Tokenizer:-BertTokenizer) | API of the `BertTokenizer` class|
-| [Optimizer: `BERTAdam`](#Optimizer:-BERTAdam) |  API of the `BERTAdam` class |
+| [Tokenizer: `BertTokenizer`](#Tokenizer-BertTokenizer) | API of the `BertTokenizer` class|
+| [Optimizer: `BERTAdam`](#Optimizer-BERTAdam) |  API of the `BERTAdam` class |
 
 ### Loading Google AI's pre-trained weigths and PyTorch dump
 
@@ -316,6 +316,12 @@ The optimizer accepts the following arguments:
 
 ## Examples
 
+| Sub-section | Description |
+|-|-|
+| [Training large models: introduction, tools and examples](#Training-large-models-introduction-tools-and-examples) | How to use gradient accumulation, multi-GPU training, distributed training, CPU optimization and 16-bit training to train BERT models |
+| [Fine-tuning with BERT: running the examples](#Fine-tuning-with-BERT-running-the-examples) | Running the examples in [`./examples`](./examples/): `extract_classif.py`, `run_classifier.py` and `run_squad.py` |
+| [Fine-tuning BERT-large on GPUs](#Fine-tuning-BERT-large-on-GPUs) | How to fine-tune `BERT-large` |
+
 ### Training large models: introduction, tools and examples
 
 BERT-base and BERT-large are respectively 110M and 340M parameters models and it can be difficult to fine-tune them on a single GPU with the recommended batch size for good performance (in most case a batch size of 32).