update readme, file names, removing TF code, moving tests
40
README.md
@@ -1,29 +1,33 @@
# PyTorch implementation of Google AI's BERT

## Introduction

This is a PyTorch implementation of the [TensorFlow code](https://github.com/google-research/bert) released by Google AI with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805).

It is an op-for-op reimplementation that can load any pre-trained TensorFlow checkpoint in a PyTorch model (see below).

There are a few differences with the TensorFlow model:

- the PyTorch model has multi-GPU and distributed training capabilities (see below),
- there is no TPU support in the current stable version of PyTorch (0.4.1) and, as a consequence, the pre-training scripts are not included in this repo. TPU support is supposed to be available in PyTorch v1.0, which will be released in the coming weeks. We will update the repository with TPU-adapted pre-training scripts when PyTorch has TPU support. In the meantime, you can use the TensorFlow version to train a model on TPU and import the checkpoint using the following script.

## Converting the TensorFlow pre-trained models to PyTorch

You can convert the pre-trained weights released by Google AI by calling the script `convert_tf_checkpoint_to_pytorch.py`.
It takes a TensorFlow checkpoint (`bert_model.ckpt`) containing the pre-trained weights and converts it to a `.bin` file readable by PyTorch.

TensorFlow pre-trained models can be found in the [original TensorFlow code](https://github.com/google-research/bert). We give an example with the `BERT-Base Uncased` model:

## Converting a TensorFlow checkpoint (in particular Google's pre-trained models) to PyTorch

You can convert any TensorFlow checkpoint, and in particular the pre-trained weights released by Google AI, by using `convert_tf_checkpoint_to_pytorch.py`.

This script takes a TensorFlow checkpoint (`bert_model.ckpt`) as input and converts it into a PyTorch dump, a `.bin` file that can be imported using the usual `torch.load()` command.

TensorFlow pre-trained models can be found in the [original TensorFlow code](https://github.com/google-research/bert). Here we give an example with the `BERT-Base Uncased` model:

```shell
export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
export BERT_PYTORCH_DIR=/path/to/pytorch/bert/uncased_L-12_H-768_A-12

python convert_tf_checkpoint_to_pytorch.py \
  --tf_checkpoint_path=$BERT_BASE_DIR/bert_model.ckpt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --pytorch_dump_path=$BERT_PYTORCH_DIR/pytorch_model.bin
  --pytorch_dump_path=$BERT_BASE_DIR/pytorch_model.bin
```
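
Before running the conversion, it can help to confirm that the checkpoint directory is complete. The following is a minimal sketch, not part of the repository: the helper name `check_bert_dir` is hypothetical, and it assumes the layout of Google's released archives, where the checkpoint is stored as several `bert_model.ckpt.*` files next to `bert_config.json` and `vocab.txt`.

```shell
# Hypothetical helper (not part of this repo): check that a BERT checkpoint
# directory contains the files the conversion and example scripts refer to.
check_bert_dir() {
    dir=$1
    for f in bert_config.json vocab.txt; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            return 1
        fi
    done
    # TensorFlow stores a checkpoint as several files sharing one prefix
    # (.index, .meta, .data-*), so match on the prefix rather than one name.
    for ckpt in "$dir"/bert_model.ckpt*; do
        [ -e "$ckpt" ] && return 0
    done
    echo "missing: $dir/bert_model.ckpt*" >&2
    return 1
}
```

Calling `check_bert_dir "$BERT_BASE_DIR"` before the conversion command reports the first missing file on stderr and returns a non-zero status.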

## Fine-tuning with BERT: running the examples

We showcase the same examples as in the original implementation: fine-tuning on the MRPC classification corpus and on the SQuAD question answering dataset.

@@ -40,7 +44,7 @@ Corpus (MRPC) corpus and runs in less than 10 minutes on a single K-80.

```shell
export GLUE_DIR=/path/to/glue

python run_classifier_pytorch.py \
python run_classifier.py \
  --task_name MRPC \
  --do_train \
  --do_eval \
@@ -53,21 +57,21 @@ python run_classifier_pytorch.py \
  --train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --output_dir /tmp/mrpc_output_pytorch/
  --output_dir /tmp/mrpc_output/
```

The next example fine-tunes `BERT-Base` on the SQuAD question answering task.

The data for SQuAD can be downloaded from the following links and should be saved in a `$SQUAD_DIR` directory.

* [train-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json)
* [dev-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json)
* [evaluate-v1.1.py](https://github.com/allenai/bi-att-flow/blob/master/squad/evaluate-v1.1.py)
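
The two JSON files can also be fetched from the command line. This is a minimal sketch assuming `curl` is installed; the function name `download_squad` and the `SQUAD_BASE_URL` variable are illustrative and not part of the repository. The evaluation script link above points to a GitHub page rather than a raw file, so it is left out here.

```shell
# Base URL taken from the dataset links listed above.
SQUAD_BASE_URL=https://rajpurkar.github.io/SQuAD-explorer/dataset

# Hypothetical helper: download the SQuAD v1.1 train and dev files into the
# directory given as $1, skipping any file that is already present.
download_squad() {
    dest=$1
    mkdir -p "$dest" || return 1
    for f in train-v1.1.json dev-v1.1.json; do
        # A non-empty file is treated as already downloaded.
        [ -s "$dest/$f" ] && continue
        curl -fsSL -o "$dest/$f" "$SQUAD_BASE_URL/$f" || return 1
    done
}
```

Running `download_squad "$SQUAD_DIR"` after exporting `SQUAD_DIR` creates the directory if needed and fetches only the files that are missing.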

```shell
export SQUAD_DIR=/path/to/SQUAD

python run_squad_pytorch.py \
python run_squad.py \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_PYTORCH_DIR/pytorch_model.bin \
@@ -83,13 +87,11 @@ python run_squad_pytorch.py \
  --output_dir=../debug_squad/
```

## Comparing TensorFlow and PyTorch models

We also include [a small Notebook](https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/Comparing%20TF%20and%20PT%20models.ipynb) we used to verify that the conversion of the weights to PyTorch is consistent with the original TensorFlow weights.
Please follow the instructions in the Notebook to run it.

## Note on pre-training

The original TensorFlow code also releases two scripts for pre-training BERT: [create_pretraining_data.py](https://github.com/google-research/bert/blob/master/create_pretraining_data.py) and [run_pretraining.py](https://github.com/google-research/bert/blob/master/run_pretraining.py).

@@ -97,9 +99,15 @@ As the authors notice, pre-training BERT is particularly expensive and requires

We have decided **not** to port these scripts for now and to wait for TPU support in PyTorch (see the recent [official announcement](https://cloud.google.com/blog/products/ai-machine-learning/introducing-pytorch-across-google-cloud)).

## Requirements

The main dependencies of this code are:

- PyTorch (>= 0.4.0)
- tqdm

To install the dependencies:

````bash
pip install -r ./requirements.txt
````
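
To check an installed PyTorch against the minimum version listed above, a version comparison along these lines can be used. It is a sketch under assumptions: `version_ge` is a hypothetical helper, and it relies on `sort -V` (version sort), which is available in GNU coreutils but not in every `sort` implementation.

```shell
# Hypothetical helper: succeed when version $1 is at least version $2.
version_ge() {
    # Version-sort both values; if the smaller one is $2, then $1 >= $2.
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Example (commented out so the snippet has no side effects when sourced):
# installed=$(python -c 'import torch; print(torch.__version__)')
# version_ge "$installed" 0.4.0 || echo "PyTorch >= 0.4.0 is required" >&2
```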