Merge pull request #1508 from tlkh/master
Added performance enhancements (XLA, AMP) to examples
@@ -5,6 +5,7 @@ similar API between the different models.
|
|||||||
|
|
||||||
| Section | Description |
|
| Section | Description |
|
||||||
|----------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
|----------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||||
|
| [TensorFlow 2.0 models on GLUE](#tensorflow-20-bert-models-on-glue) | Examples running the BERT TensorFlow 2.0 model on the GLUE tasks. |
|
||||||
| [Language Model fine-tuning](#language-model-fine-tuning) | Fine-tuning the library models for language modeling on a text dataset. Causal language modeling for GPT/GPT-2, masked language modeling for BERT/RoBERTa. |
|
| [Language Model fine-tuning](#language-model-fine-tuning) | Fine-tuning the library models for language modeling on a text dataset. Causal language modeling for GPT/GPT-2, masked language modeling for BERT/RoBERTa. |
|
||||||
| [Language Generation](#language-generation) | Conditional text generation using the auto-regressive models of the library: GPT, GPT-2, Transformer-XL and XLNet. |
|
| [Language Generation](#language-generation) | Conditional text generation using the auto-regressive models of the library: GPT, GPT-2, Transformer-XL and XLNet. |
|
||||||
| [GLUE](#glue) | Examples running BERT/XLM/XLNet/RoBERTa on the 9 GLUE tasks. Examples feature distributed training as well as half-precision. |
|
| [GLUE](#glue) | Examples running BERT/XLM/XLNet/RoBERTa on the 9 GLUE tasks. Examples feature distributed training as well as half-precision. |
|
||||||
@@ -12,6 +13,28 @@ similar API between the different models.
|
|||||||
| [Multiple Choice](#multiple-choice) | Examples running BERT/XLNet/RoBERTa on the SWAG/RACE/ARC tasks.
|
| [Multiple Choice](#multiple-choice) | Examples running BERT/XLNet/RoBERTa on the SWAG/RACE/ARC tasks.
|
||||||
| [Named Entity Recognition](#named-entity-recognition) | Using BERT for Named Entity Recognition (NER) on the CoNLL 2003 dataset, examples with distributed training. |
|
| [Named Entity Recognition](#named-entity-recognition) | Using BERT for Named Entity Recognition (NER) on the CoNLL 2003 dataset, examples with distributed training. |
|
||||||
|
|
||||||
|
## TensorFlow 2.0 Bert models on GLUE
|
||||||
|
|
||||||
|
Based on the script [`run_tf_glue.py`](https://github.com/huggingface/transformers/blob/master/examples/run_tf_glue.py).
|
||||||
|
|
||||||
|
Fine-tuning the library TensorFlow 2.0 Bert model for sequence classification on the MRPC task of the GLUE benchmark: [General Language Understanding Evaluation](https://gluebenchmark.com/).
|
||||||
|
|
||||||
|
This script has an option for mixed precision (Automatic Mixed Precision / AMP), which runs models on Tensor Cores (NVIDIA Volta/Turing GPUs) and future hardware, and an option for XLA, which uses the XLA compiler to reduce model runtime.
|
||||||
|
Each option is toggled by setting the `USE_XLA` or `USE_AMP` variable in the script.
|
||||||
|
These options and the benchmark below were contributed by @tlkh.
|
||||||
|
|
||||||
|
Quick benchmarks from the script (no other modifications):
|
||||||
|
|
||||||
|
| GPU | Mode | Time (2nd epoch) | Val Acc (3 runs) |
| --------- | -------- | ----------------------- | ----------------------|
| Titan V | FP32 | 41s | 0.8438/0.8281/0.8333 |
| Titan V | AMP | 26s | 0.8281/0.8568/0.8411 |
| V100 | FP32 | 35s | 0.8646/0.8359/0.8464 |
| V100 | AMP | 22s | 0.8646/0.8385/0.8411 |
| 1080 Ti | FP32 | 55s | - |
|
||||||
|
|
||||||
|
On the same hardware and with the same hyper-parameters (including batch size), mixed precision (AMP) cuts the second-epoch training time by roughly 37% (41s to 26s on the Titan V, 35s to 22s on the V100).
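
The size of the AMP gain can be checked directly from the benchmark table. The following is a minimal sketch: the timings and accuracies are copied from the table, while the `BENCHMARKS` dictionary and the `speedup`, `time_saved` and `mean` helpers are illustrative names that do not appear in `run_tf_glue.py`.

```python
# Second-epoch times (seconds) and validation accuracies (3 runs),
# copied from the benchmark table. The 1080 Ti row is omitted because
# it has no AMP measurement to compare against.
BENCHMARKS = {
    "Titan V": {
        "FP32": {"time_s": 41, "val_acc": [0.8438, 0.8281, 0.8333]},
        "AMP": {"time_s": 26, "val_acc": [0.8281, 0.8568, 0.8411]},
    },
    "V100": {
        "FP32": {"time_s": 35, "val_acc": [0.8646, 0.8359, 0.8464]},
        "AMP": {"time_s": 22, "val_acc": [0.8646, 0.8385, 0.8411]},
    },
}


def speedup(gpu):
    """FP32 epoch time divided by AMP epoch time for one GPU."""
    modes = BENCHMARKS[gpu]
    return modes["FP32"]["time_s"] / modes["AMP"]["time_s"]


def time_saved(gpu):
    """Fraction of the FP32 epoch time that AMP removes."""
    return 1.0 - 1.0 / speedup(gpu)


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


for gpu, modes in BENCHMARKS.items():
    print(
        f"{gpu}: speedup {speedup(gpu):.2f}x, "
        f"time saved {time_saved(gpu):.1%}, "
        f"mean val acc FP32 {mean(modes['FP32']['val_acc']):.4f} "
        f"vs AMP {mean(modes['AMP']['val_acc']):.4f}"
    )
```

With only three runs per configuration, the differences in mean validation accuracy between FP32 and AMP are small relative to the run-to-run spread, so the table should be read as a speed comparison rather than evidence of an accuracy change.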
|
||||||
|
|
||||||
## Language model fine-tuning
|
## Language model fine-tuning
|
||||||
|
|
||||||
Based on the script [`run_lm_finetuning.py`](https://github.com/huggingface/transformers/blob/master/examples/run_lm_finetuning.py).
|
Based on the script [`run_lm_finetuning.py`](https://github.com/huggingface/transformers/blob/master/examples/run_lm_finetuning.py).
|
||||||
|
|||||||
@@ -1,40 +1,63 @@
|
|||||||
|
import os
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
import tensorflow_datasets
|
import tensorflow_datasets
|
||||||
from transformers import BertTokenizer, TFBertForSequenceClassification, glue_convert_examples_to_features, BertForSequenceClassification
|
from transformers import BertTokenizer, TFBertForSequenceClassification, glue_convert_examples_to_features, BertForSequenceClassification
|
||||||
|
|
||||||
# Load dataset, tokenizer, model from pretrained model/vocabulary
|
# script parameters
|
||||||
|
BATCH_SIZE = 32
|
||||||
|
EVAL_BATCH_SIZE = BATCH_SIZE * 2
|
||||||
|
USE_XLA = False
|
||||||
|
USE_AMP = False
|
||||||
|
|
||||||
|
tf.config.optimizer.set_jit(USE_XLA)
|
||||||
|
tf.config.optimizer.set_experimental_options({"auto_mixed_precision": USE_AMP})
|
||||||
|
|
||||||
|
# Load tokenizer and model from pretrained model/vocabulary
|
||||||
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
|
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
|
||||||
model = TFBertForSequenceClassification.from_pretrained('bert-base-cased')
|
model = TFBertForSequenceClassification.from_pretrained('bert-base-cased')
|
||||||
data = tensorflow_datasets.load('glue/mrpc')
|
|
||||||
|
# Load dataset via TensorFlow Datasets
|
||||||
|
data, info = tensorflow_datasets.load('glue/mrpc', with_info=True)
|
||||||
|
train_examples = info.splits['train'].num_examples
|
||||||
|
valid_examples = info.splits['validation'].num_examples
|
||||||
|
|
||||||
# Prepare dataset for GLUE as a tf.data.Dataset instance
|
# Prepare dataset for GLUE as a tf.data.Dataset instance
|
||||||
train_dataset = glue_convert_examples_to_features(data['train'], tokenizer, 128, 'mrpc')
|
train_dataset = glue_convert_examples_to_features(data['train'], tokenizer, 128, 'mrpc')
|
||||||
valid_dataset = glue_convert_examples_to_features(data['validation'], tokenizer, 128, 'mrpc')
|
valid_dataset = glue_convert_examples_to_features(data['validation'], tokenizer, 128, 'mrpc')
|
||||||
train_dataset = train_dataset.shuffle(100).batch(32).repeat(2)
|
train_dataset = train_dataset.shuffle(128).batch(BATCH_SIZE).repeat(-1)
|
||||||
valid_dataset = valid_dataset.batch(64)
|
valid_dataset = valid_dataset.batch(EVAL_BATCH_SIZE)
|
||||||
|
|
||||||
# Prepare training: Compile tf.keras model with optimizer, loss and learning rate schedule
|
# Prepare training: Compile tf.keras model with optimizer, loss and learning rate schedule
|
||||||
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0)
|
opt = tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08)
|
||||||
|
if USE_AMP:
|
||||||
|
# loss scaling is currently required when using mixed precision
|
||||||
|
opt = tf.keras.mixed_precision.experimental.LossScaleOptimizer(opt, 'dynamic')
|
||||||
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
|
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
|
||||||
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
|
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
|
||||||
model.compile(optimizer=optimizer, loss=loss, metrics=[metric])
|
model.compile(optimizer=opt, loss=loss, metrics=[metric])
|
||||||
|
|
||||||
# Train and evaluate using tf.keras.Model.fit()
|
# Train and evaluate using tf.keras.Model.fit()
|
||||||
history = model.fit(train_dataset, epochs=2, steps_per_epoch=115,
|
train_steps = train_examples//BATCH_SIZE
|
||||||
validation_data=valid_dataset, validation_steps=7)
|
valid_steps = valid_examples//EVAL_BATCH_SIZE
|
||||||
|
|
||||||
|
history = model.fit(train_dataset, epochs=2, steps_per_epoch=train_steps,
|
||||||
|
validation_data=valid_dataset, validation_steps=valid_steps)
|
||||||
|
|
||||||
|
# Save TF2 model
|
||||||
|
os.makedirs('./save/', exist_ok=True)
|
||||||
|
model.save_pretrained('./save/')
|
||||||
|
|
||||||
# Load the TensorFlow model in PyTorch for inspection
|
# Load the TensorFlow model in PyTorch for inspection
|
||||||
model.save_pretrained('./save/')
|
|
||||||
pytorch_model = BertForSequenceClassification.from_pretrained('./save/', from_tf=True)
|
pytorch_model = BertForSequenceClassification.from_pretrained('./save/', from_tf=True)
|
||||||
|
|
||||||
# Quickly test a few predictions - MRPC is a paraphrasing task, let's see if our model learned the task
|
# Quickly test a few predictions - MRPC is a paraphrasing task, let's see if our model learned the task
|
||||||
sentence_0 = "This research was consistent with his findings."
|
sentence_0 = 'This research was consistent with his findings.'
|
||||||
sentence_1 = "His findings were compatible with this research."
|
sentence_1 = 'His findings were compatible with this research.'
|
||||||
sentence_2 = "His findings were not compatible with this research."
|
sentence_2 = 'His findings were not compatible with this research.'
|
||||||
inputs_1 = tokenizer.encode_plus(sentence_0, sentence_1, add_special_tokens=True, return_tensors='pt')
|
inputs_1 = tokenizer.encode_plus(sentence_0, sentence_1, add_special_tokens=True, return_tensors='pt')
|
||||||
inputs_2 = tokenizer.encode_plus(sentence_0, sentence_2, add_special_tokens=True, return_tensors='pt')
|
inputs_2 = tokenizer.encode_plus(sentence_0, sentence_2, add_special_tokens=True, return_tensors='pt')
|
||||||
|
|
||||||
pred_1 = pytorch_model(**inputs_1)[0].argmax().item()
|
pred_1 = pytorch_model(**inputs_1)[0].argmax().item()
|
||||||
pred_2 = pytorch_model(**inputs_2)[0].argmax().item()
|
pred_2 = pytorch_model(**inputs_2)[0].argmax().item()
|
||||||
print("sentence_1 is", "a paraphrase" if pred_1 else "not a paraphrase", "of sentence_0")
|
print('sentence_1 is', 'a paraphrase' if pred_1 else 'not a paraphrase', 'of sentence_0')
|
||||||
print("sentence_2 is", "a paraphrase" if pred_2 else "not a paraphrase", "of sentence_0")
|
print('sentence_2 is', 'a paraphrase' if pred_2 else 'not a paraphrase', 'of sentence_0')
|
||||||
|
|||||||
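
The script change above replaces the hard-coded `steps_per_epoch=115` and `validation_steps=7` with values derived by floor division from the dataset sizes. The sketch below illustrates the effect in plain Python. It assumes the commonly reported GLUE/MRPC split sizes (3668 training and 408 validation examples); in the script these come from `info.splits`. The `steps` helper is a hypothetical name used only for illustration.

```python
import math

# Assumed GLUE/MRPC split sizes; the script reads them from `info.splits`.
TRAIN_EXAMPLES = 3668
VALID_EXAMPLES = 408

# Batch sizes as defined in the updated script.
BATCH_SIZE = 32
EVAL_BATCH_SIZE = BATCH_SIZE * 2


def steps(num_examples, batch_size, drop_remainder=True):
    """Number of batches per pass over `num_examples` items.

    With drop_remainder=True this matches the floor division used in the
    updated script; with False it counts the final partial batch as well.
    """
    if drop_remainder:
        return num_examples // batch_size
    return math.ceil(num_examples / batch_size)


train_steps = steps(TRAIN_EXAMPLES, BATCH_SIZE)
valid_steps = steps(VALID_EXAMPLES, EVAL_BATCH_SIZE)
print("floor:", train_steps, valid_steps)
print(
    "ceil: ",
    steps(TRAIN_EXAMPLES, BATCH_SIZE, drop_remainder=False),
    steps(VALID_EXAMPLES, EVAL_BATCH_SIZE, drop_remainder=False),
)

# Examples left outside the counted validation steps.
print("validation examples not covered:", VALID_EXAMPLES - valid_steps * EVAL_BATCH_SIZE)
```

Under these assumed sizes, rounding up reproduces the previously hard-coded 115 and 7, while floor division gives one step fewer for each. Because the training dataset is repeated indefinitely with `repeat(-1)`, an epoch that ends slightly before a full pass simply continues into the next one; for validation, the final partial batch would not be counted.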