[BIG] pytorch-transformers => transformers
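A downstream project that follows this rename needs the same substitutions in its own imports, install commands and paths. The snippet below is a minimal sketch of such a rewrite, not the tooling used for this commit; `rename_references` and `OLD_NAME` are hypothetical names, and the pattern is an assumption about which spellings need replacing.

```python
import re

# Matches the old distribution name ("pytorch-transformers") and the old
# module name ("pytorch_transformers"). The word boundaries keep it from
# touching longer identifiers, and "pytorch-pretrained-bert" never matches.
OLD_NAME = re.compile(r"\bpytorch[-_]transformers\b")


def rename_references(text: str) -> str:
    """Replace old package/module names with 'transformers' (hypothetical helper)."""
    return OLD_NAME.sub("transformers", text)


if __name__ == "__main__":
    samples = [
        "from pytorch_transformers import *",
        "pip install pytorch-transformers",
        "python -m pytest -sv ./pytorch_transformers/tests/ --cov",
        "migrating from pytorch-pretrained-bert",
    ]
    for line in samples:
        print(rename_references(line))
```

The match is case-sensitive, so prose titles such as `PyTorch-Transformers` are left alone and would still need a manual pass.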
@@ -1,7 +1,7 @@
|
|||||||
version: 2
|
version: 2
|
||||||
jobs:
|
jobs:
|
||||||
build_py3_torch_and_tf:
|
build_py3_torch_and_tf:
|
||||||
working_directory: ~/pytorch-transformers
|
working_directory: ~/transformers
|
||||||
docker:
|
docker:
|
||||||
- image: circleci/python:3.5
|
- image: circleci/python:3.5
|
||||||
resource_class: xlarge
|
resource_class: xlarge
|
||||||
@@ -13,10 +13,10 @@ jobs:
|
|||||||
- run: sudo pip install --progress-bar off .
|
- run: sudo pip install --progress-bar off .
|
||||||
- run: sudo pip install pytest codecov pytest-cov
|
- run: sudo pip install pytest codecov pytest-cov
|
||||||
- run: sudo pip install tensorboardX scikit-learn
|
- run: sudo pip install tensorboardX scikit-learn
|
||||||
- run: python -m pytest -sv ./pytorch_transformers/tests/ --cov
|
- run: python -m pytest -sv ./transformers/tests/ --cov
|
||||||
- run: codecov
|
- run: codecov
|
||||||
build_py3_torch:
|
build_py3_torch:
|
||||||
working_directory: ~/pytorch-transformers
|
working_directory: ~/transformers
|
||||||
docker:
|
docker:
|
||||||
- image: circleci/python:3.5
|
- image: circleci/python:3.5
|
||||||
resource_class: xlarge
|
resource_class: xlarge
|
||||||
@@ -27,11 +27,11 @@ jobs:
|
|||||||
- run: sudo pip install --progress-bar off .
|
- run: sudo pip install --progress-bar off .
|
||||||
- run: sudo pip install pytest codecov pytest-cov
|
- run: sudo pip install pytest codecov pytest-cov
|
||||||
- run: sudo pip install tensorboardX scikit-learn
|
- run: sudo pip install tensorboardX scikit-learn
|
||||||
- run: python -m pytest -sv ./pytorch_transformers/tests/ --cov
|
- run: python -m pytest -sv ./transformers/tests/ --cov
|
||||||
- run: python -m pytest -sv ./examples/
|
- run: python -m pytest -sv ./examples/
|
||||||
- run: codecov
|
- run: codecov
|
||||||
build_py3_tf:
|
build_py3_tf:
|
||||||
working_directory: ~/pytorch-transformers
|
working_directory: ~/transformers
|
||||||
docker:
|
docker:
|
||||||
- image: circleci/python:3.5
|
- image: circleci/python:3.5
|
||||||
resource_class: xlarge
|
resource_class: xlarge
|
||||||
@@ -42,10 +42,10 @@ jobs:
|
|||||||
- run: sudo pip install --progress-bar off .
|
- run: sudo pip install --progress-bar off .
|
||||||
- run: sudo pip install pytest codecov pytest-cov
|
- run: sudo pip install pytest codecov pytest-cov
|
||||||
- run: sudo pip install tensorboardX scikit-learn
|
- run: sudo pip install tensorboardX scikit-learn
|
||||||
- run: python -m pytest -sv ./pytorch_transformers/tests/ --cov
|
- run: python -m pytest -sv ./transformers/tests/ --cov
|
||||||
- run: codecov
|
- run: codecov
|
||||||
build_py2_torch:
|
build_py2_torch:
|
||||||
working_directory: ~/pytorch-transformers
|
working_directory: ~/transformers
|
||||||
resource_class: large
|
resource_class: large
|
||||||
parallelism: 1
|
parallelism: 1
|
||||||
docker:
|
docker:
|
||||||
@@ -55,10 +55,10 @@ jobs:
|
|||||||
- run: sudo pip install torch
|
- run: sudo pip install torch
|
||||||
- run: sudo pip install --progress-bar off .
|
- run: sudo pip install --progress-bar off .
|
||||||
- run: sudo pip install pytest codecov pytest-cov
|
- run: sudo pip install pytest codecov pytest-cov
|
||||||
- run: python -m pytest -sv ./pytorch_transformers/tests/ --cov
|
- run: python -m pytest -sv ./transformers/tests/ --cov
|
||||||
- run: codecov
|
- run: codecov
|
||||||
build_py2_tf:
|
build_py2_tf:
|
||||||
working_directory: ~/pytorch-transformers
|
working_directory: ~/transformers
|
||||||
resource_class: large
|
resource_class: large
|
||||||
parallelism: 1
|
parallelism: 1
|
||||||
docker:
|
docker:
|
||||||
@@ -68,10 +68,10 @@ jobs:
|
|||||||
- run: sudo pip install tensorflow==2.0.0-rc0
|
- run: sudo pip install tensorflow==2.0.0-rc0
|
||||||
- run: sudo pip install --progress-bar off .
|
- run: sudo pip install --progress-bar off .
|
||||||
- run: sudo pip install pytest codecov pytest-cov
|
- run: sudo pip install pytest codecov pytest-cov
|
||||||
- run: python -m pytest -sv ./pytorch_transformers/tests/ --cov
|
- run: python -m pytest -sv ./transformers/tests/ --cov
|
||||||
- run: codecov
|
- run: codecov
|
||||||
deploy_doc:
|
deploy_doc:
|
||||||
working_directory: ~/pytorch-transformers
|
working_directory: ~/transformers
|
||||||
docker:
|
docker:
|
||||||
- image: circleci/python:3.5
|
- image: circleci/python:3.5
|
||||||
steps:
|
steps:
|
||||||
|
|||||||
@@ -1,5 +1,5 @@
|
|||||||
[run]
|
[run]
|
||||||
source=pytorch_transformers
|
source=transformers
|
||||||
omit =
|
omit =
|
||||||
# skip convertion scripts from testing for now
|
# skip convertion scripts from testing for now
|
||||||
*/convert_*
|
*/convert_*
|
||||||
|
|||||||
2
.github/ISSUE_TEMPLATE/migration.md
vendored
@@ -1,6 +1,6 @@
|
|||||||
---
|
---
|
||||||
name: "\U0001F4DA Migration from PyTorch-pretrained-Bert"
|
name: "\U0001F4DA Migration from PyTorch-pretrained-Bert"
|
||||||
about: Report a problem when migrating from PyTorch-pretrained-Bert to PyTorch-Transformers
|
about: Report a problem when migrating from PyTorch-pretrained-Bert to Transformers
|
||||||
---
|
---
|
||||||
|
|
||||||
## 📚 Migration
|
## 📚 Migration
|
||||||
|
|||||||
48
README.md
@@ -1,8 +1,8 @@
|
|||||||
# 👾 PyTorch-Transformers
|
# 🤗 Transformers
|
||||||
|
|
||||||
[](https://circleci.com/gh/huggingface/pytorch-transformers)
|
[](https://circleci.com/gh/huggingface/transformers)
|
||||||
|
|
||||||
PyTorch-Transformers (formerly known as `pytorch-pretrained-bert`) is a library of state-of-the-art pre-trained models for Natural Language Processing (NLP).
|
Transformers (formerly known as `pytorch-pretrained-bert`) is a library of state-of-the-art pre-trained models for Natural Language Processing (NLP).
|
||||||
|
|
||||||
The library currently contains PyTorch implementations, pre-trained model weights, usage scripts and conversion utilities for the following models:
|
The library currently contains PyTorch implementations, pre-trained model weights, usage scripts and conversion utilities for the following models:
|
||||||
|
|
||||||
@@ -13,10 +13,10 @@ The library currently contains PyTorch implementations, pre-trained model weight
|
|||||||
5. **[XLNet](https://github.com/zihangdai/xlnet/)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
|
5. **[XLNet](https://github.com/zihangdai/xlnet/)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
|
||||||
6. **[XLM](https://github.com/facebookresearch/XLM/)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
|
6. **[XLM](https://github.com/facebookresearch/XLM/)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
|
||||||
7. **[RoBERTa](https://github.com/pytorch/fairseq/tree/master/examples/roberta)** (from Facebook), released together with the paper a [Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
|
7. **[RoBERTa](https://github.com/pytorch/fairseq/tree/master/examples/roberta)** (from Facebook), released together with the paper a [Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
|
||||||
8. **[DistilBERT](https://github.com/huggingface/pytorch-transformers/tree/master/examples/distillation)** (from HuggingFace), released together with the blogpost [Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT](https://medium.com/huggingface/distilbert-8cf3380435b5
|
8. **[DistilBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation)** (from HuggingFace), released together with the blogpost [Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT](https://medium.com/huggingface/distilbert-8cf3380435b5
|
||||||
) by Victor Sanh, Lysandre Debut and Thomas Wolf.
|
) by Victor Sanh, Lysandre Debut and Thomas Wolf.
|
||||||
|
|
||||||
These implementations have been tested on several datasets (see the example scripts) and should match the performances of the original implementations (e.g. ~93 F1 on SQuAD for BERT Whole-Word-Masking, ~88 F1 on RocStories for OpenAI GPT, ~18.3 perplexity on WikiText 103 for Transformer-XL, ~0.916 Peason R coefficient on STS-B for XLNet). You can find more details on the performances in the Examples section of the [documentation](https://huggingface.co/pytorch-transformers/examples.html).
|
These implementations have been tested on several datasets (see the example scripts) and should match the performances of the original implementations (e.g. ~93 F1 on SQuAD for BERT Whole-Word-Masking, ~88 F1 on RocStories for OpenAI GPT, ~18.3 perplexity on WikiText 103 for Transformer-XL, ~0.916 Peason R coefficient on STS-B for XLNet). You can find more details on the performances in the Examples section of the [documentation](https://huggingface.co/transformers/examples.html).
|
||||||
|
|
||||||
| Section | Description |
|
| Section | Description |
|
||||||
|-|-|
|
|-|-|
|
||||||
@@ -24,8 +24,8 @@ These implementations have been tested on several datasets (see the example scri
|
|||||||
| [Online demo](#online-demo) | Experimenting with this repo’s text generation capabilities |
|
| [Online demo](#online-demo) | Experimenting with this repo’s text generation capabilities |
|
||||||
| [Quick tour: Usage](#quick-tour) | Tokenizers & models usage: Bert and GPT-2 |
|
| [Quick tour: Usage](#quick-tour) | Tokenizers & models usage: Bert and GPT-2 |
|
||||||
| [Quick tour: Fine-tuning/usage scripts](#quick-tour-of-the-fine-tuningusage-scripts) | Using provided scripts: GLUE, SQuAD and Text generation |
|
| [Quick tour: Fine-tuning/usage scripts](#quick-tour-of-the-fine-tuningusage-scripts) | Using provided scripts: GLUE, SQuAD and Text generation |
|
||||||
| [Migrating from pytorch-pretrained-bert to pytorch-transformers](#Migrating-from-pytorch-pretrained-bert-to-pytorch-transformers) | Migrating your code from pytorch-pretrained-bert to pytorch-transformers |
|
| [Migrating from pytorch-pretrained-bert to transformers](#Migrating-from-pytorch-pretrained-bert-to-transformers) | Migrating your code from pytorch-pretrained-bert to transformers |
|
||||||
| [Documentation](https://huggingface.co/pytorch-transformers/) | Full API documentation and more |
|
| [Documentation](https://huggingface.co/transformers/) | Full API documentation and more |
|
||||||
|
|
||||||
## Installation
|
## Installation
|
||||||
|
|
||||||
@@ -33,10 +33,10 @@ This repo is tested on Python 2.7 and 3.5+ (examples are tested only on python 3
|
|||||||
|
|
||||||
### With pip
|
### With pip
|
||||||
|
|
||||||
PyTorch-Transformers can be installed by pip as follows:
|
Transformers can be installed by pip as follows:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
pip install pytorch-transformers
|
pip install transformers
|
||||||
```
|
```
|
||||||
|
|
||||||
### From source
|
### From source
|
||||||
@@ -49,14 +49,14 @@ pip install [--editable] .
|
|||||||
|
|
||||||
### Tests
|
### Tests
|
||||||
|
|
||||||
A series of tests is included for the library and the example scripts. Library tests can be found in the [tests folder](https://github.com/huggingface/pytorch-transformers/tree/master/pytorch_transformers/tests) and examples tests in the [examples folder](https://github.com/huggingface/pytorch-transformers/tree/master/examples).
|
A series of tests is included for the library and the example scripts. Library tests can be found in the [tests folder](https://github.com/huggingface/transformers/tree/master/transformers/tests) and examples tests in the [examples folder](https://github.com/huggingface/transformers/tree/master/examples).
|
||||||
|
|
||||||
These tests can be run using `pytest` (install pytest if needed with `pip install pytest`).
|
These tests can be run using `pytest` (install pytest if needed with `pip install pytest`).
|
||||||
|
|
||||||
You can run the tests from the root of the cloned repository with the commands:
|
You can run the tests from the root of the cloned repository with the commands:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python -m pytest -sv ./pytorch_transformers/tests/
|
python -m pytest -sv ./transformers/tests/
|
||||||
python -m pytest -sv ./examples/
|
python -m pytest -sv ./examples/
|
||||||
```
|
```
|
||||||
|
|
||||||
@@ -80,13 +80,13 @@ You can use it to experiment with completions generated by `GPT2Model`, `Transfo
|
|||||||
|
|
||||||
## Quick tour
|
## Quick tour
|
||||||
|
|
||||||
Let's do a very quick overview of PyTorch-Transformers. Detailed examples for each model architecture (Bert, GPT, GPT-2, Transformer-XL, XLNet and XLM) can be found in the [full documentation](https://huggingface.co/pytorch-transformers/).
|
Let's do a very quick overview of Transformers. Detailed examples for each model architecture (Bert, GPT, GPT-2, Transformer-XL, XLNet and XLM) can be found in the [full documentation](https://huggingface.co/transformers/).
|
||||||
|
|
||||||
```python
|
```python
|
||||||
import torch
|
import torch
|
||||||
from pytorch_transformers import *
|
from transformers import *
|
||||||
|
|
||||||
# PyTorch-Transformers has a unified API
|
# Transformers has a unified API
|
||||||
# for 7 transformer architectures and 30 pretrained weights.
|
# for 7 transformer architectures and 30 pretrained weights.
|
||||||
# Model | Tokenizer | Pretrained weights shortcut
|
# Model | Tokenizer | Pretrained weights shortcut
|
||||||
MODELS = [(BertModel, BertTokenizer, 'bert-base-uncased'),
|
MODELS = [(BertModel, BertTokenizer, 'bert-base-uncased'),
|
||||||
@@ -299,19 +299,19 @@ python ./examples/run_generation.py \
|
|||||||
--model_name_or_path=gpt2 \
|
--model_name_or_path=gpt2 \
|
||||||
```
|
```
|
||||||
|
|
||||||
## Migrating from pytorch-pretrained-bert to pytorch-transformers
|
## Migrating from pytorch-pretrained-bert to transformers
|
||||||
|
|
||||||
Here is a quick summary of what you should take care of when migrating from `pytorch-pretrained-bert` to `pytorch-transformers`
|
Here is a quick summary of what you should take care of when migrating from `pytorch-pretrained-bert` to `transformers`
|
||||||
|
|
||||||
### Models always output `tuples`
|
### Models always output `tuples`
|
||||||
|
|
||||||
The main breaking change when migrating from `pytorch-pretrained-bert` to `pytorch-transformers` is that the models forward method always outputs a `tuple` with various elements depending on the model and the configuration parameters.
|
The main breaking change when migrating from `pytorch-pretrained-bert` to `transformers` is that the models forward method always outputs a `tuple` with various elements depending on the model and the configuration parameters.
|
||||||
|
|
||||||
The exact content of the tuples for each model are detailed in the models' docstrings and the [documentation](https://huggingface.co/pytorch-transformers/).
|
The exact content of the tuples for each model are detailed in the models' docstrings and the [documentation](https://huggingface.co/transformers/).
|
||||||
|
|
||||||
In pretty much every case, you will be fine by taking the first element of the output as the output you previously used in `pytorch-pretrained-bert`.
|
In pretty much every case, you will be fine by taking the first element of the output as the output you previously used in `pytorch-pretrained-bert`.
|
||||||
|
|
||||||
Here is a `pytorch-pretrained-bert` to `pytorch-transformers` conversion example for a `BertForSequenceClassification` classification model:
|
Here is a `pytorch-pretrained-bert` to `transformers` conversion example for a `BertForSequenceClassification` classification model:
|
||||||
|
|
||||||
```python
|
```python
|
||||||
# Let's load our model
|
# Let's load our model
|
||||||
@@ -320,11 +320,11 @@ model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
|
|||||||
# If you used to have this line in pytorch-pretrained-bert:
|
# If you used to have this line in pytorch-pretrained-bert:
|
||||||
loss = model(input_ids, labels=labels)
|
loss = model(input_ids, labels=labels)
|
||||||
|
|
||||||
# Now just use this line in pytorch-transformers to extract the loss from the output tuple:
|
# Now just use this line in transformers to extract the loss from the output tuple:
|
||||||
outputs = model(input_ids, labels=labels)
|
outputs = model(input_ids, labels=labels)
|
||||||
loss = outputs[0]
|
loss = outputs[0]
|
||||||
|
|
||||||
# In pytorch-transformers you can also have access to the logits:
|
# In transformers you can also have access to the logits:
|
||||||
loss, logits = outputs[:2]
|
loss, logits = outputs[:2]
|
||||||
|
|
||||||
# And even the attention weights if you configure the model to output them (and other outputs too, see the docstrings and documentation)
|
# And even the attention weights if you configure the model to output them (and other outputs too, see the docstrings and documentation)
|
||||||
@@ -339,7 +339,7 @@ Breaking change in the `from_pretrained()`method:
|
|||||||
|
|
||||||
1. Models are now set in evaluation mode by default when instantiated with the `from_pretrained()` method. To train them don't forget to set them back in training mode (`model.train()`) to activate the dropout modules.
|
1. Models are now set in evaluation mode by default when instantiated with the `from_pretrained()` method. To train them don't forget to set them back in training mode (`model.train()`) to activate the dropout modules.
|
||||||
|
|
||||||
2. The additional `*input` and `**kwargs` arguments supplied to the `from_pretrained()` method used to be directly passed to the underlying model's class `__init__()` method. They are now used to update the model configuration attribute instead which can break derived model classes build based on the previous `BertForSequenceClassification` examples. We are working on a way to mitigate this breaking change in [#866](https://github.com/huggingface/pytorch-transformers/pull/866) by forwarding the the model `__init__()` method (i) the provided positional arguments and (ii) the keyword arguments which do not match any configuration class attributes.
|
2. The additional `*input` and `**kwargs` arguments supplied to the `from_pretrained()` method used to be directly passed to the underlying model's class `__init__()` method. They are now used to update the model configuration attribute instead which can break derived model classes build based on the previous `BertForSequenceClassification` examples. We are working on a way to mitigate this breaking change in [#866](https://github.com/huggingface/transformers/pull/866) by forwarding the the model `__init__()` method (i) the provided positional arguments and (ii) the keyword arguments which do not match any configuration class attributes.
|
||||||
|
|
||||||
Also, while not a breaking change, the serialization methods have been standardized and you probably should switch to the new method `save_pretrained(save_directory)` if you were using any other serialization method before.
|
Also, while not a breaking change, the serialization methods have been standardized and you probably should switch to the new method `save_pretrained(save_directory)` if you were using any other serialization method before.
|
||||||
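The tuple-output change described in this README section can be absorbed with a small adapter while code is being migrated. This is a minimal sketch under stated assumptions: `toy_model` is a stand-in for a real model's forward pass, and `first_output` is a hypothetical helper that belongs to neither library.

```python
def first_output(outputs):
    """Return the primary output whether the call returned a bare value or a tuple."""
    if isinstance(outputs, tuple):
        return outputs[0]
    return outputs


def toy_model(labels=None):
    """Stand-in for a forward pass: (loss, logits) with labels, (logits,) without."""
    logits = [0.1, 0.9]
    if labels is not None:
        loss = 0.25  # placeholder value, not a real loss computation
        return (loss, logits)
    return (logits,)


outputs = toy_model(labels=[1])
loss = first_output(outputs)      # same value the old API returned directly
loss_again, logits = outputs[:2]  # the new API also exposes the logits
```

Code written against the old behaviour keeps working through `first_output`, and call sites can switch to explicit tuple unpacking one at a time.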
|
|
||||||
@@ -396,7 +396,7 @@ for batch in train_data:
|
|||||||
loss.backward()
|
loss.backward()
|
||||||
optimizer.step()
|
optimizer.step()
|
||||||
|
|
||||||
### In PyTorch-Transformers, optimizer and schedules are splitted and instantiated like this:
|
### In Transformers, optimizer and schedules are splitted and instantiated like this:
|
||||||
optimizer = AdamW(model.parameters(), lr=lr, correct_bias=False) # To reproduce BertAdam specific behavior set correct_bias=False
|
optimizer = AdamW(model.parameters(), lr=lr, correct_bias=False) # To reproduce BertAdam specific behavior set correct_bias=False
|
||||||
scheduler = WarmupLinearSchedule(optimizer, warmup_steps=num_warmup_steps, t_total=num_total_steps) # PyTorch scheduler
|
scheduler = WarmupLinearSchedule(optimizer, warmup_steps=num_warmup_steps, t_total=num_total_steps) # PyTorch scheduler
|
||||||
### and used like this:
|
### and used like this:
|
||||||
@@ -411,4 +411,4 @@ for batch in train_data:
|
|||||||
|
|
||||||
## Citation
|
## Citation
|
||||||
|
|
||||||
At the moment, there is no paper associated to PyTorch-Transformers but we are working on preparing one. In the meantime, please include a mention of the library and a link to the present repository if you use this work in a published or open-source project.
|
At the moment, there is no paper associated to Transformers but we are working on preparing one. In the meantime, please include a mention of the library and a link to the present repository if you use this work in a published or open-source project.
|
||||||
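The README hunk above pairs `AdamW` with `WarmupLinearSchedule(optimizer, warmup_steps=..., t_total=...)`. The snippet below is a minimal sketch of the learning-rate multiplier such a schedule is generally understood to apply: a linear ramp from 0 to 1 over `warmup_steps`, then a linear decay to 0 at `t_total`. The function name `warmup_linear_multiplier` is hypothetical, and the code is an illustration of the shape, not a copy of the library source.

```python
def warmup_linear_multiplier(step: int, warmup_steps: int, t_total: int) -> float:
    """Factor applied to the base learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Warmup phase: grow linearly from 0 towards 1.
        return step / max(1, warmup_steps)
    # Decay phase: shrink linearly from 1 to 0, clamped at 0 past t_total.
    remaining = t_total - step
    decay_span = max(1, t_total - warmup_steps)
    return max(0.0, remaining / decay_span)


if __name__ == "__main__":
    for step in (0, 5, 10, 55, 100):
        print(step, warmup_linear_multiplier(step, warmup_steps=10, t_total=100))
```

Keeping the schedule separate from the optimizer, as the README describes, means the multiplier can be swapped without touching the optimizer's own update rule.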
|
|||||||
@@ -2,6 +2,6 @@ FROM pytorch/pytorch:latest
|
|||||||
|
|
||||||
RUN git clone https://github.com/NVIDIA/apex.git && cd apex && python setup.py install --cuda_ext --cpp_ext
|
RUN git clone https://github.com/NVIDIA/apex.git && cd apex && python setup.py install --cuda_ext --cpp_ext
|
||||||
|
|
||||||
RUN pip install pytorch_transformers
|
RUN pip install transformers
|
||||||
|
|
||||||
WORKDIR /workspace
|
WORKDIR /workspace
|
||||||
@@ -16,7 +16,7 @@ function addIcon() {
|
|||||||
function addCustomFooter() {
|
function addCustomFooter() {
|
||||||
const customFooter = document.createElement("div");
|
const customFooter = document.createElement("div");
|
||||||
const questionOrIssue = document.createElement("div");
|
const questionOrIssue = document.createElement("div");
|
||||||
questionOrIssue.innerHTML = "Stuck? Read our <a href='https://medium.com/huggingface'>Blog posts</a> or <a href='https://github.com/huggingface/pytorch_transformers'>Create an issue</a>";
|
questionOrIssue.innerHTML = "Stuck? Read our <a href='https://medium.com/huggingface'>Blog posts</a> or <a href='https://github.com/huggingface/transformers'>Create an issue</a>";
|
||||||
customFooter.appendChild(questionOrIssue);
|
customFooter.appendChild(questionOrIssue);
|
||||||
customFooter.classList.add("footer");
|
customFooter.classList.add("footer");
|
||||||
|
|
||||||
|
|||||||
@@ -15,4 +15,4 @@ In order to help this new field develop, we have included a few additional featu
|
|||||||
* accessing all the attention weights for each head of BERT/GPT/GPT-2,
|
* accessing all the attention weights for each head of BERT/GPT/GPT-2,
|
||||||
* retrieving heads output values and gradients to be able to compute head importance score and prune head as explained in https://arxiv.org/abs/1905.10650.
|
* retrieving heads output values and gradients to be able to compute head importance score and prune head as explained in https://arxiv.org/abs/1905.10650.
|
||||||
|
|
||||||
To help you understand and use these features, we have added a specific example script: `bertology.py <https://github.com/huggingface/pytorch-transformers/blob/master/examples/run_bertology.py>`_ while extract information and prune a model pre-trained on GLUE.
|
To help you understand and use these features, we have added a specific example script: `bertology.py <https://github.com/huggingface/transformers/blob/master/examples/run_bertology.py>`_ while extract information and prune a model pre-trained on GLUE.
|
||||||
|
|||||||
@@ -19,7 +19,7 @@ sys.path.insert(0, os.path.abspath('../..'))
|
|||||||
|
|
||||||
# -- Project information -----------------------------------------------------
|
# -- Project information -----------------------------------------------------
|
||||||
|
|
||||||
project = u'pytorch-transformers'
|
project = u'transformers'
|
||||||
copyright = u'2019, huggingface'
|
copyright = u'2019, huggingface'
|
||||||
author = u'huggingface'
|
author = u'huggingface'
|
||||||
|
|
||||||
@@ -109,7 +109,7 @@ html_static_path = ['_static']
|
|||||||
# -- Options for HTMLHelp output ---------------------------------------------
|
# -- Options for HTMLHelp output ---------------------------------------------
|
||||||
|
|
||||||
# Output file base name for HTML help builder.
|
# Output file base name for HTML help builder.
|
||||||
htmlhelp_basename = 'pytorch-transformersdoc'
|
htmlhelp_basename = 'transformersdoc'
|
||||||
|
|
||||||
|
|
||||||
# -- Options for LaTeX output ------------------------------------------------
|
# -- Options for LaTeX output ------------------------------------------------
|
||||||
@@ -136,7 +136,7 @@ latex_elements = {
|
|||||||
# (source start file, target name, title,
|
# (source start file, target name, title,
|
||||||
# author, documentclass [howto, manual, or own class]).
|
# author, documentclass [howto, manual, or own class]).
|
||||||
latex_documents = [
|
latex_documents = [
|
||||||
(master_doc, 'pytorch-transformers.tex', u'pytorch-transformers Documentation',
|
(master_doc, 'transformers.tex', u'transformers Documentation',
|
||||||
u'huggingface', 'manual'),
|
u'huggingface', 'manual'),
|
||||||
]
|
]
|
||||||
|
|
||||||
@@ -146,7 +146,7 @@ latex_documents = [
|
|||||||
# One entry per manual page. List of tuples
|
# One entry per manual page. List of tuples
|
||||||
# (source start file, name, description, authors, manual section).
|
# (source start file, name, description, authors, manual section).
|
||||||
man_pages = [
|
man_pages = [
|
||||||
(master_doc, 'pytorch-transformers', u'pytorch-transformers Documentation',
|
(master_doc, 'transformers', u'transformers Documentation',
|
||||||
[author], 1)
|
[author], 1)
|
||||||
]
|
]
|
||||||
|
|
||||||
@@ -157,8 +157,8 @@ man_pages = [
|
|||||||
# (source start file, target name, title, author,
|
# (source start file, target name, title, author,
|
||||||
# dir menu entry, description, category)
|
# dir menu entry, description, category)
|
||||||
texinfo_documents = [
|
texinfo_documents = [
|
||||||
(master_doc, 'pytorch-transformers', u'pytorch-transformers Documentation',
|
(master_doc, 'transformers', u'transformers Documentation',
|
||||||
author, 'pytorch-transformers', 'One line description of project.',
|
author, 'transformers', 'One line description of project.',
|
||||||
'Miscellaneous'),
|
'Miscellaneous'),
|
||||||
]
|
]
|
||||||
|
|
||||||
|
|||||||
@@ -6,7 +6,7 @@ A command-line interface is provided to convert original Bert/GPT/GPT-2/Transfor
|
|||||||
BERT
|
BERT
|
||||||
^^^^
|
^^^^
|
||||||
|
|
||||||
You can convert any TensorFlow checkpoint for BERT (in particular `the pre-trained models released by Google <https://github.com/google-research/bert#pre-trained-models>`_\ ) in a PyTorch save file by using the `convert_tf_checkpoint_to_pytorch.py <https://github.com/huggingface/pytorch-transformers/blob/master/pytorch_transformers/convert_tf_checkpoint_to_pytorch.py>`_ script.
|
You can convert any TensorFlow checkpoint for BERT (in particular `the pre-trained models released by Google <https://github.com/google-research/bert#pre-trained-models>`_\ ) in a PyTorch save file by using the `convert_tf_checkpoint_to_pytorch.py <https://github.com/huggingface/transformers/blob/master/transformers/convert_tf_checkpoint_to_pytorch.py>`_ script.
|
||||||
|
|
||||||
This CLI takes as input a TensorFlow checkpoint (three files starting with ``bert_model.ckpt``\ ) and the associated configuration file (\ ``bert_config.json``\ ), and creates a PyTorch model for this configuration, loads the weights from the TensorFlow checkpoint in the PyTorch model and saves the resulting model in a standard PyTorch save file that can be imported using ``torch.load()`` (see examples in `run_bert_extract_features.py <https://github.com/huggingface/pytorch-pretrained-BERT/tree/master/examples/run_bert_extract_features.py>`_\ , `run_bert_classifier.py <https://github.com/huggingface/pytorch-pretrained-BERT/tree/master/examples/run_bert_classifier.py>`_ and `run_bert_squad.py <https://github.com/huggingface/pytorch-pretrained-BERT/tree/master/examples/run_bert_squad.py>`_\ ).
|
This CLI takes as input a TensorFlow checkpoint (three files starting with ``bert_model.ckpt``\ ) and the associated configuration file (\ ``bert_config.json``\ ), and creates a PyTorch model for this configuration, loads the weights from the TensorFlow checkpoint in the PyTorch model and saves the resulting model in a standard PyTorch save file that can be imported using ``torch.load()`` (see examples in `run_bert_extract_features.py <https://github.com/huggingface/pytorch-pretrained-BERT/tree/master/examples/run_bert_extract_features.py>`_\ , `run_bert_classifier.py <https://github.com/huggingface/pytorch-pretrained-BERT/tree/master/examples/run_bert_classifier.py>`_ and `run_bert_squad.py <https://github.com/huggingface/pytorch-pretrained-BERT/tree/master/examples/run_bert_squad.py>`_\ ).
|
||||||
|
|
||||||
@@ -20,7 +20,7 @@ Here is an example of the conversion process for a pre-trained ``BERT-Base Uncas
|
|||||||
|
|
||||||
export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
|
export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
|
||||||
|
|
||||||
pytorch_transformers bert \
|
transformers bert \
|
||||||
$BERT_BASE_DIR/bert_model.ckpt \
|
$BERT_BASE_DIR/bert_model.ckpt \
|
||||||
$BERT_BASE_DIR/bert_config.json \
|
$BERT_BASE_DIR/bert_config.json \
|
||||||
$BERT_BASE_DIR/pytorch_model.bin
|
$BERT_BASE_DIR/pytorch_model.bin
|
||||||
@@ -36,7 +36,7 @@ Here is an example of the conversion process for a pre-trained OpenAI GPT model,
|
|||||||
|
|
||||||
export OPENAI_GPT_CHECKPOINT_FOLDER_PATH=/path/to/openai/pretrained/numpy/weights
|
export OPENAI_GPT_CHECKPOINT_FOLDER_PATH=/path/to/openai/pretrained/numpy/weights
|
||||||
|
|
||||||
pytorch_transformers gpt \
|
transformers gpt \
|
||||||
$OPENAI_GPT_CHECKPOINT_FOLDER_PATH \
|
$OPENAI_GPT_CHECKPOINT_FOLDER_PATH \
|
||||||
$PYTORCH_DUMP_OUTPUT \
|
$PYTORCH_DUMP_OUTPUT \
|
||||||
[OPENAI_GPT_CONFIG]
|
[OPENAI_GPT_CONFIG]
|
||||||
@@ -50,7 +50,7 @@ Here is an example of the conversion process for a pre-trained OpenAI GPT-2 mode
|
|||||||
|
|
||||||
export OPENAI_GPT2_CHECKPOINT_PATH=/path/to/gpt2/pretrained/weights
|
export OPENAI_GPT2_CHECKPOINT_PATH=/path/to/gpt2/pretrained/weights
|
||||||
|
|
||||||
pytorch_transformers gpt2 \
|
transformers gpt2 \
|
||||||
$OPENAI_GPT2_CHECKPOINT_PATH \
|
$OPENAI_GPT2_CHECKPOINT_PATH \
|
||||||
$PYTORCH_DUMP_OUTPUT \
|
$PYTORCH_DUMP_OUTPUT \
|
||||||
[OPENAI_GPT2_CONFIG]
|
[OPENAI_GPT2_CONFIG]
|
||||||
@@ -64,7 +64,7 @@ Here is an example of the conversion process for a pre-trained Transformer-XL mo
|
|||||||
|
|
||||||
export TRANSFO_XL_CHECKPOINT_FOLDER_PATH=/path/to/transfo/xl/checkpoint
|
export TRANSFO_XL_CHECKPOINT_FOLDER_PATH=/path/to/transfo/xl/checkpoint
|
||||||
|
|
||||||
pytorch_transformers transfo_xl \
|
transformers transfo_xl \
|
||||||
$TRANSFO_XL_CHECKPOINT_FOLDER_PATH \
|
$TRANSFO_XL_CHECKPOINT_FOLDER_PATH \
|
||||||
$PYTORCH_DUMP_OUTPUT \
|
$PYTORCH_DUMP_OUTPUT \
|
||||||
[TRANSFO_XL_CONFIG]
|
[TRANSFO_XL_CONFIG]
|
||||||
@@ -80,7 +80,7 @@ Here is an example of the conversion process for a pre-trained XLNet model, fine
|
|||||||
export TRANSFO_XL_CHECKPOINT_PATH=/path/to/xlnet/checkpoint
|
export TRANSFO_XL_CHECKPOINT_PATH=/path/to/xlnet/checkpoint
|
||||||
export TRANSFO_XL_CONFIG_PATH=/path/to/xlnet/config
|
export TRANSFO_XL_CONFIG_PATH=/path/to/xlnet/config
|
||||||
|
|
||||||
pytorch_transformers xlnet \
|
transformers xlnet \
|
||||||
$TRANSFO_XL_CHECKPOINT_PATH \
|
$TRANSFO_XL_CHECKPOINT_PATH \
|
||||||
$TRANSFO_XL_CONFIG_PATH \
|
$TRANSFO_XL_CONFIG_PATH \
|
||||||
$PYTORCH_DUMP_OUTPUT \
|
$PYTORCH_DUMP_OUTPUT \
|
||||||
@@ -96,6 +96,6 @@ Here is an example of the conversion process for a pre-trained XLM model:
|
|||||||
|
|
||||||
export XLM_CHECKPOINT_PATH=/path/to/xlm/checkpoint
|
export XLM_CHECKPOINT_PATH=/path/to/xlm/checkpoint
|
||||||
|
|
||||||
pytorch_transformers xlm \
|
transformers xlm \
|
||||||
$XLM_CHECKPOINT_PATH \
|
$XLM_CHECKPOINT_PATH \
|
||||||
$PYTORCH_DUMP_OUTPUT \
|
$PYTORCH_DUMP_OUTPUT \
|
||||||
|
|||||||
@@ -1,7 +1,7 @@
|
|||||||
Pytorch-Transformers
|
Transformers
|
||||||
================================================================================================================================================
|
================================================================================================================================================
|
||||||
|
|
||||||
PyTorch-Transformers is a library of state-of-the-art pre-trained models for Natural Language Processing (NLP).
|
Transformers is a library of state-of-the-art pre-trained models for Natural Language Processing (NLP).
|
||||||
|
|
||||||
The library currently contains PyTorch implementations, pre-trained model weights, usage scripts and conversion utilities for the following models:
|
The library currently contains PyTorch implementations, pre-trained model weights, usage scripts and conversion utilities for the following models:
|
||||||
|
|
||||||
@@ -12,7 +12,7 @@ The library currently contains PyTorch implementations, pre-trained model weight
|
|||||||
5. `XLNet <https://github.com/zihangdai/xlnet>`_ (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`_ by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
|
5. `XLNet <https://github.com/zihangdai/xlnet>`_ (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`_ by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
|
||||||
6. `XLM <https://github.com/facebookresearch/XLM>`_ (from Facebook) released together with the paper `Cross-lingual Language Model Pretraining <https://arxiv.org/abs/1901.07291>`_ by Guillaume Lample and Alexis Conneau.
|
6. `XLM <https://github.com/facebookresearch/XLM>`_ (from Facebook) released together with the paper `Cross-lingual Language Model Pretraining <https://arxiv.org/abs/1901.07291>`_ by Guillaume Lample and Alexis Conneau.
|
||||||
7. `RoBERTa <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`_ (from Facebook), released together with the paper `RoBERTa: A Robustly Optimized BERT Pretraining Approach <https://arxiv.org/abs/1907.11692>`_ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
|
7. `RoBERTa <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`_ (from Facebook), released together with the paper `RoBERTa: A Robustly Optimized BERT Pretraining Approach <https://arxiv.org/abs/1907.11692>`_ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
|
||||||
8. `DistilBERT <https://huggingface.co/pytorch-transformers/model_doc/distilbert.html>`_ (from HuggingFace) released together with the blog post `Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT <https://medium.com/huggingface/distilbert-8cf3380435b5>`_ by Victor Sanh, Lysandre Debut and Thomas Wolf.
|
8. `DistilBERT <https://huggingface.co/transformers/model_doc/distilbert.html>`_ (from HuggingFace) released together with the blog post `Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT <https://medium.com/huggingface/distilbert-8cf3380435b5>`_ by Victor Sanh, Lysandre Debut and Thomas Wolf.
|
||||||
|
|
||||||
.. toctree::
|
.. toctree::
|
||||||
:maxdepth: 2
|
:maxdepth: 2
|
||||||
|
|||||||
@@ -1,7 +1,7 @@
|
|||||||
Installation
|
Installation
|
||||||
================================================
|
================================================
|
||||||
|
|
||||||
PyTorch-Transformers is tested on Python 2.7 and 3.5+ (examples are tested only on Python 3.5+) and PyTorch 1.1.0.
|
Transformers is tested on Python 2.7 and 3.5+ (examples are tested only on Python 3.5+) and PyTorch 1.1.0.
|
||||||
|
|
||||||
With pip
|
With pip
|
||||||
^^^^^^^^
|
^^^^^^^^
|
||||||
@@ -10,7 +10,7 @@ PyTorch Transformers can be installed using pip as follows:
|
|||||||
|
|
||||||
.. code-block:: bash
|
.. code-block:: bash
|
||||||
|
|
||||||
pip install pytorch-transformers
|
pip install transformers
|
||||||
|
|
||||||
From source
|
From source
|
||||||
^^^^^^^^^^^
|
^^^^^^^^^^^
|
||||||
@@ -19,15 +19,15 @@ To install from source, clone the repository and install with:
|
|||||||
|
|
||||||
.. code-block:: bash
|
.. code-block:: bash
|
||||||
|
|
||||||
git clone https://github.com/huggingface/pytorch-transformers.git
|
git clone https://github.com/huggingface/transformers.git
|
||||||
cd pytorch-transformers
|
cd transformers
|
||||||
pip install [--editable] .
|
pip install [--editable] .
|
||||||
|
|
||||||
|
|
||||||
Tests
|
Tests
|
||||||
^^^^^
|
^^^^^
|
||||||
|
|
||||||
An extensive test suite is included to test the library behavior and several examples. Library tests can be found in the `tests folder <https://github.com/huggingface/pytorch-transformers/tree/master/pytorch_transformers/tests>`_ and example tests in the `examples folder <https://github.com/huggingface/pytorch-transformers/tree/master/examples>`_.
|
An extensive test suite is included to test the library behavior and several examples. Library tests can be found in the `tests folder <https://github.com/huggingface/transformers/tree/master/transformers/tests>`_ and example tests in the `examples folder <https://github.com/huggingface/transformers/tree/master/examples>`_.
|
||||||
|
|
||||||
Tests can be run using `pytest` (install pytest if needed with `pip install pytest`).
|
Tests can be run using `pytest` (install pytest if needed with `pip install pytest`).
|
||||||
|
|
||||||
@@ -35,7 +35,7 @@ Run all the tests from the root of the cloned repository with the commands:
|
|||||||
|
|
||||||
.. code-block:: bash
|
.. code-block:: bash
|
||||||
|
|
||||||
python -m pytest -sv ./pytorch_transformers/tests/
|
python -m pytest -sv ./transformers/tests/
|
||||||
python -m pytest -sv ./examples/
|
python -m pytest -sv ./examples/
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@@ -6,5 +6,5 @@ The base class ``PretrainedConfig`` implements the common methods for loading/sa
|
|||||||
``PretrainedConfig``
|
``PretrainedConfig``
|
||||||
~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.PretrainedConfig
|
.. autoclass:: transformers.PretrainedConfig
|
||||||
:members:
|
:members:
|
||||||
|
|||||||
@@ -11,5 +11,5 @@ The base class ``PreTrainedModel`` implements the common methods for loading/sav
|
|||||||
``PreTrainedModel``
|
``PreTrainedModel``
|
||||||
~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.PreTrainedModel
|
.. autoclass:: transformers.PreTrainedModel
|
||||||
:members:
|
:members:
|
||||||
|
|||||||
@@ -9,7 +9,7 @@ The ``.optimization`` module provides:
|
|||||||
``AdamW``
|
``AdamW``
|
||||||
~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.AdamW
|
.. autoclass:: transformers.AdamW
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
Schedules
|
Schedules
|
||||||
@@ -18,11 +18,11 @@ Schedules
|
|||||||
Learning Rate Schedules
|
Learning Rate Schedules
|
||||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.ConstantLRSchedule
|
.. autoclass:: transformers.ConstantLRSchedule
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.WarmupConstantSchedule
|
.. autoclass:: transformers.WarmupConstantSchedule
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
.. image:: /imgs/warmup_constant_schedule.png
|
.. image:: /imgs/warmup_constant_schedule.png
|
||||||
@@ -30,7 +30,7 @@ Learning Rate Schedules
|
|||||||
:alt:
|
:alt:
|
||||||
|
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.WarmupCosineSchedule
|
.. autoclass:: transformers.WarmupCosineSchedule
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
.. image:: /imgs/warmup_cosine_schedule.png
|
.. image:: /imgs/warmup_cosine_schedule.png
|
||||||
@@ -38,7 +38,7 @@ Learning Rate Schedules
|
|||||||
:alt:
|
:alt:
|
||||||
|
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.WarmupCosineWithHardRestartsSchedule
|
.. autoclass:: transformers.WarmupCosineWithHardRestartsSchedule
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
.. image:: /imgs/warmup_cosine_hard_restarts_schedule.png
|
.. image:: /imgs/warmup_cosine_hard_restarts_schedule.png
|
||||||
@@ -47,7 +47,7 @@ Learning Rate Schedules
|
|||||||
|
|
||||||
|
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.WarmupLinearSchedule
|
.. autoclass:: transformers.WarmupLinearSchedule
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
.. image:: /imgs/warmup_linear_schedule.png
|
.. image:: /imgs/warmup_linear_schedule.png
|
||||||
|
|||||||
@@ -12,5 +12,5 @@ The base class ``PreTrainedTokenizer`` implements the common methods for loading
|
|||||||
``PreTrainedTokenizer``
|
``PreTrainedTokenizer``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.PreTrainedTokenizer
|
.. autoclass:: transformers.PreTrainedTokenizer
|
||||||
:members:
|
:members:
|
||||||
|
|||||||
@@ -1,17 +1,17 @@
|
|||||||
# Migrating from pytorch-pretrained-bert
|
# Migrating from pytorch-pretrained-bert
|
||||||
|
|
||||||
|
|
||||||
Here is a quick summary of what you should take care of when migrating from `pytorch-pretrained-bert` to `pytorch-transformers`
|
Here is a quick summary of what you should take care of when migrating from `pytorch-pretrained-bert` to `transformers`
|
||||||
|
|
||||||
### Models always output `tuples`
|
### Models always output `tuples`
|
||||||
|
|
||||||
The main breaking change when migrating from `pytorch-pretrained-bert` to `pytorch-transformers` is that the models' forward method always outputs a `tuple` with various elements depending on the model and the configuration parameters.
|
The main breaking change when migrating from `pytorch-pretrained-bert` to `transformers` is that the models' forward method always outputs a `tuple` with various elements depending on the model and the configuration parameters.
|
||||||
|
|
||||||
The exact content of the tuples for each model is detailed in the models' docstrings and the [documentation](https://huggingface.co/pytorch-transformers/).
|
The exact content of the tuples for each model is detailed in the models' docstrings and the [documentation](https://huggingface.co/transformers/).
|
||||||
|
|
||||||
In almost every case, taking the first element of the output gives you the output you previously used in `pytorch-pretrained-bert`.
|
In almost every case, taking the first element of the output gives you the output you previously used in `pytorch-pretrained-bert`.
|
||||||
|
|
||||||
Here is a `pytorch-pretrained-bert` to `pytorch-transformers` conversion example for a `BertForSequenceClassification` classification model:
|
Here is a `pytorch-pretrained-bert` to `transformers` conversion example for a `BertForSequenceClassification` classification model:
|
||||||
|
|
||||||
```python
|
```python
|
||||||
# Let's load our model
|
# Let's load our model
|
||||||
@@ -20,11 +20,11 @@ model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
|
|||||||
# If you used to have this line in pytorch-pretrained-bert:
|
# If you used to have this line in pytorch-pretrained-bert:
|
||||||
loss = model(input_ids, labels=labels)
|
loss = model(input_ids, labels=labels)
|
||||||
|
|
||||||
# Now just use this line in pytorch-transformers to extract the loss from the output tuple:
|
# Now just use this line in transformers to extract the loss from the output tuple:
|
||||||
outputs = model(input_ids, labels=labels)
|
outputs = model(input_ids, labels=labels)
|
||||||
loss = outputs[0]
|
loss = outputs[0]
|
||||||
|
|
||||||
# In pytorch-transformers you can also access the logits:
|
# In transformers you can also access the logits:
|
||||||
loss, logits = outputs[:2]
|
loss, logits = outputs[:2]
|
||||||
|
|
||||||
# And even the attention weights if you configure the model to output them (and other outputs too, see the docstrings and documentation)
|
# And even the attention weights if you configure the model to output them (and other outputs too, see the docstrings and documentation)
|
||||||
@@ -96,7 +96,7 @@ for batch in train_data:
|
|||||||
loss.backward()
|
loss.backward()
|
||||||
optimizer.step()
|
optimizer.step()
|
||||||
|
|
||||||
### In PyTorch-Transformers, the optimizer and schedules are split and instantiated like this:
|
### In Transformers, the optimizer and schedules are split and instantiated like this:
|
||||||
optimizer = AdamW(model.parameters(), lr=lr, correct_bias=False) # To reproduce BertAdam specific behavior set correct_bias=False
|
optimizer = AdamW(model.parameters(), lr=lr, correct_bias=False) # To reproduce BertAdam specific behavior set correct_bias=False
|
||||||
scheduler = WarmupLinearSchedule(optimizer, warmup_steps=num_warmup_steps, t_total=num_total_steps) # PyTorch scheduler
|
scheduler = WarmupLinearSchedule(optimizer, warmup_steps=num_warmup_steps, t_total=num_total_steps) # PyTorch scheduler
|
||||||
### and used like this:
|
### and used like this:
|
||||||
|
|||||||
@@ -11,19 +11,19 @@ Instantiating one of ``AutoModel``, ``AutoConfig`` and ``AutoTokenizer`` will di
|
|||||||
``AutoConfig``
|
``AutoConfig``
|
||||||
~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.AutoConfig
|
.. autoclass:: transformers.AutoConfig
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``AutoModel``
|
``AutoModel``
|
||||||
~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.AutoModel
|
.. autoclass:: transformers.AutoModel
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``AutoTokenizer``
|
``AutoTokenizer``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.AutoTokenizer
|
.. autoclass:: transformers.AutoTokenizer
|
||||||
:members:
|
:members:
|
||||||
|
|||||||
@@ -4,69 +4,69 @@ BERT
|
|||||||
``BertConfig``
|
``BertConfig``
|
||||||
~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.BertConfig
|
.. autoclass:: transformers.BertConfig
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``BertTokenizer``
|
``BertTokenizer``
|
||||||
~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.BertTokenizer
|
.. autoclass:: transformers.BertTokenizer
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``BertModel``
|
``BertModel``
|
||||||
~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.BertModel
|
.. autoclass:: transformers.BertModel
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``BertForPreTraining``
|
``BertForPreTraining``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.BertForPreTraining
|
.. autoclass:: transformers.BertForPreTraining
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``BertForMaskedLM``
|
``BertForMaskedLM``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.BertForMaskedLM
|
.. autoclass:: transformers.BertForMaskedLM
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``BertForNextSentencePrediction``
|
``BertForNextSentencePrediction``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.BertForNextSentencePrediction
|
.. autoclass:: transformers.BertForNextSentencePrediction
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``BertForSequenceClassification``
|
``BertForSequenceClassification``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.BertForSequenceClassification
|
.. autoclass:: transformers.BertForSequenceClassification
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``BertForMultipleChoice``
|
``BertForMultipleChoice``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.BertForMultipleChoice
|
.. autoclass:: transformers.BertForMultipleChoice
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``BertForTokenClassification``
|
``BertForTokenClassification``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.BertForTokenClassification
|
.. autoclass:: transformers.BertForTokenClassification
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``BertForQuestionAnswering``
|
``BertForQuestionAnswering``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.BertForQuestionAnswering
|
.. autoclass:: transformers.BertForQuestionAnswering
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|||||||
@@ -4,40 +4,40 @@ DistilBERT
|
|||||||
``DistilBertConfig``
|
``DistilBertConfig``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.DistilBertConfig
|
.. autoclass:: transformers.DistilBertConfig
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``DistilBertTokenizer``
|
``DistilBertTokenizer``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.DistilBertTokenizer
|
.. autoclass:: transformers.DistilBertTokenizer
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``DistilBertModel``
|
``DistilBertModel``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.DistilBertModel
|
.. autoclass:: transformers.DistilBertModel
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``DistilBertForMaskedLM``
|
``DistilBertForMaskedLM``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.DistilBertForMaskedLM
|
.. autoclass:: transformers.DistilBertForMaskedLM
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``DistilBertForSequenceClassification``
|
``DistilBertForSequenceClassification``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.DistilBertForSequenceClassification
|
.. autoclass:: transformers.DistilBertForSequenceClassification
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``DistilBertForQuestionAnswering``
|
``DistilBertForQuestionAnswering``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.DistilBertForQuestionAnswering
|
.. autoclass:: transformers.DistilBertForQuestionAnswering
|
||||||
:members:
|
:members:
|
||||||
|
|||||||
@@ -4,33 +4,33 @@ OpenAI GPT
|
|||||||
``OpenAIGPTConfig``
|
``OpenAIGPTConfig``
|
||||||
~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.OpenAIGPTConfig
|
.. autoclass:: transformers.OpenAIGPTConfig
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``OpenAIGPTTokenizer``
|
``OpenAIGPTTokenizer``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.OpenAIGPTTokenizer
|
.. autoclass:: transformers.OpenAIGPTTokenizer
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``OpenAIGPTModel``
|
``OpenAIGPTModel``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.OpenAIGPTModel
|
.. autoclass:: transformers.OpenAIGPTModel
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``OpenAIGPTLMHeadModel``
|
``OpenAIGPTLMHeadModel``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.OpenAIGPTLMHeadModel
|
.. autoclass:: transformers.OpenAIGPTLMHeadModel
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``OpenAIGPTDoubleHeadsModel``
|
``OpenAIGPTDoubleHeadsModel``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.OpenAIGPTDoubleHeadsModel
|
.. autoclass:: transformers.OpenAIGPTDoubleHeadsModel
|
||||||
:members:
|
:members:
|
||||||
|
|||||||
@@ -4,33 +4,33 @@ OpenAI GPT2
|
|||||||
``GPT2Config``
|
``GPT2Config``
|
||||||
~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.GPT2Config
|
.. autoclass:: transformers.GPT2Config
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``GPT2Tokenizer``
|
``GPT2Tokenizer``
|
||||||
~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.GPT2Tokenizer
|
.. autoclass:: transformers.GPT2Tokenizer
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``GPT2Model``
|
``GPT2Model``
|
||||||
~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.GPT2Model
|
.. autoclass:: transformers.GPT2Model
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``GPT2LMHeadModel``
|
``GPT2LMHeadModel``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.GPT2LMHeadModel
|
.. autoclass:: transformers.GPT2LMHeadModel
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``GPT2DoubleHeadsModel``
|
``GPT2DoubleHeadsModel``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.GPT2DoubleHeadsModel
|
.. autoclass:: transformers.GPT2DoubleHeadsModel
|
||||||
:members:
|
:members:
|
||||||
|
|||||||
@@ -4,33 +4,33 @@ RoBERTa
|
|||||||
``RobertaConfig``
|
``RobertaConfig``
|
||||||
~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.RobertaConfig
|
.. autoclass:: transformers.RobertaConfig
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``RobertaTokenizer``
|
``RobertaTokenizer``
|
||||||
~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.RobertaTokenizer
|
.. autoclass:: transformers.RobertaTokenizer
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``RobertaModel``
|
``RobertaModel``
|
||||||
~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.RobertaModel
|
.. autoclass:: transformers.RobertaModel
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``RobertaForMaskedLM``
|
``RobertaForMaskedLM``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.RobertaForMaskedLM
|
.. autoclass:: transformers.RobertaForMaskedLM
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``RobertaForSequenceClassification``
|
``RobertaForSequenceClassification``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.RobertaForSequenceClassification
|
.. autoclass:: transformers.RobertaForSequenceClassification
|
||||||
:members:
|
:members:
|
||||||
|
|||||||
@@ -5,26 +5,26 @@ Transformer XL
|
|||||||
``TransfoXLConfig``
|
``TransfoXLConfig``
|
||||||
~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.TransfoXLConfig
|
.. autoclass:: transformers.TransfoXLConfig
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``TransfoXLTokenizer``
|
``TransfoXLTokenizer``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.TransfoXLTokenizer
|
.. autoclass:: transformers.TransfoXLTokenizer
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``TransfoXLModel``
|
``TransfoXLModel``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.TransfoXLModel
|
.. autoclass:: transformers.TransfoXLModel
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``TransfoXLLMHeadModel``
|
``TransfoXLLMHeadModel``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.TransfoXLLMHeadModel
|
.. autoclass:: transformers.TransfoXLLMHeadModel
|
||||||
:members:
|
:members:
|
||||||
|
|||||||
@@ -4,38 +4,38 @@ XLM
|
|||||||
``XLMConfig``
|
``XLMConfig``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.XLMConfig
|
.. autoclass:: transformers.XLMConfig
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
``XLMTokenizer``
|
``XLMTokenizer``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.XLMTokenizer
|
.. autoclass:: transformers.XLMTokenizer
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
``XLMModel``
|
``XLMModel``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.XLMModel
|
.. autoclass:: transformers.XLMModel
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``XLMWithLMHeadModel``
|
``XLMWithLMHeadModel``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.XLMWithLMHeadModel
|
.. autoclass:: transformers.XLMWithLMHeadModel
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``XLMForSequenceClassification``
|
``XLMForSequenceClassification``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.XLMForSequenceClassification
|
.. autoclass:: transformers.XLMForSequenceClassification
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``XLMForQuestionAnswering``
|
``XLMForQuestionAnswering``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.XLMForQuestionAnswering
|
.. autoclass:: transformers.XLMForQuestionAnswering
|
||||||
:members:
|
:members:
|
||||||
|
|||||||
@@ -4,40 +4,40 @@ XLNet
|
|||||||
``XLNetConfig``
|
``XLNetConfig``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.XLNetConfig
|
.. autoclass:: transformers.XLNetConfig
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``XLNetTokenizer``
|
``XLNetTokenizer``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.XLNetTokenizer
|
.. autoclass:: transformers.XLNetTokenizer
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``XLNetModel``
|
``XLNetModel``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.XLNetModel
|
.. autoclass:: transformers.XLNetModel
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``XLNetLMHeadModel``
|
``XLNetLMHeadModel``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.XLNetLMHeadModel
|
.. autoclass:: transformers.XLNetLMHeadModel
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``XLNetForSequenceClassification``
|
``XLNetForSequenceClassification``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.XLNetForSequenceClassification
|
.. autoclass:: transformers.XLNetForSequenceClassification
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``XLNetForQuestionAnswering``
|
``XLNetForQuestionAnswering``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: pytorch_transformers.XLNetForQuestionAnswering
|
.. autoclass:: transformers.XLNetForQuestionAnswering
|
||||||
:members:
|
:members:
|
||||||
|
|||||||
@@ -1,16 +1,16 @@
|
|||||||
Notebooks
|
Notebooks
|
||||||
================================================
|
================================================
|
||||||
|
|
||||||
We include `three Jupyter Notebooks <https://github.com/huggingface/pytorch-transformers/tree/master/notebooks>`_ that can be used to check that the predictions of the PyTorch model are identical to the predictions of the original TensorFlow model.
|
We include `three Jupyter Notebooks <https://github.com/huggingface/transformers/tree/master/notebooks>`_ that can be used to check that the predictions of the PyTorch model are identical to the predictions of the original TensorFlow model.
|
||||||
|
|
||||||
|
|
||||||
*
|
*
|
||||||
The first NoteBook (\ `Comparing-TF-and-PT-models.ipynb <https://github.com/huggingface/pytorch-transformers/blob/master/notebooks/Comparing-TF-and-PT-models.ipynb>`_\ ) extracts the hidden states of a full sequence on each layer of the TensorFlow and the PyTorch models and computes the standard deviation between them. In the given example, we get a standard deviation of 1.5e-7 to 9e-7 on the various hidden states of the models.
|
The first NoteBook (\ `Comparing-TF-and-PT-models.ipynb <https://github.com/huggingface/transformers/blob/master/notebooks/Comparing-TF-and-PT-models.ipynb>`_\ ) extracts the hidden states of a full sequence on each layer of the TensorFlow and the PyTorch models and computes the standard deviation between them. In the given example, we get a standard deviation of 1.5e-7 to 9e-7 on the various hidden states of the models.
|
||||||
|
|
||||||
*
|
*
|
||||||
The second notebook (\ `Comparing-TF-and-PT-models-SQuAD.ipynb <https://github.com/huggingface/pytorch-transformers/blob/master/notebooks/Comparing-TF-and-PT-models-SQuAD.ipynb>`_\ ) compares the loss computed by the TensorFlow and the PyTorch models for identical initialization of the fine-tuning layer of the ``BertForQuestionAnswering`` and computes the standard deviation between them. In the given example, we get a standard deviation of 2.5e-7 between the models.
|
The second notebook (\ `Comparing-TF-and-PT-models-SQuAD.ipynb <https://github.com/huggingface/transformers/blob/master/notebooks/Comparing-TF-and-PT-models-SQuAD.ipynb>`_\ ) compares the loss computed by the TensorFlow and the PyTorch models for identical initialization of the fine-tuning layer of the ``BertForQuestionAnswering`` and computes the standard deviation between them. In the given example, we get a standard deviation of 2.5e-7 between the models.
|
||||||
|
|
||||||
*
|
*
|
||||||
The third notebook (\ `Comparing-TF-and-PT-models-MLM-NSP.ipynb <https://github.com/huggingface/pytorch-transformers/blob/master/notebooks/Comparing-TF-and-PT-models-MLM-NSP.ipynb>`_\ ) compares the predictions computed by the TensorFlow and the PyTorch models for masked token language modeling using the pre-trained masked language modeling model.
|
The third notebook (\ `Comparing-TF-and-PT-models-MLM-NSP.ipynb <https://github.com/huggingface/transformers/blob/master/notebooks/Comparing-TF-and-PT-models-MLM-NSP.ipynb>`_\ ) compares the predictions computed by the TensorFlow and the PyTorch models for masked token language modeling using the pre-trained masked language modeling model.
|
||||||
|
|
||||||
Please follow the instructions given in the notebooks to run and modify them.
|
Please follow the instructions given in the notebooks to run and modify them.
|
||||||
|
|||||||
@@ -44,15 +44,15 @@ Here is the full list of the currently provided pretrained models together with
|
|||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``bert-large-uncased-whole-word-masking-finetuned-squad`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
|
| | ``bert-large-uncased-whole-word-masking-finetuned-squad`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
|
||||||
| | | | The ``bert-large-uncased-whole-word-masking`` model fine-tuned on SQuAD |
|
| | | | The ``bert-large-uncased-whole-word-masking`` model fine-tuned on SQuAD |
|
||||||
| | | (see details of fine-tuning in the `example section <https://github.com/huggingface/pytorch-transformers/tree/master/examples>`__). |
|
| | | (see details of fine-tuning in the `example section <https://github.com/huggingface/transformers/tree/master/examples>`__). |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``bert-large-cased-whole-word-masking-finetuned-squad`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters |
|
| | ``bert-large-cased-whole-word-masking-finetuned-squad`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters |
|
||||||
| | | | The ``bert-large-cased-whole-word-masking`` model fine-tuned on SQuAD |
|
| | | | The ``bert-large-cased-whole-word-masking`` model fine-tuned on SQuAD |
|
||||||
| | | (see `details of fine-tuning in the example section <https://huggingface.co/pytorch-transformers/examples.html>`__) |
|
| | | (see `details of fine-tuning in the example section <https://huggingface.co/transformers/examples.html>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``bert-base-cased-finetuned-mrpc`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
| | ``bert-base-cased-finetuned-mrpc`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
||||||
| | | | The ``bert-base-cased`` model fine-tuned on MRPC |
|
| | | | The ``bert-base-cased`` model fine-tuned on MRPC |
|
||||||
| | | (see `details of fine-tuning in the example section <https://huggingface.co/pytorch-transformers/examples.html>`__) |
|
| | | (see `details of fine-tuning in the example section <https://huggingface.co/transformers/examples.html>`__) |
|
||||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| GPT | ``openai-gpt`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
| GPT | ``openai-gpt`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
||||||
| | | | OpenAI GPT English model |
|
| | | | OpenAI GPT English model |
|
||||||
@@ -120,4 +120,4 @@ Here is the full list of the currently provided pretrained models together with
|
|||||||
| | | (see `details <https://medium.com/huggingface/distilbert-8cf3380435b5>`__) |
|
| | | (see `details <https://medium.com/huggingface/distilbert-8cf3380435b5>`__) |
|
||||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
|
|
||||||
.. <https://huggingface.co/pytorch-transformers/examples.html>`__
|
.. <https://huggingface.co/transformers/examples.html>`__
|
||||||
@@ -2,7 +2,7 @@
|
|||||||
|
|
||||||
## Philosophy
|
## Philosophy
|
||||||
|
|
||||||
PyTorch-Transformers is an opinionated library built for NLP researchers seeking to use/study/extend large-scale transformers models.
|
Transformers is an opinionated library built for NLP researchers seeking to use/study/extend large-scale transformers models.
|
||||||
|
|
||||||
The library was designed with two strong goals in mind:
|
The library was designed with two strong goals in mind:
|
||||||
|
|
||||||
@@ -39,7 +39,7 @@ The library is build around three type of classes for each models:
|
|||||||
|
|
||||||
All these classes can be instantiated from pretrained instances and saved locally using two methods:
|
All these classes can be instantiated from pretrained instances and saved locally using two methods:
|
||||||
|
|
||||||
- `from_pretrained()` lets you instantiate a model/configuration/tokenizer from a pretrained version either provided by the library itself (currently 27 models are provided as listed [here](https://huggingface.co/pytorch-transformers/pretrained_models.html)) or stored locally (or on a server) by the user,
|
- `from_pretrained()` lets you instantiate a model/configuration/tokenizer from a pretrained version either provided by the library itself (currently 27 models are provided as listed [here](https://huggingface.co/transformers/pretrained_models.html)) or stored locally (or on a server) by the user,
|
||||||
- `save_pretrained()` lets you save a model/configuration/tokenizer locally so that it can be reloaded using `from_pretrained()`.
|
- `save_pretrained()` lets you save a model/configuration/tokenizer locally so that it can be reloaded using `from_pretrained()`.
|
||||||
|
|
||||||
We'll finish this quickstart tour by going through a few simple quick-start examples to see how we can instantiate and use these classes. The rest of the documentation is organized in two parts:
|
We'll finish this quickstart tour by going through a few simple quick-start examples to see how we can instantiate and use these classes. The rest of the documentation is organized in two parts:
|
||||||
@@ -59,7 +59,7 @@ Let's start by preparing a tokenized input (a list of token embeddings indices t
|
|||||||
|
|
||||||
```python
|
```python
|
||||||
import torch
|
import torch
|
||||||
from pytorch_transformers import BertTokenizer, BertModel, BertForMaskedLM
|
from transformers import BertTokenizer, BertModel, BertForMaskedLM
|
||||||
|
|
||||||
# OPTIONAL: if you want to have more information on what's happening under the hood, activate the logger as follows
|
# OPTIONAL: if you want to have more information on what's happening under the hood, activate the logger as follows
|
||||||
import logging
|
import logging
|
||||||
@@ -106,7 +106,7 @@ model.to('cuda')
|
|||||||
with torch.no_grad():
|
with torch.no_grad():
|
||||||
# See the models docstrings for the detail of the inputs
|
# See the models docstrings for the detail of the inputs
|
||||||
outputs = model(tokens_tensor, token_type_ids=segments_tensors)
|
outputs = model(tokens_tensor, token_type_ids=segments_tensors)
|
||||||
# PyTorch-Transformers models always output tuples.
|
# Transformers models always output tuples.
|
||||||
# See the models docstrings for the detail of all the outputs
|
# See the models docstrings for the detail of all the outputs
|
||||||
# In our case, the first element is the hidden state of the last layer of the Bert model
|
# In our case, the first element is the hidden state of the last layer of the Bert model
|
||||||
encoded_layers = outputs[0]
|
encoded_layers = outputs[0]
|
||||||
@@ -145,7 +145,7 @@ First let's prepare a tokenized input from our text string using `GPT2Tokenizer`
|
|||||||
|
|
||||||
```python
|
```python
|
||||||
import torch
|
import torch
|
||||||
from pytorch_transformers import GPT2Tokenizer, GPT2LMHeadModel
|
from transformers import GPT2Tokenizer, GPT2LMHeadModel
|
||||||
|
|
||||||
# OPTIONAL: if you want to have more information on what's happening, activate the logger as follows
|
# OPTIONAL: if you want to have more information on what's happening, activate the logger as follows
|
||||||
import logging
|
import logging
|
||||||
|
|||||||
@@ -45,7 +45,7 @@ where
|
|||||||
* ``bert_config.json`` or ``openai_gpt_config.json`` a configuration file for the model, and
|
* ``bert_config.json`` or ``openai_gpt_config.json`` a configuration file for the model, and
|
||||||
* ``pytorch_model.bin`` a PyTorch dump of a pre-trained instance of ``BertForPreTraining``\ , ``OpenAIGPTModel``\ , ``TransfoXLModel``\ , ``GPT2LMHeadModel`` (saved with the usual ``torch.save()``\ )
|
* ``pytorch_model.bin`` a PyTorch dump of a pre-trained instance of ``BertForPreTraining``\ , ``OpenAIGPTModel``\ , ``TransfoXLModel``\ , ``GPT2LMHeadModel`` (saved with the usual ``torch.save()``\ )
|
||||||
|
|
||||||
If ``PRE_TRAINED_MODEL_NAME_OR_PATH`` is a shortcut name, the pre-trained weights will be downloaded from AWS S3 (see the links `here <https://github.com/huggingface/pytorch-transformers/blob/master/pytorch_transformers/modeling_bert.py>`__\ ) and stored in a cache folder to avoid future download (the cache folder can be found at ``~/.pytorch_pretrained_bert/``\ ).
|
If ``PRE_TRAINED_MODEL_NAME_OR_PATH`` is a shortcut name, the pre-trained weights will be downloaded from AWS S3 (see the links `here <https://github.com/huggingface/transformers/blob/master/transformers/modeling_bert.py>`__\ ) and stored in a cache folder to avoid future download (the cache folder can be found at ``~/.pytorch_pretrained_bert/``\ ).
|
||||||
|
|
||||||
*
|
*
|
||||||
``cache_dir`` can be an optional path to a specific directory to download and cache the pre-trained model weights. This option is useful in particular when you are using distributed training: to avoid concurrent access to the same weights you can set for example ``cache_dir='./pretrained_model_{}'.format(args.local_rank)`` (see the section on distributed training for more information).
|
``cache_dir`` can be an optional path to a specific directory to download and cache the pre-trained model weights. This option is useful in particular when you are using distributed training: to avoid concurrent access to the same weights you can set for example ``cache_dir='./pretrained_model_{}'.format(args.local_rank)`` (see the section on distributed training for more information).
|
||||||
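Reviewer note on the ``cache_dir`` paragraph in the hunk above: the per-rank directory name it suggests can be sketched in plain Python. This is only an illustration, assuming each process knows its own local rank; the helper name `cache_dir_for_rank` is hypothetical and not part of the library.

```python
def cache_dir_for_rank(local_rank, prefix="./pretrained_model_"):
    """Build a per-process cache directory name from the process's local rank.

    Hypothetical helper: it only reproduces the string formatting suggested
    in the documentation, it does not create the directory or download anything.
    """
    return "{}{}".format(prefix, local_rank)


# Four processes on one machine each get their own directory,
# so no two of them write the downloaded weights to the same path.
dirs = [cache_dir_for_rank(rank) for rank in range(4)]
print(dirs)
```

The resulting string would then be passed as the ``cache_dir`` argument described in the documentation.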
@@ -122,7 +122,7 @@ Here is the recommended way of saving the model, configuration and vocabulary to
|
|||||||
|
|
||||||
.. code-block:: python
|
.. code-block:: python
|
||||||
|
|
||||||
from pytorch_transformers import WEIGHTS_NAME, CONFIG_NAME
|
from transformers import WEIGHTS_NAME, CONFIG_NAME
|
||||||
|
|
||||||
output_dir = "./models/"
|
output_dir = "./models/"
|
||||||
|
|
||||||
|
|||||||
@@ -12,7 +12,7 @@ According to Pytorch's documentation: "TorchScript is a way to create serializab
|
|||||||
Pytorch's two modules `JIT and TRACE <https://pytorch.org/docs/stable/jit.html>`_ allow the developer to export
|
Pytorch's two modules `JIT and TRACE <https://pytorch.org/docs/stable/jit.html>`_ allow the developer to export
|
||||||
their model to be re-used in other programs, such as efficiency-oriented C++ programs.
|
their model to be re-used in other programs, such as efficiency-oriented C++ programs.
|
||||||
|
|
||||||
We have provided an interface that allows the export of `pytorch-transformers` models to TorchScript so that they can
|
We have provided an interface that allows the export of `transformers` models to TorchScript so that they can
|
||||||
be reused in a different environment than a Pytorch-based python program. Here we explain how to use our models so that
|
be reused in a different environment than a Pytorch-based python program. Here we explain how to use our models so that
|
||||||
they can be exported, and what to be mindful of when using these models with TorchScript.
|
they can be exported, and what to be mindful of when using these models with TorchScript.
|
||||||
|
|
||||||
@@ -74,7 +74,7 @@ according to a ``BertConfig`` class and then saved to disk under the filename ``
|
|||||||
|
|
||||||
.. code-block:: python
|
.. code-block:: python
|
||||||
|
|
||||||
from pytorch_transformers import BertModel, BertTokenizer, BertConfig
|
from transformers import BertModel, BertTokenizer, BertConfig
|
||||||
import torch
|
import torch
|
||||||
|
|
||||||
enc = BertTokenizer.from_pretrained("bert-base-uncased")
|
enc = BertTokenizer.from_pretrained("bert-base-uncased")
|
||||||
|
|||||||
@@ -13,7 +13,7 @@ similar API between the different models.
|
|||||||
|
|
||||||
## Language model fine-tuning
|
## Language model fine-tuning
|
||||||
|
|
||||||
Based on the script [`run_lm_finetuning.py`](https://github.com/huggingface/pytorch-transformers/blob/master/examples/run_lm_finetuning.py).
|
Based on the script [`run_lm_finetuning.py`](https://github.com/huggingface/transformers/blob/master/examples/run_lm_finetuning.py).
|
||||||
|
|
||||||
Fine-tuning the library models for language modeling on a text dataset for GPT, GPT-2, BERT and RoBERTa (DistilBERT
|
Fine-tuning the library models for language modeling on a text dataset for GPT, GPT-2, BERT and RoBERTa (DistilBERT
|
||||||
to be added soon). GPT and GPT-2 are fine-tuned using a causal language modeling (CLM) loss while BERT and RoBERTa
|
to be added soon). GPT and GPT-2 are fine-tuned using a causal language modeling (CLM) loss while BERT and RoBERTa
|
||||||
@@ -75,7 +75,7 @@ python run_lm_finetuning.py \
|
|||||||
|
|
||||||
## Language generation
|
## Language generation
|
||||||
|
|
||||||
Based on the script [`run_generation.py`](https://github.com/huggingface/pytorch-transformers/blob/master/examples/run_generation.py).
|
Based on the script [`run_generation.py`](https://github.com/huggingface/transformers/blob/master/examples/run_generation.py).
|
||||||
|
|
||||||
Conditional text generation using the auto-regressive models of the library: GPT, GPT-2, Transformer-XL and XLNet.
|
Conditional text generation using the auto-regressive models of the library: GPT, GPT-2, Transformer-XL and XLNet.
|
||||||
A similar script is used for our official demo [Write With Transformer](https://transformer.huggingface.co), where you
|
A similar script is used for our official demo [Write With Transformer](https://transformer.huggingface.co), where you
|
||||||
@@ -91,7 +91,7 @@ python run_generation.py \
|
|||||||
|
|
||||||
## GLUE
|
## GLUE
|
||||||
|
|
||||||
Based on the script [`run_glue.py`](https://github.com/huggingface/pytorch-transformers/blob/master/examples/run_glue.py).
|
Based on the script [`run_glue.py`](https://github.com/huggingface/transformers/blob/master/examples/run_glue.py).
|
||||||
|
|
||||||
Fine-tuning the library models for sequence classification on the GLUE benchmark: [General Language Understanding
|
Fine-tuning the library models for sequence classification on the GLUE benchmark: [General Language Understanding
|
||||||
Evaluation](https://gluebenchmark.com/). This script can fine-tune the following models: BERT, XLM, XLNet and RoBERTa.
|
Evaluation](https://gluebenchmark.com/). This script can fine-tune the following models: BERT, XLM, XLNet and RoBERTa.
|
||||||
@@ -319,7 +319,7 @@ eval_loss = 0.44457291918821606
|
|||||||
|
|
||||||
## SQuAD
|
## SQuAD
|
||||||
|
|
||||||
Based on the script [`run_squad.py`](https://github.com/huggingface/pytorch-transformers/blob/master/examples/run_squad.py).
|
Based on the script [`run_squad.py`](https://github.com/huggingface/transformers/blob/master/examples/run_squad.py).
|
||||||
|
|
||||||
#### Fine-tuning on SQuAD
|
#### Fine-tuning on SQuAD
|
||||||
|
|
||||||
|
|||||||
@@ -39,7 +39,7 @@ import torch
|
|||||||
from torch.utils.data import (DataLoader, RandomSampler, SequentialSampler,
|
from torch.utils.data import (DataLoader, RandomSampler, SequentialSampler,
|
||||||
TensorDataset)
|
TensorDataset)
|
||||||
|
|
||||||
from pytorch_transformers import (OpenAIGPTDoubleHeadsModel, OpenAIGPTTokenizer,
|
from transformers import (OpenAIGPTDoubleHeadsModel, OpenAIGPTTokenizer,
|
||||||
AdamW, cached_path, WEIGHTS_NAME, CONFIG_NAME,
|
AdamW, cached_path, WEIGHTS_NAME, CONFIG_NAME,
|
||||||
WarmupLinearSchedule)
|
WarmupLinearSchedule)
|
||||||
|
|
||||||
|
|||||||
@@ -35,10 +35,10 @@ from tqdm import tqdm, trange
|
|||||||
|
|
||||||
from tensorboardX import SummaryWriter
|
from tensorboardX import SummaryWriter
|
||||||
|
|
||||||
from pytorch_transformers import (WEIGHTS_NAME, BertConfig,
|
from transformers import (WEIGHTS_NAME, BertConfig,
|
||||||
BertForMultipleChoice, BertTokenizer)
|
BertForMultipleChoice, BertTokenizer)
|
||||||
|
|
||||||
from pytorch_transformers import AdamW, WarmupLinearSchedule
|
from transformers import AdamW, WarmupLinearSchedule
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
@@ -365,7 +365,7 @@ def train(args, train_dataset, model, tokenizer):
|
|||||||
# inputs.update({'cls_index': batch[5],
|
# inputs.update({'cls_index': batch[5],
|
||||||
# 'p_mask': batch[6]})
|
# 'p_mask': batch[6]})
|
||||||
outputs = model(**inputs)
|
outputs = model(**inputs)
|
||||||
loss = outputs[0] # model outputs are always tuple in pytorch-transformers (see doc)
|
loss = outputs[0] # model outputs are always tuple in transformers (see doc)
|
||||||
|
|
||||||
if args.n_gpu > 1:
|
if args.n_gpu > 1:
|
||||||
loss = loss.mean() # mean() to average on multi-gpu parallel (not distributed) training
|
loss = loss.mean() # mean() to average on multi-gpu parallel (not distributed) training
|
||||||
@@ -647,7 +647,7 @@ def main():
|
|||||||
|
|
||||||
if args.eval_all_checkpoints:
|
if args.eval_all_checkpoints:
|
||||||
checkpoints = list(os.path.dirname(c) for c in sorted(glob.glob(args.output_dir + '/**/' + WEIGHTS_NAME, recursive=True)))
|
checkpoints = list(os.path.dirname(c) for c in sorted(glob.glob(args.output_dir + '/**/' + WEIGHTS_NAME, recursive=True)))
|
||||||
logging.getLogger("pytorch_transformers.modeling_utils").setLevel(logging.WARN) # Reduce model loading logs
|
logging.getLogger("transformers.modeling_utils").setLevel(logging.WARN) # Reduce model loading logs
|
||||||
|
|
||||||
logger.info("Evaluate the following checkpoints: %s", checkpoints)
|
logger.info("Evaluate the following checkpoints: %s", checkpoints)
|
||||||
|
|
||||||
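Reviewer note on the logger rename in the hunk above: Python's `logging` module looks loggers up by their dotted name, so a level set under the old `pytorch_transformers.…` name would not apply to a logger created under `transformers.…`. The sketch below uses only the standard library and does not import the package itself; it assumes the library's modules create their loggers with `logging.getLogger(__name__)`, as the example scripts in this diff do.

```python
import logging

# Loggers are looked up by dotted name; these two names are unrelated.
new_logger = logging.getLogger("transformers.modeling_utils")
old_logger = logging.getLogger("pytorch_transformers.modeling_utils")

# Quieten model-loading messages under the new package name.
new_logger.setLevel(logging.WARNING)

# The logger registered under the old name is a different object
# and keeps its default (unset) level.
print(new_logger is old_logger)
print(new_logger.level == logging.WARNING)
print(old_logger.level == logging.NOTSET)
```

This is why the string passed to `logging.getLogger` has to be updated together with the import statements.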
|
|||||||
@@ -28,7 +28,7 @@ import math
|
|||||||
|
|
||||||
import torch
|
import torch
|
||||||
|
|
||||||
from pytorch_transformers import TransfoXLLMHeadModel, TransfoXLCorpus, TransfoXLTokenizer
|
from transformers import TransfoXLLMHeadModel, TransfoXLCorpus, TransfoXLTokenizer
|
||||||
|
|
||||||
logging.basicConfig(format = '%(asctime)s - %(levelname)s - %(name)s - %(message)s',
|
logging.basicConfig(format = '%(asctime)s - %(levelname)s - %(name)s - %(message)s',
|
||||||
datefmt = '%m/%d/%Y %H:%M:%S',
|
datefmt = '%m/%d/%Y %H:%M:%S',
|
||||||
|
|||||||
@@ -13,11 +13,11 @@ For more information on DistilBERT, please refer to our [detailed blog post](htt
|
|||||||
|
|
||||||
This part of the library has only been tested with Python 3.6+. There are a few specific dependencies to install before launching a distillation; you can install them with the command `pip install -r requirements.txt`.
|
This part of the library has only been tested with Python 3.6+. There are a few specific dependencies to install before launching a distillation; you can install them with the command `pip install -r requirements.txt`.
|
||||||
|
|
||||||
**Important note:** The training scripts have been updated to support PyTorch v1.2.0 (there are breaking changes compared to v1.1.0). Note that a small internal bug in the version of PyTorch currently available on pip causes a memory leak in our training/distillation. It has recently been fixed and will likely be integrated into the next release. For the moment, we recommend [compiling PyTorch from source](https://github.com/pytorch/pytorch#from-source). Please refer to [issue 1179](https://github.com/huggingface/pytorch-transformers/issues/1179) for more details.
|
**Important note:** The training scripts have been updated to support PyTorch v1.2.0 (there are breaking changes compared to v1.1.0). Note that a small internal bug in the version of PyTorch currently available on pip causes a memory leak in our training/distillation. It has recently been fixed and will likely be integrated into the next release. For the moment, we recommend [compiling PyTorch from source](https://github.com/pytorch/pytorch#from-source). Please refer to [issue 1179](https://github.com/huggingface/transformers/issues/1179) for more details.
|
||||||
|
|
||||||
## How to use DistilBERT
|
## How to use DistilBERT
|
||||||
|
|
||||||
PyTorch-Transformers includes two pre-trained DistilBERT models, currently only provided for English (we are investigating the possibility to train and release a multilingual version of DistilBERT):
|
Transformers includes two pre-trained DistilBERT models, currently only provided for English (we are investigating the possibility to train and release a multilingual version of DistilBERT):
|
||||||
|
|
||||||
- `distilbert-base-uncased`: DistilBERT English language model pretrained on the same data used to pretrain Bert (concatenation of the Toronto Book Corpus and full English Wikipedia) using distillation with the supervision of the `bert-base-uncased` version of Bert. The model has 6 layers, a hidden dimension of 768 and 12 heads, totaling 66M parameters.
|
- `distilbert-base-uncased`: DistilBERT English language model pretrained on the same data used to pretrain Bert (concatenation of the Toronto Book Corpus and full English Wikipedia) using distillation with the supervision of the `bert-base-uncased` version of Bert. The model has 6 layers, a hidden dimension of 768 and 12 heads, totaling 66M parameters.
|
||||||
- `distilbert-base-uncased-distilled-squad`: A version of `distilbert-base-uncased` fine-tuned using (a second step of) knowledge distillation on SQuAD 1.0. This model reaches an F1 score of 86.2 on the dev set (for comparison, the `bert-base-uncased` version of Bert reaches an F1 score of 88.5).
|
- `distilbert-base-uncased-distilled-squad`: A version of `distilbert-base-uncased` fine-tuned using (a second step of) knowledge distillation on SQuAD 1.0. This model reaches an F1 score of 86.2 on the dev set (for comparison, the `bert-base-uncased` version of Bert reaches an F1 score of 88.5).
|
||||||
|
|||||||
@@ -26,7 +26,7 @@ import torch
|
|||||||
import torch.nn as nn
|
import torch.nn as nn
|
||||||
import torch.nn.functional as F
|
import torch.nn.functional as F
|
||||||
|
|
||||||
from pytorch_transformers import AdamW, WarmupLinearSchedule
|
from transformers import AdamW, WarmupLinearSchedule
|
||||||
|
|
||||||
from utils import logger
|
from utils import logger
|
||||||
from dataset import Dataset
|
from dataset import Dataset
|
||||||
|
|||||||
@@ -20,7 +20,7 @@ import pickle
|
|||||||
import random
|
import random
|
||||||
import time
|
import time
|
||||||
import numpy as np
|
import numpy as np
|
||||||
from pytorch_transformers import BertTokenizer
|
from transformers import BertTokenizer
|
||||||
import logging
|
import logging
|
||||||
|
|
||||||
logging.basicConfig(format = '%(asctime)s - %(levelname)s - %(name)s - %(message)s',
|
logging.basicConfig(format = '%(asctime)s - %(levelname)s - %(name)s - %(message)s',
|
||||||
|
|||||||
@@ -15,7 +15,7 @@
|
|||||||
"""
|
"""
|
||||||
Preprocessing script before training DistilBERT.
|
Preprocessing script before training DistilBERT.
|
||||||
"""
|
"""
|
||||||
from pytorch_transformers import BertForPreTraining
|
from transformers import BertForPreTraining
|
||||||
import torch
|
import torch
|
||||||
import argparse
|
import argparse
|
||||||
|
|
||||||
|
|||||||
@@ -23,8 +23,8 @@ import shutil
|
|||||||
import numpy as np
|
import numpy as np
|
||||||
import torch
|
import torch
|
||||||
|
|
||||||
from pytorch_transformers import BertTokenizer, BertForMaskedLM
|
from transformers import BertTokenizer, BertForMaskedLM
|
||||||
from pytorch_transformers import DistilBertForMaskedLM, DistilBertConfig
|
from transformers import DistilBertForMaskedLM, DistilBertConfig
|
||||||
|
|
||||||
from distiller import Distiller
|
from distiller import Distiller
|
||||||
from utils import git_log, logger, init_gpu_params, set_seed
|
from utils import git_log, logger, init_gpu_params, set_seed
|
||||||
|
|||||||
@@ -32,7 +32,7 @@ from torch.utils.data import DataLoader, SequentialSampler, TensorDataset, Subse
|
|||||||
from torch.utils.data.distributed import DistributedSampler
|
from torch.utils.data.distributed import DistributedSampler
|
||||||
from torch.nn import CrossEntropyLoss, MSELoss
|
from torch.nn import CrossEntropyLoss, MSELoss
|
||||||
|
|
||||||
from pytorch_transformers import (WEIGHTS_NAME,
|
from transformers import (WEIGHTS_NAME,
|
||||||
BertConfig, BertForSequenceClassification, BertTokenizer,
|
BertConfig, BertForSequenceClassification, BertTokenizer,
|
||||||
XLMConfig, XLMForSequenceClassification, XLMTokenizer,
|
XLMConfig, XLMForSequenceClassification, XLMTokenizer,
|
||||||
XLNetConfig, XLNetForSequenceClassification, XLNetTokenizer)
|
XLNetConfig, XLNetForSequenceClassification, XLNetTokenizer)
|
||||||
|
|||||||
@@ -26,12 +26,12 @@ import torch
|
|||||||
import torch.nn.functional as F
|
import torch.nn.functional as F
|
||||||
import numpy as np
|
import numpy as np
|
||||||
|
|
||||||
from pytorch_transformers import GPT2Config, OpenAIGPTConfig, XLNetConfig, TransfoXLConfig
|
from transformers import GPT2Config, OpenAIGPTConfig, XLNetConfig, TransfoXLConfig
|
||||||
|
|
||||||
from pytorch_transformers import GPT2LMHeadModel, GPT2Tokenizer
|
from transformers import GPT2LMHeadModel, GPT2Tokenizer
|
||||||
from pytorch_transformers import OpenAIGPTLMHeadModel, OpenAIGPTTokenizer
|
from transformers import OpenAIGPTLMHeadModel, OpenAIGPTTokenizer
|
||||||
from pytorch_transformers import XLNetLMHeadModel, XLNetTokenizer
|
from transformers import XLNetLMHeadModel, XLNetTokenizer
|
||||||
from pytorch_transformers import TransfoXLLMHeadModel, TransfoXLTokenizer
|
from transformers import TransfoXLLMHeadModel, TransfoXLTokenizer
|
||||||
|
|
||||||
|
|
||||||
logging.basicConfig(format = '%(asctime)s - %(levelname)s - %(name)s - %(message)s',
|
logging.basicConfig(format = '%(asctime)s - %(levelname)s - %(name)s - %(message)s',
|
||||||
|
|||||||
@@ -31,7 +31,7 @@ from torch.utils.data.distributed import DistributedSampler
|
|||||||
from tensorboardX import SummaryWriter
|
from tensorboardX import SummaryWriter
|
||||||
from tqdm import tqdm, trange
|
from tqdm import tqdm, trange
|
||||||
|
|
||||||
from pytorch_transformers import (WEIGHTS_NAME, BertConfig,
|
from transformers import (WEIGHTS_NAME, BertConfig,
|
||||||
BertForSequenceClassification, BertTokenizer,
|
BertForSequenceClassification, BertTokenizer,
|
||||||
RobertaConfig,
|
RobertaConfig,
|
||||||
RobertaForSequenceClassification,
|
RobertaForSequenceClassification,
|
||||||
@@ -44,12 +44,12 @@ from pytorch_transformers import (WEIGHTS_NAME, BertConfig,
|
|||||||
DistilBertForSequenceClassification,
|
DistilBertForSequenceClassification,
|
||||||
DistilBertTokenizer)
|
DistilBertTokenizer)
|
||||||
|
|
||||||
from pytorch_transformers import AdamW, WarmupLinearSchedule
|
from transformers import AdamW, WarmupLinearSchedule
|
||||||
|
|
||||||
from pytorch_transformers import glue_compute_metrics as compute_metrics
|
from transformers import glue_compute_metrics as compute_metrics
|
||||||
from pytorch_transformers import glue_output_modes as output_modes
|
from transformers import glue_output_modes as output_modes
|
||||||
from pytorch_transformers import glue_processors as processors
|
from transformers import glue_processors as processors
|
||||||
from pytorch_transformers import glue_convert_examples_to_features as convert_examples_to_features
|
from transformers import glue_convert_examples_to_features as convert_examples_to_features
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
@@ -137,7 +137,7 @@ def train(args, train_dataset, model, tokenizer):
|
|||||||
'token_type_ids': batch[2] if args.model_type in ['bert', 'xlnet'] else None, # XLM, DistilBERT and RoBERTa don't use segment_ids
|
'token_type_ids': batch[2] if args.model_type in ['bert', 'xlnet'] else None, # XLM, DistilBERT and RoBERTa don't use segment_ids
|
||||||
'labels': batch[3]}
|
'labels': batch[3]}
|
||||||
outputs = model(**inputs)
|
outputs = model(**inputs)
|
||||||
loss = outputs[0] # model outputs are always tuple in pytorch-transformers (see doc)
|
loss = outputs[0] # model outputs are always tuple in transformers (see doc)
|
||||||
|
|
||||||
if args.n_gpu > 1:
|
if args.n_gpu > 1:
|
||||||
loss = loss.mean() # mean() to average on multi-gpu parallel training
|
loss = loss.mean() # mean() to average on multi-gpu parallel training
|
||||||
@@ -483,7 +483,7 @@ def main():
|
|||||||
checkpoints = [args.output_dir]
|
checkpoints = [args.output_dir]
|
||||||
if args.eval_all_checkpoints:
|
if args.eval_all_checkpoints:
|
||||||
checkpoints = list(os.path.dirname(c) for c in sorted(glob.glob(args.output_dir + '/**/' + WEIGHTS_NAME, recursive=True)))
|
checkpoints = list(os.path.dirname(c) for c in sorted(glob.glob(args.output_dir + '/**/' + WEIGHTS_NAME, recursive=True)))
|
||||||
logging.getLogger("pytorch_transformers.modeling_utils").setLevel(logging.WARN) # Reduce logging
|
logging.getLogger("transformers.modeling_utils").setLevel(logging.WARN) # Reduce logging
|
||||||
logger.info("Evaluate the following checkpoints: %s", checkpoints)
|
logger.info("Evaluate the following checkpoints: %s", checkpoints)
|
||||||
for checkpoint in checkpoints:
|
for checkpoint in checkpoints:
|
||||||
global_step = checkpoint.split('-')[-1] if len(checkpoints) > 1 else ""
|
global_step = checkpoint.split('-')[-1] if len(checkpoints) > 1 else ""
|
||||||
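Note on the checkpoint loop in this hunk: the sketch below reproduces the glob pattern and the `split('-')[-1]` step on a throwaway directory tree. It assumes `WEIGHTS_NAME` is `pytorch_model.bin` and that checkpoints live in `checkpoint-<step>` subfolders; `find_checkpoints` and `make_demo_tree` are hypothetical helper names.

```python
import glob
import os
import tempfile

WEIGHTS_NAME = "pytorch_model.bin"  # assumed value of the constant the scripts import


def find_checkpoints(output_dir):
    """Return every directory under output_dir that holds a weights file."""
    pattern = output_dir + "/**/" + WEIGHTS_NAME
    return [os.path.dirname(p) for p in sorted(glob.glob(pattern, recursive=True))]


def make_demo_tree(root, steps):
    """Create root/ and root/checkpoint-<step>/, each with an empty weights file."""
    for sub in [""] + ["checkpoint-{}".format(s) for s in steps]:
        folder = os.path.join(root, sub)
        os.makedirs(folder, exist_ok=True)
        open(os.path.join(folder, WEIGHTS_NAME), "w").close()


with tempfile.TemporaryDirectory() as root:
    make_demo_tree(root, [50, 100])
    checkpoints = find_checkpoints(root)
    # Same expression as in the scripts: the text after the last '-' is the step.
    global_steps = [c.split("-")[-1] if len(checkpoints) > 1 else "" for c in checkpoints]
```

Two things are visible in this sketch: `sorted` orders the paths as strings, so `checkpoint-100` comes before `checkpoint-50`, and the output directory itself is also matched because `**` can match zero directories.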
|
|||||||
@@ -35,7 +35,7 @@ from torch.utils.data.distributed import DistributedSampler
|
|||||||
from tensorboardX import SummaryWriter
|
from tensorboardX import SummaryWriter
|
||||||
from tqdm import tqdm, trange
|
from tqdm import tqdm, trange
|
||||||
|
|
||||||
from pytorch_transformers import (WEIGHTS_NAME, AdamW, WarmupLinearSchedule,
|
from transformers import (WEIGHTS_NAME, AdamW, WarmupLinearSchedule,
|
||||||
BertConfig, BertForMaskedLM, BertTokenizer,
|
BertConfig, BertForMaskedLM, BertTokenizer,
|
||||||
GPT2Config, GPT2LMHeadModel, GPT2Tokenizer,
|
GPT2Config, GPT2LMHeadModel, GPT2Tokenizer,
|
||||||
OpenAIGPTConfig, OpenAIGPTLMHeadModel, OpenAIGPTTokenizer,
|
OpenAIGPTConfig, OpenAIGPTLMHeadModel, OpenAIGPTTokenizer,
|
||||||
@@ -188,7 +188,7 @@ def train(args, train_dataset, model, tokenizer):
|
|||||||
labels = labels.to(args.device)
|
labels = labels.to(args.device)
|
||||||
model.train()
|
model.train()
|
||||||
outputs = model(inputs, masked_lm_labels=labels) if args.mlm else model(inputs, labels=labels)
|
outputs = model(inputs, masked_lm_labels=labels) if args.mlm else model(inputs, labels=labels)
|
||||||
loss = outputs[0] # model outputs are always tuple in pytorch-transformers (see doc)
|
loss = outputs[0] # model outputs are always tuple in transformers (see doc)
|
||||||
|
|
||||||
if args.n_gpu > 1:
|
if args.n_gpu > 1:
|
||||||
loss = loss.mean() # mean() to average on multi-gpu parallel training
|
loss = loss.mean() # mean() to average on multi-gpu parallel training
|
||||||
@@ -481,7 +481,7 @@ def main():
|
|||||||
checkpoints = [args.output_dir]
|
checkpoints = [args.output_dir]
|
||||||
if args.eval_all_checkpoints:
|
if args.eval_all_checkpoints:
|
||||||
checkpoints = list(os.path.dirname(c) for c in sorted(glob.glob(args.output_dir + '/**/' + WEIGHTS_NAME, recursive=True)))
|
checkpoints = list(os.path.dirname(c) for c in sorted(glob.glob(args.output_dir + '/**/' + WEIGHTS_NAME, recursive=True)))
|
||||||
logging.getLogger("pytorch_transformers.modeling_utils").setLevel(logging.WARN) # Reduce logging
|
logging.getLogger("transformers.modeling_utils").setLevel(logging.WARN) # Reduce logging
|
||||||
logger.info("Evaluate the following checkpoints: %s", checkpoints)
|
logger.info("Evaluate the following checkpoints: %s", checkpoints)
|
||||||
for checkpoint in checkpoints:
|
for checkpoint in checkpoints:
|
||||||
global_step = checkpoint.split('-')[-1] if len(checkpoints) > 1 else ""
|
global_step = checkpoint.split('-')[-1] if len(checkpoints) > 1 else ""
|
||||||
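Note on the logger rename in this hunk: `logging.getLogger(name)` creates a logger on demand, so leaving the old string in place would raise no error but would no longer quiet the renamed module. The sketch below shows this with the standard library only. It assumes the module logs under its dotted module name via `logging.getLogger(__name__)`.

```python
import logging

# The example scripts configure logging at INFO, so mirror that on the root logger.
logging.getLogger().setLevel(logging.INFO)

# Assumed name under which the renamed module emits its records.
emitter = logging.getLogger("transformers.modeling_utils")

# Old string: this configures an unrelated logger, so INFO records still pass.
logging.getLogger("pytorch_transformers.modeling_utils").setLevel(logging.WARN)
info_passes_with_old_name = emitter.isEnabledFor(logging.INFO)

# New string: the emitting logger itself is raised to WARN.
logging.getLogger("transformers.modeling_utils").setLevel(logging.WARN)
info_passes_with_new_name = emitter.isEnabledFor(logging.INFO)
```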
|
|||||||
@@ -32,13 +32,13 @@ from torch.utils.data.distributed import DistributedSampler
|
|||||||
from tensorboardX import SummaryWriter
|
from tensorboardX import SummaryWriter
|
||||||
from tqdm import tqdm, trange
|
from tqdm import tqdm, trange
|
||||||
|
|
||||||
from pytorch_transformers import (WEIGHTS_NAME, BertConfig,
|
from transformers import (WEIGHTS_NAME, BertConfig,
|
||||||
BertForMultipleChoice, BertTokenizer,
|
BertForMultipleChoice, BertTokenizer,
|
||||||
XLNetConfig, XLNetForMultipleChoice,
|
XLNetConfig, XLNetForMultipleChoice,
|
||||||
XLNetTokenizer, RobertaConfig,
|
XLNetTokenizer, RobertaConfig,
|
||||||
RobertaForMultipleChoice, RobertaTokenizer)
|
RobertaForMultipleChoice, RobertaTokenizer)
|
||||||
|
|
||||||
from pytorch_transformers import AdamW, WarmupLinearSchedule
|
from transformers import AdamW, WarmupLinearSchedule
|
||||||
|
|
||||||
from utils_multiple_choice import (convert_examples_to_features, processors)
|
from utils_multiple_choice import (convert_examples_to_features, processors)
|
||||||
|
|
||||||
@@ -141,7 +141,7 @@ def train(args, train_dataset, model, tokenizer):
|
|||||||
'token_type_ids': batch[2] if args.model_type in ['bert', 'xlnet'] else None, # XLM doesn't use segment_ids
|
'token_type_ids': batch[2] if args.model_type in ['bert', 'xlnet'] else None, # XLM doesn't use segment_ids
|
||||||
'labels': batch[3]}
|
'labels': batch[3]}
|
||||||
outputs = model(**inputs)
|
outputs = model(**inputs)
|
||||||
loss = outputs[0] # model outputs are always tuple in pytorch-transformers (see doc)
|
loss = outputs[0] # model outputs are always tuple in transformers (see doc)
|
||||||
|
|
||||||
if args.n_gpu > 1:
|
if args.n_gpu > 1:
|
||||||
loss = loss.mean() # mean() to average on multi-gpu parallel training
|
loss = loss.mean() # mean() to average on multi-gpu parallel training
|
||||||
@@ -508,7 +508,7 @@ def main():
|
|||||||
checkpoints = [args.output_dir]
|
checkpoints = [args.output_dir]
|
||||||
if args.eval_all_checkpoints:
|
if args.eval_all_checkpoints:
|
||||||
checkpoints = list(os.path.dirname(c) for c in sorted(glob.glob(args.output_dir + '/**/' + WEIGHTS_NAME, recursive=True)))
|
checkpoints = list(os.path.dirname(c) for c in sorted(glob.glob(args.output_dir + '/**/' + WEIGHTS_NAME, recursive=True)))
|
||||||
logging.getLogger("pytorch_transformers.modeling_utils").setLevel(logging.WARN) # Reduce logging
|
logging.getLogger("transformers.modeling_utils").setLevel(logging.WARN) # Reduce logging
|
||||||
logger.info("Evaluate the following checkpoints: %s", checkpoints)
|
logger.info("Evaluate the following checkpoints: %s", checkpoints)
|
||||||
for checkpoint in checkpoints:
|
for checkpoint in checkpoints:
|
||||||
global_step = checkpoint.split('-')[-1] if len(checkpoints) > 1 else ""
|
global_step = checkpoint.split('-')[-1] if len(checkpoints) > 1 else ""
|
||||||
@@ -524,7 +524,7 @@ def main():
|
|||||||
checkpoints = [args.output_dir]
|
checkpoints = [args.output_dir]
|
||||||
# if args.eval_all_checkpoints: # can not use this to do test!!
|
# if args.eval_all_checkpoints: # can not use this to do test!!
|
||||||
# checkpoints = list(os.path.dirname(c) for c in sorted(glob.glob(args.output_dir + '/**/' + WEIGHTS_NAME, recursive=True)))
|
# checkpoints = list(os.path.dirname(c) for c in sorted(glob.glob(args.output_dir + '/**/' + WEIGHTS_NAME, recursive=True)))
|
||||||
# logging.getLogger("pytorch_transformers.modeling_utils").setLevel(logging.WARN) # Reduce logging
|
# logging.getLogger("transformers.modeling_utils").setLevel(logging.WARN) # Reduce logging
|
||||||
logger.info("Evaluate the following checkpoints: %s", checkpoints)
|
logger.info("Evaluate the following checkpoints: %s", checkpoints)
|
||||||
for checkpoint in checkpoints:
|
for checkpoint in checkpoints:
|
||||||
global_step = checkpoint.split('-')[-1] if len(checkpoints) > 1 else ""
|
global_step = checkpoint.split('-')[-1] if len(checkpoints) > 1 else ""
|
||||||
|
|||||||
@@ -32,7 +32,7 @@ from tqdm import tqdm, trange
|
|||||||
|
|
||||||
from tensorboardX import SummaryWriter
|
from tensorboardX import SummaryWriter
|
||||||
|
|
||||||
from pytorch_transformers import (WEIGHTS_NAME, BertConfig,
|
from transformers import (WEIGHTS_NAME, BertConfig,
|
||||||
BertForQuestionAnswering, BertTokenizer,
|
BertForQuestionAnswering, BertTokenizer,
|
||||||
XLMConfig, XLMForQuestionAnswering,
|
XLMConfig, XLMForQuestionAnswering,
|
||||||
XLMTokenizer, XLNetConfig,
|
XLMTokenizer, XLNetConfig,
|
||||||
@@ -40,7 +40,7 @@ from pytorch_transformers import (WEIGHTS_NAME, BertConfig,
|
|||||||
XLNetTokenizer,
|
XLNetTokenizer,
|
||||||
DistilBertConfig, DistilBertForQuestionAnswering, DistilBertTokenizer)
|
DistilBertConfig, DistilBertForQuestionAnswering, DistilBertTokenizer)
|
||||||
|
|
||||||
from pytorch_transformers import AdamW, WarmupLinearSchedule
|
from transformers import AdamW, WarmupLinearSchedule
|
||||||
|
|
||||||
from utils_squad import (read_squad_examples, convert_examples_to_features,
|
from utils_squad import (read_squad_examples, convert_examples_to_features,
|
||||||
RawResult, write_predictions,
|
RawResult, write_predictions,
|
||||||
@@ -142,7 +142,7 @@ def train(args, train_dataset, model, tokenizer):
|
|||||||
inputs.update({'cls_index': batch[5],
|
inputs.update({'cls_index': batch[5],
|
||||||
'p_mask': batch[6]})
|
'p_mask': batch[6]})
|
||||||
outputs = model(**inputs)
|
outputs = model(**inputs)
|
||||||
loss = outputs[0] # model outputs are always tuple in pytorch-transformers (see doc)
|
loss = outputs[0] # model outputs are always tuple in transformers (see doc)
|
||||||
|
|
||||||
if args.n_gpu > 1:
|
if args.n_gpu > 1:
|
||||||
loss = loss.mean() # mean() to average on multi-gpu parallel (not distributed) training
|
loss = loss.mean() # mean() to average on multi-gpu parallel (not distributed) training
|
||||||
@@ -510,7 +510,7 @@ def main():
|
|||||||
checkpoints = [args.output_dir]
|
checkpoints = [args.output_dir]
|
||||||
if args.eval_all_checkpoints:
|
if args.eval_all_checkpoints:
|
||||||
checkpoints = list(os.path.dirname(c) for c in sorted(glob.glob(args.output_dir + '/**/' + WEIGHTS_NAME, recursive=True)))
|
checkpoints = list(os.path.dirname(c) for c in sorted(glob.glob(args.output_dir + '/**/' + WEIGHTS_NAME, recursive=True)))
|
||||||
logging.getLogger("pytorch_transformers.modeling_utils").setLevel(logging.WARN) # Reduce model loading logs
|
logging.getLogger("transformers.modeling_utils").setLevel(logging.WARN) # Reduce model loading logs
|
||||||
|
|
||||||
logger.info("Evaluate the following checkpoints: %s", checkpoints)
|
logger.info("Evaluate the following checkpoints: %s", checkpoints)
|
||||||
|
|
||||||
|
|||||||
@@ -1,6 +1,6 @@
|
|||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
import tensorflow_datasets
|
import tensorflow_datasets
|
||||||
from pytorch_transformers import *
|
from transformers import *
|
||||||
|
|
||||||
# Load dataset, tokenizer, model from pretrained model/vocabulary
|
# Load dataset, tokenizer, model from pretrained model/vocabulary
|
||||||
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
|
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
|
||||||
|
|||||||
@@ -24,7 +24,7 @@ import math
|
|||||||
import collections
|
import collections
|
||||||
from io import open
|
from io import open
|
||||||
|
|
||||||
from pytorch_transformers.tokenization_bert import BasicTokenizer, whitespace_tokenize
|
from transformers.tokenization_bert import BasicTokenizer, whitespace_tokenize
|
||||||
|
|
||||||
# Required by XLNet evaluation method to compute optimal threshold (see write_predictions_extended() method)
|
# Required by XLNet evaluation method to compute optimal threshold (see write_predictions_extended() method)
|
||||||
from utils_squad_evaluate import find_all_best_thresh_v2, make_qid_to_has_ans, get_raw_scores
|
from utils_squad_evaluate import find_all_best_thresh_v2, make_qid_to_has_ans, get_raw_scores
|
||||||
|
|||||||
50
hubconf.py
@@ -1,7 +1,7 @@
|
|||||||
from pytorch_transformers import (
|
from transformers import (
|
||||||
AutoTokenizer, AutoConfig, AutoModel, AutoModelWithLMHead, AutoModelForSequenceClassification, AutoModelForQuestionAnswering
|
AutoTokenizer, AutoConfig, AutoModel, AutoModelWithLMHead, AutoModelForSequenceClassification, AutoModelForQuestionAnswering
|
||||||
)
|
)
|
||||||
from pytorch_transformers.file_utils import add_start_docstrings
|
from transformers.file_utils import add_start_docstrings
|
||||||
|
|
||||||
dependencies = ['torch', 'tqdm', 'boto3', 'requests', 'regex', 'sentencepiece', 'sacremoses']
|
dependencies = ['torch', 'tqdm', 'boto3', 'requests', 'regex', 'sentencepiece', 'sacremoses']
|
||||||
|
|
||||||
@@ -11,12 +11,12 @@ def config(*args, **kwargs):
|
|||||||
# Using torch.hub !
|
# Using torch.hub !
|
||||||
import torch
|
import torch
|
||||||
|
|
||||||
config = torch.hub.load('huggingface/pytorch-transformers', 'config', 'bert-base-uncased') # Download configuration from S3 and cache.
|
config = torch.hub.load('huggingface/transformers', 'config', 'bert-base-uncased') # Download configuration from S3 and cache.
|
||||||
config = torch.hub.load('huggingface/pytorch-transformers', 'config', './test/bert_saved_model/') # E.g. config (or model) was saved using `save_pretrained('./test/saved_model/')`
|
config = torch.hub.load('huggingface/transformers', 'config', './test/bert_saved_model/') # E.g. config (or model) was saved using `save_pretrained('./test/saved_model/')`
|
||||||
config = torch.hub.load('huggingface/pytorch-transformers', 'config', './test/bert_saved_model/my_configuration.json')
|
config = torch.hub.load('huggingface/transformers', 'config', './test/bert_saved_model/my_configuration.json')
|
||||||
config = torch.hub.load('huggingface/pytorch-transformers', 'config', 'bert-base-uncased', output_attention=True, foo=False)
|
config = torch.hub.load('huggingface/transformers', 'config', 'bert-base-uncased', output_attention=True, foo=False)
|
||||||
assert config.output_attention == True
|
assert config.output_attention == True
|
||||||
config, unused_kwargs = torch.hub.load('huggingface/pytorch-transformers', 'config', 'bert-base-uncased', output_attention=True, foo=False, return_unused_kwargs=True)
|
config, unused_kwargs = torch.hub.load('huggingface/transformers', 'config', 'bert-base-uncased', output_attention=True, foo=False, return_unused_kwargs=True)
|
||||||
assert config.output_attention == True
|
assert config.output_attention == True
|
||||||
assert unused_kwargs == {'foo': False}
|
assert unused_kwargs == {'foo': False}
|
||||||
|
|
||||||
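Note on the `return_unused_kwargs` example in this docstring: keyword arguments that match a configuration attribute update it, and the rest are handed back to the caller. The toy class below is a hypothetical stand-in that mimics only this behaviour; it is not the library's implementation, and `ToyConfig` and `from_kwargs` are made-up names.

```python
class ToyConfig:
    """Hypothetical stand-in for a configuration object (illustration only)."""

    def __init__(self, output_attention=False):
        self.output_attention = output_attention

    @classmethod
    def from_kwargs(cls, return_unused_kwargs=False, **kwargs):
        config = cls()
        unused = {}
        for key, value in kwargs.items():
            if hasattr(config, key):
                # Known attribute: override the default.
                setattr(config, key, value)
            else:
                # Unknown key: keep it so the caller can decide what to do with it.
                unused[key] = value
        return (config, unused) if return_unused_kwargs else config


config, unused_kwargs = ToyConfig.from_kwargs(
    output_attention=True, foo=False, return_unused_kwargs=True
)
```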
@@ -31,8 +31,8 @@ def tokenizer(*args, **kwargs):
|
|||||||
# Using torch.hub !
|
# Using torch.hub !
|
||||||
import torch
|
import torch
|
||||||
|
|
||||||
tokenizer = torch.hub.load('huggingface/pytorch-transformers', 'tokenizer', 'bert-base-uncased') # Download vocabulary from S3 and cache.
|
tokenizer = torch.hub.load('huggingface/transformers', 'tokenizer', 'bert-base-uncased') # Download vocabulary from S3 and cache.
|
||||||
tokenizer = torch.hub.load('huggingface/pytorch-transformers', 'tokenizer', './test/bert_saved_model/') # E.g. tokenizer was saved using `save_pretrained('./test/saved_model/')`
|
tokenizer = torch.hub.load('huggingface/transformers', 'tokenizer', './test/bert_saved_model/') # E.g. tokenizer was saved using `save_pretrained('./test/saved_model/')`
|
||||||
|
|
||||||
"""
|
"""
|
||||||
|
|
||||||
@@ -45,13 +45,13 @@ def model(*args, **kwargs):
|
|||||||
# Using torch.hub !
|
# Using torch.hub !
|
||||||
import torch
|
import torch
|
||||||
|
|
||||||
model = torch.hub.load('huggingface/pytorch-transformers', 'model', 'bert-base-uncased') # Download model and configuration from S3 and cache.
|
model = torch.hub.load('huggingface/transformers', 'model', 'bert-base-uncased') # Download model and configuration from S3 and cache.
|
||||||
model = torch.hub.load('huggingface/pytorch-transformers', 'model', './test/bert_model/') # E.g. model was saved using `save_pretrained('./test/saved_model/')`
|
model = torch.hub.load('huggingface/transformers', 'model', './test/bert_model/') # E.g. model was saved using `save_pretrained('./test/saved_model/')`
|
||||||
model = torch.hub.load('huggingface/pytorch-transformers', 'model', 'bert-base-uncased', output_attention=True) # Update configuration during loading
|
model = torch.hub.load('huggingface/transformers', 'model', 'bert-base-uncased', output_attention=True) # Update configuration during loading
|
||||||
assert model.config.output_attention == True
|
assert model.config.output_attention == True
|
||||||
# Loading from a TF checkpoint file instead of a PyTorch model (slower)
|
# Loading from a TF checkpoint file instead of a PyTorch model (slower)
|
||||||
config = AutoConfig.from_json_file('./tf_model/bert_tf_model_config.json')
|
config = AutoConfig.from_json_file('./tf_model/bert_tf_model_config.json')
|
||||||
model = torch.hub.load('huggingface/pytorch-transformers', 'model', './tf_model/bert_tf_checkpoint.ckpt.index', from_tf=True, config=config)
|
model = torch.hub.load('huggingface/transformers', 'model', './tf_model/bert_tf_checkpoint.ckpt.index', from_tf=True, config=config)
|
||||||
|
|
||||||
"""
|
"""
|
||||||
|
|
||||||
@@ -63,13 +63,13 @@ def modelWithLMHead(*args, **kwargs):
|
|||||||
# Using torch.hub !
|
# Using torch.hub !
|
||||||
import torch
|
import torch
|
||||||
|
|
||||||
model = torch.hub.load('huggingface/pytorch-transformers', 'modelWithLMHead', 'bert-base-uncased') # Download model and configuration from S3 and cache.
|
model = torch.hub.load('huggingface/transformers', 'modelWithLMHead', 'bert-base-uncased') # Download model and configuration from S3 and cache.
|
||||||
model = torch.hub.load('huggingface/pytorch-transformers', 'modelWithLMHead', './test/bert_model/') # E.g. model was saved using `save_pretrained('./test/saved_model/')`
|
model = torch.hub.load('huggingface/transformers', 'modelWithLMHead', './test/bert_model/') # E.g. model was saved using `save_pretrained('./test/saved_model/')`
|
||||||
model = torch.hub.load('huggingface/pytorch-transformers', 'modelWithLMHead', 'bert-base-uncased', output_attention=True) # Update configuration during loading
|
model = torch.hub.load('huggingface/transformers', 'modelWithLMHead', 'bert-base-uncased', output_attention=True) # Update configuration during loading
|
||||||
assert model.config.output_attention == True
|
assert model.config.output_attention == True
|
||||||
# Loading from a TF checkpoint file instead of a PyTorch model (slower)
|
# Loading from a TF checkpoint file instead of a PyTorch model (slower)
|
||||||
config = AutoConfig.from_json_file('./tf_model/bert_tf_model_config.json')
|
config = AutoConfig.from_json_file('./tf_model/bert_tf_model_config.json')
|
||||||
model = torch.hub.load('huggingface/pytorch-transformers', 'modelWithLMHead', './tf_model/bert_tf_checkpoint.ckpt.index', from_tf=True, config=config)
|
model = torch.hub.load('huggingface/transformers', 'modelWithLMHead', './tf_model/bert_tf_checkpoint.ckpt.index', from_tf=True, config=config)
|
||||||
|
|
||||||
"""
|
"""
|
||||||
return AutoModelWithLMHead.from_pretrained(*args, **kwargs)
|
return AutoModelWithLMHead.from_pretrained(*args, **kwargs)
|
||||||
@@ -81,13 +81,13 @@ def modelForSequenceClassification(*args, **kwargs):
|
|||||||
# Using torch.hub !
|
# Using torch.hub !
|
||||||
import torch
|
import torch
|
||||||
|
|
||||||
model = torch.hub.load('huggingface/pytorch-transformers', 'modelForSequenceClassification', 'bert-base-uncased') # Download model and configuration from S3 and cache.
|
model = torch.hub.load('huggingface/transformers', 'modelForSequenceClassification', 'bert-base-uncased') # Download model and configuration from S3 and cache.
|
||||||
model = torch.hub.load('huggingface/pytorch-transformers', 'modelForSequenceClassification', './test/bert_model/') # E.g. model was saved using `save_pretrained('./test/saved_model/')`
|
model = torch.hub.load('huggingface/transformers', 'modelForSequenceClassification', './test/bert_model/') # E.g. model was saved using `save_pretrained('./test/saved_model/')`
|
||||||
model = torch.hub.load('huggingface/pytorch-transformers', 'modelForSequenceClassification', 'bert-base-uncased', output_attention=True) # Update configuration during loading
|
model = torch.hub.load('huggingface/transformers', 'modelForSequenceClassification', 'bert-base-uncased', output_attention=True) # Update configuration during loading
|
||||||
assert model.config.output_attention == True
|
assert model.config.output_attention == True
|
||||||
# Loading from a TF checkpoint file instead of a PyTorch model (slower)
|
# Loading from a TF checkpoint file instead of a PyTorch model (slower)
|
||||||
config = AutoConfig.from_json_file('./tf_model/bert_tf_model_config.json')
|
config = AutoConfig.from_json_file('./tf_model/bert_tf_model_config.json')
|
||||||
model = torch.hub.load('huggingface/pytorch-transformers', 'modelForSequenceClassification', './tf_model/bert_tf_checkpoint.ckpt.index', from_tf=True, config=config)
|
model = torch.hub.load('huggingface/transformers', 'modelForSequenceClassification', './tf_model/bert_tf_checkpoint.ckpt.index', from_tf=True, config=config)
|
||||||
|
|
||||||
"""
|
"""
|
||||||
|
|
||||||
@@ -100,13 +100,13 @@ def modelForQuestionAnswering(*args, **kwargs):
|
|||||||
# Using torch.hub !
|
# Using torch.hub !
|
||||||
import torch
|
import torch
|
||||||
|
|
||||||
model = torch.hub.load('huggingface/pytorch-transformers', 'modelForQuestionAnswering', 'bert-base-uncased') # Download model and configuration from S3 and cache.
|
model = torch.hub.load('huggingface/transformers', 'modelForQuestionAnswering', 'bert-base-uncased') # Download model and configuration from S3 and cache.
|
||||||
model = torch.hub.load('huggingface/pytorch-transformers', 'modelForQuestionAnswering', './test/bert_model/') # E.g. model was saved using `save_pretrained('./test/saved_model/')`
|
model = torch.hub.load('huggingface/transformers', 'modelForQuestionAnswering', './test/bert_model/') # E.g. model was saved using `save_pretrained('./test/saved_model/')`
|
||||||
model = torch.hub.load('huggingface/pytorch-transformers', 'modelForQuestionAnswering', 'bert-base-uncased', output_attention=True) # Update configuration during loading
|
model = torch.hub.load('huggingface/transformers', 'modelForQuestionAnswering', 'bert-base-uncased', output_attention=True) # Update configuration during loading
|
||||||
assert model.config.output_attention == True
|
assert model.config.output_attention == True
|
||||||
# Loading from a TF checkpoint file instead of a PyTorch model (slower)
|
# Loading from a TF checkpoint file instead of a PyTorch model (slower)
|
||||||
config = AutoConfig.from_json_file('./tf_model/bert_tf_model_config.json')
|
config = AutoConfig.from_json_file('./tf_model/bert_tf_model_config.json')
|
||||||
model = torch.hub.load('huggingface/pytorch-transformers', 'modelForQuestionAnswering', './tf_model/bert_tf_checkpoint.ckpt.index', from_tf=True, config=config)
|
model = torch.hub.load('huggingface/transformers', 'modelForQuestionAnswering', './tf_model/bert_tf_checkpoint.ckpt.index', from_tf=True, config=config)
|
||||||
|
|
||||||
"""
|
"""
|
||||||
return AutoModelForQuestionAnswering.from_pretrained(*args, **kwargs)
|
return AutoModelForQuestionAnswering.from_pretrained(*args, **kwargs)
|
||||||
|
|||||||
10
setup.py
@@ -25,7 +25,7 @@ To create the package for pypi.
|
|||||||
(PyPI suggests using twine, as other methods upload files via plaintext.)
|
(PyPI suggests using twine, as other methods upload files via plaintext.)
|
||||||
|
|
||||||
Check that you can install it in a virtualenv by running:
|
Check that you can install it in a virtualenv by running:
|
||||||
pip install -i https://testpypi.python.org/pypi pytorch-transformers
|
pip install -i https://testpypi.python.org/pypi transformers
|
||||||
|
|
||||||
6. Upload the final version to actual pypi:
|
6. Upload the final version to actual pypi:
|
||||||
twine upload dist/* -r pypi
|
twine upload dist/* -r pypi
|
||||||
@@ -37,8 +37,8 @@ from io import open
|
|||||||
from setuptools import find_packages, setup
|
from setuptools import find_packages, setup
|
||||||
|
|
||||||
setup(
|
setup(
|
||||||
name="pytorch_transformers",
|
name="transformers",
|
||||||
version="1.2.0",
|
version="2.0.0",
|
||||||
author="Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Google AI Language Team Authors, Open AI team Authors",
|
author="Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Google AI Language Team Authors, Open AI team Authors",
|
||||||
author_email="thomas@huggingface.co",
|
author_email="thomas@huggingface.co",
|
||||||
description="Repository of pre-trained NLP Transformer models: BERT & RoBERTa, GPT & GPT-2, Transformer-XL, XLNet and XLM",
|
description="Repository of pre-trained NLP Transformer models: BERT & RoBERTa, GPT & GPT-2, Transformer-XL, XLNet and XLM",
|
||||||
@@ -46,7 +46,7 @@ setup(
|
|||||||
long_description_content_type="text/markdown",
|
long_description_content_type="text/markdown",
|
||||||
keywords='NLP deep learning transformer pytorch BERT GPT GPT-2 google openai CMU',
|
keywords='NLP deep learning transformer pytorch BERT GPT GPT-2 google openai CMU',
|
||||||
license='Apache',
|
license='Apache',
|
||||||
url="https://github.com/huggingface/pytorch-transformers",
|
url="https://github.com/huggingface/transformers",
|
||||||
packages=find_packages(exclude=["*.tests", "*.tests.*",
|
packages=find_packages(exclude=["*.tests", "*.tests.*",
|
||||||
"tests.*", "tests"]),
|
"tests.*", "tests"]),
|
||||||
install_requires=['numpy',
|
install_requires=['numpy',
|
||||||
@@ -58,7 +58,7 @@ setup(
|
|||||||
'sacremoses'],
|
'sacremoses'],
|
||||||
entry_points={
|
entry_points={
|
||||||
'console_scripts': [
|
'console_scripts': [
|
||||||
"pytorch_transformers=pytorch_transformers.__main__:main",
|
"transformers=transformers.__main__:main",
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
# python_requires='>=3.5.0',
|
# python_requires='>=3.5.0',
|
||||||
|
|||||||
@@ -1,4 +1,4 @@
|
|||||||
__version__ = "1.2.0"
|
__version__ = "2.0.0"
|
||||||
|
|
||||||
# Work around to update TensorFlow's absl.logging threshold which alters the
|
# Work around to update TensorFlow's absl.logging threshold which alters the
|
||||||
# default Python logging output behavior when present.
|
# default Python logging output behavior when present.
|
||||||
@@ -17,7 +17,7 @@ import logging
|
|||||||
logger = logging.getLogger(__name__) # pylint: disable=invalid-name
|
logger = logging.getLogger(__name__) # pylint: disable=invalid-name
|
||||||
|
|
||||||
# Files and general utilities
|
# Files and general utilities
|
||||||
from .file_utils import (PYTORCH_TRANSFORMERS_CACHE, PYTORCH_PRETRAINED_BERT_CACHE,
|
from .file_utils import (TRANSFORMERS_CACHE, PYTORCH_TRANSFORMERS_CACHE, PYTORCH_PRETRAINED_BERT_CACHE,
|
||||||
cached_path, add_start_docstrings, add_end_docstrings,
|
cached_path, add_start_docstrings, add_end_docstrings,
|
||||||
WEIGHTS_NAME, TF2_WEIGHTS_NAME, TF_WEIGHTS_NAME, CONFIG_NAME,
|
WEIGHTS_NAME, TF2_WEIGHTS_NAME, TF_WEIGHTS_NAME, CONFIG_NAME,
|
||||||
is_tf_available, is_torch_available)
|
is_tf_available, is_torch_available)
|
||||||
@@ -5,25 +5,25 @@ def main():
|
|||||||
print(
|
print(
|
||||||
"This command line utility lets you convert original (author released) model checkpoints to PyTorch.\n"
|
"This command line utility lets you convert original (author released) model checkpoints to PyTorch.\n"
|
||||||
"It should be used as one of: \n"
|
"It should be used as one of: \n"
|
||||||
">> pytorch_transformers bert TF_CHECKPOINT TF_CONFIG PYTORCH_DUMP_OUTPUT, \n"
|
">> transformers bert TF_CHECKPOINT TF_CONFIG PYTORCH_DUMP_OUTPUT, \n"
|
||||||
">> pytorch_transformers gpt OPENAI_GPT_CHECKPOINT_FOLDER_PATH PYTORCH_DUMP_OUTPUT [OPENAI_GPT_CONFIG], \n"
|
">> transformers gpt OPENAI_GPT_CHECKPOINT_FOLDER_PATH PYTORCH_DUMP_OUTPUT [OPENAI_GPT_CONFIG], \n"
|
||||||
">> pytorch_transformers transfo_xl TF_CHECKPOINT_OR_DATASET PYTORCH_DUMP_OUTPUT [TF_CONFIG] or \n"
|
">> transformers transfo_xl TF_CHECKPOINT_OR_DATASET PYTORCH_DUMP_OUTPUT [TF_CONFIG] or \n"
|
||||||
">> pytorch_transformers gpt2 TF_CHECKPOINT PYTORCH_DUMP_OUTPUT [GPT2_CONFIG] or \n"
|
">> transformers gpt2 TF_CHECKPOINT PYTORCH_DUMP_OUTPUT [GPT2_CONFIG] or \n"
|
||||||
">> pytorch_transformers xlnet TF_CHECKPOINT TF_CONFIG PYTORCH_DUMP_OUTPUT [FINETUNING_TASK_NAME] or \n"
|
">> transformers xlnet TF_CHECKPOINT TF_CONFIG PYTORCH_DUMP_OUTPUT [FINETUNING_TASK_NAME] or \n"
|
||||||
">> pytorch_transformers xlm XLM_CHECKPOINT_PATH PYTORCH_DUMP_OUTPUT")
|
">> transformers xlm XLM_CHECKPOINT_PATH PYTORCH_DUMP_OUTPUT")
|
||||||
else:
|
else:
|
||||||
if sys.argv[1] == "bert":
|
if sys.argv[1] == "bert":
|
||||||
try:
|
try:
|
||||||
from .convert_bert_original_tf_checkpoint_to_pytorch import convert_tf_checkpoint_to_pytorch
|
from .convert_bert_original_tf_checkpoint_to_pytorch import convert_tf_checkpoint_to_pytorch
|
||||||
except ImportError:
|
except ImportError:
|
||||||
print("pytorch_transformers can only be used from the commandline to convert TensorFlow models in PyTorch, "
|
print("transformers can only be used from the commandline to convert TensorFlow models in PyTorch, "
|
||||||
"In that case, it requires TensorFlow to be installed. Please see "
|
"In that case, it requires TensorFlow to be installed. Please see "
|
||||||
"https://www.tensorflow.org/install/ for installation instructions.")
|
"https://www.tensorflow.org/install/ for installation instructions.")
|
||||||
raise
|
raise
|
||||||
|
|
||||||
if len(sys.argv) != 5:
|
if len(sys.argv) != 5:
|
||||||
# pylint: disable=line-too-long
|
# pylint: disable=line-too-long
|
||||||
print("Should be used as `pytorch_transformers bert TF_CHECKPOINT TF_CONFIG PYTORCH_DUMP_OUTPUT`")
|
print("Should be used as `transformers bert TF_CHECKPOINT TF_CONFIG PYTORCH_DUMP_OUTPUT`")
|
||||||
else:
|
else:
|
||||||
PYTORCH_DUMP_OUTPUT = sys.argv.pop()
|
PYTORCH_DUMP_OUTPUT = sys.argv.pop()
|
||||||
TF_CONFIG = sys.argv.pop()
|
TF_CONFIG = sys.argv.pop()
|
||||||
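Note on the `bert` branch in this hunk: the positional arguments are read by popping from the end of `sys.argv`, so they come off in reverse order of the usage string. The sketch below isolates that step. `parse_bert_args` is a hypothetical name, and the third pop (for `TF_CHECKPOINT`) is an assumption based on the usage message, since that line is outside the hunk.

```python
def parse_bert_args(argv):
    """Split an argv-style list for the `bert` sub-command.

    Expected shape: [prog, "bert", TF_CHECKPOINT, TF_CONFIG, PYTORCH_DUMP_OUTPUT].
    Returns None when the argument count is wrong, mirroring the usage check.
    """
    if len(argv) != 5:
        return None
    argv = list(argv)  # copy so the caller's list is not mutated
    pytorch_dump_output = argv.pop()  # last argument comes off first
    tf_config = argv.pop()
    tf_checkpoint = argv.pop()  # assumed: not visible in the hunk
    return tf_checkpoint, tf_config, pytorch_dump_output
```

Working on a copy keeps `sys.argv` intact for any later code, which the in-place `sys.argv.pop()` calls in the original do not.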
@@ -33,7 +33,7 @@ def main():
|
|||||||
from .convert_openai_original_tf_checkpoint_to_pytorch import convert_openai_checkpoint_to_pytorch
|
from .convert_openai_original_tf_checkpoint_to_pytorch import convert_openai_checkpoint_to_pytorch
|
||||||
if len(sys.argv) < 4 or len(sys.argv) > 5:
|
if len(sys.argv) < 4 or len(sys.argv) > 5:
|
||||||
# pylint: disable=line-too-long
|
# pylint: disable=line-too-long
|
||||||
print("Should be used as `pytorch_transformers gpt OPENAI_GPT_CHECKPOINT_FOLDER_PATH PYTORCH_DUMP_OUTPUT [OPENAI_GPT_CONFIG]`")
|
print("Should be used as `transformers gpt OPENAI_GPT_CHECKPOINT_FOLDER_PATH PYTORCH_DUMP_OUTPUT [OPENAI_GPT_CONFIG]`")
|
||||||
else:
|
else:
|
||||||
OPENAI_GPT_CHECKPOINT_FOLDER_PATH = sys.argv[2]
|
OPENAI_GPT_CHECKPOINT_FOLDER_PATH = sys.argv[2]
|
||||||
PYTORCH_DUMP_OUTPUT = sys.argv[3]
|
PYTORCH_DUMP_OUTPUT = sys.argv[3]
|
||||||
@@ -48,13 +48,13 @@ def main():
|
|||||||
try:
|
try:
|
||||||
from .convert_transfo_xl_original_tf_checkpoint_to_pytorch import convert_transfo_xl_checkpoint_to_pytorch
|
from .convert_transfo_xl_original_tf_checkpoint_to_pytorch import convert_transfo_xl_checkpoint_to_pytorch
|
||||||
except ImportError:
|
except ImportError:
|
||||||
print("pytorch_transformers can only be used from the commandline to convert TensorFlow models in PyTorch, "
|
print("transformers can only be used from the commandline to convert TensorFlow models in PyTorch, "
|
||||||
"In that case, it requires TensorFlow to be installed. Please see "
|
"In that case, it requires TensorFlow to be installed. Please see "
|
||||||
"https://www.tensorflow.org/install/ for installation instructions.")
|
"https://www.tensorflow.org/install/ for installation instructions.")
|
||||||
raise
|
raise
|
||||||
if len(sys.argv) < 4 or len(sys.argv) > 5:
|
if len(sys.argv) < 4 or len(sys.argv) > 5:
|
||||||
# pylint: disable=line-too-long
|
# pylint: disable=line-too-long
|
||||||
print("Should be used as `pytorch_transformers transfo_xl TF_CHECKPOINT/TF_DATASET_FILE PYTORCH_DUMP_OUTPUT [TF_CONFIG]`")
|
print("Should be used as `transformers transfo_xl TF_CHECKPOINT/TF_DATASET_FILE PYTORCH_DUMP_OUTPUT [TF_CONFIG]`")
|
||||||
else:
|
else:
|
||||||
if 'ckpt' in sys.argv[2].lower():
|
if 'ckpt' in sys.argv[2].lower():
|
||||||
TF_CHECKPOINT = sys.argv[2]
|
TF_CHECKPOINT = sys.argv[2]
|
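The `'ckpt' in sys.argv[2].lower()` test above decides whether the first positional argument is a TensorFlow checkpoint or a dataset file. A minimal sketch of that decision follows; `classify_transfo_xl_input` is a hypothetical name, and the `else` branch (treating anything else as a dataset file and leaving the other value empty) is an assumption, because the diff context stops before it.

```python
def classify_transfo_xl_input(path):
    """Return (tf_checkpoint, tf_dataset_file) for one command-line argument.

    The match is a case-insensitive substring test, so "model.CKPT-100"
    counts as a checkpoint.
    """
    if 'ckpt' in path.lower():
        return path, ""
    # Assumption: any argument without "ckpt" in it is a dataset file.
    return "", path
```

A substring test is permissive: a dataset file whose path happens to contain "ckpt" would be classified as a checkpoint.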
||||||
@@ -72,14 +72,14 @@ def main():
|
|||||||
try:
|
try:
|
||||||
from .convert_gpt2_original_tf_checkpoint_to_pytorch import convert_gpt2_checkpoint_to_pytorch
|
from .convert_gpt2_original_tf_checkpoint_to_pytorch import convert_gpt2_checkpoint_to_pytorch
|
||||||
except ImportError:
|
except ImportError:
|
||||||
print("pytorch_transformers can only be used from the commandline to convert TensorFlow models in PyTorch, "
|
print("transformers can only be used from the commandline to convert TensorFlow models in PyTorch, "
|
||||||
"In that case, it requires TensorFlow to be installed. Please see "
|
"In that case, it requires TensorFlow to be installed. Please see "
|
||||||
"https://www.tensorflow.org/install/ for installation instructions.")
|
"https://www.tensorflow.org/install/ for installation instructions.")
|
||||||
raise
|
raise
|
||||||
|
|
||||||
if len(sys.argv) < 4 or len(sys.argv) > 5:
|
if len(sys.argv) < 4 or len(sys.argv) > 5:
|
||||||
# pylint: disable=line-too-long
|
# pylint: disable=line-too-long
|
||||||
print("Should be used as `pytorch_transformers gpt2 TF_CHECKPOINT PYTORCH_DUMP_OUTPUT [TF_CONFIG]`")
|
print("Should be used as `transformers gpt2 TF_CHECKPOINT PYTORCH_DUMP_OUTPUT [TF_CONFIG]`")
|
||||||
else:
|
else:
|
||||||
TF_CHECKPOINT = sys.argv[2]
|
TF_CHECKPOINT = sys.argv[2]
|
||||||
PYTORCH_DUMP_OUTPUT = sys.argv[3]
|
PYTORCH_DUMP_OUTPUT = sys.argv[3]
|
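Each conversion branch above guards its import the same way: try the import, print a hint if it fails, then re-raise. A generic sketch of that pattern, using the standard library's `importlib.import_module`, might look like this. The helper name `import_converter` and its `requirement` parameter are hypothetical and do not appear in the commit.

```python
import importlib


def import_converter(module_name, requirement="TensorFlow"):
    """Import `module_name`, printing an installation hint before re-raising."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        print("transformers can only be used from the command line to convert "
              "{0} models to PyTorch. In that case, it requires {0} to be "
              "installed.".format(requirement))
        # A bare `raise` keeps the original traceback for debugging.
        raise
```

Re-raising rather than exiting means the caller still sees which module was missing.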
||||||
@@ -92,14 +92,14 @@ def main():
|
|||||||
try:
|
try:
|
||||||
from .convert_xlnet_original_tf_checkpoint_to_pytorch import convert_xlnet_checkpoint_to_pytorch
|
from .convert_xlnet_original_tf_checkpoint_to_pytorch import convert_xlnet_checkpoint_to_pytorch
|
||||||
except ImportError:
|
except ImportError:
|
||||||
print("pytorch_transformers can only be used from the commandline to convert TensorFlow models in PyTorch, "
|
print("transformers can only be used from the commandline to convert TensorFlow models in PyTorch, "
|
||||||
"In that case, it requires TensorFlow to be installed. Please see "
|
"In that case, it requires TensorFlow to be installed. Please see "
|
||||||
"https://www.tensorflow.org/install/ for installation instructions.")
|
"https://www.tensorflow.org/install/ for installation instructions.")
|
||||||
raise
|
raise
|
||||||
|
|
||||||
if len(sys.argv) < 5 or len(sys.argv) > 6:
|
if len(sys.argv) < 5 or len(sys.argv) > 6:
|
||||||
# pylint: disable=line-too-long
|
# pylint: disable=line-too-long
|
||||||
print("Should be used as `pytorch_transformers xlnet TF_CHECKPOINT TF_CONFIG PYTORCH_DUMP_OUTPUT [FINETUNING_TASK_NAME]`")
|
print("Should be used as `transformers xlnet TF_CHECKPOINT TF_CONFIG PYTORCH_DUMP_OUTPUT [FINETUNING_TASK_NAME]`")
|
||||||
else:
|
else:
|
||||||
TF_CHECKPOINT = sys.argv[2]
|
TF_CHECKPOINT = sys.argv[2]
|
||||||
TF_CONFIG = sys.argv[3]
|
TF_CONFIG = sys.argv[3]
|
||||||
@@ -118,7 +118,7 @@ def main():
|
|||||||
|
|
||||||
if len(sys.argv) != 4:
|
if len(sys.argv) != 4:
|
||||||
# pylint: disable=line-too-long
|
# pylint: disable=line-too-long
|
||||||
print("Should be used as `pytorch_transformers xlm XLM_CHECKPOINT_PATH PYTORCH_DUMP_OUTPUT`")
|
print("Should be used as `transformers xlm XLM_CHECKPOINT_PATH PYTORCH_DUMP_OUTPUT`")
|
||||||
else:
|
else:
|
||||||
XLM_CHECKPOINT_PATH = sys.argv[2]
|
XLM_CHECKPOINT_PATH = sys.argv[2]
|
||||||
PYTORCH_DUMP_OUTPUT = sys.argv[3]
|
PYTORCH_DUMP_OUTPUT = sys.argv[3]
|
||||||
@@ -31,7 +31,7 @@ logger = logging.getLogger(__name__)
|
|||||||
|
|
||||||
|
|
||||||
class AutoConfig(object):
|
class AutoConfig(object):
|
||||||
r""":class:`~pytorch_transformers.AutoConfig` is a generic configuration class
|
r""":class:`~transformers.AutoConfig` is a generic configuration class
|
||||||
that will be instantiated as one of the configuration classes of the library
|
that will be instantiated as one of the configuration classes of the library
|
||||||
when created with the `AutoConfig.from_pretrained(pretrained_model_name_or_path)`
|
when created with the `AutoConfig.from_pretrained(pretrained_model_name_or_path)`
|
||||||
class method.
|
class method.
|
||||||
@@ -76,7 +76,7 @@ class AutoConfig(object):
|
|||||||
pretrained_model_name_or_path: either:
|
pretrained_model_name_or_path: either:
|
||||||
|
|
||||||
- a string with the `shortcut name` of a pre-trained model configuration to load from cache or download, e.g.: ``bert-base-uncased``.
|
- a string with the `shortcut name` of a pre-trained model configuration to load from cache or download, e.g.: ``bert-base-uncased``.
|
||||||
- a path to a `directory` containing a configuration file saved using the :func:`~pytorch_transformers.PretrainedConfig.save_pretrained` method, e.g.: ``./my_model_directory/``.
|
- a path to a `directory` containing a configuration file saved using the :func:`~transformers.PretrainedConfig.save_pretrained` method, e.g.: ``./my_model_directory/``.
|
||||||
- a path or url to a saved configuration JSON `file`, e.g.: ``./my_model_directory/configuration.json``.
|
- a path or url to a saved configuration JSON `file`, e.g.: ``./my_model_directory/configuration.json``.
|
||||||
|
|
||||||
cache_dir: (`optional`) string:
|
cache_dir: (`optional`) string:
|
||||||
@@ -45,7 +45,7 @@ BERT_PRETRAINED_CONFIG_ARCHIVE_MAP = {
|
|||||||
|
|
||||||
class BertConfig(PretrainedConfig):
|
class BertConfig(PretrainedConfig):
|
||||||
r"""
|
r"""
|
||||||
:class:`~pytorch_transformers.BertConfig` is the configuration class to store the configuration of a
|
:class:`~transformers.BertConfig` is the configuration class to store the configuration of a
|
||||||
`BertModel`.
|
`BertModel`.
|
||||||
|
|
||||||
|
|
||||||
@@ -59,7 +59,7 @@ class PretrainedConfig(object):
|
|||||||
|
|
||||||
def save_pretrained(self, save_directory):
|
def save_pretrained(self, save_directory):
|
||||||
""" Save a configuration object to the directory `save_directory`, so that it
|
""" Save a configuration object to the directory `save_directory`, so that it
|
||||||
can be re-loaded using the :func:`~pytorch_transformers.PretrainedConfig.from_pretrained` class method.
|
can be re-loaded using the :func:`~transformers.PretrainedConfig.from_pretrained` class method.
|
||||||
"""
|
"""
|
||||||
assert os.path.isdir(save_directory), "Saving path should be a directory where the model and configuration can be saved"
|
assert os.path.isdir(save_directory), "Saving path should be a directory where the model and configuration can be saved"
|
||||||
|
|
||||||
@@ -71,13 +71,13 @@ class PretrainedConfig(object):
|
|||||||
|
|
||||||
@classmethod
|
@classmethod
|
||||||
def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):
|
def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):
|
||||||
r""" Instantiate a :class:`~pytorch_transformers.PretrainedConfig` (or a derived class) from a pre-trained model configuration.
|
r""" Instantiate a :class:`~transformers.PretrainedConfig` (or a derived class) from a pre-trained model configuration.
|
||||||
|
|
||||||
Parameters:
|
Parameters:
|
||||||
pretrained_model_name_or_path: either:
|
pretrained_model_name_or_path: either:
|
||||||
|
|
||||||
- a string with the `shortcut name` of a pre-trained model configuration to load from cache or download, e.g.: ``bert-base-uncased``.
|
- a string with the `shortcut name` of a pre-trained model configuration to load from cache or download, e.g.: ``bert-base-uncased``.
|
||||||
- a path to a `directory` containing a configuration file saved using the :func:`~pytorch_transformers.PretrainedConfig.save_pretrained` method, e.g.: ``./my_model_directory/``.
|
- a path to a `directory` containing a configuration file saved using the :func:`~transformers.PretrainedConfig.save_pretrained` method, e.g.: ``./my_model_directory/``.
|
||||||
- a path or url to a saved configuration JSON `file`, e.g.: ``./my_model_directory/configuration.json``.
|
- a path or url to a saved configuration JSON `file`, e.g.: ``./my_model_directory/configuration.json``.
|
||||||
|
|
||||||
cache_dir: (`optional`) string:
|
cache_dir: (`optional`) string:
|
||||||
@@ -21,7 +21,7 @@ from __future__ import print_function
|
|||||||
import argparse
|
import argparse
|
||||||
import torch
|
import torch
|
||||||
|
|
||||||
from pytorch_transformers import BertConfig, BertForPreTraining, load_tf_weights_in_bert
|
from transformers import BertConfig, BertForPreTraining, load_tf_weights_in_bert
|
||||||
|
|
||||||
import logging
|
import logging
|
||||||
logging.basicConfig(level=logging.INFO)
|
logging.basicConfig(level=logging.INFO)
|
||||||
@@ -20,7 +20,7 @@ import argparse
|
|||||||
import torch
|
import torch
|
||||||
import numpy as np
|
import numpy as np
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import BertModel
|
from transformers import BertModel
|
||||||
|
|
||||||
|
|
||||||
def convert_pytorch_checkpoint_to_tf(model:BertModel, ckpt_dir:str, model_name:str):
|
def convert_pytorch_checkpoint_to_tf(model:BertModel, ckpt_dir:str, model_name:str):
|
||||||
@@ -21,7 +21,7 @@ from io import open
|
|||||||
|
|
||||||
import torch
|
import torch
|
||||||
|
|
||||||
from pytorch_transformers import (CONFIG_NAME, WEIGHTS_NAME,
|
from transformers import (CONFIG_NAME, WEIGHTS_NAME,
|
||||||
GPT2Config,
|
GPT2Config,
|
||||||
GPT2Model,
|
GPT2Model,
|
||||||
load_tf_weights_in_gpt2)
|
load_tf_weights_in_gpt2)
|
||||||
@@ -21,7 +21,7 @@ from io import open
|
|||||||
|
|
||||||
import torch
|
import torch
|
||||||
|
|
||||||
from pytorch_transformers import (CONFIG_NAME, WEIGHTS_NAME,
|
from transformers import (CONFIG_NAME, WEIGHTS_NAME,
|
||||||
OpenAIGPTConfig,
|
OpenAIGPTConfig,
|
||||||
OpenAIGPTModel,
|
OpenAIGPTModel,
|
||||||
load_tf_weights_in_openai_gpt)
|
load_tf_weights_in_openai_gpt)
|
||||||
@@ -22,9 +22,9 @@ import os
|
|||||||
import argparse
|
import argparse
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
|
|
||||||
from pytorch_transformers import is_torch_available, cached_path
|
from transformers import is_torch_available, cached_path
|
||||||
|
|
||||||
from pytorch_transformers import (BertConfig, TFBertForPreTraining, TFBertForQuestionAnswering, TFBertForSequenceClassification, load_bert_pt_weights_in_tf2, BERT_PRETRAINED_CONFIG_ARCHIVE_MAP,
|
from transformers import (BertConfig, TFBertForPreTraining, TFBertForQuestionAnswering, TFBertForSequenceClassification, load_bert_pt_weights_in_tf2, BERT_PRETRAINED_CONFIG_ARCHIVE_MAP,
|
||||||
GPT2Config, TFGPT2LMHeadModel, load_gpt2_pt_weights_in_tf2, GPT2_PRETRAINED_CONFIG_ARCHIVE_MAP,
|
GPT2Config, TFGPT2LMHeadModel, load_gpt2_pt_weights_in_tf2, GPT2_PRETRAINED_CONFIG_ARCHIVE_MAP,
|
||||||
XLNetConfig, TFXLNetLMHeadModel, load_xlnet_pt_weights_in_tf2, XLNET_PRETRAINED_CONFIG_ARCHIVE_MAP,
|
XLNetConfig, TFXLNetLMHeadModel, load_xlnet_pt_weights_in_tf2, XLNET_PRETRAINED_CONFIG_ARCHIVE_MAP,
|
||||||
XLMConfig, TFXLMWithLMHeadModel, load_xlm_pt_weights_in_tf2, XLM_PRETRAINED_CONFIG_ARCHIVE_MAP,
|
XLMConfig, TFXLMWithLMHeadModel, load_xlm_pt_weights_in_tf2, XLM_PRETRAINED_CONFIG_ARCHIVE_MAP,
|
||||||
@@ -36,7 +36,7 @@ from pytorch_transformers import (BertConfig, TFBertForPreTraining, TFBertForQue
|
|||||||
if is_torch_available():
|
if is_torch_available():
|
||||||
import torch
|
import torch
|
||||||
import numpy as np
|
import numpy as np
|
||||||
from pytorch_transformers import (BertForPreTraining, BertForQuestionAnswering, BertForSequenceClassification, BERT_PRETRAINED_MODEL_ARCHIVE_MAP,
|
from transformers import (BertForPreTraining, BertForQuestionAnswering, BertForSequenceClassification, BERT_PRETRAINED_MODEL_ARCHIVE_MAP,
|
||||||
GPT2LMHeadModel, GPT2_PRETRAINED_MODEL_ARCHIVE_MAP,
|
GPT2LMHeadModel, GPT2_PRETRAINED_MODEL_ARCHIVE_MAP,
|
||||||
XLNetLMHeadModel, XLNET_PRETRAINED_MODEL_ARCHIVE_MAP,
|
XLNetLMHeadModel, XLNET_PRETRAINED_MODEL_ARCHIVE_MAP,
|
||||||
XLMWithLMHeadModel, XLM_PRETRAINED_MODEL_ARCHIVE_MAP,
|
XLMWithLMHeadModel, XLM_PRETRAINED_MODEL_ARCHIVE_MAP,
|
||||||
@@ -23,12 +23,12 @@ import torch
|
|||||||
|
|
||||||
from fairseq.models.roberta import RobertaModel as FairseqRobertaModel
|
from fairseq.models.roberta import RobertaModel as FairseqRobertaModel
|
||||||
from fairseq.modules import TransformerSentenceEncoderLayer
|
from fairseq.modules import TransformerSentenceEncoderLayer
|
||||||
from pytorch_transformers import (BertConfig, BertEncoder,
|
from transformers import (BertConfig, BertEncoder,
|
||||||
BertIntermediate, BertLayer,
|
BertIntermediate, BertLayer,
|
||||||
BertModel, BertOutput,
|
BertModel, BertOutput,
|
||||||
BertSelfAttention,
|
BertSelfAttention,
|
||||||
BertSelfOutput)
|
BertSelfOutput)
|
||||||
from pytorch_transformers import (RobertaEmbeddings,
|
from transformers import (RobertaEmbeddings,
|
||||||
RobertaForMaskedLM,
|
RobertaForMaskedLM,
|
||||||
RobertaForSequenceClassification,
|
RobertaForSequenceClassification,
|
||||||
RobertaModel)
|
RobertaModel)
|
||||||
@@ -23,12 +23,12 @@ from io import open
|
|||||||
|
|
||||||
import torch
|
import torch
|
||||||
|
|
||||||
import pytorch_transformers.tokenization_transfo_xl as data_utils
|
import transformers.tokenization_transfo_xl as data_utils
|
||||||
|
|
||||||
from pytorch_transformers import CONFIG_NAME, WEIGHTS_NAME
|
from transformers import CONFIG_NAME, WEIGHTS_NAME
|
||||||
from pytorch_transformers import (TransfoXLConfig, TransfoXLLMHeadModel,
|
from transformers import (TransfoXLConfig, TransfoXLLMHeadModel,
|
||||||
load_tf_weights_in_transfo_xl)
|
load_tf_weights_in_transfo_xl)
|
||||||
from pytorch_transformers.tokenization_transfo_xl import (CORPUS_NAME, VOCAB_FILES_NAMES)
|
from transformers.tokenization_transfo_xl import (CORPUS_NAME, VOCAB_FILES_NAMES)
|
||||||
|
|
||||||
if sys.version_info[0] == 2:
|
if sys.version_info[0] == 2:
|
||||||
import cPickle as pickle
|
import cPickle as pickle
|
||||||
@@ -23,8 +23,8 @@ from io import open
|
|||||||
import torch
|
import torch
|
||||||
import numpy
|
import numpy
|
||||||
|
|
||||||
from pytorch_transformers import CONFIG_NAME, WEIGHTS_NAME
|
from transformers import CONFIG_NAME, WEIGHTS_NAME
|
||||||
from pytorch_transformers.tokenization_xlm import VOCAB_FILES_NAMES
|
from transformers.tokenization_xlm import VOCAB_FILES_NAMES
|
||||||
|
|
||||||
import logging
|
import logging
|
||||||
logging.basicConfig(level=logging.INFO)
|
logging.basicConfig(level=logging.INFO)
|
||||||
@@ -22,7 +22,7 @@ import os
|
|||||||
import argparse
|
import argparse
|
||||||
import torch
|
import torch
|
||||||
|
|
||||||
from pytorch_transformers import (CONFIG_NAME, WEIGHTS_NAME,
|
from transformers import (CONFIG_NAME, WEIGHTS_NAME,
|
||||||
XLNetConfig,
|
XLNetConfig,
|
||||||
XLNetLMHeadModel, XLNetForQuestionAnswering,
|
XLNetLMHeadModel, XLNetForQuestionAnswering,
|
||||||
XLNetForSequenceClassification,
|
XLNetForSequenceClassification,
|
||||||
@@ -48,7 +48,7 @@ except ImportError:
|
|||||||
torch_cache_home = os.path.expanduser(
|
torch_cache_home = os.path.expanduser(
|
||||||
os.getenv('TORCH_HOME', os.path.join(
|
os.getenv('TORCH_HOME', os.path.join(
|
||||||
os.getenv('XDG_CACHE_HOME', '~/.cache'), 'torch')))
|
os.getenv('XDG_CACHE_HOME', '~/.cache'), 'torch')))
|
||||||
default_cache_path = os.path.join(torch_cache_home, 'pytorch_transformers')
|
default_cache_path = os.path.join(torch_cache_home, 'transformers')
|
||||||
|
|
||||||
try:
|
try:
|
||||||
from urllib.parse import urlparse
|
from urllib.parse import urlparse
|
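The fallback branch above builds the default cache directory from two environment variables. The lookup order can be sketched as below; this is an illustrative reimplementation, and the function `default_cache_path` with its injectable `env` mapping is an assumption made for testability, not part of the commit.

```python
import os


def default_cache_path(env=None):
    """Compute <torch home>/transformers following the fallback shown above.

    Precedence: TORCH_HOME if set, else $XDG_CACHE_HOME/torch,
    else ~/.cache/torch.
    """
    env = os.environ if env is None else env
    xdg_cache_home = env.get('XDG_CACHE_HOME', '~/.cache')
    torch_cache_home = os.path.expanduser(
        env.get('TORCH_HOME', os.path.join(xdg_cache_home, 'torch')))
    return os.path.join(torch_cache_home, 'transformers')
```

After this commit the last path component is `transformers` instead of `pytorch_transformers`, so files cached under the old directory name would presumably not be found at the new default location.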
||||||
@@ -65,6 +65,7 @@ except (AttributeError, ImportError):
|
|||||||
default_cache_path))
|
default_cache_path))
|
||||||
|
|
||||||
PYTORCH_TRANSFORMERS_CACHE = PYTORCH_PRETRAINED_BERT_CACHE # Kept for backward compatibility
|
PYTORCH_TRANSFORMERS_CACHE = PYTORCH_PRETRAINED_BERT_CACHE # Kept for backward compatibility
|
||||||
|
TRANSFORMERS_CACHE = PYTORCH_PRETRAINED_BERT_CACHE # Kept for backward compatibility
|
||||||
|
|
||||||
WEIGHTS_NAME = "pytorch_model.bin"
|
WEIGHTS_NAME = "pytorch_model.bin"
|
||||||
TF2_WEIGHTS_NAME = 'tf_model.h5'
|
TF2_WEIGHTS_NAME = 'tf_model.h5'
|
||||||
@@ -131,7 +132,7 @@ def filename_to_url(filename, cache_dir=None):
|
|||||||
Raise ``EnvironmentError`` if `filename` or its stored metadata do not exist.
|
Raise ``EnvironmentError`` if `filename` or its stored metadata do not exist.
|
||||||
"""
|
"""
|
||||||
if cache_dir is None:
|
if cache_dir is None:
|
||||||
cache_dir = PYTORCH_TRANSFORMERS_CACHE
|
cache_dir = TRANSFORMERS_CACHE
|
||||||
if sys.version_info[0] == 3 and isinstance(cache_dir, Path):
|
if sys.version_info[0] == 3 and isinstance(cache_dir, Path):
|
||||||
cache_dir = str(cache_dir)
|
cache_dir = str(cache_dir)
|
||||||
|
|
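The hunks above add `TRANSFORMERS_CACHE` as another alias of the same cache constant and switch the `cache_dir is None` default to it. A minimal sketch of that aliasing and default resolution follows, assuming Python 3 only; the path value and the `resolve_cache_dir` helper are hypothetical.

```python
from pathlib import Path

# Hypothetical location; the real value is derived from environment variables.
PYTORCH_PRETRAINED_BERT_CACHE = Path("/tmp/example_cache")
PYTORCH_TRANSFORMERS_CACHE = PYTORCH_PRETRAINED_BERT_CACHE  # older alias, kept
TRANSFORMERS_CACHE = PYTORCH_PRETRAINED_BERT_CACHE          # alias added here


def resolve_cache_dir(cache_dir=None):
    """Fall back to the default cache and normalise Path objects to str."""
    if cache_dir is None:
        cache_dir = TRANSFORMERS_CACHE
    if isinstance(cache_dir, Path):
        cache_dir = str(cache_dir)
    return cache_dir
```

Because all three names are bound to the same object, code that still imports either older name resolves to the same directory.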
||||||
@@ -162,7 +163,7 @@ def cached_path(url_or_filename, cache_dir=None, force_download=False, proxies=N
|
|||||||
force_download: if True, re-download the file even if it's already cached in the cache dir.
|
force_download: if True, re-download the file even if it's already cached in the cache dir.
|
||||||
"""
|
"""
|
||||||
if cache_dir is None:
|
if cache_dir is None:
|
||||||
cache_dir = PYTORCH_TRANSFORMERS_CACHE
|
cache_dir = TRANSFORMERS_CACHE
|
||||||
if sys.version_info[0] == 3 and isinstance(url_or_filename, Path):
|
if sys.version_info[0] == 3 and isinstance(url_or_filename, Path):
|
||||||
url_or_filename = str(url_or_filename)
|
url_or_filename = str(url_or_filename)
|
||||||
if sys.version_info[0] == 3 and isinstance(cache_dir, Path):
|
if sys.version_info[0] == 3 and isinstance(cache_dir, Path):
|
||||||
@@ -251,7 +252,7 @@ def get_from_cache(url, cache_dir=None, force_download=False, proxies=None):
|
|||||||
If it's not there, download it. Then return the path to the cached file.
|
If it's not there, download it. Then return the path to the cached file.
|
||||||
"""
|
"""
|
||||||
if cache_dir is None:
|
if cache_dir is None:
|
||||||
cache_dir = PYTORCH_TRANSFORMERS_CACHE
|
cache_dir = TRANSFORMERS_CACHE
|
||||||
if sys.version_info[0] == 3 and isinstance(cache_dir, Path):
|
if sys.version_info[0] == 3 and isinstance(cache_dir, Path):
|
||||||
cache_dir = str(cache_dir)
|
cache_dir = str(cache_dir)
|
||||||
if sys.version_info[0] == 2 and not isinstance(cache_dir, str):
|
if sys.version_info[0] == 2 and not isinstance(cache_dir, str):
|
||||||
@@ -36,7 +36,7 @@ logger = logging.getLogger(__name__)
|
|||||||
|
|
||||||
class AutoModel(object):
|
class AutoModel(object):
|
||||||
r"""
|
r"""
|
||||||
:class:`~pytorch_transformers.AutoModel` is a generic model class
|
:class:`~transformers.AutoModel` is a generic model class
|
||||||
that will be instantiated as one of the base model classes of the library
|
that will be instantiated as one of the base model classes of the library
|
||||||
when created with the `AutoModel.from_pretrained(pretrained_model_name_or_path)`
|
when created with the `AutoModel.from_pretrained(pretrained_model_name_or_path)`
|
||||||
class method.
|
class method.
|
||||||
@@ -84,23 +84,23 @@ class AutoModel(object):
|
|||||||
pretrained_model_name_or_path: either:
|
pretrained_model_name_or_path: either:
|
||||||
|
|
||||||
- a string with the `shortcut name` of a pre-trained model to load from cache or download, e.g.: ``bert-base-uncased``.
|
- a string with the `shortcut name` of a pre-trained model to load from cache or download, e.g.: ``bert-base-uncased``.
|
||||||
- a path to a `directory` containing model weights saved using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained`, e.g.: ``./my_model_directory/``.
|
- a path to a `directory` containing model weights saved using :func:`~transformers.PreTrainedModel.save_pretrained`, e.g.: ``./my_model_directory/``.
|
||||||
- a path or url to a `tensorflow index checkpoint file` (e.g. `./tf_model/model.ckpt.index`). In this case, ``from_tf`` should be set to True and a configuration object should be provided as ``config`` argument. This loading path is slower than converting the TensorFlow checkpoint in a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
|
- a path or url to a `tensorflow index checkpoint file` (e.g. `./tf_model/model.ckpt.index`). In this case, ``from_tf`` should be set to True and a configuration object should be provided as ``config`` argument. This loading path is slower than converting the TensorFlow checkpoint in a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
|
||||||
|
|
||||||
model_args: (`optional`) Sequence of positional arguments:
|
model_args: (`optional`) Sequence of positional arguments:
|
||||||
All remaining positional arguments will be passed to the underlying model's ``__init__`` method
|
All remaining positional arguments will be passed to the underlying model's ``__init__`` method
|
||||||
|
|
||||||
config: (`optional`) instance of a class derived from :class:`~pytorch_transformers.PretrainedConfig`:
|
config: (`optional`) instance of a class derived from :class:`~transformers.PretrainedConfig`:
|
||||||
Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
|
Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
|
||||||
|
|
||||||
- the model is a model provided by the library (loaded with the ``shortcut-name`` string of a pretrained model), or
|
- the model is a model provided by the library (loaded with the ``shortcut-name`` string of a pretrained model), or
|
||||||
- the model was saved using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained` and is reloaded by supplying the save directory.
|
- the model was saved using :func:`~transformers.PreTrainedModel.save_pretrained` and is reloaded by supplying the save directory.
|
||||||
- the model is loaded by supplying a local directory as ``pretrained_model_name_or_path`` and a configuration JSON file named `config.json` is found in the directory.
|
- the model is loaded by supplying a local directory as ``pretrained_model_name_or_path`` and a configuration JSON file named `config.json` is found in the directory.
|
||||||
|
|
||||||
state_dict: (`optional`) dict:
|
state_dict: (`optional`) dict:
|
||||||
an optional state dictionary for the model to use instead of a state dictionary loaded from saved weights file.
|
an optional state dictionary for the model to use instead of a state dictionary loaded from saved weights file.
|
||||||
This option can be used if you want to create a model from a pretrained configuration but load your own weights.
|
This option can be used if you want to create a model from a pretrained configuration but load your own weights.
|
||||||
In this case though, you should check if using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained` and :func:`~pytorch_transformers.PreTrainedModel.from_pretrained` is not a simpler option.
|
In this case though, you should check if using :func:`~transformers.PreTrainedModel.save_pretrained` and :func:`~transformers.PreTrainedModel.from_pretrained` is not a simpler option.
|
||||||
|
|
||||||
cache_dir: (`optional`) string:
|
cache_dir: (`optional`) string:
|
||||||
Path to a directory in which a downloaded pre-trained model
|
Path to a directory in which a downloaded pre-trained model
|
||||||
@@ -120,7 +120,7 @@ class AutoModel(object):
|
|||||||
Can be used to update the configuration object (after it is loaded) and initiate the model (e.g. ``output_attention=True``). Behaves differently depending on whether a `config` is provided or automatically loaded:
|
Can be used to update the configuration object (after it is loaded) and initiate the model (e.g. ``output_attention=True``). Behaves differently depending on whether a `config` is provided or automatically loaded:
|
||||||
|
|
||||||
- If a configuration is provided with ``config``, ``**kwargs`` will be directly passed to the underlying model's ``__init__`` method (we assume all relevant updates to the configuration have already been done)
|
- If a configuration is provided with ``config``, ``**kwargs`` will be directly passed to the underlying model's ``__init__`` method (we assume all relevant updates to the configuration have already been done)
|
||||||
- If a configuration is not provided, ``kwargs`` will be first passed to the configuration class initialization function (:func:`~pytorch_transformers.PretrainedConfig.from_pretrained`). Each key of ``kwargs`` that corresponds to a configuration attribute will be used to override said attribute with the supplied ``kwargs`` value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's ``__init__`` function.
|
- If a configuration is not provided, ``kwargs`` will be first passed to the configuration class initialization function (:func:`~transformers.PretrainedConfig.from_pretrained`). Each key of ``kwargs`` that corresponds to a configuration attribute will be used to override said attribute with the supplied ``kwargs`` value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's ``__init__`` function.
|
||||||
|
|
||||||
Examples::
|
Examples::
|
||||||
|
|
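The second bullet of the ``kwargs`` description above says that keys matching a configuration attribute override it and the remaining keys go to the model's ``__init__``. That splitting rule can be sketched with a toy configuration class. `ToyConfig` and `split_kwargs` are hypothetical names used only for illustration; this does not reproduce the library's implementation.

```python
class ToyConfig(object):
    """Stand-in for a configuration object with two attributes."""

    def __init__(self):
        self.output_attentions = False
        self.num_labels = 2


def split_kwargs(config, kwargs):
    """Apply matching keys to `config`; return the keys it does not know."""
    unused = {}
    for key, value in kwargs.items():
        if hasattr(config, key):
            setattr(config, key, value)   # overrides the loaded attribute
        else:
            unused[key] = value           # would be forwarded to __init__
    return config, unused
```

With this rule, a misspelled configuration key is not rejected; it simply ends up among the keys forwarded to the model constructor.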
||||||
@@ -157,7 +157,7 @@ class AutoModel(object):
|
|||||||
|
|
||||||
class AutoModelWithLMHead(object):
|
class AutoModelWithLMHead(object):
|
||||||
r"""
|
r"""
|
||||||
:class:`~pytorch_transformers.AutoModelWithLMHead` is a generic model class
|
:class:`~transformers.AutoModelWithLMHead` is a generic model class
|
||||||
that will be instantiated as one of the language modeling model classes of the library
|
that will be instantiated as one of the language modeling model classes of the library
|
||||||
when created with the `AutoModelWithLMHead.from_pretrained(pretrained_model_name_or_path)`
|
when created with the `AutoModelWithLMHead.from_pretrained(pretrained_model_name_or_path)`
|
||||||
class method.
|
class method.
|
||||||
@@ -208,23 +208,23 @@ class AutoModelWithLMHead(object):
|
|||||||
pretrained_model_name_or_path: either:
|
pretrained_model_name_or_path: either:
|
||||||
|
|
||||||
- a string with the `shortcut name` of a pre-trained model to load from cache or download, e.g.: ``bert-base-uncased``.
|
- a string with the `shortcut name` of a pre-trained model to load from cache or download, e.g.: ``bert-base-uncased``.
|
||||||
- a path to a `directory` containing model weights saved using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained`, e.g.: ``./my_model_directory/``.
|
- a path to a `directory` containing model weights saved using :func:`~transformers.PreTrainedModel.save_pretrained`, e.g.: ``./my_model_directory/``.
|
||||||
- a path or url to a `tensorflow index checkpoint file` (e.g. `./tf_model/model.ckpt.index`). In this case, ``from_tf`` should be set to True and a configuration object should be provided as ``config`` argument. This loading path is slower than converting the TensorFlow checkpoint in a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
|
- a path or url to a `tensorflow index checkpoint file` (e.g. `./tf_model/model.ckpt.index`). In this case, ``from_tf`` should be set to True and a configuration object should be provided as ``config`` argument. This loading path is slower than converting the TensorFlow checkpoint in a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
|
||||||
|
|
||||||
model_args: (`optional`) Sequence of positional arguments:
|
model_args: (`optional`) Sequence of positional arguments:
|
||||||
All remaining positional arguments will be passed to the underlying model's ``__init__`` method
|
All remaining positional arguments will be passed to the underlying model's ``__init__`` method
|
||||||
|
|
||||||
config: (`optional`) instance of a class derived from :class:`~pytorch_transformers.PretrainedConfig`:
|
config: (`optional`) instance of a class derived from :class:`~transformers.PretrainedConfig`:
|
||||||
Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
|
Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
|
||||||
|
|
||||||
- the model is a model provided by the library (loaded with the ``shortcut-name`` string of a pretrained model), or
|
- the model is a model provided by the library (loaded with the ``shortcut-name`` string of a pretrained model), or
|
||||||
- the model was saved using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained` and is reloaded by supplying the save directory.
|
- the model was saved using :func:`~transformers.PreTrainedModel.save_pretrained` and is reloaded by supplying the save directory.
|
||||||
- the model is loaded by supplying a local directory as ``pretrained_model_name_or_path`` and a configuration JSON file named `config.json` is found in the directory.
|
- the model is loaded by supplying a local directory as ``pretrained_model_name_or_path`` and a configuration JSON file named `config.json` is found in the directory.
|
||||||
|
|
||||||
state_dict: (`optional`) dict:
|
state_dict: (`optional`) dict:
|
||||||
an optional state dictionary for the model to use instead of a state dictionary loaded from saved weights file.
|
an optional state dictionary for the model to use instead of a state dictionary loaded from saved weights file.
|
||||||
This option can be used if you want to create a model from a pretrained configuration but load your own weights.
|
This option can be used if you want to create a model from a pretrained configuration but load your own weights.
|
||||||
In this case though, you should check if using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained` and :func:`~pytorch_transformers.PreTrainedModel.from_pretrained` is not a simpler option.
|
In this case though, you should check if using :func:`~transformers.PreTrainedModel.save_pretrained` and :func:`~transformers.PreTrainedModel.from_pretrained` is not a simpler option.
|
||||||
|
|
||||||
cache_dir: (`optional`) string:
|
cache_dir: (`optional`) string:
|
||||||
Path to a directory in which a downloaded pre-trained model
|
Path to a directory in which a downloaded pre-trained model
|
||||||
@@ -244,7 +244,7 @@ class AutoModelWithLMHead(object):
|
|||||||
Can be used to update the configuration object (after it is loaded) and initiate the model (e.g. ``output_attention=True``). Behaves differently depending on whether a `config` is provided or automatically loaded:
|
Can be used to update the configuration object (after it is loaded) and initiate the model (e.g. ``output_attention=True``). Behaves differently depending on whether a `config` is provided or automatically loaded:
|
||||||
|
|
||||||
- If a configuration is provided with ``config``, ``**kwargs`` will be directly passed to the underlying model's ``__init__`` method (we assume all relevant updates to the configuration have already been done)
|
- If a configuration is provided with ``config``, ``**kwargs`` will be directly passed to the underlying model's ``__init__`` method (we assume all relevant updates to the configuration have already been done)
|
||||||
- If a configuration is not provided, ``kwargs`` will be first passed to the configuration class initialization function (:func:`~pytorch_transformers.PretrainedConfig.from_pretrained`). Each key of ``kwargs`` that corresponds to a configuration attribute will be used to override said attribute with the supplied ``kwargs`` value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's ``__init__`` function.
|
- If a configuration is not provided, ``kwargs`` will be first passed to the configuration class initialization function (:func:`~transformers.PretrainedConfig.from_pretrained`). Each key of ``kwargs`` that corresponds to a configuration attribute will be used to override said attribute with the supplied ``kwargs`` value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's ``__init__`` function.
|
||||||
|
|
||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
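The keyword-argument handling described in the hunk above can be illustrated with a small sketch. It is not the library's implementation: ``ToyConfig`` and ``split_kwargs`` are hypothetical names introduced here, and the only behavior modeled is the one the docstring states (keys matching a configuration attribute override it, the rest are left for the model's ``__init__``).

```python
class ToyConfig:
    """Hypothetical stand-in for a configuration class (not the library's)."""

    def __init__(self, num_labels=2, output_attentions=False):
        self.num_labels = num_labels
        self.output_attentions = output_attentions


def split_kwargs(config, kwargs):
    """Override config attributes named in kwargs; return the leftover keys.

    Assumption: a key "corresponds to a configuration attribute" when the
    config object already has an attribute with that name.
    """
    unused = {}
    for key, value in kwargs.items():
        if hasattr(config, key):
            setattr(config, key, value)  # key matches a config attribute
        else:
            unused[key] = value  # would be forwarded to the model's __init__
    return unused


config = ToyConfig()
model_kwargs = split_kwargs(config, {"output_attentions": True, "foo": 1})
print(config.output_attentions, model_kwargs)
```

When a ``config`` object is passed explicitly, this splitting step is skipped and all keyword arguments go to the model, as the first bullet states.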
@@ -281,7 +281,7 @@ class AutoModelWithLMHead(object):
|
|||||||
|
|
||||||
class AutoModelForSequenceClassification(object):
|
class AutoModelForSequenceClassification(object):
|
||||||
r"""
|
r"""
|
||||||
:class:`~pytorch_transformers.AutoModelForSequenceClassification` is a generic model class
|
:class:`~transformers.AutoModelForSequenceClassification` is a generic model class
|
||||||
that will be instantiated as one of the sequence classification model classes of the library
|
that will be instantiated as one of the sequence classification model classes of the library
|
||||||
when created with the `AutoModelForSequenceClassification.from_pretrained(pretrained_model_name_or_path)`
|
when created with the `AutoModelForSequenceClassification.from_pretrained(pretrained_model_name_or_path)`
|
||||||
class method.
|
class method.
|
||||||
@@ -326,23 +326,23 @@ class AutoModelForSequenceClassification(object):
|
|||||||
pretrained_model_name_or_path: either:
|
pretrained_model_name_or_path: either:
|
||||||
|
|
||||||
- a string with the `shortcut name` of a pre-trained model to load from cache or download, e.g.: ``bert-base-uncased``.
|
- a string with the `shortcut name` of a pre-trained model to load from cache or download, e.g.: ``bert-base-uncased``.
|
||||||
- a path to a `directory` containing model weights saved using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained`, e.g.: ``./my_model_directory/``.
|
- a path to a `directory` containing model weights saved using :func:`~transformers.PreTrainedModel.save_pretrained`, e.g.: ``./my_model_directory/``.
|
||||||
- a path or URL to a `tensorflow index checkpoint file` (e.g. `./tf_model/model.ckpt.index`). In this case, ``from_tf`` should be set to True and a configuration object should be provided as the ``config`` argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
|
- a path or URL to a `tensorflow index checkpoint file` (e.g. `./tf_model/model.ckpt.index`). In this case, ``from_tf`` should be set to True and a configuration object should be provided as the ``config`` argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
|
||||||
|
|
||||||
model_args: (`optional`) Sequence of positional arguments:
|
model_args: (`optional`) Sequence of positional arguments:
|
||||||
All remaining positional arguments will be passed to the underlying model's ``__init__`` method
|
All remaining positional arguments will be passed to the underlying model's ``__init__`` method
|
||||||
|
|
||||||
config: (`optional`) instance of a class derived from :class:`~pytorch_transformers.PretrainedConfig`:
|
config: (`optional`) instance of a class derived from :class:`~transformers.PretrainedConfig`:
|
||||||
Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
|
Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
|
||||||
|
|
||||||
- the model is a model provided by the library (loaded with the ``shortcut-name`` string of a pretrained model), or
|
- the model is a model provided by the library (loaded with the ``shortcut-name`` string of a pretrained model), or
|
||||||
- the model was saved using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained` and is reloaded by supplying the save directory.
|
- the model was saved using :func:`~transformers.PreTrainedModel.save_pretrained` and is reloaded by supplying the save directory.
|
||||||
- the model is loaded by supplying a local directory as ``pretrained_model_name_or_path`` and a configuration JSON file named `config.json` is found in the directory.
|
- the model is loaded by supplying a local directory as ``pretrained_model_name_or_path`` and a configuration JSON file named `config.json` is found in the directory.
|
||||||
|
|
||||||
state_dict: (`optional`) dict:
|
state_dict: (`optional`) dict:
|
||||||
an optional state dictionary for the model to use instead of a state dictionary loaded from a saved weights file.
|
an optional state dictionary for the model to use instead of a state dictionary loaded from a saved weights file.
|
||||||
This option can be used if you want to create a model from a pretrained configuration but load your own weights.
|
This option can be used if you want to create a model from a pretrained configuration but load your own weights.
|
||||||
In this case though, you should check if using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained` and :func:`~pytorch_transformers.PreTrainedModel.from_pretrained` is not a simpler option.
|
In this case though, you should check if using :func:`~transformers.PreTrainedModel.save_pretrained` and :func:`~transformers.PreTrainedModel.from_pretrained` is not a simpler option.
|
||||||
|
|
||||||
cache_dir: (`optional`) string:
|
cache_dir: (`optional`) string:
|
||||||
Path to a directory in which a downloaded pre-trained model
|
Path to a directory in which a downloaded pre-trained model
|
||||||
@@ -362,7 +362,7 @@ class AutoModelForSequenceClassification(object):
|
|||||||
Can be used to update the configuration object (after it has been loaded) and instantiate the model (e.g. ``output_attentions=True``). Behaves differently depending on whether a `config` is provided or automatically loaded:
|
Can be used to update the configuration object (after it has been loaded) and instantiate the model (e.g. ``output_attentions=True``). Behaves differently depending on whether a `config` is provided or automatically loaded:
|
||||||
|
|
||||||
- If a configuration is provided with ``config``, ``**kwargs`` will be directly passed to the underlying model's ``__init__`` method (we assume all relevant updates to the configuration have already been done)
|
- If a configuration is provided with ``config``, ``**kwargs`` will be directly passed to the underlying model's ``__init__`` method (we assume all relevant updates to the configuration have already been done)
|
||||||
- If a configuration is not provided, ``kwargs`` will be first passed to the configuration class initialization function (:func:`~pytorch_transformers.PretrainedConfig.from_pretrained`). Each key of ``kwargs`` that corresponds to a configuration attribute will be used to override said attribute with the supplied ``kwargs`` value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's ``__init__`` function.
|
- If a configuration is not provided, ``kwargs`` will be first passed to the configuration class initialization function (:func:`~transformers.PretrainedConfig.from_pretrained`). Each key of ``kwargs`` that corresponds to a configuration attribute will be used to override said attribute with the supplied ``kwargs`` value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's ``__init__`` function.
|
||||||
|
|
||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
@@ -392,7 +392,7 @@ class AutoModelForSequenceClassification(object):
|
|||||||
|
|
||||||
class AutoModelForQuestionAnswering(object):
|
class AutoModelForQuestionAnswering(object):
|
||||||
r"""
|
r"""
|
||||||
:class:`~pytorch_transformers.AutoModelForQuestionAnswering` is a generic model class
|
:class:`~transformers.AutoModelForQuestionAnswering` is a generic model class
|
||||||
that will be instantiated as one of the question answering model classes of the library
|
that will be instantiated as one of the question answering model classes of the library
|
||||||
when created with the `AutoModelForQuestionAnswering.from_pretrained(pretrained_model_name_or_path)`
|
when created with the `AutoModelForQuestionAnswering.from_pretrained(pretrained_model_name_or_path)`
|
||||||
class method.
|
class method.
|
||||||
@@ -435,23 +435,23 @@ class AutoModelForQuestionAnswering(object):
|
|||||||
pretrained_model_name_or_path: either:
|
pretrained_model_name_or_path: either:
|
||||||
|
|
||||||
- a string with the `shortcut name` of a pre-trained model to load from cache or download, e.g.: ``bert-base-uncased``.
|
- a string with the `shortcut name` of a pre-trained model to load from cache or download, e.g.: ``bert-base-uncased``.
|
||||||
- a path to a `directory` containing model weights saved using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained`, e.g.: ``./my_model_directory/``.
|
- a path to a `directory` containing model weights saved using :func:`~transformers.PreTrainedModel.save_pretrained`, e.g.: ``./my_model_directory/``.
|
||||||
- a path or URL to a `tensorflow index checkpoint file` (e.g. `./tf_model/model.ckpt.index`). In this case, ``from_tf`` should be set to True and a configuration object should be provided as the ``config`` argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
|
- a path or URL to a `tensorflow index checkpoint file` (e.g. `./tf_model/model.ckpt.index`). In this case, ``from_tf`` should be set to True and a configuration object should be provided as the ``config`` argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
|
||||||
|
|
||||||
model_args: (`optional`) Sequence of positional arguments:
|
model_args: (`optional`) Sequence of positional arguments:
|
||||||
All remaining positional arguments will be passed to the underlying model's ``__init__`` method
|
All remaining positional arguments will be passed to the underlying model's ``__init__`` method
|
||||||
|
|
||||||
config: (`optional`) instance of a class derived from :class:`~pytorch_transformers.PretrainedConfig`:
|
config: (`optional`) instance of a class derived from :class:`~transformers.PretrainedConfig`:
|
||||||
Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
|
Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
|
||||||
|
|
||||||
- the model is a model provided by the library (loaded with the ``shortcut-name`` string of a pretrained model), or
|
- the model is a model provided by the library (loaded with the ``shortcut-name`` string of a pretrained model), or
|
||||||
- the model was saved using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained` and is reloaded by supplying the save directory.
|
- the model was saved using :func:`~transformers.PreTrainedModel.save_pretrained` and is reloaded by supplying the save directory.
|
||||||
- the model is loaded by supplying a local directory as ``pretrained_model_name_or_path`` and a configuration JSON file named `config.json` is found in the directory.
|
- the model is loaded by supplying a local directory as ``pretrained_model_name_or_path`` and a configuration JSON file named `config.json` is found in the directory.
|
||||||
|
|
||||||
state_dict: (`optional`) dict:
|
state_dict: (`optional`) dict:
|
||||||
an optional state dictionary for the model to use instead of a state dictionary loaded from a saved weights file.
|
an optional state dictionary for the model to use instead of a state dictionary loaded from a saved weights file.
|
||||||
This option can be used if you want to create a model from a pretrained configuration but load your own weights.
|
This option can be used if you want to create a model from a pretrained configuration but load your own weights.
|
||||||
In this case though, you should check if using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained` and :func:`~pytorch_transformers.PreTrainedModel.from_pretrained` is not a simpler option.
|
In this case though, you should check if using :func:`~transformers.PreTrainedModel.save_pretrained` and :func:`~transformers.PreTrainedModel.from_pretrained` is not a simpler option.
|
||||||
|
|
||||||
cache_dir: (`optional`) string:
|
cache_dir: (`optional`) string:
|
||||||
Path to a directory in which a downloaded pre-trained model
|
Path to a directory in which a downloaded pre-trained model
|
||||||
@@ -471,7 +471,7 @@ class AutoModelForQuestionAnswering(object):
|
|||||||
Can be used to update the configuration object (after it has been loaded) and instantiate the model (e.g. ``output_attentions=True``). Behaves differently depending on whether a `config` is provided or automatically loaded:
|
Can be used to update the configuration object (after it has been loaded) and instantiate the model (e.g. ``output_attentions=True``). Behaves differently depending on whether a `config` is provided or automatically loaded:
|
||||||
|
|
||||||
- If a configuration is provided with ``config``, ``**kwargs`` will be directly passed to the underlying model's ``__init__`` method (we assume all relevant updates to the configuration have already been done)
|
- If a configuration is provided with ``config``, ``**kwargs`` will be directly passed to the underlying model's ``__init__`` method (we assume all relevant updates to the configuration have already been done)
|
||||||
- If a configuration is not provided, ``kwargs`` will be first passed to the configuration class initialization function (:func:`~pytorch_transformers.PretrainedConfig.from_pretrained`). Each key of ``kwargs`` that corresponds to a configuration attribute will be used to override said attribute with the supplied ``kwargs`` value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's ``__init__`` function.
|
- If a configuration is not provided, ``kwargs`` will be first passed to the configuration class initialization function (:func:`~transformers.PretrainedConfig.from_pretrained`). Each key of ``kwargs`` that corresponds to a configuration attribute will be used to override said attribute with the supplied ``kwargs`` value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's ``__init__`` function.
|
||||||
|
|
||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
@@ -486,9 +486,9 @@ BERT_START_DOCSTRING = r""" The BERT model was proposed in
|
|||||||
https://pytorch.org/docs/stable/nn.html#module
|
https://pytorch.org/docs/stable/nn.html#module
|
||||||
|
|
||||||
Parameters:
|
Parameters:
|
||||||
config (:class:`~pytorch_transformers.BertConfig`): Model configuration class with all the parameters of the model.
|
config (:class:`~transformers.BertConfig`): Model configuration class with all the parameters of the model.
|
||||||
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
||||||
Check out the :meth:`~pytorch_transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
BERT_INPUTS_DOCSTRING = r"""
|
BERT_INPUTS_DOCSTRING = r"""
|
||||||
@@ -512,9 +512,9 @@ BERT_INPUTS_DOCSTRING = r"""
|
|||||||
Bert is a model with absolute position embeddings so it's usually advised to pad the inputs on
|
Bert is a model with absolute position embeddings so it's usually advised to pad the inputs on
|
||||||
the right rather than the left.
|
the right rather than the left.
|
||||||
|
|
||||||
Indices can be obtained using :class:`pytorch_transformers.BertTokenizer`.
|
Indices can be obtained using :class:`transformers.BertTokenizer`.
|
||||||
See :func:`pytorch_transformers.PreTrainedTokenizer.encode` and
|
See :func:`transformers.PreTrainedTokenizer.encode` and
|
||||||
:func:`pytorch_transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
:func:`transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
||||||
**attention_mask**: (`optional`) ``torch.FloatTensor`` of shape ``(batch_size, sequence_length)``:
|
**attention_mask**: (`optional`) ``torch.FloatTensor`` of shape ``(batch_size, sequence_length)``:
|
||||||
Mask to avoid performing attention on padding token indices.
|
Mask to avoid performing attention on padding token indices.
|
||||||
Mask values selected in ``[0, 1]``:
|
Mask values selected in ``[0, 1]``:
|
||||||
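The padding advice and the attention mask described in the hunk above can be sketched in plain Python. This is an illustrative helper, not part of the library: ``pad_right`` is a hypothetical name, the token ids are made up, and ``pad_id=0`` is an assumption. The mask convention used is 1 for real tokens and 0 for padding.

```python
def pad_right(sequences, pad_id=0):
    """Right-pad token id lists to a common length and build the mask.

    Returns (input_ids, attention_mask), where the mask holds 1 for real
    tokens and 0 for padding positions. pad_id is an assumed padding id.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        # Padding goes on the right so real tokens keep positions 0..len-1,
        # which matters for models with absolute position embeddings.
        input_ids.append(list(seq) + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


# Made-up token ids for two sequences of different lengths.
ids, mask = pad_right([[101, 7592, 102], [101, 102]])
print(ids)
print(mask)
```

Both lists can then be converted to tensors of shape ``(batch_size, sequence_length)`` before being passed to a model.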
@@ -372,9 +372,9 @@ DISTILBERT_START_DOCSTRING = r"""
|
|||||||
https://medium.com/huggingface/distilbert-8cf3380435b5
|
https://medium.com/huggingface/distilbert-8cf3380435b5
|
||||||
|
|
||||||
Parameters:
|
Parameters:
|
||||||
config (:class:`~pytorch_transformers.DistilBertConfig`): Model configuration class with all the parameters of the model.
|
config (:class:`~transformers.DistilBertConfig`): Model configuration class with all the parameters of the model.
|
||||||
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
||||||
Check out the :meth:`~pytorch_transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
DISTILBERT_INPUTS_DOCSTRING = r"""
|
DISTILBERT_INPUTS_DOCSTRING = r"""
|
||||||
@@ -280,9 +280,9 @@ GPT2_START_DOCSTRING = r""" OpenAI GPT-2 model was proposed in
|
|||||||
https://pytorch.org/docs/stable/nn.html#module
|
https://pytorch.org/docs/stable/nn.html#module
|
||||||
|
|
||||||
Parameters:
|
Parameters:
|
||||||
config (:class:`~pytorch_transformers.GPT2Config`): Model configuration class with all the parameters of the model.
|
config (:class:`~transformers.GPT2Config`): Model configuration class with all the parameters of the model.
|
||||||
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
||||||
Check out the :meth:`~pytorch_transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
GPT2_INPUTS_DOCSTRING = r""" Inputs:
|
GPT2_INPUTS_DOCSTRING = r""" Inputs:
|
||||||
@@ -290,9 +290,9 @@ GPT2_INPUTS_DOCSTRING = r""" Inputs:
|
|||||||
Indices of input sequence tokens in the vocabulary.
|
Indices of input sequence tokens in the vocabulary.
|
||||||
GPT-2 is a model with absolute position embeddings so it's usually advised to pad the inputs on
|
GPT-2 is a model with absolute position embeddings so it's usually advised to pad the inputs on
|
||||||
the right rather than the left.
|
the right rather than the left.
|
||||||
Indices can be obtained using :class:`pytorch_transformers.GPT2Tokenizer`.
|
Indices can be obtained using :class:`transformers.GPT2Tokenizer`.
|
||||||
See :func:`pytorch_transformers.PreTrainedTokenizer.encode` and
|
See :func:`transformers.PreTrainedTokenizer.encode` and
|
||||||
:func:`pytorch_transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
:func:`transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
||||||
**past**:
|
**past**:
|
||||||
list of ``torch.FloatTensor`` (one for each layer):
|
list of ``torch.FloatTensor`` (one for each layer):
|
||||||
that contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model
|
that contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model
|
||||||
@@ -493,7 +493,7 @@ class GPT2LMHeadModel(GPT2PreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import torch
|
import torch
|
||||||
from pytorch_transformers import GPT2Tokenizer, GPT2LMHeadModel
|
from transformers import GPT2Tokenizer, GPT2LMHeadModel
|
||||||
|
|
||||||
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
|
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
|
||||||
model = GPT2LMHeadModel.from_pretrained('gpt2')
|
model = GPT2LMHeadModel.from_pretrained('gpt2')
|
||||||
@@ -589,7 +589,7 @@ class GPT2DoubleHeadsModel(GPT2PreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import torch
|
import torch
|
||||||
from pytorch_transformers import GPT2Tokenizer, GPT2DoubleHeadsModel
|
from transformers import GPT2Tokenizer, GPT2DoubleHeadsModel
|
||||||
|
|
||||||
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
|
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
|
||||||
model = GPT2DoubleHeadsModel.from_pretrained('gpt2')
|
model = GPT2DoubleHeadsModel.from_pretrained('gpt2')
|
||||||
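Since this commit only changes the package name in the imports shown above, code that has to run against either name could select the module at import time. The following is a minimal sketch under that assumption: ``import_first_available`` is a hypothetical helper, and it relies only on the standard-library ``importlib``.

```python
import importlib
import importlib.util


def import_first_available(names):
    """Import and return the first top-level module in `names` that is installed."""
    for name in names:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is not None:
            return importlib.import_module(name)
    raise ImportError("none of these modules is installed: %s" % ", ".join(names))


# Intended use (left commented out so the sketch runs without either package):
# pkg = import_first_available(["transformers", "pytorch_transformers"])
# tokenizer = pkg.GPT2Tokenizer.from_pretrained('gpt2')
```

Listing the new name first means the old package is used only when the renamed one is absent.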
@@ -294,9 +294,9 @@ OPENAI_GPT_START_DOCSTRING = r""" OpenAI GPT model was proposed in
|
|||||||
https://pytorch.org/docs/stable/nn.html#module
|
https://pytorch.org/docs/stable/nn.html#module
|
||||||
|
|
||||||
Parameters:
|
Parameters:
|
||||||
config (:class:`~pytorch_transformers.OpenAIGPTConfig`): Model configuration class with all the parameters of the model.
|
config (:class:`~transformers.OpenAIGPTConfig`): Model configuration class with all the parameters of the model.
|
||||||
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
||||||
Check out the :meth:`~pytorch_transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
OPENAI_GPT_INPUTS_DOCSTRING = r""" Inputs:
|
OPENAI_GPT_INPUTS_DOCSTRING = r""" Inputs:
|
||||||
@@ -304,9 +304,9 @@ OPENAI_GPT_INPUTS_DOCSTRING = r""" Inputs:
|
|||||||
Indices of input sequence tokens in the vocabulary.
|
Indices of input sequence tokens in the vocabulary.
|
||||||
GPT is a model with absolute position embeddings so it's usually advised to pad the inputs on
|
GPT is a model with absolute position embeddings so it's usually advised to pad the inputs on
|
||||||
the right rather than the left.
|
the right rather than the left.
|
||||||
Indices can be obtained using :class:`pytorch_transformers.OpenAIGPTTokenizer`.
|
Indices can be obtained using :class:`transformers.OpenAIGPTTokenizer`.
|
||||||
See :func:`pytorch_transformers.PreTrainedTokenizer.encode` and
|
See :func:`transformers.PreTrainedTokenizer.encode` and
|
||||||
:func:`pytorch_transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
:func:`transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
||||||
**attention_mask**: (`optional`) ``torch.FloatTensor`` of shape ``(batch_size, sequence_length)``:
|
**attention_mask**: (`optional`) ``torch.FloatTensor`` of shape ``(batch_size, sequence_length)``:
|
||||||
Mask to avoid performing attention on padding token indices.
|
Mask to avoid performing attention on padding token indices.
|
||||||
Mask values selected in ``[0, 1]``:
|
Mask values selected in ``[0, 1]``:
|
||||||
@@ -77,9 +77,9 @@ ROBERTA_START_DOCSTRING = r""" The RoBERTa model was proposed in
|
|||||||
https://pytorch.org/docs/stable/nn.html#module
|
https://pytorch.org/docs/stable/nn.html#module
|
||||||
|
|
||||||
Parameters:
|
Parameters:
|
||||||
config (:class:`~pytorch_transformers.RobertaConfig`): Model configuration class with all the parameters of the
|
config (:class:`~transformers.RobertaConfig`): Model configuration class with all the parameters of the
|
||||||
model. Initializing with a config file does not load the weights associated with the model, only the configuration.
|
model. Initializing with a config file does not load the weights associated with the model, only the configuration.
|
||||||
Check out the :meth:`~pytorch_transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
ROBERTA_INPUTS_DOCSTRING = r"""
|
ROBERTA_INPUTS_DOCSTRING = r"""
|
||||||
@@ -102,8 +102,8 @@ ROBERTA_INPUTS_DOCSTRING = r"""
|
|||||||
RoBERTa is a model with absolute position embeddings so it's usually advised to pad the inputs on
|
RoBERTa is a model with absolute position embeddings so it's usually advised to pad the inputs on
|
||||||
the right rather than the left.
|
the right rather than the left.
|
||||||
|
|
||||||
See :func:`pytorch_transformers.PreTrainedTokenizer.encode` and
|
See :func:`transformers.PreTrainedTokenizer.encode` and
|
||||||
:func:`pytorch_transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
:func:`transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
||||||
**attention_mask**: (`optional`) ``torch.FloatTensor`` of shape ``(batch_size, sequence_length)``:
|
**attention_mask**: (`optional`) ``torch.FloatTensor`` of shape ``(batch_size, sequence_length)``:
|
||||||
Mask to avoid performing attention on padding token indices.
|
Mask to avoid performing attention on padding token indices.
|
||||||
Mask values selected in ``[0, 1]``:
|
Mask values selected in ``[0, 1]``:
|
||||||
@@ -361,9 +361,9 @@ class RobertaForMultipleChoice(BertPreTrainedModel):
|
|||||||
|
|
||||||
``token_type_ids: 0 0 0 0 0 0 0``
|
``token_type_ids: 0 0 0 0 0 0 0``
|
||||||
|
|
||||||
Indices can be obtained using :class:`pytorch_transformers.BertTokenizer`.
|
Indices can be obtained using :class:`transformers.BertTokenizer`.
|
||||||
See :func:`pytorch_transformers.PreTrainedTokenizer.encode` and
|
See :func:`transformers.PreTrainedTokenizer.encode` and
|
||||||
:func:`pytorch_transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
:func:`transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
||||||
**token_type_ids**: (`optional`) ``torch.LongTensor`` of shape ``(batch_size, num_choices, sequence_length)``:
|
**token_type_ids**: (`optional`) ``torch.LongTensor`` of shape ``(batch_size, num_choices, sequence_length)``:
|
||||||
Segment token indices to indicate first and second portions of the inputs.
|
Segment token indices to indicate first and second portions of the inputs.
|
||||||
The second dimension of the input (`num_choices`) indicates the number of choices to score.
|
The second dimension of the input (`num_choices`) indicates the number of choices to score.
|
||||||
@@ -34,7 +34,7 @@ logger = logging.getLogger(__name__)
|
|||||||
|
|
||||||
class TFAutoModel(object):
|
class TFAutoModel(object):
|
||||||
r"""
|
r"""
|
||||||
:class:`~pytorch_transformers.TFAutoModel` is a generic model class
|
:class:`~transformers.TFAutoModel` is a generic model class
|
||||||
that will be instantiated as one of the base model classes of the library
|
that will be instantiated as one of the base model classes of the library
|
||||||
when created with the `TFAutoModel.from_pretrained(pretrained_model_name_or_path)`
|
when created with the `TFAutoModel.from_pretrained(pretrained_model_name_or_path)`
|
||||||
class method.
|
class method.
|
||||||
@@ -79,7 +79,7 @@ class TFAutoModel(object):
|
|||||||
pretrained_model_name_or_path: either:
|
pretrained_model_name_or_path: either:
|
||||||
|
|
||||||
- a string with the `shortcut name` of a pre-trained model to load from cache or download, e.g.: ``bert-base-uncased``.
|
- a string with the `shortcut name` of a pre-trained model to load from cache or download, e.g.: ``bert-base-uncased``.
|
||||||
- a path to a `directory` containing model weights saved using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained`, e.g.: ``./my_model_directory/``.
|
- a path to a `directory` containing model weights saved using :func:`~transformers.PreTrainedModel.save_pretrained`, e.g.: ``./my_model_directory/``.
|
||||||
- a path or url to a `PyTorch, TF 1.X or TF 2.0 checkpoint file` (e.g. `./tf_model/model.ckpt.index`). In the case of a PyTorch checkpoint, ``from_pt`` should be set to True and a configuration object should be provided as ``config`` argument.
|
- a path or url to a `PyTorch, TF 1.X or TF 2.0 checkpoint file` (e.g. `./tf_model/model.ckpt.index`). In the case of a PyTorch checkpoint, ``from_pt`` should be set to True and a configuration object should be provided as ``config`` argument.
|
||||||
|
|
||||||
from_pt: (`Optional`) Boolean
|
from_pt: (`Optional`) Boolean
|
||||||
@@ -88,17 +88,17 @@ class TFAutoModel(object):
|
|||||||
model_args: (`optional`) Sequence of positional arguments:
|
model_args: (`optional`) Sequence of positional arguments:
|
||||||
All remaining positional arguments will be passed to the underlying model's ``__init__`` method
|
All remaining positional arguments will be passed to the underlying model's ``__init__`` method
|
||||||
|
|
||||||
config: (`optional`) instance of a class derived from :class:`~pytorch_transformers.PretrainedConfig`:
|
config: (`optional`) instance of a class derived from :class:`~transformers.PretrainedConfig`:
|
||||||
Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
|
Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
|
||||||
|
|
||||||
- the model is a model provided by the library (loaded with the ``shortcut-name`` string of a pretrained model), or
|
- the model is a model provided by the library (loaded with the ``shortcut-name`` string of a pretrained model), or
|
||||||
- the model was saved using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained` and is reloaded by supplying the save directory.
|
- the model was saved using :func:`~transformers.PreTrainedModel.save_pretrained` and is reloaded by supplying the save directory.
|
||||||
- the model is loaded by supplying a local directory as ``pretrained_model_name_or_path`` and a configuration JSON file named `config.json` is found in the directory.
|
- the model is loaded by supplying a local directory as ``pretrained_model_name_or_path`` and a configuration JSON file named `config.json` is found in the directory.
|
||||||
|
|
||||||
state_dict: (`optional`) dict:
|
state_dict: (`optional`) dict:
|
||||||
an optional state dictionary for the model to use instead of a state dictionary loaded from a saved weights file.
|
an optional state dictionary for the model to use instead of a state dictionary loaded from a saved weights file.
|
||||||
This option can be used if you want to create a model from a pretrained configuration but load your own weights.
|
This option can be used if you want to create a model from a pretrained configuration but load your own weights.
|
||||||
In this case though, you should check if using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained` and :func:`~pytorch_transformers.PreTrainedModel.from_pretrained` is not a simpler option.
|
In this case though, you should check if using :func:`~transformers.PreTrainedModel.save_pretrained` and :func:`~transformers.PreTrainedModel.from_pretrained` is not a simpler option.
|
||||||
|
|
||||||
cache_dir: (`optional`) string:
|
cache_dir: (`optional`) string:
|
||||||
Path to a directory in which a downloaded pre-trained model
|
Path to a directory in which a downloaded pre-trained model
|
||||||
@@ -118,7 +118,7 @@ class TFAutoModel(object):
|
|||||||
Can be used to update the configuration object (after it has been loaded) and instantiate the model (e.g. ``output_attentions=True``). Behaves differently depending on whether a `config` is provided or automatically loaded:
|
Can be used to update the configuration object (after it has been loaded) and instantiate the model (e.g. ``output_attentions=True``). Behaves differently depending on whether a `config` is provided or automatically loaded:
|
||||||
|
|
||||||
- If a configuration is provided with ``config``, ``**kwargs`` will be directly passed to the underlying model's ``__init__`` method (we assume all relevant updates to the configuration have already been done)
|
- If a configuration is provided with ``config``, ``**kwargs`` will be directly passed to the underlying model's ``__init__`` method (we assume all relevant updates to the configuration have already been done)
|
||||||
- If a configuration is not provided, ``kwargs`` will be first passed to the configuration class initialization function (:func:`~pytorch_transformers.PretrainedConfig.from_pretrained`). Each key of ``kwargs`` that corresponds to a configuration attribute will be used to override said attribute with the supplied ``kwargs`` value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's ``__init__`` function.
|
- If a configuration is not provided, ``kwargs`` will be first passed to the configuration class initialization function (:func:`~transformers.PretrainedConfig.from_pretrained`). Each key of ``kwargs`` that corresponds to a configuration attribute will be used to override said attribute with the supplied ``kwargs`` value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's ``__init__`` function.
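Note: the routing of ``kwargs`` described in the two bullets above can be illustrated with a toy sketch. This is not the library's implementation; ``ToyConfig`` and ``split_kwargs`` are hypothetical names introduced only for illustration, and the sketch covers just the case where no ``config`` is passed: keys matching a configuration attribute override it, and the remaining keys are left over for the model's ``__init__``.

```python
class ToyConfig:
    """Hypothetical stand-in for a configuration class (not the library's PretrainedConfig)."""

    def __init__(self, hidden_size=768, output_attentions=False):
        self.hidden_size = hidden_size
        self.output_attentions = output_attentions


def split_kwargs(config, kwargs):
    """Apply kwargs that name config attributes; return the leftovers for the model."""
    unused = {}
    for key, value in kwargs.items():
        if hasattr(config, key):
            setattr(config, key, value)  # override the matching configuration attribute
        else:
            unused[key] = value  # not a config attribute: forwarded to the model's __init__
    return config, unused


config, model_kwargs = split_kwargs(ToyConfig(), {"output_attentions": True, "foo": False})
print(config.output_attentions, model_kwargs)
```

Here ``output_attentions`` updates the configuration, while the made-up key ``foo`` ends up in ``model_kwargs``.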
|
||||||
|
|
||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
@@ -155,7 +155,7 @@ class TFAutoModel(object):
|
|||||||
|
|
||||||
class TFAutoModelWithLMHead(object):
|
class TFAutoModelWithLMHead(object):
|
||||||
r"""
|
r"""
|
||||||
:class:`~pytorch_transformers.TFAutoModelWithLMHead` is a generic model class
|
:class:`~transformers.TFAutoModelWithLMHead` is a generic model class
|
||||||
that will be instantiated as one of the language modeling model classes of the library
|
that will be instantiated as one of the language modeling model classes of the library
|
||||||
when created with the `TFAutoModelWithLMHead.from_pretrained(pretrained_model_name_or_path)`
|
when created with the `TFAutoModelWithLMHead.from_pretrained(pretrained_model_name_or_path)`
|
||||||
class method.
|
class method.
|
||||||
@@ -203,7 +203,7 @@ class TFAutoModelWithLMHead(object):
|
|||||||
pretrained_model_name_or_path: either:
|
pretrained_model_name_or_path: either:
|
||||||
|
|
||||||
- a string with the `shortcut name` of a pre-trained model to load from cache or download, e.g.: ``bert-base-uncased``.
|
- a string with the `shortcut name` of a pre-trained model to load from cache or download, e.g.: ``bert-base-uncased``.
|
||||||
- a path to a `directory` containing model weights saved using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained`, e.g.: ``./my_model_directory/``.
|
- a path to a `directory` containing model weights saved using :func:`~transformers.PreTrainedModel.save_pretrained`, e.g.: ``./my_model_directory/``.
|
||||||
- a path or url to a `PyTorch, TF 1.X or TF 2.0 checkpoint file` (e.g. `./tf_model/model.ckpt.index`). In the case of a PyTorch checkpoint, ``from_pt`` should be set to True and a configuration object should be provided as ``config`` argument.
|
- a path or url to a `PyTorch, TF 1.X or TF 2.0 checkpoint file` (e.g. `./tf_model/model.ckpt.index`). In the case of a PyTorch checkpoint, ``from_pt`` should be set to True and a configuration object should be provided as ``config`` argument.
|
||||||
|
|
||||||
from_pt: (`Optional`) Boolean
|
from_pt: (`Optional`) Boolean
|
||||||
@@ -212,17 +212,17 @@ class TFAutoModelWithLMHead(object):
|
|||||||
model_args: (`optional`) Sequence of positional arguments:
|
model_args: (`optional`) Sequence of positional arguments:
|
||||||
All remaining positional arguments will be passed to the underlying model's ``__init__`` method
|
All remaining positional arguments will be passed to the underlying model's ``__init__`` method
|
||||||
|
|
||||||
config: (`optional`) instance of a class derived from :class:`~pytorch_transformers.PretrainedConfig`:
|
config: (`optional`) instance of a class derived from :class:`~transformers.PretrainedConfig`:
|
||||||
Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
|
Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
|
||||||
|
|
||||||
- the model is a model provided by the library (loaded with the ``shortcut-name`` string of a pretrained model), or
|
- the model is a model provided by the library (loaded with the ``shortcut-name`` string of a pretrained model), or
|
||||||
- the model was saved using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained` and is reloaded by supplying the save directory.
|
- the model was saved using :func:`~transformers.PreTrainedModel.save_pretrained` and is reloaded by supplying the save directory.
|
||||||
- the model is loaded by supplying a local directory as ``pretrained_model_name_or_path`` and a configuration JSON file named `config.json` is found in the directory.
|
- the model is loaded by supplying a local directory as ``pretrained_model_name_or_path`` and a configuration JSON file named `config.json` is found in the directory.
|
||||||
|
|
||||||
state_dict: (`optional`) dict:
|
state_dict: (`optional`) dict:
|
||||||
An optional state dictionary for the model to use instead of a state dictionary loaded from the saved weights file.
|
An optional state dictionary for the model to use instead of a state dictionary loaded from the saved weights file.
|
||||||
This option can be used if you want to create a model from a pretrained configuration but load your own weights.
|
This option can be used if you want to create a model from a pretrained configuration but load your own weights.
|
||||||
In this case though, you should check if using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained` and :func:`~pytorch_transformers.PreTrainedModel.from_pretrained` is not a simpler option.
|
In this case though, you should check if using :func:`~transformers.PreTrainedModel.save_pretrained` and :func:`~transformers.PreTrainedModel.from_pretrained` is not a simpler option.
|
||||||
|
|
||||||
cache_dir: (`optional`) string:
|
cache_dir: (`optional`) string:
|
||||||
Path to a directory in which a downloaded pre-trained model
|
Path to a directory in which a downloaded pre-trained model
|
||||||
@@ -242,7 +242,7 @@ class TFAutoModelWithLMHead(object):
|
|||||||
Can be used to update the configuration object (after it is loaded) and instantiate the model (e.g. ``output_attentions=True``). Behaves differently depending on whether a `config` is provided or automatically loaded:
|
Can be used to update the configuration object (after it is loaded) and instantiate the model (e.g. ``output_attentions=True``). Behaves differently depending on whether a `config` is provided or automatically loaded:
|
||||||
|
|
||||||
- If a configuration is provided with ``config``, ``**kwargs`` will be directly passed to the underlying model's ``__init__`` method (we assume all relevant updates to the configuration have already been done)
|
- If a configuration is provided with ``config``, ``**kwargs`` will be directly passed to the underlying model's ``__init__`` method (we assume all relevant updates to the configuration have already been done)
|
||||||
- If a configuration is not provided, ``kwargs`` will be first passed to the configuration class initialization function (:func:`~pytorch_transformers.PretrainedConfig.from_pretrained`). Each key of ``kwargs`` that corresponds to a configuration attribute will be used to override said attribute with the supplied ``kwargs`` value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's ``__init__`` function.
|
- If a configuration is not provided, ``kwargs`` will be first passed to the configuration class initialization function (:func:`~transformers.PretrainedConfig.from_pretrained`). Each key of ``kwargs`` that corresponds to a configuration attribute will be used to override said attribute with the supplied ``kwargs`` value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's ``__init__`` function.
|
||||||
|
|
||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
@@ -279,7 +279,7 @@ class TFAutoModelWithLMHead(object):
|
|||||||
|
|
||||||
class TFAutoModelForSequenceClassification(object):
|
class TFAutoModelForSequenceClassification(object):
|
||||||
r"""
|
r"""
|
||||||
:class:`~pytorch_transformers.TFAutoModelForSequenceClassification` is a generic model class
|
:class:`~transformers.TFAutoModelForSequenceClassification` is a generic model class
|
||||||
that will be instantiated as one of the sequence classification model classes of the library
|
that will be instantiated as one of the sequence classification model classes of the library
|
||||||
when created with the `TFAutoModelForSequenceClassification.from_pretrained(pretrained_model_name_or_path)`
|
when created with the `TFAutoModelForSequenceClassification.from_pretrained(pretrained_model_name_or_path)`
|
||||||
class method.
|
class method.
|
||||||
@@ -324,7 +324,7 @@ class TFAutoModelForSequenceClassification(object):
|
|||||||
pretrained_model_name_or_path: either:
|
pretrained_model_name_or_path: either:
|
||||||
|
|
||||||
- a string with the `shortcut name` of a pre-trained model to load from cache or download, e.g.: ``bert-base-uncased``.
|
- a string with the `shortcut name` of a pre-trained model to load from cache or download, e.g.: ``bert-base-uncased``.
|
||||||
- a path to a `directory` containing model weights saved using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained`, e.g.: ``./my_model_directory/``.
|
- a path to a `directory` containing model weights saved using :func:`~transformers.PreTrainedModel.save_pretrained`, e.g.: ``./my_model_directory/``.
|
||||||
- a path or url to a `PyTorch, TF 1.X or TF 2.0 checkpoint file` (e.g. `./tf_model/model.ckpt.index`). In the case of a PyTorch checkpoint, ``from_pt`` should be set to True and a configuration object should be provided as ``config`` argument.
|
- a path or url to a `PyTorch, TF 1.X or TF 2.0 checkpoint file` (e.g. `./tf_model/model.ckpt.index`). In the case of a PyTorch checkpoint, ``from_pt`` should be set to True and a configuration object should be provided as ``config`` argument.
|
||||||
|
|
||||||
from_pt: (`Optional`) Boolean
|
from_pt: (`Optional`) Boolean
|
||||||
@@ -333,17 +333,17 @@ class TFAutoModelForSequenceClassification(object):
|
|||||||
model_args: (`optional`) Sequence of positional arguments:
|
model_args: (`optional`) Sequence of positional arguments:
|
||||||
All remaining positional arguments will be passed to the underlying model's ``__init__`` method
|
All remaining positional arguments will be passed to the underlying model's ``__init__`` method
|
||||||
|
|
||||||
config: (`optional`) instance of a class derived from :class:`~pytorch_transformers.PretrainedConfig`:
|
config: (`optional`) instance of a class derived from :class:`~transformers.PretrainedConfig`:
|
||||||
Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
|
Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
|
||||||
|
|
||||||
- the model is a model provided by the library (loaded with the ``shortcut-name`` string of a pretrained model), or
|
- the model is a model provided by the library (loaded with the ``shortcut-name`` string of a pretrained model), or
|
||||||
- the model was saved using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained` and is reloaded by supplying the save directory.
|
- the model was saved using :func:`~transformers.PreTrainedModel.save_pretrained` and is reloaded by supplying the save directory.
|
||||||
- the model is loaded by supplying a local directory as ``pretrained_model_name_or_path`` and a configuration JSON file named `config.json` is found in the directory.
|
- the model is loaded by supplying a local directory as ``pretrained_model_name_or_path`` and a configuration JSON file named `config.json` is found in the directory.
|
||||||
|
|
||||||
state_dict: (`optional`) dict:
|
state_dict: (`optional`) dict:
|
||||||
An optional state dictionary for the model to use instead of a state dictionary loaded from the saved weights file.
|
An optional state dictionary for the model to use instead of a state dictionary loaded from the saved weights file.
|
||||||
This option can be used if you want to create a model from a pretrained configuration but load your own weights.
|
This option can be used if you want to create a model from a pretrained configuration but load your own weights.
|
||||||
In this case though, you should check if using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained` and :func:`~pytorch_transformers.PreTrainedModel.from_pretrained` is not a simpler option.
|
In this case though, you should check if using :func:`~transformers.PreTrainedModel.save_pretrained` and :func:`~transformers.PreTrainedModel.from_pretrained` is not a simpler option.
|
||||||
|
|
||||||
cache_dir: (`optional`) string:
|
cache_dir: (`optional`) string:
|
||||||
Path to a directory in which a downloaded pre-trained model
|
Path to a directory in which a downloaded pre-trained model
|
||||||
@@ -363,7 +363,7 @@ class TFAutoModelForSequenceClassification(object):
|
|||||||
Can be used to update the configuration object (after it is loaded) and instantiate the model (e.g. ``output_attentions=True``). Behaves differently depending on whether a `config` is provided or automatically loaded:
|
Can be used to update the configuration object (after it is loaded) and instantiate the model (e.g. ``output_attentions=True``). Behaves differently depending on whether a `config` is provided or automatically loaded:
|
||||||
|
|
||||||
- If a configuration is provided with ``config``, ``**kwargs`` will be directly passed to the underlying model's ``__init__`` method (we assume all relevant updates to the configuration have already been done)
|
- If a configuration is provided with ``config``, ``**kwargs`` will be directly passed to the underlying model's ``__init__`` method (we assume all relevant updates to the configuration have already been done)
|
||||||
- If a configuration is not provided, ``kwargs`` will be first passed to the configuration class initialization function (:func:`~pytorch_transformers.PretrainedConfig.from_pretrained`). Each key of ``kwargs`` that corresponds to a configuration attribute will be used to override said attribute with the supplied ``kwargs`` value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's ``__init__`` function.
|
- If a configuration is not provided, ``kwargs`` will be first passed to the configuration class initialization function (:func:`~transformers.PretrainedConfig.from_pretrained`). Each key of ``kwargs`` that corresponds to a configuration attribute will be used to override said attribute with the supplied ``kwargs`` value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's ``__init__`` function.
|
||||||
|
|
||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
@@ -393,7 +393,7 @@ class TFAutoModelForSequenceClassification(object):
|
|||||||
|
|
||||||
class TFAutoModelForQuestionAnswering(object):
|
class TFAutoModelForQuestionAnswering(object):
|
||||||
r"""
|
r"""
|
||||||
:class:`~pytorch_transformers.TFAutoModelForQuestionAnswering` is a generic model class
|
:class:`~transformers.TFAutoModelForQuestionAnswering` is a generic model class
|
||||||
that will be instantiated as one of the question answering model classes of the library
|
that will be instantiated as one of the question answering model classes of the library
|
||||||
when created with the `TFAutoModelForQuestionAnswering.from_pretrained(pretrained_model_name_or_path)`
|
when created with the `TFAutoModelForQuestionAnswering.from_pretrained(pretrained_model_name_or_path)`
|
||||||
class method.
|
class method.
|
||||||
@@ -436,7 +436,7 @@ class TFAutoModelForQuestionAnswering(object):
|
|||||||
pretrained_model_name_or_path: either:
|
pretrained_model_name_or_path: either:
|
||||||
|
|
||||||
- a string with the `shortcut name` of a pre-trained model to load from cache or download, e.g.: ``bert-base-uncased``.
|
- a string with the `shortcut name` of a pre-trained model to load from cache or download, e.g.: ``bert-base-uncased``.
|
||||||
- a path to a `directory` containing model weights saved using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained`, e.g.: ``./my_model_directory/``.
|
- a path to a `directory` containing model weights saved using :func:`~transformers.PreTrainedModel.save_pretrained`, e.g.: ``./my_model_directory/``.
|
||||||
- a path or url to a `PyTorch, TF 1.X or TF 2.0 checkpoint file` (e.g. `./tf_model/model.ckpt.index`). In the case of a PyTorch checkpoint, ``from_pt`` should be set to True and a configuration object should be provided as ``config`` argument.
|
- a path or url to a `PyTorch, TF 1.X or TF 2.0 checkpoint file` (e.g. `./tf_model/model.ckpt.index`). In the case of a PyTorch checkpoint, ``from_pt`` should be set to True and a configuration object should be provided as ``config`` argument.
|
||||||
|
|
||||||
from_pt: (`Optional`) Boolean
|
from_pt: (`Optional`) Boolean
|
||||||
@@ -445,17 +445,17 @@ class TFAutoModelForQuestionAnswering(object):
|
|||||||
model_args: (`optional`) Sequence of positional arguments:
|
model_args: (`optional`) Sequence of positional arguments:
|
||||||
All remaining positional arguments will be passed to the underlying model's ``__init__`` method
|
All remaining positional arguments will be passed to the underlying model's ``__init__`` method
|
||||||
|
|
||||||
config: (`optional`) instance of a class derived from :class:`~pytorch_transformers.PretrainedConfig`:
|
config: (`optional`) instance of a class derived from :class:`~transformers.PretrainedConfig`:
|
||||||
Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
|
Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
|
||||||
|
|
||||||
- the model is a model provided by the library (loaded with the ``shortcut-name`` string of a pretrained model), or
|
- the model is a model provided by the library (loaded with the ``shortcut-name`` string of a pretrained model), or
|
||||||
- the model was saved using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained` and is reloaded by supplying the save directory.
|
- the model was saved using :func:`~transformers.PreTrainedModel.save_pretrained` and is reloaded by supplying the save directory.
|
||||||
- the model is loaded by supplying a local directory as ``pretrained_model_name_or_path`` and a configuration JSON file named `config.json` is found in the directory.
|
- the model is loaded by supplying a local directory as ``pretrained_model_name_or_path`` and a configuration JSON file named `config.json` is found in the directory.
|
||||||
|
|
||||||
state_dict: (`optional`) dict:
|
state_dict: (`optional`) dict:
|
||||||
An optional state dictionary for the model to use instead of a state dictionary loaded from the saved weights file.
|
An optional state dictionary for the model to use instead of a state dictionary loaded from the saved weights file.
|
||||||
This option can be used if you want to create a model from a pretrained configuration but load your own weights.
|
This option can be used if you want to create a model from a pretrained configuration but load your own weights.
|
||||||
In this case though, you should check if using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained` and :func:`~pytorch_transformers.PreTrainedModel.from_pretrained` is not a simpler option.
|
In this case though, you should check if using :func:`~transformers.PreTrainedModel.save_pretrained` and :func:`~transformers.PreTrainedModel.from_pretrained` is not a simpler option.
|
||||||
|
|
||||||
cache_dir: (`optional`) string:
|
cache_dir: (`optional`) string:
|
||||||
Path to a directory in which a downloaded pre-trained model
|
Path to a directory in which a downloaded pre-trained model
|
||||||
@@ -475,7 +475,7 @@ class TFAutoModelForQuestionAnswering(object):
|
|||||||
Can be used to update the configuration object (after it is loaded) and instantiate the model (e.g. ``output_attentions=True``). Behaves differently depending on whether a `config` is provided or automatically loaded:
|
Can be used to update the configuration object (after it is loaded) and instantiate the model (e.g. ``output_attentions=True``). Behaves differently depending on whether a `config` is provided or automatically loaded:
|
||||||
|
|
||||||
- If a configuration is provided with ``config``, ``**kwargs`` will be directly passed to the underlying model's ``__init__`` method (we assume all relevant updates to the configuration have already been done)
|
- If a configuration is provided with ``config``, ``**kwargs`` will be directly passed to the underlying model's ``__init__`` method (we assume all relevant updates to the configuration have already been done)
|
||||||
- If a configuration is not provided, ``kwargs`` will be first passed to the configuration class initialization function (:func:`~pytorch_transformers.PretrainedConfig.from_pretrained`). Each key of ``kwargs`` that corresponds to a configuration attribute will be used to override said attribute with the supplied ``kwargs`` value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's ``__init__`` function.
|
- If a configuration is not provided, ``kwargs`` will be first passed to the configuration class initialization function (:func:`~transformers.PretrainedConfig.from_pretrained`). Each key of ``kwargs`` that corresponds to a configuration attribute will be used to override said attribute with the supplied ``kwargs`` value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's ``__init__`` function.
|
||||||
|
|
||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
@@ -581,9 +581,9 @@ BERT_START_DOCSTRING = r""" The BERT model was proposed in
|
|||||||
`model({'input_ids': input_ids, 'token_type_ids': token_type_ids})`
|
`model({'input_ids': input_ids, 'token_type_ids': token_type_ids})`
|
||||||
|
|
||||||
Parameters:
|
Parameters:
|
||||||
config (:class:`~pytorch_transformers.BertConfig`): Model configuration class with all the parameters of the model.
|
config (:class:`~transformers.BertConfig`): Model configuration class with all the parameters of the model.
|
||||||
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
||||||
Check out the :meth:`~pytorch_transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
BERT_INPUTS_DOCSTRING = r"""
|
BERT_INPUTS_DOCSTRING = r"""
|
||||||
@@ -607,9 +607,9 @@ BERT_INPUTS_DOCSTRING = r"""
|
|||||||
Bert is a model with absolute position embeddings so it's usually advised to pad the inputs on
|
Bert is a model with absolute position embeddings so it's usually advised to pad the inputs on
|
||||||
the right rather than the left.
|
the right rather than the left.
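Note: the right-padding advice above can be sketched in plain Python, together with an attention mask that marks the padded positions. The helper name ``pad_right``, the pad id ``0`` and the token ids are illustrative assumptions, and the mask convention (1 for real tokens, 0 for padding) is assumed here rather than taken from the lines shown in this diff. In practice the tokenizer would supply the ids.

```python
def pad_right(batch, pad_id=0):
    """Right-pad lists of token ids to a common length and build the matching mask."""
    max_len = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = max_len - len(ids)
        # Padding goes after the real tokens, so their positions stay unchanged.
        input_ids.append(ids + [pad_id] * n_pad)
        # Assumed convention: 1 marks a real token, 0 marks a padding position.
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return input_ids, attention_mask


# Illustrative token ids only; real ids come from the tokenizer.
input_ids, attention_mask = pad_right([[101, 7592, 102], [101, 7592, 2088, 999, 102]])
print(input_ids)
print(attention_mask)
```

Both lists can then be converted to ``tf.Tensor`` objects of shape ``(batch_size, sequence_length)`` before being passed to the model.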
|
||||||
|
|
||||||
Indices can be obtained using :class:`pytorch_transformers.BertTokenizer`.
|
Indices can be obtained using :class:`transformers.BertTokenizer`.
|
||||||
See :func:`pytorch_transformers.PreTrainedTokenizer.encode` and
|
See :func:`transformers.PreTrainedTokenizer.encode` and
|
||||||
:func:`pytorch_transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
:func:`transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
||||||
**attention_mask**: (`optional`) ``Numpy array`` or ``tf.Tensor`` of shape ``(batch_size, sequence_length)``:
|
**attention_mask**: (`optional`) ``Numpy array`` or ``tf.Tensor`` of shape ``(batch_size, sequence_length)``:
|
||||||
Mask to avoid performing attention on padding token indices.
|
Mask to avoid performing attention on padding token indices.
|
||||||
Mask values selected in ``[0, 1]``:
|
Mask values selected in ``[0, 1]``:
|
||||||
@@ -653,7 +653,7 @@ class TFBertModel(TFBertPreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import BertTokenizer, TFBertModel
|
from transformers import BertTokenizer, TFBertModel
|
||||||
|
|
||||||
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
|
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
|
||||||
model = TFBertModel.from_pretrained('bert-base-uncased')
|
model = TFBertModel.from_pretrained('bert-base-uncased')
|
||||||
@@ -692,7 +692,7 @@ class TFBertForPreTraining(TFBertPreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import BertTokenizer, TFBertForPreTraining
|
from transformers import BertTokenizer, TFBertForPreTraining
|
||||||
|
|
||||||
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
|
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
|
||||||
model = TFBertForPreTraining.from_pretrained('bert-base-uncased')
|
model = TFBertForPreTraining.from_pretrained('bert-base-uncased')
|
||||||
@@ -738,7 +738,7 @@ class TFBertForMaskedLM(TFBertPreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import BertTokenizer, TFBertForMaskedLM
|
from transformers import BertTokenizer, TFBertForMaskedLM
|
||||||
|
|
||||||
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
|
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
|
||||||
model = TFBertForMaskedLM.from_pretrained('bert-base-uncased')
|
model = TFBertForMaskedLM.from_pretrained('bert-base-uncased')
|
||||||
@@ -782,7 +782,7 @@ class TFBertForNextSentencePrediction(TFBertPreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import BertTokenizer, TFBertForNextSentencePrediction
|
from transformers import BertTokenizer, TFBertForNextSentencePrediction
|
||||||
|
|
||||||
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
|
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
|
||||||
model = TFBertForNextSentencePrediction.from_pretrained('bert-base-uncased')
|
model = TFBertForNextSentencePrediction.from_pretrained('bert-base-uncased')
|
||||||
@@ -827,7 +827,7 @@ class TFBertForSequenceClassification(TFBertPreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import BertTokenizer, TFBertForSequenceClassification
|
from transformers import BertTokenizer, TFBertForSequenceClassification
|
||||||
|
|
||||||
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
|
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
|
||||||
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased')
|
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased')
|
||||||
@@ -879,7 +879,7 @@ class TFBertForMultipleChoice(TFBertPreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import BertTokenizer, TFBertForMultipleChoice
|
from transformers import BertTokenizer, TFBertForMultipleChoice
|
||||||
|
|
||||||
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
|
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
|
||||||
model = TFBertForMultipleChoice.from_pretrained('bert-base-uncased')
|
model = TFBertForMultipleChoice.from_pretrained('bert-base-uncased')
|
||||||
@@ -958,7 +958,7 @@ class TFBertForTokenClassification(TFBertPreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import BertTokenizer, TFBertForTokenClassification
|
from transformers import BertTokenizer, TFBertForTokenClassification
|
||||||
|
|
||||||
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
|
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
|
||||||
model = TFBertForTokenClassification.from_pretrained('bert-base-uncased')
|
model = TFBertForTokenClassification.from_pretrained('bert-base-uncased')
|
||||||
@@ -1011,7 +1011,7 @@ class TFBertForQuestionAnswering(TFBertPreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import BertTokenizer, TFBertForQuestionAnswering
|
from transformers import BertTokenizer, TFBertForQuestionAnswering
|
||||||
|
|
||||||
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
|
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
|
||||||
model = TFBertForQuestionAnswering.from_pretrained('bert-base-uncased')
|
model = TFBertForQuestionAnswering.from_pretrained('bert-base-uncased')
|
||||||
@@ -500,9 +500,9 @@ DISTILBERT_START_DOCSTRING = r"""
|
|||||||
`model({'input_ids': input_ids, 'token_type_ids': token_type_ids})`
|
`model({'input_ids': input_ids, 'token_type_ids': token_type_ids})`
|
||||||
|
|
||||||
Parameters:
|
Parameters:
|
||||||
config (:class:`~pytorch_transformers.DistilBertConfig`): Model configuration class with all the parameters of the model.
|
config (:class:`~transformers.DistilBertConfig`): Model configuration class with all the parameters of the model.
|
||||||
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
||||||
Check out the :meth:`~pytorch_transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
DISTILBERT_INPUTS_DOCSTRING = r"""
|
DISTILBERT_INPUTS_DOCSTRING = r"""
|
||||||
@@ -540,7 +540,7 @@ class TFDistilBertModel(TFDistilBertPreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import DistilBertTokenizer, TFDistilBertModel
|
from transformers import DistilBertTokenizer, TFDistilBertModel
|
||||||
|
|
||||||
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
|
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
|
||||||
model = TFDistilBertModel.from_pretrained('distilbert-base-uncased')
|
model = TFDistilBertModel.from_pretrained('distilbert-base-uncased')
|
||||||
@@ -598,7 +598,7 @@ class TFDistilBertForMaskedLM(TFDistilBertPreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import DistilBertTokenizer, TFDistilBertForMaskedLM
|
from transformers import DistilBertTokenizer, TFDistilBertForMaskedLM
|
||||||
|
|
||||||
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
|
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
|
||||||
model = TFDistilBertForMaskedLM.from_pretrained('distilbert-base-uncased')
|
model = TFDistilBertForMaskedLM.from_pretrained('distilbert-base-uncased')
|
||||||
@@ -653,7 +653,7 @@ class TFDistilBertForSequenceClassification(TFDistilBertPreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import DistilBertTokenizer, TFDistilBertForSequenceClassification
|
from transformers import DistilBertTokenizer, TFDistilBertForSequenceClassification
|
||||||
|
|
||||||
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
|
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
|
||||||
model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
|
model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
|
||||||
@@ -710,7 +710,7 @@ class TFDistilBertForQuestionAnswering(TFDistilBertPreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import DistilBertTokenizer, TFDistilBertForQuestionAnswering
|
from transformers import DistilBertTokenizer, TFDistilBertForQuestionAnswering
|
||||||
|
|
||||||
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
|
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
|
||||||
model = TFDistilBertForQuestionAnswering.from_pretrained('distilbert-base-uncased')
|
model = TFDistilBertForQuestionAnswering.from_pretrained('distilbert-base-uncased')
|
||||||
@@ -385,9 +385,9 @@ GPT2_START_DOCSTRING = r""" OpenAI GPT-2 model was proposed in
|
|||||||
`model({'input_ids': input_ids, 'token_type_ids': token_type_ids})`
|
`model({'input_ids': input_ids, 'token_type_ids': token_type_ids})`
|
||||||
|
|
||||||
Parameters:
|
Parameters:
|
||||||
config (:class:`~pytorch_transformers.GPT2Config`): Model configuration class with all the parameters of the model.
|
config (:class:`~transformers.GPT2Config`): Model configuration class with all the parameters of the model.
|
||||||
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
||||||
Check out the :meth:`~pytorch_transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
GPT2_INPUTS_DOCSTRING = r""" Inputs:
|
GPT2_INPUTS_DOCSTRING = r""" Inputs:
|
||||||
@@ -395,9 +395,9 @@ GPT2_INPUTS_DOCSTRING = r""" Inputs:
|
|||||||
Indices of input sequence tokens in the vocabulary.
|
Indices of input sequence tokens in the vocabulary.
|
||||||
GPT-2 is a model with absolute position embeddings so it's usually advised to pad the inputs on
|
GPT-2 is a model with absolute position embeddings so it's usually advised to pad the inputs on
|
||||||
the right rather than the left.
|
the right rather than the left.
|
||||||
Indices can be obtained using :class:`pytorch_transformers.GPT2Tokenizer`.
|
Indices can be obtained using :class:`transformers.GPT2Tokenizer`.
|
||||||
See :func:`pytorch_transformers.PreTrainedTokenizer.encode` and
|
See :func:`transformers.PreTrainedTokenizer.encode` and
|
||||||
:func:`pytorch_transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
:func:`transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
||||||
**past**:
|
**past**:
|
||||||
list of ``Numpy array`` or ``tf.Tensor`` (one for each layer):
|
list of ``Numpy array`` or ``tf.Tensor`` (one for each layer):
|
||||||
that contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model
|
that contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model
|
||||||
@@ -441,7 +441,7 @@ class TFGPT2Model(TFGPT2PreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import GPT2Tokenizer, TFGPT2Model
|
from transformers import GPT2Tokenizer, TFGPT2Model
|
||||||
|
|
||||||
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
|
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
|
||||||
model = TFGPT2Model.from_pretrained('gpt2')
|
model = TFGPT2Model.from_pretrained('gpt2')
|
||||||
@@ -481,7 +481,7 @@ class TFGPT2LMHeadModel(TFGPT2PreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import GPT2Tokenizer, TFGPT2LMHeadModel
|
from transformers import GPT2Tokenizer, TFGPT2LMHeadModel
|
||||||
|
|
||||||
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
|
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
|
||||||
model = TFGPT2LMHeadModel.from_pretrained('gpt2')
|
model = TFGPT2LMHeadModel.from_pretrained('gpt2')
|
||||||
@@ -537,7 +537,7 @@ class TFGPT2DoubleHeadsModel(TFGPT2PreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import GPT2Tokenizer, TFGPT2DoubleHeadsModel
|
from transformers import GPT2Tokenizer, TFGPT2DoubleHeadsModel
|
||||||
|
|
||||||
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
|
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
|
||||||
model = TFGPT2DoubleHeadsModel.from_pretrained('gpt2')
|
model = TFGPT2DoubleHeadsModel.from_pretrained('gpt2')
|
||||||
@@ -371,9 +371,9 @@ OPENAI_GPT_START_DOCSTRING = r""" OpenAI GPT model was proposed in
|
|||||||
`model({'input_ids': input_ids, 'token_type_ids': token_type_ids})`
|
`model({'input_ids': input_ids, 'token_type_ids': token_type_ids})`
|
||||||
|
|
||||||
Parameters:
|
Parameters:
|
||||||
config (:class:`~pytorch_transformers.OpenAIGPTConfig`): Model configuration class with all the parameters of the model.
|
config (:class:`~transformers.OpenAIGPTConfig`): Model configuration class with all the parameters of the model.
|
||||||
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
||||||
Check out the :meth:`~pytorch_transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
OPENAI_GPT_INPUTS_DOCSTRING = r""" Inputs:
|
OPENAI_GPT_INPUTS_DOCSTRING = r""" Inputs:
|
||||||
@@ -381,9 +381,9 @@ OPENAI_GPT_INPUTS_DOCSTRING = r""" Inputs:
|
|||||||
Indices of input sequence tokens in the vocabulary.
|
Indices of input sequence tokens in the vocabulary.
|
||||||
GPT is a model with absolute position embeddings so it's usually advised to pad the inputs on
|
GPT is a model with absolute position embeddings so it's usually advised to pad the inputs on
|
||||||
the right rather than the left.
|
the right rather than the left.
|
||||||
Indices can be obtained using :class:`pytorch_transformers.OpenAIGPTTokenizer`.
|
Indices can be obtained using :class:`transformers.OpenAIGPTTokenizer`.
|
||||||
See :func:`pytorch_transformers.PreTrainedTokenizer.encode` and
|
See :func:`transformers.PreTrainedTokenizer.encode` and
|
||||||
:func:`pytorch_transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
:func:`transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
||||||
**attention_mask**: (`optional`) ``Numpy array`` or ``tf.Tensor`` of shape ``(batch_size, sequence_length)``:
|
**attention_mask**: (`optional`) ``Numpy array`` or ``tf.Tensor`` of shape ``(batch_size, sequence_length)``:
|
||||||
Mask to avoid performing attention on padding token indices.
|
Mask to avoid performing attention on padding token indices.
|
||||||
Mask values selected in ``[0, 1]``:
|
Mask values selected in ``[0, 1]``:
|
||||||
@@ -419,7 +419,7 @@ class TFOpenAIGPTModel(TFOpenAIGPTPreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import OpenAIGPTTokenizer, TFOpenAIGPTModel
|
from transformers import OpenAIGPTTokenizer, TFOpenAIGPTModel
|
||||||
|
|
||||||
tokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')
|
tokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')
|
||||||
model = TFOpenAIGPTModel.from_pretrained('openai-gpt')
|
model = TFOpenAIGPTModel.from_pretrained('openai-gpt')
|
||||||
@@ -455,7 +455,7 @@ class TFOpenAIGPTLMHeadModel(TFOpenAIGPTPreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import OpenAIGPTTokenizer, TFOpenAIGPTLMHeadModel
|
from transformers import OpenAIGPTTokenizer, TFOpenAIGPTLMHeadModel
|
||||||
|
|
||||||
tokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')
|
tokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')
|
||||||
model = TFOpenAIGPTLMHeadModel.from_pretrained('openai-gpt')
|
model = TFOpenAIGPTLMHeadModel.from_pretrained('openai-gpt')
|
||||||
@@ -506,7 +506,7 @@ class TFOpenAIGPTDoubleHeadsModel(TFOpenAIGPTPreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import OpenAIGPTTokenizer, TFOpenAIGPTDoubleHeadsModel
|
from transformers import OpenAIGPTTokenizer, TFOpenAIGPTDoubleHeadsModel
|
||||||
|
|
||||||
tokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')
|
tokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')
|
||||||
model = TFOpenAIGPTDoubleHeadsModel.from_pretrained('openai-gpt')
|
model = TFOpenAIGPTDoubleHeadsModel.from_pretrained('openai-gpt')
|
||||||
@@ -189,14 +189,14 @@ def load_tf2_checkpoint_in_pytorch_model(pt_model, tf_checkpoint_path, tf_inputs
|
|||||||
"https://pytorch.org/ and https://www.tensorflow.org/install/ for installation instructions.")
|
"https://pytorch.org/ and https://www.tensorflow.org/install/ for installation instructions.")
|
||||||
raise e
|
raise e
|
||||||
|
|
||||||
import pytorch_transformers
|
import transformers
|
||||||
|
|
||||||
tf_path = os.path.abspath(tf_checkpoint_path)
|
tf_path = os.path.abspath(tf_checkpoint_path)
|
||||||
logger.info("Loading TensorFlow weights from {}".format(tf_checkpoint_path))
|
logger.info("Loading TensorFlow weights from {}".format(tf_checkpoint_path))
|
||||||
|
|
||||||
# Instantiate and load the associated TF 2.0 model
|
# Instantiate and load the associated TF 2.0 model
|
||||||
tf_model_class_name = "TF" + pt_model.__class__.__name__ # Add "TF" at the beginning
|
tf_model_class_name = "TF" + pt_model.__class__.__name__ # Add "TF" at the beginning
|
||||||
tf_model_class = getattr(pytorch_transformers, tf_model_class_name)
|
tf_model_class = getattr(transformers, tf_model_class_name)
|
||||||
tf_model = tf_model_class(pt_model.config)
|
tf_model = tf_model_class(pt_model.config)
|
||||||
|
|
||||||
if tf_inputs is None:
|
if tf_inputs is None:
|
||||||
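The hunk above resolves the TensorFlow class by prefixing `"TF"` to the PyTorch class name and looking the result up on the imported package with `getattr`. A minimal sketch of that lookup follows. It uses a stand-in namespace instead of the real `transformers` package, and `fake_package`, `resolve_tf_class`, and the two toy classes are hypothetical names introduced only for illustration.

```python
from types import SimpleNamespace


# Toy stand-ins for a PyTorch model class and its TF 2.0 counterpart.
class BertModel:
    def __init__(self, config):
        self.config = config


class TFBertModel:
    def __init__(self, config):
        self.config = config


# Hypothetical namespace playing the role of the imported package.
fake_package = SimpleNamespace(BertModel=BertModel, TFBertModel=TFBertModel)


def resolve_tf_class(package, pt_model):
    """Return the attribute of `package` named "TF" + <PyTorch class name>."""
    tf_model_class_name = "TF" + pt_model.__class__.__name__
    return getattr(package, tf_model_class_name)


pt_model = BertModel(config={"hidden_size": 8})
tf_model_class = resolve_tf_class(fake_package, pt_model)
tf_model = tf_model_class(pt_model.config)  # same config, TF-side class
print(tf_model_class.__name__)
```

Because the lookup goes through the package object, renaming the package only requires changing the import and the first argument of `getattr`, which is exactly what this hunk does.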
@@ -137,9 +137,9 @@ ROBERTA_START_DOCSTRING = r""" The RoBERTa model was proposed in
|
|||||||
`model({'input_ids': input_ids, 'token_type_ids': token_type_ids})`
|
`model({'input_ids': input_ids, 'token_type_ids': token_type_ids})`
|
||||||
|
|
||||||
Parameters:
|
Parameters:
|
||||||
config (:class:`~pytorch_transformers.RobertaConfig`): Model configuration class with all the parameters of the
|
config (:class:`~transformers.RobertaConfig`): Model configuration class with all the parameters of the
|
||||||
model. Initializing with a config file does not load the weights associated with the model, only the configuration.
|
model. Initializing with a config file does not load the weights associated with the model, only the configuration.
|
||||||
Check out the :meth:`~pytorch_transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
ROBERTA_INPUTS_DOCSTRING = r"""
|
ROBERTA_INPUTS_DOCSTRING = r"""
|
||||||
@@ -162,8 +162,8 @@ ROBERTA_INPUTS_DOCSTRING = r"""
|
|||||||
RoBERTa is a model with absolute position embeddings so it's usually advised to pad the inputs on
|
RoBERTa is a model with absolute position embeddings so it's usually advised to pad the inputs on
|
||||||
the right rather than the left.
|
the right rather than the left.
|
||||||
|
|
||||||
See :func:`pytorch_transformers.PreTrainedTokenizer.encode` and
|
See :func:`transformers.PreTrainedTokenizer.encode` and
|
||||||
:func:`pytorch_transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
:func:`transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
||||||
**attention_mask**: (`optional`) ``Numpy array`` or ``tf.Tensor`` of shape ``(batch_size, sequence_length)``:
|
**attention_mask**: (`optional`) ``Numpy array`` or ``tf.Tensor`` of shape ``(batch_size, sequence_length)``:
|
||||||
Mask to avoid performing attention on padding token indices.
|
Mask to avoid performing attention on padding token indices.
|
||||||
Mask values selected in ``[0, 1]``:
|
Mask values selected in ``[0, 1]``:
|
||||||
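The docstring above advises padding on the right for models with absolute position embeddings and describes an `attention_mask` with values in `[0, 1]`. A minimal sketch of building both in plain Python follows. It assumes the usual convention that 1 marks a real token and 0 marks padding (the hunk is cut off before the convention is stated), and `pad_right` and the `pad_id` value are hypothetical; a real tokenizer defines its own padding id.

```python
def pad_right(batch, pad_id):
    """Right-pad each sequence to the batch maximum and build a 0/1 mask."""
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        # Real tokens keep positions 0..len(seq)-1; padding goes at the end.
        input_ids.append(list(seq) + [pad_id] * n_pad)
        # Assumed convention: 1 = attend to this position, 0 = padding.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


ids, mask = pad_right([[5, 6, 7], [8, 9]], pad_id=0)
print(ids)
print(mask)
```

Padding on the right keeps each real token at the same absolute position it would have in an unpadded input, which is why it is preferred for these models.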
@@ -209,7 +209,7 @@ class TFRobertaModel(TFRobertaPreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import RobertaTokenizer, TFRobertaModel
|
from transformers import RobertaTokenizer, TFRobertaModel
|
||||||
|
|
||||||
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
|
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
|
||||||
model = TFRobertaModel.from_pretrained('roberta-base')
|
model = TFRobertaModel.from_pretrained('roberta-base')
|
||||||
@@ -286,7 +286,7 @@ class TFRobertaForMaskedLM(TFRobertaPreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import RobertaTokenizer, TFRobertaForMaskedLM
|
from transformers import RobertaTokenizer, TFRobertaForMaskedLM
|
||||||
|
|
||||||
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
|
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
|
||||||
model = TFRobertaForMaskedLM.from_pretrained('roberta-base')
|
model = TFRobertaForMaskedLM.from_pretrained('roberta-base')
|
||||||
@@ -354,7 +354,7 @@ class TFRobertaForSequenceClassification(TFRobertaPreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import RobertaTokenizer, TFRobertaForSequenceClassification
|
from transformers import RobertaTokenizer, TFRobertaForSequenceClassification
|
||||||
|
|
||||||
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
|
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
|
||||||
model = TFRobertaForSequenceClassification.from_pretrained('roberta-base')
|
model = TFRobertaForSequenceClassification.from_pretrained('roberta-base')
|
||||||
@@ -614,9 +614,9 @@ TRANSFO_XL_START_DOCSTRING = r""" The Transformer-XL model was proposed in
|
|||||||
`model({'input_ids': input_ids, 'token_type_ids': token_type_ids})`
|
`model({'input_ids': input_ids, 'token_type_ids': token_type_ids})`
|
||||||
|
|
||||||
Parameters:
|
Parameters:
|
||||||
config (:class:`~pytorch_transformers.TransfoXLConfig`): Model configuration class with all the parameters of the model.
|
config (:class:`~transformers.TransfoXLConfig`): Model configuration class with all the parameters of the model.
|
||||||
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
||||||
Check out the :meth:`~pytorch_transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
TRANSFO_XL_INPUTS_DOCSTRING = r"""
|
TRANSFO_XL_INPUTS_DOCSTRING = r"""
|
||||||
@@ -625,9 +625,9 @@ TRANSFO_XL_INPUTS_DOCSTRING = r"""
|
|||||||
Indices of input sequence tokens in the vocabulary.
|
Indices of input sequence tokens in the vocabulary.
|
||||||
Transformer-XL is a model with relative position embeddings so you can either pad the inputs on
|
Transformer-XL is a model with relative position embeddings so you can either pad the inputs on
|
||||||
the right or on the left.
|
the right or on the left.
|
||||||
Indices can be obtained using :class:`pytorch_transformers.TransfoXLTokenizer`.
|
Indices can be obtained using :class:`transformers.TransfoXLTokenizer`.
|
||||||
See :func:`pytorch_transformers.PreTrainedTokenizer.encode` and
|
See :func:`transformers.PreTrainedTokenizer.encode` and
|
||||||
:func:`pytorch_transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
:func:`transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
||||||
**mems**: (`optional`)
|
**mems**: (`optional`)
|
||||||
list of ``Numpy array`` or ``tf.Tensor`` (one for each layer):
|
list of ``Numpy array`` or ``tf.Tensor`` (one for each layer):
|
||||||
that contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model
|
that contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model
|
||||||
@@ -660,7 +660,7 @@ class TFTransfoXLModel(TFTransfoXLPreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import TransfoXLTokenizer, TFTransfoXLModel
|
from transformers import TransfoXLTokenizer, TFTransfoXLModel
|
||||||
|
|
||||||
tokenizer = TransfoXLTokenizer.from_pretrained('transfo-xl-wt103')
|
tokenizer = TransfoXLTokenizer.from_pretrained('transfo-xl-wt103')
|
||||||
model = TFTransfoXLModel.from_pretrained('transfo-xl-wt103')
|
model = TFTransfoXLModel.from_pretrained('transfo-xl-wt103')
|
||||||
@@ -702,7 +702,7 @@ class TFTransfoXLLMHeadModel(TFTransfoXLPreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import TransfoXLTokenizer, TFTransfoXLLMHeadModel
|
from transformers import TransfoXLTokenizer, TFTransfoXLLMHeadModel
|
||||||
|
|
||||||
tokenizer = TransfoXLTokenizer.from_pretrained('transfo-xl-wt103')
|
tokenizer = TransfoXLTokenizer.from_pretrained('transfo-xl-wt103')
|
||||||
model = TFTransfoXLLMHeadModel.from_pretrained('transfo-xl-wt103')
|
model = TFTransfoXLLMHeadModel.from_pretrained('transfo-xl-wt103')
|
||||||
@@ -32,16 +32,16 @@ logger = logging.getLogger(__name__)
|
|||||||
class TFPreTrainedModel(tf.keras.Model):
|
class TFPreTrainedModel(tf.keras.Model):
|
||||||
r""" Base class for all TF models.
|
r""" Base class for all TF models.
|
||||||
|
|
||||||
:class:`~pytorch_transformers.TFPreTrainedModel` takes care of storing the configuration of the models and handles methods for loading/downloading/saving models
|
:class:`~transformers.TFPreTrainedModel` takes care of storing the configuration of the models and handles methods for loading/downloading/saving models
|
||||||
as well as a few methods common to all models to (i) resize the input embeddings and (ii) prune heads in the self-attention layers.
|
as well as a few methods common to all models to (i) resize the input embeddings and (ii) prune heads in the self-attention layers.
|
||||||
|
|
||||||
Class attributes (overridden by derived classes):
|
Class attributes (overridden by derived classes):
|
||||||
- ``config_class``: a class derived from :class:`~pytorch_transformers.PretrainedConfig` to use as configuration class for this model architecture.
|
- ``config_class``: a class derived from :class:`~transformers.PretrainedConfig` to use as configuration class for this model architecture.
|
||||||
- ``pretrained_model_archive_map``: a python ``dict`` with `short-cut-names` (string) as keys and `url` (string) of associated pretrained weights as values.
|
- ``pretrained_model_archive_map``: a python ``dict`` with `short-cut-names` (string) as keys and `url` (string) of associated pretrained weights as values.
|
||||||
- ``load_tf_weights``: a python ``method`` for loading a TensorFlow checkpoint in a PyTorch model, taking as arguments:
|
- ``load_tf_weights``: a python ``method`` for loading a TensorFlow checkpoint in a PyTorch model, taking as arguments:
|
||||||
|
|
||||||
- ``model``: an instance of the relevant subclass of :class:`~pytorch_transformers.PreTrainedModel`,
|
- ``model``: an instance of the relevant subclass of :class:`~transformers.PreTrainedModel`,
|
||||||
- ``config``: an instance of the relevant subclass of :class:`~pytorch_transformers.PretrainedConfig`,
|
- ``config``: an instance of the relevant subclass of :class:`~transformers.PretrainedConfig`,
|
||||||
- ``path``: a path (string) to the TensorFlow checkpoint.
|
- ``path``: a path (string) to the TensorFlow checkpoint.
|
||||||
|
|
||||||
- ``base_model_prefix``: a string indicating the attribute associated to the base model in derived classes of the same architecture adding modules on top of the base model.
|
- ``base_model_prefix``: a string indicating the attribute associated to the base model in derived classes of the same architecture adding modules on top of the base model.
|
||||||
@@ -123,7 +123,7 @@ class TFPreTrainedModel(tf.keras.Model):
|
|||||||
|
|
||||||
def save_pretrained(self, save_directory):
|
def save_pretrained(self, save_directory):
|
||||||
""" Save a model and its configuration file to a directory, so that it
|
""" Save a model and its configuration file to a directory, so that it
|
||||||
can be re-loaded using the :func:`~pytorch_transformers.PreTrainedModel.from_pretrained` class method.
|
can be re-loaded using the :func:`~transformers.PreTrainedModel.from_pretrained` class method.
|
||||||
"""
|
"""
|
||||||
assert os.path.isdir(save_directory), "Saving path should be a directory where the model and configuration can be saved"
|
assert os.path.isdir(save_directory), "Saving path should be a directory where the model and configuration can be saved"
|
||||||
|
|
||||||
@@ -151,17 +151,17 @@ class TFPreTrainedModel(tf.keras.Model):
|
|||||||
pretrained_model_name_or_path: either:
|
pretrained_model_name_or_path: either:
|
||||||
|
|
||||||
- a string with the `shortcut name` of a pre-trained model to load from cache or download, e.g.: ``bert-base-uncased``.
|
- a string with the `shortcut name` of a pre-trained model to load from cache or download, e.g.: ``bert-base-uncased``.
|
||||||
- a path to a `directory` containing model weights saved using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained`, e.g.: ``./my_model_directory/``.
|
- a path to a `directory` containing model weights saved using :func:`~transformers.PreTrainedModel.save_pretrained`, e.g.: ``./my_model_directory/``.
|
||||||
- a path or url to a `PyTorch state_dict save file` (e.g. `./pt_model/pytorch_model.bin`). In this case, ``from_pt`` should be set to True and a configuration object should be provided as ``config`` argument. This loading path is slower than converting the PyTorch checkpoint in a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
|
- a path or url to a `PyTorch state_dict save file` (e.g. `./pt_model/pytorch_model.bin`). In this case, ``from_pt`` should be set to True and a configuration object should be provided as ``config`` argument. This loading path is slower than converting the PyTorch checkpoint in a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
|
||||||
|
|
||||||
model_args: (`optional`) Sequence of positional arguments:
|
model_args: (`optional`) Sequence of positional arguments:
|
||||||
All remaining positional arguments will be passed to the underlying model's ``__init__`` method
|
All remaining positional arguments will be passed to the underlying model's ``__init__`` method
|
||||||
|
|
||||||
config: (`optional`) instance of a class derived from :class:`~pytorch_transformers.PretrainedConfig`:
|
config: (`optional`) instance of a class derived from :class:`~transformers.PretrainedConfig`:
|
||||||
Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
|
Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
|
||||||
|
|
||||||
- the model is a model provided by the library (loaded with the ``shortcut-name`` string of a pretrained model), or
|
- the model is a model provided by the library (loaded with the ``shortcut-name`` string of a pretrained model), or
|
||||||
- the model was saved using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained` and is reloaded by supplying the save directory.
|
- the model was saved using :func:`~transformers.PreTrainedModel.save_pretrained` and is reloaded by supplying the save directory.
|
||||||
- the model is loaded by supplying a local directory as ``pretrained_model_name_or_path`` and a configuration JSON file named `config.json` is found in the directory.
|
- the model is loaded by supplying a local directory as ``pretrained_model_name_or_path`` and a configuration JSON file named `config.json` is found in the directory.
|
||||||
|
|
||||||
from_pt: (`optional`) boolean, default False:
|
from_pt: (`optional`) boolean, default False:
|
||||||
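The docstring above lists three accepted forms for ``pretrained_model_name_or_path``: a shortcut name, a directory written by ``save_pretrained``, or a path or url to a PyTorch state_dict file. A minimal sketch of telling them apart follows. The check order, the helper `classify_source`, and the contents of `ARCHIVE_MAP` (including its placeholder URL) are assumptions for illustration; the diff does not show the library's own resolution logic.

```python
import os

# Hypothetical stand-in for the ``pretrained_model_archive_map`` class
# attribute; the URL is a placeholder, not a real download location.
ARCHIVE_MAP = {
    "bert-base-uncased": "https://example.invalid/bert-base-uncased-tf_model.h5",
}


def classify_source(name_or_path, archive_map=ARCHIVE_MAP):
    """Label an argument as a shortcut name, a directory, or a file/url."""
    if name_or_path in archive_map:
        return "shortcut name"
    if os.path.isdir(name_or_path):
        return "directory"
    # Anything else is treated as a path or url to a weights file; per the
    # docstring, a PyTorch file additionally needs ``from_pt=True`` and an
    # explicit ``config``.
    return "file or url"


print(classify_source("bert-base-uncased"))
print(classify_source(os.getcwd()))
```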
@@ -182,7 +182,7 @@ class TFPreTrainedModel(tf.keras.Model):
|
|||||||
Can be used to update the configuration object (after it has been loaded) and initiate the model (e.g. ``output_attention=True``). Behaves differently depending on whether a `config` is provided or automatically loaded:
|
Can be used to update the configuration object (after it has been loaded) and initiate the model (e.g. ``output_attention=True``). Behaves differently depending on whether a `config` is provided or automatically loaded:
|
||||||
|
|
||||||
- If a configuration is provided with ``config``, ``**kwargs`` will be directly passed to the underlying model's ``__init__`` method (we assume all relevant updates to the configuration have already been done)
|
- If a configuration is provided with ``config``, ``**kwargs`` will be directly passed to the underlying model's ``__init__`` method (we assume all relevant updates to the configuration have already been done)
|
||||||
- If a configuration is not provided, ``kwargs`` will be first passed to the configuration class initialization function (:func:`~pytorch_transformers.PretrainedConfig.from_pretrained`). Each key of ``kwargs`` that corresponds to a configuration attribute will be used to override said attribute with the supplied ``kwargs`` value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's ``__init__`` function.
|
- If a configuration is not provided, ``kwargs`` will be first passed to the configuration class initialization function (:func:`~transformers.PretrainedConfig.from_pretrained`). Each key of ``kwargs`` that corresponds to a configuration attribute will be used to override said attribute with the supplied ``kwargs`` value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's ``__init__`` function.
|
||||||
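The second bullet above describes a split of ``kwargs``: keys that match a configuration attribute override it, and the remaining keys are forwarded to the model's ``__init__``. A minimal sketch of that split follows. `ToyConfig`, `split_kwargs`, and the keyword `extra_flag` are hypothetical, and the attribute names on the toy config are chosen only for the example.

```python
class ToyConfig:
    """Hypothetical configuration object with two attributes."""

    def __init__(self, hidden_size=768, output_attentions=False):
        self.hidden_size = hidden_size
        self.output_attentions = output_attentions


def split_kwargs(config, kwargs):
    """Apply matching keys to `config`; return the keys it does not know."""
    remaining = {}
    for key, value in kwargs.items():
        if hasattr(config, key):
            setattr(config, key, value)  # override the configuration attribute
        else:
            remaining[key] = value  # left for the model's __init__
    return remaining


config = ToyConfig()
model_kwargs = split_kwargs(config, {"output_attentions": True, "extra_flag": 1})
print(config.output_attentions, model_kwargs)
```

When a ``config`` object is passed explicitly, the first bullet applies instead and no such split happens: all of ``kwargs`` goes to the model.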
|
|
||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
@@ -484,9 +484,9 @@ XLM_START_DOCSTRING = r""" The XLM model was proposed in
|
|||||||
`model({'input_ids': input_ids, 'token_type_ids': token_type_ids})`
|
`model({'input_ids': input_ids, 'token_type_ids': token_type_ids})`
|
||||||
|
|
||||||
Parameters:
|
Parameters:
|
||||||
config (:class:`~pytorch_transformers.XLMConfig`): Model configuration class with all the parameters of the model.
|
config (:class:`~transformers.XLMConfig`): Model configuration class with all the parameters of the model.
|
||||||
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
||||||
Check out the :meth:`~pytorch_transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
XLM_INPUTS_DOCSTRING = r"""
|
XLM_INPUTS_DOCSTRING = r"""
|
||||||
@@ -497,9 +497,9 @@ XLM_INPUTS_DOCSTRING = r"""
|
|||||||
XLM is a model with absolute position embeddings so it's usually advised to pad the inputs on
|
XLM is a model with absolute position embeddings so it's usually advised to pad the inputs on
|
||||||
the right rather than the left.
|
the right rather than the left.
|
||||||
|
|
||||||
Indices can be obtained using :class:`pytorch_transformers.XLMTokenizer`.
|
Indices can be obtained using :class:`transformers.XLMTokenizer`.
|
||||||
See :func:`pytorch_transformers.PreTrainedTokenizer.encode` and
|
See :func:`transformers.PreTrainedTokenizer.encode` and
|
||||||
:func:`pytorch_transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
:func:`transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
||||||
**attention_mask**: (`optional`) ``Numpy array`` or ``tf.Tensor`` of shape ``(batch_size, sequence_length)``:
|
**attention_mask**: (`optional`) ``Numpy array`` or ``tf.Tensor`` of shape ``(batch_size, sequence_length)``:
|
||||||
Mask to avoid performing attention on padding token indices.
|
Mask to avoid performing attention on padding token indices.
|
||||||
Mask values selected in ``[0, 1]``:
|
Mask values selected in ``[0, 1]``:
|
||||||
@@ -550,7 +550,7 @@ class TFXLMModel(TFXLMPreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import XLMTokenizer, TFXLMModel
|
from transformers import XLMTokenizer, TFXLMModel
|
||||||
|
|
||||||
tokenizer = XLMTokenizer.from_pretrained('xlm-mlm-en-2048')
|
tokenizer = XLMTokenizer.from_pretrained('xlm-mlm-en-2048')
|
||||||
model = TFXLMModel.from_pretrained('xlm-mlm-en-2048')
|
model = TFXLMModel.from_pretrained('xlm-mlm-en-2048')
|
||||||
@@ -623,7 +623,7 @@ class TFXLMWithLMHeadModel(TFXLMPreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import XLMTokenizer, TFXLMWithLMHeadModel
|
from transformers import XLMTokenizer, TFXLMWithLMHeadModel
|
||||||
|
|
||||||
tokenizer = XLMTokenizer.from_pretrained('xlm-mlm-en-2048')
|
tokenizer = XLMTokenizer.from_pretrained('xlm-mlm-en-2048')
|
||||||
model = TFXLMWithLMHeadModel.from_pretrained('xlm-mlm-en-2048')
|
model = TFXLMWithLMHeadModel.from_pretrained('xlm-mlm-en-2048')
|
||||||
@@ -667,7 +667,7 @@ class TFXLMForSequenceClassification(TFXLMPreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import XLMTokenizer, TFXLMForSequenceClassification
|
from transformers import XLMTokenizer, TFXLMForSequenceClassification
|
||||||
|
|
||||||
tokenizer = XLMTokenizer.from_pretrained('xlm-mlm-en-2048')
|
tokenizer = XLMTokenizer.from_pretrained('xlm-mlm-en-2048')
|
||||||
model = TFXLMForSequenceClassification.from_pretrained('xlm-mlm-en-2048')
|
model = TFXLMForSequenceClassification.from_pretrained('xlm-mlm-en-2048')
|
||||||
@@ -715,7 +715,7 @@ class TFXLMForQuestionAnsweringSimple(TFXLMPreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import XLMTokenizer, TFXLMForQuestionAnsweringSimple
|
from transformers import XLMTokenizer, TFXLMForQuestionAnsweringSimple
|
||||||
|
|
||||||
tokenizer = XLMTokenizer.from_pretrained('xlm-mlm-en-2048')
|
tokenizer = XLMTokenizer.from_pretrained('xlm-mlm-en-2048')
|
||||||
model = TFXLMForQuestionAnsweringSimple.from_pretrained('xlm-mlm-en-2048')
|
model = TFXLMForQuestionAnsweringSimple.from_pretrained('xlm-mlm-en-2048')
|
||||||
@@ -716,9 +716,9 @@ XLNET_START_DOCSTRING = r""" The XLNet model was proposed in
|
|||||||
`model({'input_ids': input_ids, 'token_type_ids': token_type_ids})`
|
`model({'input_ids': input_ids, 'token_type_ids': token_type_ids})`
|
||||||
|
|
||||||
Parameters:
|
Parameters:
|
||||||
config (:class:`~pytorch_transformers.XLNetConfig`): Model configuration class with all the parameters of the model.
|
config (:class:`~transformers.XLNetConfig`): Model configuration class with all the parameters of the model.
|
||||||
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
||||||
Check out the :meth:`~pytorch_transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
XLNET_INPUTS_DOCSTRING = r"""
|
XLNET_INPUTS_DOCSTRING = r"""
|
||||||
@@ -727,9 +727,9 @@ XLNET_INPUTS_DOCSTRING = r"""
|
|||||||
Indices of input sequence tokens in the vocabulary.
|
Indices of input sequence tokens in the vocabulary.
|
||||||
XLNet is a model with relative position embeddings so you can either pad the inputs on
|
XLNet is a model with relative position embeddings so you can either pad the inputs on
|
||||||
the right or on the left.
|
the right or on the left.
|
||||||
Indices can be obtained using :class:`pytorch_transformers.XLNetTokenizer`.
|
Indices can be obtained using :class:`transformers.XLNetTokenizer`.
|
||||||
See :func:`pytorch_transformers.PreTrainedTokenizer.encode` and
|
See :func:`transformers.PreTrainedTokenizer.encode` and
|
||||||
:func:`pytorch_transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
:func:`transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
||||||
**attention_mask**: (`optional`) ``Numpy array`` or ``tf.Tensor`` of shape ``(batch_size, sequence_length)``:
|
**attention_mask**: (`optional`) ``Numpy array`` or ``tf.Tensor`` of shape ``(batch_size, sequence_length)``:
|
||||||
Mask to avoid performing attention on padding token indices.
|
Mask to avoid performing attention on padding token indices.
|
||||||
Mask values selected in ``[0, 1]``:
|
Mask values selected in ``[0, 1]``:
|
||||||
@@ -793,7 +793,7 @@ class TFXLNetModel(TFXLNetPreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import XLNetTokenizer, TFXLNetModel
|
from transformers import XLNetTokenizer, TFXLNetModel
|
||||||
|
|
||||||
tokenizer = XLNetTokenizer.from_pretrained('xlnet-large-cased')
|
tokenizer = XLNetTokenizer.from_pretrained('xlnet-large-cased')
|
||||||
model = TFXLNetModel.from_pretrained('xlnet-large-cased')
|
model = TFXLNetModel.from_pretrained('xlnet-large-cased')
|
||||||
@@ -835,7 +835,7 @@ class TFXLNetLMHeadModel(TFXLNetPreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import XLNetTokenizer, TFXLNetLMHeadModel
|
from transformers import XLNetTokenizer, TFXLNetLMHeadModel
|
||||||
|
|
||||||
tokenizer = XLNetTokenizer.from_pretrained('xlnet-large-cased')
|
tokenizer = XLNetTokenizer.from_pretrained('xlnet-large-cased')
|
||||||
model = TFXLNetLMHeadModel.from_pretrained('xlnet-large-cased')
|
model = TFXLNetLMHeadModel.from_pretrained('xlnet-large-cased')
|
||||||
@@ -890,7 +890,7 @@ class TFXLNetForSequenceClassification(TFXLNetPreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import XLNetTokenizer, TFXLNetForSequenceClassification
|
from transformers import XLNetTokenizer, TFXLNetForSequenceClassification
|
||||||
|
|
||||||
tokenizer = XLNetTokenizer.from_pretrained('xlnet-large-cased')
|
tokenizer = XLNetTokenizer.from_pretrained('xlnet-large-cased')
|
||||||
model = TFXLNetForSequenceClassification.from_pretrained('xlnet-large-cased')
|
model = TFXLNetForSequenceClassification.from_pretrained('xlnet-large-cased')
|
||||||
@@ -943,7 +943,7 @@ class TFXLNetForQuestionAnsweringSimple(TFXLNetPreTrainedModel):
|
|||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
from pytorch_transformers import XLNetTokenizer, TFXLNetForQuestionAnsweringSimple
|
from transformers import XLNetTokenizer, TFXLNetForQuestionAnsweringSimple
|
||||||
|
|
||||||
tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
|
tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
|
||||||
model = TFXLNetForQuestionAnsweringSimple.from_pretrained('xlnet-base-cased')
|
model = TFXLNetForQuestionAnsweringSimple.from_pretrained('xlnet-base-cased')
|
||||||
@@ -531,9 +531,9 @@ TRANSFO_XL_START_DOCSTRING = r""" The Transformer-XL model was proposed in
|
|||||||
https://pytorch.org/docs/stable/nn.html#module
|
https://pytorch.org/docs/stable/nn.html#module
|
||||||
|
|
||||||
Parameters:
|
Parameters:
|
||||||
config (:class:`~pytorch_transformers.TransfoXLConfig`): Model configuration class with all the parameters of the model.
|
config (:class:`~transformers.TransfoXLConfig`): Model configuration class with all the parameters of the model.
|
||||||
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
||||||
Check out the :meth:`~pytorch_transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
TRANSFO_XL_INPUTS_DOCSTRING = r"""
|
TRANSFO_XL_INPUTS_DOCSTRING = r"""
|
||||||
@@ -542,9 +542,9 @@ TRANSFO_XL_INPUTS_DOCSTRING = r"""
|
|||||||
Indices of input sequence tokens in the vocabulary.
|
Indices of input sequence tokens in the vocabulary.
|
||||||
Transformer-XL is a model with relative position embeddings so you can either pad the inputs on
|
Transformer-XL is a model with relative position embeddings so you can either pad the inputs on
|
||||||
the right or on the left.
|
the right or on the left.
|
||||||
Indices can be obtained using :class:`pytorch_transformers.TransfoXLTokenizer`.
|
Indices can be obtained using :class:`transformers.TransfoXLTokenizer`.
|
||||||
See :func:`pytorch_transformers.PreTrainedTokenizer.encode` and
|
See :func:`transformers.PreTrainedTokenizer.encode` and
|
||||||
:func:`pytorch_transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
:func:`transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
||||||
**mems**: (`optional`)
|
**mems**: (`optional`)
|
||||||
list of ``torch.FloatTensor`` (one for each layer):
|
list of ``torch.FloatTensor`` (one for each layer):
|
||||||
that contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model
|
that contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model
|
||||||
@@ -52,16 +52,16 @@ except ImportError:
|
|||||||
class PreTrainedModel(nn.Module):
|
class PreTrainedModel(nn.Module):
|
||||||
r""" Base class for all models.
|
r""" Base class for all models.
|
||||||
|
|
||||||
:class:`~pytorch_transformers.PreTrainedModel` takes care of storing the configuration of the models and handles methods for loading/downloading/saving models
|
:class:`~transformers.PreTrainedModel` takes care of storing the configuration of the models and handles methods for loading/downloading/saving models
|
||||||
as well as a few methods commons to all models to (i) resize the input embeddings and (ii) prune heads in the self-attention heads.
|
as well as a few methods commons to all models to (i) resize the input embeddings and (ii) prune heads in the self-attention heads.
|
||||||
|
|
||||||
Class attributes (overridden by derived classes):
|
Class attributes (overridden by derived classes):
|
||||||
- ``config_class``: a class derived from :class:`~pytorch_transformers.PretrainedConfig` to use as configuration class for this model architecture.
|
- ``config_class``: a class derived from :class:`~transformers.PretrainedConfig` to use as configuration class for this model architecture.
|
||||||
- ``pretrained_model_archive_map``: a python ``dict`` of with `short-cut-names` (string) as keys and `url` (string) of associated pretrained weights as values.
|
- ``pretrained_model_archive_map``: a python ``dict`` of with `short-cut-names` (string) as keys and `url` (string) of associated pretrained weights as values.
|
||||||
- ``load_tf_weights``: a python ``method`` for loading a TensorFlow checkpoint in a PyTorch model, taking as arguments:
|
- ``load_tf_weights``: a python ``method`` for loading a TensorFlow checkpoint in a PyTorch model, taking as arguments:
|
||||||
|
|
||||||
- ``model``: an instance of the relevant subclass of :class:`~pytorch_transformers.PreTrainedModel`,
|
- ``model``: an instance of the relevant subclass of :class:`~transformers.PreTrainedModel`,
|
||||||
- ``config``: an instance of the relevant subclass of :class:`~pytorch_transformers.PretrainedConfig`,
|
- ``config``: an instance of the relevant subclass of :class:`~transformers.PretrainedConfig`,
|
||||||
- ``path``: a path (string) to the TensorFlow checkpoint.
|
- ``path``: a path (string) to the TensorFlow checkpoint.
|
||||||
|
|
||||||
- ``base_model_prefix``: a string indicating the attribute associated to the base model in derived classes of the same architecture adding modules on top of the base model.
|
- ``base_model_prefix``: a string indicating the attribute associated to the base model in derived classes of the same architecture adding modules on top of the base model.
|
||||||
@@ -189,7 +189,7 @@ class PreTrainedModel(nn.Module):
|
|||||||
|
|
||||||
def save_pretrained(self, save_directory):
|
def save_pretrained(self, save_directory):
|
||||||
""" Save a model and its configuration file to a directory, so that it
|
""" Save a model and its configuration file to a directory, so that it
|
||||||
can be re-loaded using the `:func:`~pytorch_transformers.PreTrainedModel.from_pretrained`` class method.
|
can be re-loaded using the `:func:`~transformers.PreTrainedModel.from_pretrained`` class method.
|
||||||
"""
|
"""
|
||||||
assert os.path.isdir(save_directory), "Saving path should be a directory where the model and configuration can be saved"
|
assert os.path.isdir(save_directory), "Saving path should be a directory where the model and configuration can be saved"
|
||||||
|
|
||||||
@@ -220,24 +220,24 @@ class PreTrainedModel(nn.Module):
|
|||||||
pretrained_model_name_or_path: either:
|
pretrained_model_name_or_path: either:
|
||||||
|
|
||||||
- a string with the `shortcut name` of a pre-trained model to load from cache or download, e.g.: ``bert-base-uncased``.
|
- a string with the `shortcut name` of a pre-trained model to load from cache or download, e.g.: ``bert-base-uncased``.
|
||||||
- a path to a `directory` containing model weights saved using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained`, e.g.: ``./my_model_directory/``.
|
- a path to a `directory` containing model weights saved using :func:`~transformers.PreTrainedModel.save_pretrained`, e.g.: ``./my_model_directory/``.
|
||||||
- a path or url to a `tensorflow index checkpoint file` (e.g. `./tf_model/model.ckpt.index`). In this case, ``from_tf`` should be set to True and a configuration object should be provided as ``config`` argument. This loading path is slower than converting the TensorFlow checkpoint in a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
|
- a path or url to a `tensorflow index checkpoint file` (e.g. `./tf_model/model.ckpt.index`). In this case, ``from_tf`` should be set to True and a configuration object should be provided as ``config`` argument. This loading path is slower than converting the TensorFlow checkpoint in a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
|
||||||
- None if you are both providing the configuration and state dictionary (resp. with keyword arguments ``config`` and ``state_dict``)
|
- None if you are both providing the configuration and state dictionary (resp. with keyword arguments ``config`` and ``state_dict``)
|
||||||
|
|
||||||
model_args: (`optional`) Sequence of positional arguments:
|
model_args: (`optional`) Sequence of positional arguments:
|
||||||
All remaning positional arguments will be passed to the underlying model's ``__init__`` method
|
All remaning positional arguments will be passed to the underlying model's ``__init__`` method
|
||||||
|
|
||||||
config: (`optional`) instance of a class derived from :class:`~pytorch_transformers.PretrainedConfig`:
|
config: (`optional`) instance of a class derived from :class:`~transformers.PretrainedConfig`:
|
||||||
Configuration for the model to use instead of an automatically loaded configuation. Configuration can be automatically loaded when:
|
Configuration for the model to use instead of an automatically loaded configuation. Configuration can be automatically loaded when:
|
||||||
|
|
||||||
- the model is a model provided by the library (loaded with the ``shortcut-name`` string of a pretrained model), or
|
- the model is a model provided by the library (loaded with the ``shortcut-name`` string of a pretrained model), or
|
||||||
- the model was saved using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained` and is reloaded by suppling the save directory.
|
- the model was saved using :func:`~transformers.PreTrainedModel.save_pretrained` and is reloaded by suppling the save directory.
|
||||||
- the model is loaded by suppling a local directory as ``pretrained_model_name_or_path`` and a configuration JSON file named `config.json` is found in the directory.
|
- the model is loaded by suppling a local directory as ``pretrained_model_name_or_path`` and a configuration JSON file named `config.json` is found in the directory.
|
||||||
|
|
||||||
state_dict: (`optional`) dict:
|
state_dict: (`optional`) dict:
|
||||||
an optional state dictionnary for the model to use instead of a state dictionary loaded from saved weights file.
|
an optional state dictionnary for the model to use instead of a state dictionary loaded from saved weights file.
|
||||||
This option can be used if you want to create a model from a pretrained configuration but load your own weights.
|
This option can be used if you want to create a model from a pretrained configuration but load your own weights.
|
||||||
In this case though, you should check if using :func:`~pytorch_transformers.PreTrainedModel.save_pretrained` and :func:`~pytorch_transformers.PreTrainedModel.from_pretrained` is not a simpler option.
|
In this case though, you should check if using :func:`~transformers.PreTrainedModel.save_pretrained` and :func:`~transformers.PreTrainedModel.from_pretrained` is not a simpler option.
|
||||||
|
|
||||||
cache_dir: (`optional`) string:
|
cache_dir: (`optional`) string:
|
||||||
Path to a directory in which a downloaded pre-trained model
|
Path to a directory in which a downloaded pre-trained model
|
||||||
@@ -257,7 +257,7 @@ class PreTrainedModel(nn.Module):
|
|||||||
Can be used to update the configuration object (after it being loaded) and initiate the model. (e.g. ``output_attention=True``). Behave differently depending on whether a `config` is provided or automatically loaded:
|
Can be used to update the configuration object (after it being loaded) and initiate the model. (e.g. ``output_attention=True``). Behave differently depending on whether a `config` is provided or automatically loaded:
|
||||||
|
|
||||||
- If a configuration is provided with ``config``, ``**kwargs`` will be directly passed to the underlying model's ``__init__`` method (we assume all relevant updates to the configuration have already been done)
|
- If a configuration is provided with ``config``, ``**kwargs`` will be directly passed to the underlying model's ``__init__`` method (we assume all relevant updates to the configuration have already been done)
|
||||||
- If a configuration is not provided, ``kwargs`` will be first passed to the configuration class initialization function (:func:`~pytorch_transformers.PretrainedConfig.from_pretrained`). Each key of ``kwargs`` that corresponds to a configuration attribute will be used to override said attribute with the supplied ``kwargs`` value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's ``__init__`` function.
|
- If a configuration is not provided, ``kwargs`` will be first passed to the configuration class initialization function (:func:`~transformers.PretrainedConfig.from_pretrained`). Each key of ``kwargs`` that corresponds to a configuration attribute will be used to override said attribute with the supplied ``kwargs`` value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's ``__init__`` function.
|
||||||
|
|
||||||
Examples::
|
Examples::
|
||||||
|
|
||||||
@@ -355,7 +355,7 @@ class PreTrainedModel(nn.Module):
|
|||||||
else:
|
else:
|
||||||
# Load from our TensorFlow 2.0 checkpoints
|
# Load from our TensorFlow 2.0 checkpoints
|
||||||
try:
|
try:
|
||||||
from pytorch_transformers import load_tf2_checkpoint_in_pytorch_model
|
from transformers import load_tf2_checkpoint_in_pytorch_model
|
||||||
model = load_tf2_checkpoint_in_pytorch_model(model, resolved_archive_file, allow_missing_keys=True)
|
model = load_tf2_checkpoint_in_pytorch_model(model, resolved_archive_file, allow_missing_keys=True)
|
||||||
except ImportError as e:
|
except ImportError as e:
|
||||||
logger.error("Loading a TensorFlow model in PyTorch, requires both PyTorch and TensorFlow to be installed. Please see "
|
logger.error("Loading a TensorFlow model in PyTorch, requires both PyTorch and TensorFlow to be installed. Please see "
|
||||||
@@ -554,7 +554,7 @@ class SQuADHead(nn.Module):
|
|||||||
r""" A SQuAD head inspired by XLNet.
|
r""" A SQuAD head inspired by XLNet.
|
||||||
|
|
||||||
Parameters:
|
Parameters:
|
||||||
config (:class:`~pytorch_transformers.XLNetConfig`): Model configuration class with all the parameters of the model.
|
config (:class:`~transformers.XLNetConfig`): Model configuration class with all the parameters of the model.
|
||||||
|
|
||||||
Inputs:
|
Inputs:
|
||||||
**hidden_states**: ``torch.FloatTensor`` of shape ``(batch_size, seq_len, hidden_size)``
|
**hidden_states**: ``torch.FloatTensor`` of shape ``(batch_size, seq_len, hidden_size)``
|
||||||
@@ -63,7 +63,7 @@ def gelu(x):
|
|||||||
GELU activation
|
GELU activation
|
||||||
https://arxiv.org/abs/1606.08415
|
https://arxiv.org/abs/1606.08415
|
||||||
https://github.com/huggingface/pytorch-openai-transformer-lm/blob/master/model_pytorch.py#L14
|
https://github.com/huggingface/pytorch-openai-transformer-lm/blob/master/model_pytorch.py#L14
|
||||||
https://github.com/huggingface/pytorch-transformers/blob/master/modeling.py
|
https://github.com/huggingface/transformers/blob/master/modeling.py
|
||||||
"""
|
"""
|
||||||
# return 0.5 * x * (1 + torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * torch.pow(x, 3))))
|
# return 0.5 * x * (1 + torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * torch.pow(x, 3))))
|
||||||
return 0.5 * x * (1.0 + torch.erf(x / math.sqrt(2.0)))
|
return 0.5 * x * (1.0 + torch.erf(x / math.sqrt(2.0)))
|
||||||
@@ -265,9 +265,9 @@ XLM_START_DOCSTRING = r""" The XLM model was proposed in
|
|||||||
https://github.com/facebookresearch/XLM
|
https://github.com/facebookresearch/XLM
|
||||||
|
|
||||||
Parameters:
|
Parameters:
|
||||||
config (:class:`~pytorch_transformers.XLMConfig`): Model configuration class with all the parameters of the model.
|
config (:class:`~transformers.XLMConfig`): Model configuration class with all the parameters of the model.
|
||||||
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
||||||
Check out the :meth:`~pytorch_transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
XLM_INPUTS_DOCSTRING = r"""
|
XLM_INPUTS_DOCSTRING = r"""
|
||||||
@@ -278,9 +278,9 @@ XLM_INPUTS_DOCSTRING = r"""
|
|||||||
XLM is a model with absolute position embeddings so it's usually advised to pad the inputs on
|
XLM is a model with absolute position embeddings so it's usually advised to pad the inputs on
|
||||||
the right rather than the left.
|
the right rather than the left.
|
||||||
|
|
||||||
Indices can be obtained using :class:`pytorch_transformers.XLMTokenizer`.
|
Indices can be obtained using :class:`transformers.XLMTokenizer`.
|
||||||
See :func:`pytorch_transformers.PreTrainedTokenizer.encode` and
|
See :func:`transformers.PreTrainedTokenizer.encode` and
|
||||||
:func:`pytorch_transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
:func:`transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
||||||
**attention_mask**: (`optional`) ``torch.FloatTensor`` of shape ``(batch_size, sequence_length)``:
|
**attention_mask**: (`optional`) ``torch.FloatTensor`` of shape ``(batch_size, sequence_length)``:
|
||||||
Mask to avoid performing attention on padding token indices.
|
Mask to avoid performing attention on padding token indices.
|
||||||
Mask values selected in ``[0, 1]``:
|
Mask values selected in ``[0, 1]``:
|
||||||
@@ -488,9 +488,9 @@ XLNET_START_DOCSTRING = r""" The XLNet model was proposed in
|
|||||||
https://pytorch.org/docs/stable/nn.html#module
|
https://pytorch.org/docs/stable/nn.html#module
|
||||||
|
|
||||||
Parameters:
|
Parameters:
|
||||||
config (:class:`~pytorch_transformers.XLNetConfig`): Model configuration class with all the parameters of the model.
|
config (:class:`~transformers.XLNetConfig`): Model configuration class with all the parameters of the model.
|
||||||
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
Initializing with a config file does not load the weights associated with the model, only the configuration.
|
||||||
Check out the :meth:`~pytorch_transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
XLNET_INPUTS_DOCSTRING = r"""
|
XLNET_INPUTS_DOCSTRING = r"""
|
||||||
@@ -499,9 +499,9 @@ XLNET_INPUTS_DOCSTRING = r"""
|
|||||||
Indices of input sequence tokens in the vocabulary.
|
Indices of input sequence tokens in the vocabulary.
|
||||||
XLNet is a model with relative position embeddings so you can either pad the inputs on
|
XLNet is a model with relative position embeddings so you can either pad the inputs on
|
||||||
the right or on the left.
|
the right or on the left.
|
||||||
Indices can be obtained using :class:`pytorch_transformers.XLNetTokenizer`.
|
Indices can be obtained using :class:`transformers.XLNetTokenizer`.
|
||||||
See :func:`pytorch_transformers.PreTrainedTokenizer.encode` and
|
See :func:`transformers.PreTrainedTokenizer.encode` and
|
||||||
:func:`pytorch_transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
:func:`transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
||||||
**token_type_ids**: (`optional`) ``torch.LongTensor`` of shape ``(batch_size, sequence_length)``:
|
**token_type_ids**: (`optional`) ``torch.LongTensor`` of shape ``(batch_size, sequence_length)``:
|
||||||
A parallel sequence of tokens (can be used to indicate various portions of the inputs).
|
A parallel sequence of tokens (can be used to indicate various portions of the inputs).
|
||||||
The type indices in XLNet are NOT selected in the vocabulary, they can be arbitrary numbers and
|
The type indices in XLNet are NOT selected in the vocabulary, they can be arbitrary numbers and
|
||||||
Some files were not shown because too many files have changed in this diff