Merge branch 'master' into pr/1383
This commit is contained in:
@@ -50,7 +50,7 @@ make html
|
||||
---
|
||||
**NOTE**
|
||||
|
||||
If you are adding/removing elements from the toc-tree or from any strutural item, it is recommended to clean the build
|
||||
If you are adding/removing elements from the toc-tree or from any structural item, it is recommended to clean the build
|
||||
directory before rebuilding. Run the following command to clean and build:
|
||||
|
||||
```bash
|
||||
|
||||
@@ -1,5 +1,3 @@
|
||||
huggingface.css
|
||||
|
||||
/* The literal code blocks */
|
||||
.rst-content tt.literal, .rst-content tt.literal, .rst-content code.literal {
|
||||
color: #6670FF;
|
||||
@@ -44,11 +42,11 @@ huggingface.css
|
||||
/* The text items on the toc tree */
|
||||
.wy-menu-vertical a {
|
||||
color: #FFFFDD;
|
||||
font-family: Calibre-Light;
|
||||
font-family: Calibre-Light, sans-serif;
|
||||
}
|
||||
.wy-menu-vertical header, .wy-menu-vertical p.caption{
|
||||
color: white;
|
||||
font-family: Calibre-Light;
|
||||
font-family: Calibre-Light, sans-serif;
|
||||
}
|
||||
|
||||
/* The color inside the selected toc tree block */
|
||||
@@ -85,7 +83,7 @@ a {
|
||||
border-right: solid 2px #FB8D68;
|
||||
border-left: solid 2px #FB8D68;
|
||||
color: #FB8D68;
|
||||
font-family: Calibre-Light;
|
||||
font-family: Calibre-Light, sans-serif;
|
||||
border-top: none;
|
||||
font-style: normal !important;
|
||||
}
|
||||
@@ -136,14 +134,14 @@ a {
|
||||
|
||||
/* class and method names in doc */
|
||||
.rst-content dl:not(.docutils) tt.descname, .rst-content dl:not(.docutils) tt.descclassname, .rst-content dl:not(.docutils) tt.descname, .rst-content dl:not(.docutils) code.descname, .rst-content dl:not(.docutils) tt.descclassname, .rst-content dl:not(.docutils) code.descclassname{
|
||||
font-family: Calibre;
|
||||
font-family: Calibre, sans-serif;
|
||||
font-size: 20px !important;
|
||||
}
|
||||
|
||||
/* class name in doc*/
|
||||
.rst-content dl:not(.docutils) tt.descname, .rst-content dl:not(.docutils) tt.descname, .rst-content dl:not(.docutils) code.descname{
|
||||
margin-right: 10px;
|
||||
font-family: Calibre-Medium;
|
||||
font-family: Calibre-Medium, sans-serif;
|
||||
}
|
||||
|
||||
/* Method and class parameters */
|
||||
@@ -160,17 +158,17 @@ a {
|
||||
|
||||
/* FONTS */
|
||||
body{
|
||||
font-family: Calibre;
|
||||
font-family: Calibre, sans-serif;
|
||||
font-size: 16px;
|
||||
}
|
||||
|
||||
h1 {
|
||||
font-family: Calibre-Thin;
|
||||
font-family: Calibre-Thin, sans-serif;
|
||||
font-size: 70px;
|
||||
}
|
||||
|
||||
h2, .rst-content .toctree-wrapper p.caption, h3, h4, h5, h6, legend{
|
||||
font-family: Calibre-Medium;
|
||||
font-family: Calibre-Medium, sans-serif;
|
||||
}
|
||||
|
||||
@font-face {
|
||||
@@ -196,4 +194,3 @@ h2, .rst-content .toctree-wrapper p.caption, h3, h4, h5, h6, legend{
|
||||
src: url(./Calibre-Thin.otf);
|
||||
font-weight:400;
|
||||
}
|
||||
|
||||
|
||||
@@ -46,8 +46,7 @@ The library currently contains PyTorch and Tensorflow implementations, pre-train
|
||||
5. `XLNet <https://github.com/zihangdai/xlnet>`_ (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`_ by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
|
||||
6. `XLM <https://github.com/facebookresearch/XLM>`_ (from Facebook) released together with the paper `Cross-lingual Language Model Pretraining <https://arxiv.org/abs/1901.07291>`_ by Guillaume Lample and Alexis Conneau.
|
||||
7. `RoBERTa <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`_ (from Facebook), released together with the paper a `Robustly Optimized BERT Pretraining Approach <https://arxiv.org/abs/1907.11692>`_ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
|
||||
8. `DistilBERT <https://huggingface.co/transformers/model_doc/distilbert.html>`_ (from HuggingFace) released together with the blog post `Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT <https://medium.com/huggingface/distilbert-8cf3380435b5>`_ by Victor Sanh, Lysandre Debut and Thomas Wolf.
|
||||
|
||||
8. `DistilBERT <https://huggingface.co/transformers/model_doc/distilbert.html>`_ (from HuggingFace) released together with the paper `DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`_ by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into `DistilGPT2 <https://github.com/huggingface/transformers/tree/master/examples/distillation>`_.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
@@ -63,6 +62,7 @@ The library currently contains PyTorch and Tensorflow implementations, pre-train
|
||||
migration
|
||||
bertology
|
||||
torchscript
|
||||
multilingual
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
58
docs/source/installation.md
Normal file
58
docs/source/installation.md
Normal file
@@ -0,0 +1,58 @@
|
||||
# Installation
|
||||
|
||||
Transformers is tested on Python 2.7 and 3.5+ (examples are tested only on python 3.5+) and PyTorch 1.1.0
|
||||
|
||||
## With pip
|
||||
|
||||
PyTorch Transformers can be installed using pip as follows:
|
||||
|
||||
``` bash
|
||||
pip install transformers
|
||||
```
|
||||
|
||||
## From source
|
||||
|
||||
To install from source, clone the repository and install with:
|
||||
|
||||
``` bash
|
||||
git clone https://github.com/huggingface/transformers.git
|
||||
cd transformers
|
||||
pip install [--editable] .
|
||||
```
|
||||
|
||||
## Tests
|
||||
|
||||
An extensive test suite is included to test the library behavior and several examples. Library tests can be found in the [tests folder](https://github.com/huggingface/transformers/tree/master/transformers/tests) and examples tests in the [examples folder](https://github.com/huggingface/transformers/tree/master/examples).
|
||||
|
||||
Tests can be run using `pytest` (install pytest if needed with `pip install pytest`).
|
||||
|
||||
Run all the tests from the root of the cloned repository with the commands:
|
||||
|
||||
``` bash
|
||||
python -m pytest -sv ./transformers/tests/
|
||||
python -m pytest -sv ./examples/
|
||||
```
|
||||
|
||||
## OpenAI GPT original tokenization workflow
|
||||
|
||||
If you want to reproduce the original tokenization process of the `OpenAI GPT` paper, you will need to install `ftfy` (use version 4.4.3 if you are using Python 2) and `SpaCy`:
|
||||
|
||||
``` bash
|
||||
pip install spacy ftfy==4.4.3
|
||||
python -m spacy download en
|
||||
```
|
||||
|
||||
If you don't install `ftfy` and `SpaCy`, the `OpenAI GPT` tokenizer will default to tokenize using BERT's `BasicTokenizer` followed by Byte-Pair Encoding (which should be fine for most usage, don't worry).
|
||||
|
||||
## Note on model downloads (Continuous Integration or large-scale deployments)
|
||||
|
||||
If you expect to be downloading large volumes of models (more than 1,000) from our hosted bucket (for instance through your CI setup, or a large-scale production deployment), please cache the model files on your end. It will be way faster, and cheaper. Feel free to contact us privately if you need any help.
|
||||
|
||||
## Do you want to run a Transformer model on a mobile device?
|
||||
|
||||
You should check out our [swift-coreml-transformers](https://github.com/huggingface/swift-coreml-transformers) repo.
|
||||
|
||||
It contains a set of tools to convert PyTorch or TensorFlow 2.0 trained Transformer models (currently contains `GPT-2`, `DistilGPT-2`, `BERT`, and `DistilBERT`) to CoreML models that run on iOS devices.
|
||||
|
||||
At some point in the future, you'll be able to seamlessly move from pre-training or fine-tuning models in PyTorch to productizing them in CoreML,
|
||||
or prototype a model or an app in CoreML then research its hyperparameters or architecture from PyTorch. Super exciting!
|
||||
@@ -1,71 +0,0 @@
|
||||
Installation
|
||||
================================================
|
||||
|
||||
Transformers is tested on Python 2.7 and 3.5+ (examples are tested only on python 3.5+) and PyTorch 1.1.0
|
||||
|
||||
With pip
|
||||
^^^^^^^^
|
||||
|
||||
PyTorch Transformers can be installed using pip as follows:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pip install transformers
|
||||
|
||||
From source
|
||||
^^^^^^^^^^^
|
||||
|
||||
To install from source, clone the repository and install with:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
git clone https://github.com/huggingface/transformers.git
|
||||
cd transformers
|
||||
pip install [--editable] .
|
||||
|
||||
|
||||
Tests
|
||||
^^^^^
|
||||
|
||||
An extensive test suite is included to test the library behavior and several examples. Library tests can be found in the `tests folder <https://github.com/huggingface/transformers/tree/master/transformers/tests>`_ and examples tests in the `examples folder <https://github.com/huggingface/transformers/tree/master/examples>`_.
|
||||
|
||||
Tests can be run using `pytest` (install pytest if needed with `pip install pytest`).
|
||||
|
||||
Run all the tests from the root of the cloned repository with the commands:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
python -m pytest -sv ./transformers/tests/
|
||||
python -m pytest -sv ./examples/
|
||||
|
||||
|
||||
OpenAI GPT original tokenization workflow
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
If you want to reproduce the original tokenization process of the ``OpenAI GPT`` paper, you will need to install ``ftfy`` (use version 4.4.3 if you are using Python 2) and ``SpaCy`` :
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pip install spacy ftfy==4.4.3
|
||||
python -m spacy download en
|
||||
|
||||
If you don't install ``ftfy`` and ``SpaCy``\ , the ``OpenAI GPT`` tokenizer will default to tokenize using BERT's ``BasicTokenizer`` followed by Byte-Pair Encoding (which should be fine for most usage, don't worry).
|
||||
|
||||
|
||||
Note on model downloads (Continuous Integration or large-scale deployments)
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
If you expect to be downloading large volumes of models (more than 1,000) from our hosted bucket (for instance through your CI setup, or a large-scale production deployment), please cache the model files on your end. It will be way faster, and cheaper. Feel free to contact us privately if you need any help.
|
||||
|
||||
|
||||
Do you want to run a Transformer model on a mobile device?
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
You should check out our `swift-coreml-transformers <https://github.com/huggingface/swift-coreml-transformers>`_ repo.
|
||||
|
||||
It contains an example of a conversion script from a Pytorch trained Transformer model (here, ``GPT-2``) to a CoreML model that runs on iOS devices.
|
||||
|
||||
It also contains an implementation of BERT for Question answering.
|
||||
|
||||
At some point in the future, you'll be able to seamlessly move from pre-training or fine-tuning models in PyTorch to productizing them in CoreML,
|
||||
or prototype a model or an app in CoreML then research its hyperparameters or architecture from PyTorch. Super exciting!
|
||||
103
docs/source/multilingual.rst
Normal file
103
docs/source/multilingual.rst
Normal file
@@ -0,0 +1,103 @@
|
||||
Multi-lingual models
|
||||
================================================
|
||||
|
||||
Most of the models available in this library are mono-lingual models (English, Chinese and German). A few
|
||||
multi-lingual models are available and have a different mechanisms than mono-lingual models.
|
||||
This page details the usage of these models.
|
||||
|
||||
The two models that currently support multiple languages are BERT and XLM.
|
||||
|
||||
XLM
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
XLM has a total of 10 different checkpoints, only one of which is mono-lingual. The 9 remaining model checkpoints can
|
||||
be split in two categories: the checkpoints that make use of language embeddings, and those that don't
|
||||
|
||||
XLM & Language Embeddings
|
||||
------------------------------------------------
|
||||
|
||||
This section concerns the following checkpoints:
|
||||
|
||||
- ``xlm-mlm-ende-1024`` (Masked language modeling, English-German)
|
||||
- ``xlm-mlm-enfr-1024`` (Masked language modeling, English-French)
|
||||
- ``xlm-mlm-enro-1024`` (Masked language modeling, English-Romanian)
|
||||
- ``xlm-mlm-xnli15-1024`` (Masked language modeling, XNLI languages)
|
||||
- ``xlm-mlm-tlm-xnli15-1024`` (Masked language modeling + Translation, XNLI languages)
|
||||
- ``xlm-clm-enfr-1024`` (Causal language modeling, English-French)
|
||||
- ``xlm-clm-ende-1024`` (Causal language modeling, English-German)
|
||||
|
||||
These checkpoints require language embeddings that will specify the language used at inference time. These language
|
||||
embeddings are represented as a tensor that is of the same shape as the input ids passed to the model. The values in
|
||||
these tensors depend on the language used and are identifiable using the ``lang2id`` and ``id2lang`` attributes
|
||||
from the tokenizer.
|
||||
|
||||
Here is an example using the ``xlm-clm-enfr-1024`` checkpoint (Causal language modeling, English-French):
|
||||
|
||||
|
||||
.. code-block::
|
||||
|
||||
import torch
|
||||
from transformers import XLMTokenizer, XLMWithLMHeadModel
|
||||
|
||||
tokenizer = XLMTokenizer.from_pretrained("xlm-clm-1024-enfr")
|
||||
|
||||
|
||||
The different languages this model/tokenizer handles, as well as the ids of these languages are visible using the
|
||||
``lang2id`` attribute:
|
||||
|
||||
.. code-block::
|
||||
|
||||
print(tokenizer.lang2id) # {'en': 0, 'fr': 1}
|
||||
|
||||
|
||||
These ids should be used when passing a language parameter during a model pass. Let's define our inputs:
|
||||
|
||||
.. code-block::
|
||||
|
||||
input_ids = torch.tensor([tokenizer.encode("Wikipedia was used to")]) # batch size of 1
|
||||
|
||||
|
||||
We should now define the language embedding by using the previously defined language id. We want to create a tensor
|
||||
filled with the appropriate language ids, of the same size as input_ids. For english, the id is 0:
|
||||
|
||||
.. code-block::
|
||||
|
||||
language_id = tokenizer.lang2id['en'] # 0
|
||||
langs = torch.tensor([language_id] * input_ids.shape[1]) # torch.tensor([0, 0, 0, ..., 0])
|
||||
|
||||
# We reshape it to be of size (batch_size, sequence_length)
|
||||
langs = langs.view(1, -1) # is now of shape [1, sequence_length] (we have a batch size of 1)
|
||||
|
||||
|
||||
You can then feed it all as input to your model:
|
||||
|
||||
.. code-block::
|
||||
|
||||
outputs = model(input_ids, langs=langs)
|
||||
|
||||
|
||||
The example `run_generation.py <https://github.com/huggingface/transformers/blob/master/examples/run_generation.py>`__
|
||||
can generate text using the CLM checkpoints from XLM, using the language embeddings.
|
||||
|
||||
XLM without Language Embeddings
|
||||
------------------------------------------------
|
||||
|
||||
This section concerns the following checkpoints:
|
||||
|
||||
- ``xlm-mlm-17-1280`` (Masked language modeling, 17 languages)
|
||||
- ``xlm-mlm-100-1280`` (Masked language modeling, 100 languages)
|
||||
|
||||
These checkpoints do not require language embeddings at inference time. These models are used to have generic
|
||||
sentence representations, differently from previously-mentioned XLM checkpoints.
|
||||
|
||||
|
||||
BERT
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
BERT has two checkpoints that can be used for multi-lingual tasks:
|
||||
|
||||
- ``bert-base-multilingual-uncased`` (Masked language modeling + Next sentence prediction, 102 languages)
|
||||
- ``bert-base-multilingual-cased`` (Masked language modeling + Next sentence prediction, 104 languages)
|
||||
|
||||
These checkpoints do not require language embeddings at inference time. They should identify the language
|
||||
used in the context and infer accordingly.
|
||||
@@ -98,6 +98,12 @@ Here is the full list of the currently provided pretrained models together with
|
||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||
| | ``xlm-clm-ende-1024`` | | 6-layer, 1024-hidden, 8-heads |
|
||||
| | | | XLM English-German model trained with CLM (Causal Language Modeling) on the concatenation of English and German wikipedia |
|
||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||
| | ``xlm-mlm-17-1280`` | | 16-layer, 1280-hidden, 16-heads |
|
||||
| | | | XLM model trained with MLM (Masked Language Modeling) on 17 languages. |
|
||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||
| | ``xlm-mlm-100-1280`` | | 16-layer, 1280-hidden, 16-heads |
|
||||
| | | | XLM model trained with MLM (Masked Language Modeling) on 100 languages. |
|
||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||
| RoBERTa | ``roberta-base`` | | 12-layer, 768-hidden, 12-heads, 125M parameters |
|
||||
| | | | RoBERTa using the BERT-base architecture |
|
||||
@@ -113,11 +119,15 @@ Here is the full list of the currently provided pretrained models together with
|
||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||
| DistilBERT | ``distilbert-base-uncased`` | | 6-layer, 768-hidden, 12-heads, 66M parameters |
|
||||
| | | | The DistilBERT model distilled from the BERT model `bert-base-uncased` checkpoint |
|
||||
| | | (see `details <https://medium.com/huggingface/distilbert-8cf3380435b5>`__) |
|
||||
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
|
||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||
| | ``distilbert-base-uncased-distilled-squad`` | | 6-layer, 768-hidden, 12-heads, 66M parameters |
|
||||
| | | | The DistilBERT model distilled from the BERT model `bert-base-uncased` checkpoint, with an additional linear layer. |
|
||||
| | | (see `details <https://medium.com/huggingface/distilbert-8cf3380435b5>`__) |
|
||||
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
|
||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||
| | ``distilgpt2`` | | 6-layer, 768-hidden, 12-heads, 82M parameters |
|
||||
| | | | The DistilGPT2 model distilled from the GPT2 model `gpt2` checkpoint. |
|
||||
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
|
||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||
| CTRL | ``ctrl`` | | 48-layer, 1280-hidden, 16-heads, 1.6B parameters |
|
||||
| | | | Salesforce's Large-sized CTRL English model |
|
||||
|
||||
Reference in New Issue
Block a user