Compare commits

...

77 Commits

Author SHA1 Message Date
Lysandre
10d72390c0 Revert #4446 Since it introduces a new dependency
Some checks failed
GitHub-hosted runner / check_code_quality (push) Has been cancelled
2020-05-22 10:49:45 -04:00
Lysandre
e0db6bbd65 Release: v2.10.0 2020-05-22 10:37:44 -04:00
Frankie Liuzzi
bd6e301832 added functionality for electra classification head (#4257)
* added functionality for electra classification head

* unneeded dropout

* Test ELECTRA for sequence classification

* Style

Co-authored-by: Frankie <frankie@frase.io>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-05-22 09:48:21 -04:00
Lysandre
a086527727 Unused Union should not be imported 2020-05-21 09:42:47 -04:00
Lysandre Debut
9d2ce253de TPU hangs when saving optimizer/scheduler (#4467)
* TPU hangs when saving optimizer/scheduler

* Style

* ParallelLoader is not a DataLoader

* Style

* Addressing @julien-c's comments
2020-05-21 09:18:27 -04:00
Zhangyx
49296533ca Adds predict stage for glue tasks, and generate result files which can be submitted to gluebenchmark.com (#4463)
* Adds predict stage for glue tasks, and generate result files which could be submitted to gluebenchmark.com website.

* Use Split enum + always output the label name

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-21 09:17:44 -04:00
Tobias Lee
271bedb485 [examples] fix no grad in second pruning in run_bertology (#4479)
* fix no grad in second pruning and typo

* fix prune heads attention mismatch problem

* fix

* fix

* fix

* run make style

* run make style
2020-05-21 09:17:03 -04:00
Julien Chaumond
865d4d595e [ci] Close #4481 2020-05-20 18:27:42 -04:00
Julien Chaumond
a3af8e86cb Update test_trainer_distributed.py 2020-05-20 18:26:51 -04:00
Cola
eacea530c1 🚨 Remove warning of deprecation (#4477)
Remove warning of deprecated overload of addcdiv_

Fix #4451
2020-05-20 16:48:29 -04:00
Julien Plu
fa2fbed3e5 Better None gradients handling in TF Trainer (#4469)
* Better None gradients handling

* Apply Style

* Apply Style
2020-05-20 16:46:21 -04:00
Oliver Åstrand
e708bb75bf Correct TF formatting to exclude LayerNorms from weight decay (#4448)
* Exclude LayerNorms from weight decay

* Include both formats of layer norm
2020-05-20 16:45:59 -04:00
Rens
49c06132df pass on tokenizer to pipeline (#4489) 2020-05-20 22:23:21 +02:00
Nathan Cooper
cacb654c7f Add Fine-tune DialoGPT on new datasets notebook (#4473) 2020-05-20 16:17:52 -04:00
Timo Moeller
30a09f3827 Adjust german bert model card, add new model card (#4488) 2020-05-20 16:08:29 -04:00
Lysandre Debut
14cb5b35fa Fix slow gpu tests lysandre (#4487)
* There is one missing key in BERT

* Correct device for CamemBERT model

* RoBERTa tokenization adding prefix space

* Style
2020-05-20 11:59:45 -04:00
Manuel Romero
6dc52c78d8 Create README.md (#4482) 2020-05-20 09:45:50 -04:00
Manuel Romero
ed5456daf4 Model card for RuPERTa-base fine-tuned for NER (#4466) 2020-05-20 09:45:24 -04:00
Oleksandr Bushkovskyi
c76450e20c Model card for Tereveni-AI/gpt2-124M-uk-fiction (#4470)
Create model card for "Tereveni-AI/gpt2-124M-uk-fiction" model
2020-05-20 09:44:26 -04:00
Hu Xu
9907dc523a add BERT trained from review corpus. (#4405)
* add model_cards for BERT trained on reviews.

* add link to repository.

* refine README.md for each review model
2020-05-20 09:42:35 -04:00
Sam Shleifer
efbc1c5a9d [MarianTokenizer] implement save_vocabulary and other common methods (#4389) 2020-05-19 19:45:49 -04:00
Sam Shleifer
956c4c4eb4 [gpu slow tests] fix mbart-large-enro gpu tests (#4472) 2020-05-19 19:45:31 -04:00
Patrick von Platen
48c3a70b4e [Longformer] Docs and clean API (#4464)
* add longformer docs

* improve docs
2020-05-19 21:52:36 +02:00
Patrick von Platen
aa925a52fa [Tests, GPU, SLOW] fix a bunch of GPU hardcoded tests in Pytorch (#4468)
* fix gpu slow tests in pytorch

* change model to device syntax
2020-05-19 21:35:04 +02:00
Suraj Patil
5856999a9f add T5 fine-tuning notebook [Community notebooks] (#4462)
* add T5 fine-tuning notebook [Community notebooks]

* Update README.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-05-19 18:26:28 +02:00
Sam Shleifer
07dd7c2fd8 [cleanup] test_tokenization_common.py (#4390) 2020-05-19 10:46:55 -04:00
Iz Beltagy
8f1d047148 Longformer (#4352)
* first commit

* bug fixes

* better examples

* undo padding

* remove wrong VOCAB_FILES_NAMES

* License

* make style

* make isort happy

* unit tests

* integration test

* make `black` happy by undoing `isort` changes!!

* lint

* no need for the padding value

* batch_size not bsz

* remove unused type casting

* seqlen not seq_len

* staticmethod

* `bert` selfattention instead of `n2`

* uint8 instead of bool + lints

* pad inputs_embeds using embeddings not a constant

* black

* unit test with padding

* fix unit tests

* remove redundant unit test

* upload model weights

* resolve todo

* simpler _mask_invalid_locations without lru_cache + backward compatible masked_fill_

* increase unittest coverage
2020-05-19 16:04:43 +02:00
Girishkumar
31eedff5a0 Refactored the README.md file (#4427) 2020-05-19 09:56:24 -04:00
Shaoyen
384f0eb2f9 Map optimizer to correct device after loading from checkpoint. (#4403)
* Map optimizer to correct device after loading from checkpoint.

* Make style test pass

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-18 23:16:05 -04:00
Julien Chaumond
bf14ef75f1 [Trainer] move model to device before setting optimizer (#4450) 2020-05-18 23:13:33 -04:00
Julien Chaumond
5e7fe8b585 Distributed eval: SequentialDistributedSampler + gather all results (#4243)
* Distributed eval: SequentialDistributedSampler + gather all results

* For consistency only write to disk from world_master

Close https://github.com/huggingface/transformers/issues/4272

* Working distributed eval

* Hook into scripts

* Fix #3721 again

* TPU.mesh_reduce: stay in tensor space

Thanks @jysohn23

* Just a small comment

* whitespace

* torch.hub: pip install packaging

* Add test scenarii
2020-05-18 22:02:39 -04:00
Julien Chaumond
4c06893610 Fix nn.DataParallel compatibility in PyTorch 1.5 (#4300)
* Test case for #3936

* multigpu tests pass on pytorch 1.4.0

* Fixup

* multigpu tests pass on pytorch 1.5.0

* Update src/transformers/modeling_utils.py

* Update src/transformers/modeling_utils.py

* rename multigpu to require_multigpu

* mode doc
2020-05-18 20:34:50 -04:00
Rakesh Chada
9de4afa897 Make get_last_lr in trainer backward compatible (#4446)
* makes fetching last learning late in trainer backward compatible

* split comment to multiple lines

* fixes black styling issue

* uses version to create a more explicit logic
2020-05-18 20:17:36 -04:00
Stefan Dumitrescu
42e8fbfc51 Added model cards for Romanian BERT models (#4437)
* Create README.md

* Create README.md

* Update README.md

* Update README.md

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-18 18:48:56 -04:00
Oliver Guhr
54065d68b8 added model card for german-sentiment-bert (#4435) 2020-05-18 18:44:41 -04:00
Martin Müller
e28b7e2311 Create README.md (#4433) 2020-05-18 18:41:34 -04:00
sy-wada
09b933f19d Update README.md (model_card) (#4424)
- add a citation.
- modify the table of the BLUE benchmark.

The table of the first version was not displayed correctly on https://huggingface.co/seiya/oubiobert-base-uncased.
Could you please confirm that this fix will allow you to display it correctly?
2020-05-18 18:18:17 -04:00
Manuel Romero
235777ccc9 Modify example of usage (#4413)
I followed the google example of usage for its electra small model but i have seen it is not meaningful, so i created a better example
2020-05-18 18:17:33 -04:00
Suraj Patil
9ddd3a6548 add model card for t5-base-squad (#4409)
* add model card for t5-base-squad

* Update model_cards/valhalla/t5-base-squad/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-18 18:17:14 -04:00
HUSEIN ZOLKEPLI
c5aa114392 Added README huseinzol05/t5-base-bahasa-cased (#4377)
* add bert bahasa readme

* update readme

* update readme

* added xlnet

* added tiny-bert and fix xlnet readme

* added albert base

* added albert tiny

* added electra model

* added gpt2 117m bahasa readme

* added gpt2 345m bahasa readme

* added t5-base-bahasa

* fix readme

* Update model_cards/huseinzol05/t5-base-bahasa-cased/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-18 18:10:23 -04:00
Funtowicz Morgan
ca4a3f4da9 Adding optimizations block from ONNXRuntime. (#4431)
* Adding optimizations block from ONNXRuntime.

* Turn off external data format by default for PyTorch export.

* Correct the way use_external_format is passed through the cmdline args.
2020-05-18 20:32:33 +02:00
Patrick von Platen
24538df919 [Community notebooks] General notebooks (#4441)
* Update README.md

* Update README.md

* Update README.md

* Update README.md
2020-05-18 20:23:57 +02:00
Sam Shleifer
a699525d25 [test_pipelines] Mark tests > 10s @slow, small speedups (#4421) 2020-05-18 12:23:21 -04:00
Boris Dayma
d9ece8233d fix(run_language_modeling): use arg overwrite_cache (#4407) 2020-05-18 11:37:35 -04:00
Patrick von Platen
d39bf0ac2d better naming in tf t5 (#4401) 2020-05-18 11:34:00 -04:00
Patrick von Platen
590adb130b improve docstring (#4422) 2020-05-18 11:31:35 -04:00
Patrick von Platen
026a5d0888 [T5 fp16] Fix fp16 in T5 (#4436)
* fix fp16 in t5

* make style

* refactor invert_attention_mask fn

* fix typo
2020-05-18 17:25:58 +02:00
Soham Chatterjee
fa6113f9a0 Fixed spelling of training (#4416) 2020-05-18 11:23:29 -04:00
Julien Chaumond
757baee846 Fix un-prefixed f-string
see https://github.com/huggingface/transformers/pull/4367#discussion_r426356693

Hat/tip @girishponkiya
2020-05-18 11:20:46 -04:00
Patrick von Platen
a27c795908 fix (#4419) 2020-05-18 15:51:40 +02:00
Funtowicz Morgan
31c799a0c9 Tag onnx export tests as slow (#4432) 2020-05-18 09:24:41 -04:00
Mehrad Moradshahi
8581a670e3 [MbartTokenizer] save to sentencepiece.bpe.model (#4335) 2020-05-18 08:54:04 -04:00
Lorenzo Ampil
18d233d525 Allow the creation of "entity groups" for NerPipeline #3548 (#3957)
* Add index to be returned by NerPipeline to allow for the creation of

* Add entity groups

* Convert entity list to dict

* Add entity to entity_group_disagg atfter updating entity gorups

* Change 'group' parameter to 'grouped_entities'

* Add unit tests for grouped NER pipeline case

* Correct variable name typo for NER_FINETUNED_MODELS

* Sync grouped tests to recent test updates
2020-05-17 09:25:17 +02:00
Julien Chaumond
3e0f062106 Fix addcmul_ 2020-05-15 17:44:17 -04:00
Julien Chaumond
fc2a4c88ce Fix: one more try 2020-05-15 17:38:48 -04:00
Julien Chaumond
55bda52555 Same fix for addcmul_ 2020-05-15 17:23:48 -04:00
Julien Chaumond
ad02c961c6 Fix UserWarning: This overload of add_ is deprecated in pytorch==1.5.0 2020-05-15 17:09:11 -04:00
Julien Chaumond
15550ce0d1 [skip ci] remove local rank 2020-05-15 17:08:38 -04:00
Nikita
62427d0815 rerun notebook 02-transformers (#4341) 2020-05-15 10:33:08 -04:00
Jared T Nielsen
34706ba050 Allow for None gradients in GradientAccumulator. (#4372) 2020-05-15 09:52:00 -04:00
Lysandre Debut
edf9ac11d4 Should return overflowing information for the log (#4385) 2020-05-15 09:49:11 -04:00
Funtowicz Morgan
b908f2e9dd Attempt to unpin torch version for Github Action. (#4384) 2020-05-15 15:47:15 +02:00
Julien Chaumond
af2e6bf87c [examples] Streamline doc 2020-05-14 20:34:31 -04:00
Lysandre Debut
7defc6670f p_mask in SQuAD pre-processing (#4049)
* Better p_mask building

* Adressing @mfuntowicz comments
2020-05-14 17:07:52 -04:00
Morgan Funtowicz
84894974bd Updated ONNX notebook link in README. 2020-05-14 22:40:59 +02:00
Funtowicz Morgan
db0076a9df Conversion script to export transformers models to ONNX IR. (#4253)
* Added generic ONNX conversion script for PyTorch model.

* WIP initial TF support.

* TensorFlow/Keras ONNX export working.

* Print framework version info

* Add possibility to check the model is correctly loading on ONNX runtime.

* Remove quantization option.

* Specify ONNX opset version when exporting.

* Formatting.

* Remove unused imports.

* Make functions more generally reusable from other part of the code.

* isort happy.

* flake happy

* Export only feature-extraction for now

* Correctly check inputs order / filter before export.

* Removed task variable

* Fix invalid args call in load_graph_from_args.

* Fix invalid args call in convert.

* Fix invalid args call in infer_shapes.

* Raise exception and catch in caller function instead of exit.

* Add 04-onnx-export.ipynb notebook

* More WIP on the notebook

* Remove unused imports

* Simplify & remove unused constants.

* Export with constant_folding in PyTorch

* Let's try to put function args in the right order this time ...

* Disable external_data_format temporary

* ONNX notebook draft ready.

* Updated notebooks charts + wording

* Correct error while exporting last chart in notebook.

* Adressing @LysandreJik comment.

* Set ONNX opset to 11 as default value.

* Set opset param mandatory

* Added ONNX export unittests

* Quality.

* flake8 happy

* Add keras2onnx dependency on extras["tf"]

* Pin keras2onnx on github master to v1.6.5

* Second attempt.

* Third attempt.

* Use the right repo URL this time ...

* Do the same for onnxconverter-common

* Added keras2onnx and onnxconveter-common to 1.7.0 to supports TF2.2

* Correct commit hash.

* Addressing PR review: Optimization are enabled by default.

* Addressing PR review: small changes in the notebook

* setup.py comment about keras2onnx versioning.
2020-05-14 16:35:52 -04:00
Suraj Patil
2d05480174 Fix trainer evaluation (#4363)
* fix loss calculation in evaluation

* fix evaluation on TPU when prediction_loss_only is True
2020-05-14 14:39:44 -04:00
Savaş Yıldırım
035678efdb Create README.md (#4359)
* Create README.md

* Update model_cards/savasy/bert-base-turkish-squad/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-14 14:07:32 -04:00
sy-wada
b9c9e05381 Create README.md (#4357) 2020-05-14 14:06:10 -04:00
Sam Shleifer
9535bf1977 Tokenizer.batch_decode convenience method (#4159) 2020-05-14 13:50:47 -04:00
Sam Shleifer
7822cd38a0 [tests] make pipelines tests faster with smaller models (#4238)
covers torch and tf. Also fixes a failing @slow test
2020-05-14 13:36:02 -04:00
Julien Chaumond
448c467256 Fix: unpin flake8 and fix cs errors (#4367)
* Fix: unpin flake8 and fix cs errors

* Ok we still need to quote those
2020-05-14 13:14:26 -04:00
Julien Chaumond
c547f15a17 Use Filelock to ensure distributed barriers
see context in https://github.com/huggingface/transformers/pull/4223
2020-05-14 11:58:32 -04:00
Julien Chaumond
015f7812ed [ci skip] Pin isort 2020-05-14 10:12:18 -04:00
Lysandre Debut
ef46ccb05c TPU needs a rendezvous (#4339) 2020-05-14 08:59:52 -04:00
Viktor Alm
94cb73c2d2 Add image and metadata (#4345)
Unfortunately i accidentally orphaned my other PR
2020-05-13 20:05:15 -04:00
Manuel Romero
a0eebdc404 Add link to W&B to see whole training logs (#4348) 2020-05-13 20:04:57 -04:00
118 changed files with 4560 additions and 939 deletions

View File

@@ -21,7 +21,7 @@ jobs:
- name: Install dependencies
run: |
pip install torch
pip install numpy tokenizers filelock requests tqdm regex sentencepiece sacremoses
pip install numpy tokenizers filelock requests tqdm regex sentencepiece sacremoses packaging
- name: Torch hub list
run: |

View File

@@ -35,7 +35,7 @@ jobs:
- name: Install dependencies
run: |
source .env/bin/activate
pip install torch==1.4.0
pip install torch
pip install .[sklearn,testing]
- name: Are GPUs recognized by our DL frameworks

View File

@@ -198,11 +198,12 @@ Follow these steps to start contributing:
are useful to avoid duplicated work, and to differentiate it from PRs ready
to be merged;
4. Make sure existing tests pass;
5. Add high-coverage tests. No quality test, no merge.
5. Add high-coverage tests. No quality testing = no merge.
- If you are adding a new model, make sure that you use `ModelTester.all_model_classes = (MyModel, MyModelWithLMHead,...)`, which triggers the common tests.
- If you are adding new `@slow` tests, make sure they pass using `RUN_SLOW=1 python -m pytest tests/test_my_new_model.py`.
- If you are adding a new tokenizer, write tests, and make sure `RUN_SLOW=1 python -m pytest tests/test_tokenization_{your_model_name}.py` passes.
CircleCI does not run them.
6. All public methods must have informative docstrings;
6. All public methods must have informative docstrings that work nicely with sphinx. See `modeling_ctrl.py` for an example.
### Tests

View File

@@ -165,8 +165,9 @@ At some point in the future, you'll be able to seamlessly move from pre-training
18. **[DialoGPT](https://huggingface.co/transformers/model_doc/dialogpt.html)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
19. **[Reformer](https://huggingface.co/transformers/model_doc/reformer.html)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
20. **[MarianMT](https://huggingface.co/transformers/model_doc/marian.html)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
21. **[Other community models](https://huggingface.co/models)**, contributed by the [community](https://huggingface.co/users).
22. Want to contribute a new model? We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedbacks before starting your PR.
21. **[Longformer](https://huggingface.co/transformers/model_doc/longformer.html)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
22. **[Other community models](https://huggingface.co/models)**, contributed by the [community](https://huggingface.co/users).
23. Want to contribute a new model? We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedbacks before starting your PR.
These implementations have been tested on several datasets (see the example scripts) and should match the performances of the original implementations (e.g. ~93 F1 on SQuAD for BERT Whole-Word-Masking, ~88 F1 on RocStories for OpenAI GPT, ~18.3 perplexity on WikiText 103 for Transformer-XL, ~0.916 Peason R coefficient on STS-B for XLNet). You can find more details on the performances in the Examples section of the [documentation](https://huggingface.co/transformers/examples.html).

View File

@@ -26,7 +26,7 @@ author = u'huggingface'
# The short X.Y version
version = u''
# The full version, including alpha/beta/rc tags
release = u'2.9.1'
release = u'2.10.0'
# -- General configuration ---------------------------------------------------

View File

@@ -109,3 +109,4 @@ The library currently contains PyTorch and Tensorflow implementations, pre-train
model_doc/dialogpt
model_doc/reformer
model_doc/marian
model_doc/longformer

View File

@@ -6,7 +6,7 @@ Overview
The ALBERT model was proposed in `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations <https://arxiv.org/abs/1909.11942>`_
by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. It presents
two parameter-reduction techniques to lower memory consumption and increase the trainig speed of BERT:
two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT:
- Splitting the embedding matrix into two smaller matrices
- Using repeating layers split among groups

View File

@@ -0,0 +1,69 @@
Longformer
----------------------------------------------------
**DISCLAIMER:** This model is still a work in progress, if you see something strange,
file a `Github Issue <https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`_
Overview
~~~~~
The Longformer model was presented in `Longformer: The Long-Document Transformer <https://arxiv.org/pdf/2004.05150.pdf>`_ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
Here the abstract:
*Transformer-based models are unable to process long sequences due to their self-attention operation, which scales quadratically with the sequence length. To address this limitation, we introduce the Longformer with an attention mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or longer. Longformer's attention mechanism is a drop-in replacement for the standard self-attention and combines a local windowed attention with a task motivated global attention. Following prior work on long-sequence transformers, we evaluate Longformer on character-level language modeling and achieve state-of-the-art results on text8 and enwik8. In contrast to most prior work, we also pretrain Longformer and finetune it on a variety of downstream tasks. Our pretrained Longformer consistently outperforms RoBERTa on long document tasks and sets new state-of-the-art results on WikiHop and TriviaQA.*
The Authors' code can be found `here <https://github.com/allenai/longformer>`_ .
Longformer Self Attention
~~~~~~~~~~~~~~~~~~~~
Longformer self attention employs self attention on both a "local" context and a "global" context.
Most tokens only attend "locally" to each other meaning that each token attends to its :math:`\frac{1}{2} w` previous tokens and :math:`\frac{1}{2} w` succeding tokens with :math:`w` being the window length as defined in `config.attention_window`. Note that `config.attention_window` can be of type ``list`` to define a different :math:`w` for each layer.
A selecetd few tokens attend "globally" to all other tokens, as it is conventionally done for all tokens in *e.g.* `BertSelfAttention`.
Note that "locally" and "globally" attending tokens are projected by different query, key and value matrices.
Also note that every "locally" attending token not only attends to tokens within its window :math:`w`, but also to all "globally" attending tokens so that global attention is *symmetric*.
The user can define which tokens are masked, which tokens attend "locally" and which tokens attend "globally" by setting the `config.attention_mask` `torch.Tensor` appropriately. In contrast to other models `Longformer` accepts the following values in `config.attention_mask`: `0` - the token is masked and not attended at all (as is done in other models), `1` - the token attends "locally", `2` - token attends "globally". For more information please also refer to :func:`~transformers.LongformerModel.forward` method.
Using Longformer self attention, the memory and time complexity of the query-key matmul operation, which usually represents the memory and time bottleneck, can be reduced from :math:`\mathcal{O}(n_s \times n_s)` to :math:`\mathcal{O}(n_s \times w)`, with :math:`n_s` being the sequence length and :math:`w` being the average window size. It is assumed that the number of "globally" attending tokens is insignificant as compared to the number of "locally" attending tokens.
For more information, please refer to the official `paper <https://arxiv.org/pdf/2004.05150.pdf>`_ .
Training
~~~~~~~~~~~~~~~~~~~~
``LongformerForMaskedLM`` is trained the exact same way, ``RobertaForMaskedLM`` is trained and
should be used as follows:
::
input_ids = tokenizer.encode('This is a sentence from [MASK] training data', return_tensors='pt')
mlm_labels = tokenizer.encode('This is a sentence from the training data', return_tensors='pt')
loss = model(input_ids, labels=input_ids, masked_lm_labels=mlm_labels)[0]
LongformerConfig
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerConfig
:members:
LongformerTokenizer
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerTokenizer
:members:
LongformerModel
~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerModel
:members:
LongformerForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerForMaskedLM
:members:

View File

@@ -305,3 +305,9 @@ For a list that includes community-uploaded models, refer to `https://huggingfac
| MarianMT | ``Helsinki-NLP/opus-mt-{src}-{tgt}`` | | 12-layer, 512-hidden, 8-heads, ~74M parameter Machine translation models. Parameter counts vary depending on vocab size. |
| | | | (see `model list <https://huggingface.co/Helsinki-NLP>`_) |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| Longformer | ``longformer-base-4096`` | | 12-layer, 768-hidden, 12-heads, ~149M parameters |
| | | | Starting from RoBERTa-base checkpoint, trained on documents of max length 4,096 |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``longformer-large-4096`` | | 24-layer, 1024-hidden, 16-heads, ~435M parameters |
| | | | Starting from RoBERTa-large checkpoint, trained on documents of max length 4,096 |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+

View File

@@ -1,6 +1,6 @@
# Examples
## Examples
Version 2.9 of `transformers` introduces a new `Trainer` class for PyTorch, and its equivalent `TFTrainer` for TF 2.
Version 2.9 of `transformers` introduces a new [`Trainer`](https://github.com/huggingface/transformers/blob/master/src/transformers/trainer.py) class for PyTorch, and its equivalent [`TFTrainer`](https://github.com/huggingface/transformers/blob/master/src/transformers/trainer_tf.py) for TF 2.
Here is the list of all our examples:
- **grouped by task** (all official examples work for multiple models)
@@ -12,32 +12,24 @@ Here is the list of all our examples:
This is still a work-in-progress in particular documentation is still sparse so please **contribute improvements/pull requests.**
## Tasks built on Trainer
# The Big Table of Tasks
| Task | Example datasets | Trainer support | TFTrainer support | pytorch-lightning | Colab | One-click Deploy to Azure (wip) |
|---|---|:---:|:---:|:---:|:---:|:---:|
| [`language-modeling`](./language-modeling) | Raw text | ✅ | - | - | - | - |
| [`text-classification`](./text-classification) | GLUE, XNLI | ✅ | ✅ | ✅ | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/trainer/01_text_classification.ipynb) | [![Deploy to Azure](https://aka.ms/deploytoazurebutton)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure%2Fazure-quickstart-templates%2Fmaster%2F101-storage-account-create%2Fazuredeploy.json) |
| [`token-classification`](./token-classification) | CoNLL NER | ✅ | ✅ | ✅ | - | - |
| [`multiple-choice`](./multiple-choice) | SWAG, RACE, ARC | ✅ | ✅ | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ViktorAlm/notebooks/blob/master/MPC_GPU_Demo_for_TF_and_PT.ipynb) | - |
| [`question-answering`](./question-answering) | SQuAD | - | ✅ | - | - | - |
| Task | Example datasets | Trainer support | TFTrainer support | pytorch-lightning | Colab
|---|---|:---:|:---:|:---:|:---:|
| [**`language-modeling`**](./language-modeling) | Raw text | ✅ | - | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb)
| [**`text-classification`**](./text-classification) | GLUE, XNLI | ✅ | ✅ | ✅ | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/trainer/01_text_classification.ipynb)
| [**`token-classification`**](./token-classification) | CoNLL NER | ✅ | ✅ | ✅ | -
| [**`multiple-choice`**](./multiple-choice) | SWAG, RACE, ARC | ✅ | ✅ | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ViktorAlm/notebooks/blob/master/MPC_GPU_Demo_for_TF_and_PT.ipynb)
| [**`question-answering`**](./question-answering) | SQuAD | - | | - | -
| [**`text-generation`**](./text-generation) | - | - | - | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/02_how_to_generate.ipynb)
| [**`distillation`**](./distillation) | All | - | - | - | -
| [**`summarization`**](./summarization) | CNN/Daily Mail | - | - | - | -
| [**`translation`**](./translation) | WMT | - | - | - | -
| [**`bertology`**](./bertology) | - | - | - | - | -
| [**`adversarial`**](./adversarial) | HANS | - | - | - | -
## Other examples and how-to's
| Section | Description |
|---|---|
| [TensorFlow 2.0 models on GLUE](./text-classification) | Examples running BERT TensorFlow 2.0 model on the GLUE tasks. |
| [Running on TPUs](#running-on-tpus) | Examples on running fine-tuning tasks on Google TPUs to accelerate workloads. |
| [Language Model training](./language-modeling) | Fine-tuning (or training from scratch) the library models for language modeling on a text dataset. Causal language modeling for GPT/GPT-2, masked language modeling for BERT/RoBERTa. |
| [Language Generation](./text-generation) | Conditional text generation using the auto-regressive models of the library: GPT, GPT-2, Transformer-XL and XLNet. |
| [GLUE](./text-classification) | Examples running BERT/XLM/XLNet/RoBERTa on the 9 GLUE tasks. Examples feature distributed training as well as half-precision. |
| [SQuAD](./question-answering) | Using BERT/RoBERTa/XLNet/XLM for question answering, examples with distributed training. |
| [Multiple Choice](./multiple-choice) | Examples running BERT/XLNet/RoBERTa on the SWAG/RACE/ARC tasks. |
| [Named Entity Recognition](./token-classification) | Using BERT for Named Entity Recognition (NER) on the CoNLL 2003 dataset, examples with distributed training. |
| [XNLI](./text-classification) | Examples running BERT/XLM on the XNLI benchmark. |
| [Adversarial evaluation of model performances](./adversarial) | Testing a model with adversarial evaluation of natural language inference on the Heuristic Analysis for NLI Systems (HANS) dataset (McCoy et al., 2019.) |
<br>
## Important note
@@ -52,6 +44,12 @@ pip install .
pip install -r ./examples/requirements.txt
```
## One-click Deploy to Cloud (wip)
#### Azure
[![Deploy to Azure](https://aka.ms/deploytoazurebutton)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure%2Fazure-quickstart-templates%2Fmaster%2F101-storage-account-create%2Fazuredeploy.json)
## Running on TPUs
When using Tensorflow, TPUs are supported out of the box as a `tf.distribute.Strategy`.

View File

@@ -478,7 +478,7 @@ def _compute_pytorch(
dictionary[model_name]["memory"][batch_size][slice_size] = "N/A"
if not no_speed:
print_fn("Going through model with sequence of shape".format(sequence.shape))
print_fn("Going through model with sequence of shape {}".format(sequence.shape))
runtimes = timeit.repeat(lambda: inference(sequence), repeat=average_over, number=3)
average_time = sum(runtimes) / float(len(runtimes)) / 3.0
dictionary[model_name]["time"][batch_size][slice_size] = average_time

View File

@@ -64,7 +64,7 @@ def print_2d_tensor(tensor):
def compute_heads_importance(
args, model, eval_dataloader, compute_entropy=True, compute_importance=True, head_mask=None
args, model, eval_dataloader, compute_entropy=True, compute_importance=True, head_mask=None, actually_pruned=False
):
""" This method shows how to compute:
- head attention entropy
@@ -77,7 +77,12 @@ def compute_heads_importance(
if head_mask is None:
head_mask = torch.ones(n_layers, n_heads).to(args.device)
head_mask.requires_grad_(requires_grad=True)
# If actually pruned attention multi-head, set head mask to None to avoid shape mismatch
if actually_pruned:
head_mask = None
preds = None
labels = None
tot_tokens = 0.0
@@ -172,6 +177,7 @@ def mask_heads(args, model, eval_dataloader):
new_head_mask = new_head_mask.view(-1)
new_head_mask[current_heads_to_mask] = 0.0
new_head_mask = new_head_mask.view_as(head_mask)
new_head_mask = new_head_mask.clone().detach()
print_2d_tensor(new_head_mask)
# Compute metric and head importance again
@@ -181,7 +187,7 @@ def mask_heads(args, model, eval_dataloader):
preds = np.argmax(preds, axis=1) if args.output_mode == "classification" else np.squeeze(preds)
current_score = glue_compute_metrics(args.task_name, preds, labels)[args.metric_name]
logger.info(
"Masking: current score: %f, remaning heads %d (%.1f percents)",
"Masking: current score: %f, remaining heads %d (%.1f percents)",
current_score,
new_head_mask.sum(),
new_head_mask.sum() / new_head_mask.numel() * 100,
@@ -209,14 +215,23 @@ def prune_heads(args, model, eval_dataloader, head_mask):
original_time = datetime.now() - before_time
original_num_params = sum(p.numel() for p in model.parameters())
heads_to_prune = dict((layer, (1 - head_mask[layer].long()).nonzero().tolist()) for layer in range(len(head_mask)))
heads_to_prune = dict(
(layer, (1 - head_mask[layer].long()).nonzero().squeeze().tolist()) for layer in range(len(head_mask))
)
assert sum(len(h) for h in heads_to_prune.values()) == (1 - head_mask.long()).sum().item()
model.prune_heads(heads_to_prune)
pruned_num_params = sum(p.numel() for p in model.parameters())
before_time = datetime.now()
_, _, preds, labels = compute_heads_importance(
args, model, eval_dataloader, compute_entropy=False, compute_importance=False, head_mask=None
args,
model,
eval_dataloader,
compute_entropy=False,
compute_importance=False,
head_mask=None,
actually_pruned=True,
)
preds = np.argmax(preds, axis=1) if args.output_mode == "classification" else np.squeeze(preds)
score_pruning = glue_compute_metrics(args.task_name, preds, labels)[args.metric_name]
@@ -404,7 +419,7 @@ def main():
logger.info("Training/evaluation parameters %s", args)
# Prepare dataset for the GLUE task
eval_dataset = GlueDataset(args, tokenizer=tokenizer, evaluate=True)
eval_dataset = GlueDataset(args, tokenizer=tokenizer, mode="dev")
if args.data_subset > 0:
eval_dataset = Subset(eval_dataset, list(range(min(args.data_subset, len(eval_dataset)))))
eval_sampler = SequentialSampler(eval_dataset) if args.local_rank == -1 else DistributedSampler(eval_dataset)

View File

@@ -80,7 +80,7 @@ def main():
# Load a pre-trained model
model = TransfoXLLMHeadModel.from_pretrained(args.model_name)
model = model.to(device)
model.to(device)
logger.info(
"Evaluating with bsz {} tgt_len {} ext_len {} mem_len {} clamp_len {}".format(

View File

@@ -80,7 +80,7 @@ class Distiller:
self.mlm = params.mlm
if self.mlm:
logger.info(f"Using MLM loss for LM step.")
logger.info("Using MLM loss for LM step.")
self.mlm_mask_prop = params.mlm_mask_prop
assert 0.0 <= self.mlm_mask_prop <= 1.0
assert params.word_mask + params.word_keep + params.word_rand == 1.0
@@ -91,7 +91,7 @@ class Distiller:
self.pred_probs = self.pred_probs.half()
self.token_probs = self.token_probs.half()
else:
logger.info(f"Using CLM loss for LM step.")
logger.info("Using CLM loss for LM step.")
self.epoch = 0
self.n_iter = 0
@@ -365,8 +365,8 @@ class Distiller:
self.end_epoch()
if self.is_master:
logger.info(f"Save very last checkpoint as `pytorch_model.bin`.")
self.save_checkpoint(checkpoint_name=f"pytorch_model.bin")
logger.info("Save very last checkpoint as `pytorch_model.bin`.")
self.save_checkpoint(checkpoint_name="pytorch_model.bin")
logger.info("Training is finished")
def step(self, input_ids: torch.tensor, attention_mask: torch.tensor, lm_labels: torch.tensor):

View File

@@ -60,7 +60,7 @@ def main():
with open(args.file_path, "r", encoding="utf8") as fp:
data = fp.readlines()
logger.info(f"Start encoding")
logger.info("Start encoding")
logger.info(f"{len(data)} examples to process.")
rslt = []

View File

@@ -93,7 +93,7 @@ if __name__ == "__main__":
elif args.model_type == "gpt2":
for w in ["weight", "bias"]:
compressed_sd[f"{prefix}.ln_f.{w}"] = state_dict[f"{prefix}.ln_f.{w}"]
compressed_sd[f"lm_head.weight"] = state_dict[f"lm_head.weight"]
compressed_sd["lm_head.weight"] = state_dict["lm_head.weight"]
print(f"N layers selected for distillation: {std_idx}")
print(f"Number of params transfered for distillation: {len(compressed_sd.keys())}")

View File

@@ -37,7 +37,7 @@ if __name__ == "__main__":
model = BertForMaskedLM.from_pretrained(args.model_name)
prefix = "bert"
else:
raise ValueError(f'args.model_type should be "bert".')
raise ValueError('args.model_type should be "bert".')
state_dict = model.state_dict()
compressed_sd = {}
@@ -78,8 +78,8 @@ if __name__ == "__main__":
]
std_idx += 1
compressed_sd[f"vocab_projector.weight"] = state_dict[f"cls.predictions.decoder.weight"]
compressed_sd[f"vocab_projector.bias"] = state_dict[f"cls.predictions.bias"]
compressed_sd["vocab_projector.weight"] = state_dict["cls.predictions.decoder.weight"]
compressed_sd["vocab_projector.bias"] = state_dict["cls.predictions.bias"]
if args.vocab_transform:
for w in ["weight", "bias"]:
compressed_sd[f"vocab_transform.{w}"] = state_dict[f"cls.predictions.transform.dense.{w}"]

View File

@@ -273,7 +273,7 @@ def main():
token_probs = None
train_lm_seq_dataset = LmSeqsDataset(params=args, data=data)
logger.info(f"Data loader created.")
logger.info("Data loader created.")
# STUDENT #
logger.info(f"Loading student config from {args.student_config}")
@@ -288,7 +288,7 @@ def main():
if args.n_gpu > 0:
student.to(f"cuda:{args.local_rank}")
logger.info(f"Student loaded.")
logger.info("Student loaded.")
# TEACHER #
teacher = teacher_model_class.from_pretrained(args.teacher_name, output_hidden_states=True)

View File

@@ -115,15 +115,13 @@ class DataTrainingArguments:
)
def get_dataset(args: DataTrainingArguments, tokenizer: PreTrainedTokenizer, evaluate=False, local_rank=-1):
def get_dataset(args: DataTrainingArguments, tokenizer: PreTrainedTokenizer, evaluate=False):
file_path = args.eval_data_file if evaluate else args.train_data_file
if args.line_by_line:
return LineByLineTextDataset(
tokenizer=tokenizer, file_path=file_path, block_size=args.block_size, local_rank=local_rank
)
return LineByLineTextDataset(tokenizer=tokenizer, file_path=file_path, block_size=args.block_size)
else:
return TextDataset(
tokenizer=tokenizer, file_path=file_path, block_size=args.block_size, local_rank=local_rank,
tokenizer=tokenizer, file_path=file_path, block_size=args.block_size, overwrite_cache=args.overwrite_cache
)
@@ -220,16 +218,9 @@ def main():
data_args.block_size = min(data_args.block_size, tokenizer.max_len)
# Get datasets
train_dataset = (
get_dataset(data_args, tokenizer=tokenizer, local_rank=training_args.local_rank)
if training_args.do_train
else None
)
eval_dataset = (
get_dataset(data_args, tokenizer=tokenizer, local_rank=training_args.local_rank, evaluate=True)
if training_args.do_eval
else None
)
train_dataset = get_dataset(data_args, tokenizer=tokenizer) if training_args.do_train else None
eval_dataset = get_dataset(data_args, tokenizer=tokenizer, evaluate=True) if training_args.do_eval else None
data_collator = DataCollatorForLanguageModeling(
tokenizer=tokenizer, mlm=data_args.mlm, mlm_probability=data_args.mlm_probability
)
@@ -260,7 +251,7 @@ def main():
# Evaluation
results = {}
if training_args.do_eval and training_args.local_rank in [-1, 0]:
if training_args.do_eval:
logger.info("*** Evaluate ***")
eval_output = trainer.evaluate()
@@ -269,11 +260,12 @@ def main():
result = {"perplexity": perplexity}
output_eval_file = os.path.join(training_args.output_dir, "eval_results_lm.txt")
with open(output_eval_file, "w") as writer:
logger.info("***** Eval results *****")
for key in sorted(result.keys()):
logger.info(" %s = %s", key, str(result[key]))
writer.write("%s = %s\n" % (key, str(result[key])))
if trainer.is_world_master():
with open(output_eval_file, "w") as writer:
logger.info("***** Eval results *****")
for key in sorted(result.keys()):
logger.info(" %s = %s", key, str(result[key]))
writer.write("%s = %s\n" % (key, str(result[key])))
results.update(result)

View File

@@ -159,7 +159,6 @@ def main():
max_seq_length=data_args.max_seq_length,
overwrite_cache=data_args.overwrite_cache,
mode=Split.train,
local_rank=training_args.local_rank,
)
if training_args.do_train
else None
@@ -172,7 +171,6 @@ def main():
max_seq_length=data_args.max_seq_length,
overwrite_cache=data_args.overwrite_cache,
mode=Split.dev,
local_rank=training_args.local_rank,
)
if training_args.do_eval
else None
@@ -204,19 +202,20 @@ def main():
# Evaluation
results = {}
if training_args.do_eval and training_args.local_rank in [-1, 0]:
if training_args.do_eval:
logger.info("*** Evaluate ***")
result = trainer.evaluate()
output_eval_file = os.path.join(training_args.output_dir, "eval_results.txt")
with open(output_eval_file, "w") as writer:
logger.info("***** Eval results *****")
for key, value in result.items():
logger.info(" %s = %s", key, value)
writer.write("%s = %s\n" % (key, value))
if trainer.is_world_master():
with open(output_eval_file, "w") as writer:
logger.info("***** Eval results *****")
for key, value in result.items():
logger.info(" %s = %s", key, value)
writer.write("%s = %s\n" % (key, value))
results.update(result)
results.update(result)
return results

View File

@@ -26,6 +26,7 @@ from enum import Enum
from typing import List, Optional
import tqdm
from filelock import FileLock
from transformers import PreTrainedTokenizer, is_tf_available, is_torch_available
@@ -77,7 +78,6 @@ class Split(Enum):
if is_torch_available():
import torch
from torch.utils.data.dataset import Dataset
from transformers import torch_distributed_zero_first
class MultipleChoiceDataset(Dataset):
"""
@@ -95,7 +95,6 @@ if is_torch_available():
max_seq_length: Optional[int] = None,
overwrite_cache=False,
mode: Split = Split.train,
local_rank=-1,
):
processor = processors[task]()
@@ -103,9 +102,11 @@ if is_torch_available():
data_dir,
"cached_{}_{}_{}_{}".format(mode.value, tokenizer.__class__.__name__, str(max_seq_length), task,),
)
with torch_distributed_zero_first(local_rank):
# Make sure only the first process in distributed training processes the dataset,
# and the others will use the cache.
# Make sure only the first process in distributed training processes the dataset,
# and the others will use the cache.
lock_path = cached_features_file + ".lock"
with FileLock(lock_path):
if os.path.exists(cached_features_file) and not overwrite_cache:
logger.info(f"Loading features from cached file {cached_features_file}")
@@ -130,9 +131,8 @@ if is_torch_available():
pad_token=tokenizer.pad_token_id,
pad_token_segment_id=tokenizer.pad_token_type_id,
)
if local_rank in [-1, 0]:
logger.info("Saving features into cached file %s", cached_features_file)
torch.save(self.features, cached_features_file)
logger.info("Saving features into cached file %s", cached_features_file)
torch.save(self.features, cached_features_file)
def __len__(self):
return len(self.features)
@@ -535,7 +535,12 @@ def convert_examples_to_features(
text_b = example.question + " " + ending
inputs = tokenizer.encode_plus(
text_a, text_b, add_special_tokens=True, max_length=max_length, pad_to_max_length=True,
text_a,
text_b,
add_special_tokens=True,
max_length=max_length,
pad_to_max_length=True,
return_overflowing_tokens=True,
)
if "num_truncated_tokens" in inputs and inputs["num_truncated_tokens"] > 0:
logger.info(

View File

@@ -135,7 +135,8 @@ def main():
# Get datasets
train_dataset = GlueDataset(data_args, tokenizer=tokenizer) if training_args.do_train else None
eval_dataset = GlueDataset(data_args, tokenizer=tokenizer, evaluate=True) if training_args.do_eval else None
eval_dataset = GlueDataset(data_args, tokenizer=tokenizer, mode="dev") if training_args.do_eval else None
test_dataset = GlueDataset(data_args, tokenizer=tokenizer, mode="test") if training_args.do_predict else None
def compute_metrics(p: EvalPrediction) -> Dict:
if output_mode == "classification":
@@ -165,31 +166,57 @@ def main():
tokenizer.save_pretrained(training_args.output_dir)
# Evaluation
results = {}
if training_args.do_eval and training_args.local_rank in [-1, 0]:
eval_results = {}
if training_args.do_eval:
logger.info("*** Evaluate ***")
# Loop to handle MNLI double evaluation (matched, mis-matched)
eval_datasets = [eval_dataset]
if data_args.task_name == "mnli":
mnli_mm_data_args = dataclasses.replace(data_args, task_name="mnli-mm")
eval_datasets.append(GlueDataset(mnli_mm_data_args, tokenizer=tokenizer, evaluate=True))
eval_datasets.append(GlueDataset(mnli_mm_data_args, tokenizer=tokenizer, mode="dev"))
for eval_dataset in eval_datasets:
result = trainer.evaluate(eval_dataset=eval_dataset)
eval_result = trainer.evaluate(eval_dataset=eval_dataset)
output_eval_file = os.path.join(
training_args.output_dir, f"eval_results_{eval_dataset.args.task_name}.txt"
)
with open(output_eval_file, "w") as writer:
logger.info("***** Eval results {} *****".format(eval_dataset.args.task_name))
for key, value in result.items():
logger.info(" %s = %s", key, value)
writer.write("%s = %s\n" % (key, value))
if trainer.is_world_master():
with open(output_eval_file, "w") as writer:
logger.info("***** Eval results {} *****".format(eval_dataset.args.task_name))
for key, value in eval_result.items():
logger.info(" %s = %s", key, value)
writer.write("%s = %s\n" % (key, value))
results.update(result)
eval_results.update(eval_result)
return results
if training_args.do_predict:
logging.info("*** Test ***")
test_datasets = [test_dataset]
if data_args.task_name == "mnli":
mnli_mm_data_args = dataclasses.replace(data_args, task_name="mnli-mm")
test_datasets.append(GlueDataset(mnli_mm_data_args, tokenizer=tokenizer, mode="test"))
for test_dataset in test_datasets:
predictions = trainer.predict(test_dataset=test_dataset).predictions
if output_mode == "classification":
predictions = np.argmax(predictions, axis=1)
output_test_file = os.path.join(
training_args.output_dir, f"test_results_{test_dataset.args.task_name}.txt"
)
if trainer.is_world_master():
with open(output_test_file, "w") as writer:
logger.info("***** Test results {} *****".format(test_dataset.args.task_name))
writer.write("index\tprediction\n")
for index, item in enumerate(predictions):
if output_mode == "regression":
writer.write("%d\t%3.3f\n" % (index, item))
else:
item = test_dataset.get_labels()[item]
writer.write("%d\t%s\n" % (index, item))
return eval_results
def _mp_fn(index):

View File

@@ -171,7 +171,6 @@ def main():
max_seq_length=data_args.max_seq_length,
overwrite_cache=data_args.overwrite_cache,
mode=Split.train,
local_rank=training_args.local_rank,
)
if training_args.do_train
else None
@@ -185,7 +184,6 @@ def main():
max_seq_length=data_args.max_seq_length,
overwrite_cache=data_args.overwrite_cache,
mode=Split.dev,
local_rank=training_args.local_rank,
)
if training_args.do_eval
else None
@@ -237,22 +235,23 @@ def main():
# Evaluation
results = {}
if training_args.do_eval and training_args.local_rank in [-1, 0]:
if training_args.do_eval:
logger.info("*** Evaluate ***")
result = trainer.evaluate()
output_eval_file = os.path.join(training_args.output_dir, "eval_results.txt")
with open(output_eval_file, "w") as writer:
logger.info("***** Eval results *****")
for key, value in result.items():
logger.info(" %s = %s", key, value)
writer.write("%s = %s\n" % (key, value))
if trainer.is_world_master():
with open(output_eval_file, "w") as writer:
logger.info("***** Eval results *****")
for key, value in result.items():
logger.info(" %s = %s", key, value)
writer.write("%s = %s\n" % (key, value))
results.update(result)
# Predict
if training_args.do_predict and training_args.local_rank in [-1, 0]:
if training_args.do_predict:
test_dataset = NerDataset(
data_dir=data_args.data_dir,
tokenizer=tokenizer,
@@ -261,33 +260,36 @@ def main():
max_seq_length=data_args.max_seq_length,
overwrite_cache=data_args.overwrite_cache,
mode=Split.test,
local_rank=training_args.local_rank,
)
predictions, label_ids, metrics = trainer.predict(test_dataset)
preds_list, _ = align_predictions(predictions, label_ids)
output_test_results_file = os.path.join(training_args.output_dir, "test_results.txt")
with open(output_test_results_file, "w") as writer:
for key, value in metrics.items():
logger.info(" %s = %s", key, value)
writer.write("%s = %s\n" % (key, value))
if trainer.is_world_master():
with open(output_test_results_file, "w") as writer:
for key, value in metrics.items():
logger.info(" %s = %s", key, value)
writer.write("%s = %s\n" % (key, value))
# Save predictions
output_test_predictions_file = os.path.join(training_args.output_dir, "test_predictions.txt")
with open(output_test_predictions_file, "w") as writer:
with open(os.path.join(data_args.data_dir, "test.txt"), "r") as f:
example_id = 0
for line in f:
if line.startswith("-DOCSTART-") or line == "" or line == "\n":
writer.write(line)
if not preds_list[example_id]:
example_id += 1
elif preds_list[example_id]:
output_line = line.split()[0] + " " + preds_list[example_id].pop(0) + "\n"
writer.write(output_line)
else:
logger.warning("Maximum sequence length exceeded: No prediction for '%s'.", line.split()[0])
if trainer.is_world_master():
with open(output_test_predictions_file, "w") as writer:
with open(os.path.join(data_args.data_dir, "test.txt"), "r") as f:
example_id = 0
for line in f:
if line.startswith("-DOCSTART-") or line == "" or line == "\n":
writer.write(line)
if not preds_list[example_id]:
example_id += 1
elif preds_list[example_id]:
output_line = line.split()[0] + " " + preds_list[example_id].pop(0) + "\n"
writer.write(output_line)
else:
logger.warning(
"Maximum sequence length exceeded: No prediction for '%s'.", line.split()[0]
)
return results

View File

@@ -22,6 +22,8 @@ from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Union
from filelock import FileLock
from transformers import PreTrainedTokenizer, is_tf_available, is_torch_available
@@ -68,7 +70,6 @@ if is_torch_available():
import torch
from torch import nn
from torch.utils.data.dataset import Dataset
from transformers import torch_distributed_zero_first
class NerDataset(Dataset):
"""
@@ -90,16 +91,16 @@ if is_torch_available():
max_seq_length: Optional[int] = None,
overwrite_cache=False,
mode: Split = Split.train,
local_rank=-1,
):
# Load data features from cache or dataset file
cached_features_file = os.path.join(
data_dir, "cached_{}_{}_{}".format(mode.value, tokenizer.__class__.__name__, str(max_seq_length)),
)
with torch_distributed_zero_first(local_rank):
# Make sure only the first process in distributed training processes the dataset,
# and the others will use the cache.
# Make sure only the first process in distributed training processes the dataset,
# and the others will use the cache.
lock_path = cached_features_file + ".lock"
with FileLock(lock_path):
if os.path.exists(cached_features_file) and not overwrite_cache:
logger.info(f"Loading features from cached file {cached_features_file}")
@@ -125,9 +126,8 @@ if is_torch_available():
pad_token_segment_id=tokenizer.pad_token_type_id,
pad_token_label_id=self.pad_token_label_id,
)
if local_rank in [-1, 0]:
logger.info(f"Saving features into cached file {cached_features_file}")
torch.save(self.features, cached_features_file)
logger.info(f"Saving features into cached file {cached_features_file}")
torch.save(self.features, cached_features_file)
def __len__(self):
return len(self.features)

View File

@@ -0,0 +1,23 @@
Note: **default code snippet above won't work** because we are using `AlbertTokenizer` with `GPT2LMHeadModel`, see [issue](https://github.com/huggingface/transformers/issues/4285).
## GPT2 124M Trained on Ukranian Fiction
Example usage:
```python
from transformers import AlbertTokenizer, GPT2LMHeadModel
tokenizer = AlbertTokenizer.from_pretrained("Tereveni-AI/gpt2-124M-uk-fiction")
model = GPT2LMHeadModel.from_pretrained("Tereveni-AI/gpt2-124M-uk-fiction")
input_ids = tokenizer.encode('Но зла Юнона, суча дочка,', add_special_tokens=False, return_tensors='pt')
outputs = model.generate(
input_ids,
do_sample=True,
num_return_sequences=3,
max_length=50
)
for i, out in enumerate(outputs):
print('{}: {}'.format(i, tokenizer.decode(out)))
```

View File

@@ -1,19 +1,25 @@
---
language: norwegian
thumbnail: https://i.imgur.com/QqSEC5I.png
---
# Norwegian Electra
Image incoming, im going to have som fun with this one.
![Image of norwegian electra](https://i.imgur.com/QqSEC5I.png)
Trained on Oscar + wikipedia + opensubtitles + some other data I had with the awesome power of TPUs(V3-8)
Use with caution. I have no downstream tasks in Norwegian to test on so I have no idea of its performance yet.
# Model
## Electra: Pre-training Text Encoders as Discriminators Rather Than Generators
Kevin Clark and Minh-Thang Luong and Quoc V. Le and Christopher D. Manning
- https://openreview.net/pdf?id=r1xMH1BtvB
- https://github.com/google-research/electra
# Acknowledgments
### TensorFlow Research Cloud
Research supported with Cloud TPUs from Google's TensorFlow Research Cloud (TFRC). Thanks for providing access to the TFRC ❤️
- https://www.tensorflow.org/tfrc
#### OSCAR corpus
- https://oscar-corpus.com/
#### OPUS
- http://opus.nlpl.eu/
- http://www.opensubtitles.org/

View File

@@ -0,0 +1,43 @@
# ReviewBERT
BERT (post-)trained from review corpus to understand sentiment, options and various e-commence aspects.
`BERT-DK_laptop` is trained from 100MB laptop corpus under `Electronics/Computers & Accessories/Laptops`.
## Model Description
The original model is from `BERT-base-uncased` trained from Wikipedia+BookCorpus.
Models are post-trained from [Amazon Dataset](http://jmcauley.ucsd.edu/data/amazon/) and [Yelp Dataset](https://www.yelp.com/dataset/challenge/).
`BERT-DK_laptop` is trained from 100MB laptop corpus under `Electronics/Computers & Accessories/Laptops`.
## Instructions
Loading the post-trained weights are as simple as, e.g.,
```python
import torch
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("activebus/BERT-DK_laptop")
model = AutoModel.from_pretrained("activebus/BERT-DK_laptop")
```
## Evaluation Results
Check our [NAACL paper](https://www.aclweb.org/anthology/N19-1242.pdf)
## Citation
If you find this work useful, please cite as following.
```
@inproceedings{xu_bert2019,
title = "BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis",
author = "Xu, Hu and Liu, Bing and Shu, Lei and Yu, Philip S.",
booktitle = "Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics",
month = "jun",
year = "2019",
}
```

View File

@@ -0,0 +1,41 @@
# ReviewBERT
BERT (post-)trained from review corpus to understand sentiment, options and various e-commence aspects.
`BERT-DK_rest` is trained from 1G (19 types) restaurants from Yelp.
## Model Description
The original model is from `BERT-base-uncased` trained from Wikipedia+BookCorpus.
Models are post-trained from [Amazon Dataset](http://jmcauley.ucsd.edu/data/amazon/) and [Yelp Dataset](https://www.yelp.com/dataset/challenge/).
## Instructions
Loading the post-trained weights are as simple as, e.g.,
```python
import torch
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("activebus/BERT-DK_rest")
model = AutoModel.from_pretrained("activebus/BERT-DK_rest")
```
## Evaluation Results
Check our [NAACL paper](https://www.aclweb.org/anthology/N19-1242.pdf)
## Citation
If you find this work useful, please cite as following.
```
@inproceedings{xu_bert2019,
title = "BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis",
author = "Xu, Hu and Liu, Bing and Shu, Lei and Yu, Philip S.",
booktitle = "Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics",
month = "jun",
year = "2019",
}
```

View File

@@ -0,0 +1,41 @@
# ReviewBERT
BERT (post-)trained from review corpus to understand sentiment, options and various e-commence aspects.
`BERT-DK_laptop` is trained from 100MB laptop corpus under `Electronics/Computers & Accessories/Laptops`.
`BERT-PT_*` addtionally uses SQuAD 1.1.
## Model Description
The original model is from `BERT-base-uncased` trained from Wikipedia+BookCorpus.
Models are post-trained from [Amazon Dataset](http://jmcauley.ucsd.edu/data/amazon/) and [Yelp Dataset](https://www.yelp.com/dataset/challenge/).
## Instructions
Loading the post-trained weights are as simple as, e.g.,
```python
import torch
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("activebus/BERT-PT_laptop")
model = AutoModel.from_pretrained("activebus/BERT-PT_laptop")
```
## Evaluation Results
Check our [NAACL paper](https://www.aclweb.org/anthology/N19-1242.pdf)
## Citation
If you find this work useful, please cite as following.
```
@inproceedings{xu_bert2019,
title = "BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis",
author = "Xu, Hu and Liu, Bing and Shu, Lei and Yu, Philip S.",
booktitle = "Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics",
month = "jun",
year = "2019",
}
```

View File

@@ -0,0 +1,42 @@
# ReviewBERT
BERT (post-)trained from review corpus to understand sentiment, options and various e-commence aspects.
`BERT-DK_rest` is trained from 1G (19 types) restaurants from Yelp.
`BERT-PT_*` addtionally uses SQuAD 1.1.
## Model Description
The original model is from `BERT-base-uncased` trained from Wikipedia+BookCorpus.
Models are post-trained from [Amazon Dataset](http://jmcauley.ucsd.edu/data/amazon/) and [Yelp Dataset](https://www.yelp.com/dataset/challenge/).
## Instructions
Loading the post-trained weights are as simple as, e.g.,
```python
import torch
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("activebus/BERT-PT_rest")
model = AutoModel.from_pretrained("activebus/BERT-PT_rest")
```
## Evaluation Results
Check our [NAACL paper](https://www.aclweb.org/anthology/N19-1242.pdf)
## Citation
If you find this work useful, please cite as following.
```
@inproceedings{xu_bert2019,
title = "BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis",
author = "Xu, Hu and Liu, Bing and Shu, Lei and Yu, Philip S.",
booktitle = "Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics",
month = "jun",
year = "2019",
}
```

View File

@@ -0,0 +1,44 @@
# ReviewBERT
BERT (post-)trained from review corpus to understand sentiment, options and various e-commence aspects.
Please visit https://github.com/howardhsu/BERT-for-RRC-ABSA for details.
`BERT-XD_Review` is a cross-domain (beyond just `laptop` and `restaurant`) language model, where each example is from a single product / restaurant with the same rating, post-trained (fine-tuned) on a combination of 5-core Amazon reviews and all Yelp data, expected to be 22 G in total. It is trained for 4 epochs on `bert-base-uncased`.
The preprocessing code [here](https://github.com/howardhsu/BERT-for-RRC-ABSA/transformers).
## Model Description
The original model is from `BERT-base-uncased`.
Models are post-trained from [Amazon Dataset](http://jmcauley.ucsd.edu/data/amazon/) and [Yelp Dataset](https://www.yelp.com/dataset/challenge/).
## Instructions
Loading the post-trained weights are as simple as, e.g.,
```python
import torch
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("activebus/BERT-XD_Review")
model = AutoModel.from_pretrained("activebus/BERT-XD_Review")
```
## Evaluation Results
Check our [NAACL paper](https://www.aclweb.org/anthology/N19-1242.pdf)
`BERT_Review` is expected to have similar performance on domain-specific tasks (such as aspect extraction) as `BERT-DK`, but much better on general tasks such as aspect sentiment classification (different domains mostly share similar sentiment words).
## Citation
If you find this work useful, please cite as following.
```
@inproceedings{xu_bert2019,
title = "BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis",
author = "Xu, Hu and Liu, Bing and Shu, Lei and Yu, Philip S.",
booktitle = "Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics",
month = "jun",
year = "2019",
}
```

View File

@@ -0,0 +1,44 @@
# ReviewBERT
BERT (post-)trained from review corpus to understand sentiment, options and various e-commence aspects.
`BERT_Review` is cross-domain (beyond just `laptop` and `restaurant`) language model with one example from randomly mixed domains, post-trained (fine-tuned) on a combination of 5-core Amazon reviews and all Yelp data, expected to be 22 G in total. It is trained for 4 epochs on `bert-base-uncased`.
The preprocessing code [here](https://github.com/howardhsu/BERT-for-RRC-ABSA/transformers).
## Model Description
The original model is from `BERT-base-uncased` trained from Wikipedia+BookCorpus.
Models are post-trained from [Amazon Dataset](http://jmcauley.ucsd.edu/data/amazon/) and [Yelp Dataset](https://www.yelp.com/dataset/challenge/).
## Instructions
Loading the post-trained weights are as simple as, e.g.,
```python
import torch
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("activebus/BERT_Review")
model = AutoModel.from_pretrained("activebus/BERT_Review")
```
## Evaluation Results
Check our [NAACL paper](https://www.aclweb.org/anthology/N19-1242.pdf)
`BERT_Review` is expected to have similar performance on domain-specific tasks (such as aspect extraction) as `BERT-DK`, but much better on general tasks such as aspect sentiment classification (different domains mostly share similar sentiment words).
## Citation
If you find this work useful, please cite as following.
```
@inproceedings{xu_bert2019,
title = "BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis",
author = "Xu, Hu and Liu, Bing and Shu, Lei and Yu, Philip S.",
booktitle = "Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics",
month = "jun",
year = "2019",
}
```

View File

@@ -18,13 +18,16 @@ tags:
**Eval data:** Conll03 (NER), GermEval14 (NER), GermEval18 (Classification), GNAD (Classification)
**Infrastructure**: 1x TPU v2
**Published**: Jun 14th, 2019
**Update April 3rd, 2020**: we updated the vocabulary file on deepset's s3 to conform with the default tokenization of punctuation tokens.
For details see the related [FARM issue](https://github.com/deepset-ai/FARM/issues/60). If you want to use the old vocab we have also uploaded a ["deepset/bert-base-german-cased-oldvocab"](https://huggingface.co/deepset/bert-base-german-cased-oldvocab) model.
## Details
- We trained using Google's Tensorflow code on a single cloud TPU v2 with standard settings.
- We trained 810k steps with a batch size of 1024 for sequence length 128 and 30k steps with sequence length 512. Training took about 9 days.
- As training data we used the latest German Wikipedia dump (6GB of raw txt files), the OpenLegalData dump (2.4 GB) and news articles (3.6 GB).
- We cleaned the data dumps with tailored scripts and segmented sentences with spacy v2.1. To create tensorflow records we used the recommended sentencepiece library for creating the word piece vocabulary and tensorflow scripts to convert the text to data usable by BERT.
- Update April 3rd, 2020: updated the vocab file on deepset s3 to adjust tokenization of punctuation.
See https://deepset.ai/german-bert for more details

View File

@@ -0,0 +1,28 @@
---
language: german
thumbnail: https://static.tildacdn.com/tild6438-3730-4164-b266-613634323466/german_bert.png
tags:
- exbert
---
<a href="https://huggingface.co/exbert/?model=bert-base-german-cased">
<img width="300px" src="https://hf-dinosaur.huggingface.co/exbert/button.png">
</a>
# German BERT with old vocabulary
For details see the related [FARM issue](https://github.com/deepset-ai/FARM/issues/60).
## About us
![deepset logo](https://raw.githubusercontent.com/deepset-ai/FARM/master/docs/img/deepset_logo.png)
We bring NLP to the industry via open source!
Our focus: Industry specific language models & large scale QA systems.
Some of our work:
- [German BERT (aka "bert-base-german-cased")](https://deepset.ai/german-bert)
- [FARM](https://github.com/deepset-ai/FARM)
- [Haystack](https://github.com/deepset-ai/haystack/)
Get in touch:
[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Website](https://deepset.ai)

View File

@@ -0,0 +1,18 @@
# COVID-Twitter-BERT (CT-BERT)
BERT-large-uncased model, pretrained on a corpus of messages from Twitter about COVID-19
## Overview
This model was trained on 160M tweets collected between January 12 and April 16, 2020 containing at least one of the keywords "wuhan", "ncov", "coronavirus", "covid", or "sars-cov-2". These tweets were filtered and preprocessed to reach a final sample of 22.5M tweets (containing 40.7M sentences and 633M tokens) which were used for training.
This model was evaluated based on downstream classification tasks, but it could be used for any other NLP task which can leverage contextual embeddings.
In order to achieve best results, make sure to use the same text preprocessing as we did for pretraining. This involves replacing user mentions, urls and emojis. You can find a script on our projects [GitHub repo](https://github.com/digitalepidemiologylab/covid-twitter-bert).
## Example usage
```python
tokenizer = AutoTokenizer.from_pretrained("digitalepidemiologylab/covid-twitter-bert")
model = TFAutoModel.from_pretrained("digitalepidemiologylab/covid-twitter-bert")
```
## References
[1] Martin Müller, Marcel Salaté, Per E Kummervold. "COVID-Twitter-BERT: A Natural Language Processing Model to Analyse COVID-19 Content on Twitter" arXiv preprint arXiv:2005.07503 (2020).

View File

@@ -0,0 +1,48 @@
---
language: romanian
---
# bert-base-romanian-cased-v1
The BERT **base**, **cased** model for Romanian, trained on a 15GB corpus, version ![v1.0](https://img.shields.io/badge/v1.0-21%20Apr%202020-ff6666)
### How to use
```python
from transformers import AutoTokenizer, AutoModel
import torch
# load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("dumitrescustefan/bert-base-romanian-cased-v1")
model = AutoModel.from_pretrained("dumitrescustefan/bert-base-romanian-cased-v1")
# tokenize a sentence and run through the model
input_ids = torch.tensor(tokenizer.encode("Acesta este un test.", add_special_tokens=True)).unsqueeze(0) # Batch size 1
outputs = model(input_ids)
# get encoding
last_hidden_states = outputs[0] # The last hidden-state is the first element of the output tuple
```
### Evaluation
Evaluation is performed on Universal Dependencies [Romanian RRT](https://universaldependencies.org/treebanks/ro_rrt/index.html) UPOS, XPOS and LAS, and on a NER task based on [RONEC](https://github.com/dumitrescustefan/ronec). Details, as well as more in-depth tests not shown here, are given in the dedicated [evaluation page](https://github.com/dumitrescustefan/Romanian-Transformers/tree/master/evaluation/README.md).
The baseline is the [Multilingual BERT](https://github.com/google-research/bert/blob/master/multilingual.md) model ``bert-base-multilingual-(un)cased``, as at the time of writing it was the only available BERT model that works on Romanian.
| Model | UPOS | XPOS | NER | LAS |
|--------------------------------|:-----:|:------:|:-----:|:-----:|
| bert-base-multilingual-cased | 97.87 | 96.16 | 84.13 | 88.04 |
| bert-base-romanian-cased-v1 | **98.00** | **96.46** | **85.88** | **89.69** |
### Corpus
The model is trained on the following corpora (stats in the table below are after cleaning):
| Corpus | Lines(M) | Words(M) | Chars(B) | Size(GB) |
|----------- |:--------: |:--------: |:--------: |:--------: |
| OPUS | 55.05 | 635.04 | 4.045 | 3.8 |
| OSCAR | 33.56 | 1725.82 | 11.411 | 11 |
| Wikipedia | 1.54 | 60.47 | 0.411 | 0.4 |
| **Total** | **90.15** | **2421.33** | **15.867** | **15.2** |
#### Acknowledgements
- We'd like to thank [Sampo Pyysalo](https://github.com/spyysalo) from TurkuNLP for helping us out with the compute needed to pretrain the v1.0 BERT models. He's awesome!

View File

@@ -0,0 +1,51 @@
---
language: romanian
---
# bert-base-romanian-uncased-v1
The BERT **base**, **uncased** model for Romanian, trained on a 15GB corpus, version ![v1.0](https://img.shields.io/badge/v1.0-21%20Apr%202020-ff6666)
### How to use
```python
from transformers import AutoTokenizer, AutoModel
import torch
# load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("dumitrescustefan/bert-base-romanian-uncased-v1", do_lower_case=True)
model = AutoModel.from_pretrained("dumitrescustefan/bert-base-romanian-uncased-v1")
# tokenize a sentence and run through the model
input_ids = torch.tensor(tokenizer.encode("Acesta este un test.", add_special_tokens=True)).unsqueeze(0) # Batch size 1
outputs = model(input_ids)
# get encoding
last_hidden_states = outputs[0] # The last hidden-state is the first element of the output tuple
```
### Evaluation
Evaluation is performed on Universal Dependencies [Romanian RRT](https://universaldependencies.org/treebanks/ro_rrt/index.html) UPOS, XPOS and LAS, and on a NER task based on [RONEC](https://github.com/dumitrescustefan/ronec). Details, as well as more in-depth tests not shown here, are given in the dedicated [evaluation page](https://github.com/dumitrescustefan/Romanian-Transformers/tree/master/evaluation/README.md).
The baseline is the [Multilingual BERT](https://github.com/google-research/bert/blob/master/multilingual.md) model ``bert-base-multilingual-(un)cased``, as at the time of writing it was the only available BERT model that works on Romanian.
| Model | UPOS | XPOS | NER | LAS |
|--------------------------------|:-----:|:------:|:-----:|:-----:|
| bert-base-multilingual-uncased | 97.65 | 95.72 | 83.91 | 87.65 |
| bert-base-romanian-uncased-v1 | **98.18** | **96.84** | **85.26** | **89.61** |
### Corpus
The model is trained on the following corpora (stats in the table below are after cleaning):
| Corpus | Lines(M) | Words(M) | Chars(B) | Size(GB) |
|----------- |:--------: |:--------: |:--------: |:--------: |
| OPUS | 55.05 | 635.04 | 4.045 | 3.8 |
| OSCAR | 33.56 | 1725.82 | 11.411 | 11 |
| Wikipedia | 1.54 | 60.47 | 0.411 | 0.4 |
| **Total** | **90.15** | **2421.33** | **15.867** | **15.2** |
#### Acknowledgements
- We'd like to thank [Sampo Pyysalo](https://github.com/spyysalo) from TurkuNLP for helping us out with the compute needed to pretrain the v1.0 BERT models. He's awesome!

View File

@@ -0,0 +1,74 @@
---
language: malay
---
# Bahasa T5 Model
Pretrained T5 base language model for Malay and Indonesian.
## Pretraining Corpus
`t5-base-bahasa-cased` model was pretrained on multiple tasks. Below is list of tasks we trained on,
1. [Unsupervised](https://github.com/google-research/text-to-text-transfer-transformer/blob/master/t5/data/preprocessors.py#L1875) on [local Wikipedia](https://github.com/huseinzol05/Malaya-Dataset#wikipedia-1).
2. [Unsupervised](https://github.com/google-research/text-to-text-transfer-transformer/blob/master/t5/data/preprocessors.py#L1875) on [local news](https://github.com/huseinzol05/Malaya-Dataset#public-news).
3. [Unsupervised](https://github.com/google-research/text-to-text-transfer-transformer/blob/master/t5/data/preprocessors.py#L1875) on [local parliament text](https://github.com/huseinzol05/Malaya-Dataset#parliament).
4. [Unsupervised](https://github.com/google-research/text-to-text-transfer-transformer/blob/master/t5/data/preprocessors.py#L1875) on [IIUM Confession](https://github.com/huseinzol05/Malaya-Dataset#iium-confession).
5. [Unsupervised](https://github.com/google-research/text-to-text-transfer-transformer/blob/master/t5/data/preprocessors.py#L1875) on [Wattpad](https://github.com/huseinzol05/Malaya-Dataset#wattpad).
6. [Unsupervised](https://github.com/google-research/text-to-text-transfer-transformer/blob/master/t5/data/preprocessors.py#L1875) on [Academia PDF](https://github.com/huseinzol05/Malaya-Dataset#academia-pdf).
7. [Next sentence prediction](https://github.com/google-research/text-to-text-transfer-transformer/blob/master/t5/data/preprocessors.py#L1129) on [local Wikipedia](https://github.com/huseinzol05/Malaya-Dataset#wikipedia-1).
8. [Next sentence prediction](https://github.com/google-research/text-to-text-transfer-transformer/blob/master/t5/data/preprocessors.py#L1129) on [local news](https://github.com/huseinzol05/Malaya-Dataset#public-news).
9. [Next sentence prediction](https://github.com/google-research/text-to-text-transfer-transformer/blob/master/t5/data/preprocessors.py#L1129) on [local parliament text](https://github.com/huseinzol05/Malaya-Dataset#parliament).
10. [Next sentence prediction](https://github.com/google-research/text-to-text-transfer-transformer/blob/master/t5/data/preprocessors.py#L1129) on [IIUM Confession](https://github.com/huseinzol05/Malaya-Dataset#iium-confession).
11. [Next sentence prediction](https://github.com/google-research/text-to-text-transfer-transformer/blob/master/t5/data/preprocessors.py#L1129) on [Wattpad](https://github.com/huseinzol05/Malaya-Dataset#wattpad).
12. [Next sentence prediction](https://github.com/google-research/text-to-text-transfer-transformer/blob/master/t5/data/preprocessors.py#L1129) on [Academia PDF](https://github.com/huseinzol05/Malaya-Dataset#academia-pdf).
13. [Bahasa SNLI](https://github.com/huseinzol05/Malaya-Dataset#snli).
14. [Bahasa Question Quora](https://github.com/huseinzol05/Malaya-Dataset#quora).
15. [Bahasa Natural Questions](https://github.com/huseinzol05/Malaya-Dataset#natural-questions).
16. [News title summarization](https://github.com/huseinzol05/Malaya-Dataset#crawled-news).
17. [Stemming to original wikipedia](https://github.com/huseinzol05/Malaya/blob/master/pretrained-model/t5/generate-stemming.ipynb).
18. [Synonym to original wikipedia](https://github.com/huseinzol05/Malaya/blob/master/pretrained-model/t5/generate-synonym.ipynb).
Preprocessing steps can reproduce from here, [Malaya/pretrained-model/preprocess](https://github.com/huseinzol05/Malaya/tree/master/pretrained-model/preprocess).
## Pretraining details
- This model was trained using Google T5's github [repository](https://github.com/google-research/text-to-text-transfer-transformer) on v3-8 TPU.
- All steps can reproduce from here, [Malaya/pretrained-model/t5](https://github.com/huseinzol05/Malaya/tree/master/pretrained-model/t5).
## Load Pretrained Model
You can use this model by installing `torch` or `tensorflow` and Huggingface library `transformers`. And you can use it directly by initializing it like this:
```python
from transformers import T5Tokenizer, T5Model
model = T5Model.from_pretrained('huseinzol05/t5-base-bahasa-cased')
tokenizer = T5Tokenizer.from_pretrained('huseinzol05/t5-base-bahasa-cased')
```
## Example using T5ForConditionalGeneration
```python
from transformers import T5Tokenizer, T5ForConditionalGeneration
tokenizer = T5Tokenizer.from_pretrained('huseinzol05/t5-base-bahasa-cased')
model = T5ForConditionalGeneration.from_pretrained('huseinzol05/t5-base-bahasa-cased')
input_ids = tokenizer.encode('soalan: siapakah perdana menteri malaysia?', return_tensors = 'pt')
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```
Output is,
```
'Mahathir Mohamad'
```
## Results
For further details on the model performance, simply checkout accuracy page from Malaya, https://malaya.readthedocs.io/en/latest/Accuracy.html, we compared with traditional models.
## Acknowledgement
Thanks to [Im Big](https://www.facebook.com/imbigofficial/), [LigBlou](https://www.facebook.com/ligblou), [Mesolitica](https://mesolitica.com/) and [KeyReply](https://www.keyreply.com/) for sponsoring AWS, Google and GPU clouds to train T5 for Bahasa.

View File

@@ -0,0 +1,92 @@
---
language: spanish
thumbnail:
---
# RuPERTa-base (Spanish RoBERTa) + NER 🎃🏷
This model is a fine-tuned on [NER-C](https://www.kaggle.com/nltkdata/conll-corpora) version of [RuPERTa-base](https://huggingface.co/mrm8488/RuPERTa-base) for **NER** downstream task.
## Details of the downstream task (NER) - Dataset
- [Dataset: CONLL Corpora ES](https://www.kaggle.com/nltkdata/conll-corpora) 📚
| Dataset | # Examples |
| ---------------------- | ----- |
| Train | 329 K |
| Dev | 40 K |
- [Fine-tune on NER script provided by Huggingface](https://github.com/huggingface/transformers/blob/master/examples/token-classification/run_ner.py)
- Labels covered:
```
B-LOC
B-MISC
B-ORG
B-PER
I-LOC
I-MISC
I-ORG
I-PER
O
```
## Metrics on evaluation set 🧾
| Metric | # score |
| :------------------------------------------------------------------------------------: | :-------: |
| F1 | **77.55**
| Precision | **75.53** |
| Recall | **79.68** |
## Model in action 🔨
Example of usage:
```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer
id2label = {
"0": "B-LOC",
"1": "B-MISC",
"2": "B-ORG",
"3": "B-PER",
"4": "I-LOC",
"5": "I-MISC",
"6": "I-ORG",
"7": "I-PER",
"8": "O"
}
text ="Julien, CEO de HF, nació en Francia."
input_ids = torch.tensor(tokenizer.encode(text)).unsqueeze(0)
outputs = model(input_ids)
last_hidden_states = outputs[0]
for m in last_hidden_states:
for index, n in enumerate(m):
if(index > 0 and index <= len(text.split(" "))):
print(text.split(" ")[index-1] + ": " + id2label[str(torch.argmax(n).item())])
'''
Output:
--------
Julien,: I-PER
CEO: O
de: O
HF,: B-ORG
nació: I-PER
en: I-PER
Francia.: I-LOC
'''
```
Yeah! Not too bad 🎉
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain

View File

@@ -0,0 +1,111 @@
---
language: spanish
thumbnail:
---
# RuPERTa-base (Spanish RoBERTa) + POS 🎃🏷
This model is a fine-tuned on [CONLL CORPORA](https://www.kaggle.com/nltkdata/conll-corpora) version of [RuPERTa-base](https://huggingface.co/mrm8488/RuPERTa-base) for **POS** downstream task.
## Details of the downstream task (POS) - Dataset
- [Dataset: CONLL Corpora ES](https://www.kaggle.com/nltkdata/conll-corpora) 📚
| Dataset | # Examples |
| ---------------------- | ----- |
| Train | 445 K |
| Dev | 55 K |
- [Fine-tune on NER script provided by Huggingface](https://github.com/huggingface/transformers/blob/master/examples/token-classification/run_ner.py)
- Labels covered:
```
ADJ
ADP
ADV
AUX
CCONJ
DET
INTJ
NOUN
NUM
PART
PRON
PROPN
PUNCT
SCONJ
SYM
VERB
```
## Metrics on evaluation set 🧾
| Metric | # score |
| :------------------------------------------------------------------------------------: | :-------: |
| F1 | **97.39**
| Precision | **97.47** |
| Recall | **9732** |
## Model in action 🔨
Example of usage
```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('mrm8488/RuPERTa-base-finetuned-pos')
model = AutoModelForTokenClassification.from_pretrained('mrm8488/RuPERTa-base-finetuned-pos')
id2label = {
"0": "O",
"1": "ADJ",
"2": "ADP",
"3": "ADV",
"4": "AUX",
"5": "CCONJ",
"6": "DET",
"7": "INTJ",
"8": "NOUN",
"9": "NUM",
"10": "PART",
"11": "PRON",
"12": "PROPN",
"13": "PUNCT",
"14": "SCONJ",
"15": "SYM",
"16": "VERB"
}
text ="Mis amigos están pensando viajar a Londres este verano."
input_ids = torch.tensor(tokenizer.encode(text)).unsqueeze(0)
outputs = model(input_ids)
last_hidden_states = outputs[0]
for m in last_hidden_states:
for index, n in enumerate(m):
if(index > 0 and index <= len(text.split(" "))):
print(text.split(" ")[index-1] + ": " + id2label[str(torch.argmax(n).item())])
'''
Output:
--------
Mis: NUM
amigos: PRON
están: AUX
pensando: ADV
viajar: VERB
a: ADP
Londres: PROPN
este: DET
verano..: NOUN
'''
```
Yeah! Not too bad 🎉
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain

View File

@@ -43,8 +43,8 @@ import torch
discriminator = ElectraForPreTraining.from_pretrained("mrm8488/electricidad-small-discriminator")
tokenizer = ElectraTokenizerFast.from_pretrained("mrm8488/electricidad-small-discriminator")
sentence = "El rápido zorro marrón salta sobre el perro perezoso"
fake_sentence = "El rápido zorro marrón falsea sobre el perro perezoso"
sentence = "el zorro rojo es muy rápido"
fake_sentence = "el zorro rojo es muy ser"
fake_tokens = tokenizer.tokenize(sentence)
fake_inputs = tokenizer.encode(sentence, return_tensors="pt")
@@ -53,9 +53,16 @@ predictions = torch.round((torch.sign(discriminator_outputs[0]) + 1) / 2)
[print("%7s" % token, end="") for token in fake_tokens]
[print("%7s" % prediction, end="") for prediction in predictions.tolist()]
[print("%7s" % int(prediction), end="") for prediction in predictions.tolist()[1:-1]]
# Output:
'''
el zorro rojo es muy ser 0 0 0 0 0 1[None, None, None, None, None, None]
'''
```
As you can see there is a **1** in the place where the model detected the fake token (**ser**). So, it works! 🎉
## Acknowledgments
I thank [🤗/transformers team](https://github.com/huggingface/transformers) for answering my doubts and Google for helping me with the [TensorFlow Research Cloud](https://www.tensorflow.org/tfrc) program.

View File

@@ -20,6 +20,9 @@ A few examples of the model response to a query before and after optimisation:
|I have watched 3 episodes |with this guy and he is such a talented actor...| but the show is just plain awful and there ne...| 2.681171| -4.512792|
|We know that firefighters and| police officers are forced to become populari...| other chains have going to get this disaster ...| 1.367811| -3.34017|
## Training logs and metrics <img src="https://gblobscdn.gitbook.com/spaces%2F-Lqya5RvLedGEWPhtkjU%2Favatar.png?alt=media" width="25" height="25">
Watch the whole training logs and metrics on [W&B](https://app.wandb.ai/mrm8488/gpt2-sentiment-negative?workspace=user-mrm8488)
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488)

View File

@@ -0,0 +1,125 @@
# German Sentiment Classification with Bert
This model was trained for sentiment classification of German language texts. To achieve the best results all model inputs needs to be preprocessed with the same procedure, that was applied during the training. To simplify the usage of the model,
we provide a Python package that bundles the code need for the preprocessing and inferencing.
The model uses the Googles Bert architecture and was trained on 1.834 million German-language samples. The training data contains texts from various domains like Twitter, Facebook and movie, app and hotel reviews.
You can find more information about the dataset and the training process in the [paper](http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.201.pdf).
## Using the Python package
To get started install the package from [pypi](https://pypi.org/project/germansentiment/):
```bash
pip install germansentiment
```
```python
from germansentiment import SentimentModel
model = SentimentModel()
texts = [
"Mit keinem guten Ergebniss","Das ist gar nicht mal so gut",
"Total awesome!","nicht so schlecht wie erwartet",
"Der Test verlief positiv.","Sie fährt ein grünes Auto."]
result = model.predict_sentiment(texts)
print(result)
```
The code above will output following list:
```python
["negative","negative","positive","positive","neutral", "neutral"]
```
## minimal working Sample
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from typing import List
import torch
import re
class SentimentModel():
def __init__(self, model_name: str):
self.model = AutoModelForSequenceClassification.from_pretrained(model_name)
self.tokenizer = AutoTokenizer.from_pretrained(model_name)
self.clean_chars = re.compile(r'[^A-Za-züöäÖÜÄß ]', re.MULTILINE)
self.clean_http_urls = re.compile(r'https*\S+', re.MULTILINE)
self.clean_at_mentions = re.compile(r'@\S+', re.MULTILINE)
def predict_sentiment(self, texts: List[str])-> List[str]:
texts = [self.clean_text(text) for text in texts]
# Add special tokens takes care of adding [CLS], [SEP], <s>... tokens in the right way for each model.
input_ids = self.tokenizer.batch_encode_plus(texts,pad_to_max_length=True, add_special_tokens=True)
input_ids = torch.tensor(input_ids["input_ids"])
with torch.no_grad():
logits = self.model(input_ids)
label_ids = torch.argmax(logits[0], axis=1)
labels = [self.model.config.id2label[label_id] for label_id in label_ids.tolist()]
return labels
def replace_numbers(self,text: str) -> str:
return text.replace("0"," null").replace("1"," eins").replace("2"," zwei").replace("3"," drei").replace("4"," vier").replace("5"," fünf").replace("6"," sechs").replace("7"," sieben").replace("8"," acht").replace("9"," neun")
def clean_text(self,text: str)-> str:
text = text.replace("\n", " ")
text = self.clean_http_urls.sub('',text)
text = self.clean_at_mentions.sub('',text)
text = self.replace_numbers(text)
text = self.clean_chars.sub('', text) # use only text chars
text = ' '.join(text.split()) # substitute multiple whitespace with single whitespace
text = text.strip().lower()
return text
texts = ["Mit keinem guten Ergebniss","Das war unfair", "Das ist gar nicht mal so gut",
"Total awesome!","nicht so schlecht wie erwartet", "Das ist gar nicht mal so schlecht",
"Der Test verlief positiv.","Sie fährt ein grünes Auto.", "Der Fall wurde an die Polzei übergeben."]
model = SentimentModel(model_name = "oliverguhr/german-sentiment-bert")
print(model.predict_sentiment(texts))
```
## Model and Data
If you are interested in code and data that was used to train this model please have a look at [this repository](https://github.com/oliverguhr/german-sentiment) and our [paper](http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.201.pdf). Here is a table of the F1 scores that his model achieves on following datasets. Since we trained this model on a newer version of the transformer library, the results are slightly better than reported in the paper.
| Dataset | F1 micro Score |
| :----------------------------------------------------------- | -------------: |
| [holidaycheck](https://github.com/oliverguhr/german-sentiment) | 0.9568 |
| [scare](https://www.romanklinger.de/scare/) | 0.9418 |
| [filmstarts](https://github.com/oliverguhr/german-sentiment) | 0.9021 |
| [germeval](https://sites.google.com/view/germeval2017-absa/home) | 0.7536 |
| [PotTS](https://www.aclweb.org/anthology/L16-1181/) | 0.6780 |
| [emotions](https://github.com/oliverguhr/german-sentiment) | 0.9649 |
| [sb10k](https://www.spinningbytes.com/resources/germansentiment/) | 0.7376 |
| [Leipzig Wikipedia Corpus 2016](https://wortschatz.uni-leipzig.de/de/download/german) | 0.9967 |
| all | 0.9639 |
## Cite
For feedback and questions contact me view mail or Twitter [@oliverguhr](https://twitter.com/oliverguhr). Please cite us if you found this useful:
```
@InProceedings{guhr-EtAl:2020:LREC,
author = {Guhr, Oliver and Schumann, Anne-Kathrin and Bahrmann, Frank and Böhme, Hans Joachim},
title = {Training a Broad-Coverage German Sentiment Classification Model for Dialog Systems},
booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference},
month = {May},
year = {2020},
address = {Marseille, France},
publisher = {European Language Resources Association},
pages = {1620--1625},
url = {https://www.aclweb.org/anthology/2020.lrec-1.201}
}
```

View File

@@ -5,12 +5,12 @@ https://huggingface.co/savasy/bert-base-turkish-sentiment-cased
This model is used for Sentiment Analysis, which is based on BERTurk for Turkish Language https://huggingface.co/dbmdz/bert-base-turkish-cased
# Dataset
## Dataset
The dataset is taken from the studies [2] and [3] and merged.
The dataset is taken from the studies [[2]](#paper-2) and [[3]](#paper-3), and merged.
* The study [2] gathered movie and product reviews. The products are book, DVD, electronics, and kitchen.
The movie dataset is taken from a cinema Web page (www.beyazperde.com) with
The movie dataset is taken from a cinema Web page ([Beyazperde](www.beyazperde.com)) with
5331 positive and 5331 negative sentences. Reviews in the Web page are marked in
scale from 0 to 5 by the users who made the reviews. The study considered a review
sentiment positive if the rating is equal to or bigger than 4, and negative if it is less
@@ -19,9 +19,9 @@ Web page. They constructed benchmark dataset consisting of reviews regarding som
products (book, DVD, etc.). Likewise, reviews are marked in the range from 1 to 5,
and majority class of reviews are 5. Each category has 700 positive and 700 negative
reviews in which average rating of negative reviews is 2.27 and of positive reviews
is 4.5. This dataset is also used the study [1]
is 4.5. This dataset is also used by the study [[1]](#paper-1).
* The study[3] collected tweet dataset. They proposed a new approach for automatically classifying the sentiment of microblog messages. The proposed approach is based on utilizing robust feature representation and fusion.
* The study [[3]](#paper-3) collected tweet dataset. They proposed a new approach for automatically classifying the sentiment of microblog messages. The proposed approach is based on utilizing robust feature representation and fusion.
*Merged Dataset*
@@ -32,20 +32,21 @@ is 4.5. This dataset is also used the study [1]
| 32000 |train.tsv|
| *48290* |*total*|
### The dataset is used by following papers
The dataset is used by following papers
* 1 Yildirim, Savaş. (2020). Comparing Deep Neural Networks to Traditional Models for Sentiment Analysis in Turkish Language. 10.1007/978-981-15-1216-2_12.
* 2 Demirtas, Erkin and Mykola Pechenizkiy. 2013. Cross-lingual polarity detection with machine translation. In Proceedings of the Second International Workshop on Issues of Sentiment
<a id="paper-1">[1]</a> Yildirim, Savaş. (2020). Comparing Deep Neural Networks to Traditional Models for Sentiment Analysis in Turkish Language. 10.1007/978-981-15-1216-2_12.
<a id="paper-2">[2]</a> Demirtas, Erkin and Mykola Pechenizkiy. 2013. Cross-lingual polarity detection with machine translation. In Proceedings of the Second International Workshop on Issues of Sentiment
Discovery and Opinion Mining (WISDOM 13)
* [3] Hayran, A., Sert, M. (2017), "Sentiment Analysis on Microblog Data based on Word Embedding and Fusion Techniques", IEEE 25th Signal Processing and Communications Applications Conference (SIU 2017), Belek, Turkey
# Training
<a id="paper-3">[3]</a> Hayran, A., Sert, M. (2017), "Sentiment Analysis on Microblog Data based on Word Embedding and Fusion Techniques", IEEE 25th Signal Processing and Communications Applications Conference (SIU 2017), Belek, Turkey
```
## Training
```shell
export GLUE_DIR="./sst-2-newall"
export TASK_NAME=SST-2
python3 run_glue.py \
--model_type bert \
@@ -59,88 +60,79 @@ python3 run_glue.py \
--learning_rate 2e-5 \
--num_train_epochs 3.0 \
--output_dir "./model"
```
## Results
> 05/10/2020 17:00:43 - INFO - transformers.trainer - \*\*\*\*\* Running Evaluation \*\*\*\*\*
> 05/10/2020 17:00:43 - INFO - transformers.trainer - Num examples = 7999
> 05/10/2020 17:00:43 - INFO - transformers.trainer - Batch size = 8
> Evaluation: 100% 1000/1000 [00:34<00:00, 29.04it/s]
> 05/10/2020 17:01:17 - INFO - \_\_main__ - \*\*\*\*\* Eval results sst-2 \*\*\*\*\*
> 05/10/2020 17:01:17 - INFO - \_\_main__ - acc = 0.9539942492811602
> 05/10/2020 17:01:17 - INFO - \_\_main__ - loss = 0.16348013816401363
Accuracy is about **95.4%**
# Results
## Code Usage
> 05/10/2020 17:00:43 - INFO - transformers.trainer - ***** Running Evaluation *****
> 05/10/2020 17:00:43 - INFO - transformers.trainer - Num examples = 7999
> 05/10/2020 17:00:43 - INFO - transformers.trainer - Batch size = 8
>Evaluation: 100% 1000/1000 [00:34<00:00, 29.04it/s]
>05/10/2020 17:01:17 - INFO - __main__ - ***** Eval results sst-2 *****
>05/10/2020 17:01:17 - INFO - __main__ - acc = 0.9539942492811602
>05/10/2020 17:01:17 - INFO - __main__ - loss = 0.16348013816401363
Accuracy is about *%95.4*
# Code Usage
```
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
model = AutoModelForSequenceClassification.from_pretrained("savasy/bert-base-turkish-sentiment-cased")
tokenizer = AutoTokenizer.from_pretrained("savasy/bert-base-turkish-sentiment-cased")
sa= pipeline("sentiment-analysis", tokenizer=tokenizer, model=model)
p= sa("bu telefon modelleri çok kaliteli , her parçası çok özel bence")
p = sa("bu telefon modelleri çok kaliteli , her parçası çok özel bence")
print(p)
#[{'label': 'LABEL_1', 'score': 0.9871089}]
print (p[0]['label']=='LABEL_1')
#True
# [{'label': 'LABEL_1', 'score': 0.9871089}]
print(p[0]['label'] == 'LABEL_1')
# True
p= sa("Film çok kötü ve çok sahteydi")
p = sa("Film çok kötü ve çok sahteydi")
print(p)
#[{'label': 'LABEL_0', 'score': 0.9975505}]
print (p[0]['label']=='LABEL_1')
#False
# [{'label': 'LABEL_0', 'score': 0.9975505}]
print(p[0]['label'] == 'LABEL_1')
# False
```
# Test your data
## Test
### Data
Suppose your file has lots of lines of comment and label (1 or 0) at the end (tab seperated)
> comment1 ... \t label
> comment2 ... \t label
> comment1 ... \t label
> comment2 ... \t label
> ...
### Code
```
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
f="/path/to/your/file/yourfile.tsv"
model = AutoModelForSequenceClassification.from_pretrained("savasy/bert-base-turkish-sentiment-cased")
tokenizer = AutoTokenizer.from_pretrained("savasy/bert-base-turkish-sentiment-cased")
sa= pipeline("sentiment-analysis", tokenizer=tokenizer, model=model)
sa = pipeline("sentiment-analysis", tokenizer=tokenizer, model=model)
i,crr=0,0
for line in open(f):
lines=line.strip().split("\t")
if len(lines)==2:
i=i+1
if i%100==0:
print(i)
pred= sa(lines[0])
pred=pred[0]["label"].split("_")[1]
if pred== lines[1]:
crr=crr+1
input_file = "/path/to/your/file/yourfile.tsv"
i, crr = 0, 0
for line in open(input_file):
lines = line.strip().split("\t")
if len(lines) == 2:
i = i + 1
if i%100 == 0:
print(i)
pred = sa(lines[0])
pred = pred[0]["label"].split("_")[1]
if pred == lines[1]:
crr = crr + 1
print(crr, i, crr/i)
```

View File

@@ -0,0 +1,67 @@
---
language: turkish
---
# Turkish SQuAD Model : Question Answering
I fine-tuned Turkish-Bert-Model for Question-Answering problem with Turkish version of SQuAD; TQuAD
* BERT-base: https://huggingface.co/dbmdz/bert-base-turkish-uncased
* TQuAD dataset: https://github.com/TQuad/turkish-nlp-qa-dataset
# Training Code
```
!python3 run_squad.py \
--model_type bert \
--model_name_or_path dbmdz/bert-base-turkish-uncased\
--do_train \
--do_eval \
--train_file trainQ.json \
--predict_file dev1.json \
--per_gpu_train_batch_size 12 \
--learning_rate 3e-5 \
--num_train_epochs 5.0 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir "./model"
```
# Example Usage
> Load Model
```
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline
import torch
tokenizer = AutoTokenizer.from_pretrained("./model")
model = AutoModelForQuestionAnswering.from_pretrained("./model")
nlp=pipeline("question-answering", model=model, tokenizer=tokenizer)
```
> Apply the model
```
sait="ABASIYANIK, Sait Faik. Hikayeci (Adapazarı 23 Kasım 1906-İstanbul 11 Mayıs 1954). \
İlk öğrenimine Adapazarında Rehber-i Terakki Mektebinde başladı. İki yıl kadar Adapazarı İdadisinde okudu.\
İstanbul Erkek Lisesinde devam ettiği orta öğrenimini Bursa Lisesinde tamamladı (1928). İstanbul Edebiyat \
Fakültesine iki yıl devam ettikten sonra babasının isteği üzerine iktisat öğrenimi için İsviçreye gitti. \
Kısa süre sonra iktisat öğrenimini bırakarak Lozandan Grenoblea geçti. Üç yıl başıboş bir edebiyat öğrenimi \
gördükten sonra babası tarafından geri çağrıldı (1933). Bir müddet Halıcıoğlu Ermeni Yetim Mektebi'nde Türkçe \
gurup dersleri öğretmenliği yaptı. Ticarete atıldıysa da tutunamadı. Bir ay Haber gazetesinde adliye muhabirliği\
yaptı (1942). Babasının ölümü üzerine aileden kalan emlakin geliri ile avare bir hayata başladı. Evlenemedi.\
Yazları Burgaz adasındaki köşklerinde, kışları Şişlideki apartmanlarında annesi ile beraber geçen bu fazla \
içkili bohem hayatı ömrünün sonuna kadar sürdü."
print(nlp(question="Ne zaman avare bir hayata başladı?", context=sait))
print(nlp(question="Sait Faik hangi Lisede orta öğrenimini tamamladı?", context=sait))
```
```
# Ask your self ! type your question
print(nlp(question="...?", context=sait))
```
Check My other Model
https://huggingface.co/savasy

View File

@@ -0,0 +1,51 @@
---
tags:
- exbert
license: apache-2.0
---
# ouBioBERT-Base, Uncased
Bidirectional Encoder Representations from Transformers for Biomedical Text Mining by Osaka University (ouBioBERT) is a language model based on the BERT-Base (Devlin, et al., 2019) architecture. We pre-trained ouBioBERT on PubMed abstracts from the PubMed baseline (ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline) via our method.
The details of the pre-training procedure can be found in Wada, et al. (2020).
## Evaluation
We evaluated the performance of ouBioBERT in terms of the biomedical language understanding evaluation (BLUE) benchmark (Peng, et al., 2019). The numbers are mean (standard deviation) on five different random seeds.
| Dataset | Task Type | Score |
|:----------------|:-----------------------------|-------------:|
| MedSTS | Sentence similarity | 84.9 (0.6) |
| BIOSSES | Sentence similarity | 92.3 (0.8) |
| BC5CDR-disease | Named-entity recognition | 87.4 (0.1) |
| BC5CDR-chemical | Named-entity recognition | 93.7 (0.2) |
| ShARe/CLEFE | Named-entity recognition | 80.1 (0.4) |
| DDI | Relation extraction | 81.1 (1.5) |
| ChemProt | Relation extraction | 75.0 (0.3) |
| i2b2 2010 | Relation extraction | 74.0 (0.8) |
| HoC | Document classification | 86.4 (0.5) |
| MedNLI | Inference | 83.6 (0.7) |
| **Total** | Macro average of the scores |**83.8 (0.3)**|
## Code for Fine-tuning
We made the source code for fine-tuning freely available at [our repository](https://github.com/sy-wada/blue_benchmark_with_transformers).
## Citation
If you use our work in your research, please kindly cite the following paper:
```bibtex
@misc{2005.07202,
Author = {Shoya Wada and Toshihiro Takeda and Shiro Manabe and Shozo Konishi and Jun Kamohara and Yasushi Matsumura},
Title = {A pre-training technique to localize medical BERT and enhance BioBERT},
Year = {2020},
Eprint = {arXiv:2005.07202},
}
```
<a href="https://huggingface.co/exbert/?model=seiya/oubiobert-base-uncased&sentence=Coronavirus%20disease%20(COVID-19)%20is%20caused%20by%20SARS-COV2%20and%20represents%20the%20causative%20agent%20of%20a%20potentially%20fatal%20disease%20that%20is%20of%20great%20global%20public%20health%20concern.">
<img width="300px" src="https://hf-dinosaur.huggingface.co/exbert/button.png">
</a>

View File

@@ -0,0 +1,38 @@
# T5 for question-answering
This is T5-base model fine-tuned on SQuAD1.1 for QA using text-to-text approach
## Model training
This model was trained on colab TPU with 35GB RAM for 4 epochs
## Results:
| Metric | #Value |
|-------------|---------|
| Exact Match | 81.5610 |
| F1 | 89.9601 |
## Model in Action 🚀
```
from transformers import AutoModelWithLMHead, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("valhalla/t5-base-squad")
model = AutoModelWithLMHead.from_pretrained("valhalla/t5-base-squad")
def get_answer(question, context):
input_text = "question: %s context: %s </s>" % (question, context)
features = tokenizer.batch_encode_plus([input_text], return_tensors='pt')
out = model.generate(input_ids=features['input_ids'],
attention_mask=features['attention_mask'])
return tokenizer.decode(out[0])
context = "In Norse mythology, Valhalla is a majestic, enormous hall located in Asgard, ruled over by the god Odin."
question = "What is Valhalla ?"
get_answer(question, context)
# output: 'a majestic, enormous hall located in Asgard, ruled over by the god Odin'
```
Play with this model [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1a5xpJiUjZybfU9Mi-aDkOp116PZ9-wni?usp=sharing)
> Created by Suraj Patil [![Github icon](https://cdn0.iconfinder.com/data/icons/octicons/1024/mark-github-32.png)](https://github.com/patil-suraj/)
[![Twitter icon](https://cdn0.iconfinder.com/data/icons/shift-logotypes/32/Twitter-32.png)](https://twitter.com/psuraj28)

View File

@@ -3,7 +3,6 @@
{
"cell_type": "markdown",
"metadata": {
"collapsed": true,
"pycharm": {
"is_executing": false,
"name": "#%% md\n"
@@ -77,7 +76,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"metadata": {
"pycharm": {
"is_executing": false,
@@ -85,77 +84,7 @@
},
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: transformers in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (2.5.1)\n",
"Requirement already satisfied: filelock in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from transformers) (3.0.12)\n",
"Requirement already satisfied: sentencepiece in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from transformers) (0.1.83)\n",
"Requirement already satisfied: boto3 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from transformers) (1.12.0)\n",
"Requirement already satisfied: requests in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from transformers) (2.22.0)\n",
"Requirement already satisfied: numpy in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from transformers) (1.18.1)\n",
"Requirement already satisfied: sacremoses in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from transformers) (0.0.35)\n",
"Requirement already satisfied: tokenizers==0.5.2 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from transformers) (0.5.2)\n",
"Requirement already satisfied: regex!=2019.12.17 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from transformers) (2020.1.8)\n",
"Requirement already satisfied: tqdm>=4.27 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from transformers) (4.42.1)\n",
"Requirement already satisfied: s3transfer<0.4.0,>=0.3.0 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from boto3->transformers) (0.3.3)\n",
"Requirement already satisfied: botocore<1.16.0,>=1.15.0 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from boto3->transformers) (1.15.0)\n",
"Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from boto3->transformers) (0.9.4)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from requests->transformers) (2019.11.28)\n",
"Requirement already satisfied: idna<2.9,>=2.5 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from requests->transformers) (2.8)\n",
"Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from requests->transformers) (1.25.8)\n",
"Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from requests->transformers) (3.0.4)\n",
"Requirement already satisfied: joblib in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from sacremoses->transformers) (0.14.0)\n",
"Requirement already satisfied: click in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from sacremoses->transformers) (7.0)\n",
"Requirement already satisfied: six in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from sacremoses->transformers) (1.14.0)\n",
"Requirement already satisfied: docutils<0.16,>=0.10 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from botocore<1.16.0,>=1.15.0->boto3->transformers) (0.15.2)\n",
"Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from botocore<1.16.0,>=1.15.0->boto3->transformers) (2.8.1)\n",
"Requirement already satisfied: tensorflow==2.1.0 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (2.1.0)\n",
"Requirement already satisfied: termcolor>=1.1.0 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from tensorflow==2.1.0) (1.1.0)\n",
"Requirement already satisfied: keras-preprocessing>=1.1.0 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from tensorflow==2.1.0) (1.1.0)\n",
"Requirement already satisfied: opt-einsum>=2.3.2 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from tensorflow==2.1.0) (3.1.0)\n",
"Requirement already satisfied: protobuf>=3.8.0 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from tensorflow==2.1.0) (3.11.4)\n",
"Requirement already satisfied: numpy<2.0,>=1.16.0 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from tensorflow==2.1.0) (1.18.1)\n",
"Requirement already satisfied: tensorboard<2.2.0,>=2.1.0 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from tensorflow==2.1.0) (2.1.0)\n",
"Requirement already satisfied: keras-applications>=1.0.8 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from tensorflow==2.1.0) (1.0.8)\n",
"Requirement already satisfied: wrapt>=1.11.1 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from tensorflow==2.1.0) (1.11.2)\n",
"Requirement already satisfied: six>=1.12.0 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from tensorflow==2.1.0) (1.14.0)\n",
"Requirement already satisfied: tensorflow-estimator<2.2.0,>=2.1.0rc0 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from tensorflow==2.1.0) (2.1.0)\n",
"Requirement already satisfied: scipy==1.4.1; python_version >= \"3\" in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from tensorflow==2.1.0) (1.4.1)\n",
"Requirement already satisfied: google-pasta>=0.1.6 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from tensorflow==2.1.0) (0.1.8)\n",
"Requirement already satisfied: wheel>=0.26; python_version >= \"3\" in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from tensorflow==2.1.0) (0.34.2)\n",
"Requirement already satisfied: grpcio>=1.8.6 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from tensorflow==2.1.0) (1.16.1)\n",
"Requirement already satisfied: absl-py>=0.7.0 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from tensorflow==2.1.0) (0.9.0)\n",
"Requirement already satisfied: gast==0.2.2 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from tensorflow==2.1.0) (0.2.2)\n",
"Requirement already satisfied: astor>=0.6.0 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from tensorflow==2.1.0) (0.8.0)\n",
"Requirement already satisfied: setuptools in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from protobuf>=3.8.0->tensorflow==2.1.0) (45.2.0.post20200210)\n",
"Requirement already satisfied: google-auth<2,>=1.6.3 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0) (1.11.2)\n",
"Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0) (0.4.1)\n",
"Requirement already satisfied: markdown>=2.6.8 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0) (3.1.1)\n",
"Requirement already satisfied: werkzeug>=0.11.15 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0) (1.0.0)\n",
"Requirement already satisfied: requests<3,>=2.21.0 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0) (2.22.0)\n",
"Requirement already satisfied: h5py in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from keras-applications>=1.0.8->tensorflow==2.1.0) (2.10.0)\n",
"Requirement already satisfied: rsa<4.1,>=3.1.4 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0) (4.0)\n",
"Requirement already satisfied: cachetools<5.0,>=2.0.0 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0) (4.0.0)\n",
"Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0) (0.2.8)\n",
"Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0) (1.3.0)\n",
"Requirement already satisfied: idna<2.9,>=2.5 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0) (2.8)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0) (2019.11.28)\n",
"Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0) (3.0.4)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0) (1.25.8)\r\n",
"Requirement already satisfied: pyasn1>=0.1.3 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from rsa<4.1,>=3.1.4->google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0) (0.4.8)\r\n",
"Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/Caskroom/miniconda/base/envs/huggingface/lib/python3.7/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0) (3.1.0)\r\n"
]
}
],
"outputs": [],
"source": [
"!pip install transformers\n",
"!pip install tensorflow==2.1.0"
@@ -174,7 +103,7 @@
{
"data": {
"text/plain": [
"<torch.autograd.grad_mode.set_grad_enabled at 0x102c0ce10>"
"<torch.autograd.grad_mode.set_grad_enabled at 0x7f10b441e890>"
]
},
"execution_count": 2,
@@ -441,7 +370,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 8,
"metadata": {
"pycharm": {
"is_executing": false
@@ -458,13 +387,22 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 9,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"output differences: 1.6236e-05\n",
"pooled differences: -1.3039e-08\n"
]
}
],
"source": [
"# transformers generates a ready to use dictionary with all the required parameters for the specific framework.\n",
"input_tf = tokenizer.encode_plus(\"This is a sample input\", return_tensors=\"tf\")\n",
@@ -476,7 +414,7 @@
"# Models outputs 2 values (The value for each tokens, the pooled representation of the input sentence)\n",
"# Here we compare the output differences between PyTorch and TensorFlow.\n",
"for name, o_tf, o_pt in zip([\"output\", \"pooled\"], output_tf, output_pt):\n",
" print(\"{} differences: {}\".format(name, (o_tf.numpy() - o_pt.numpy()).sum()))"
" print(\"{} differences: {:.5}\".format(name, (o_tf.numpy() - o_pt.numpy()).sum()))"
]
},
{
@@ -504,13 +442,24 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 10,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 232 ms, sys: 0 ns, total: 232 ms\n",
"Wall time: 21.1 ms\n",
"CPU times: user 511 ms, sys: 0 ns, total: 511 ms\n",
"Wall time: 43.9 ms\n"
]
}
],
"source": [
"from transformers import DistilBertModel\n",
"\n",
@@ -541,13 +490,25 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 11,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tokens (int) : [102, 12272, 9355, 5746, 30881, 215, 261, 5945, 4118, 212, 2414, 153, 1942, 232, 3532, 566, 103]\n",
"Tokens (str) : ['[CLS]', 'Hug', '##ging', 'Fac', '##e', 'ist', 'eine', 'französische', 'Firma', 'mit', 'Sitz', 'in', 'New', '-', 'York', '.', '[SEP]']\n",
"Tokens (attn_mask): [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]\n",
"\n",
"Token wise output: torch.Size([1, 7, 768]), Pooled output: torch.Size([1, 768])\n"
]
}
],
"source": [
"# Let's load German BERT from the Bavarian State Library\n",
"de_bert = BertModel.from_pretrained(\"dbmdz/bert-base-german-cased\")\n",
@@ -557,7 +518,14 @@
" \"Hugging Face ist eine französische Firma mit Sitz in New-York.\",\n",
" return_tensors=\"pt\"\n",
")\n",
"output_de, pooled_de = de_bert(**de_input)"
"print(\"Tokens (int) : {}\".format(de_input['input_ids'].tolist()[0]))\n",
"print(\"Tokens (str) : {}\".format([de_tokenizer.convert_ids_to_tokens(s) for s in de_input['input_ids'].tolist()[0]]))\n",
"print(\"Tokens (attn_mask): {}\".format(de_input['attention_mask'].tolist()[0]))\n",
"print()\n",
"\n",
"output_de, pooled_de = de_bert(**de_input)\n",
"\n",
"print(\"Token wise output: {}, Pooled output: {}\".format(outputs.shape, pooled.shape))"
]
}
],
@@ -577,7 +545,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
"version": "3.7.4"
},
"pycharm": {
"stem_cell": {
@@ -590,5 +558,5 @@
}
},
"nbformat": 4,
"nbformat_minor": 1
"nbformat_minor": 4
}

File diff suppressed because one or more lines are too long

View File

@@ -4,15 +4,25 @@ You can find here a list of the official notebooks provided by Hugging Face.
Also, we would like to list here interesting content created by the community.
If you wrote some notebook(s) leveraging transformers and would like be listed here, please open a
Pull Request and we'll review it so it can be included here.
Pull Request so it can be included under the Community notebooks.
## Hugging Face's notebooks :hugs:
| Notebook | Description | |
|:----------|:-------------:|------:|
|:----------|:-------------|------:|
| [Getting Started Tokenizers](https://github.com/huggingface/transformers/blob/master/notebooks/01-training-tokenizers.ipynb) | How to train and use your very own tokenizer |[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/transformers/blob/master/notebooks/01-training-tokenizers.ipynb) |
| [Getting Started Transformers](https://github.com/huggingface/transformers/blob/master/notebooks/02-transformers.ipynb) | How to easily start using transformers | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/transformers/blob/master/notebooks/02-transformers.ipynb) |
| [How to use Pipelines](https://github.com/huggingface/transformers/blob/master/notebooks/03-pipelines.ipynb) | Simple and efficient way to use State-of-the-Art models on downstream tasks through transformers | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/transformers/blob/master/notebooks/03-pipelines.ipynb) |
| [How to train a language model](https://github.com/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb)| Highlight all the steps to effectively train Transformer model on custom data | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb)|
| [How to generate text](https://github.com/huggingface/blog/blob/master/notebooks/02_how_to_generate.ipynb)| How to use different decoding methods for language generation with transformers | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/02_how_to_generate.ipynb)|
| [How to export model to ONNX](https://github.com/huggingface/transformers/blob/master/notebooks/04-onnx-export.ipynb) | Highlight how to export and run inference workloads through ONNX |
## Community notebooks:
| Notebook | Description | Author | |
|:----------|:-------------|:-------------|------:|
| [Train T5 on TPU](https://github.com/patil-suraj/exploring-T5/blob/master/T5_on_TPU.ipynb) | How to train T5 on SQUAD with Transformers and Nlp | [Suraj Patil](https://github.com/patil-suraj) |[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patil-suraj/exploring-T5/blob/master/T5_on_TPU.ipynb#scrollTo=QLGiFCDqvuil) |
| [Fine-tune T5 for Classification and Multiple Choice](https://github.com/patil-suraj/exploring-T5/blob/master/t5_fine_tuning.ipynb) | How to fine-tune T5 for classification and multiple choice tasks using a text-to-text format with PyTorch Lightning | [Suraj Patil](https://github.com/patil-suraj) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patil-suraj/exploring-T5/blob/master/t5_fine_tuning.ipynb) |
| [Fine-tune DialoGPT on New Datasets and Languages](https://github.com/ncoop57/i-am-a-nerd/blob/master/_notebooks/2020-05-12-chatbot-part-1.ipynb) | How to fine-tune the DialoGPT model on a new dataset for open-dialog conversational chatbots | [Nathan Cooper](https://github.com/ncoop57) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ncoop57/i-am-a-nerd/blob/master/_notebooks/2020-05-12-chatbot-part-1.ipynb)

View File

@@ -36,5 +36,5 @@ multi_line_output = 3
use_parentheses = True
[flake8]
ignore = E203, E501, W503
ignore = E203, E501, E741, W503
max-line-length = 119

View File

@@ -67,8 +67,18 @@ extras = {}
extras["mecab"] = ["mecab-python3"]
extras["sklearn"] = ["scikit-learn"]
extras["tf"] = ["tensorflow"]
extras["tf-cpu"] = ["tensorflow-cpu"]
# keras2onnx and onnxconverter-common version is specific through a commit until 1.7.0 lands on pypi
extras["tf"] = [
"tensorflow",
"onnxconverter-common",
"keras2onnx"
]
extras["tf-cpu"] = [
"tensorflow-cpu",
"onnxconverter-common",
"keras2onnx"
]
extras["torch"] = ["torch"]
extras["serving"] = ["pydantic", "uvicorn", "fastapi", "starlette"]
@@ -79,14 +89,14 @@ extras["docs"] = ["recommonmark", "sphinx", "sphinx-markdown-tables", "sphinx-rt
extras["quality"] = [
"black",
"isort",
"flake8==3.7.9",
"flake8",
]
extras["dev"] = extras["testing"] + extras["quality"] + ["mecab-python3", "scikit-learn", "tensorflow", "torch"]
setup(
name="transformers",
version="2.9.1",
author="Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Sam Shleifer, Google AI Language Team Authors, Open AI team Authors, Facebook AI Authors, Carnegie Mellon University Authors",
version="2.10.0",
author="Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Sam Shleifer, Patrick von Platen, Google AI Language Team Authors, Open AI team Authors, Facebook AI Authors, Carnegie Mellon University Authors",
author_email="thomas@huggingface.co",
description="State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch",
long_description=open("README.md", "r", encoding="utf-8").read(),

View File

@@ -2,7 +2,7 @@
# There's no way to ignore "F401 '...' imported but unused" warnings in this
# module, but to preserve other warnings. So, don't check this module at all.
__version__ = "2.9.1"
__version__ = "2.10.0"
# Work around to update TensorFlow's absl.logging threshold which alters the
# default Python logging output behavior when present.
@@ -44,6 +44,7 @@ from .configuration_electra import ELECTRA_PRETRAINED_CONFIG_ARCHIVE_MAP, Electr
from .configuration_encoder_decoder import EncoderDecoderConfig
from .configuration_flaubert import FLAUBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, FlaubertConfig
from .configuration_gpt2 import GPT2_PRETRAINED_CONFIG_ARCHIVE_MAP, GPT2Config
from .configuration_longformer import LONGFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP, LongformerConfig
from .configuration_marian import MarianConfig
from .configuration_mmbt import MMBTConfig
from .configuration_openai import OPENAI_GPT_PRETRAINED_CONFIG_ARCHIVE_MAP, OpenAIGPTConfig
@@ -138,6 +139,7 @@ from .tokenization_distilbert import DistilBertTokenizer, DistilBertTokenizerFas
from .tokenization_electra import ElectraTokenizer, ElectraTokenizerFast
from .tokenization_flaubert import FlaubertTokenizer
from .tokenization_gpt2 import GPT2Tokenizer, GPT2TokenizerFast
from .tokenization_longformer import LongformerTokenizer
from .tokenization_openai import OpenAIGPTTokenizer, OpenAIGPTTokenizerFast
from .tokenization_reformer import ReformerTokenizer
from .tokenization_roberta import RobertaTokenizer, RobertaTokenizerFast
@@ -319,6 +321,7 @@ if is_torch_available():
ElectraForMaskedLM,
ElectraForTokenClassification,
ElectraPreTrainedModel,
ElectraForSequenceClassification,
ElectraModel,
load_tf_weights_in_electra,
ELECTRA_PRETRAINED_MODEL_ARCHIVE_MAP,
@@ -332,6 +335,8 @@ if is_torch_available():
REFORMER_PRETRAINED_MODEL_ARCHIVE_MAP,
)
from .modeling_longformer import LONGFORMER_PRETRAINED_MODEL_ARCHIVE_MAP, LongformerModel, LongformerForMaskedLM
# Optimization
from .optimization import (
AdamW,

View File

@@ -28,6 +28,7 @@ from .configuration_electra import ELECTRA_PRETRAINED_CONFIG_ARCHIVE_MAP, Electr
from .configuration_encoder_decoder import EncoderDecoderConfig
from .configuration_flaubert import FLAUBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, FlaubertConfig
from .configuration_gpt2 import GPT2_PRETRAINED_CONFIG_ARCHIVE_MAP, GPT2Config
from .configuration_longformer import LONGFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP, LongformerConfig
from .configuration_marian import MarianConfig
from .configuration_openai import OPENAI_GPT_PRETRAINED_CONFIG_ARCHIVE_MAP, OpenAIGPTConfig
from .configuration_reformer import ReformerConfig
@@ -62,6 +63,7 @@ ALL_PRETRAINED_CONFIG_ARCHIVE_MAP = dict(
XLM_ROBERTA_PRETRAINED_CONFIG_ARCHIVE_MAP,
FLAUBERT_PRETRAINED_CONFIG_ARCHIVE_MAP,
ELECTRA_PRETRAINED_CONFIG_ARCHIVE_MAP,
LONGFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP,
]
for key, value, in pretrained_map.items()
)
@@ -77,6 +79,7 @@ CONFIG_MAPPING = OrderedDict(
("marian", MarianConfig,),
("bart", BartConfig,),
("reformer", ReformerConfig,),
("longformer", LongformerConfig,),
("roberta", RobertaConfig,),
("flaubert", FlaubertConfig,),
("bert", BertConfig,),
@@ -133,6 +136,7 @@ class AutoConfig:
- contains `albert`: :class:`~transformers.AlbertConfig` (ALBERT model)
- contains `camembert`: :class:`~transformers.CamembertConfig` (CamemBERT model)
- contains `xlm-roberta`: :class:`~transformers.XLMRobertaConfig` (XLM-RoBERTa model)
- contains `longformer`: :class:`~transformers.LongformerConfig` (Longformer model)
- contains `roberta`: :class:`~transformers.RobertaConfig` (RoBERTa model)
- contains `reformer`: :class:`~transformers.ReformerConfig` (Reformer model)
- contains `bert`: :class:`~transformers.BertConfig` (Bert model)
@@ -145,7 +149,6 @@ class AutoConfig:
- contains `flaubert` : :class:`~transformers.FlaubertConfig` (Flaubert model)
- contains `electra` : :class:`~transformers.ElectraConfig` (ELECTRA model)
Args:
pretrained_model_name_or_path (:obj:`string`):
Is either: \

View File

@@ -0,0 +1,69 @@
# coding=utf-8
# Copyright 2020 The Allen Institute for AI team and The HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" Longformer configuration """
import logging
from typing import List, Union
from .configuration_roberta import RobertaConfig
logger = logging.getLogger(__name__)
LONGFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP = {
"longformer-base-4096": "https://s3.amazonaws.com/models.huggingface.co/bert/allenai/longformer-base-4096/config.json",
"longformer-large-4096": "https://s3.amazonaws.com/models.huggingface.co/bert/allenai/longformer-large-4096/config.json",
}
class LongformerConfig(RobertaConfig):
r"""
This is the configuration class to store the configuration of an :class:`~transformers.LongformerModel`.
It is used to instantiate an Longformer model according to the specified arguments, defining the model
architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of
the RoBERTa `roberta-base <https://huggingface.co/roberta-base>`__ architecture with a sequence length 4,096.
The :class:`~transformers.LongformerConfig` class directly inherits :class:`~transformers.RobertaConfig`.
It reuses the same defaults. Please check the parent class for more information.
Args:
attention_window (:obj:`int` or :obj:`List[int]`, optional, defaults to 512):
Size of an attention window around each token. If :obj:`int`, use the same size for all layers.
To specify a different window size for each layer, use a :obj:`List[int]` where
``len(attention_window) == num_hidden_layers``.
Example::
from transformers import LongformerConfig, LongformerModel
# Initializing a Longformer configuration
configuration = LongformerConfig()
# Initializing a model from the configuration
model = LongformerModel(configuration)
# Accessing the model configuration
configuration = model.config
Attributes:
pretrained_config_archive_map (Dict[str, str]):
A dictionary containing all the available pre-trained checkpoints.
"""
pretrained_config_archive_map = LONGFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP
model_type = "longformer"
def __init__(self, attention_window: Union[List[int], int] = 512, **kwargs):
super().__init__(**kwargs)
self.attention_window = attention_window

View File

@@ -68,6 +68,6 @@ class RobertaConfig(BertConfig):
model_type = "roberta"
def __init__(self, pad_token_id=1, bos_token_id=0, eos_token_id=2, **kwargs):
"""Constructs FlaubertConfig.
"""Constructs RobertaConfig.
"""
super().__init__(pad_token_id=pad_token_id, bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)

View File

@@ -39,10 +39,10 @@ class T5Config(PretrainedConfig):
Arguments:
vocab_size_or_config_json_file: Vocabulary size of `inputs_ids` in `T5Model`.
hidden_size: Size of the encoder layers and the pooler layer.
num_hidden_layers: Number of hidden layers in the Transformer encoder.
num_attention_heads: Number of attention heads for each attention layer in
the Transformer encoder.
d_model: Size of the encoder layers and the pooler layer. `d_model` can also accesed via the property `hidden_size`.
num_layers: Number of hidden layers in the Transformer encoder. `num_layers` can also be accessed via the property `num_hidden_layers`.
num_heads: Number of attention heads for each attention layer in
the Transformer encoder. `num_heads` can also be accessed via the property `num_attention_heads`.
intermediate_size: The size of the "intermediate" (i.e., feed-forward)
layer in the Transformer encoder.
hidden_act: The non-linear activation function (function or string) in the
@@ -51,9 +51,9 @@ class T5Config(PretrainedConfig):
layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob: The dropout ratio for the attention
probabilities.
max_position_embeddings: The maximum sequence length that this model might
n_positions: The maximum sequence length that this model might
ever be used with. Typically set this to something large just in case
(e.g., 512 or 1024 or 2048).
(e.g., 512 or 1024 or 2048). `n_positions` can also be accessed via the property `max_position_embeddings'.
type_vocab_size: The vocabulary size of the `token_type_ids` passed into
`T5Model`.
initializer_factor: A factor for initializing all weight matrices (should be kept to 1.0, used for initialization testing).

View File

@@ -0,0 +1,220 @@
from argparse import ArgumentParser
from itertools import takewhile
from os import listdir, makedirs
from os.path import abspath, dirname, exists
from typing import Dict, List, Optional, Tuple
from transformers import is_tf_available, is_torch_available
from transformers.pipelines import Pipeline, pipeline
from transformers.tokenization_utils import BatchEncoding
class OnnxConverterArgumentParser(ArgumentParser):
"""
Wraps all the script arguments supported to export transformers models to ONNX IR
"""
def __init__(self):
super(OnnxConverterArgumentParser, self).__init__("ONNX Converter")
self.add_argument("--model", type=str, required=True, help="Model's id or path (ex: bert-base-cased)")
self.add_argument("--tokenizer", type=str, help="Tokenizer's id or path (ex: bert-base-cased)")
self.add_argument("--framework", type=str, choices=["pt", "tf"], help="Framework for loading the model")
self.add_argument("--opset", type=int, default=11, help="ONNX opset to use")
self.add_argument("--check-loading", action="store_true", help="Check ONNX is able to load the model")
self.add_argument("--use-external-format", action="store_true", help="Allow exporting model >= than 2Gb")
self.add_argument("output")
def ensure_valid_input(model, tokens, input_names):
"""
Ensure input are presented in the correct order, without any None
Args:
model: The model used to forward the input data
tokens: BatchEncoding holding the input data
input_names: The name of the inputs
Returns: Tuple
"""
model_args_name = model.forward.__code__.co_varnames
model_args_pos = [(model_args_name.index(name) - 1, name) for name in input_names]
model_args = [None] * (max(map(lambda x: x[0], model_args_pos)) + 1)
for arg_pos, arg_name in model_args_pos:
model_args[arg_pos] = tokens[arg_name]
model_args = tuple(model_args) # Need to be ordered
return tuple(takewhile(lambda arg: arg is not None, model_args))
def infer_shapes(nlp: Pipeline, framework: str) -> Tuple[List[str], List[str], Dict, BatchEncoding]:
def build_shape_dict(tensor, is_input: bool, seq_len: int):
if isinstance(tensor, (tuple, list)):
return [build_shape_dict(t, is_input, seq_len) for t in tensor]
else:
# Let's assume batch is the first axis with only 1 element (~~ might not be always true ...)
axes = {[axis for axis, numel in enumerate(tensor.shape) if numel == 1][0]: "batch"}
if is_input:
if len(tensor.shape) == 2:
axes[1] = "sequence"
else:
raise ValueError("Unable to infer tensor axes ({})".format(len(tensor.shape)))
else:
seq_axes = [dim for dim, shape in enumerate(tensor.shape) if shape == seq_len]
axes.update({dim: "sequence" for dim in seq_axes})
return axes
tokens = nlp.tokenizer.encode_plus("This is a sample output", return_tensors=framework)
seq_len = tokens.input_ids.shape[-1]
outputs = nlp.model(**tokens) if framework == "pt" else nlp.model(tokens)
if not isinstance(outputs, (list, tuple)):
outputs = (outputs,)
# Generate input names & axes
input_vars = list(tokens.keys())
input_dynamic_axes = {k: build_shape_dict(v, True, seq_len) for k, v in tokens.items()}
# flatten potentially grouped outputs (past for gpt2, attentions)
outputs_flat = []
for output in outputs:
if isinstance(output, (tuple, list)):
outputs_flat.extend(output)
else:
outputs_flat.append(output)
# Generate output names & axes
output_names = ["output_{}".format(i) for i in range(len(outputs_flat))]
output_dynamic_axes = {k: build_shape_dict(v, False, seq_len) for k, v in zip(output_names, outputs_flat)}
# Create the aggregated axes representation
dynamic_axes = dict(input_dynamic_axes, **output_dynamic_axes)
return input_vars, output_names, dynamic_axes, tokens
def load_graph_from_args(framework: str, model: str, tokenizer: Optional[str] = None) -> Pipeline:
# If no tokenizer provided
if tokenizer is None:
tokenizer = model
print("Loading pipeline (model: {}, tokenizer: {})".format(model, tokenizer))
# Allocate tokenizer and model
return pipeline("feature-extraction", model=model, tokenizer=tokenizer, framework=framework)
def convert_pytorch(nlp: Pipeline, opset: int, output: str, use_external_format: bool):
if not is_torch_available():
raise Exception("Cannot convert because PyTorch is not installed. Please install torch first.")
import torch
from torch.onnx import export
print("PyTorch: {}".format(torch.__version__))
with torch.no_grad():
input_names, output_names, dynamic_axes, tokens = infer_shapes(nlp, "pt")
model_args = ensure_valid_input(nlp.model, tokens, input_names)
export(
nlp.model,
model_args,
f=output,
input_names=input_names,
output_names=output_names,
dynamic_axes=dynamic_axes,
do_constant_folding=True,
use_external_data_format=use_external_format,
enable_onnx_checker=True,
opset_version=opset,
)
def convert_tensorflow(nlp: Pipeline, opset: int, output: str):
if not is_tf_available():
raise Exception(
"Cannot convert {} because TF is not installed. Please install torch first.".format(args.model)
)
print("/!\\ Please note TensorFlow doesn't support exporting model > 2Gb /!\\")
try:
import tensorflow as tf
from keras2onnx import convert_keras, save_model, __version__ as k2ov
print("TensorFlow: {}, keras2onnx: {}".format(tf.version.VERSION, k2ov))
# Build
input_names, output_names, dynamic_axes, tokens = infer_shapes(nlp, "tf")
# Forward
nlp.model.predict(tokens.data)
onnx_model = convert_keras(nlp.model, nlp.model.name, target_opset=opset)
save_model(onnx_model, output)
except ImportError as e:
raise Exception(
"Cannot import {} required to convert TF model to ONNX. Please install {} first.".format(e.name, e.name)
)
def convert(
framework: str,
model: str,
output: str,
opset: int,
tokenizer: Optional[str] = None,
use_external_format: bool = False,
):
print("ONNX opset version set to: {}".format(opset))
# Load the pipeline
nlp = load_graph_from_args(framework, model, tokenizer)
parent = dirname(output)
if not exists(parent):
print("Creating folder {}".format(parent))
makedirs(parent)
elif len(listdir(parent)) > 0:
raise Exception("Folder {} is not empty, aborting conversion".format(parent))
# Export the graph
if framework == "pt":
convert_pytorch(nlp, opset, output, use_external_format)
else:
convert_tensorflow(nlp, opset, output)
def verify(path: str):
from onnxruntime import InferenceSession, SessionOptions
from onnxruntime.capi.onnxruntime_pybind11_state import RuntimeException
print("Checking ONNX model loading from: {}".format(path))
try:
onnx_options = SessionOptions()
_ = InferenceSession(path, onnx_options, providers=["CPUExecutionProvider"])
print("Model correctly loaded")
except RuntimeException as re:
print("Error while loading the model: {}".format(re))
if __name__ == "__main__":
parser = OnnxConverterArgumentParser()
args = parser.parse_args()
# Make sure output is absolute path
args.output = abspath(args.output)
try:
# Convert
convert(args.framework, args.model, args.output, args.opset, args.tokenizer, args.use_external_format)
# And verify
if args.check_loading:
verify(args.output)
except Exception as e:
print("Error while converting the model: {}".format(e))
exit(1)

View File

@@ -226,7 +226,7 @@ def lmap(f, x) -> List:
def fetch_test_set(test_set_url):
import wget
fname = wget.download(test_set_url, f"opus_test.txt")
fname = wget.download(test_set_url, "opus_test.txt")
lns = Path(fname).open().readlines()
src = lmap(str.strip, lns[::4])
gold = lmap(str.strip, lns[1::4])

View File

@@ -2,7 +2,8 @@ import logging
import os
import time
from dataclasses import dataclass, field
from typing import List, Optional
from enum import Enum
from typing import List, Optional, Union
import torch
from filelock import FileLock
@@ -47,6 +48,12 @@ class GlueDataTrainingArguments:
self.task_name = self.task_name.lower()
class Split(Enum):
train = "train"
dev = "dev"
test = "test"
class GlueDataset(Dataset):
"""
This will be superseded by a framework-agnostic approach
@@ -62,16 +69,21 @@ class GlueDataset(Dataset):
args: GlueDataTrainingArguments,
tokenizer: PreTrainedTokenizer,
limit_length: Optional[int] = None,
evaluate=False,
mode: Union[str, Split] = Split.train,
):
self.args = args
processor = glue_processors[args.task_name]()
self.processor = glue_processors[args.task_name]()
self.output_mode = glue_output_modes[args.task_name]
if isinstance(mode, str):
try:
mode = Split[mode]
except KeyError:
raise KeyError("mode is not a valid split name")
# Load data features from cache or dataset file
cached_features_file = os.path.join(
args.data_dir,
"cached_{}_{}_{}_{}".format(
"dev" if evaluate else "train", tokenizer.__class__.__name__, str(args.max_seq_length), args.task_name,
mode.value, tokenizer.__class__.__name__, str(args.max_seq_length), args.task_name,
),
)
@@ -88,7 +100,7 @@ class GlueDataset(Dataset):
)
else:
logger.info(f"Creating features from dataset file at {args.data_dir}")
label_list = processor.get_labels()
label_list = self.processor.get_labels()
if args.task_name in ["mnli", "mnli-mm"] and tokenizer.__class__ in (
RobertaTokenizer,
RobertaTokenizerFast,
@@ -96,11 +108,12 @@ class GlueDataset(Dataset):
):
# HACK(label indices are swapped in RoBERTa pretrained model)
label_list[1], label_list[2] = label_list[2], label_list[1]
examples = (
processor.get_dev_examples(args.data_dir)
if evaluate
else processor.get_train_examples(args.data_dir)
)
if mode == Split.dev:
examples = self.processor.get_dev_examples(args.data_dir)
elif mode == Split.test:
examples = self.processor.get_test_examples(args.data_dir)
else:
examples = self.processor.get_train_examples(args.data_dir)
if limit_length is not None:
examples = examples[:limit_length]
self.features = glue_convert_examples_to_features(
@@ -114,7 +127,7 @@ class GlueDataset(Dataset):
torch.save(self.features, cached_features_file)
# ^ This seems to take a lot of time so I want to investigate why and how we can improve.
logger.info(
f"Saving features into cached file %s [took %.3f s]", cached_features_file, time.time() - start
"Saving features into cached file %s [took %.3f s]", cached_features_file, time.time() - start
)
def __len__(self):
@@ -122,3 +135,6 @@ class GlueDataset(Dataset):
def __getitem__(self, i) -> InputFeatures:
return self.features[i]
def get_labels(self):
return self.processor.get_labels()

View File

@@ -4,10 +4,10 @@ import pickle
import time
import torch
from filelock import FileLock
from torch.utils.data.dataset import Dataset
from ...tokenization_utils import PreTrainedTokenizer
from ...trainer import torch_distributed_zero_first
logger = logging.getLogger(__name__)
@@ -20,7 +20,7 @@ class TextDataset(Dataset):
"""
def __init__(
self, tokenizer: PreTrainedTokenizer, file_path: str, block_size: int, overwrite_cache=False, local_rank=-1,
self, tokenizer: PreTrainedTokenizer, file_path: str, block_size: int, overwrite_cache=False,
):
assert os.path.isfile(file_path)
@@ -31,9 +31,10 @@ class TextDataset(Dataset):
directory, "cached_lm_{}_{}_{}".format(tokenizer.__class__.__name__, str(block_size), filename,),
)
with torch_distributed_zero_first(local_rank):
# Make sure only the first process in distributed training processes the dataset,
# and the others will use the cache.
# Make sure only the first process in distributed training processes the dataset,
# and the others will use the cache.
lock_path = cached_features_file + ".lock"
with FileLock(lock_path):
if os.path.exists(cached_features_file) and not overwrite_cache:
start = time.time()
@@ -64,7 +65,7 @@ class TextDataset(Dataset):
with open(cached_features_file, "wb") as handle:
pickle.dump(self.examples, handle, protocol=pickle.HIGHEST_PROTOCOL)
logger.info(
f"Saving features into cached file %s [took %.3f s]", cached_features_file, time.time() - start
"Saving features into cached file %s [took %.3f s]", cached_features_file, time.time() - start
)
def __len__(self):
@@ -80,7 +81,7 @@ class LineByLineTextDataset(Dataset):
soon.
"""
def __init__(self, tokenizer: PreTrainedTokenizer, file_path: str, block_size: int, local_rank=-1):
def __init__(self, tokenizer: PreTrainedTokenizer, file_path: str, block_size: int):
assert os.path.isfile(file_path)
# Here, we do not cache the features, operating under the assumption
# that we will soon use fast multithreaded tokenizers from the

View File

@@ -126,7 +126,9 @@ def _glue_convert_examples_to_features(
label_map = {label: i for i, label in enumerate(label_list)}
def label_from_example(example: InputExample) -> Union[int, float]:
def label_from_example(example: InputExample) -> Union[int, float, None]:
if example.label is None:
return None
if output_mode == "classification":
return label_map[example.label]
elif output_mode == "regression":
@@ -180,12 +182,16 @@ class MrpcProcessor(DataProcessor):
"""See base class."""
return self._create_examples(self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev")
def get_test_examples(self, data_dir):
"""See base class."""
return self._create_examples(self._read_tsv(os.path.join(data_dir, "test.tsv")), "test")
def get_labels(self):
"""See base class."""
return ["0", "1"]
def _create_examples(self, lines, set_type):
"""Creates examples for the training and dev sets."""
"""Creates examples for the training, dev and test sets."""
examples = []
for (i, line) in enumerate(lines):
if i == 0:
@@ -193,7 +199,7 @@ class MrpcProcessor(DataProcessor):
guid = "%s-%s" % (set_type, i)
text_a = line[3]
text_b = line[4]
label = line[0]
label = None if set_type == "test" else line[0]
examples.append(InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
return examples
@@ -218,12 +224,16 @@ class MnliProcessor(DataProcessor):
"""See base class."""
return self._create_examples(self._read_tsv(os.path.join(data_dir, "dev_matched.tsv")), "dev_matched")
def get_test_examples(self, data_dir):
"""See base class."""
return self._create_examples(self._read_tsv(os.path.join(data_dir, "test_matched.tsv")), "test_matched")
def get_labels(self):
"""See base class."""
return ["contradiction", "entailment", "neutral"]
def _create_examples(self, lines, set_type):
"""Creates examples for the training and dev sets."""
"""Creates examples for the training, dev and test sets."""
examples = []
for (i, line) in enumerate(lines):
if i == 0:
@@ -231,7 +241,7 @@ class MnliProcessor(DataProcessor):
guid = "%s-%s" % (set_type, line[0])
text_a = line[8]
text_b = line[9]
label = line[-1]
label = None if set_type.startswith("test") else line[-1]
examples.append(InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
return examples
@@ -241,7 +251,11 @@ class MnliMismatchedProcessor(MnliProcessor):
def get_dev_examples(self, data_dir):
"""See base class."""
return self._create_examples(self._read_tsv(os.path.join(data_dir, "dev_mismatched.tsv")), "dev_matched")
return self._create_examples(self._read_tsv(os.path.join(data_dir, "dev_mismatched.tsv")), "dev_mismatched")
def get_test_examples(self, data_dir):
"""See base class."""
return self._create_examples(self._read_tsv(os.path.join(data_dir, "test_mismatched.tsv")), "test_mismatched")
class ColaProcessor(DataProcessor):
@@ -264,17 +278,25 @@ class ColaProcessor(DataProcessor):
"""See base class."""
return self._create_examples(self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev")
def get_test_examples(self, data_dir):
"""See base class."""
return self._create_examples(self._read_tsv(os.path.join(data_dir, "test.tsv")), "test")
def get_labels(self):
"""See base class."""
return ["0", "1"]
def _create_examples(self, lines, set_type):
"""Creates examples for the training and dev sets."""
"""Creates examples for the training, dev and test sets."""
test_mode = set_type == "test"
if test_mode:
lines = lines[1:]
text_index = 1 if test_mode else 3
examples = []
for (i, line) in enumerate(lines):
guid = "%s-%s" % (set_type, i)
text_a = line[3]
label = line[1]
text_a = line[text_index]
label = None if test_mode else line[1]
examples.append(InputExample(guid=guid, text_a=text_a, text_b=None, label=label))
return examples
@@ -299,19 +321,23 @@ class Sst2Processor(DataProcessor):
"""See base class."""
return self._create_examples(self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev")
def get_test_examples(self, data_dir):
"""See base class."""
return self._create_examples(self._read_tsv(os.path.join(data_dir, "test.tsv")), "test")
def get_labels(self):
"""See base class."""
return ["0", "1"]
def _create_examples(self, lines, set_type):
"""Creates examples for the training and dev sets."""
"""Creates examples for the training, dev and test sets."""
examples = []
for (i, line) in enumerate(lines):
if i == 0:
continue
guid = "%s-%s" % (set_type, i)
text_a = line[0]
label = line[1]
label = None if set_type == "test" else line[1]
examples.append(InputExample(guid=guid, text_a=text_a, text_b=None, label=label))
return examples
@@ -336,12 +362,16 @@ class StsbProcessor(DataProcessor):
"""See base class."""
return self._create_examples(self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev")
def get_test_examples(self, data_dir):
"""See base class."""
return self._create_examples(self._read_tsv(os.path.join(data_dir, "test.tsv")), "test")
def get_labels(self):
"""See base class."""
return [None]
def _create_examples(self, lines, set_type):
"""Creates examples for the training and dev sets."""
"""Creates examples for the training, dev and test sets."""
examples = []
for (i, line) in enumerate(lines):
if i == 0:
@@ -349,7 +379,7 @@ class StsbProcessor(DataProcessor):
guid = "%s-%s" % (set_type, line[0])
text_a = line[7]
text_b = line[8]
label = line[-1]
label = None if set_type == "test" else line[-1]
examples.append(InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
return examples
@@ -374,21 +404,28 @@ class QqpProcessor(DataProcessor):
"""See base class."""
return self._create_examples(self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev")
def get_test_examples(self, data_dir):
"""See base class."""
return self._create_examples(self._read_tsv(os.path.join(data_dir, "test.tsv")), "test")
def get_labels(self):
"""See base class."""
return ["0", "1"]
def _create_examples(self, lines, set_type):
"""Creates examples for the training and dev sets."""
"""Creates examples for the training, dev and test sets."""
test_mode = set_type == "test"
q1_index = 1 if test_mode else 3
q2_index = 2 if test_mode else 4
examples = []
for (i, line) in enumerate(lines):
if i == 0:
continue
guid = "%s-%s" % (set_type, line[0])
try:
text_a = line[3]
text_b = line[4]
label = line[5]
text_a = line[q1_index]
text_b = line[q2_index]
label = None if test_mode else line[5]
except IndexError:
continue
examples.append(InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
@@ -413,14 +450,18 @@ class QnliProcessor(DataProcessor):
def get_dev_examples(self, data_dir):
"""See base class."""
return self._create_examples(self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev_matched")
return self._create_examples(self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev")
def get_test_examples(self, data_dir):
"""See base class."""
return self._create_examples(self._read_tsv(os.path.join(data_dir, "test.tsv")), "test")
def get_labels(self):
"""See base class."""
return ["entailment", "not_entailment"]
def _create_examples(self, lines, set_type):
"""Creates examples for the training and dev sets."""
"""Creates examples for the training, dev and test sets."""
examples = []
for (i, line) in enumerate(lines):
if i == 0:
@@ -428,7 +469,7 @@ class QnliProcessor(DataProcessor):
guid = "%s-%s" % (set_type, line[0])
text_a = line[1]
text_b = line[2]
label = line[-1]
label = None if set_type == "test" else line[-1]
examples.append(InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
return examples
@@ -453,12 +494,16 @@ class RteProcessor(DataProcessor):
"""See base class."""
return self._create_examples(self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev")
def get_test_examples(self, data_dir):
"""See base class."""
return self._create_examples(self._read_tsv(os.path.join(data_dir, "test.tsv")), "test")
def get_labels(self):
"""See base class."""
return ["entailment", "not_entailment"]
def _create_examples(self, lines, set_type):
"""Creates examples for the training and dev sets."""
"""Creates examples for the training, dev and test sets."""
examples = []
for (i, line) in enumerate(lines):
if i == 0:
@@ -466,7 +511,7 @@ class RteProcessor(DataProcessor):
guid = "%s-%s" % (set_type, line[0])
text_a = line[1]
text_b = line[2]
label = line[-1]
label = None if set_type == "test" else line[-1]
examples.append(InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
return examples
@@ -491,12 +536,16 @@ class WnliProcessor(DataProcessor):
"""See base class."""
return self._create_examples(self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev")
def get_test_examples(self, data_dir):
"""See base class."""
return self._create_examples(self._read_tsv(os.path.join(data_dir, "test.tsv")), "test")
def get_labels(self):
"""See base class."""
return ["0", "1"]
def _create_examples(self, lines, set_type):
"""Creates examples for the training and dev sets."""
"""Creates examples for the training, dev and test sets."""
examples = []
for (i, line) in enumerate(lines):
if i == 0:
@@ -504,7 +553,7 @@ class WnliProcessor(DataProcessor):
guid = "%s-%s" % (set_type, line[0])
text_a = line[1]
text_b = line[2]
label = line[-1]
label = None if set_type == "test" else line[-1]
examples.append(InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
return examples

View File

@@ -195,18 +195,22 @@ def squad_convert_example_to_features(example, max_seq_length, doc_stride, max_q
cls_index = span["input_ids"].index(tokenizer.cls_token_id)
# p_mask: mask with 1 for token than cannot be in the answer (0 for token which can be in an answer)
# Original TF implem also keep the classification token (set to 0) (not sure why...)
p_mask = np.array(span["token_type_ids"])
p_mask = np.minimum(p_mask, 1)
# Original TF implem also keep the classification token (set to 0)
p_mask = np.ones_like(span["token_type_ids"])
if tokenizer.padding_side == "right":
# Limit positive values to one
p_mask = 1 - p_mask
p_mask[len(truncated_query) + sequence_added_tokens :] = 0
else:
p_mask[-len(span["tokens"]) : -(len(truncated_query) + sequence_added_tokens)] = 0
p_mask[np.where(np.array(span["input_ids"]) == tokenizer.sep_token_id)[0]] = 1
pad_token_indices = np.where(span["input_ids"] == tokenizer.pad_token_id)
special_token_indices = np.asarray(
tokenizer.get_special_tokens_mask(span["input_ids"], already_has_special_tokens=True)
).nonzero()
# Set the CLS index to '0'
p_mask[pad_token_indices] = 1
p_mask[special_token_indices] = 1
# Set the cls index to 0: the CLS index can be used for impossible answers
p_mask[cls_index] = 0
span_is_impossible = example.is_impossible

View File

@@ -98,6 +98,10 @@ class DataProcessor:
"""Gets a collection of `InputExample`s for the dev set."""
raise NotImplementedError()
def get_test_examples(self, data_dir):
"""Gets a collection of `InputExample`s for the test set."""
raise NotImplementedError()
def get_labels(self):
"""Gets the list of labels for this data set."""
raise NotImplementedError()

View File

@@ -550,7 +550,7 @@ class AlbertModel(AlbertPreTrainedModel):
token_type_ids = torch.zeros(input_shape, dtype=torch.long, device=device)
extended_attention_mask = attention_mask.unsqueeze(1).unsqueeze(2)
extended_attention_mask = extended_attention_mask.to(dtype=next(self.parameters()).dtype) # fp16 compatibility
extended_attention_mask = extended_attention_mask.to(dtype=self.dtype) # fp16 compatibility
extended_attention_mask = (1.0 - extended_attention_mask) * -10000.0
head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)

View File

@@ -30,6 +30,7 @@ from .configuration_auto import (
EncoderDecoderConfig,
FlaubertConfig,
GPT2Config,
LongformerConfig,
OpenAIGPTConfig,
ReformerConfig,
RobertaConfig,
@@ -87,6 +88,7 @@ from .modeling_electra import (
ELECTRA_PRETRAINED_MODEL_ARCHIVE_MAP,
ElectraForMaskedLM,
ElectraForPreTraining,
ElectraForSequenceClassification,
ElectraForTokenClassification,
ElectraModel,
)
@@ -99,6 +101,7 @@ from .modeling_flaubert import (
FlaubertWithLMHeadModel,
)
from .modeling_gpt2 import GPT2_PRETRAINED_MODEL_ARCHIVE_MAP, GPT2LMHeadModel, GPT2Model
from .modeling_longformer import LONGFORMER_PRETRAINED_MODEL_ARCHIVE_MAP, LongformerForMaskedLM, LongformerModel
from .modeling_marian import MarianMTModel
from .modeling_openai import OPENAI_GPT_PRETRAINED_MODEL_ARCHIVE_MAP, OpenAIGPTLMHeadModel, OpenAIGPTModel
from .modeling_reformer import ReformerModel, ReformerModelWithLMHead
@@ -162,6 +165,7 @@ ALL_PRETRAINED_MODEL_ARCHIVE_MAP = dict(
FLAUBERT_PRETRAINED_MODEL_ARCHIVE_MAP,
XLM_ROBERTA_PRETRAINED_MODEL_ARCHIVE_MAP,
ELECTRA_PRETRAINED_MODEL_ARCHIVE_MAP,
LONGFORMER_PRETRAINED_MODEL_ARCHIVE_MAP,
]
for key, value, in pretrained_map.items()
)
@@ -174,6 +178,7 @@ MODEL_MAPPING = OrderedDict(
(CamembertConfig, CamembertModel),
(XLMRobertaConfig, XLMRobertaModel),
(BartConfig, BartModel),
(LongformerConfig, LongformerModel),
(RobertaConfig, RobertaModel),
(BertConfig, BertModel),
(OpenAIGPTConfig, OpenAIGPTModel),
@@ -196,6 +201,7 @@ MODEL_FOR_PRETRAINING_MAPPING = OrderedDict(
(CamembertConfig, CamembertForMaskedLM),
(XLMRobertaConfig, XLMRobertaForMaskedLM),
(BartConfig, BartForConditionalGeneration),
(LongformerConfig, LongformerForMaskedLM),
(RobertaConfig, RobertaForMaskedLM),
(BertConfig, BertForPreTraining),
(OpenAIGPTConfig, OpenAIGPTLMHeadModel),
@@ -218,6 +224,7 @@ MODEL_WITH_LM_HEAD_MAPPING = OrderedDict(
(XLMRobertaConfig, XLMRobertaForMaskedLM),
(MarianConfig, MarianMTModel),
(BartConfig, BartForConditionalGeneration),
(LongformerConfig, LongformerForMaskedLM),
(RobertaConfig, RobertaForMaskedLM),
(BertConfig, BertForMaskedLM),
(OpenAIGPTConfig, OpenAIGPTLMHeadModel),
@@ -245,6 +252,7 @@ MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING = OrderedDict(
(XLNetConfig, XLNetForSequenceClassification),
(FlaubertConfig, FlaubertForSequenceClassification),
(XLMConfig, XLMForSequenceClassification),
(ElectraConfig, ElectraForSequenceClassification),
]
)
@@ -313,6 +321,7 @@ class AutoModel:
The model class to instantiate is selected based on the configuration class:
- isInstance of `distilbert` configuration class: :class:`~transformers.DistilBertModel` (DistilBERT model)
- isInstance of `longformer` configuration class: :class:`~transformers.LongformerModel` (Longformer model)
- isInstance of `roberta` configuration class: :class:`~transformers.RobertaModel` (RoBERTa model)
- isInstance of `bert` configuration class: :class:`~transformers.BertModel` (Bert model)
- isInstance of `openai-gpt` configuration class: :class:`~transformers.OpenAIGPTModel` (OpenAI GPT model)
@@ -355,6 +364,7 @@ class AutoModel:
- contains `albert`: :class:`~transformers.AlbertModel` (ALBERT model)
- contains `camembert`: :class:`~transformers.CamembertModel` (CamemBERT model)
- contains `xlm-roberta`: :class:`~transformers.XLMRobertaModel` (XLM-RoBERTa model)
- contains `longformer` :class:`~transformers.LongformerModel` (Longformer model)
- contains `roberta`: :class:`~transformers.RobertaModel` (RoBERTa model)
- contains `bert`: :class:`~transformers.BertModel` (Bert model)
- contains `openai-gpt`: :class:`~transformers.OpenAIGPTModel` (OpenAI GPT model)
@@ -463,6 +473,7 @@ class AutoModelForPreTraining:
The model class to instantiate is selected based on the configuration class:
- isInstance of `distilbert` configuration class: :class:`~transformers.DistilBertForMaskedLM` (DistilBERT model)
- isInstance of `longformer` configuration class: :class:`~transformers.LongformerForMaskedLM` (Longformer model)
- isInstance of `roberta` configuration class: :class:`~transformers.RobertaForMaskedLM` (RoBERTa model)
- isInstance of `bert` configuration class: :class:`~transformers.BertForPreTraining` (Bert model)
- isInstance of `openai-gpt` configuration class: :class:`~transformers.OpenAIGPTLMHeadModel` (OpenAI GPT model)
@@ -504,6 +515,7 @@ class AutoModelForPreTraining:
- contains `albert`: :class:`~transformers.AlbertForMaskedLM` (ALBERT model)
- contains `camembert`: :class:`~transformers.CamembertForMaskedLM` (CamemBERT model)
- contains `xlm-roberta`: :class:`~transformers.XLMRobertaForMaskedLM` (XLM-RoBERTa model)
- contains `longformer`: :class:`~transformers.LongformerForMaskedLM` (Longformer model)
- contains `roberta`: :class:`~transformers.RobertaForMaskedLM` (RoBERTa model)
- contains `bert`: :class:`~transformers.BertForPreTraining` (Bert model)
- contains `openai-gpt`: :class:`~transformers.OpenAIGPTLMHeadModel` (OpenAI GPT model)
@@ -606,6 +618,7 @@ class AutoModelWithLMHead:
The model class to instantiate is selected based on the configuration class:
- isInstance of `distilbert` configuration class: :class:`~transformers.DistilBertForMaskedLM` (DistilBERT model)
- isInstance of `longformer` configuration class: :class:`~transformers.LongformerForMaskedLM` (Longformer model)
- isInstance of `roberta` configuration class: :class:`~transformers.RobertaForMaskedLM` (RoBERTa model)
- isInstance of `bert` configuration class: :class:`~transformers.BertForMaskedLM` (Bert model)
- isInstance of `openai-gpt` configuration class: :class:`~transformers.OpenAIGPTLMHeadModel` (OpenAI GPT model)
@@ -648,6 +661,7 @@ class AutoModelWithLMHead:
- contains `albert`: :class:`~transformers.AlbertForMaskedLM` (ALBERT model)
- contains `camembert`: :class:`~transformers.CamembertForMaskedLM` (CamemBERT model)
- contains `xlm-roberta`: :class:`~transformers.XLMRobertaForMaskedLM` (XLM-RoBERTa model)
- contains `longformer`: :class:`~transformers.LongformerForMaskedLM` (Longformer model)
- contains `roberta`: :class:`~transformers.RobertaForMaskedLM` (RoBERTa model)
- contains `bert`: :class:`~transformers.BertForMaskedLM` (Bert model)
- contains `openai-gpt`: :class:`~transformers.OpenAIGPTLMHeadModel` (OpenAI GPT model)

View File

@@ -703,9 +703,7 @@ class BertModel(BertPreTrainedModel):
# We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length]
# ourselves in which case we just need to make it broadcastable to all heads.
extended_attention_mask: torch.Tensor = self.get_extended_attention_mask(
attention_mask, input_shape, self.device
)
extended_attention_mask: torch.Tensor = self.get_extended_attention_mask(attention_mask, input_shape, device)
# If a 2D ou 3D attention mask is provided for the cross-attention
# we need to make broadcastabe to [batch_size, num_heads, seq_length, seq_length]

View File

@@ -3,6 +3,7 @@ import os
import torch
import torch.nn as nn
from torch.nn import CrossEntropyLoss, MSELoss
from .activations import get_activation
from .configuration_electra import ElectraConfig
@@ -330,6 +331,112 @@ class ElectraModel(ElectraPreTrainedModel):
return hidden_states
class ElectraClassificationHead(nn.Module):
"""Head for sentence-level classification tasks."""
def __init__(self, config):
super().__init__()
self.dense = nn.Linear(config.hidden_size, config.hidden_size)
self.dropout = nn.Dropout(config.hidden_dropout_prob)
self.out_proj = nn.Linear(config.hidden_size, config.num_labels)
def forward(self, features, **kwargs):
x = features[:, 0, :] # take <s> token (equiv. to [CLS])
x = self.dropout(x)
x = self.dense(x)
x = get_activation("gelu")(x) # although BERT uses tanh here, it seems Electra authors used gelu here
x = self.dropout(x)
x = self.out_proj(x)
return x
@add_start_docstrings(
"""ELECTRA Model transformer with a sequence classification/regression head on top (a linear layer on top of
the pooled output) e.g. for GLUE tasks. """,
ELECTRA_START_DOCSTRING,
)
class ElectraForSequenceClassification(ElectraPreTrainedModel):
def __init__(self, config):
super().__init__(config)
self.num_labels = config.num_labels
self.electra = ElectraModel(config)
self.classifier = ElectraClassificationHead(config)
self.init_weights()
@add_start_docstrings_to_callable(ELECTRA_INPUTS_DOCSTRING)
def forward(
self,
input_ids=None,
attention_mask=None,
token_type_ids=None,
position_ids=None,
head_mask=None,
inputs_embeds=None,
labels=None,
):
r"""
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
Labels for computing the sequence classification/regression loss.
Indices should be in :obj:`[0, ..., config.num_labels - 1]`.
If :obj:`config.num_labels == 1` a regression loss is computed (Mean-Square loss),
If :obj:`config.num_labels > 1` a classification loss is computed (Cross-Entropy).
Returns:
:obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.BertConfig`) and inputs:
loss (:obj:`torch.FloatTensor` of shape :obj:`(1,)`, `optional`, returned when :obj:`label` is provided):
Classification (or regression if config.num_labels==1) loss.
logits (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.num_labels)`):
Classification (or regression if config.num_labels==1) scores (before SoftMax).
hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_hidden_states=True``):
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
of shape :obj:`(batch_size, sequence_length, hidden_size)`.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_attentions=True``):
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention
heads.
Examples::
from transformers import BertTokenizer, BertForSequenceClassification
import torch
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0) # Batch size 1
labels = torch.tensor([1]).unsqueeze(0) # Batch size 1
outputs = model(input_ids, labels=labels)
loss, logits = outputs[:2]
"""
discriminator_hidden_states = self.electra(
input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds
)
sequence_output = discriminator_hidden_states[0]
logits = self.classifier(sequence_output)
outputs = (logits,) + discriminator_hidden_states[2:] # add hidden states and attention if they are here
if labels is not None:
if self.num_labels == 1:
# We are doing regression
loss_fct = MSELoss()
loss = loss_fct(logits.view(-1), labels.view(-1))
else:
loss_fct = CrossEntropyLoss()
loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
outputs = (loss,) + outputs
return outputs # (loss), logits, (hidden_states), (attentions)
@add_start_docstrings(
"""
Electra model with a binary classification head on top as used during pre-training for identifying generated

View File

@@ -0,0 +1,709 @@
# coding=utf-8
# Copyright 2020 The Allen Institute for AI team and The HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""PyTorch Longformer model. """
import logging
import math
import torch
import torch.nn as nn
from torch.nn import CrossEntropyLoss
from torch.nn import functional as F
from .configuration_longformer import LongformerConfig
from .file_utils import add_start_docstrings, add_start_docstrings_to_callable
from .modeling_bert import BertPreTrainedModel
from .modeling_roberta import RobertaLMHead, RobertaModel
logger = logging.getLogger(__name__)
LONGFORMER_PRETRAINED_MODEL_ARCHIVE_MAP = {
"longformer-base-4096": "https://s3.amazonaws.com/models.huggingface.co/bert/allenai/longformer-base-4096/pytorch_model.bin",
"longformer-large-4096": "https://s3.amazonaws.com/models.huggingface.co/bert/allenai/longformer-large-4096/pytorch_model.bin",
}
class LongformerSelfAttention(nn.Module):
def __init__(self, config, layer_id):
super().__init__()
if config.hidden_size % config.num_attention_heads != 0:
raise ValueError(
"The hidden size (%d) is not a multiple of the number of attention "
"heads (%d)" % (config.hidden_size, config.num_attention_heads)
)
self.output_attentions = config.output_attentions
self.num_heads = config.num_attention_heads
self.head_dim = int(config.hidden_size / config.num_attention_heads)
self.embed_dim = config.hidden_size
self.query = nn.Linear(config.hidden_size, self.embed_dim)
self.key = nn.Linear(config.hidden_size, self.embed_dim)
self.value = nn.Linear(config.hidden_size, self.embed_dim)
# separate projection layers for tokens with global attention
self.query_global = nn.Linear(config.hidden_size, self.embed_dim)
self.key_global = nn.Linear(config.hidden_size, self.embed_dim)
self.value_global = nn.Linear(config.hidden_size, self.embed_dim)
self.dropout = config.attention_probs_dropout_prob
self.layer_id = layer_id
attention_window = config.attention_window[self.layer_id]
assert (
attention_window % 2 == 0
), f"`attention_window` for layer {self.layer_id} has to be an even value. Given {attention_window}"
assert (
attention_window > 0
), f"`attention_window` for layer {self.layer_id} has to be positive. Given {attention_window}"
self.one_sided_attention_window_size = attention_window // 2
@staticmethod
def _skew(x, direction):
"""Convert diagonals into columns (or columns into diagonals depending on `direction`"""
x_padded = F.pad(x, direction) # padding value is not important because it will be overwritten
x_padded = x_padded.view(*x_padded.size()[:-2], x_padded.size(-1), x_padded.size(-2))
return x_padded
@staticmethod
def _skew2(x):
"""shift every row 1 step to right converting columns into diagonals"""
# X = B x C x M x L
B, C, M, L = x.size()
x = F.pad(x, (0, M + 1)) # B x C x M x (L+M+1). Padding value is not important because it'll be overwritten
x = x.view(B, C, -1) # B x C x ML+MM+M
x = x[:, :, :-M] # B x C x ML+MM
x = x.view(B, C, M, M + L) # B x C, M x L+M
x = x[:, :, :, :-1]
return x
@staticmethod
def _chunk(x, w):
"""convert into overlapping chunkings. Chunk size = 2w, overlap size = w"""
# non-overlapping chunks of size = 2w
x = x.view(x.size(0), x.size(1) // (w * 2), w * 2, x.size(2))
# use `as_strided` to make the chunks overlap with an overlap size = w
chunk_size = list(x.size())
chunk_size[1] = chunk_size[1] * 2 - 1
chunk_stride = list(x.stride())
chunk_stride[1] = chunk_stride[1] // 2
return x.as_strided(size=chunk_size, stride=chunk_stride)
def _mask_invalid_locations(self, input_tensor, w) -> torch.Tensor:
affected_seqlen = w
beginning_mask_2d = input_tensor.new_ones(w, w + 1).tril().flip(dims=[0])
beginning_mask = beginning_mask_2d[None, :, None, :]
ending_mask = beginning_mask.flip(dims=(1, 3))
seqlen = input_tensor.size(1)
beginning_input = input_tensor[:, :affected_seqlen, :, : w + 1]
beginning_mask = beginning_mask[:, :seqlen].expand(beginning_input.size())
beginning_input.masked_fill_(beginning_mask == 1, -float("inf")) # `== 1` converts to bool or uint8
ending_input = input_tensor[:, -affected_seqlen:, :, -(w + 1) :]
ending_mask = ending_mask[:, -seqlen:].expand(ending_input.size())
ending_input.masked_fill_(ending_mask == 1, -float("inf")) # `== 1` converts to bool or uint8
def _sliding_chunks_matmul_qk(self, q: torch.Tensor, k: torch.Tensor, w: int):
"""Matrix multiplicatio of query x key tensors using with a sliding window attention pattern.
This implementation splits the input into overlapping chunks of size 2w (e.g. 512 for pretrained Longformer)
with an overlap of size w"""
batch_size, seqlen, num_heads, head_dim = q.size()
assert seqlen % (w * 2) == 0, f"Sequence length should be multiple of {w * 2}. Given {seqlen}"
assert q.size() == k.size()
chunks_count = seqlen // w - 1
# group batch_size and num_heads dimensions into one, then chunk seqlen into chunks of size w * 2
q = q.transpose(1, 2).reshape(batch_size * num_heads, seqlen, head_dim)
k = k.transpose(1, 2).reshape(batch_size * num_heads, seqlen, head_dim)
chunk_q = self._chunk(q, w)
chunk_k = self._chunk(k, w)
# matrix multipication
# bcxd: batch_size * num_heads x chunks x 2w x head_dim
# bcyd: batch_size * num_heads x chunks x 2w x head_dim
# bcxy: batch_size * num_heads x chunks x 2w x 2w
chunk_attn = torch.einsum("bcxd,bcyd->bcxy", (chunk_q, chunk_k)) # multiply
# convert diagonals into columns
diagonal_chunk_attn = self._skew(chunk_attn, direction=(0, 0, 0, 1))
# allocate space for the overall attention matrix where the chunks are compined. The last dimension
# has (w * 2 + 1) columns. The first (w) columns are the w lower triangles (attention from a word to
# w previous words). The following column is attention score from each word to itself, then
# followed by w columns for the upper triangle.
diagonal_attn = diagonal_chunk_attn.new_empty((batch_size * num_heads, chunks_count + 1, w, w * 2 + 1))
# copy parts from diagonal_chunk_attn into the compined matrix of attentions
# - copying the main diagonal and the upper triangle
diagonal_attn[:, :-1, :, w:] = diagonal_chunk_attn[:, :, :w, : w + 1]
diagonal_attn[:, -1, :, w:] = diagonal_chunk_attn[:, -1, w:, : w + 1]
# - copying the lower triangle
diagonal_attn[:, 1:, :, :w] = diagonal_chunk_attn[:, :, -(w + 1) : -1, w + 1 :]
diagonal_attn[:, 0, 1:w, 1:w] = diagonal_chunk_attn[:, 0, : w - 1, 1 - w :]
# separate batch_size and num_heads dimensions again
diagonal_attn = diagonal_attn.view(batch_size, num_heads, seqlen, 2 * w + 1).transpose(2, 1)
self._mask_invalid_locations(diagonal_attn, w)
return diagonal_attn
def _sliding_chunks_matmul_pv(self, prob: torch.Tensor, v: torch.Tensor, w: int):
"""Same as _sliding_chunks_matmul_qk but for prob and value tensors. It is expecting the same output
format from _sliding_chunks_matmul_qk"""
batch_size, seqlen, num_heads, head_dim = v.size()
assert seqlen % (w * 2) == 0
assert prob.size()[:3] == v.size()[:3]
assert prob.size(3) == 2 * w + 1
chunks_count = seqlen // w - 1
# group batch_size and num_heads dimensions into one, then chunk seqlen into chunks of size 2w
chunk_prob = prob.transpose(1, 2).reshape(batch_size * num_heads, seqlen // w, w, 2 * w + 1)
# group batch_size and num_heads dimensions into one
v = v.transpose(1, 2).reshape(batch_size * num_heads, seqlen, head_dim)
# pad seqlen with w at the beginning of the sequence and another w at the end
padded_v = F.pad(v, (0, 0, w, w), value=-1)
# chunk padded_v into chunks of size 3w and an overlap of size w
chunk_v_size = (batch_size * num_heads, chunks_count + 1, 3 * w, head_dim)
chunk_v_stride = padded_v.stride()
chunk_v_stride = chunk_v_stride[0], w * chunk_v_stride[1], chunk_v_stride[1], chunk_v_stride[2]
chunk_v = padded_v.as_strided(size=chunk_v_size, stride=chunk_v_stride)
skewed_prob = self._skew2(chunk_prob)
context = torch.einsum("bcwd,bcdh->bcwh", (skewed_prob, chunk_v))
return context.view(batch_size, num_heads, seqlen, head_dim).transpose(1, 2)
def forward(
self,
hidden_states,
attention_mask=None,
head_mask=None,
encoder_hidden_states=None,
encoder_attention_mask=None,
):
"""
LongformerSelfAttention expects `len(hidden_states)` to be multiple of `attention_window`.
Padding to `attention_window` happens in LongformerModel.forward to avoid redoing the padding on each layer.
The `attention_mask` is changed in `BertModel.forward` from 0, 1, 2 to
-ve: no attention
0: local attention
+ve: global attention
`encoder_hidden_states` and `encoder_attention_mask` are not supported and should be None
"""
# TODO: add support for `encoder_hidden_states` and `encoder_attention_mask`
assert encoder_hidden_states is None, "`encoder_hidden_states` is not supported and should be None"
assert encoder_attention_mask is None, "`encoder_attention_mask` is not supported and shiould be None"
if attention_mask is not None:
attention_mask = attention_mask.squeeze(dim=2).squeeze(dim=1)
key_padding_mask = attention_mask < 0
extra_attention_mask = attention_mask > 0
remove_from_windowed_attention_mask = attention_mask != 0
num_extra_indices_per_batch = extra_attention_mask.long().sum(dim=1)
max_num_extra_indices_per_batch = num_extra_indices_per_batch.max()
if max_num_extra_indices_per_batch <= 0:
extra_attention_mask = None
else:
# To support the case of variable number of global attention in the rows of a batch,
# we use the following three selection masks to select global attention embeddings
# in a 3d tensor and pad it to `max_num_extra_indices_per_batch`
# 1) selecting embeddings that correspond to global attention
extra_attention_mask_nonzeros = extra_attention_mask.nonzero(as_tuple=True)
zero_to_max_range = torch.arange(
0, max_num_extra_indices_per_batch, device=num_extra_indices_per_batch.device
)
# mask indicating which values are actually going to be padding
selection_padding_mask = zero_to_max_range < num_extra_indices_per_batch.unsqueeze(dim=-1)
# 2) location of the non-padding values in the selected global attention
selection_padding_mask_nonzeros = selection_padding_mask.nonzero(as_tuple=True)
# 3) location of the padding values in the selected global attention
selection_padding_mask_zeros = (selection_padding_mask == 0).nonzero(as_tuple=True)
else:
remove_from_windowed_attention_mask = None
extra_attention_mask = None
key_padding_mask = None
hidden_states = hidden_states.transpose(0, 1)
seqlen, batch_size, embed_dim = hidden_states.size()
assert embed_dim == self.embed_dim
q = self.query(hidden_states)
k = self.key(hidden_states)
v = self.value(hidden_states)
q /= math.sqrt(self.head_dim)
q = q.view(seqlen, batch_size, self.num_heads, self.head_dim).transpose(0, 1)
k = k.view(seqlen, batch_size, self.num_heads, self.head_dim).transpose(0, 1)
# attn_weights = (batch_size, seqlen, num_heads, window*2+1)
attn_weights = self._sliding_chunks_matmul_qk(q, k, self.one_sided_attention_window_size)
self._mask_invalid_locations(attn_weights, self.one_sided_attention_window_size)
if remove_from_windowed_attention_mask is not None:
# This implementation is fast and takes very little memory because num_heads x hidden_size = 1
# from (batch_size x seqlen) to (batch_size x seqlen x num_heads x hidden_size)
remove_from_windowed_attention_mask = remove_from_windowed_attention_mask.unsqueeze(dim=-1).unsqueeze(
dim=-1
)
# cast to fp32/fp16 then replace 1's with -inf
float_mask = remove_from_windowed_attention_mask.type_as(q).masked_fill(
remove_from_windowed_attention_mask, -10000.0
)
ones = float_mask.new_ones(size=float_mask.size()) # tensor of ones
# diagonal mask with zeros everywhere and -inf inplace of padding
d_mask = self._sliding_chunks_matmul_qk(ones, float_mask, self.one_sided_attention_window_size)
attn_weights += d_mask
assert list(attn_weights.size()) == [
batch_size,
seqlen,
self.num_heads,
self.one_sided_attention_window_size * 2 + 1,
]
# the extra attention
if extra_attention_mask is not None:
selected_k = k.new_zeros(batch_size, max_num_extra_indices_per_batch, self.num_heads, self.head_dim)
selected_k[selection_padding_mask_nonzeros] = k[extra_attention_mask_nonzeros]
# (batch_size, seqlen, num_heads, max_num_extra_indices_per_batch)
selected_attn_weights = torch.einsum("blhd,bshd->blhs", (q, selected_k))
selected_attn_weights[selection_padding_mask_zeros[0], :, :, selection_padding_mask_zeros[1]] = -10000
# concat to attn_weights
# (batch_size, seqlen, num_heads, extra attention count + 2*window+1)
attn_weights = torch.cat((selected_attn_weights, attn_weights), dim=-1)
attn_weights_fp32 = F.softmax(attn_weights, dim=-1, dtype=torch.float32) # use fp32 for numerical stability
attn_weights = attn_weights_fp32.type_as(attn_weights)
if key_padding_mask is not None:
# softmax sometimes inserts NaN if all positions are masked, replace them with 0
attn_weights = torch.masked_fill(attn_weights, key_padding_mask.unsqueeze(-1).unsqueeze(-1), 0.0)
attn_probs = F.dropout(attn_weights, p=self.dropout, training=self.training)
v = v.view(seqlen, batch_size, self.num_heads, self.head_dim).transpose(0, 1)
attn = None
if extra_attention_mask is not None:
selected_attn_probs = attn_probs.narrow(-1, 0, max_num_extra_indices_per_batch)
selected_v = v.new_zeros(batch_size, max_num_extra_indices_per_batch, self.num_heads, self.head_dim)
selected_v[selection_padding_mask_nonzeros] = v[extra_attention_mask_nonzeros]
# use `matmul` because `einsum` crashes sometimes with fp16
# attn = torch.einsum('blhs,bshd->blhd', (selected_attn_probs, selected_v))
attn = torch.matmul(
selected_attn_probs.transpose(1, 2), selected_v.transpose(1, 2).type_as(selected_attn_probs)
).transpose(1, 2)
attn_probs = attn_probs.narrow(
-1, max_num_extra_indices_per_batch, attn_probs.size(-1) - max_num_extra_indices_per_batch
).contiguous()
if attn is None:
attn = self._sliding_chunks_matmul_pv(attn_probs, v, self.one_sided_attention_window_size)
else:
attn += self._sliding_chunks_matmul_pv(attn_probs, v, self.one_sided_attention_window_size)
assert attn.size() == (batch_size, seqlen, self.num_heads, self.head_dim), "Unexpected size"
attn = attn.transpose(0, 1).reshape(seqlen, batch_size, embed_dim).contiguous()
# For this case, we'll just recompute the attention for these indices
# and overwrite the attn tensor.
# TODO: remove the redundant computation
if extra_attention_mask is not None:
selected_hidden_states = hidden_states.new_zeros(max_num_extra_indices_per_batch, batch_size, embed_dim)
selected_hidden_states[selection_padding_mask_nonzeros[::-1]] = hidden_states[
extra_attention_mask_nonzeros[::-1]
]
q = self.query_global(selected_hidden_states)
k = self.key_global(hidden_states)
v = self.value_global(hidden_states)
q /= math.sqrt(self.head_dim)
q = (
q.contiguous()
.view(max_num_extra_indices_per_batch, batch_size * self.num_heads, self.head_dim)
.transpose(0, 1)
) # (batch_size * self.num_heads, max_num_extra_indices_per_batch, head_dim)
k = (
k.contiguous().view(-1, batch_size * self.num_heads, self.head_dim).transpose(0, 1)
) # batch_size * self.num_heads, seqlen, head_dim)
v = (
v.contiguous().view(-1, batch_size * self.num_heads, self.head_dim).transpose(0, 1)
) # batch_size * self.num_heads, seqlen, head_dim)
attn_weights = torch.bmm(q, k.transpose(1, 2))
assert list(attn_weights.size()) == [batch_size * self.num_heads, max_num_extra_indices_per_batch, seqlen]
attn_weights = attn_weights.view(batch_size, self.num_heads, max_num_extra_indices_per_batch, seqlen)
attn_weights[selection_padding_mask_zeros[0], :, selection_padding_mask_zeros[1], :] = -10000.0
if key_padding_mask is not None:
attn_weights = attn_weights.masked_fill(key_padding_mask.unsqueeze(1).unsqueeze(2), -10000.0,)
attn_weights = attn_weights.view(batch_size * self.num_heads, max_num_extra_indices_per_batch, seqlen)
attn_weights_float = F.softmax(
attn_weights, dim=-1, dtype=torch.float32
) # use fp32 for numerical stability
attn_probs = F.dropout(attn_weights_float.type_as(attn_weights), p=self.dropout, training=self.training)
selected_attn = torch.bmm(attn_probs, v)
assert list(selected_attn.size()) == [
batch_size * self.num_heads,
max_num_extra_indices_per_batch,
self.head_dim,
]
selected_attn_4d = selected_attn.view(
batch_size, self.num_heads, max_num_extra_indices_per_batch, self.head_dim
)
nonzero_selected_attn = selected_attn_4d[
selection_padding_mask_nonzeros[0], :, selection_padding_mask_nonzeros[1]
]
attn[extra_attention_mask_nonzeros[::-1]] = nonzero_selected_attn.view(
len(selection_padding_mask_nonzeros[0]), -1
).type_as(hidden_states)
context_layer = attn.transpose(0, 1)
if self.output_attentions:
if extra_attention_mask is not None:
# With global attention, return global attention probabilities only
# batch_size x num_heads x max_num_global_attention_tokens x sequence_length
# which is the attention weights from tokens with global attention to all tokens
# It doesn't not return local attention
# In case of variable number of global attantion in the rows of a batch,
# attn_weights are padded with -10000.0 attention scores
attn_weights = attn_weights.view(batch_size, self.num_heads, max_num_extra_indices_per_batch, seqlen)
else:
# without global attention, return local attention probabilities
# batch_size x num_heads x sequence_length x window_size
# which is the attention weights of every token attending to its neighbours
attn_weights = attn_weights.permute(0, 2, 1, 3)
outputs = (context_layer, attn_weights) if self.output_attentions else (context_layer,)
return outputs
LONGFORMER_START_DOCSTRING = r"""
This model is a PyTorch `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`_ sub-class.
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general
usage and behavior.
Parameters:
config (:class:`~transformers.LongformerConfig`): Model configuration class with all the parameters of the
model. Initializing with a config file does not load the weights associated with the model, only the configuration.
Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights.
"""
LONGFORMER_INPUTS_DOCSTRING = r"""
Args:
input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`):
Indices of input sequence tokens in the vocabulary.
Indices can be obtained using :class:`transformers.LonmgformerTokenizer`.
See :func:`transformers.PreTrainedTokenizer.encode` and
:func:`transformers.PreTrainedTokenizer.encode_plus` for details.
`What are input IDs? <../glossary.html#input-ids>`__
attention_mask (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`):
Mask to decide the attention given on each token, local attention, global attenion, or no attention (for padding tokens).
Tokens with global attention attends to all other tokens, and all other tokens attend to them. This is important for
task-specific finetuning because it makes the model more flexible at representing the task. For example,
for classification, the <s> token should be given global attention. For QA, all question tokens should also have
global attention. Please refer to the Longformer paper https://arxiv.org/abs/2004.05150 for more details.
Mask values selected in ``[0, 1, 2]``:
``0`` for no attention (padding tokens),
``1`` for local attention (a sliding window attention),
``2`` for global attention (tokens that attend to all other tokens, and all other tokens attend to them).
`What are attention masks? <../glossary.html#attention-mask>`__
token_type_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`):
Segment token indices to indicate first and second portions of the inputs.
Indices are selected in ``[0, 1]``: ``0`` corresponds to a `sentence A` token, ``1``
corresponds to a `sentence B` token
`What are token type IDs? <../glossary.html#token-type-ids>`_
position_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`):
Indices of positions of each input sequence tokens in the position embeddings.
Selected in the range ``[0, config.max_position_embeddings - 1]``.
`What are position IDs? <../glossary.html#position-ids>`_
inputs_embeds (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`, defaults to :obj:`None`):
Optionally, instead of passing :obj:`input_ids` you can choose to directly pass an embedded representation.
This is useful if you want more control over how to convert `input_ids` indices into associated vectors
than the model's internal embedding lookup matrix.
"""
@add_start_docstrings(
"The bare Longformer Model outputting raw hidden-states without any specific head on top.",
LONGFORMER_START_DOCSTRING,
)
class LongformerModel(RobertaModel):
"""
This class overrides :class:`~transformers.RobertaModel` to provide the ability to process
long sequences following the selfattention approach described in `Longformer: the Long-Document Transformer`_by
Iz Beltagy, Matthew E. Peters, and Arman Cohan. Longformer selfattention combines a local (sliding window)
and global attention to extend to long documents without the O(n^2) increase in memory and compute.
The selfattention module `LongformerSelfAttention` implemented here supports the combination of local and
global attention but it lacks support for autoregressive attention and dilated attention. Autoregressive
and dilated attention are more relevant for autoregressive language modeling than finetuning on downstream
tasks. Future release will add support for autoregressive attention, but the support for dilated attention
requires a custom CUDA kernel to be memory and compute efficient.
.. _`Longformer: the Long-Document Transformer`:
https://arxiv.org/abs/2004.05150
"""
config_class = LongformerConfig
pretrained_model_archive_map = LONGFORMER_PRETRAINED_MODEL_ARCHIVE_MAP
base_model_prefix = "longformer"
def __init__(self, config):
super().__init__(config)
if isinstance(config.attention_window, int):
assert config.attention_window % 2 == 0, "`config.attention_window` has to be an even value"
assert config.attention_window > 0, "`config.attention_window` has to be positive"
config.attention_window = [config.attention_window] * config.num_hidden_layers # one value per layer
else:
assert len(config.attention_window) == config.num_hidden_layers, (
"`len(config.attention_window)` should equal `config.num_hidden_layers`. "
f"Expected {config.num_hidden_layers}, given {len(config.attention_window)}"
)
for i, layer in enumerate(self.encoder.layer):
# replace the `modeling_bert.BertSelfAttention` object with `LongformerSelfAttention`
layer.attention.self = LongformerSelfAttention(config, layer_id=i)
self.init_weights()
def _pad_to_window_size(
self,
input_ids: torch.Tensor,
attention_mask: torch.Tensor,
token_type_ids: torch.Tensor,
position_ids: torch.Tensor,
inputs_embeds: torch.Tensor,
attention_window: int,
pad_token_id: int,
):
"""A helper function to pad tokens and mask to work with implementation of Longformer selfattention."""
assert attention_window % 2 == 0, f"`attention_window` should be an even value. Given {attention_window}"
input_shape = input_ids.shape if input_ids is not None else inputs_embeds.shape
batch_size, seqlen = input_shape[:2]
padding_len = (attention_window - seqlen % attention_window) % attention_window
if padding_len > 0:
logger.info(
"Input ids are automatically padded from {} to {} to be a multiple of `config.attention_window`: {}".format(
seqlen, seqlen + padding_len, attention_window
)
)
if input_ids is not None:
input_ids = F.pad(input_ids, (0, padding_len), value=pad_token_id)
if attention_mask is not None:
attention_mask = F.pad(
attention_mask, (0, padding_len), value=False
) # no attention on the padding tokens
if token_type_ids is not None:
token_type_ids = F.pad(token_type_ids, (0, padding_len), value=0) # pad with token_type_id = 0
if position_ids is not None:
# pad with position_id = pad_token_id as in modeling_roberta.RobertaEmbeddings
position_ids = F.pad(position_ids, (0, padding_len), value=pad_token_id)
if inputs_embeds is not None:
input_ids_padding = inputs_embeds.new_full(
(batch_size, padding_len), self.config.pad_token_id, dtype=torch.long,
)
inputs_embeds_padding = self.embeddings(input_ids_padding)
inputs_embeds = torch.cat([inputs_embeds, inputs_embeds_padding], dim=-2)
return padding_len, input_ids, attention_mask, token_type_ids, position_ids, inputs_embeds
@add_start_docstrings_to_callable(LONGFORMER_INPUTS_DOCSTRING)
def forward(
self,
input_ids=None,
attention_mask=None,
token_type_ids=None,
position_ids=None,
inputs_embeds=None,
masked_lm_labels=None,
):
r"""
Returns:
:obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.RobertaConfig`) and inputs:
masked_lm_loss (`optional`, returned when ``masked_lm_labels`` is provided) ``torch.FloatTensor`` of shape ``(1,)``:
Masked language modeling loss.
prediction_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, config.vocab_size)`)
Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_hidden_states=True``):
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
of shape :obj:`(batch_size, sequence_length, hidden_size)`.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_attentions=True``):
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention
heads.
Examples::
import torch
from transformers import LongformerModel, LongformerTokenizer
model = LongformerModel.from_pretrained('longformer-base-4096')
tokenizer = LongformerTokenizer.from_pretrained('longformer-base-4096')
SAMPLE_TEXT = ' '.join(['Hello world! '] * 1000) # long input document
input_ids = torch.tensor(tokenizer.encode(SAMPLE_TEXT)).unsqueeze(0) # batch of size 1
# Attention mask values -- 0: no attention, 1: local attention, 2: global attention
attention_mask = torch.ones(input_ids.shape, dtype=torch.long, device=input_ids.device) # initialize to local attention
attention_mask[:, [1, 4, 21,]] = 2 # Set global attention based on the task. For example,
# classification: the <s> token
# QA: question tokens
# LM: potentially on the beginning of sentences and paragraphs
sequence_output, pooled_output = model(input_ids, attention_mask=attention_mask)
"""
# padding
attention_window = (
self.config.attention_window
if isinstance(self.config.attention_window, int)
else max(self.config.attention_window)
)
padding_len, input_ids, attention_mask, token_type_ids, position_ids, inputs_embeds = self._pad_to_window_size(
input_ids=input_ids,
attention_mask=attention_mask,
token_type_ids=token_type_ids,
position_ids=position_ids,
inputs_embeds=inputs_embeds,
attention_window=attention_window,
pad_token_id=self.config.pad_token_id,
)
# embed
output = super().forward(
input_ids=input_ids,
attention_mask=attention_mask,
token_type_ids=token_type_ids,
position_ids=position_ids,
head_mask=None,
inputs_embeds=inputs_embeds,
encoder_hidden_states=None,
encoder_attention_mask=None,
)
# undo padding
if padding_len > 0:
# `output` has the following tensors: sequence_output, pooled_output, (hidden_states), (attentions)
# `sequence_output`: unpad because the calling function is expecting a length == input_ids.size(1)
# `pooled_output`: independent of the sequence length
# `hidden_states`: mainly used for debugging and analysis, so keep the padding
# `attentions`: mainly used for debugging and analysis, so keep the padding
output = output[0][:, :-padding_len], *output[1:]
return output
@add_start_docstrings("""Longformer Model with a `language modeling` head on top. """, LONGFORMER_START_DOCSTRING)
class LongformerForMaskedLM(BertPreTrainedModel):
config_class = LongformerConfig
pretrained_model_archive_map = LONGFORMER_PRETRAINED_MODEL_ARCHIVE_MAP
base_model_prefix = "longformer"
def __init__(self, config):
super().__init__(config)
self.longformer = LongformerModel(config)
self.lm_head = RobertaLMHead(config)
self.init_weights()
@add_start_docstrings_to_callable(LONGFORMER_INPUTS_DOCSTRING)
def forward(
self,
input_ids=None,
attention_mask=None,
token_type_ids=None,
position_ids=None,
inputs_embeds=None,
masked_lm_labels=None,
):
r"""
masked_lm_labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`):
Labels for computing the masked language modeling loss.
Indices should be in ``[-100, 0, ..., config.vocab_size]`` (see ``input_ids`` docstring)
Tokens with indices set to ``-100`` are ignored (masked), the loss is only computed for the tokens with labels
in ``[0, ..., config.vocab_size]``
Returns:
:obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.RobertaConfig`) and inputs:
masked_lm_loss (`optional`, returned when ``masked_lm_labels`` is provided) ``torch.FloatTensor`` of shape ``(1,)``:
Masked language modeling loss.
prediction_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, config.vocab_size)`)
Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_hidden_states=True``):
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
of shape :obj:`(batch_size, sequence_length, hidden_size)`.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_attentions=True``):
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention
heads.
Examples::
import torch
from transformers import LongformerForMaskedLM, LongformerTokenizer
model = LongformerForMaskedLM.from_pretrained('longformer-base-4096')
tokenizer = LongformerTokenizer.from_pretrained('longformer-base-4096')
SAMPLE_TEXT = ' '.join(['Hello world! '] * 1000) # long input document
input_ids = torch.tensor(tokenizer.encode(SAMPLE_TEXT)).unsqueeze(0) # batch of size 1
attention_mask = None # default is local attention everywhere, which is a good choice for MaskedLM
# check ``LongformerModel.forward`` for more details how to set `attention_mask`
loss, prediction_scores = model(input_ids, attention_mask=attention_mask, masked_lm_labels=input_ids)
"""
outputs = self.longformer(
input_ids,
attention_mask=attention_mask,
token_type_ids=token_type_ids,
position_ids=position_ids,
inputs_embeds=inputs_embeds,
)
sequence_output = outputs[0]
prediction_scores = self.lm_head(sequence_output)
outputs = (prediction_scores,) + outputs[2:] # Add hidden states and attention if they are here
if masked_lm_labels is not None:
loss_fct = CrossEntropyLoss()
masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), masked_lm_labels.view(-1))
outputs = (masked_lm_loss,) + outputs
return outputs # (masked_lm_loss), prediction_scores, (hidden_states), (attentions)

View File

@@ -149,8 +149,12 @@ class T5LayerNorm(nn.Module):
self.variance_epsilon = eps
def forward(self, x):
variance = x.pow(2).mean(-1, keepdim=True)
# layer norm should always be calculated in float32
variance = x.to(torch.float32).pow(2).mean(-1, keepdim=True)
x = x / torch.sqrt(variance + self.variance_epsilon)
if self.weight.dtype == torch.float16:
x = x.to(torch.float16)
return self.weight * x
@@ -691,14 +695,16 @@ class T5Stack(T5PreTrainedModel):
attention_mask = torch.ones(batch_size, mask_seq_length).to(inputs_embeds.device)
if self.is_decoder and encoder_attention_mask is None and encoder_hidden_states is not None:
encoder_seq_length = encoder_hidden_states.shape[1]
encoder_attention_mask = torch.ones(batch_size, encoder_seq_length).to(inputs_embeds.device)
encoder_attention_mask = torch.ones(
batch_size, encoder_seq_length, device=inputs_embeds.device, dtype=torch.long
)
# initialize past_key_value_states with `None` if past does not exist
if past_key_value_states is None:
past_key_value_states = [None] * len(self.block)
# ourselves in which case we just need to make it broadcastable to all heads.
extended_attention_mask = self.get_extended_attention_mask(attention_mask, input_shape, self.device)
extended_attention_mask = self.get_extended_attention_mask(attention_mask, input_shape, inputs_embeds.device)
if self.is_decoder and encoder_attention_mask is not None:
encoder_extended_attention_mask = self.invert_attention_mask(encoder_attention_mask)
@@ -733,6 +739,7 @@ class T5Stack(T5PreTrainedModel):
# layer_outputs is a tuple with:
# hidden-states, key-value-states, (self-attention weights), (self-attention position bias), (cross-attention weights), (cross-attention position bias)
hidden_states, present_key_value_state = layer_outputs[:2]
if i == 0:
# We share the position biases between the layers - the first layer store them
# layer_outputs = hidden-states, key-value-states (self-attention weights), (self-attention position bias), (cross-attention weights), (cross-attention position bias)

View File

@@ -537,7 +537,7 @@ class TFT5MainLayer(tf.keras.layers.Layer):
def call(
self,
input_ids,
inputs,
attention_mask=None,
encoder_hidden_states=None,
encoder_attention_mask=None,
@@ -548,19 +548,19 @@ class TFT5MainLayer(tf.keras.layers.Layer):
training=False,
):
if input_ids is not None and inputs_embeds is not None:
raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
elif input_ids is not None:
input_shape = shape_list(input_ids)
input_ids = tf.reshape(input_ids, (-1, input_shape[-1]))
if inputs is not None and inputs_embeds is not None:
raise ValueError("You cannot specify both inputs and inputs_embeds at the same time")
elif inputs is not None:
input_shape = shape_list(inputs)
inputs = tf.reshape(inputs, (-1, input_shape[-1]))
elif inputs_embeds is not None:
input_shape = shape_list(inputs_embeds)[:-1]
else:
raise ValueError("You have to specify either input_ids or inputs_embeds")
raise ValueError("You have to specify either inputs or inputs_embeds")
if inputs_embeds is None:
assert self.embed_tokens is not None, "You have to intialize the model with valid token embeddings"
inputs_embeds = self.embed_tokens(input_ids)
inputs_embeds = self.embed_tokens(inputs)
batch_size, seq_length = input_shape
@@ -725,11 +725,11 @@ class TFT5PreTrainedModel(TFPreTrainedModel):
@property
def dummy_inputs(self):
input_ids = tf.constant(DUMMY_INPUTS)
inputs = tf.constant(DUMMY_INPUTS)
input_mask = tf.constant(DUMMY_MASK)
dummy_inputs = {
"inputs": input_ids,
"decoder_input_ids": input_ids,
"inputs": inputs,
"decoder_input_ids": inputs,
"decoder_attention_mask": input_mask,
}
return dummy_inputs
@@ -759,11 +759,11 @@ T5_START_DOCSTRING = r""" The T5 model was proposed in
If you choose this second option, there are three possibilities you can use to gather all the input Tensors in the first positional argument :
- a single Tensor with input_ids only and nothing else: `model(inputs_ids)
- a single Tensor with inputs only and nothing else: `model(inputs_ids)
- a list of varying length with one or several input Tensors IN THE ORDER given in the docstring:
`model([input_ids, attention_mask])` or `model([input_ids, attention_mask, token_type_ids])`
`model([inputs, attention_mask])` or `model([inputs, attention_mask, token_type_ids])`
- a dictionary with one or several input Tensors associaed to the input names given in the docstring:
`model({'input_ids': input_ids, 'token_type_ids': token_type_ids})`
`model({'inputs': inputs, 'token_type_ids': token_type_ids})`
Parameters:
config (:class:`~transformers.T5Config`): Model configuration class with all the parameters of the model.
@@ -780,7 +780,7 @@ T5_INPUTS_DOCSTRING = r"""
T5 is a model with relative position embeddings so you should be able to pad the inputs on
the right or the left.
Indices can be obtained using :class:`transformers.T5Tokenizer`.
To know more on how to prepare :obj:`input_ids` for pre-training take a look at
To know more on how to prepare :obj:`inputs` for pre-training take a look at
`T5 Training <./t5.html#training>`_ .
See :func:`transformers.PreTrainedTokenizer.encode` and
:func:`transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
@@ -805,8 +805,8 @@ T5_INPUTS_DOCSTRING = r"""
use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`):
If `use_cache` is True, `decoder_past_key_value_states` are returned and can be used to speed up decoding (see `decoder_past_key_value_states`).
inputs_embeds (:obj:`tf.Tensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`, defaults to :obj:`None`):
Optionally, instead of passing :obj:`input_ids` you can choose to directly pass an embedded representation.
This is useful if you want more control over how to convert `input_ids` indices into associated vectors
Optionally, instead of passing :obj:`inputs` you can choose to directly pass an embedded representation.
This is useful if you want more control over how to convert `inputs` indices into associated vectors
than the model's internal embedding lookup matrix.
decoder_inputs_embeds (:obj:`tf.Tensor` of shape :obj:`(batch_size, target_sequence_length, hidden_size)`, `optional`, defaults to :obj:`None`):
Optionally, instead of passing :obj:`decoder_input_ids` you can choose to directly pass an embedded representation.
@@ -885,8 +885,8 @@ class TFT5Model(TFT5PreTrainedModel):
tokenizer = T5Tokenizer.from_pretrained('t5-small')
model = TFT5Model.from_pretrained('t5-small')
input_ids = tokenizer.encode("Hello, my dog is cute", return_tensors="tf") # Batch size 1
outputs = model(input_ids, decoder_input_ids=input_ids)
inputs = tokenizer.encode("Hello, my dog is cute", return_tensors="tf") # Batch size 1
outputs = model(inputs, decoder_input_ids=inputs)
last_hidden_states = outputs[0] # The last hidden-state is the first element of the output tuple
"""
@@ -897,7 +897,7 @@ class TFT5Model(TFT5PreTrainedModel):
kwargs["inputs"] = inputs
# retrieve arguments
input_ids = kwargs.get("inputs", None)
inputs = kwargs.get("inputs", None)
inputs_embeds = kwargs.get("inputs_embeds", None)
attention_mask = kwargs.get("attention_mask", None)
encoder_outputs = kwargs.get("encoder_outputs", None)
@@ -911,7 +911,7 @@ class TFT5Model(TFT5PreTrainedModel):
# Encode if needed (training, first prediction pass)
if encoder_outputs is None:
encoder_outputs = self.encoder(
input_ids, attention_mask=attention_mask, inputs_embeds=inputs_embeds, head_mask=head_mask,
inputs, attention_mask=attention_mask, inputs_embeds=inputs_embeds, head_mask=head_mask,
)
hidden_states = encoder_outputs[0]
@@ -1006,14 +1006,14 @@ class TFT5ForConditionalGeneration(TFT5PreTrainedModel):
tokenizer = T5Tokenizer.from_pretrained('t5-small')
model = TFT5ForConditionalGeneration.from_pretrained('t5-small')
input_ids = tokenizer.encode("Hello, my dog is cute", return_tensors="tf") # Batch size 1
outputs = model(input_ids, decoder_input_ids=input_ids)
inputs = tokenizer.encode("Hello, my dog is cute", return_tensors="tf") # Batch size 1
outputs = model(inputs, decoder_input_ids=inputs)
prediction_scores = outputs[0]
tokenizer = T5Tokenizer.from_pretrained('t5-small')
model = TFT5ForConditionalGeneration.from_pretrained('t5-small')
input_ids = tokenizer.encode("summarize: Hello, my dog is cute", return_tensors="tf") # Batch size 1
model.generate(input_ids)
inputs = tokenizer.encode("summarize: Hello, my dog is cute", return_tensors="tf") # Batch size 1
model.generate(inputs)
"""
@@ -1023,7 +1023,7 @@ class TFT5ForConditionalGeneration(TFT5PreTrainedModel):
kwargs["inputs"] = inputs
# retrieve arguments
input_ids = kwargs.get("inputs", None)
inputs = kwargs.get("inputs", None)
decoder_input_ids = kwargs.get("decoder_input_ids", None)
attention_mask = kwargs.get("attention_mask", None)
encoder_outputs = kwargs.get("encoder_outputs", None)
@@ -1038,7 +1038,7 @@ class TFT5ForConditionalGeneration(TFT5PreTrainedModel):
if encoder_outputs is None:
# Convert encoder inputs in embeddings if needed
encoder_outputs = self.encoder(
input_ids, attention_mask=attention_mask, inputs_embeds=inputs_embeds, head_mask=head_mask,
inputs, attention_mask=attention_mask, inputs_embeds=inputs_embeds, head_mask=head_mask,
)
hidden_states = encoder_outputs[0]
@@ -1076,7 +1076,7 @@ class TFT5ForConditionalGeneration(TFT5PreTrainedModel):
return decoder_outputs + encoder_outputs
def prepare_inputs_for_generation(self, input_ids, past, attention_mask, use_cache, **kwargs):
def prepare_inputs_for_generation(self, inputs, past, attention_mask, use_cache, **kwargs):
assert past is not None, "past has to be defined for encoder_outputs"
# first step
@@ -1087,7 +1087,7 @@ class TFT5ForConditionalGeneration(TFT5PreTrainedModel):
return {
"inputs": None, # inputs don't have to be defined, but still need to be passed to make Keras.layer.__call__ happy
"decoder_input_ids": input_ids, # input_ids are the decoder_input_ids
"decoder_input_ids": inputs, # inputs are the decoder_input_ids
"decoder_past_key_value_states": decoder_past_key_value_states,
"encoder_outputs": encoder_outputs,
"attention_mask": attention_mask,

View File

@@ -929,7 +929,9 @@ class TFPreTrainedModel(tf.keras.Model, TFModelUtilsMixin):
else:
tokens_to_add = next_token
# add token and increase length by one
input_ids = tf.concat([input_ids, tf.expand_dims(tokens_to_add, -1)], 1)
cur_len = cur_len + 1
if eos_token_id is not None:
eos_in_sents = tokens_to_add == eos_token_id
@@ -955,8 +957,6 @@ class TFPreTrainedModel(tf.keras.Model, TFModelUtilsMixin):
[attention_mask, tf.ones((shape_list(attention_mask)[0], 1), dtype=tf.int32)], axis=-1
)
cur_len = cur_len + 1
# if there are different sentences lengths in the batch, some batches have to be padded
min_sent_length = tf.math.reduce_min(sent_lengths)
max_sent_length = tf.math.reduce_max(sent_lengths)
@@ -970,7 +970,7 @@ class TFPreTrainedModel(tf.keras.Model, TFModelUtilsMixin):
tf.expand_dims(sent_lengths, -1), [batch_size, max_sent_length]
)
broad_casted_range = tf.transpose(
tf.broadcast_to(tf.expand_dims(tf.range(max_length), -1), [max_length, batch_size])
tf.broadcast_to(tf.expand_dims(tf.range(max_sent_length), -1), [max_sent_length, batch_size])
)
decoded = tf.where(broad_casted_range < broad_casted_sent_lengths, input_ids, padding)
@@ -1205,9 +1205,11 @@ class TFPreTrainedModel(tf.keras.Model, TFModelUtilsMixin):
beam_tokens = tf.convert_to_tensor([x[1] for x in next_batch_beam], dtype=tf.int32)
beam_idx = tf.convert_to_tensor([x[2] for x in next_batch_beam], dtype=tf.int32)
# re-order batch
# re-order batch and update current length
input_ids = tf.stack([tf.identity(input_ids[x, :]) for x in beam_idx])
input_ids = tf.concat([input_ids, tf.expand_dims(beam_tokens, 1)], axis=-1)
cur_len = cur_len + 1
# re-order internal states
if past is not None:
past = self._reorder_cache(past, beam_idx)
@@ -1218,9 +1220,6 @@ class TFPreTrainedModel(tf.keras.Model, TFModelUtilsMixin):
[attention_mask, tf.ones((shape_list(attention_mask)[0], 1), dtype=tf.int32)], axis=-1
)
# update current length
cur_len = cur_len + 1
# finalize all open beam hypotheses and end to generated hypotheses
for batch_idx in range(batch_size):
# Add all open beam hypothesis to generated_hyps

View File

@@ -17,7 +17,7 @@
import inspect
import logging
import os
from typing import Callable, Tuple
from typing import Callable, List, Tuple
import torch
from torch import Tensor, device, dtype, nn
@@ -110,11 +110,33 @@ class ModuleUtilsMixin:
@property
def device(self) -> device:
return next(self.parameters()).device
try:
return next(self.parameters()).device
except StopIteration:
# For nn.DataParallel compatibility in PyTorch 1.5
def find_tensor_attributes(module: nn.Module) -> List[Tuple[str, Tensor]]:
tuples = [(k, v) for k, v in module.__dict__.items() if torch.is_tensor(v)]
return tuples
gen = self._named_members(get_members_fn=find_tensor_attributes)
first_tuple = next(gen)
return first_tuple[1].device
@property
def dtype(self) -> dtype:
return next(self.parameters()).dtype
try:
return next(self.parameters()).dtype
except StopIteration:
# For nn.DataParallel compatibility in PyTorch 1.5
def find_tensor_attributes(module: nn.Module) -> List[Tuple[str, Tensor]]:
tuples = [(k, v) for k, v in module.__dict__.items() if torch.is_tensor(v)]
return tuples
gen = self._named_members(get_members_fn=find_tensor_attributes)
first_tuple = next(gen)
return first_tuple[1].dtype
def invert_attention_mask(self, encoder_attention_mask: Tensor) -> Tensor:
"""type: torch.Tensor -> torch.Tensor"""
@@ -128,7 +150,18 @@ class ModuleUtilsMixin:
# encoder_extended_attention_mask = (encoder_extended_attention_mask ==
# encoder_extended_attention_mask.transpose(-1, -2))
encoder_extended_attention_mask = encoder_extended_attention_mask.to(dtype=self.dtype) # fp16 compatibility
encoder_extended_attention_mask = (1.0 - encoder_extended_attention_mask) * -1e9
if self.dtype == torch.float16:
encoder_extended_attention_mask = (1.0 - encoder_extended_attention_mask) * -1e4
elif self.dtype == torch.float32:
encoder_extended_attention_mask = (1.0 - encoder_extended_attention_mask) * -1e9
else:
raise ValueError(
"{} not recognized. `dtype` should be set to either `torch.float32` or `torch.float16`".format(
self.dtype
)
)
return encoder_extended_attention_mask
def get_extended_attention_mask(self, attention_mask: Tensor, input_shape: tuple, device: device):
@@ -737,7 +770,7 @@ class PreTrainedModel(nn.Module, ModuleUtilsMixin):
import torch_xla.core.xla_model as xm
model = xm.send_cpu_data_to_device(model, xm.xla_device())
model = model.to(xm.xla_device())
model.to(xm.xla_device())
return model
@@ -1236,13 +1269,15 @@ class PreTrainedModel(nn.Module, ModuleUtilsMixin):
else:
tokens_to_add = next_token
# add token and increase length by one
input_ids = torch.cat([input_ids, tokens_to_add.unsqueeze(-1)], dim=-1)
cur_len = cur_len + 1
if eos_token_id is not None:
eos_in_sents = tokens_to_add == eos_token_id
# if sentence is unfinished and the token to add is eos, sent_lengths is filled with current length
is_sents_unfinished_and_token_to_add_is_eos = unfinished_sents.mul(eos_in_sents.long()).bool()
sent_lengths.masked_fill_(is_sents_unfinished_and_token_to_add_is_eos, cur_len + 1)
sent_lengths.masked_fill_(is_sents_unfinished_and_token_to_add_is_eos, cur_len)
# unfinished_sents is set to zero if eos in sentence
unfinished_sents.mul_((~eos_in_sents).long())
@@ -1256,8 +1291,6 @@ class PreTrainedModel(nn.Module, ModuleUtilsMixin):
[attention_mask, attention_mask.new_ones((attention_mask.shape[0], 1))], dim=-1
)
cur_len = cur_len + 1
# if there are different sentences lengths in the batch, some batches have to be padded
if sent_lengths.min().item() != sent_lengths.max().item():
assert pad_token_id is not None, "`Pad_token_id` has to be defined if batches have different lengths"
@@ -1473,9 +1506,11 @@ class PreTrainedModel(nn.Module, ModuleUtilsMixin):
beam_tokens = input_ids.new([x[1] for x in next_batch_beam])
beam_idx = input_ids.new([x[2] for x in next_batch_beam])
# re-order batch
# re-order batch and update current length
input_ids = input_ids[beam_idx, :]
input_ids = torch.cat([input_ids, beam_tokens.unsqueeze(1)], dim=-1)
cur_len = cur_len + 1
# re-order internal states
if past is not None:
past = self._reorder_cache(past, beam_idx)
@@ -1486,9 +1521,6 @@ class PreTrainedModel(nn.Module, ModuleUtilsMixin):
[attention_mask, attention_mask.new_ones((attention_mask.shape[0], 1))], dim=-1
)
# update current length
cur_len = cur_len + 1
# finalize all open beam hypotheses and end to generated hypotheses
for batch_idx in range(batch_size):
if done[batch_idx]:

View File

@@ -623,7 +623,7 @@ class XLNetModel(XLNetPreTrainedModel):
mask_lo = torch.tril(attn_mask, diagonal=-1)
ret = torch.cat([ret[:, :qlen] + mask_lo, ret[:, qlen:]], dim=1)
ret = ret.to(next(self.parameters()))
ret = ret.to(self.device)
return ret
def cache_mem(self, curr_out, prev_mem):
@@ -685,7 +685,7 @@ class XLNetModel(XLNetPreTrainedModel):
fwd_pos_seq = fwd_pos_seq.clamp(-self.clamp_len, self.clamp_len)
pos_emb = self.positional_embedding(fwd_pos_seq, inv_freq, bsz)
pos_emb = pos_emb.to(next(self.parameters()))
pos_emb = pos_emb.to(self.device)
return pos_emb
@add_start_docstrings_to_callable(XLNET_INPUTS_DOCSTRING)
@@ -761,8 +761,8 @@ class XLNetModel(XLNetPreTrainedModel):
mlen = mems[0].shape[0] if mems is not None and mems[0] is not None else 0
klen = mlen + qlen
dtype_float = next(self.parameters()).dtype
device = next(self.parameters()).device
dtype_float = self.dtype
device = self.device
# Attention mask
# causal attention mask

View File

@@ -152,8 +152,8 @@ class AdamW(Optimizer):
# Decay the first and second moment running average coefficient
# In-place operations to update the averages at the same time
exp_avg.mul_(beta1).add_(1.0 - beta1, grad)
exp_avg_sq.mul_(beta2).addcmul_(1.0 - beta2, grad, grad)
exp_avg.mul_(beta1).add_(grad, alpha=1.0 - beta1)
exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1.0 - beta2)
denom = exp_avg_sq.sqrt().add_(group["eps"])
step_size = group["lr"]
@@ -162,7 +162,7 @@ class AdamW(Optimizer):
bias_correction2 = 1.0 - beta2 ** state["step"]
step_size = step_size * math.sqrt(bias_correction2) / bias_correction1
p.data.addcdiv_(-step_size, exp_avg, denom)
p.data.addcdiv_(exp_avg, denom, value=-step_size)
# Just adding the square of the weights to the loss function is *not*
# the correct way of using L2 regularization/weight decay with Adam,
@@ -173,6 +173,6 @@ class AdamW(Optimizer):
# of the weights to the loss with plain (non-momentum) SGD.
# Add weight decay at the end (fixed version)
if group["weight_decay"] > 0.0:
p.data.add_(-group["lr"] * group["weight_decay"], p.data)
p.data.add_(p.data, alpha=-group["lr"] * group["weight_decay"])
return loss

View File

@@ -75,7 +75,7 @@ def create_optimizer(init_lr, num_train_steps, num_warmup_steps, end_lr=0.0, opt
beta_1=0.9,
beta_2=0.999,
epsilon=1e-6,
exclude_from_weight_decay=["layer_norm", "bias"],
exclude_from_weight_decay=["LayerNorm", "layer_norm", "bias"],
)
return optimizer
@@ -217,7 +217,7 @@ class GradientAccumulator(object):
"""The accumulated gradients on the current replica."""
if not self._gradients:
raise ValueError("The accumulator should be called first to initialize the gradients")
return list(gradient.value() for gradient in self._gradients)
return list(gradient.value() if gradient is not None else gradient for gradient in self._gradients)
def __call__(self, gradients):
"""Accumulates :obj:`gradients` on the current replica."""
@@ -231,6 +231,8 @@ class GradientAccumulator(object):
synchronization=tf.VariableSynchronization.ON_READ,
aggregation=tf.VariableAggregation.ONLY_FIRST_REPLICA,
)
if gradient is not None
else gradient
for gradient in gradients
]
)
@@ -238,7 +240,8 @@ class GradientAccumulator(object):
raise ValueError("Expected %s gradients, but got %d" % (len(self._gradients), len(gradients)))
for accum_gradient, gradient in zip(self._gradients, gradients):
accum_gradient.assign_add(gradient)
if accum_gradient is not None and gradient is not None:
accum_gradient.assign_add(gradient)
self._accum_steps.assign_add(1)
@@ -248,4 +251,5 @@ class GradientAccumulator(object):
return
self._accum_steps.assign(0)
for gradient in self._gradients:
gradient.assign(tf.zeros_like(gradient))
if gradient is not None:
gradient.assign(tf.zeros_like(gradient))

View File

@@ -24,7 +24,7 @@ from abc import ABC, abstractmethod
from contextlib import contextmanager
from itertools import chain
from os.path import abspath, exists
from typing import Any, Dict, Iterable, List, Optional, Sequence, Tuple, Union
from typing import TYPE_CHECKING, Any, Dict, Iterable, List, Optional, Sequence, Tuple, Union
import numpy as np
@@ -58,6 +58,10 @@ if is_torch_available():
AutoModelWithLMHead,
)
if TYPE_CHECKING:
from .modeling_utils import PreTrainedModel
from .modeling_tf_utils import TFPreTrainedModel
logger = logging.getLogger(__name__)
@@ -864,6 +868,7 @@ class NerPipeline(Pipeline):
binary_output: bool = False,
ignore_labels=["O"],
task: str = "",
grouped_entities: bool = False,
):
super().__init__(
model=model,
@@ -878,6 +883,7 @@ class NerPipeline(Pipeline):
self._basic_tokenizer = BasicTokenizer(do_lower_case=False)
self.ignore_labels = ignore_labels
self.grouped_entities = grouped_entities
def __call__(self, *args, **kwargs):
inputs = self._args_parser(*args, **kwargs)
@@ -907,23 +913,74 @@ class NerPipeline(Pipeline):
score = np.exp(entities) / np.exp(entities).sum(-1, keepdims=True)
labels_idx = score.argmax(axis=-1)
answer = []
for idx, label_idx in enumerate(labels_idx):
if self.model.config.id2label[label_idx] not in self.ignore_labels:
answer += [
{
"word": self.tokenizer.convert_ids_to_tokens(int(input_ids[idx])),
"score": score[idx][label_idx].item(),
"entity": self.model.config.id2label[label_idx],
}
]
entities = []
entity_groups = []
entity_group_disagg = []
# Filter to labels not in `self.ignore_labels`
filtered_labels_idx = [
(idx, label_idx)
for idx, label_idx in enumerate(labels_idx)
if self.model.config.id2label[label_idx] not in self.ignore_labels
]
for idx, label_idx in filtered_labels_idx:
entity = {
"word": self.tokenizer.convert_ids_to_tokens(int(input_ids[idx])),
"score": score[idx][label_idx].item(),
"entity": self.model.config.id2label[label_idx],
"index": idx,
}
last_idx, _ = filtered_labels_idx[-1]
if self.grouped_entities:
if not entity_group_disagg:
entity_group_disagg += [entity]
if idx == last_idx:
entity_groups += [self.group_entities(entity_group_disagg)]
continue
# If the current entity is similar and adjacent to the previous entity, append it to the disaggregated entity group
if (
entity["entity"] == entity_group_disagg[-1]["entity"]
and entity["index"] == entity_group_disagg[-1]["index"] + 1
):
entity_group_disagg += [entity]
# Group the entities at the last entity
if idx == last_idx:
entity_groups += [self.group_entities(entity_group_disagg)]
# If the current entity is different from the previous entity, aggregate the disaggregated entity group
else:
entity_groups += [self.group_entities(entity_group_disagg)]
entity_group_disagg = [entity]
entities += [entity]
# Append
answers += [answer]
if self.grouped_entities:
answers += [entity_groups]
else:
answers += [entities]
if len(answers) == 1:
return answers[0]
return answers
def group_entities(self, entities):
"""
Returns grouped entities
"""
# Get the last entity in the entity group
entity = entities[-1]["entity"]
scores = np.mean([entity["score"] for entity in entities])
tokens = [entity["word"] for entity in entities]
entity_group = {
"entity_group": entity,
"score": np.mean(scores),
"word": self.tokenizer.convert_tokens_to_string(tokens),
}
return entity_group
TokenClassificationPipeline = NerPipeline
@@ -1509,7 +1566,7 @@ class TranslationPipeline(Pipeline):
return results
# Register all the supported task here
# Register all the supported tasks here
SUPPORTED_TASKS = {
"feature-extraction": {
"impl": FeatureExtractionPipeline,
@@ -1572,9 +1629,9 @@ SUPPORTED_TASKS = {
"tf": TFAutoModelWithLMHead if is_tf_available() else None,
"pt": AutoModelWithLMHead if is_torch_available() else None,
"default": {
"model": {"pt": "bart-large-cnn", "tf": None},
"model": {"pt": "bart-large-cnn", "tf": "t5-small"},
"config": None,
"tokenizer": ("bart-large-cnn", {"use_fast": False}),
"tokenizer": {"pt": ("bart-large-cnn", {"use_fast": False}), "tf": "t5-small"},
},
},
"translation_en_to_fr": {

View File

@@ -29,6 +29,7 @@ from .configuration_auto import (
ElectraConfig,
FlaubertConfig,
GPT2Config,
LongformerConfig,
OpenAIGPTConfig,
ReformerConfig,
RobertaConfig,
@@ -50,6 +51,7 @@ from .tokenization_distilbert import DistilBertTokenizer, DistilBertTokenizerFas
from .tokenization_electra import ElectraTokenizer, ElectraTokenizerFast
from .tokenization_flaubert import FlaubertTokenizer
from .tokenization_gpt2 import GPT2Tokenizer, GPT2TokenizerFast
from .tokenization_longformer import LongformerTokenizer
from .tokenization_marian import MarianTokenizer
from .tokenization_openai import OpenAIGPTTokenizer, OpenAIGPTTokenizerFast
from .tokenization_reformer import ReformerTokenizer
@@ -73,6 +75,7 @@ TOKENIZER_MAPPING = OrderedDict(
(XLMRobertaConfig, (XLMRobertaTokenizer, None)),
(MarianConfig, (MarianTokenizer, None)),
(BartConfig, (BartTokenizer, None)),
(LongformerConfig, (LongformerTokenizer, None)),
(RobertaConfig, (RobertaTokenizer, RobertaTokenizerFast)),
(ReformerConfig, (ReformerTokenizer, None)),
(ElectraConfig, (ElectraTokenizer, ElectraTokenizerFast)),
@@ -105,6 +108,7 @@ class AutoTokenizer:
- contains `albert`: AlbertTokenizer (ALBERT model)
- contains `camembert`: CamembertTokenizer (CamemBERT model)
- contains `xlm-roberta`: XLMRobertaTokenizer (XLM-RoBERTa model)
- contains `longformer`: LongformerTokenizer (AllenAI Longformer model)
- contains `roberta`: RobertaTokenizer (RoBERTa model)
- contains `bert`: BertTokenizer (Bert model)
- contains `openai-gpt`: OpenAIGPTTokenizer (OpenAI GPT model)
@@ -136,6 +140,7 @@ class AutoTokenizer:
- contains `albert`: AlbertTokenizer (ALBERT model)
- contains `camembert`: CamembertTokenizer (CamemBERT model)
- contains `xlm-roberta`: XLMRobertaTokenizer (XLM-RoBERTa model)
- contains `longformer`: LongformerTokenizer (AllenAI Longformer model)
- contains `roberta`: RobertaTokenizer (RoBERTa model)
- contains `bert-base-japanese`: BertJapaneseTokenizer (Bert model)
- contains `bert`: BertTokenizer (Bert model)

View File

@@ -27,8 +27,6 @@ vocab_url = "https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-v
merges_url = "https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-merges.txt"
_all_bart_models = ["bart-large", "bart-large-mnli", "bart-large-cnn", "bart-large-xsum"]
VOCAB_FILES_NAMES = {"vocab_file": "sentence.bpe.model"}
class BartTokenizer(RobertaTokenizer):
# merges and vocab same as Roberta
@@ -44,6 +42,6 @@ SPM_URL = "https://s3.amazonaws.com/models.huggingface.co/bert/facebook/mbart-la
class MBartTokenizer(XLMRobertaTokenizer):
vocab_files_names = VOCAB_FILES_NAMES
vocab_files_names = {"vocab_file": "sentencepiece.bpe.model"}
max_model_input_sizes = {m: 1024 for m in _all_mbart_models}
pretrained_vocab_files_map = {"vocab_file": {m: SPM_URL for m in _all_mbart_models}}

View File

@@ -0,0 +1,42 @@
# coding=utf-8
# Copyright 2020 The Allen Institute for AI team and The HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
from .tokenization_roberta import RobertaTokenizer
logger = logging.getLogger(__name__)
# vocab and merges same as roberta
vocab_url = "https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-vocab.json"
merges_url = "https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-merges.txt"
_all_longformer_models = ["longformer-base-4096", "longformer-large-4096"]
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
"longformer-base-4096": 4096,
"longformer-large-4096": 4096,
}
class LongformerTokenizer(RobertaTokenizer):
# merges and vocab same as Roberta
max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
pretrained_vocab_files_map = {
"vocab_file": {m: vocab_url for m in _all_longformer_models},
"merges_file": {m: merges_url for m in _all_longformer_models},
}

View File

@@ -1,7 +1,9 @@
import json
import re
import warnings
from typing import Dict, List, Optional, Union
from pathlib import Path
from shutil import copyfile
from typing import Dict, List, Optional, Tuple, Union
import sentencepiece
@@ -15,7 +17,7 @@ vocab_files_names = {
"vocab": "vocab.json",
"tokenizer_config_file": "tokenizer_config.json",
}
MODEL_NAMES = ("opus-mt-en-de",) # TODO(SS): the only required constant is vocab_files_names
MODEL_NAMES = ("opus-mt-en-de",) # TODO(SS): delete this, the only required constant is vocab_files_names
PRETRAINED_VOCAB_FILES_MAP = {
k: {m: f"{S3_BUCKET_PREFIX}/Helsinki-NLP/{m}/{fname}" for m in MODEL_NAMES}
for k, fname in vocab_files_names.items()
@@ -55,14 +57,16 @@ class MarianTokenizer(PreTrainedTokenizer):
eos_token="</s>",
pad_token="<pad>",
max_len=512,
**kwargs,
):
super().__init__(
# bos_token=bos_token,
# bos_token=bos_token, unused. Start decoding with config.decoder_start_token_id
max_len=max_len,
eos_token=eos_token,
unk_token=unk_token,
pad_token=pad_token,
**kwargs,
)
self.encoder = load_json(vocab)
if self.unk_token not in self.encoder:
@@ -72,21 +76,23 @@ class MarianTokenizer(PreTrainedTokenizer):
self.source_lang = source_lang
self.target_lang = target_lang
self.supported_language_codes: list = [k for k in self.encoder if k.startswith(">>") and k.endswith("<<")]
self.spm_files = [source_spm, target_spm]
# load SentencePiece model for pre-processing
self.spm_source = sentencepiece.SentencePieceProcessor()
self.spm_source.Load(source_spm)
self.spm_target = sentencepiece.SentencePieceProcessor()
self.spm_target.Load(target_spm)
self.spm_source = load_spm(source_spm)
self.spm_target = load_spm(target_spm)
self.current_spm = self.spm_source
# Multilingual target side: default to using first supported language code.
self.supported_language_codes: list = [k for k in self.encoder if k.startswith(">>") and k.endswith("<<")]
self._setup_normalizer()
def _setup_normalizer(self):
try:
from mosestokenizer import MosesPunctuationNormalizer
self.punc_normalizer = MosesPunctuationNormalizer(source_lang)
self.punc_normalizer = MosesPunctuationNormalizer(self.source_lang)
except ImportError:
warnings.warn("Recommended: pip install mosestokenizer")
self.punc_normalizer = lambda x: x
@@ -124,9 +130,6 @@ class MarianTokenizer(PreTrainedTokenizer):
# We don't expect to process pairs, but leave the pair logic for API consistency
return token_ids_0 + token_ids_1 + [self.eos_token_id]
def batch_decode(self, token_ids, **kwargs) -> List[str]:
return [self.decode(ids, **kwargs) for ids in token_ids]
def prepare_translation_batch(
self,
src_texts: List[str],
@@ -179,6 +182,65 @@ class MarianTokenizer(PreTrainedTokenizer):
def vocab_size(self) -> int:
return len(self.encoder)
def save_vocabulary(self, save_directory: str) -> Tuple[str]:
"""save vocab file to json and copy spm files from their original path."""
save_dir = Path(save_directory)
assert save_dir.is_dir(), f"{save_directory} should be a directory"
save_json(self.encoder, save_dir / self.vocab_files_names["vocab"])
for f in self.spm_files:
dest_path = save_dir / Path(f).name
if not dest_path.exists():
copyfile(f, save_dir / Path(f).name)
return tuple(save_dir / f for f in self.vocab_files_names)
def get_vocab(self) -> Dict:
vocab = self.encoder.copy()
vocab.update(self.added_tokens_encoder)
return vocab
def __getstate__(self) -> Dict:
state = self.__dict__.copy()
state.update({k: None for k in ["spm_source", "spm_target", "current_spm", "punc_normalizer"]})
return state
def __setstate__(self, d: Dict) -> None:
self.__dict__ = d
self.spm_source, self.spm_target = (load_spm(f) for f in self.spm_files)
self.current_spm = self.spm_source
self._setup_normalizer()
def num_special_tokens_to_add(self, **unused):
"""Just EOS"""
return 1
def _special_token_mask(self, seq):
all_special_ids = set(self.all_special_ids) # call it once instead of inside list comp
all_special_ids.remove(self.unk_token_id) # <unk> is only sometimes special
return [1 if x in all_special_ids else 0 for x in seq]
def get_special_tokens_mask(
self, token_ids_0: List, token_ids_1: Optional[List] = None, already_has_special_tokens: bool = False
) -> List[int]:
"""Get list where entries are [1] if a token is [eos] or [pad] else 0."""
if already_has_special_tokens:
return self._special_token_mask(token_ids_0)
elif token_ids_1 is None:
return self._special_token_mask(token_ids_0) + [1]
else:
return self._special_token_mask(token_ids_0 + token_ids_1) + [1]
def load_spm(path: str) -> sentencepiece.SentencePieceProcessor:
spm = sentencepiece.SentencePieceProcessor()
spm.Load(path)
return spm
def save_json(data, path: str) -> None:
with open(path, "w") as f:
json.dump(data, f, indent=2)
def load_json(path: str) -> Union[Dict, List]:
with open(path, "r") as f:

View File

@@ -199,7 +199,7 @@ class RobertaTokenizer(GPT2Tokenizer):
if token_ids_1 is not None:
raise ValueError(
"You should not supply a second sequence if the provided sequence of "
"ids is already formated with special tokens for the model."
"ids is already formatted with special tokens for the model."
)
return list(map(lambda x: 1 if x in [self.sep_token_id, self.cls_token_id] else 0, token_ids_0))

View File

@@ -771,26 +771,26 @@ class PreTrainedTokenizer(SpecialTokensMixin):
raise NotImplementedError
@property
def is_fast(self):
def is_fast(self) -> bool:
return False
@property
def max_len(self):
def max_len(self) -> int:
""" Kept here for backward compatibility.
Now renamed to `model_max_length` to avoid ambiguity.
"""
return self.model_max_length
@property
def max_len_single_sentence(self):
def max_len_single_sentence(self) -> int:
return self.model_max_length - self.num_special_tokens_to_add(pair=False)
@property
def max_len_sentences_pair(self):
def max_len_sentences_pair(self) -> int:
return self.model_max_length - self.num_special_tokens_to_add(pair=True)
@max_len_single_sentence.setter
def max_len_single_sentence(self, value):
def max_len_single_sentence(self, value) -> int:
""" For backward compatibility, allow to try to setup 'max_len_single_sentence' """
if value == self.model_max_length - self.num_special_tokens_to_add(pair=False):
logger.warning(
@@ -802,7 +802,7 @@ class PreTrainedTokenizer(SpecialTokensMixin):
)
@max_len_sentences_pair.setter
def max_len_sentences_pair(self, value):
def max_len_sentences_pair(self, value) -> int:
""" For backward compatibility, allow to try to setup 'max_len_sentences_pair' """
if value == self.model_max_length - self.num_special_tokens_to_add(pair=True):
logger.warning(
@@ -1118,7 +1118,7 @@ class PreTrainedTokenizer(SpecialTokensMixin):
return vocab_files + (special_tokens_map_file, added_tokens_file)
def save_vocabulary(self, save_directory):
def save_vocabulary(self, save_directory) -> Tuple[str]:
""" Save the tokenizer vocabulary to a directory. This method does *NOT* save added tokens
and special token mappings.
@@ -1128,7 +1128,7 @@ class PreTrainedTokenizer(SpecialTokensMixin):
"""
raise NotImplementedError
def add_tokens(self, new_tokens):
def add_tokens(self, new_tokens: Union[str, List[str]]) -> int:
"""
Add a list of new tokens to the tokenizer class. If the new tokens are not in the
vocabulary, they are added to it with indices starting from length of the current vocabulary.
@@ -1156,7 +1156,7 @@ class PreTrainedTokenizer(SpecialTokensMixin):
if not isinstance(new_tokens, list):
new_tokens = [new_tokens]
to_add_tokens = []
tokens_to_add = []
for token in new_tokens:
assert isinstance(token, str)
if self.init_kwargs.get("do_lower_case", False) and token not in self.all_special_tokens:
@@ -1164,18 +1164,18 @@ class PreTrainedTokenizer(SpecialTokensMixin):
if (
token != self.unk_token
and self.convert_tokens_to_ids(token) == self.convert_tokens_to_ids(self.unk_token)
and token not in to_add_tokens
and token not in tokens_to_add
):
to_add_tokens.append(token)
tokens_to_add.append(token)
logger.info("Adding %s to the vocabulary", token)
added_tok_encoder = dict((tok, len(self) + i) for i, tok in enumerate(to_add_tokens))
added_tok_encoder = dict((tok, len(self) + i) for i, tok in enumerate(tokens_to_add))
added_tok_decoder = {v: k for k, v in added_tok_encoder.items()}
self.added_tokens_encoder.update(added_tok_encoder)
self.unique_added_tokens_encoder = set(self.added_tokens_encoder.keys()).union(set(self.all_special_tokens))
self.added_tokens_decoder.update(added_tok_decoder)
return len(to_add_tokens)
return len(tokens_to_add)
def num_special_tokens_to_add(self, pair=False):
"""
@@ -2080,10 +2080,7 @@ class PreTrainedTokenizer(SpecialTokensMixin):
def build_inputs_with_special_tokens(self, token_ids_0: List, token_ids_1: Optional[List] = None) -> List:
"""
Build model inputs from a sequence or a pair of sequence for sequence classification tasks
by concatenating and adding special tokens.
A RoBERTa sequence has the following format:
single sequence: <s> X </s>
pair of sequences: <s> A </s></s> B </s>
by concatenating and adding special tokens. This implementation does not add special tokens.
"""
if token_ids_1 is None:
return token_ids_0
@@ -2183,6 +2180,9 @@ class PreTrainedTokenizer(SpecialTokensMixin):
else:
return text
def batch_decode(self, sequences: List[List[int]], **kwargs) -> List[str]:
return [self.decode(seq, **kwargs) for seq in sequences]
@staticmethod
def clean_up_tokenization(out_string: str) -> str:
""" Clean up a list of simple English tokenization artifacts like spaces before punctuations and abreviated forms.

View File

@@ -1,12 +1,13 @@
import json
import logging
import math
import os
import random
import re
import shutil
from contextlib import contextmanager
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple, Union
from typing import Callable, Dict, List, Optional, Tuple
import numpy as np
import torch
@@ -14,7 +15,7 @@ from torch import nn
from torch.utils.data.dataloader import DataLoader
from torch.utils.data.dataset import Dataset
from torch.utils.data.distributed import DistributedSampler
from torch.utils.data.sampler import RandomSampler
from torch.utils.data.sampler import RandomSampler, Sampler, SequentialSampler
from tqdm.auto import tqdm, trange
from .data.data_collator import DataCollator, DefaultDataCollator
@@ -89,7 +90,7 @@ def set_seed(seed: int):
@contextmanager
def torch_distributed_zero_first(local_rank: int):
"""
Decorator to make all processes in distributed training wait for the first one (locally) to do something.
Decorator to make all processes in distributed training wait for each local_master to do something.
"""
if local_rank not in [-1, 0]:
torch.distributed.barrier()
@@ -98,6 +99,50 @@ def torch_distributed_zero_first(local_rank: int):
torch.distributed.barrier()
class SequentialDistributedSampler(Sampler):
"""
Distributed Sampler that subsamples indicies sequentially,
making it easier to collate all results at the end.
Even though we only use this sampler for eval and predict (no training),
which means that the model params won't have to be synced (i.e. will not hang
for synchronization even if varied number of forward passes), we still add extra
samples to the sampler to make it evenly divisible (like in `DistributedSampler`)
to make it easy to `gather` or `reduce` resulting tensors at the end of the loop.
"""
def __init__(self, dataset, num_replicas=None, rank=None):
if num_replicas is None:
if not torch.distributed.is_available():
raise RuntimeError("Requires distributed package to be available")
num_replicas = torch.distributed.get_world_size()
if rank is None:
if not torch.distributed.is_available():
raise RuntimeError("Requires distributed package to be available")
rank = torch.distributed.get_rank()
self.dataset = dataset
self.num_replicas = num_replicas
self.rank = rank
self.num_samples = int(math.ceil(len(self.dataset) * 1.0 / self.num_replicas))
self.total_size = self.num_samples * self.num_replicas
def __iter__(self):
indices = list(range(len(self.dataset)))
# add extra samples to make it evenly divisible
indices += indices[: (self.total_size - len(indices))]
assert len(indices) == self.total_size
# subsample
indices = indices[self.rank * self.num_samples : (self.rank + 1) * self.num_samples]
assert len(indices) == self.num_samples
return iter(indices)
def __len__(self):
return self.num_samples
def get_tpu_sampler(dataset: Dataset):
if xm.xrt_world_size() <= 1:
return RandomSampler(dataset)
@@ -142,7 +187,7 @@ class Trainer:
prediction_loss_only:
(Optional) in evaluation and prediction, only return the loss
"""
self.model = model
self.model = model.to(args.device)
self.args = args
if data_collator is not None:
self.data_collator = data_collator
@@ -155,7 +200,7 @@ class Trainer:
self.optimizers = optimizers
if tb_writer is not None:
self.tb_writer = tb_writer
elif is_tensorboard_available() and self.args.local_rank in [-1, 0]:
elif is_tensorboard_available() and self.is_world_master():
self.tb_writer = SummaryWriter(log_dir=self.args.logging_dir)
if not is_tensorboard_available():
logger.warning(
@@ -170,7 +215,7 @@ class Trainer:
)
set_seed(self.args.seed)
# Create output directory if needed
if self.is_local_master():
if self.is_world_master():
os.makedirs(self.args.output_dir, exist_ok=True)
if is_tpu_available():
# Set an xla_device flag on the model's config.
@@ -196,9 +241,6 @@ class Trainer:
collate_fn=self.data_collator.collate_batch,
)
if is_tpu_available():
data_loader = pl.ParallelLoader(data_loader, [self.args.device]).per_device_loader(self.args.device)
return data_loader
def get_eval_dataloader(self, eval_dataset: Optional[Dataset] = None) -> DataLoader:
@@ -207,36 +249,42 @@ class Trainer:
eval_dataset = eval_dataset if eval_dataset is not None else self.eval_dataset
sampler = get_tpu_sampler(eval_dataset) if is_tpu_available() else None
if is_tpu_available():
sampler = SequentialDistributedSampler(
eval_dataset, num_replicas=xm.xrt_world_size(), rank=xm.get_ordinal()
)
elif self.args.local_rank != -1:
sampler = SequentialDistributedSampler(eval_dataset)
else:
sampler = SequentialSampler(eval_dataset)
data_loader = DataLoader(
eval_dataset,
sampler=sampler,
batch_size=self.args.eval_batch_size,
shuffle=False,
collate_fn=self.data_collator.collate_batch,
)
if is_tpu_available():
data_loader = pl.ParallelLoader(data_loader, [self.args.device]).per_device_loader(self.args.device)
return data_loader
def get_test_dataloader(self, test_dataset: Dataset) -> DataLoader:
# We use the same batch_size as for eval.
sampler = get_tpu_sampler(test_dataset) if is_tpu_available() else None
if is_tpu_available():
sampler = SequentialDistributedSampler(
test_dataset, num_replicas=xm.xrt_world_size(), rank=xm.get_ordinal()
)
elif self.args.local_rank != -1:
sampler = SequentialDistributedSampler(test_dataset)
else:
sampler = SequentialSampler(test_dataset)
data_loader = DataLoader(
test_dataset,
sampler=sampler,
batch_size=self.args.eval_batch_size,
shuffle=False,
collate_fn=self.data_collator.collate_batch,
)
if is_tpu_available():
data_loader = pl.ParallelLoader(data_loader, [self.args.device]).per_device_loader(self.args.device)
return data_loader
def get_optimizers(
@@ -293,15 +341,11 @@ class Trainer:
self.model, log=os.getenv("WANDB_WATCH", "gradients"), log_freq=max(100, self.args.logging_steps)
)
def num_examples(self, dataloader: Union[DataLoader, "pl.PerDeviceLoader"]) -> int:
def num_examples(self, dataloader: DataLoader) -> int:
"""
Helper to get num of examples from a DataLoader, by accessing its Dataset.
"""
if is_tpu_available():
assert isinstance(dataloader, pl.PerDeviceLoader)
return len(dataloader._loader._loader.dataset)
else:
return len(dataloader.dataset)
return len(dataloader.dataset)
def train(self, model_path: Optional[str] = None):
"""
@@ -331,11 +375,12 @@ class Trainer:
and os.path.isfile(os.path.join(model_path, "scheduler.pt"))
):
# Load in optimizer and scheduler states
optimizer.load_state_dict(torch.load(os.path.join(model_path, "optimizer.pt")))
optimizer.load_state_dict(
torch.load(os.path.join(model_path, "optimizer.pt"), map_location=self.args.device)
)
scheduler.load_state_dict(torch.load(os.path.join(model_path, "scheduler.pt")))
model = self.model
model.to(self.args.device)
if self.args.fp16:
if not is_apex_available():
raise ImportError("Please install apex from https://www.github.com/nvidia/apex to use fp16 training.")
@@ -404,7 +449,17 @@ class Trainer:
epochs_trained, int(num_train_epochs), desc="Epoch", disable=not self.is_local_master()
)
for epoch in train_iterator:
epoch_iterator = tqdm(train_dataloader, desc="Iteration", disable=not self.is_local_master())
if isinstance(train_dataloader, DataLoader) and isinstance(train_dataloader.sampler, DistributedSampler):
train_dataloader.sampler.set_epoch(epoch)
if is_tpu_available():
parallel_loader = pl.ParallelLoader(train_dataloader, [self.args.device]).per_device_loader(
self.args.device
)
epoch_iterator = tqdm(parallel_loader, desc="Iteration", disable=not self.is_local_master())
else:
epoch_iterator = tqdm(train_dataloader, desc="Iteration", disable=not self.is_local_master())
for step, inputs in enumerate(epoch_iterator):
# Skip past any already trained steps if resuming training
@@ -434,37 +489,43 @@ class Trainer:
self.global_step += 1
self.epoch = epoch + (step + 1) / len(epoch_iterator)
if self.is_local_master():
if (self.args.logging_steps > 0 and self.global_step % self.args.logging_steps == 0) or (
self.global_step == 1 and self.args.logging_first_step
):
logs: Dict[str, float] = {}
logs["loss"] = (tr_loss - logging_loss) / self.args.logging_steps
logs["learning_rate"] = scheduler.get_last_lr()[0]
logging_loss = tr_loss
if (self.args.logging_steps > 0 and self.global_step % self.args.logging_steps == 0) or (
self.global_step == 1 and self.args.logging_first_step
):
logs: Dict[str, float] = {}
logs["loss"] = (tr_loss - logging_loss) / self.args.logging_steps
logs["learning_rate"] = (
scheduler.get_last_lr()[0]
)
logging_loss = tr_loss
self._log(logs)
self._log(logs)
if self.args.evaluate_during_training:
self.evaluate()
if self.args.evaluate_during_training:
self.evaluate()
if self.args.save_steps > 0 and self.global_step % self.args.save_steps == 0:
# In all cases (even distributed/parallel), self.model is always a reference
# to the model we want to save.
if hasattr(model, "module"):
assert model.module is self.model
else:
assert model is self.model
# Save model checkpoint
output_dir = os.path.join(
self.args.output_dir, f"{PREFIX_CHECKPOINT_DIR}-{self.global_step}"
)
if self.args.save_steps > 0 and self.global_step % self.args.save_steps == 0:
# In all cases (even distributed/parallel), self.model is always a reference
# to the model we want to save.
if hasattr(model, "module"):
assert model.module is self.model
else:
assert model is self.model
# Save model checkpoint
output_dir = os.path.join(self.args.output_dir, f"{PREFIX_CHECKPOINT_DIR}-{self.global_step}")
self.save_model(output_dir)
self.save_model(output_dir)
if self.is_world_master():
self._rotate_checkpoints()
if is_tpu_available():
xm.rendezvous("saving_optimizer_states")
xm.save(optimizer.state_dict(), os.path.join(output_dir, "optimizer.pt"))
xm.save(scheduler.state_dict(), os.path.join(output_dir, "scheduler.pt"))
elif self.is_world_master():
torch.save(optimizer.state_dict(), os.path.join(output_dir, "optimizer.pt"))
torch.save(scheduler.state_dict(), os.path.join(output_dir, "scheduler.pt"))
logger.info("Saving optimizer and scheduler states to %s", output_dir)
if self.args.max_steps > 0 and self.global_step > self.args.max_steps:
epoch_iterator.close()
@@ -540,11 +601,30 @@ class Trainer:
Saving best-practices: if you use default names for the model,
you can reload it using from_pretrained().
Will only save from the master process.
Will only save from the world_master process (unless in TPUs).
"""
if self.is_world_master():
if is_tpu_available():
self._save_tpu(output_dir)
elif self.is_world_master():
self._save(output_dir)
def _save_tpu(self, output_dir: Optional[str] = None):
output_dir = output_dir if output_dir is not None else self.args.output_dir
logger.info("Saving model checkpoint to %s", output_dir)
if xm.is_master_ordinal():
os.makedirs(output_dir, exist_ok=True)
torch.save(self.args, os.path.join(output_dir, "training_args.bin"))
# Save a trained model and configuration using `save_pretrained()`.
# They can then be reloaded using `from_pretrained()`
if not isinstance(self.model, PreTrainedModel):
raise ValueError("Trainer.model appears to not be a PreTrainedModel")
xm.rendezvous("saving_checkpoint")
self.model.save_pretrained(output_dir)
def _save(self, output_dir: Optional[str] = None):
output_dir = output_dir if output_dir is not None else self.args.output_dir
os.makedirs(output_dir, exist_ok=True)
@@ -627,6 +707,7 @@ class Trainer:
In that case, this method will also return metrics, like in evaluate().
"""
test_dataloader = self.get_test_dataloader(test_dataset)
return self._prediction_loop(test_dataloader, description="Prediction")
def _prediction_loop(
@@ -640,27 +721,29 @@ class Trainer:
prediction_loss_only = prediction_loss_only if prediction_loss_only is not None else self.prediction_loss_only
model = self.model
# multi-gpu eval
if self.args.n_gpu > 1 and not isinstance(self.model, torch.nn.DataParallel):
model = torch.nn.DataParallel(self.model)
if self.args.n_gpu > 1:
model = torch.nn.DataParallel(model)
else:
model = self.model
model.to(self.args.device)
# Note: in torch.distributed mode, there's no point in wrapping the model
# inside a DistributedDataParallel as we'll be under `no_grad` anyways.
if is_tpu_available():
batch_size = dataloader._loader._loader.batch_size
else:
batch_size = dataloader.batch_size
batch_size = dataloader.batch_size
logger.info("***** Running %s *****", description)
logger.info(" Num examples = %d", self.num_examples(dataloader))
logger.info(" Batch size = %d", batch_size)
eval_losses: List[float] = []
preds: np.ndarray = None
label_ids: np.ndarray = None
preds: torch.Tensor = None
label_ids: torch.Tensor = None
model.eval()
if is_tpu_available():
dataloader = pl.ParallelLoader(dataloader, [self.args.device]).per_device_loader(self.args.device)
for inputs in tqdm(dataloader, desc=description):
has_labels = any(inputs.get(k) is not None for k in ["labels", "masked_lm_labels"])
has_labels = any(inputs.get(k) is not None for k in ["labels", "lm_labels", "masked_lm_labels"])
for k, v in inputs.items():
inputs[k] = v.to(self.args.device)
@@ -675,19 +758,33 @@ class Trainer:
if not prediction_loss_only:
if preds is None:
preds = logits.detach().cpu().numpy()
preds = logits.detach()
else:
preds = np.append(preds, logits.detach().cpu().numpy(), axis=0)
preds = torch.cat((preds, logits.detach()), dim=0)
if inputs.get("labels") is not None:
if label_ids is None:
label_ids = inputs["labels"].detach().cpu().numpy()
label_ids = inputs["labels"].detach()
else:
label_ids = np.append(label_ids, inputs["labels"].detach().cpu().numpy(), axis=0)
label_ids = torch.cat((label_ids, inputs["labels"].detach()), dim=0)
if is_tpu_available():
if self.args.local_rank != -1:
# In distributed mode, concatenate all results from all nodes:
if preds is not None:
preds = self.distributed_concat(preds, num_total_examples=self.num_examples(dataloader))
if label_ids is not None:
label_ids = self.distributed_concat(label_ids, num_total_examples=self.num_examples(dataloader))
elif is_tpu_available():
# tpu-comment: Get all predictions and labels from all worker shards of eval dataset
preds = xm.mesh_reduce("eval_preds", preds, np.concatenate)
label_ids = xm.mesh_reduce("eval_out_label_ids", label_ids, np.concatenate)
if preds is not None:
preds = xm.mesh_reduce("eval_preds", preds, torch.cat)
if label_ids is not None:
label_ids = xm.mesh_reduce("eval_label_ids", label_ids, torch.cat)
# Finally, turn the aggregated tensors into numpy arrays.
if preds is not None:
preds = preds.cpu().numpy()
if label_ids is not None:
label_ids = label_ids.cpu().numpy()
if self.compute_metrics is not None and preds is not None and label_ids is not None:
metrics = self.compute_metrics(EvalPrediction(predictions=preds, label_ids=label_ids))
@@ -702,3 +799,15 @@ class Trainer:
metrics[f"eval_{key}"] = metrics.pop(key)
return PredictionOutput(predictions=preds, label_ids=label_ids, metrics=metrics)
def distributed_concat(self, tensor: torch.Tensor, num_total_examples: int) -> torch.Tensor:
assert self.args.local_rank != -1
output_tensors = [tensor.clone() for _ in range(torch.distributed.get_world_size())]
torch.distributed.all_gather(output_tensors, tensor)
concat = torch.cat(output_tensors, dim=0)
# truncate the dummy elements added by SequentialDistributedSampler
output = concat[:num_total_examples]
return output

View File

@@ -141,7 +141,7 @@ class TFTrainer:
self.optimizer = tf.keras.optimizers.get(
{"class_name": self.args.optimizer_name, "config": {"learning_rate": self.args.learning_rate}}
)
logger.info("Created an/a {} optimizer".format(self.optimizer))
logger.info("Created an/a {} optimizer".format(self.args.optimizer_name))
def _create_checkpoint_manager(self, max_to_keep: int = 5, load_model: bool = True) -> None:
"""
@@ -335,12 +335,8 @@ class TFTrainer:
gradient / tf.cast(gradient_scale, gradient.dtype) for gradient in self.gradient_accumulator.gradients
]
gradients = [(tf.clip_by_value(grad, -self.args.max_grad_norm, self.args.max_grad_norm)) for grad in gradients]
vars = self.model.trainable_variables
if self.args.mode in ["token-classification", "question-answering"]:
vars = [var for var in self.model.trainable_variables if "pooler" not in var.name]
self.optimizer.apply_gradients(list(zip(gradients, vars)))
self.optimizer.apply_gradients(list(zip(gradients, self.model.trainable_variables)))
self.gradient_accumulator.reset()
def _accumulate_next_gradients(self):
@@ -375,12 +371,10 @@ class TFTrainer:
def _forward(self, features, labels):
"""Forwards a training example and accumulates the gradients."""
per_example_loss, _ = self._run_model(features, labels, True)
vars = self.model.trainable_variables
if self.args.mode in ["token-classification", "question-answering"]:
vars = [var for var in self.model.trainable_variables if "pooler" not in var.name]
gradients = self.optimizer.get_gradients(per_example_loss, vars)
gradients = tf.gradients(per_example_loss, self.model.trainable_variables)
gradients = [
g if g is not None else tf.zeros_like(v) for g, v in zip(gradients, self.model.trainable_variables)
]
self.gradient_accumulator(gradients)

View File

@@ -80,8 +80,9 @@ class AutoModelTest(unittest.TestCase):
model, loading_info = AutoModelForPreTraining.from_pretrained(model_name, output_loading_info=True)
self.assertIsNotNone(model)
self.assertIsInstance(model, BertForPreTraining)
for value in loading_info.values():
self.assertEqual(len(value), 0)
for key, value in loading_info.items():
# Only one value should not be initialized and in the missing keys.
self.assertEqual(len(value), 1 if key == "missing_keys" else 0)
@slow
def test_lmhead_model_from_pretrained(self):

View File

@@ -231,7 +231,7 @@ class BartTranslationTests(unittest.TestCase):
"""Only load the model if needed."""
if self._model is None:
model = BartForConditionalGeneration.from_pretrained("mbart-large-en-ro")
self._model = model
self._model = model.to(torch_device)
return self._model
@slow
@@ -257,10 +257,7 @@ class BartTranslationTests(unittest.TestCase):
)
}
translated_tokens = model.generate(input_ids=inputs["input_ids"].to(torch_device), num_beams=5,)
decoded = [
self.tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False)
for g in translated_tokens
]
decoded = self.tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)
self.assertEqual(expected_translation_romanian, decoded[0])
def test_mbart_enro_config(self):
@@ -576,11 +573,13 @@ class BartModelIntegrationTests(unittest.TestCase):
PGE_ARTICLE = """ PG&E stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions. The aim is to reduce the risk of wildfires. Nearly 800 thousand customers were scheduled to be affected by the shutoffs which were expected to last through at least midday tomorrow."""
EXPECTED_SUMMARY = "California's largest power company has begun shutting off power to tens of thousands of homes and businesses in the state."
dct = tok.batch_encode_plus([PGE_ARTICLE], max_length=1024, pad_to_max_length=True, return_tensors="pt",)
dct = tok.batch_encode_plus([PGE_ARTICLE], max_length=1024, pad_to_max_length=True, return_tensors="pt",).to(
torch_device
)
hypotheses_batch = model.generate(
input_ids=dct["input_ids"].to(torch_device),
attention_mask=dct["attention_mask"].to(torch_device),
input_ids=dct["input_ids"],
attention_mask=dct["attention_mask"],
num_beams=2,
max_length=62,
min_length=11,
@@ -590,9 +589,7 @@ class BartModelIntegrationTests(unittest.TestCase):
decoder_start_token_id=model.config.eos_token_id,
)
decoded = [
tok.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in hypotheses_batch
]
decoded = tok.batch_decode(hypotheses_batch, skip_special_tokens=True,)
self.assertEqual(EXPECTED_SUMMARY, decoded[0])
def test_xsum_config_generation_params(self):

View File

@@ -30,6 +30,7 @@ class CamembertModelIntegrationTest(unittest.TestCase):
@slow
def test_output_embeds_base_model(self):
model = CamembertModel.from_pretrained("camembert-base")
model.to(torch_device)
input_ids = torch.tensor(
[[5, 121, 11, 660, 16, 730, 25543, 110, 83, 6]], device=torch_device, dtype=torch.long,

View File

@@ -23,7 +23,7 @@ from typing import List
from transformers import is_torch_available
from .utils import require_torch, slow, torch_device
from .utils import require_multigpu, require_torch, slow, torch_device
if is_torch_available():
@@ -758,6 +758,31 @@ class ModelTesterMixin:
return True
return False
@require_multigpu
def test_multigpu_data_parallel_forward(self):
config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
# some params shouldn't be scattered by nn.DataParallel
# so just remove them if they are present.
blacklist_non_batched_params = ["head_mask"]
for k in blacklist_non_batched_params:
inputs_dict.pop(k, None)
# move input tensors to cuda:O
for k, v in inputs_dict.items():
if torch.is_tensor(v):
inputs_dict[k] = v.to(0)
for model_class in self.all_model_classes:
model = model_class(config=config)
model.to(0)
model.eval()
# Wrap model in nn.DataParallel
model = torch.nn.DataParallel(model)
with torch.no_grad():
_ = model(**inputs_dict)
global_rng = random.Random()

View File

@@ -41,7 +41,7 @@ class CTRLModelTest(ModelTesterMixin, unittest.TestCase):
def __init__(
self,
parent,
batch_size=13,
batch_size=14,
seq_length=7,
is_training=True,
use_token_type_ids=True,
@@ -219,6 +219,7 @@ class CTRLModelLanguageGenerationTest(unittest.TestCase):
@slow
def test_lm_generate_ctrl(self):
model = CTRLLMHeadModel.from_pretrained("ctrl")
model.to(torch_device)
input_ids = torch.tensor(
[[11859, 0, 1611, 8]], dtype=torch.long, device=torch_device
) # Legal the president is

View File

@@ -30,6 +30,7 @@ if is_torch_available():
ElectraForMaskedLM,
ElectraForTokenClassification,
ElectraForPreTraining,
ElectraForSequenceClassification,
)
from transformers.modeling_electra import ELECTRA_PRETRAINED_MODEL_ARCHIVE_MAP
@@ -242,6 +243,31 @@ class ElectraModelTest(ModelTesterMixin, unittest.TestCase):
self.parent.assertListEqual(list(result["logits"].size()), [self.batch_size, self.seq_length])
self.check_loss_output(result)
def create_and_check_electra_for_sequence_classification(
self,
config,
input_ids,
token_type_ids,
input_mask,
sequence_labels,
token_labels,
choice_labels,
fake_token_labels,
):
config.num_labels = self.num_labels
model = ElectraForSequenceClassification(config)
model.to(torch_device)
model.eval()
loss, logits = model(
input_ids, attention_mask=input_mask, token_type_ids=token_type_ids, labels=sequence_labels
)
result = {
"loss": loss,
"logits": logits,
}
self.parent.assertListEqual(list(result["logits"].size()), [self.batch_size, self.num_labels])
self.check_loss_output(result)
def prepare_config_and_inputs_for_common(self):
config_and_inputs = self.prepare_config_and_inputs()
(
@@ -280,6 +306,10 @@ class ElectraModelTest(ModelTesterMixin, unittest.TestCase):
config_and_inputs = self.model_tester.prepare_config_and_inputs()
self.model_tester.create_and_check_electra_for_pretraining(*config_and_inputs)
def test_for_sequence_classification(self):
config_and_inputs = self.model_tester.prepare_config_and_inputs()
self.model_tester.create_and_check_electra_for_sequence_classification(*config_and_inputs)
@slow
def test_model_from_pretrained(self):
for model_name in list(ELECTRA_PRETRAINED_MODEL_ARCHIVE_MAP.keys())[:1]:

View File

@@ -329,5 +329,5 @@ class EncoderDecoderModelTest(unittest.TestCase):
@slow
def test_real_bert_model_from_pretrained(self):
model = EncoderDecoderModel.from_pretrained("bert-base-uncased", "bert-base-uncased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "bert-base-uncased")
self.assertIsNotNone(model)

View File

@@ -46,7 +46,7 @@ class GPT2ModelTest(ModelTesterMixin, unittest.TestCase):
def __init__(
self,
parent,
batch_size=13,
batch_size=14,
seq_length=7,
is_training=True,
use_token_type_ids=True,
@@ -343,6 +343,7 @@ class GPT2ModelLanguageGenerationTest(unittest.TestCase):
@slow
def test_lm_generate_gpt2(self):
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.to(torch_device)
input_ids = torch.tensor([[464, 3290]], dtype=torch.long, device=torch_device) # The dog
expected_output_ids = [
464,
@@ -372,6 +373,7 @@ class GPT2ModelLanguageGenerationTest(unittest.TestCase):
@slow
def test_lm_generate_distilgpt2(self):
model = GPT2LMHeadModel.from_pretrained("distilgpt2")
model.to(torch_device)
input_ids = torch.tensor([[464, 1893]], dtype=torch.long, device=torch_device) # The president
expected_output_ids = [
464,

View File

@@ -0,0 +1,253 @@
# coding=utf-8
# Copyright 2018 The Google AI Language Team Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import unittest
from transformers import is_torch_available
from .test_configuration_common import ConfigTester
from .test_modeling_common import ModelTesterMixin, ids_tensor
from .utils import require_torch, slow, torch_device
if is_torch_available():
import torch
from transformers import (
LongformerConfig,
LongformerModel,
LongformerForMaskedLM,
)
class LongformerModelTester(object):
def __init__(
self,
parent,
batch_size=13,
seq_length=7,
is_training=True,
use_input_mask=True,
use_token_type_ids=True,
use_labels=True,
vocab_size=99,
hidden_size=32,
num_hidden_layers=5,
num_attention_heads=4,
intermediate_size=37,
hidden_act="gelu",
hidden_dropout_prob=0.1,
attention_probs_dropout_prob=0.1,
max_position_embeddings=512,
type_vocab_size=16,
type_sequence_label_size=2,
initializer_range=0.02,
num_labels=3,
num_choices=4,
scope=None,
attention_window=4,
):
self.parent = parent
self.batch_size = batch_size
self.seq_length = seq_length
self.is_training = is_training
self.use_input_mask = use_input_mask
self.use_token_type_ids = use_token_type_ids
self.use_labels = use_labels
self.vocab_size = vocab_size
self.hidden_size = hidden_size
self.num_hidden_layers = num_hidden_layers
self.num_attention_heads = num_attention_heads
self.intermediate_size = intermediate_size
self.hidden_act = hidden_act
self.hidden_dropout_prob = hidden_dropout_prob
self.attention_probs_dropout_prob = attention_probs_dropout_prob
self.max_position_embeddings = max_position_embeddings
self.type_vocab_size = type_vocab_size
self.type_sequence_label_size = type_sequence_label_size
self.initializer_range = initializer_range
self.num_labels = num_labels
self.num_choices = num_choices
self.scope = scope
self.attention_window = attention_window
# `ModelTesterMixin.test_attention_outputs` is expecting attention tensors to be of size
# [num_attention_heads, encoder_seq_length, encoder_key_length], but LongformerSelfAttention
# returns attention of shape [num_attention_heads, encoder_seq_length, self.attention_window + 1]
# because its local attention only attends to `self.attention_window + 1` locations
self.key_length = self.attention_window + 1
# because of padding `encoder_seq_length`, is different from `seq_length`. Relevant for
# the `test_attention_outputs` and `test_hidden_states_output` tests
self.encoder_seq_length = (
self.seq_length + (self.attention_window - self.seq_length % self.attention_window) % self.attention_window
)
def prepare_config_and_inputs(self):
input_ids = ids_tensor([self.batch_size, self.seq_length], self.vocab_size)
input_mask = None
if self.use_input_mask:
input_mask = ids_tensor([self.batch_size, self.seq_length], vocab_size=2)
token_type_ids = None
if self.use_token_type_ids:
token_type_ids = ids_tensor([self.batch_size, self.seq_length], self.type_vocab_size)
sequence_labels = None
token_labels = None
choice_labels = None
if self.use_labels:
sequence_labels = ids_tensor([self.batch_size], self.type_sequence_label_size)
token_labels = ids_tensor([self.batch_size, self.seq_length], self.num_labels)
choice_labels = ids_tensor([self.batch_size], self.num_choices)
config = LongformerConfig(
vocab_size=self.vocab_size,
hidden_size=self.hidden_size,
num_hidden_layers=self.num_hidden_layers,
num_attention_heads=self.num_attention_heads,
intermediate_size=self.intermediate_size,
hidden_act=self.hidden_act,
hidden_dropout_prob=self.hidden_dropout_prob,
attention_probs_dropout_prob=self.attention_probs_dropout_prob,
max_position_embeddings=self.max_position_embeddings,
type_vocab_size=self.type_vocab_size,
initializer_range=self.initializer_range,
attention_window=self.attention_window,
)
return config, input_ids, token_type_ids, input_mask, sequence_labels, token_labels, choice_labels
def check_loss_output(self, result):
self.parent.assertListEqual(list(result["loss"].size()), [])
def create_and_check_longformer_model(
self, config, input_ids, token_type_ids, input_mask, sequence_labels, token_labels, choice_labels
):
model = LongformerModel(config=config)
model.to(torch_device)
model.eval()
sequence_output, pooled_output = model(input_ids, attention_mask=input_mask, token_type_ids=token_type_ids)
sequence_output, pooled_output = model(input_ids, token_type_ids=token_type_ids)
sequence_output, pooled_output = model(input_ids)
result = {
"sequence_output": sequence_output,
"pooled_output": pooled_output,
}
self.parent.assertListEqual(
list(result["sequence_output"].size()), [self.batch_size, self.seq_length, self.hidden_size]
)
self.parent.assertListEqual(list(result["pooled_output"].size()), [self.batch_size, self.hidden_size])
def create_and_check_longformer_for_masked_lm(
self, config, input_ids, token_type_ids, input_mask, sequence_labels, token_labels, choice_labels
):
model = LongformerForMaskedLM(config=config)
model.to(torch_device)
model.eval()
loss, prediction_scores = model(
input_ids, attention_mask=input_mask, token_type_ids=token_type_ids, masked_lm_labels=token_labels
)
result = {
"loss": loss,
"prediction_scores": prediction_scores,
}
self.parent.assertListEqual(
list(result["prediction_scores"].size()), [self.batch_size, self.seq_length, self.vocab_size]
)
self.check_loss_output(result)
def prepare_config_and_inputs_for_common(self):
config_and_inputs = self.prepare_config_and_inputs()
(
config,
input_ids,
token_type_ids,
input_mask,
sequence_labels,
token_labels,
choice_labels,
) = config_and_inputs
inputs_dict = {"input_ids": input_ids, "token_type_ids": token_type_ids, "attention_mask": input_mask}
return config, inputs_dict
@require_torch
class LongformerModelTest(ModelTesterMixin, unittest.TestCase):
test_pruning = False # pruning is not supported
test_headmasking = False # head masking is not supported
test_torchscript = False
all_model_classes = (LongformerForMaskedLM, LongformerModel) if is_torch_available() else ()
def setUp(self):
self.model_tester = LongformerModelTester(self)
self.config_tester = ConfigTester(self, config_class=LongformerConfig, hidden_size=37)
def test_config(self):
self.config_tester.run_common_tests()
def test_longformer_model(self):
config_and_inputs = self.model_tester.prepare_config_and_inputs()
self.model_tester.create_and_check_longformer_model(*config_and_inputs)
def test_longformer_for_masked_lm(self):
config_and_inputs = self.model_tester.prepare_config_and_inputs()
self.model_tester.create_and_check_longformer_for_masked_lm(*config_and_inputs)
class LongformerModelIntegrationTest(unittest.TestCase):
@slow
def test_inference_no_head(self):
model = LongformerModel.from_pretrained("longformer-base-4096")
model.to(torch_device)
# 'Hello world! ' repeated 1000 times
input_ids = torch.tensor(
[[0] + [20920, 232, 328, 1437] * 1000 + [2]], dtype=torch.long, device=torch_device
) # long input
attention_mask = torch.ones(input_ids.shape, dtype=torch.long, device=input_ids.device)
attention_mask[:, [1, 4, 21]] = 2 # Set global attention on a few random positions
output = model(input_ids, attention_mask=attention_mask)[0]
expected_output_sum = torch.tensor(74585.8594, device=torch_device)
expected_output_mean = torch.tensor(0.0243, device=torch_device)
self.assertTrue(torch.allclose(output.sum(), expected_output_sum, atol=1e-4))
self.assertTrue(torch.allclose(output.mean(), expected_output_mean, atol=1e-4))
@slow
def test_inference_masked_lm(self):
model = LongformerForMaskedLM.from_pretrained("longformer-base-4096")
model.to(torch_device)
# 'Hello world! ' repeated 1000 times
input_ids = torch.tensor(
[[0] + [20920, 232, 328, 1437] * 1000 + [2]], dtype=torch.long, device=torch_device
) # long input
loss, prediction_scores = model(input_ids, masked_lm_labels=input_ids)
expected_loss = torch.tensor(0.0620, device=torch_device)
expected_prediction_scores_sum = torch.tensor(-6.1599e08, device=torch_device)
expected_prediction_scores_mean = torch.tensor(-3.0622, device=torch_device)
input_ids = input_ids.to(torch_device)
self.assertTrue(torch.allclose(loss, expected_loss, atol=1e-4))
self.assertTrue(torch.allclose(prediction_scores.sum(), expected_prediction_scores_sum, atol=1e-4))
self.assertTrue(torch.allclose(prediction_scores.mean(), expected_prediction_scores_mean, atol=1e-4))

View File

@@ -129,11 +129,6 @@ class TestMarian_EN_DE_More(MarianIntegrationTest):
max_indices = logits.argmax(-1)
self.tokenizer.batch_decode(max_indices)
def test_tokenizer_equivalence(self):
batch = self.tokenizer.prepare_translation_batch(["I am a small frog"]).to(torch_device)
expected = [38, 121, 14, 697, 38848, 0]
self.assertListEqual(expected, batch.input_ids[0].tolist())
def test_unk_support(self):
t = self.tokenizer
ids = t.prepare_translation_batch(["||"]).to(torch_device).input_ids[0].tolist()

View File

@@ -227,6 +227,7 @@ class OPENAIGPTModelLanguageGenerationTest(unittest.TestCase):
@slow
def test_lm_generate_openai_gpt(self):
model = OpenAIGPTLMHeadModel.from_pretrained("openai-gpt")
model.to(torch_device)
input_ids = torch.tensor([[481, 4735, 544]], dtype=torch.long, device=torch_device) # the president is
expected_output_ids = [
481,

View File

@@ -19,7 +19,7 @@ from transformers import is_torch_available
from .test_configuration_common import ConfigTester
from .test_modeling_common import ModelTesterMixin, floats_tensor, ids_tensor
from .utils import require_torch, slow, torch_device
from .utils import require_multigpu, require_torch, slow, torch_device
if is_torch_available():
@@ -448,9 +448,14 @@ class ReformerTesterMixin:
config_and_inputs = self.model_tester.prepare_config_and_inputs()
self.model_tester.create_and_check_reformer_model_fp16_generate(*config_and_inputs)
@require_multigpu
def test_multigpu_data_parallel_forward(self):
# Opt-out of this test.
pass
@require_torch
class ReformerLocalAttnModelTest(ModelTesterMixin, ReformerTesterMixin, unittest.TestCase):
class ReformerLocalAttnModelTest(ReformerTesterMixin, ModelTesterMixin, unittest.TestCase):
all_model_classes = (ReformerModel, ReformerModelWithLMHead) if is_torch_available() else ()
all_generative_model_classes = (ReformerModelWithLMHead,) if is_torch_available() else ()
test_pruning = False
@@ -504,7 +509,7 @@ class ReformerLocalAttnModelTest(ModelTesterMixin, ReformerTesterMixin, unittest
@require_torch
class ReformerLSHAttnModelTest(ModelTesterMixin, unittest.TestCase, ReformerTesterMixin):
class ReformerLSHAttnModelTest(ReformerTesterMixin, ModelTesterMixin, unittest.TestCase):
all_model_classes = (ReformerModel, ReformerModelWithLMHead) if is_torch_available() else ()
all_generative_model_classes = (ReformerModelWithLMHead,) if is_torch_available() else ()
test_pruning = False

View File

@@ -304,6 +304,16 @@ class T5ModelTest(ModelTesterMixin, unittest.TestCase):
output_with_past_cache = model.generate(input_ids[:1], num_beams=2, max_length=5, do_sample=True)
self.parent.assertTrue(torch.all(output_with_past_cache == output_without_past_cache))
def create_and_check_t5_model_fp16_forward(
self, config, input_ids, decoder_input_ids, attention_mask, decoder_attention_mask, lm_labels,
):
model = T5Model(config=config)
model.to(torch_device)
model.half()
model.eval()
output = model(input_ids, decoder_input_ids=input_ids, attention_mask=attention_mask)[0]
self.parent.assertFalse(torch.isnan(output).any().item())
def prepare_config_and_inputs_for_common(self):
config_and_inputs = self.prepare_config_and_inputs()
(
@@ -355,6 +365,11 @@ class T5ModelTest(ModelTesterMixin, unittest.TestCase):
config_and_inputs = self.model_tester.prepare_config_and_inputs()
self.model_tester.create_t5_and_check_t5_generate_with_past_key_value_states(*config_and_inputs)
@unittest.skipIf(torch_device == "cpu", "Cant do half precision")
def test_t5_model_fp16_forward(self):
config_and_inputs = self.model_tester.prepare_config_and_inputs()
self.model_tester.create_and_check_t5_model_fp16_forward(*config_and_inputs)
@slow
def test_model_from_pretrained(self):
for model_name in list(T5_PRETRAINED_MODEL_ARCHIVE_MAP.keys())[:1]:
@@ -429,6 +444,7 @@ class T5ModelIntegrationTests(unittest.TestCase):
)
input_ids = tok.encode(model.config.prefix + original_input, return_tensors="pt")
input_ids = input_ids.to(torch_device)
output = model.generate(
input_ids=input_ids,
@@ -456,6 +472,7 @@ class T5ModelIntegrationTests(unittest.TestCase):
expected_translation = "Cette section d'images provenant de l'enregistrement infrarouge effectué par le télescope Spitzer montre un « portrait familial » de générations innombrables de étoiles : les plus anciennes sont observées sous forme de pointes bleues, alors que les « nouveau-nés » de couleur rose dans la salle des accouchements doivent être plus difficiles "
input_ids = tok.encode(model.config.prefix + original_input, return_tensors="pt")
input_ids = input_ids.to(torch_device)
output = model.generate(
input_ids=input_ids,
@@ -483,6 +500,7 @@ class T5ModelIntegrationTests(unittest.TestCase):
expected_translation = "Taco Bell a declarat că intenţionează să adauge 2 000 de locaţii în SUA până în 2022."
input_ids = tok.encode(model.config.prefix + original_input, return_tensors="pt")
input_ids = input_ids.to(torch_device)
output = model.generate(
input_ids=input_ids,

View File

@@ -21,7 +21,7 @@ from transformers import is_torch_available
from .test_configuration_common import ConfigTester
from .test_modeling_common import ModelTesterMixin, ids_tensor
from .utils import require_torch, slow, torch_device
from .utils import require_multigpu, require_torch, slow, torch_device
if is_torch_available():
@@ -43,7 +43,7 @@ class TransfoXLModelTest(ModelTesterMixin, unittest.TestCase):
def __init__(
self,
parent,
batch_size=13,
batch_size=14,
seq_length=7,
mem_len=30,
clamp_len=15,
@@ -207,6 +207,11 @@ class TransfoXLModelTest(ModelTesterMixin, unittest.TestCase):
output_result = self.model_tester.create_transfo_xl_lm_head(*config_and_inputs)
self.model_tester.check_transfo_xl_lm_head_output(output_result)
@require_multigpu
def test_multigpu_data_parallel_forward(self):
# Opt-out of this test.
pass
@slow
def test_model_from_pretrained(self):
for model_name in list(TRANSFO_XL_PRETRAINED_MODEL_ARCHIVE_MAP.keys())[:1]:
@@ -218,6 +223,7 @@ class TransfoXLModelLanguageGenerationTest(unittest.TestCase):
@slow
def test_lm_generate_transfo_xl_wt103(self):
model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")
model.to(torch_device)
input_ids = torch.tensor(
[
[

View File

@@ -434,6 +434,7 @@ class XLMModelLanguageGenerationTest(unittest.TestCase):
@slow
def test_lm_generate_xlm_mlm_en_2048(self):
model = XLMWithLMHeadModel.from_pretrained("xlm-mlm-en-2048")
model.to(torch_device)
input_ids = torch.tensor([[14, 447]], dtype=torch.long, device=torch_device) # the president
expected_output_ids = [
14,
@@ -459,4 +460,4 @@ class XLMModelLanguageGenerationTest(unittest.TestCase):
] # the president the president the president the president the president the president the president the president the president the president
# TODO(PVP): this and other input_ids I tried for generation give pretty bad results. Not sure why. Model might just not be made for auto-regressive inference
output_ids = model.generate(input_ids, do_sample=False)
self.assertListEqual(output_ids[0].numpy().tolist(), expected_output_ids)
self.assertListEqual(output_ids[0].cpu().numpy().tolist(), expected_output_ids)

Some files were not shown because too many files have changed in this diff Show More