Add splinter (#12955)
* splinter template * initialize splinter classes * Splinter Tokenizer * splinter.rst * tokenization fixes * Documentation & some minor variable name changes * bug fix (added back question_token_id to config) + variable names * Minor bug fixes + variable name changes * Fix Splinter references after merge with new transformers * changes after running make style & quality * Fix documentation unindent * Fix doc indentation in tokenization_splinter * Fix also SplinterTokenizerFast * Add Splinter to index.rst and README * Fixdouble whitespace from index.rst * Fixed index.rst with 'make fix-copies' * Update docs/source/model_doc/splinter.rst Co-authored-by: Suraj Patil <surajp815@gmail.com> * Update docs/source/model_doc/splinter.rst Co-authored-by: Suraj Patil <surajp815@gmail.com> * Update docs/source/model_doc/splinter.rst Co-authored-by: Suraj Patil <surajp815@gmail.com> * Update docs/source/model_doc/splinter.rst Co-authored-by: Suraj Patil <surajp815@gmail.com> * Update src/transformers/models/splinter/__init__.py Co-authored-by: Suraj Patil <surajp815@gmail.com> * Added "copied from BERT" comments * Removing unnexessary code from modeling_splinter * Update README.md Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/splinter/configuration_splinter.py Co-authored-by: Suraj Patil <surajp815@gmail.com> * Remove references to TF modeling from splinter * Update src/transformers/models/splinter/modeling_splinter.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Remove unnecessary check * Update src/transformers/models/splinter/modeling_splinter.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Add differences between Splinter and Bert tokenizers * Update src/transformers/models/splinter/modeling_splinter.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/splinter/tokenization_splinter_fast.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Remove unnecessary check * Doc formatting * Update src/transformers/models/splinter/tokenization_splinter.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/splinter/tokenization_splinter.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * bug fix: remove load_tf_weights attribute * Some minor quality changes * Update docs/source/model_doc/splinter.rst Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/transformers/models/splinter/configuration_splinter.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Change FullyConnectedLayer to SplinterFullyConnectedLayer * Variable naming * Reove gather_positions function * Remove ClassificationHead as it's outdated * Update src/transformers/models/splinter/modeling_splinter.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Remove hardcoded 102 token id * Minor style change * Added "tau" organization to all model identifiers & URLS * Added tau to the tests as well * Copy-from comments * Removed all unnecessary classes (e.g. SplinterForMaskedLM) * Running make fix-copies * Bug fix: Further removed unnecessary classes * Add Splinter to AutoTokenization * Add an integration test for Splinter * Removed initialize_new_qass from config - It will be done through different checkpoints * Removed `initialize_new_qass` from documentation as well * Added new checkpoint names (`tau/splinter-base-qass` and same for large) in the code * Minor change to test * SplinterTokenizer now doesn't abstract from BertTokenizer * SplinterTokenizerFast also dosn't abstract from Bert * style and quality * bug fix: import ing torch in tests only if it's available * Auto mappings * Changed copyrights in Splinter's files * Update src/transformers/models/splinter/configuration_splinter.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: yuvalkirstain <kirstain.yuval@gmail.com> Co-authored-by: Suraj Patil <surajp815@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr> Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
This commit is contained in:
@@ -257,41 +257,44 @@ Supported models
|
||||
53. :doc:`SpeechToTextTransformer <model_doc/speech_to_text>` (from Facebook), released together with the paper
|
||||
`fairseq S2T: Fast Speech-to-Text Modeling with fairseq <https://arxiv.org/abs/2010.05171>`__ by Changhan Wang, Yun
|
||||
Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
|
||||
54. :doc:`SqueezeBert <model_doc/squeezebert>` released with the paper `SqueezeBERT: What can computer vision teach NLP
|
||||
54. `Splinter <https://huggingface.co/transformers/model_doc/master/splinter.html>`__ (from Tel Aviv University),
|
||||
released together with the paper `Few-Shot Question Answering by Pretraining Span Selection
|
||||
<https://arxiv.org/abs/2101.00438>`__ by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
|
||||
55. :doc:`SqueezeBert <model_doc/squeezebert>` released with the paper `SqueezeBERT: What can computer vision teach NLP
|
||||
about efficient neural networks? <https://arxiv.org/abs/2006.11316>`__ by Forrest N. Iandola, Albert E. Shaw, Ravi
|
||||
Krishna, and Kurt W. Keutzer.
|
||||
55. :doc:`T5 <model_doc/t5>` (from Google AI) released with the paper `Exploring the Limits of Transfer Learning with a
|
||||
56. :doc:`T5 <model_doc/t5>` (from Google AI) released with the paper `Exploring the Limits of Transfer Learning with a
|
||||
Unified Text-to-Text Transformer <https://arxiv.org/abs/1910.10683>`__ by Colin Raffel and Noam Shazeer and Adam
|
||||
Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
|
||||
56. :doc:`TAPAS <model_doc/tapas>` (from Google AI) released with the paper `TAPAS: Weakly Supervised Table Parsing via
|
||||
57. :doc:`TAPAS <model_doc/tapas>` (from Google AI) released with the paper `TAPAS: Weakly Supervised Table Parsing via
|
||||
Pre-training <https://arxiv.org/abs/2004.02349>`__ by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller,
|
||||
Francesco Piccinno and Julian Martin Eisenschlos.
|
||||
57. :doc:`Transformer-XL <model_doc/transformerxl>` (from Google/CMU) released with the paper `Transformer-XL:
|
||||
58. :doc:`Transformer-XL <model_doc/transformerxl>` (from Google/CMU) released with the paper `Transformer-XL:
|
||||
Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`__ by Zihang Dai*,
|
||||
Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
|
||||
58. :doc:`Vision Transformer (ViT) <model_doc/vit>` (from Google AI) released with the paper `An Image is Worth 16x16
|
||||
59. :doc:`Vision Transformer (ViT) <model_doc/vit>` (from Google AI) released with the paper `An Image is Worth 16x16
|
||||
Words: Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>`__ by Alexey Dosovitskiy,
|
||||
Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias
|
||||
Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
|
||||
59. :doc:`VisualBERT <model_doc/visual_bert>` (from UCLA NLP) released with the paper `VisualBERT: A Simple and
|
||||
60. :doc:`VisualBERT <model_doc/visual_bert>` (from UCLA NLP) released with the paper `VisualBERT: A Simple and
|
||||
Performant Baseline for Vision and Language <https://arxiv.org/pdf/1908.03557>`__ by Liunian Harold Li, Mark
|
||||
Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
|
||||
60. :doc:`Wav2Vec2 <model_doc/wav2vec2>` (from Facebook AI) released with the paper `wav2vec 2.0: A Framework for
|
||||
61. :doc:`Wav2Vec2 <model_doc/wav2vec2>` (from Facebook AI) released with the paper `wav2vec 2.0: A Framework for
|
||||
Self-Supervised Learning of Speech Representations <https://arxiv.org/abs/2006.11477>`__ by Alexei Baevski, Henry
|
||||
Zhou, Abdelrahman Mohamed, Michael Auli.
|
||||
61. :doc:`XLM <model_doc/xlm>` (from Facebook) released together with the paper `Cross-lingual Language Model
|
||||
62. :doc:`XLM <model_doc/xlm>` (from Facebook) released together with the paper `Cross-lingual Language Model
|
||||
Pretraining <https://arxiv.org/abs/1901.07291>`__ by Guillaume Lample and Alexis Conneau.
|
||||
62. :doc:`XLM-ProphetNet <model_doc/xlmprophetnet>` (from Microsoft Research) released with the paper `ProphetNet:
|
||||
63. :doc:`XLM-ProphetNet <model_doc/xlmprophetnet>` (from Microsoft Research) released with the paper `ProphetNet:
|
||||
Predicting Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan,
|
||||
Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
|
||||
63. :doc:`XLM-RoBERTa <model_doc/xlmroberta>` (from Facebook AI), released together with the paper `Unsupervised
|
||||
64. :doc:`XLM-RoBERTa <model_doc/xlmroberta>` (from Facebook AI), released together with the paper `Unsupervised
|
||||
Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`__ by Alexis Conneau*, Kartikay
|
||||
Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke
|
||||
Zettlemoyer and Veselin Stoyanov.
|
||||
64. :doc:`XLNet <model_doc/xlnet>` (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive
|
||||
65. :doc:`XLNet <model_doc/xlnet>` (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive
|
||||
Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`__ by Zhilin Yang*, Zihang Dai*, Yiming
|
||||
Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
|
||||
65. :doc:`XLSR-Wav2Vec2 <model_doc/xlsr_wav2vec2>` (from Facebook AI) released with the paper `Unsupervised
|
||||
66. :doc:`XLSR-Wav2Vec2 <model_doc/xlsr_wav2vec2>` (from Facebook AI) released with the paper `Unsupervised
|
||||
Cross-Lingual Representation Learning For Speech Recognition <https://arxiv.org/abs/2006.13979>`__ by Alexis
|
||||
Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
|
||||
|
||||
@@ -413,6 +416,8 @@ Flax), PyTorch, and/or TensorFlow.
|
||||
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|
||||
| Speech2Text | ✅ | ❌ | ✅ | ❌ | ❌ |
|
||||
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|
||||
| Splinter | ✅ | ✅ | ✅ | ❌ | ❌ |
|
||||
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|
||||
| SqueezeBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
|
||||
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|
||||
| T5 | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
@@ -571,6 +576,7 @@ Flax), PyTorch, and/or TensorFlow.
|
||||
model_doc/roberta
|
||||
model_doc/roformer
|
||||
model_doc/speech_to_text
|
||||
model_doc/splinter
|
||||
model_doc/squeezebert
|
||||
model_doc/t5
|
||||
model_doc/tapas
|
||||
|
||||
87
docs/source/model_doc/splinter.rst
Normal file
87
docs/source/model_doc/splinter.rst
Normal file
@@ -0,0 +1,87 @@
|
||||
..
|
||||
Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
Splinter
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The Splinter model was proposed in `Few-Shot Question Answering by Pretraining Span Selection
|
||||
<https://arxiv.org/abs/2101.00438>`__ by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy. Splinter
|
||||
is an encoder-only transformer (similar to BERT) pretrained using the recurring span selection task on a large corpus
|
||||
comprising Wikipedia and the Toronto Book Corpus.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
In several question answering benchmarks, pretrained models have reached human parity through fine-tuning on an order
|
||||
of 100,000 annotated questions and answers. We explore the more realistic few-shot setting, where only a few hundred
|
||||
training examples are available, and observe that standard models perform poorly, highlighting the discrepancy between
|
||||
current pretraining objectives and question answering. We propose a new pretraining scheme tailored for question
|
||||
answering: recurring span selection. Given a passage with multiple sets of recurring spans, we mask in each set all
|
||||
recurring spans but one, and ask the model to select the correct span in the passage for each masked span. Masked spans
|
||||
are replaced with a special token, viewed as a question representation, that is later used during fine-tuning to select
|
||||
the answer span. The resulting model obtains surprisingly good results on multiple benchmarks (e.g., 72.7 F1 on SQuAD
|
||||
with only 128 training examples), while maintaining competitive performance in the high-resource setting.
|
||||
|
||||
Tips:
|
||||
|
||||
- Splinter was trained to predict answers spans conditioned on a special [QUESTION] token. These tokens contextualize
|
||||
to question representations which are used to predict the answers. This layer is called QASS, and is the default
|
||||
behaviour in the :class:`~transformers.SplinterForQuestionAnswering` class. Therefore:
|
||||
- Use :class:`~transformers.SplinterTokenizer` (rather than :class:`~transformers.BertTokenizer`), as it already
|
||||
contains this special token. Also, its default behavior is to use this token when two sequences are given (for
|
||||
example, in the `run_qa.py` script).
|
||||
- If you plan on using Splinter outside `run_qa.py`, please keep in mind the question token - it might be important for
|
||||
the success of your model, especially in a few-shot setting.
|
||||
- Please note there are two different checkpoints for each size of Splinter. Both are basically the same, except that
|
||||
one also has the pretrained wights of the QASS layer (`tau/splinter-base-qass` and `tau/splinter-large-qass`) and one
|
||||
doesn't (`tau/splinter-base` and `tau/splinter-large`). This is done to support randomly initializing this layer at
|
||||
fine-tuning, as it is shown to yield better results for some cases in the paper.
|
||||
|
||||
This model was contributed by `yuvalkirstain <https://huggingface.co/yuvalkirstain>`__ and `oriram
|
||||
<https://huggingface.co/oriram>`__. The original code can be found `here <https://github.com/oriram/splinter>`__.
|
||||
|
||||
SplinterConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.SplinterConfig
|
||||
:members:
|
||||
|
||||
|
||||
SplinterTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.SplinterTokenizer
|
||||
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
|
||||
create_token_type_ids_from_sequences, save_vocabulary
|
||||
|
||||
|
||||
SplinterTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.SplinterTokenizerFast
|
||||
:members:
|
||||
|
||||
|
||||
SplinterModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.SplinterModel
|
||||
:members: forward
|
||||
|
||||
|
||||
SplinterForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.SplinterForQuestionAnswering
|
||||
:members: forward
|
||||
Reference in New Issue
Block a user