Migrate doc files to Markdown. (#24376)
* Rename index.mdx to index.md
* With saved modifs
* Address review comment
* Treat all files
* .mdx -> .md
* Remove special char
* Update utils/tests_fetcher.py

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

---------

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
108
docs/source/en/model_doc/bert-generation.md
Normal file
@@ -0,0 +1,108 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# BertGeneration

## Overview

The BertGeneration model is a BERT model that can be leveraged for sequence-to-sequence tasks using
[`EncoderDecoderModel`] as proposed in [Leveraging Pre-trained Checkpoints for Sequence Generation
Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.

The abstract from the paper is the following:

*Unsupervised pretraining of large neural models has recently revolutionized Natural Language Processing. By
warm-starting from the publicly released checkpoints, NLP practitioners have pushed the state-of-the-art on multiple
benchmarks while saving significant amounts of compute time. So far the focus has been mainly on the Natural Language
Understanding tasks. In this paper, we demonstrate the efficacy of pre-trained checkpoints for Sequence Generation. We
developed a Transformer-based sequence-to-sequence model that is compatible with publicly available pre-trained BERT,
GPT-2 and RoBERTa checkpoints and conducted an extensive empirical study on the utility of initializing our model, both
encoder and decoder, with these checkpoints. Our models result in new state-of-the-art results on Machine Translation,
Text Summarization, Sentence Splitting, and Sentence Fusion.*

Usage:

- The model can be used in combination with the [`EncoderDecoderModel`] to leverage two pretrained
  BERT checkpoints for subsequent fine-tuning.

```python
>>> from transformers import BertGenerationDecoder, BertGenerationEncoder, BertTokenizer, EncoderDecoderModel

>>> # leverage checkpoints for Bert2Bert model...
>>> # use BERT's cls token as BOS token and sep token as EOS token
>>> encoder = BertGenerationEncoder.from_pretrained("bert-large-uncased", bos_token_id=101, eos_token_id=102)
>>> # add cross attention layers and use BERT's cls token as BOS token and sep token as EOS token
>>> decoder = BertGenerationDecoder.from_pretrained(
...     "bert-large-uncased", add_cross_attention=True, is_decoder=True, bos_token_id=101, eos_token_id=102
... )
>>> bert2bert = EncoderDecoderModel(encoder=encoder, decoder=decoder)

>>> # create tokenizer...
>>> tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")

>>> input_ids = tokenizer(
...     "This is a long article to summarize", add_special_tokens=False, return_tensors="pt"
... ).input_ids
>>> labels = tokenizer("This is a short summary", return_tensors="pt").input_ids

>>> # train...
>>> loss = bert2bert(input_ids=input_ids, decoder_input_ids=labels, labels=labels).loss
>>> loss.backward()
```

- Pretrained [`EncoderDecoderModel`] checkpoints are also directly available on the model hub, e.g.:

```python
>>> from transformers import AutoTokenizer, EncoderDecoderModel

>>> # instantiate sentence fusion model
>>> sentence_fuser = EncoderDecoderModel.from_pretrained("google/roberta2roberta_L-24_discofuse")
>>> tokenizer = AutoTokenizer.from_pretrained("google/roberta2roberta_L-24_discofuse")

>>> input_ids = tokenizer(
...     "This is the first sentence. This is the second sentence.", add_special_tokens=False, return_tensors="pt"
... ).input_ids

>>> outputs = sentence_fuser.generate(input_ids)

>>> print(tokenizer.decode(outputs[0]))
```

Tips:

- [`BertGenerationEncoder`] and [`BertGenerationDecoder`] should be used in
  combination with [`EncoderDecoderModel`].
- For summarization, sentence splitting, sentence fusion and translation, no special tokens are required for the input.
  Therefore, no EOS token should be added to the end of the input.
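
The tip about special tokens can be pictured with a small toy sketch. It does not call the `transformers` library: `build_encoder_input` is a hypothetical helper written only for illustration, and the word-piece ids are arbitrary placeholders. The ids 101 and 102 are the BOS/EOS values used in the usage example above. The sketch assumes that enabling special tokens wraps the sequence in BOS and EOS, which is roughly what a BERT tokenizer does with `[CLS]` and `[SEP]`.

```python
# Toy illustration only: hypothetical helper, not part of the transformers API.
BOS_TOKEN_ID = 101  # BERT's [CLS] id, used as BOS in the usage example
EOS_TOKEN_ID = 102  # BERT's [SEP] id, used as EOS in the usage example


def build_encoder_input(token_ids, add_special_tokens=False):
    """Return the ids that would be passed to the encoder."""
    if add_special_tokens:
        # Wrapping with BOS/EOS is what the tip advises against for inputs.
        return [BOS_TOKEN_ID] + list(token_ids) + [EOS_TOKEN_ID]
    return list(token_ids)


article_ids = [11, 12, 13]  # arbitrary placeholder word-piece ids
plain = build_encoder_input(article_ids)
wrapped = build_encoder_input(article_ids, add_special_tokens=True)

print(plain)    # input as recommended: no BOS/EOS
print(wrapped)  # input with BOS/EOS added
```

With the real tokenizer, passing `add_special_tokens=False`, as the usage examples do, plays the role of the `plain` case here.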

This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten). The original code can be
found [here](https://tfhub.dev/s?module-type=text-generation&subtype=module,placeholder).

## BertGenerationConfig

[[autodoc]] BertGenerationConfig

## BertGenerationTokenizer

[[autodoc]] BertGenerationTokenizer
    - save_vocabulary

## BertGenerationEncoder

[[autodoc]] BertGenerationEncoder
    - forward

## BertGenerationDecoder

[[autodoc]] BertGenerationDecoder
    - forward