Update AutoModel classes in summarization example (#12178)

- Convert use of deprecated AutoModelWithLMHead to AutoModelForSeq2SeqLM
- Add newly required `truncation=True` to `tokenizer.encode` with `max_length`

This silences all warnings.
This commit is contained in:
Kilian Kluge
2021-06-15 16:36:10 +02:00
committed by GitHub
parent d6c929e200
commit a79585bbf9

View File

@@ -827,18 +827,18 @@ CNN / Daily Mail), it yields very good results.
.. code-block::
>>> ## PYTORCH CODE
>>> from transformers import AutoModelWithLMHead, AutoTokenizer
>>> from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
>>> model = AutoModelWithLMHead.from_pretrained("t5-base")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
>>> tokenizer = AutoTokenizer.from_pretrained("t5-base")
>>> # T5 uses a max_length of 512 so we cut the article to 512 tokens.
>>> inputs = tokenizer.encode("summarize: " + ARTICLE, return_tensors="pt", max_length=512)
>>> inputs = tokenizer.encode("summarize: " + ARTICLE, return_tensors="pt", max_length=512, truncation=True)
>>> outputs = model.generate(inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)
>>> ## TENSORFLOW CODE
>>> from transformers import TFAutoModelWithLMHead, AutoTokenizer
>>> from transformers import TFAutoModelForSeq2SeqLM, AutoTokenizer
>>> model = TFAutoModelWithLMHead.from_pretrained("t5-base")
>>> model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-base")
>>> tokenizer = AutoTokenizer.from_pretrained("t5-base")
>>> # T5 uses a max_length of 512 so we cut the article to 512 tokens.