add GPTSAN model (reopen) (#21291)
* add GPTSAN-Japanese
* add GPTSAN
* add GPTSAN (update for review)
* add GPTSAN
* fix typo in comment text
* add GPTSAN
* fix document and comments
* fix class name GPTSAN->GPTSan
* fix import and test for tokenizer
@@ -301,6 +301,8 @@
title: GPT-J
- local: model_doc/gpt2
title: GPT2
- local: model_doc/gptsan-japanese
title: GPTSAN Japanese
- local: model_doc/gpt-sw3
title: GPTSw3
- local: model_doc/herbert
@@ -119,6 +119,7 @@ The documentation is organized into five sections:
1. **[GPT-2](model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
1. **[GPT-J](model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
1. **[GPT-Sw3](model_doc/gpt-sw3)** (from AI-Sweden) released with the paper [Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish](http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.376.pdf) by Ariel Ekgren, Amaru Cuba Gyllensten, Evangelia Gogoulou, Alice Heiman, Severine Verlinden, Joey Öhman, Fredrik Carlsson, Magnus Sahlgren.
1. **[GPTSAN-japanese](model_doc/gptsan-japanese)** released in the repository [tanreinama/GPTSAN](https://github.com/tanreinama/GPTSAN/blob/main/report/model.md) by Toshiyuki Sakamoto(tanreinama).
1. **[Graphormer](model_doc/graphormer)** (from Microsoft) released with the paper [Do Transformers Really Perform Bad for Graph Representation?](https://arxiv.org/abs/2106.05234) by Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu.
1. **[GroupViT](model_doc/groupvit)** (from UCSD, NVIDIA) released with the paper [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
1. **[Hubert](model_doc/hubert)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
@@ -306,6 +307,7 @@ Flax), PyTorch, and/or TensorFlow.
| GPT NeoX Japanese | ✅ | ❌ | ✅ | ❌ | ❌ |
| GPT-J | ❌ | ❌ | ✅ | ✅ | ✅ |
| GPT-Sw3 | ✅ | ✅ | ✅ | ✅ | ✅ |
| GPTSAN-japanese | ✅ | ❌ | ✅ | ❌ | ❌ |
| Graphormer | ❌ | ❌ | ✅ | ❌ | ❌ |
| GroupViT | ❌ | ❌ | ✅ | ✅ | ❌ |
| Hubert | ❌ | ❌ | ✅ | ✅ | ❌ |
117
docs/source/en/model_doc/gptsan-japanese.mdx
Normal file
@@ -0,0 +1,117 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# GPTSAN-japanese

## Overview

The GPTSAN-japanese model was released in the repository [tanreinama/GPTSAN](https://github.com/tanreinama/GPTSAN/blob/main/report/model.md) by Toshiyuki Sakamoto (tanreinama).

GPTSAN is a Japanese language model based on the Switch Transformer. It has the same structure as the model introduced as Prefix LM
in the T5 paper, and supports both text generation and masked language modeling tasks. Building on these basic tasks, the model can
also be fine-tuned for translation or summarization.

### Generation

The `generate()` method can be used to generate text with the GPTSAN-Japanese model.

```python
>>> from transformers import AutoModel, AutoTokenizer
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("Tanrei/GPTSAN-japanese")
>>> model = AutoModel.from_pretrained("Tanrei/GPTSAN-japanese").cuda()
>>> x_tok = tokenizer("は、", prefix_text="織田信長", return_tensors="pt")
>>> torch.manual_seed(0)
>>> gen_tok = model.generate(x_tok.input_ids.cuda(), token_type_ids=x_tok.token_type_ids.cuda(), max_new_tokens=20)
>>> tokenizer.decode(gen_tok[0])
'織田信長は、2004年に『戦国BASARA』のために、豊臣秀吉'
```

## GPTSAN Features

GPTSAN has some unique features. It has the model structure of a Prefix-LM: it works as a shifted masked language model for prefix input tokens, while un-prefixed inputs are handled as in a normal generative model.
The Spout vector is a GPTSAN-specific input. Spout is pre-trained with random inputs, but you can specify a class of text or an arbitrary vector during fine-tuning. This allows you to indicate the tendency of the generated text.
GPTSAN has a sparse feed-forward layer based on the Switch Transformer. You can also add other layers and train them partially. See the original GPTSAN repository for details.

### Prefix-LM Model

GPTSAN has the structure of the model named Prefix-LM in the `T5` paper. (The original GPTSAN repository calls it `hybrid`.)
In GPTSAN, the `Prefix` part of Prefix-LM, that is, the input positions that can be attended to from both directions, can be specified with any length.
The length can also be specified differently for each batch.
This length applies to the text entered in `prefix_text` for the tokenizer.
The tokenizer returns the mask of the `Prefix` part of Prefix-LM as `token_type_ids`.
The model treats the part where `token_type_ids` is 1 as the `Prefix` part, that is, those input positions can attend to prefix tokens both before and after them.

Tips:

Specifying the Prefix part is done with a mask passed to self-attention.
When `token_type_ids` is `None` or all zeros, the mask is equivalent to a regular causal mask.

For example:

>>> x_token = tokenizer("アイウエ")
input_ids: | SOT | SEG | ア | イ | ウ | エ |
token_type_ids: | 1 | 0 | 0 | 0 | 0 | 0 |
prefix_lm_mask:
SOT | 1 0 0 0 0 0 |
SEG | 1 1 0 0 0 0 |
ア | 1 1 1 0 0 0 |
イ | 1 1 1 1 0 0 |
ウ | 1 1 1 1 1 0 |
エ | 1 1 1 1 1 1 |

>>> x_token = tokenizer("", prefix_text="アイウエ")
input_ids: | SOT | ア | イ | ウ | エ | SEG |
token_type_ids: | 1 | 1 | 1 | 1 | 1 | 0 |
prefix_lm_mask:
SOT | 1 1 1 1 1 0 |
ア | 1 1 1 1 1 0 |
イ | 1 1 1 1 1 0 |
ウ | 1 1 1 1 1 0 |
エ | 1 1 1 1 1 0 |
SEG | 1 1 1 1 1 1 |

>>> x_token = tokenizer("ウエ", prefix_text="アイ")
input_ids: | SOT | ア | イ | SEG | ウ | エ |
token_type_ids: | 1 | 1 | 1 | 0 | 0 | 0 |
prefix_lm_mask:
SOT | 1 1 1 0 0 0 |
ア | 1 1 1 0 0 0 |
イ | 1 1 1 0 0 0 |
SEG | 1 1 1 1 0 0 |
ウ | 1 1 1 1 1 0 |
エ | 1 1 1 1 1 1 |
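
The three masks above can be reproduced with a few lines of Python. This is a minimal sketch, not the library's implementation: `build_prefix_lm_mask` is a hypothetical helper, and it assumes the rule that query position `i` may attend to key position `j` when `j <= i` (causal) or when `j` is a prefix position (`token_type_ids[j] == 1`).

```python
def build_prefix_lm_mask(token_type_ids):
    """Return an n x n 0/1 attention mask for a single sequence.

    Row i is the query position and column j is the key position.
    Assumed rule: i may attend to j if j <= i (causal) or if j is a
    prefix position (token_type_ids[j] == 1).
    """
    n = len(token_type_ids)
    return [
        [1 if j <= i or token_type_ids[j] == 1 else 0 for j in range(n)]
        for i in range(n)
    ]


# token_type_ids of the third example above: tokenizer("ウエ", prefix_text="アイ")
mask = build_prefix_lm_mask([1, 1, 1, 0, 0, 0])
for row in mask:
    print(" ".join(str(v) for v in row))
```

With `token_type_ids` all zero, the same function returns a plain lower-triangular causal mask, which matches the tip above.
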

### Spout Vector

A Spout Vector is a special vector for controlling text generation.
This vector is treated as the first embedding in self-attention to bring external attention to the generated tokens.
In the pre-trained model published as `Tanrei/GPTSAN-japanese`, the Spout Vector is a 128-dimensional vector that passes through 8 fully connected layers in the model and is projected into the space acting as external attention.
The Spout Vector projected by the fully connected layer is split to be passed to all self-attentions.
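
To make the data flow concrete, here is a rough, shape-only sketch in plain Python. It is an illustration under stated assumptions, not the model's actual code: the weights are random, and the `tanh` activation, the helper names (`linear`, `random_matrix`, `project_and_split_spout`), the 4 attention layers, and the width of 16 values per layer are all hypothetical. Only the 128-dimensional input and the 8 fully connected layers come from the description above.

```python
import math
import random


def linear(x, weight):
    """Multiply vector x (length n) by a weight matrix with n rows and m columns."""
    return [
        sum(x[i] * weight[i][j] for i in range(len(x)))
        for j in range(len(weight[0]))
    ]


def random_matrix(rows, cols, rng):
    """Random weights stand in for trained parameters (assumption)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


def project_and_split_spout(spout, num_fc_layers, num_attn_layers, width, rng):
    """Pass the spout through fully connected layers, then split it per attention layer."""
    hidden = spout
    d_spout = len(spout)
    # All but the last layer keep the spout dimension; tanh is an assumed activation.
    for _ in range(num_fc_layers - 1):
        hidden = [math.tanh(h) for h in linear(hidden, random_matrix(d_spout, d_spout, rng))]
    # The last layer projects to one chunk of `width` values per attention layer.
    projected = linear(hidden, random_matrix(d_spout, num_attn_layers * width, rng))
    return [projected[k * width:(k + 1) * width] for k in range(num_attn_layers)]


rng = random.Random(0)
spout = [rng.random() for _ in range(128)]  # 128-dimensional Spout Vector
chunks = project_and_split_spout(spout, num_fc_layers=8, num_attn_layers=4, width=16, rng=rng)
print(len(chunks), len(chunks[0]))
```

Each element of `chunks` stands for the slice of the projected Spout Vector that one self-attention layer would receive. How the real model consumes its slice is described in the original GPTSAN repository.
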

## GPTSanJapaneseConfig

[[autodoc]] GPTSanJapaneseConfig

## GPTSanJapaneseTokenizer

[[autodoc]] GPTSanJapaneseTokenizer

## GPTSanJapaneseModel

[[autodoc]] GPTSanJapaneseModel

## GPTSanJapaneseForConditionalGeneration

[[autodoc]] GPTSanJapaneseForConditionalGeneration
- forward
@@ -29,7 +29,7 @@ The task illustrated in this tutorial is supported by the following model archit

<!--This tip is automatically generated by `make fix-copies`, do not fill manually!-->

[BART](../model_doc/bart), [BigBird-Pegasus](../model_doc/bigbird_pegasus), [Blenderbot](../model_doc/blenderbot), [BlenderbotSmall](../model_doc/blenderbot-small), [Encoder decoder](../model_doc/encoder-decoder), [FairSeq Machine-Translation](../model_doc/fsmt), [LED](../model_doc/led), [LongT5](../model_doc/longt5), [M2M100](../model_doc/m2m_100), [Marian](../model_doc/marian), [mBART](../model_doc/mbart), [MT5](../model_doc/mt5), [MVP](../model_doc/mvp), [NLLB](../model_doc/nllb), [Pegasus](../model_doc/pegasus), [PEGASUS-X](../model_doc/pegasus_x), [PLBart](../model_doc/plbart), [ProphetNet](../model_doc/prophetnet), [SwitchTransformers](../model_doc/switch_transformers), [T5](../model_doc/t5), [XLM-ProphetNet](../model_doc/xlm-prophetnet)
[BART](../model_doc/bart), [BigBird-Pegasus](../model_doc/bigbird_pegasus), [Blenderbot](../model_doc/blenderbot), [BlenderbotSmall](../model_doc/blenderbot-small), [Encoder decoder](../model_doc/encoder-decoder), [FairSeq Machine-Translation](../model_doc/fsmt), [GPTSAN-japanese](../model_doc/gptsan-japanese), [LED](../model_doc/led), [LongT5](../model_doc/longt5), [M2M100](../model_doc/m2m_100), [Marian](../model_doc/marian), [mBART](../model_doc/mbart), [MT5](../model_doc/mt5), [MVP](../model_doc/mvp), [NLLB](../model_doc/nllb), [Pegasus](../model_doc/pegasus), [PEGASUS-X](../model_doc/pegasus_x), [PLBart](../model_doc/plbart), [ProphetNet](../model_doc/prophetnet), [SwitchTransformers](../model_doc/switch_transformers), [T5](../model_doc/t5), [XLM-ProphetNet](../model_doc/xlm-prophetnet)

<!--End of the generated tip-->

@@ -26,7 +26,7 @@ The task illustrated in this tutorial is supported by the following model archit

<!--This tip is automatically generated by `make fix-copies`, do not fill manually!-->

[BART](../model_doc/bart), [BigBird-Pegasus](../model_doc/bigbird_pegasus), [Blenderbot](../model_doc/blenderbot), [BlenderbotSmall](../model_doc/blenderbot-small), [Encoder decoder](../model_doc/encoder-decoder), [FairSeq Machine-Translation](../model_doc/fsmt), [LED](../model_doc/led), [LongT5](../model_doc/longt5), [M2M100](../model_doc/m2m_100), [Marian](../model_doc/marian), [mBART](../model_doc/mbart), [MT5](../model_doc/mt5), [MVP](../model_doc/mvp), [NLLB](../model_doc/nllb), [Pegasus](../model_doc/pegasus), [PEGASUS-X](../model_doc/pegasus_x), [PLBart](../model_doc/plbart), [ProphetNet](../model_doc/prophetnet), [SwitchTransformers](../model_doc/switch_transformers), [T5](../model_doc/t5), [XLM-ProphetNet](../model_doc/xlm-prophetnet)
[BART](../model_doc/bart), [BigBird-Pegasus](../model_doc/bigbird_pegasus), [Blenderbot](../model_doc/blenderbot), [BlenderbotSmall](../model_doc/blenderbot-small), [Encoder decoder](../model_doc/encoder-decoder), [FairSeq Machine-Translation](../model_doc/fsmt), [GPTSAN-japanese](../model_doc/gptsan-japanese), [LED](../model_doc/led), [LongT5](../model_doc/longt5), [M2M100](../model_doc/m2m_100), [Marian](../model_doc/marian), [mBART](../model_doc/mbart), [MT5](../model_doc/mt5), [MVP](../model_doc/mvp), [NLLB](../model_doc/nllb), [Pegasus](../model_doc/pegasus), [PEGASUS-X](../model_doc/pegasus_x), [PLBart](../model_doc/plbart), [ProphetNet](../model_doc/prophetnet), [SwitchTransformers](../model_doc/switch_transformers), [T5](../model_doc/t5), [XLM-ProphetNet](../model_doc/xlm-prophetnet)

<!--End of the generated tip-->