[GPT2] Add SDPA support (#31172)

* `gpt2` sdpa support

* fix (at least) one test, style, repo consistency

* fix sdpa mask in forward --> fixes generation

* test

* test2

* test3

* test4

* simplify shapes for attn mask creation and small comments

* hub fail test

* benchmarks

* flash attn 2 mask should not be inverted on enc-dec setup

* fix comment

* apply some suggestion from code review

- only save _attn_implentation once
- remove unnecessary comment

* change elif logic

* [run-slow] gpt2

* modify `test_gpt2_sample_max_time` to follow previous assertion patterns
This commit is contained in:
Anton Vlasjuk
2024-06-19 09:40:57 +02:00
committed by GitHub
parent 22b41b3f8a
commit b275a41005
4 changed files with 191 additions and 11 deletions

View File

@@ -201,6 +201,7 @@ For now, Transformers supports SDPA inference and training for the following arc
* [Dpr](https://huggingface.co/docs/transformers/model_doc/dpr#transformers.DprReader)
* [Falcon](https://huggingface.co/docs/transformers/model_doc/falcon#transformers.FalconModel)
* [Gemma](https://huggingface.co/docs/transformers/model_doc/gemma#transformers.GemmaModel)
* [GPT2](https://huggingface.co/docs/transformers/model_doc/gpt2)
* [GPTBigCode](https://huggingface.co/docs/transformers/model_doc/gpt_bigcode#transformers.GPTBigCodeModel)
* [JetMoe](https://huggingface.co/docs/transformers/model_doc/jetmoe#transformers.JetMoeModel)
* [Jamba](https://huggingface.co/docs/transformers/model_doc/jamba#transformers.JambaModel)