Add flash attention for gpt_bigcode (#26479)

* added flash attention of gpt_bigcode

* changed docs

* Update src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py

* add FA-2 docs

* oops

* Update docs/source/en/perf_infer_gpu_one.md Last Nit

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix

* oops

* remove padding_mask

* change getattr->hasattr logic

* changed .md file

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
This commit is contained in:
Susnato Dhar
2023-10-31 16:51:02 +05:30
committed by GitHub
parent 9dc4ce9ea7
commit b5db8ca66f
3 changed files with 328 additions and 23 deletions

View File

@@ -34,6 +34,7 @@ We natively support Flash Attention 2 for the following models:
- Llama
- Mistral
- Falcon
- [GPTBigCode (Starcoder)](model_doc/gpt_bigcode#)
You can request to add Flash Attention 2 support for more models by opening an issue on GitHub, and even open a Pull Request to integrate the changes. The supported models can be used for inference and training, including training with padding tokens - *which is currently not supported for `BetterTransformer` API below.*