[ `gemma`] Adds support for Gemma 💎 (#29167)

* inital commit

* update

* update conversion checkpoint

* update conversion script

* nits

* some fixes

* nits

* merge

* fix permute

* nits

* fix

* nits

* nits

* nits

* fix rope

* fix both rope

* nites

* style

* make sure flax works

* fix flax init code

* fix foward

* nits

* print flax generation out

* current code

* nits

* SIIIIIIIIIIIIIIIIIII

* update

* add new tokenizer

* correct fast tokenizer

* fix conversion

* more comments

* fix modeling and conversion

* nits and nits

* nits testing

* add some tokenization tests

* add some edge cases

* add slow tests and fix them

* fixup

* fix copies for modeling

* fix copies

* add 7B slow tests

* fix

* fix

* fix tests

* make tokenizer cis go green

* styling

* last tokenizer nits

* update jax tests

* fix flax for 7b

* add jit testing 🤗

* cleanups

* isolated nit, inv_freq for rotary_emb.inv_freq

* propagate to jax

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* adjust test

* fix conversion script

* change name

* correct file names

* update conversion script

* Fix bos and eos token ids in the model configuration (#3)

* update modelling

* update conversion script

* add static cache for gemma

* fix sdpa generate

* fix batched

* multiple fixes

* fix FA2

* final fix

* Rename a few missing strings and filenames (#4)

* merge with upstream main

* fix copies

* fix copies

* fix fixup

* fix fixup

* fix

* fix

* final tests

* fix fx gemma tests

* fix fx bf16/fp16 tests

* update slow fx tests

* fx slow tests: one logits, one generation

* move jit test standalone

* Apply suggestions from code review

* nits

* tokenizer updates

* more tokenization updates: custom GemmaSentencepieceExtrator

* style

* Update src/transformers/cache_utils.py

* Update src/transformers/models/gemma/__init__.py

* Update tests/models/gemma/test_modeling_flax_gemma.py

* small nits

* style

* update tokenization test

* fix the rotary embedding

* with style

* fix slow tests

* WARNING this commit might be very important for precisions

* Update tests/models/gemma/test_modeling_flax_gemma.py

* Update src/transformers/models/gemma/configuration_gemma.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Update src/transformers/models/gemma/modeling_flax_gemma.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* small nits here and there!

* forgotten nit

* remove on the fly computation of inv_freq

* revert previous change, let's be safe and for now re-compute freq cis to make sure it's in float

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_flax_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_tokenization_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_tokenization_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_tokenization_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_tokenization_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* nit conversion script link

* fix some tests

* add not doctest and pr doctest

* repo consistency

* fix last CIs 🚀

* update all readmes

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Lysandre Debut <hi@lysand.re>

This commit is contained in:

Arthur

2024-02-21 14:21:28 +01:00

committed by

GitHub

parent 58245ba6fb

commit 594c1277b2

40 changed files with 4811 additions and 6 deletions

									
										2

docs/source/en/perf_infer_gpu_one.md
									
												View File
												
				@@ -40,6 +40,7 @@ FlashAttention-2 is currently supported for the following architectures:

				* [Bark](https://huggingface.co/docs/transformers/model_doc/bark#transformers.BarkModel)

				* [Bart](https://huggingface.co/docs/transformers/model_doc/bart#transformers.BartModel)

				* [DistilBert](https://huggingface.co/docs/transformers/model_doc/distilbert#transformers.DistilBertModel)

				* [Gemma](https://huggingface.co/docs/transformers/model_doc/gemma#transformers.GemmaModel)

				* [GPTBigCode](https://huggingface.co/docs/transformers/model_doc/gpt_bigcode#transformers.GPTBigCodeModel)

				* [GPTNeo](https://huggingface.co/docs/transformers/model_doc/gpt_neo#transformers.GPTNeoModel)

				* [GPTNeoX](https://huggingface.co/docs/transformers/model_doc/gpt_neox#transformers.GPTNeoXModel)

				@@ -171,6 +172,7 @@ For now, Transformers supports SDPA inference and training for the following arc

				* [Bart](https://huggingface.co/docs/transformers/model_doc/bart#transformers.BartModel)

				* [GPTBigCode](https://huggingface.co/docs/transformers/model_doc/gpt_bigcode#transformers.GPTBigCodeModel)

				* [Falcon](https://huggingface.co/docs/transformers/model_doc/falcon#transformers.FalconModel)

				* [Gemma](https://huggingface.co/docs/transformers/model_doc/gemma#transformers.GemmaModel)

				* [Llama](https://huggingface.co/docs/transformers/model_doc/llama#transformers.LlamaModel)

				* [Phi](https://huggingface.co/docs/transformers/model_doc/phi#transformers.PhiModel)

				* [Idefics](https://huggingface.co/docs/transformers/model_doc/idefics#transformers.IdeficsModel)

[ gemma] Adds support for Gemma 💎 (#29167)

2 docs/source/en/perf_infer_gpu_one.md Unescape Escape View File

[ `gemma`] Adds support for Gemma 💎 (#29167)

2

docs/source/en/perf_infer_gpu_one.md

View File