[Port] TensorFlow implementation of Mistral (#29708)

* chore: initial commit

* chore: adding imports and inits

* chore: adding the causal and classification code

* chore: adding names to the layers

* chore: using single self attn layer

* chore: built the model and layers

* chore: start with testing

* chore: docstring change, transpose fix

* fix: rotary embedding

* chore: adding cache implementation

* remove unused torch

* chore: fixing the indexing issue

* make fix-copies

* Use modeling_tf_utils.keras

* make fixup

* chore: fixing tests

* chore: adding past key value logic

* chore: adding multi label classfication test

* fix: switching on the built parameters in the layers

* fixing repo consistency

* ruff formats

* style changes

* fix: tf and pt equivalence

* removing returns from docstrings

* fix docstrings

* fix docstrings

* removing todos

* fix copies

* fix docstring

* fix docstring

* chore: using easier rotate_half

* adding integration tests

* chore: addressing review related to rotary embedding layer

* review changes

* [run-slow] mistral

* skip: test save load after resize token embedding

* style

---------

Co-authored-by: Matt <rocketknight1@gmail.com>
This commit is contained in:
Aritra Roy Gosthipaty
2024-05-23 22:18:49 +05:30
committed by GitHub
parent 2a89673fe5
commit 965e98dc54
8 changed files with 1512 additions and 3 deletions

View File

@@ -200,7 +200,7 @@ Flax), PyTorch, and/or TensorFlow.
| [Megatron-BERT](model_doc/megatron-bert) | ✅ | ❌ | ❌ |
| [Megatron-GPT2](model_doc/megatron_gpt2) | ✅ | ✅ | ✅ |
| [MGP-STR](model_doc/mgp-str) | ✅ | ❌ | ❌ |
| [Mistral](model_doc/mistral) | ✅ | | ✅ |
| [Mistral](model_doc/mistral) | ✅ | | ✅ |
| [Mixtral](model_doc/mixtral) | ✅ | ❌ | ❌ |
| [mLUKE](model_doc/mluke) | ✅ | ❌ | ❌ |
| [MMS](model_doc/mms) | ✅ | ✅ | ✅ |

View File

@@ -216,4 +216,19 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
## FlaxMistralForCausalLM
[[autodoc]] FlaxMistralForCausalLM
- __call__
- __call__
## TFMistralModel
[[autodoc]] TFMistralModel
- call
## TFMistralForCausalLM
[[autodoc]] TFMistralForCausalLM
- call
## TFMistralForSequenceClassification
[[autodoc]] TFMistralForSequenceClassification
- call