Aritra Roy Gosthipaty
965e98dc54
[Port] TensorFlow implementation of Mistral (#29708)
* chore: initial commit
* chore: adding imports and inits
* chore: adding the causal and classification code
* chore: adding names to the layers
* chore: using single self attn layer
* chore: built the model and layers
* chore: start with testing
* chore: docstring change, transpose fix
* fix: rotary embedding
* chore: adding cache implementation
* remove unused torch
* chore: fixing the indexing issue
* make fix-copies
* Use modeling_tf_utils.keras
* make fixup
* chore: fixing tests
* chore: adding past key value logic
* chore: adding multi label classification test
* fix: switching on the built parameters in the layers
* fixing repo consistency
* ruff formats
* style changes
* fix: tf and pt equivalence
* removing returns from docstrings
* fix docstrings
* fix docstrings
* removing todos
* fix copies
* fix docstring
* fix docstring
* chore: using easier rotate_half
* adding integration tests
* chore: addressing review related to rotary embedding layer
* review changes
* [run-slow] mistral
* skip: test save load after resize token embedding
* style
---------
Co-authored-by: Matt <rocketknight1@gmail.com>
2024-05-23 17:48:49 +01:00