Kian Sierra McGettigan
f7076cd346
Flax mistral (#26943)
* direct copy from llama work
* mistral modules forward pass working
* flax mistral forward pass with sliding window
* added tests
* added layer collection approach
* Revert "added layer collection approach"
This reverts commit 0e2905bf2236ec323163fc1a9f0c016b21aa8b8f.
* Revert "Revert "added layer collection approach""
This reverts commit fb17b6187ac5d16da7c461e1130514dc3d137a43.
* fixed attention outputs
* added mistral to init and auto
* fixed import name
* fixed layernorm weight dtype
* freeze initialized weights
* make sure conversion considers bfloat16
* added backend
* added docstrings
* added cache
* fixed sliding window causal mask
* passes cache tests
* passed all tests
* applied make style
* removed commented out code
* applied fix-copies ignored other model changes
* applied make fix-copies
* removed unused functions
* passed generation integration test
* slow tests pass
* fixed slow tests
* changed default dtype from jax.numpy.float32 to float32 for docstring check
* skip cache test for FlaxMistralForSequenceClassification since if pad_token_id in input_ids it doesn't score previous input_ids
* updated checkpoint since from_pt not included
* applied black style
* removed unused args
* Applied styling and fixup
* changed checkpoint for doc back
* fixed rf after adding it to hf hub
* Add dummy ckpt
* applied styling
* added tokenizer to new ckpt
* fixed slice format
* fix init and slice
* changed ref for placeholder TODO
* added copies from Llama
* applied styling
* applied fix-copies
* fixed docs
* update weight dtype reconversion for sharded weights
* removed Nullable input ids
* Removed unnecessary output attentions in Module
* added embedding weight initialization
* removed unused past_key_values
* fixed deterministic
* Fixed RMS Norm and added copied from
* removed input_embeds
* applied make style
* removed nullable input ids from sequence classification model
* added copied from GPTJ
* added copied from Llama on FlaxMistralDecoderLayer
* added copied from to FlaxMistralPreTrainedModel methods
* fix test deprecation warning
* freeze gpt neox random_params and fix copies
* applied make style
* fixed doc issue
* skipped docstring test to align # copied from
* applied make style
* removed FlaxMistralForSequenceClassification
* removed unused padding_idx
* removed more sequence classification
* removed sequence classification
* applied styling and consistency
* added copied from in tests
* removed sequence classification test logic
* applied styling
* applied make style
* removed freeze and fixed copies
* undo test change
* changed repeat_kv to tile
* fixed to key value groups
* updated copyright year
* split casual_mask
* empty to rerun failed pt_flax_equivalence test FlaxWav2Vec2ModelTest
* went back to 2023 for tests_pr_documentation_tests
* went back to 2024
* changed tile to repeat
* applied make style
* empty for retry on Wav2Vec2
2024-01-31 14:19:02 +01:00