Kian Sierra McGettigan
f7076cd346
Flax mistral (#26943)
* direct copy from llama work
* mistral modules forward pass working
* flax mistral forward pass with sliding window
* added tests
* added layer collection approach
* Revert "added layer collection approach"
This reverts commit 0e2905bf2236ec323163fc1a9f0c016b21aa8b8f.
* Revert "Revert "added layer collection approach""
This reverts commit fb17b6187ac5d16da7c461e1130514dc3d137a43.
* fixed attention outputs
* added mistral to init and auto
* fixed import name
* fixed layernorm weight dtype
* freeze initialized weights
* make sure conversion considers bfloat16
* added backend
* added docstrings
* added cache
* fixed sliding window causal mask
* passes cache tests
* passed all tests
* applied make style
* removed commented out code
* applied fix-copies ignored other model changes
* applied make fix-copies
* removed unused functions
* passed generation integration test
* slow tests pass
* fixed slow tests
* changed default dtype from jax.numpy.float32 to float32 for docstring check
* skip cache test for FlaxMistralForSequenceClassification since if pad_token_id in input_ids it doesn't score previous input_ids
* updated checkpoint since from_pt not included
* applied black style
* removed unused args
* Applied styling and fixup
* changed checkpoint for doc back
* fixed ref after adding it to hf hub
* Add dummy ckpt
* applied styling
* added tokenizer to new ckpt
* fixed slice format
* fix init and slice
* changed ref for placeholder TODO
* added copies from Llama
* applied styling
* applied fix-copies
* fixed docs
* update weight dtype reconversion for sharded weights
* removed Nullable input ids
* Removed unnecessary output attentions in Module
* added embedding weight initialization
* removed unused past_key_values
* fixed deterministic
* Fixed RMS Norm and added copied from
* removed input_embeds
* applied make style
* removed nullable input ids from sequence classification model
* added copied from GPTJ
* added copied from Llama on FlaxMistralDecoderLayer
* added copied from to FlaxMistralPreTrainedModel methods
* fix test deprecation warning
* freeze gpt neox random_params and fix copies
* applied make style
* fixed doc issue
* skipped docstring test to align # copied from
* applied make style
* removed FlaxMistralForSequenceClassification
* removed unused padding_idx
* removed more sequence classification
* removed sequence classification
* applied styling and consistency
* added copied from in tests
* removed sequence classification test logic
* applied styling
* applied make style
* removed freeze and fixed copies
* undo test change
* changed repeat_kv to tile
* fixed to key value groups
* updated copyright year
* split casual_mask
* empty to rerun failed pt_flax_equivalence test FlaxWav2Vec2ModelTest
* went back to 2023 for tests_pr_documentation_tests
* went back to 2024
* changed tile to repeat
* applied make style
* empty for retry on Wav2Vec2
2024-01-31 14:19:02 +01:00