Mayank Mishra
e472e077c2
Granitemoe (#33207)
* first commit
* drop tokenizer
* drop tokenizer
* drop tokenizer
* drop convert
* granite
* drop tokenization test
* mup
* fix
* reformat
* reformat
* reformat
* fix docs
* stop checking for checkpoint
* update support
* attention multiplier
* update model
* tiny drop
* saibo drop
* skip test
* fix test
* fix test
* drop
* drop useless imports
* update docs
* drop flash function
* copied from
* drop pretraining tp
* drop pretraining tp
* drop pretraining tp
* drop unused import
* drop code path
* change name
* softmax scale
* head dim
* drop legacy cache
* rename params
* cleanup
* fix copies
* comments
* add back legacy cache
* multipliers
* multipliers
* multipliers
* text fix
* fix copies
* merge
* multipliers
* attention multiplier
* drop unused imports
* add granitemoe
* add decoration
* remove moe from sequenceclassification
* fix test
* fix
* fix
* fix
* move rope?
* merge
* drop bias
* drop bias
* Update src/transformers/models/granite/configuration_granite.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix
* Update src/transformers/models/granite/modeling_granite.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix
* fix
* fix
* fix
* drop
* drop
* fix
* fix
* cleanup
* cleanup
* fix
* fix granite tests
* fp32 test
* fix
* drop jitter
* fix
* rename
* rename
* fix config
* add gen test
---------
Co-authored-by: Yikang Shen <yikang.shn@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-09-21 01:43:50 +02:00