Mayank Mishra
e472e077c2
Granitemoe (#33207)
* first commit
* drop tokenizer
* drop tokenizer
* drop tokenizer
* drop convert
* granite
* drop tokenization test
* mup
* fix
* reformat
* reformat
* reformat
* fix docs
* stop checking for checkpoint
* update support
* attention multiplier
* update model
* tiny drop
* saibo drop
* skip test
* fix test
* fix test
* drop
* drop useless imports
* update docs
* drop flash function
* copied from
* drop pretraining tp
* drop pretraining tp
* drop pretraining tp
* drop unused import
* drop code path
* change name
* softmax scale
* head dim
* drop legacy cache
* rename params
* cleanup
* fix copies
* comments
* add back legacy cache
* multipliers
* multipliers
* multipliers
* text fix
* fix copies
* merge
* multipliers
* attention multiplier
* drop unused imports
* add granitemoe
* add decoration
* remove moe from sequenceclassification
* fix test
* fix
* fix
* fix
* move rope?
* merge
* drop bias
* drop bias
* Update src/transformers/models/granite/configuration_granite.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix
* Update src/transformers/models/granite/modeling_granite.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix
* fix
* fix
* fix
* drop
* drop
* fix
* fix
* cleanup
* cleanup
* fix
* fix granite tests
* fp32 test
* fix
* drop jitter
* fix
* rename
* rename
* fix config
* add gen test
---------
Co-authored-by: Yikang Shen <yikang.shn@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-09-21 01:43:50 +02:00