Sukriti Sharma
471958b620
Add GraniteMoeHybrid support for 4.0 (#37658)
* initial config and MLA layer
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* first pass at decoder
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* completion of layers
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* modeling class
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* adding hybrid class to imports
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix imports granitemoehybrid
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix granitehybrid imports
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix granitehybrid import
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix generated modeling file
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* add some comments
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* minor fixes in layers
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* add sharedMLP layer
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* correct layer names
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fixes in mamba config
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix mamba config
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* change name of MLP layer
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix seq mixer layers
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* correct mamba config
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fixes in param names
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* enable hybrid model
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* update config
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix config granite hybrid
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix attention layer
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* cleanup to re-use mamba code
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* keep layer types
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* attention bias cleanup
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* update mamba layer name
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* first pass at tests
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* first pass at tests
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* use granite attention
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix: self attn weights
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* pass at making pos_emb optional
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* initialize self_attn only as needed
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* overwrite forward to create HybridMambaCache
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* Log invalid layer types
* Add attention outputs test
* Only emit attentions/logits if not None
* Fix config test hidden size divisibility
* mark granitemoehybrid as stateful
* Initialize mamba convolutional layers
* Formatting fixes
* config docstring, removed some unused attrs
* Fix missing arg in models test
* Fix create and check decoder model test
* support logits to keep in granitemoe
* regen to pass logits_to_keep
* Allow None or rope
* Fix gradient checkpointing
* Add granitemoehybrid as special cache for generate check
* Remove unused MLA refs
* Fix mamba layer mask
* Remove logits to keep from config
* Minor docstring nits
* Update licenses
* Enable cache by default
* map layer types to layer block type
* First pass at granite moe hybrid docs
* Ignore granite moe hybrid in valid checkpoint check
* Align attention interfaces
* regenerate modular granitemoeshared attention interface
* Align granite moe hybrid attn interface
* run formatting
* Handle mamba initialization
* avoid conditional attr defs
* Move hybrid layer validation to config
* Add placeholder integration tests
* Docs nits / Update model names
* Clean up forward conditions
* Use gradient checkpointing layer
* Remove some copied bamba tests + inherit
align test init
delete more tests
Use common layer init with bamba tests
finish test consolidation
* avoid redundant intermediate std var
* use @can_return_tuple
* Remove unused moe state
* make skipped test names consistent
* Fix docstring order
* Add missing toc
* Always create the shared mlp
* Fix name in docstring
* link preview model in docs
---------
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Co-authored-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-05-06 06:47:43 +02:00