Sukriti Sharma
471958b620
Add GraniteMoeHybrid support for 4.0 (#37658)
* initial config and MLA layer
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* first pass at decoder
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* completion of layers
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* modeling class
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* adding hybrid class to imports
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix imports granitemoehybrid
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix granitehybrid imports
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix granitehybrid import
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix generated modeling file
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* add some comments
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* minor fixes in layers
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* add sharedMLP layer
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* correct layer names
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fixes in mamba config
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix mamba config
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* change name of MLP layer
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix seq mixer layers
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* correct mamba config
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fixes in param names
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* enable hybrid model
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* update config
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix config granite hybrid
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix attention layer
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* cleanup to re-use mamba code
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* keep layer types
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* attention bias cleanup
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* update mamba layer name
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* first pass at tests
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* first pass at tests
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* use granite attention
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix: self attn weights
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* pass at making pos_emb optional
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* initialize self_attn only as needed
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* overwrite forward to create HybridMambaCache
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* Log invalid layer types
* Add attention outputs test
* Only emit attentions/logits if not None
* Fix config test hidden size divisibility
* mark granitemoehybrid as stateful
* Initialize mamba convolutional layers
* Formatting fixes
* config docstring, removed some unused attrs
* Fix missing arg in models test
* Fix create and check decoder model test
* support logits to keep in granitemoe
* regen to pass logits_to_keep
* Allow None or rope
* Fix gradient checkpointing
* Add granitemoehybrid as special cache for generate check
* Remove unused MLA refs
* Fix mamba layer mask
* Remove logits to keep from config
* Minor docstring nits
* Update licenses
* Enable cache by default
* map layer types to layer block type
* First pass at granite moe hybrid docs
* Ignore granite moe hybrid in valid checkpoint check
* Align attention interfaces
* regenerate modular granitemoeshared attention interface
* Align granite moe hybrid attn interface
* run formatting
* Handle mamba initialization
* avoid conditional attr defs
* Move hybrid layer validation to config
* Add placeholder integration tests
* Docs nits / Update model names
* Clean up forward conditions
* Use gradient checkpointing layer
* Remove some copied bamba tests + inherit
align test init
delete more tests
Use common layer init with bamba tests
finish test consolidation
* avoid redundant intermediate std var
* use @can_return_tuple
* Remove unused moe state
* make skipped test names consistent
* Fix docstring order
* Add missing toc
* Always create the shared mlp
* Fix name in docstring
* link preview model in docs
---------
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Co-authored-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-05-06 06:47:43 +02:00