Sukriti Sharma
471958b620
Add GraniteMoeHybrid support for 4.0 (#37658)
* initial config and MLA layer
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* first pass at decoder
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* completion of layers
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* modeling class
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* adding hybrid class to imports
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix imports granitemoehybrid
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix granitehybrid imports
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix granitehybrid import
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix generated modeling file
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* add some comments
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* minor fixes in layers
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* add sharedMLP layer
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* correct layer names
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fixes in mamba config
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix mamba config
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* change name of MLP layer
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix seq mixer layers
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* correct mamba config
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fixes in param names
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* enable hybrid model
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* update config
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix config granite hybrid
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix attention layer
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* cleanup to re-use mamba code
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* keep layer types
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* attention bias cleanup
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* update mamba layer name
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* first pass at tests
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* first pass at tests
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* use granite attention
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix: self attn weights
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* pass at making pos_emb optional
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* initialize self_attn only as needed
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* overwrite forward to create HybridMambaCache
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* Log invalid layer types
* Add attention outputs test
* Only emit attentions/logits if not None
* Fix config test hidden size divisibility
* mark granitemoehybrid as stateful
* Initialize mamba convolutional layers
* Formatting fixes
* config docstring, removed some unused attrs
* Fix missing arg in models test
* Fix create and check decoder model test
* support logits to keep in granitemoe
* regen to pass logits_to_keep
* Allow None or rope
* Fix gradient checkpointing
* Add granitemoehybrid as special cache for generate check
* Remove unused MLA refs
* Fix mamba layer mask
* Remove logits to keep from config
* Minor docstring nits
* Update licenses
* Enable cache by default
* map layer types to layer block type
* First pass at granite moe hybrid docs
* Ignore granite moe hybrid in valid checkpoint check
* Align attention interfaces
* regenerate modular granitemoeshared attention interface
* Align granite moe hybrid attn interface
* run formatting
* Handle mamba initialization
* avoid conditional attr defs
* Move hybrid layer validation to config
* Add placeholder integration tests
* Docs nits / Update model names
* Clean up forward conditions
* Use gradient checkpointing layer
* Remove some copied bamba tests + inherit
align test init
delete more tests
Use common layer init with bamba tests
finish test consolidation
* avoid redundant intermediate std var
* use @can_return_tuple
* Remove unused moe state
* make skipped test names consistent
* Fix docstring order
* Add missing toc
* Always create the shared mlp
* Fix name in docstring
* link preview model in docs
---------
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Co-authored-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-05-06 06:47:43 +02:00