Add the Bamba Model (#34982)
* initial commit for PR
  Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com>
* rename dynamic cache
  Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* add more unit tests
  Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* add integration test
  Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* add integration test
  Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* Add modular bamba file
* Remove trainer changes from unrelated PR
* Modify modular and config to get model running
* Fix some CI errors and beam search
* Fix a plethora of bugs from CI/docs/etc
* Add bamba to models with special caches
* Update to newer mamba PR for mamba sublayer
* fix test_left_padding_compatibility
  Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* fix style
  Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* fix remaining tests
  Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* missed this test
  Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* ran make style
  Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* move slow tag to integration obj
  Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* make style
  Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* address comments
  Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* fix modular
  Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* left out one part of modular
  Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* change model
  Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* Make Rotary modular as well
* Update bamba.md
  Added overview, update Model inference card and added config
* Update bamba.md
* Update bamba.md
* Update bamba.md
  Minor fixes
* Add docs for config and model back
  Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
* Add warning when using fast kernels
* replaced generate example
  Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* Address comments from PR
  Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
* Propagate attention fixes
  Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
* Fix attention interfaces to the new API
  Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
* Fix API for decoder layer
  Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
* Remove extra weights
  Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>

---------

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com>
Co-authored-by: Antoni Viros i Martin <aviros@ibm.com>
Co-authored-by: divya-kumari32 <72085811+divya-kumari32@users.noreply.github.com>
Co-authored-by: Antoni Viros <ani300@gmail.com>
Committed by: GitHub
Parent: 9a94dfe123
Commit: 9613933b02
@@ -2313,6 +2313,7 @@ class GenerationTesterMixin:
         # 2. We ignore models that have unique cache structures (e.g. mamba) or are in need of refatoring to match the
         #    standard cache format (e.g. gptbigcode)
         models_without_standard_cache = (
+            "bamba",
             "ctrl",
             "fsmt",
             "gptbigcode",
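The hunk above only shows the exclusion tuple gaining a "bamba" entry, not how the tuple is consumed. As a rough illustration of the idea in the comment (skipping models whose cache does not follow the standard format), the following minimal sketch matches a model class name against the tuple. The helper `uses_standard_cache` and the substring-matching rule are assumptions made for illustration and are not taken from this commit; the tuple is also limited to the four entries visible in the hunk.

```python
# Only the entries visible in the hunk; the real tuple continues past this point.
models_without_standard_cache = ("bamba", "ctrl", "fsmt", "gptbigcode")


def uses_standard_cache(model_class_name: str) -> bool:
    """Hypothetical helper: return False if the class name contains an excluded key."""
    lowered = model_class_name.lower()
    return not any(key in lowered for key in models_without_standard_cache)


if __name__ == "__main__":
    for name in ("BambaForCausalLM", "LlamaForCausalLM"):
        status = "checked" if uses_standard_cache(name) else "skipped"
        print(f"{name}: {status}")
```

Under this sketch, a test mixin could skip its standard-cache assertions whenever the helper returns False, which is the effect the added "bamba" entry is meant to have for the new model.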