* init
* copied from remote
* add proper structure and llama like structure
* fixup
* revert to state that works
* get closer to llama
* slow and steady
* some removal
* masks work
* it is indeed the rope implementation, how dafuq does it mesh with the cache now hmm
* nice
* getting closer
* closer to transformers style
* let's simplify this, batching works now
* simplified
* working version with modular
* it is indeed the rotation per weights, make it complete llama style
* cleanup conversion, next to look at -> tokenizer
* remove llama artefacts
* fix modeling tests (common ones)
* style
* integration test + first look into tokenization (will need more work, focussing on modeling other models first)
* style
* working moe version, based on remote
* lets keep it simple and go step by step - transformers annotations for modular and transformers style rope (complex view)
* more cleanup
* refactor namings and remove addition forXXX classes
* our moe won't cut it it seems, correction bias seems to be missing in remote code version
* tokenization change (remote)
* our moe version works when adding normalization :D
* cleanup moe
* nits
* cleanup modeling -> let's get to modular next
* style
* modular v1
* minor things + attempt at conversion (which doesn't work)
* no conversion follow glm, fixup modular and other nits
* modular cleanup
* fixes
* tests, tests, tests + some moe dtype forcing
* simplify modular, fix fatal fa2 bug, remaining tests
* fix import issue?
* some initial docs, fix bnb faulty behavior --> needs to fix some tests because of gate needing to be float
* fix sdpa test, load on init dtype only
* fixup post merge
* style
* fix doc links
* tokenization cleanup beginnings
* simplify tokenizer by a lot as its basically llama
* tokenizer is full llama with different defaults + extra special tokens
* sync og special tokens of ernie
* fix decoding with numbers (also in remote done what a timing), begin of tok tests
* align with remote and preserve special tokens, adjust tests to ernie legacy behavior, warning for questionable behavior (also in llama)
* nits
* docs
* my daily post merge it is
* check
* tokenization update with explanations and conversion script
* review on modular (til), revert some tokenizer things i did prior, remove mtp comment (low prio)
* post merge fixes
* fixup tokenization, llama fast is the way to go
* more fixups
* check
* import fixes
* correction bias following the paddle code
* fix
* fix TP plan, fix correction bias sharding during forward
* style
* whoops
* fix tied weights
* docs and last nit
* license
* flasky tests
* move repo id, update when merged on the hub
* Refactor DBRX tests to use CausalLMModelTest base classes
- Changed DbrxModelTester to inherit from CausalLMModelTester
- Changed DbrxModelTest to inherit from CausalLMModelTest
- Removed duplicate methods that are already in base classes
- Added required class attributes for model classes
- Updated pipeline_model_mapping to include feature-extraction
- Kept DBRX-specific configuration and test methods
- Disabled RoPE tests as DBRX's rotary embedding doesn't accept config parameter
This refactoring reduces code duplication and follows the pattern established
in other causal LM model tests like Gemma.
* Apply style fixes
* Trigger tests
* Refactor DBRX test
* Make sure the DBRX-specific settings are handled
* Use the attribute_map
* Fix attribute map
---------
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* stash commit
* Experiment 1: Try just Gemma
* Experiment 1: Just try Gemma
* make fixup
* Trigger tests
* stash commit
* Try adding Gemma3 as well
* make fixup
* Correct attrib names
* Correct pipeline model mapping
* Add in all_model_classes for Gemma1 again
* Move the pipeline model mapping around again
* make fixup
* Revert Gemma3 changes since it's a VLM
* Let's try Falcon
* Correct attributes
* Correct attributes
* Let's try just overriding get_config() for now
* Do Nemotron too
* And Llama!
* Do llama/persimmon
* Correctly skip tests
* Fix Persimmon
* Include Phimoe
* Fix Gemma2
* Set model_tester_class correctly
* Add GLM
* More models!
* models models models
* make fixup
* Add Qwen3 + Qwen3MoE
* Correct import
* make fixup
* Add the QuestionAnswering classes
* Add the QuestionAnswering classes
* Move pipeline mapping to the right place
* Jetmoe too
* Stop RoPE testing models with no RoPE
* Fix up JetMOE a bit
* Fix up JetMOE a bit
* Can we just force pad_token_id all the time?
* make fixup
* fix starcoder2
* Move pipeline mapping
* Fix RoPE skipping
* Fix RecurrentGemma tests
* Fix Falcon tests
* Add MoE attributes
* Fix values for RoPE testing
* Make sure we set bos_token_id and eos_token_id in an appropriate range
* make fixup
* Fix GLM4
* Add mamba attributes
* Revert bits of JetMOE
* Re-add the JetMOE skips
* Update tests/causal_lm_tester.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add licence
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>