[Ernie 4.5] Add ernie text models (#39228)
* init
* copied from remote
* add proper structure and llama like structure
* fixup
* revert to state that works
* get closer to llama
* slow and steady
* some removal
* masks work
* it is indeed the rope implementation, how dafuq does it mesh with the cache now hmm
* nice
* getting closer
* closer to transformers style
* let's simplify this, batching works now
* simplified
* working version with modular
* it is indeed the rotation per weights, make it complete llama style
* cleanup conversion, next to look at -> tokenizer
* remove llama artefacts
* fix modeling tests (common ones)
* style
* integration test + first look into tokenization (will need more work, focussing on modeling other models first)
* style
* working moe version, based on remote
* let's keep it simple and go step by step - transformers annotations for modular and transformers style rope (complex view)
* more cleanup
* refactor namings and remove additional forXXX classes
* our moe won't cut it it seems, correction bias seems to be missing in remote code version
* tokenization change (remote)
* our moe version works when adding normalization :D
* cleanup moe
* nits
* cleanup modeling -> let's get to modular next
* style
* modular v1
* minor things + attempt at conversion (which doesn't work)
* no conversion follow glm, fixup modular and other nits
* modular cleanup
* fixes
* tests, tests, tests + some moe dtype forcing
* simplify modular, fix fatal fa2 bug, remaining tests
* fix import issue?
* some initial docs, fix bnb faulty behavior --> needs to fix some tests because of gate needing to be float
* fix sdpa test, load on init dtype only
* fixup post merge
* style
* fix doc links
* tokenization cleanup beginnings
* simplify tokenizer by a lot as it's basically llama
* tokenizer is full llama with different defaults + extra special tokens
* sync og special tokens of ernie
* fix decoding with numbers (also in remote done what a timing), begin of tok tests
* align with remote and preserve special tokens, adjust tests to ernie legacy behavior, warning for questionable behavior (also in llama)
* nits
* docs
* my daily post merge it is
* check
* tokenization update with explanations and conversion script
* review on modular (TIL), revert some tokenizer things I did prior, remove mtp comment (low prio)
* post merge fixes
* fixup tokenization, llama fast is the way to go
* more fixups
* check
* import fixes
* correction bias following the paddle code
* fix
* fix TP plan, fix correction bias sharding during forward
* style
* whoops
* fix tied weights
* docs and last nit
* license
* flaky tests
* move repo id, update when merged on the hub
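The log notes three things about the MoE:
- it only matched the remote code after adding normalization;
- a correction bias was missing;
- the gate needs to run in float.

The sketch below shows one common way these pieces fit together in top-k routing. It assumes the correction bias is added to the softmax router probabilities only to decide *which* experts are selected. The mixing weights come from the unbiased probabilities and are renormalized to sum to 1. It is plain Python for illustration only. `route_token` and `norm_min` are hypothetical names, not the transformers API.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, correction_bias, top_k, norm_min=1e-12):
    """Pick top_k experts for one token; return (expert_ids, weights).

    Assumption (not taken from the model code): the correction bias only
    influences expert *selection*. The returned weights are the unbiased
    probabilities of the chosen experts, renormalized to sum to 1.
    """
    # Router math is done in Python floats (double precision) throughout.
    probs = softmax(router_logits)
    # Biased scores are used for selection only.
    biased = [p + b for p, b in zip(probs, correction_bias)]
    expert_ids = sorted(range(len(biased)), key=lambda i: biased[i], reverse=True)[:top_k]
    picked = [probs[i] for i in expert_ids]
    # Normalize the selected unbiased probabilities; the clamp avoids dividing by zero.
    denom = max(sum(picked), norm_min)
    weights = [p / denom for p in picked]
    return expert_ids, weights


if __name__ == "__main__":
    ids, weights = route_token([2.0, 1.0, 0.5, 0.0], [0.0, 0.0, 1.0, 0.0], top_k=2)
    print(ids, weights)  # ids == [2, 0]
```

With a bias on expert 2, the router picks expert 2 over expert 1 even though expert 2's unbiased probability is lower. The returned weights still sum to 1. Under the stated assumption, keeping the bias out of the weights lets it steer which experts are used without directly rescaling their outputs.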
@@ -104,9 +104,11 @@ class CausalLMModelTester:
is_decoder=False,
scope=None,
expert_interval=1,
moe_layer_start_index=0,
moe_intermediate_size=12,
shared_expert_intermediate_size=36,
shared_expert_gate=True,
moe_num_shared_experts=2,
num_experts_per_tok=2,
num_experts=8,
mamba_n_groups=1,
@@ -146,9 +148,11 @@ class CausalLMModelTester:
self.head_dim = self.hidden_size // self.num_attention_heads
self.is_decoder = is_decoder
self.expert_interval = expert_interval
self.moe_layer_start_index = moe_layer_start_index
self.moe_intermediate_size = moe_intermediate_size
self.shared_expert_intermediate_size = shared_expert_intermediate_size
self.shared_expert_gate = shared_expert_gate
self.moe_num_shared_experts = moe_num_shared_experts
self.num_experts_per_tok = num_experts_per_tok
self.num_experts = num_experts
self.mamba_n_groups = mamba_n_groups
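The tester stores `expert_interval` and `moe_layer_start_index`, but this diff does not show how a model consumes them. The sketch below illustrates one plausible interpretation: a decoder layer is treated as MoE if it is at or after the start index and `layer_idx + 1` is a multiple of the interval. This rule is an assumption, not taken from the Ernie 4.5 code. `moe_layer_indices` is a hypothetical helper. Under this rule, the tester defaults (`moe_layer_start_index=0`, `expert_interval=1`) would make every layer an MoE layer.

```python
def moe_layer_indices(num_layers, moe_layer_start_index, expert_interval):
    """Return indices of layers treated as MoE under a HYPOTHETICAL rule:
    layer_idx >= moe_layer_start_index and (layer_idx + 1) % expert_interval == 0.
    """
    if expert_interval < 1:
        raise ValueError("expert_interval must be >= 1")
    return [
        i
        for i in range(num_layers)
        if i >= moe_layer_start_index and (i + 1) % expert_interval == 0
    ]


if __name__ == "__main__":
    # Tester-style defaults: every layer is MoE under the assumed rule.
    print(moe_layer_indices(4, moe_layer_start_index=0, expert_interval=1))
    # Sparser layout: MoE only on every second layer from index 2 onward.
    print(moe_layer_indices(6, moe_layer_start_index=2, expert_interval=2))
```

Whether the real rule uses `layer_idx + 1` or an offset from the start index only matters once `expert_interval` is above 1. With the tester defaults, both readings select every layer.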