Files
HuggingFace_transformer/tests/models
Anton Vlasjuk b4115a426e
Some checks failed
Release - Conda / build_and_package (push) Has been cancelled
Secret Leaks / trufflehog (push) Has been cancelled
[Ernie 4.5] Add ernie text models (#39228)
* init

* copied from remote

* add proper structure and llama like structure

* fixup

* revert to state that works

* get closer to llama

* slow and steady

* some removal

* masks work

* it is indeed the rope implementation, how dafuq does it mesh with the cache now hmm

* nice

* getting closer

* closer to transformers style

* let's simplify this, batching works now

* simplified

* working version with modular

* it is indeed the rotation per weights, make it complete llama style

* cleanup conversion, next to look at -> tokenizer

* remove llama artefacts

* fix modeling tests (common ones)

* style

* integration test + first look into tokenization (will need more work, focussing on modeling other models first)

* style

* working moe version, based on remote

* lets keep it simple and go step by step - transformers annotations for modular and transformers style rope (complex view)

* more cleanup

* refactor namings and remove addition forXXX classes

* our moe won't cut it it seems, correction bias seems to be missing in remote code version

* tokenization change (remote)

* our moe version works when adding normalization :D

* cleanup moe

* nits

* cleanup modeling -> let's get to modular next

* style

* modular v1

* minor things + attempt at conversion (which doesn't work)

* no conversion follow glm, fixup modular and other nits

* modular cleanup

* fixes

* tests, tests, tests + some moe dtype forcing

* simplify modular, fix fatal fa2 bug, remaining tests

* fix import issue?

* some initial docs, fix bnb faulty behavior --> needs to fix some tests because of gate needing to be float

* fix sdpa test, load on init dtype only

* fixup post merge

* style

* fix doc links

* tokenization cleanup beginnings

* simplify tokenizer by a lot as its basically llama

* tokenizer is full llama with different defaults + extra special tokens

* sync og special tokens of ernie

* fix decoding with numbers (also in remote done what a timing), begin of tok tests

* align with remote and preserve special tokens, adjust tests to ernie legacy behavior, warning for questionable behavior (also in llama)

* nits

* docs

* my daily post merge it is

* check

* tokenization update with explanations and conversion script

* review on modular (til), revert some tokenizer things i did prior, remove mtp comment (low prio)

* post merge fixes

* fixup tokenization, llama fast is the way to go

* more fixups

* check

* import fixes

* correction bias following the paddle code

* fix

* fix TP plan, fix correction bias sharding during forward

* style

* whoops

* fix tied weights

* docs and last nit

* license

* flasky tests

* move repo id, update when merged on the hub
2025-07-21 19:51:49 +02:00
..
2025-06-11 17:28:06 +01:00
2025-06-24 15:05:29 +02:00
2025-06-11 17:28:06 +01:00
2025-05-16 13:26:54 +02:00
2025-04-28 15:08:46 +02:00
2025-06-11 17:28:06 +01:00
2025-06-11 17:28:06 +01:00
2025-06-11 17:28:06 +01:00
2025-06-11 17:28:06 +01:00
2025-06-11 17:28:06 +01:00
2025-06-11 17:28:06 +01:00
2025-06-11 17:28:06 +01:00
2025-06-11 17:28:06 +01:00
2025-06-11 17:28:06 +01:00
2025-06-11 17:28:06 +01:00
2025-06-11 17:28:06 +01:00
2025-07-08 11:44:29 +02:00
2025-06-27 16:54:11 +02:00
2025-06-11 17:28:06 +01:00
2025-06-11 17:28:06 +01:00
2025-07-02 12:25:26 +01:00
2025-05-23 18:29:31 +01:00
2025-06-11 17:28:06 +01:00
2025-06-19 10:56:34 +02:00
2025-06-11 17:28:06 +01:00
2025-07-16 15:53:43 +02:00
2025-07-21 13:24:34 +02:00
2025-06-11 17:28:06 +01:00
2025-05-23 18:29:31 +01:00
2025-06-11 17:28:06 +01:00
2025-07-21 12:38:05 +00:00
2025-05-23 18:29:31 +01:00
2025-06-26 20:07:17 +02:00
2025-06-11 17:28:06 +01:00
2025-07-14 12:02:59 +02:00
2025-06-11 17:28:06 +01:00
2025-06-11 17:28:06 +01:00
2025-06-11 17:28:06 +01:00
2025-06-24 18:01:15 +02:00
2025-04-15 11:33:09 +01:00
2025-06-11 17:28:06 +01:00
2025-06-11 17:28:06 +01:00
2025-06-11 17:28:06 +01:00
2025-04-08 17:15:37 +01:00
2025-06-11 17:28:06 +01:00
2025-06-11 17:28:06 +01:00
2025-07-10 05:18:44 +00:00
2025-07-10 05:18:44 +00:00
2025-06-11 17:28:06 +01:00
2025-05-23 18:29:31 +01:00
2025-05-23 18:29:31 +01:00
2025-05-28 16:44:20 +01:00
2025-05-28 16:44:20 +01:00
2025-06-18 14:36:03 +02:00
2025-06-23 17:42:46 +02:00
2025-06-11 17:28:06 +01:00
2025-06-11 17:28:06 +01:00
2025-06-11 17:28:06 +01:00
2025-06-25 15:12:15 +00:00
2025-05-23 18:29:31 +01:00
2025-06-11 17:28:06 +01:00
2025-04-08 17:15:37 +01:00
2025-06-11 17:28:06 +01:00
2025-07-18 00:02:04 +00:00
2025-06-11 17:28:06 +01:00
2025-06-11 17:28:06 +01:00
2025-06-11 17:28:06 +01:00