HuggingFace_transformer/docs/source/en/model_doc
Anton Vlasjuk b4115a426e
[Ernie 4.5] Add ernie text models (#39228)
* init

* copied from remote

* add proper structure and llama like structure

* fixup

* revert to state that works

* get closer to llama

* slow and steady

* some removal

* masks work

* it is indeed the rope implementation, how dafuq does it mesh with the cache now hmm

* nice

* getting closer

* closer to transformers style

* let's simplify this, batching works now

* simplified

* working version with modular

* it is indeed the rotation per weights, make it complete llama style

* cleanup conversion, next to look at -> tokenizer

* remove llama artefacts

* fix modeling tests (common ones)

* style

* integration test + first look into tokenization (will need more work, focussing on modeling other models first)

* style

* working moe version, based on remote

* let's keep it simple and go step by step - transformers annotations for modular and transformers style rope (complex view)

* more cleanup

* refactor namings and remove additional ForXXX classes

* our moe won't cut it it seems, correction bias seems to be missing in remote code version

* tokenization change (remote)

* our moe version works when adding normalization :D

* cleanup moe

* nits

* cleanup modeling -> let's get to modular next

* style

* modular v1

* minor things + attempt at conversion (which doesn't work)

* no conversion follow glm, fixup modular and other nits

* modular cleanup

* fixes

* tests, tests, tests + some moe dtype forcing

* simplify modular, fix fatal fa2 bug, remaining tests

* fix import issue?

* some initial docs, fix bnb faulty behavior --> needs to fix some tests because of gate needing to be float

* fix sdpa test, load on init dtype only

* fixup post merge

* style

* fix doc links

* tokenization cleanup beginnings

* simplify tokenizer by a lot as it's basically llama

* tokenizer is full llama with different defaults + extra special tokens

* sync og special tokens of ernie

* fix decoding with numbers (also just done in remote, what timing), begin of tok tests

* align with remote and preserve special tokens, adjust tests to ernie legacy behavior, warning for questionable behavior (also in llama)

* nits

* docs

* my daily post merge it is

* check

* tokenization update with explanations and conversion script

* review on modular (til), revert some tokenizer things i did prior, remove mtp comment (low prio)

* post merge fixes

* fixup tokenization, llama fast is the way to go

* more fixups

* check

* import fixes

* correction bias following the paddle code

* fix

* fix TP plan, fix correction bias sharding during forward

* style

* whoops

* fix tied weights

* docs and last nit

* license

* flaky tests

* move repo id, update when merged on the hub
2025-07-21 19:51:49 +02:00