Anton Vlasjuk
b4115a426e
[Ernie 4.5] Add ernie text models (#39228)
* init
* copied from remote
* add proper structure and llama like structure
* fixup
* revert to state that works
* get closer to llama
* slow and steady
* some removal
* masks work
* it is indeed the rope implementation, how dafuq does it mesh with the cache now hmm
* nice
* getting closer
* closer to transformers style
* let's simplify this, batching works now
* simplified
* working version with modular
* it is indeed the rotation per weights, make it complete llama style
* cleanup conversion, next to look at -> tokenizer
* remove llama artefacts
* fix modeling tests (common ones)
* style
* integration test + first look into tokenization (will need more work, focussing on modeling other models first)
* style
* working moe version, based on remote
* let's keep it simple and go step by step - transformers annotations for modular and transformers style rope (complex view)
* more cleanup
* refactor namings and remove additional forXXX classes
* our moe won't cut it, it seems; correction bias seems to be missing in the remote code version
* tokenization change (remote)
* our moe version works when adding normalization :D
* cleanup moe
* nits
* cleanup modeling -> let's get to modular next
* style
* modular v1
* minor things + attempt at conversion (which doesn't work)
* no conversion, follow glm; fixup modular and other nits
* modular cleanup
* fixes
* tests, tests, tests + some moe dtype forcing
* simplify modular, fix fatal fa2 bug, remaining tests
* fix import issue?
* some initial docs, fix bnb faulty behavior --> needs to fix some tests because of gate needing to be float
* fix sdpa test, load on init dtype only
* fixup post merge
* style
* fix doc links
* tokenization cleanup beginnings
* simplify tokenizer by a lot as it's basically llama
* tokenizer is full llama with different defaults + extra special tokens
* sync og special tokens of ernie
* fix decoding with numbers (also done in remote, what timing), begin tok tests
* align with remote and preserve special tokens, adjust tests to ernie legacy behavior, warning for questionable behavior (also in llama)
* nits
* docs
* my daily post merge it is
* check
* tokenization update with explanations and conversion script
* review on modular (TIL), revert some tokenizer things I did prior, remove mtp comment (low prio)
* post merge fixes
* fixup tokenization, llama fast is the way to go
* more fixups
* check
* import fixes
* correction bias following the paddle code
* fix
* fix TP plan, fix correction bias sharding during forward
* style
* whoops
* fix tied weights
* docs and last nit
* license
* flaky tests
* move repo id, update when merged on the hub
2025-07-21 19:51:49 +02:00