HuggingFace_transformer/docs/source/en/model_doc
Vladislav Bronzov c980904204 Add DeepSeek V2 Model into Transformers (#36400)
* add initial structure

* doc fixes, add model base logic

* update init files

* some fixes to config and modular

* some improvements for attention

* format

* remove unused attn

* some fixes for moe layer and for decoder

* adapt _compute_yarn_parameters for deepseek

* format

* small fix

* fix for decoder forward

* add tests, small refactoring

* fix dummies

* fix init

* fix doc

* fix config docs

* add sequence doc, fix init for gate

* fix issues in tests

* fix config doc

* remove unused args

* some fixes and refactoring after review

* fix doc for config

* small fixes for config args

* revert config refactoring

* small refactoring

* minor fixes after rebase

* small fix after merge

* fix modular

* remove rotaryembd from public init

* small test fix

* some rotary pos calculation improvement

* fix format

* some improvements and fixes

* fix config

* some refactoring

* adjust some unit tests

* skip test

* small fixes and tests adjustment

* reapply modular

* fix all tests except Integration

* fix integration tests

* cleanup BC stuff

* rope

* fix integration tests based on A10

* style

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-09 17:04:28 +02:00