HuggingFace_transformer/docs/source/en/model_doc
松本和真 96bf3d6cc5 Add diffllama (#34083)
* first adding diffllama

* add Diff Attention and other but still with errors

* complete making attention into Diff-Attention

* fix some bugs which may have been caused by transformers-cli while adding the model

* fix a bug caused by forgetting KV cache...

* Update src/transformers/models/diffllama/modeling_diffllama.py

You don't need to divide by 2 if we use the same number of attention heads as Llama. Instead, you can just split in forward.

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* Update src/transformers/models/diffllama/modeling_diffllama.py

adapt to the changed placement of "num_heads // 2"

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* Update src/transformers/models/diffllama/modeling_diffllama.py

the new code is more meaningful than before

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* Update src/transformers/models/diffllama/modeling_diffllama.py

the new code is more meaningful than before

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* Update src/transformers/models/diffllama/modeling_diffllama.py

adapt to the changed placement of "num_heads // 2"

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* Update src/transformers/models/diffllama/modeling_diffllama.py

fix dividing by sqrt(self.head_dim) twice

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* Update src/transformers/models/diffllama/modeling_diffllama.py

fix dividing by sqrt(self.head_dim) twice

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* Update src/transformers/models/diffllama/modeling_diffllama.py

adapt to the changed placement of "num_heads // 2"
and make it more visible

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* found that Attention was still not implemented as in the paper as of e072544a3bfc69b8a903e062729f861108ffecd3.

* re-implemented

* adding groupnorm

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* align with transformers code style

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* fix typo

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* adding groupnorm

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* change SdpaAttention to DiffSdpaAttention

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* fix bug

* Update src/transformers/models/diffllama/modeling_diffllama.py

resolve "not same outputs" problem

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* fix bugs of places of "GroupNorm with scale" and etc

* Revert "fix bugs of places of "GroupNorm with scale" and etc"

This reverts commit 26307d92f6acd55e9fe89f2facff350f05760960.

* simplify multiple attention (matmul) operations into one by repeating value_states

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* simplify multiple attention (matmul) operations into one by repeating value_states

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* simplify multiple attention (matmul) operations into one by repeating value_states

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* remove missed type

* add diffllama model_doc

* apply make style/quality

* apply review comment about model

* apply review comment about test

* place diffllama alphabetically in src/transformers/__init__.py

* fix forgotten code

* Supports parameters that are not initialized with standard deviation 0 in the conventional method

* add DiffLlamaConfig to CONFIG_CLASSES_TO_IGNORE_FOR_DOCSTRING_CHECKPOINT_CHECK in utils/check_config_docstrings.py

* remove unused property of config

* add to supported model list

* add to sdpa supported model list

* fix copyright, remove pretraining_tensor_parallel, and modify for initialization test

* remove unused imports, etc.

* empty commit

* empty commit

* empty commit

* apply modular transformers but with bugs

* revert prev commit

* create src/transformers/models/diffllama/modular_diffllama.py

* run utils/modular_model_converter.py

* empty commit

* leaner modular diffllama

* remove even more from modular_diffllama.py

* remove even more from modular_diffllama.py

* resolve missing docstring entries

* force reset

* convert modular

---------

Co-authored-by: Minho Ryu <ryumin93@gmail.com>
2025-01-07 11:34:56 +01:00