Add DBRX Model (#29921)
* wip
* fix __init__.py
* add docs
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* address comments 1
* work on make fixup
* pass configs down
* add sdpa attention
* remove DbrxBlock
* add to configuration_auto
* docstring now passes formatting test
* fix style
* update READMEs
* add dbrx to modeling_auto
* make fix-copies generated this
* add DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
* config docstring passes formatting test
* rename moe_loss_weight to router_aux_loss_coef
* add to flash-attn documentation
* fix model-path in tests
* Explicitly make `"suli"` the default `ffn_act_fn`
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* default to using router_aux_loss_coef over ffn_config[moe_loss_weight]
* fix _flash_attn_uses_top_left_mask and is_causal
* fix tests path
* don't use token type IDs
* follow Llama and remove token_type_ids from test
* init ConfigTester differently so tests pass
* remove multiple choice test
* remove question + answer test
* remove sequence classification test
* remove token classification test
* copy Llama tests and remove token_type_ids from test inputs
* do not test pruning or headmasking; style code
* add _tied_weights_keys parameter to pass test
* add type hints
* fix type check
* update config tester
* remove masked_lm test
* remove encoder tests
* initialize DbrxModelTester with correct params
* style
* torch_dtype does not rely on torch
* run make fixup, fix-copies
* use https://huggingface.co/v2ray/dbrx-base-fixed/blob/main/modeling_dbrx.py
* add copyright info
* fix imports and DbrxRotaryEmbedding
* update DbrxModel docstring
* use copies
* change model path in docstring
* use config in DbrxFFN
* fix flashattention2, sdpaattention
* input config to DbrXAttention, DbrxNormAttentionNorm
* more fixes
* fix
* fix again!
* add informative comment
* fix ruff?
* remove print statement + style
* change doc-test
* fix doc-test
* fix docstring
* delete commented out text
* make defaults match dbrx-instruct
* replace `router_aux_loss_coef` with `moe_loss_weight`
* is_decoder=True
* remove is_decoder from configtester
* implement sdpa properly
* make is_decoder pass tests
* start on the GenerationTesterMixin tests
* add dbrx to sdpa documentation
* skip weight typing test
* style
* initialize smaller model
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Add DBRX to toctree
* skip test_new_cache_format
* make config defaults smaller again
* add pad_token_id
* remove pad_token_id from config
* Remove all references to DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
* Update src/transformers/models/dbrx/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/dbrx/modeling_dbrx.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/model_doc/dbrx.md
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Update src/transformers/models/dbrx/configuration_dbrx.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/model_doc/dbrx.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix typo
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update docs, fix configuration_auto.py
* address pr comments
* remove is_decoder flag
* slice
* fix requires grad
* remove grad
* disconnect differently
* remove grad
* enable grads
* patch
* detach expert
* nissan al ghaib
* Update modeling_dbrx.py
* Update src/transformers/models/dbrx/modeling_dbrx.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* replace "Gemma" with "Dbrx"
* remove # type: ignore
* don't hardcode vocab_size
* remove ToDo
* Re-add removed idefics2 line
* Update test to use tiny-random!
* Remove TODO
* Remove one more case of loading the entire dbrx-instruct in the tests
* Update src/transformers/models/dbrx/modeling_dbrx.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* address some comments
* small model
* add dbrx to tokenization_auto
* More docstrings with add_start_docstrings
* Dbrx for now
* add PipelineTesterMixin
* Update src/transformers/models/dbrx/configuration_dbrx.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove flash-attn2 import error
* fix docstring
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add useage example
* put on one line
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix ffn_act_fn
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* change "dbrx" to "DBRX" for display purposes.
* fix __init__.py?
* fix __init__.py
* fix README
* return the aux_loss
* remove extra spaces
* fix configuration_auto.py
* fix format in tokenization_auto
* remove new line
* add more useage examples

---------

Co-authored-by: Abhi Venigalla <abhi.venigalla@databricks.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Eitan Turok <eitan.turok@databricks.com>
Co-authored-by: Eitan Turok <150733043+eitanturok@users.noreply.github.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Eitan Turok <eitanturok@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
@@ -25,7 +25,7 @@ Jump to the [Add new model like section](#add-new-model-like-command) to learn h
 
 ## Cookiecutter Templates
 
-Using the `cookiecutter` utility requires to have all the `dev` dependencies installed. Let's first clone the
+Using the `cookiecutter` utility requires to have all the `dev` dependencies installed. Let's first clone the
 repository and install it in our environment:
 
 ```shell script
@@ -53,20 +53,20 @@ This should launch the `cookiecutter` package which should prompt you to fill in
 The `modelname` should be cased according to the plain text casing, i.e., BERT, RoBERTa, DeBERTa.
 ```
 modelname [<ModelNAME>]:
-uppercase_modelname [<MODEL_NAME>]:
-lowercase_modelname [<model_name>]:
-camelcase_modelname [<ModelName>]:
+uppercase_modelname [<MODEL_NAME>]:
+lowercase_modelname [<model_name>]:
+camelcase_modelname [<ModelName>]:
 ```
 
 Fill in the `authors` with your team members:
 ```
-authors [The HuggingFace Team]:
+authors [The HuggingFace Team]:
 ```
 
 The checkpoint identifier is the checkpoint that will be used in the examples across the files. Put the name you wish,
 as it will appear on the modelhub. Do not forget to include the organisation.
 ```
-checkpoint_identifier [organisation/<model_name>-base-cased]:
+checkpoint_identifier [organisation/<model_name>-base-cased]:
 ```
 
 The tokenizer should either be based on BERT if it behaves exactly like the BERT tokenizer, or a standalone otherwise.
@@ -74,19 +74,19 @@ The tokenizer should either be based on BERT if it behaves exactly like the BERT
 Select tokenizer_type:
 1 - Based on BERT
 2 - Standalone
-Choose from 1, 2 [1]:
+Choose from 1, 2 [1]:
 ```
 <!---
 Choose if your model is an encoder-decoder, or an encoder-only architecture.
 
-If your model is an encoder-only architecture, the generated architecture will be based on the BERT model.
+If your model is an encoder-only architecture, the generated architecture will be based on the BERT model.
 If your model is an encoder-decoder architecture, the generated architecture will be based on the BART model. You can,
 of course, edit the files once the generation is complete.
 ```
 Select is_encoder_decoder_model:
 1 - True
 2 - False
-Choose from 1, 2 [1]:
+Choose from 1, 2 [1]:
 ```
 -->
 
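The name prompts in the hunks above ask for four casings of the same model name. As a rough illustration of how they relate, the sketch below derives the three variants from a plain-text name. The helper `derive_name_variants` is hypothetical and is not part of `cookiecutter` or `transformers`; the splitting and capitalisation rules are assumptions, so check the suggested values against the real prompts before accepting them.

```python
import re


def derive_name_variants(modelname: str) -> dict:
    """Suggest answers for the cookiecutter name prompts (hypothetical helper)."""
    # Assumption: words in the plain-text name are separated by spaces, hyphens or underscores.
    words = [w for w in re.split(r"[\s_\-]+", modelname.strip()) if w]
    return {
        "modelname": modelname,
        # MODEL_NAME style: upper-case words joined by underscores.
        "uppercase_modelname": "_".join(w.upper() for w in words),
        # model_name style: lower-case words joined by underscores.
        "lowercase_modelname": "_".join(w.lower() for w in words),
        # ModelName style: each word capitalised, no separator.
        "camelcase_modelname": "".join(w[0].upper() + w[1:].lower() for w in words),
    }


if __name__ == "__main__":
    for name in ("BERT", "RoBERTa", "Big Bird"):
        print(derive_name_variants(name))
```

Under these assumptions, `RoBERTa` maps to `ROBERTA`, `roberta` and `Roberta`, while a two-word name such as `Big Bird` maps to `BIG_BIRD`, `big_bird` and `BigBird`.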
@@ -97,8 +97,8 @@ src/transformers/models/<model_name>/configuration_<model_name>.py
 src/transformers/models/<model_name>/modeling_<model_name>.py
 src/transformers/models/<model_name>/modeling_tf_<model_name>.py
 src/transformers/models/<model_name>/tokenization_<model_name>.py
-tests/test_modeling_<model_name>.py
-tests/test_modeling_tf_<model_name>.py
+tests/models/<model_name>/test_modeling_<model_name>.py
+tests/models/<model_name>/test_modeling_tf_<model_name>.py
 ```
 
 You can run the tests to ensure that they all pass:
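The hunk above moves the generated test files from `tests/` to `tests/models/<model_name>/`. To make the new layout concrete, here is a minimal sketch that expands the `<model_name>` placeholder for the entries visible in this hunk. The function name `expected_generated_files` is hypothetical, and the list may be incomplete because the diff only shows part of the original listing.

```python
from pathlib import PurePosixPath


def expected_generated_files(model_name: str) -> list[str]:
    """Expand the <model_name> placeholder for the paths shown in the hunk (hypothetical helper)."""
    src = PurePosixPath("src/transformers/models") / model_name
    # New location introduced by the changed lines: tests/models/<model_name>/
    tests = PurePosixPath("tests/models") / model_name
    return [
        str(src / f"configuration_{model_name}.py"),
        str(src / f"modeling_{model_name}.py"),
        str(src / f"modeling_tf_{model_name}.py"),
        str(src / f"tokenization_{model_name}.py"),
        str(tests / f"test_modeling_{model_name}.py"),
        str(tests / f"test_modeling_tf_{model_name}.py"),
    ]


if __name__ == "__main__":
    for path in expected_generated_files("dbrx"):
        print(path)
```

Note that the `pytest` command in the surrounding context lines still uses the flat `./tests/test_*<model_name>*.py` pattern, which may not match files under `tests/models/<model_name>/`.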
@@ -107,9 +107,9 @@ You can run the tests to ensure that they all pass:
 python -m pytest ./tests/test_*<model_name>*.py
 ```
 
-Feel free to modify each file to mimic the behavior of your model.
+Feel free to modify each file to mimic the behavior of your model.
 
-⚠ You should be careful about the classes preceded by the following line:
+⚠ You should be careful about the classes preceded by the following line:
 
 ```python
 # Copied from transformers.[...]
@@ -119,8 +119,8 @@ This line ensures that the copy does not diverge from the source. If it *should*
 is different, this line needs to be deleted. If you don't delete this line and run `make fix-copies`,
 your changes will be overwritten.
 
-Once you have edited the files to fit your architecture, simply re-run the tests (and edit them if a change
-is needed!) afterwards to make sure everything works as expected.
+Once you have edited the files to fit your architecture, simply re-run the tests (and edit them if a change
+is needed!) afterwards to make sure everything works as expected.
 
 Once the files are generated and you are happy with your changes, here's a checklist to ensure that your contribution
 will be merged quickly:
@@ -251,7 +251,7 @@ Once you're done, you can run the tests to ensure that they all pass:
 python -m pytest ./tests/test_*<model_name>*.py
 ```
 
-⚠ You should be careful about the classes preceded by the following line:
+⚠ You should be careful about the classes preceded by the following line:
 
 ```python
 # Copied from transformers.[...]
@@ -261,8 +261,8 @@ This line ensures that the copy does not diverge from the source. If it *should*
 is different, this line needs to be deleted. If you don't delete this line and run `make fix-copies`,
 your changes will be overwritten.
 
-Once you have edited the files to fit your architecture, simply re-run the tests (and edit them if a change
-is needed!) afterwards to make sure everything works as expected.
+Once you have edited the files to fit your architecture, simply re-run the tests (and edit them if a change
+is needed!) afterwards to make sure everything works as expected.
 
 Once the files are generated and you are happy with your changes, here's a checklist to ensure that your contribution
 will be merged quickly:
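The README text in the hunks above warns that code under a `# Copied from transformers.[...]` line is overwritten by `make fix-copies` unless the marker is deleted. Before editing a generated file it can help to list where those markers sit. The sketch below is a simple locator only; it does not reproduce what `make fix-copies` does internally, and the function name and regular expression are assumptions made for illustration.

```python
import re

# Assumption: a marker is a comment line starting with "# Copied from transformers."
COPIED_FROM = re.compile(r"^\s*# Copied from (transformers\.\S+)")


def find_copied_from_markers(source: str) -> list[tuple[int, str]]:
    """Return (line number, referenced object) for each marker in the given source text."""
    markers = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = COPIED_FROM.match(line)
        if match:
            markers.append((lineno, match.group(1)))
    return markers


if __name__ == "__main__":
    # Illustrative snippet, not taken from the commit.
    sample = (
        "import torch\n"
        "\n"
        "# Copied from transformers.models.llama.modeling_llama.LlamaRMSNorm\n"
        "class MyNorm:\n"
        "    pass\n"
    )
    for lineno, target in find_copied_from_markers(sample):
        print(f"line {lineno}: copied from {target}")
```

Any class or function that follows a reported line is one to leave untouched, or to detach by deleting the marker if it should diverge from the source.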