Add Nemotron HF Support (#31699)

* Add nemotron support

* fix inference

* add unit test

* add layernorm1p as a class to avoid meta device mismatch

* test fixed

* Add copied_from statements

* remove pretraining_tp args

* remove nemotronlayernorm

* force LN computation done in FP32

* remove nemotrontokenizer and use llamatokenizer

* license update

* add option for kv_channels for minitron8b

* remove assert

* o_proj fixed

* o_proj reshape

* add gated_proj option

* typo

* remove todos

* fix broken test after merging latest main

* remove nezha/nat after meging main

* chnage default config to 15b model

* add nemo conversion script

* rename conversion script

* remove gate_proj option

* pr comment resolved

* fix unit test

* rename kv_channels to head_dim

* resolve PR issue

* add nemotron md

* fix broken tests

* refactor rope for nemotron

* test fix

* remove linearscaling

* whitespace and import

* fix some copied-from

* code style fix

* reformatted

* add position_embedding to nemotronattention

* rope refactor to only use config, copied-from fix

* format

* Run make fix-copies

* nemotron md with autodoc

* doc  fix

* fix order

* pass check_config_docstrings.py

* fix config_attributes

* remove all llama BC related code

* Use PreTrainedTokenizerFast

* ruff check examples

* conversion script update

* add nemotron to toctree
This commit is contained in:
Ao Tang
2024-08-06 09:42:05 -04:00
committed by GitHub
parent 36fd35e1cf
commit 6a03942db7
15 changed files with 2449 additions and 0 deletions

View File

@@ -67,6 +67,7 @@ FlashAttention-2 is currently supported for the following architectures:
* [Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral#transformers.MixtralModel)
* [Musicgen](https://huggingface.co/docs/transformers/model_doc/musicgen#transformers.MusicgenModel)
* [MusicGen Melody](https://huggingface.co/docs/transformers/model_doc/musicgen_melody#transformers.MusicgenMelodyModel)
* [Nemotron](https://huggingface.co/docs/transformers/model_doc/nemotron)
* [NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)
* [OLMo](https://huggingface.co/docs/transformers/model_doc/olmo#transformers.OlmoModel)
* [OPT](https://huggingface.co/docs/transformers/model_doc/opt#transformers.OPTModel)
@@ -228,6 +229,7 @@ For now, Transformers supports SDPA inference and training for the following arc
* [Qwen2MoE](https://huggingface.co/docs/transformers/model_doc/qwen2_moe#transformers.Qwen2MoeModel)
* [Musicgen](https://huggingface.co/docs/transformers/model_doc/musicgen#transformers.MusicgenModel)
* [MusicGen Melody](https://huggingface.co/docs/transformers/model_doc/musicgen_melody#transformers.MusicgenMelodyModel)
* [Nemotron](https://huggingface.co/docs/transformers/model_doc/nemotron)
* [ViT](https://huggingface.co/docs/transformers/model_doc/vit#transformers.ViTModel)
* [ViTHybrid](https://huggingface.co/docs/transformers/model_doc/vit_hybrid#transformers.ViTHybridModel)
* [ViTMAE](https://huggingface.co/docs/transformers/model_doc/vit_mae#transformers.ViTMAEModel)