Ao Tang
6a03942db7
Add Nemotron HF Support (#31699)
* Add nemotron support
* fix inference
* add unit test
* add layernorm1p as a class to avoid meta device mismatch
* test fixed
* Add copied_from statements
* remove pretraining_tp args
* remove nemotronlayernorm
* force LN computation to be done in FP32
* remove nemotrontokenizer and use llamatokenizer
* license update
* add option for kv_channels for minitron8b
* remove assert
* o_proj fixed
* o_proj reshape
* add gated_proj option
* typo
* remove todos
* fix broken test after merging latest main
* remove nezha/nat after merging main
* change default config to 15b model
* add nemo conversion script
* rename conversion script
* remove gate_proj option
* pr comment resolved
* fix unit test
* rename kv_channels to head_dim
* resolve PR issue
* add nemotron md
* fix broken tests
* refactor rope for nemotron
* test fix
* remove linearscaling
* whitespace and import
* fix some copied-from
* code style fix
* reformatted
* add position_embedding to nemotronattention
* rope refactor to only use config, copied-from fix
* format
* Run make fix-copies
* nemotron md with autodoc
* doc fix
* fix order
* pass check_config_docstrings.py
* fix config_attributes
* remove all llama BC related code
* Use PreTrainedTokenizerFast
* ruff check examples
* conversion script update
* add nemotron to toctree
2024-08-06 15:42:05 +02:00