Jason Phang
0041be5b3d
LLaMA Implementation (#21955)
* LLaMA
* sharding and docs
* tweak
* black
* inits
* ruff
* LLAMA_PRETRAINED_CONFIG_ARCHIVE_MAP
* init
* no checkpoint
* docs
* ruff
* type_vocab_size
* tokenizer fixes
* tokenizer fixes
* Update tokenization_llama.py
* Update tokenization_llama.py
* Update configuration_llama.py
* Update modeling_llama.py
* tokenizer add_bos by default
* licenses
* remove decoder
* norms and mlp
* rope overhaul
* tweaks
* black
* mention OPT implementation
* off-by-one naming
* typo
* fix
* tokenization fix and slicing bug
* padding config
* cleanup
* black
* update tests
* undo typo
* fix vocab caching logic
* ruff
* docbuilder
* attn fix from BlackSamorez
* initial feedback
* typo
* docs
* llama case
* llama case
* load checkpoint docs
* comment about tokenizer
* tokenizer defaults
* clear past_key_values if use_cache=False
* last tweaks
* last tweaks
* last tweaks
* last tweaks
---------
Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
2023-03-16 09:00:53 -04:00