Jason Phang
0041be5b3d
LLaMA Implementation (#21955)
* LLaMA
* sharding and docs
* tweak
* black
* inits
* ruff
* LLAMA_PRETRAINED_CONFIG_ARCHIVE_MAP
* init
* no checkpoint
* docs
* ruff
* type_vocab_size
* tokenizer fixes
* tokenizer fixes
* Update tokenization_llama.py
* Update tokenization_llama.py
* Update configuration_llama.py
* Update modeling_llama.py
* tokenizer add_bos by default
* licenses
* remove decoder
* norms and mlp
* rope overhaul
* tweaks
* black
* mention OPT implementation
* off-by-one naming
* typo
* fix
* tokenization fix and slicing bug
* padding config
* cleanup
* black
* update tests
* undo typo
* fix vocab caching logic
* ruff
* docbuilder
* attn fix from BlackSamorez
* initial feedback
* typo
* docs
* llama case
* llama case
* load checkpoint docs
* comment about tokenizer
* tokenizer defaults
* clear past_key_values if use_cache=False
* last tweaks
* last tweaks
* last tweaks
* last tweaks
---------
Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
2023-03-16 09:00:53 -04:00