Jason Phang
0041be5b3d
LLaMA Implementation (#21955)
* LLaMA
* sharding and docs
* tweak
* black
* inits
* ruff
* LLAMA_PRETRAINED_CONFIG_ARCHIVE_MAP
* init
* no checkpoint
* docs
* ruff
* type_vocab_size
* tokenizer fixes
* tokenizer fixes
* Update tokenization_llama.py
* Update tokenization_llama.py
* Update configuration_llama.py
* Update modeling_llama.py
* tokenizer add_bos by default
* licenses
* remove decoder
* norms and mlp
* rope overhaul
* tweaks
* black
* mention OPT implementation
* off-by-one naming
* typo
* fix
* tokenization fix and slicing bug
* padding config
* cleanup
* black
* update tests
* undo typo
* fix vocab caching logic
* ruff
* docbuilder
* attn fix from BlackSamorez
* initial feedback
* typo
* docs
* llama case
* llama case
* load checkpoint docs
* comment about tokenizer
* tokenizer defaults
* clear past_key_values if use_cache=False
* last tweaks
* last tweaks
* last tweaks
* last tweaks
---------
Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
2023-03-16 09:00:53 -04:00