Docs - update formatting of llama3 model card (#33438)
update formatting of llama3 content
@@ -58,24 +58,25 @@ Tips:
- The original model uses `pad_id = -1`, which means that there is no padding token. We can't use the same logic, so make sure to add a padding token using `tokenizer.add_special_tokens({"pad_token":"<pad>"})` and resize the token embeddings accordingly. You should also set `model.config.pad_token_id`. The `embed_tokens` layer of the model is initialized with `self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.config.padding_idx)`, which ensures that encoding the padding token outputs zeros, so passing it at initialization is recommended.
- The original checkpoint can be converted using the [conversion script](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py). The script can be called with the following (example) command:

```bash
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/downloaded/llama/weights --model_size 7B --output_dir /output/path --llama_version 3
```

- After conversion, the model and tokenizer can be loaded via:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/output/path")
model = AutoModelForCausalLM.from_pretrained("/output/path")
```

Note that executing the script requires enough CPU RAM to host the whole model in float16 precision. Even though the biggest versions come in several checkpoints, each checkpoint contains a part of every weight of the model, so all of them need to be loaded in RAM. For the 75B model, that comes to 145GB of RAM.

- When using Flash Attention 2 via `attn_implementation="flash_attention_2"`, don't pass `torch_dtype` to the `from_pretrained` class method, and use Automatic Mixed-Precision training. With `Trainer`, simply set either `fp16` or `bf16` to `True`; otherwise, make sure you are using `torch.autocast`. This is required because Flash Attention only supports the `fp16` and `bf16` data types.
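
The effect of `padding_idx` mentioned in the padding-token tip can be checked in isolation. The sketch below is a toy illustration, not the Llama implementation: the vocabulary and hidden sizes, and the choice of the last index as the pad id, are assumptions made for the example, and it relies only on `torch.nn.Embedding`.

```python
import torch
from torch import nn

# Hypothetical toy sizes, chosen only for illustration.
vocab_size, hidden_size = 10, 4
pad_token_id = vocab_size - 1  # assumption: the pad token was appended last

# Same positional layout as the embed_tokens line quoted in the tip:
# (num_embeddings, embedding_dim, padding_idx).
embed_tokens = nn.Embedding(vocab_size, hidden_size, pad_token_id)

# A batch with one sequence whose last position is padding.
input_ids = torch.tensor([[1, 2, pad_token_id]])
hidden = embed_tokens(input_ids)

# The row at padding_idx is initialized to zeros, so the pad position
# contributes an all-zero vector.
pad_vector = hidden[0, -1]
print(pad_vector)
```

With a real checkpoint, the pad id would be whatever `tokenizer.pad_token_id` reports once the token has been added, rather than a hard-coded index.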
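
The RAM requirement of the conversion script can be estimated with simple arithmetic, since float16 stores each parameter in 2 bytes. The helper below is a back-of-the-envelope sketch (the function name is hypothetical) and counts only the weights, not any working memory the script may need on top.

```python
def fp16_weight_bytes(num_params: float) -> float:
    """Bytes needed to hold num_params weights in float16 (2 bytes each)."""
    return num_params * 2


# Parameter count taken from the note about the 75B model.
total_bytes = fp16_weight_bytes(75e9)
gb = total_bytes / 1e9     # decimal gigabytes
gib = total_bytes / 2**30  # binary gibibytes
print(f"{gb:.0f} GB (decimal), {gib:.1f} GiB (binary)")
```

For 75 billion parameters this gives 150 GB in decimal units, or a little under 140 GiB, which is in the same range as the 145GB figure quoted in the note.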
## Resources

A ton of cool resources are already available on the [Llama2](./llama2) documentation page. Contributors are invited to add new resources curated for Llama3 here! 🤗