[LlamaFamiliy] add a tip about dtype (#25794)

* add a warning=True tip to the Llama2 doc

* code llama needs a tip too

* doc nit

* build PR doc

* doc nits

Co-authored-by: Lysandre <lysandre@huggingface.co>

---------

Co-authored-by: Lysandre <lysandre@huggingface.co>
This commit is contained in:
Arthur
2023-08-28 12:07:31 +02:00
committed by GitHub
parent 686c68f64c
commit de139702a1
2 changed files with 23 additions and 2 deletions

View File

@@ -26,6 +26,16 @@ The abstract from the paper is the following:
Checkout all CodeLlama models [here](https://huggingface.co/models?search=code_llama) Checkout all CodeLlama models [here](https://huggingface.co/models?search=code_llama)
<Tip warning={true}>
The `Llama2` family models, on which Code Llama is based, were trained using `bfloat16`, but the original inference uses `float16. The checkpoints uploaded on the hub use `torch_dtype = 'float16'` which will be used by the `AutoModel` API to cast the checkpoints from `torch.float32` to `torch.float16`.
The `dtype` of the online weights is mostly irrelevant, unless you are using `torch_dtype="auto"` when initializing a model using `model = AutoModelForCausalLM.from_pretrained("path", torch_dtype = "auto")`. The reason is that the model will first be downloaded ( using the `dtype` of the checkpoints online) then it will be casted to the default `dtype` of `torch` (becomes `torch.float32`) and finally, if there is a `torch_dtype` provided in the config, it will be used.
Training the model in `float16` is not recommended and known to produce `nan`, as such the model should be trained in `bfloat16`.
</Tip>
Tips: Tips:
- These models have the same architcture as the `Llama2` models - These models have the same architcture as the `Llama2` models
@@ -75,8 +85,8 @@ If you only want the infilled part:
>>> from transformers import pipeline >>> from transformers import pipeline
>>> import torch >>> import torch
>>> pipeline = pipeline("text-generation",model="codellama/CodeLlama-7b-hf",torch_dtype=torch.float16, device_map="auto") >>> generator = pipeline("text-generation",model="codellama/CodeLlama-7b-hf",torch_dtype=torch.float16, device_map="auto")
>>> pipeline('def remove_non_ascii(s: str) -> str:\n """ <FILL_ME>\n return result', max_new_tokens = 128, return_type = 1) >>> generator('def remove_non_ascii(s: str) -> str:\n """ <FILL_ME>\n return result', max_new_tokens = 128, return_type = 1)
``` ```
Note that executing the script requires enough CPU RAM to host the whole model in float16 precision (even if the biggest versions Note that executing the script requires enough CPU RAM to host the whole model in float16 precision (even if the biggest versions
come in several checkpoints they each contain a part of each weight of the model, so we need to load them all in RAM). For the 75B model, it's thus 145GB of RAM needed. come in several checkpoints they each contain a part of each weight of the model, so we need to load them all in RAM). For the 75B model, it's thus 145GB of RAM needed.

View File

@@ -26,6 +26,17 @@ The abstract from the paper is the following:
Checkout all Llama2 models [here](https://huggingface.co/models?search=llama2) Checkout all Llama2 models [here](https://huggingface.co/models?search=llama2)
<Tip warning={true}>
The `Llama2` models were trained using `bfloat16`, but the original inference uses `float16. The checkpoints uploaded on the hub use `torch_dtype = 'float16'` which will be
used by the `AutoModel` API to cast the checkpoints from `torch.float32` to `torch.float16`.
The `dtype` of the online weights is mostly irrelevant, unless you are using `torch_dtype="auto"` when initializing a model using `model = AutoModelForCausalLM.from_pretrained("path", torch_dtype = "auto")`. The reason is that the model will first be downloaded ( using the `dtype` of the checkpoints online) then it will be casted to the default `dtype` of `torch` (becomes `torch.float32`) and finally, if there is a `torch_dtype` provided in the config, it will be used.
Training the model in `float16` is not recommended and known to produce `nan`, as such the model should be trained in `bfloat16`.
</Tip>
Tips: Tips:
- Weights for the Llama2 models can be obtained by filling out [this form](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) - Weights for the Llama2 models can be obtained by filling out [this form](https://ai.meta.com/resources/models-and-libraries/llama-downloads/)