From 464aa74659b9711d2d64159ab82e1c49fa739fb7 Mon Sep 17 00:00:00 2001 From: Steven Liu <59462357+stevhliu@users.noreply.github.com> Date: Thu, 27 Jun 2024 10:32:51 -0700 Subject: [PATCH] [docs] Llama3 (#31662) quick usage to top --- docs/source/en/model_doc/llama3.md | 24 ++++++++++-------------- 1 file changed, 10 insertions(+), 14 deletions(-) diff --git a/docs/source/en/model_doc/llama3.md b/docs/source/en/model_doc/llama3.md index 067d2e9ba9..996be7f5c1 100644 --- a/docs/source/en/model_doc/llama3.md +++ b/docs/source/en/model_doc/llama3.md @@ -16,6 +16,15 @@ rendered properly in your Markdown viewer. # Llama3 +```py3 +import transformers +import torch + +model_id = "meta-llama/Meta-Llama-3-8B" + +pipeline = transformers.pipeline("text-generation", model=model_id, model_kwargs={"torch_dtype": torch.bfloat16}, device_map="auto") +pipeline("Hey how are you doing today?") +``` ## Overview @@ -66,20 +75,7 @@ model = AutoModelForCausalLM.from_pretrained("/output/path") Note that executing the script requires enough CPU RAM to host the whole model in float16 precision (even if the biggest versions come in several checkpoints they each contain a part of each weight of the model, so we need to load them all in RAM). For the 75B model, it's thus 145GB of RAM needed. - - When using Flash Attention 2 via `attn_implementation="flash_attention_2"`, don't pass `torch_dtype` to the `from_pretrained` class method and use Automatic Mixed-Precision training. When using `Trainer`, it is simply specifying either `fp16` or `bf16` to `True`. Otherwise, make sure you are using `torch.autocast`. This is required because the Flash Attention only support `fp16` and `bf16` data type. -## Quick usage - -```py3 -import transformers -import torch - -model_id = "meta-llama/Meta-Llama-3-8B" - -pipeline = transformers.pipeline("text-generation", model=model_id, model_kwargs={"torch_dtype": torch.bfloat16}, device_map="auto") -pipeline("Hey how are you doing today?") -``` - ## Resources -A ton of cool resources are already available on the documentation page of [~llama2], inviting contributors to add new resources curated for Llama3 here! 🤗 +A ton of cool resources are already available on the documentation page of [Llama2](./llama2), inviting contributors to add new resources curated for Llama3 here! 🤗