From 8455346c5c3a1accd86ea0497b0e7d9c16de174b Mon Sep 17 00:00:00 2001
From: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
Date: Thu, 3 Aug 2023 15:08:39 +0200
Subject: [PATCH] Update bark doc (#25234)

* add mention to optimization in Bark docs

* add offload mention in docs

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update bark docs.

* Update bark.md

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
---
 docs/source/en/model_doc/bark.md | 62 +++++++++++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/docs/source/en/model_doc/bark.md b/docs/source/en/model_doc/bark.md
index 4126f10c3e..63edb38dc9 100644
--- a/docs/source/en/model_doc/bark.md
+++ b/docs/source/en/model_doc/bark.md
@@ -26,8 +26,67 @@ Bark is made of 4 main models:
 
 It should be noted that each of the first three modules can support conditional speaker embeddings to condition the output sound according to specific predefined voice.
 
+### Optimizing Bark
 
-### Tips:
+Bark can be optimized with just a few extra lines of code, which **significantly reduces its memory footprint** and **accelerates inference**.
+
+#### Using half-precision
+
+You can speed up inference and reduce memory footprint by 50% simply by loading the model in half-precision.
+
+```python
+from transformers import BarkModel
+import torch
+
+device = "cuda" if torch.cuda.is_available() else "cpu"
+model = BarkModel.from_pretrained("suno/bark-small", torch_dtype=torch.float16).to(device)
+```
+
+#### Using 🤗 Better Transformer
+
+Better Transformer is a 🤗 Optimum feature that performs kernel fusion under the hood. You can gain 20% to 30% in speed with zero performance degradation. It only requires one line of code to export the model to 🤗 Better Transformer:
+
+```python
+model = model.to_bettertransformer()
+```
+
+Note that 🤗 Optimum must be installed before using this feature. [Here's how to install it.](https://huggingface.co/docs/optimum/installation)
+
+#### Using CPU offload
+
+As mentioned above, Bark is made up of 4 sub-models, which are called sequentially during audio generation. In other words, while one sub-model is in use, the other sub-models are idle.
+
+If you're using a CUDA device, a simple way to get an 80% reduction in memory footprint is to move the sub-models off the GPU while they're idle. This operation is called CPU offloading. You can use it with one line of code.
+
+```python
+model.enable_cpu_offload()
+```
+
+Note that 🤗 Accelerate must be installed before using this feature. [Here's how to install it.](https://huggingface.co/docs/accelerate/basic_tutorials/install)
+
+#### Combining optimization techniques
+
+You can combine optimization techniques, and use CPU offload, half-precision and 🤗 Better Transformer all at once.
+
+```python
+from transformers import BarkModel
+import torch
+
+device = "cuda" if torch.cuda.is_available() else "cpu"
+
+# load in fp16
+model = BarkModel.from_pretrained("suno/bark-small", torch_dtype=torch.float16).to(device)
+
+# convert to bettertransformer
+model = model.to_bettertransformer()
+
+# enable CPU offload
+model.enable_cpu_offload()
+```
+
+Find out more about inference optimization techniques [here](https://huggingface.co/docs/transformers/perf_infer_gpu_one).
+
+### Tips
 
 Suno offers a library of voice presets in a number of languages [here](https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c). These presets are also uploaded in the hub [here](https://huggingface.co/suno/bark-small/tree/main/speaker_embeddings) or [here](https://huggingface.co/suno/bark/tree/main/speaker_embeddings).
 
@@ -103,6 +162,7 @@ The original code can be found [here](https://github.com/suno-ai/bark).
 
 [[autodoc]] BarkModel
     - generate
+    - enable_cpu_offload
 
 ## BarkSemanticModel
 
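
Review note: the 50% figure quoted for half-precision in the patch above follows from the storage size per parameter (4 bytes in float32, 2 bytes in float16). A minimal sketch of that arithmetic is below. It covers weights only, and both the helper `weight_memory_mib` and the parameter count are hypothetical, chosen for illustration rather than taken from any Bark checkpoint.

```python
# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_mib(num_params: int, dtype: str) -> float:
    """Return the memory needed for the weights alone, in MiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**20


# Hypothetical parameter count, used only for illustration;
# it is not the real size of any Bark checkpoint.
num_params = 100_000_000

fp32 = weight_memory_mib(num_params, "float32")
fp16 = weight_memory_mib(num_params, "float16")
saving = 1 - fp16 / fp32

print(f"float32: {fp32:.1f} MiB, float16: {fp16:.1f} MiB, saving: {saving:.0%}")
```

Activations and framework overhead are not modelled here, so the saving observed on a real device may differ from this weights-only estimate.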