Update bark doc (#25234)
* add mention to optimization in Bark docs * add offload mention in docs * Apply suggestions from code review Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Update bark docs. * Update bark.md --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
This commit is contained in:
@@ -26,8 +26,67 @@ Bark is made of 4 main models:
|
||||
|
||||
It should be noted that each of the first three modules can support conditional speaker embeddings to condition the output sound according to specific predefined voice.
|
||||
|
||||
### Optimizing Bark
|
||||
|
||||
### Tips:
|
||||
Bark can be optimized with just a few extra lines of code, which **significantly reduces its memory footprint** and **accelerates inference**.
|
||||
|
||||
#### Using half-precision
|
||||
|
||||
You can speed up inference and reduce memory footprint by 50% simply by loading the model in half-precision.
|
||||
|
||||
```python
|
||||
from transformers import BarkModel
|
||||
import torch
|
||||
|
||||
device = "cuda" if torch.cuda.is_available() else "cpu"
|
||||
model = BarkModel.from_pretrained("suno/bark-small", torch_dtype=torch.float16).to(device)
|
||||
```
|
||||
|
||||
#### Using 🤗 Better Transformer
|
||||
|
||||
Better Transformer is an 🤗 Optimum feature that performs kernel fusion under the hood. You can gain 20% to 30% in speed with zero performance degradation. It only requires one line of code to export the model to 🤗 Better Transformer:
|
||||
|
||||
```python
|
||||
model = model.to_bettertransformer()
|
||||
```
|
||||
|
||||
Note that 🤗 Optimum must be installed before using this feature. [Here's how to install it.](https://huggingface.co/docs/optimum/installation)
|
||||
|
||||
#### Using CPU offload
|
||||
|
||||
As mentioned above, Bark is made up of 4 sub-models, which are called up sequentially during audio generation. In other words, while one sub-model is in use, the other sub-models are idle.
|
||||
|
||||
If you're using a CUDA device, a simple solution to benefit from an 80% reduction in memory footprint is to offload the GPU's submodels when they're idle. This operation is called CPU offloading. You can use it with one line of code.
|
||||
|
||||
```python
|
||||
model.enable_cpu_offload()
|
||||
```
|
||||
|
||||
Note that 🤗 Accelerate must be installed before using this feature. [Here's how to install it.](https://huggingface.co/docs/accelerate/basic_tutorials/install)
|
||||
|
||||
#### Combining optimizaton techniques
|
||||
|
||||
You can combine optimization techniques, and use CPU offload, half-precision and 🤗 Better Transformer all at once.
|
||||
|
||||
```python
|
||||
from transformers import BarkModel
|
||||
import torch
|
||||
|
||||
device = "cuda" if torch.cuda.is_available() else "cpu"
|
||||
|
||||
# load in fp16
|
||||
model = BarkModel.from_pretrained("suno/bark-small", torch_dtype=torch.float16).to(device)
|
||||
|
||||
# convert to bettertransformer
|
||||
model = BetterTransformer.transform(model, keep_original_model=False)
|
||||
|
||||
# enable CPU offload
|
||||
model.enable_cpu_offload()
|
||||
```
|
||||
|
||||
Find out more on inference optimization techniques [here](https://huggingface.co/docs/transformers/perf_infer_gpu_one).
|
||||
|
||||
### Tips
|
||||
|
||||
Suno offers a library of voice presets in a number of languages [here](https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c).
|
||||
These presets are also uploaded in the hub [here](https://huggingface.co/suno/bark-small/tree/main/speaker_embeddings) or [here](https://huggingface.co/suno/bark/tree/main/speaker_embeddings).
|
||||
@@ -103,6 +162,7 @@ The original code can be found [here](https://github.com/suno-ai/bark).
|
||||
|
||||
[[autodoc]] BarkModel
|
||||
- generate
|
||||
- enable_cpu_offload
|
||||
|
||||
## BarkSemanticModel
|
||||
|
||||
|
||||
Reference in New Issue
Block a user