add doc for (#20525)

Younes Belkada
2022-12-01 16:52:13 +01:00
committed by GitHub
parent cdb7eeca46
commit 8b486c0310
3 changed files with 12 additions and 0 deletions

@@ -13,6 +13,10 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
This guide focuses on inferencing large models efficiently on CPU.
## `BetterTransformer` for faster inference
We have recently integrated `BetterTransformer` for faster inference on CPU for text, image and audio models. Check the documentation about this integration [here](https://huggingface.co/docs/optimum/bettertransformer/overview) for more details.
## PyTorch JIT-mode (TorchScript)
TorchScript is a way to create serializable and optimizable models from PyTorch code. Any TorchScript program can be saved from a Python process and loaded in a process where there is no Python dependency.
Compared to the default eager mode, JIT mode in PyTorch typically yields better inference performance through optimizations such as operator fusion.
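
As a minimal sketch of the TorchScript workflow described above, the snippet below traces a small module, serializes it, and loads it back. The `TinyClassifier` module and its sizes are hypothetical stand-ins for a real model, and an in-memory buffer is used in place of a file on disk.

```python
import io

import torch
from torch import nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for a real model (not part of the original guide)."""

    def __init__(self) -> None:
        super().__init__()
        self.linear = nn.Linear(8, 4)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.linear(x))


torch.manual_seed(0)
model = TinyClassifier().eval()
example_input = torch.randn(2, 8)

# Trace the model: run it once on an example input and record the operations.
with torch.no_grad():
    traced = torch.jit.trace(model, example_input)

# Serialize the traced program and load it back.
# A file path would work the same way as this in-memory buffer.
buffer = io.BytesIO()
torch.jit.save(traced, buffer)
buffer.seek(0)
restored = torch.jit.load(buffer)

# The restored TorchScript module should reproduce the eager-mode output.
with torch.no_grad():
    eager_out = model(example_input)
    jit_out = restored(example_input)

outputs_match = torch.allclose(eager_out, jit_out, atol=1e-6)
```

Tracing records only the operations executed for the example input, so models with data-dependent control flow may need `torch.jit.script` instead. Whether JIT mode is actually faster depends on the model and hardware, so measure it for your own workload.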

@@ -17,3 +17,7 @@ This document contains information on how to efficiently infer on a multiple GPU
Note: A multi-GPU setup can use most of the strategies described in the [single GPU section](./perf_infer_gpu_one). You should, however, be aware of a few simple techniques that make better use of such a setup.
</Tip>
## `BetterTransformer` for faster inference
We have recently integrated `BetterTransformer` for faster inference on multi-GPU for text, image and audio models. Check the documentation about this integration [here](https://huggingface.co/docs/optimum/bettertransformer/overview) for more details.

@@ -13,6 +13,10 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
This document will be completed soon with information on how to infer on a single GPU. In the meantime you can check out [the guide for training on a single GPU](perf_train_gpu_one) and [the guide for inference on CPUs](perf_infer_cpu).
## `BetterTransformer` for faster inference
We have recently integrated `BetterTransformer` for faster inference on GPU for text, image and audio models. Check the documentation about this integration [here](https://huggingface.co/docs/optimum/bettertransformer/overview) for more details.
## `bitsandbytes` integration for Int8 mixed-precision matrix decomposition
Note that this feature also works in a multi-GPU setup.