add doc for (#20525)
@@ -13,6 +13,10 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
This guide focuses on running inference on large models efficiently on CPU.

## `BetterTransformer` for faster inference

We have recently integrated `BetterTransformer` for faster inference on CPU for text, image, and audio models. See the [documentation for this integration](https://huggingface.co/docs/optimum/bettertransformer/overview) for more details.

## PyTorch JIT-mode (TorchScript)

TorchScript is a way to create serializable and optimizable models from PyTorch code. Any TorchScript program can be saved from a Python process and loaded in a process where there is no Python dependency.

Compared to the default eager mode, JIT mode in PyTorch typically yields better inference performance thanks to optimizations such as operator fusion.
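The save-and-reload workflow described above can be sketched as follows. This is a minimal illustration rather than the guide's own recipe: `TinyClassifier`, its layer sizes, and the in-memory buffer are hypothetical stand-ins for a real model and a file on disk. The sketch only checks that the traced module reproduces the eager outputs; it does not measure speed.

```python
import io

import torch
from torch import nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for a real model (not part of the original guide)."""

    def __init__(self, hidden_size: int = 16, num_labels: int = 2):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, num_labels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.out(torch.tanh(self.dense(x)))


torch.manual_seed(0)
model = TinyClassifier().eval()
example_input = torch.randn(4, 16)

# Trace the eager model with an example input to obtain a TorchScript module.
with torch.no_grad():
    traced = torch.jit.trace(model, example_input)

# Serialize to an in-memory buffer (a file path works the same way) and reload.
buffer = io.BytesIO()
torch.jit.save(traced, buffer)
buffer.seek(0)
reloaded = torch.jit.load(buffer)

# Compare the eager output with the output of the reloaded TorchScript module.
with torch.no_grad():
    eager_out = model(example_input)
    jit_out = reloaded(example_input)

outputs_match = torch.allclose(eager_out, jit_out, atol=1e-6)
print(outputs_match)
```

Tracing records the operations executed for the example input, so a model with data-dependent control flow may need `torch.jit.script` instead.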
@@ -17,3 +17,7 @@ This document contains information on how to efficiently infer on a multiple GPU
Note: A multi-GPU setup can use most of the strategies described in the [single GPU section](./perf_infer_gpu_one). You should, however, be aware of a few simple techniques that make better use of multiple GPUs.

</Tip>

## `BetterTransformer` for faster inference

We have recently integrated `BetterTransformer` for faster inference on multi-GPU for text, image, and audio models. See the [documentation for this integration](https://huggingface.co/docs/optimum/bettertransformer/overview) for more details.
@@ -13,6 +13,10 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
This document will be completed soon with information on how to run inference on a single GPU. In the meantime, you can check out [the guide for training on a single GPU](perf_train_gpu_one) and [the guide for inference on CPUs](perf_infer_cpu).

## `BetterTransformer` for faster inference

We have recently integrated `BetterTransformer` for faster inference on GPU for text, image, and audio models. See the [documentation for this integration](https://huggingface.co/docs/optimum/bettertransformer/overview) for more details.

## `bitsandbytes` integration for Int8 mixed-precision matrix decomposition

Note that this feature also works in a multi-GPU setup.