add doc for (#20525)

2022-12-01 16:52:13 +01:00
parent cdb7eeca46
commit 8b486c0310
3 changed files with 12 additions and 0 deletions
--- a/docs/source/en/perf_infer_cpu.mdx
+++ b/docs/source/en/perf_infer_cpu.mdx
@@ -13,6 +13,10 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 This guide focuses on inferencing large models efficiently on CPU.
 ## `BetterTransformer` for faster inference
 We have recently integrated `BetterTransformer` for faster inference on CPU for text, image and audio models. Check the documentation about this integration [here](https://huggingface.co/docs/optimum/bettertransformer/overview) for more details.
 ## PyTorch JIT-mode (TorchScript)
 TorchScript is a way to create serializable and optimizable models from PyTorch code. Any TorchScript program can be saved from a Python process and loaded in a process where there is no Python dependency.
 Comparing to default eager mode, jit mode in PyTorch normally yields better performance for model inference from optimization methodologies like operator fusion.
--- a/docs/source/en/perf_infer_gpu_many.mdx
+++ b/docs/source/en/perf_infer_gpu_many.mdx
@@ -17,3 +17,7 @@ This document contains information on how to efficiently infer on a multiple GPU
 Note: A multi GPU setup can use the majority of the strategies described in the [single GPU section](./perf_infer_gpu_one). You must be aware of simple techniques, though, that can be used for a better usage.
 </Tip>
 ## `BetterTransformer` for faster inference
 We have recently integrated `BetterTransformer` for faster inference on multi-GPU for text, image and audio models. Check the documentation about this integration [here](https://huggingface.co/docs/optimum/bettertransformer/overview) for more details.
--- a/docs/source/en/perf_infer_gpu_one.mdx
+++ b/docs/source/en/perf_infer_gpu_one.mdx
@@ -13,6 +13,10 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 This document will be completed soon with information on how to infer on a single GPU. In the meantime you can check out [the guide for training on a single GPU](perf_train_gpu_one) and [the guide for inference on CPUs](perf_infer_cpu).
 ## `BetterTransformer` for faster inference
 We have recently integrated `BetterTransformer` for faster inference on GPU for text, image and audio models. Check the documentation about this integration [here](https://huggingface.co/docs/optimum/bettertransformer/overview) for more details.
 ## `bitsandbytes` integration for Int8 mixed-precision matrix decomposition
 Note that this feature is also totally applicable in a multi GPU setup as well.