From 8b486c03106195e8f9703df696feb5743b776494 Mon Sep 17 00:00:00 2001
From: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Date: Thu, 1 Dec 2022 16:52:13 +0100
Subject: [PATCH] add doc for (#20525)

---
 docs/source/en/perf_infer_cpu.mdx      | 4 ++++
 docs/source/en/perf_infer_gpu_many.mdx | 4 ++++
 docs/source/en/perf_infer_gpu_one.mdx  | 4 ++++
 3 files changed, 12 insertions(+)

diff --git a/docs/source/en/perf_infer_cpu.mdx b/docs/source/en/perf_infer_cpu.mdx
index faac08d6c1..a3df21e93a 100644
--- a/docs/source/en/perf_infer_cpu.mdx
+++ b/docs/source/en/perf_infer_cpu.mdx
@@ -13,6 +13,10 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o

 This guide focuses on inferencing large models efficiently on CPU.

+## `BetterTransformer` for faster inference
+
+We have recently integrated `BetterTransformer` for faster inference on CPU for text, image and audio models. Check the documentation about this integration [here](https://huggingface.co/docs/optimum/bettertransformer/overview) for more details.
+
 ## PyTorch JIT-mode (TorchScript)

 TorchScript is a way to create serializable and optimizable models from PyTorch code. Any TorchScript program can be saved from a Python process and loaded in a process where there is no Python dependency. Comparing to default eager mode, jit mode in PyTorch normally yields better performance for model inference from optimization methodologies like operator fusion.
diff --git a/docs/source/en/perf_infer_gpu_many.mdx b/docs/source/en/perf_infer_gpu_many.mdx
index b3331d1f12..d8a24d6ab8 100644
--- a/docs/source/en/perf_infer_gpu_many.mdx
+++ b/docs/source/en/perf_infer_gpu_many.mdx
@@ -17,3 +17,7 @@ This document contains information on how to efficiently infer on a multiple GPU
 Note: A multi GPU setup can use the majority of the strategies described in the [single GPU section](./perf_infer_gpu_one). You must be aware of simple techniques, though, that can be used for a better usage.

+
+## `BetterTransformer` for faster inference
+
+We have recently integrated `BetterTransformer` for faster inference on multi-GPU for text, image and audio models. Check the documentation about this integration [here](https://huggingface.co/docs/optimum/bettertransformer/overview) for more details.
diff --git a/docs/source/en/perf_infer_gpu_one.mdx b/docs/source/en/perf_infer_gpu_one.mdx
index d794e6c8ec..086e2ff487 100644
--- a/docs/source/en/perf_infer_gpu_one.mdx
+++ b/docs/source/en/perf_infer_gpu_one.mdx
@@ -13,6 +13,10 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o

 This document will be completed soon with information on how to infer on a single GPU. In the meantime you can check out [the guide for training on a single GPU](perf_train_gpu_one) and [the guide for inference on CPUs](perf_infer_cpu).

+## `BetterTransformer` for faster inference
+
+We have recently integrated `BetterTransformer` for faster inference on GPU for text, image and audio models. Check the documentation about this integration [here](https://huggingface.co/docs/optimum/bettertransformer/overview) for more details.
+
 ## `bitsandbytes` integration for Int8 mixed-precision matrix decomposition

 Note that this feature is also totally applicable in a multi GPU setup as well.