From 8b486c03106195e8f9703df696feb5743b776494 Mon Sep 17 00:00:00 2001
From: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Date: Thu, 1 Dec 2022 16:52:13 +0100
Subject: [PATCH] add doc for `BetterTransformer` (#20525)

---
 docs/source/en/perf_infer_cpu.mdx      | 4 ++++
 docs/source/en/perf_infer_gpu_many.mdx | 4 ++++
 docs/source/en/perf_infer_gpu_one.mdx  | 4 ++++
 3 files changed, 12 insertions(+)
diff --git a/docs/source/en/perf_infer_cpu.mdx b/docs/source/en/perf_infer_cpu.mdx
index faac08d6c1..a3df21e93a 100644
--- a/docs/source/en/perf_infer_cpu.mdx
+++ b/docs/source/en/perf_infer_cpu.mdx
@@ -13,6 +13,10 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 
 This guide focuses on inferencing large models efficiently on CPU.
 
+## `BetterTransformer` for faster inference
+
+We have recently integrated `BetterTransformer` to speed up CPU inference for text, image, and audio models. See the [`BetterTransformer` overview in the Optimum documentation](https://huggingface.co/docs/optimum/bettertransformer/overview) for more details.
+
 ## PyTorch JIT-mode (TorchScript)
 TorchScript is a way to create serializable and optimizable models from PyTorch code. Any TorchScript program can be saved from a Python process and loaded in a process where there is no Python dependency.
 Comparing to default eager mode, jit mode in PyTorch normally yields better performance for model inference from optimization methodologies like operator fusion.
diff --git a/docs/source/en/perf_infer_gpu_many.mdx b/docs/source/en/perf_infer_gpu_many.mdx
index b3331d1f12..d8a24d6ab8 100644
--- a/docs/source/en/perf_infer_gpu_many.mdx
+++ b/docs/source/en/perf_infer_gpu_many.mdx
@@ -17,3 +17,7 @@ This document contains information on how to efficiently infer on a multiple GPU
 Note: A multi GPU setup can use the majority of the strategies described in the [single GPU section](./perf_infer_gpu_one). You must be aware of simple techniques, though, that can be used for a better usage.
 
 </Tip>
+
+## `BetterTransformer` for faster inference
+
+We have recently integrated `BetterTransformer` to speed up inference in multi-GPU setups for text, image, and audio models. See the [`BetterTransformer` overview in the Optimum documentation](https://huggingface.co/docs/optimum/bettertransformer/overview) for more details.
diff --git a/docs/source/en/perf_infer_gpu_one.mdx b/docs/source/en/perf_infer_gpu_one.mdx
index d794e6c8ec..086e2ff487 100644
--- a/docs/source/en/perf_infer_gpu_one.mdx
+++ b/docs/source/en/perf_infer_gpu_one.mdx
@@ -13,6 +13,10 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 
 This document will be completed soon with information on how to infer on a single GPU. In the meantime you can check out [the guide for training on a single GPU](perf_train_gpu_one) and [the guide for inference on CPUs](perf_infer_cpu).
 
+## `BetterTransformer` for faster inference
+
+We have recently integrated `BetterTransformer` to speed up GPU inference for text, image, and audio models. See the [`BetterTransformer` overview in the Optimum documentation](https://huggingface.co/docs/optimum/bettertransformer/overview) for more details.
+
 ## `bitsandbytes` integration for Int8 mixed-precision matrix decomposition
 
 Note that this feature is also totally applicable in a multi GPU setup as well.