Remove old benchmark code (#35730)

* remove traces of the old deprecated benchmarks
* also remove old tf benchmark example, which uses deleted code
* run doc builder
2025-01-21 17:56:43 +00:00
parent 870eb7b41b
commit 90b46e983f
31 changed files with 4 additions and 4224 deletions
--- a/docs/source/ar/_toctree.yml
+++ b/docs/source/ar/_toctree.yml
@@ -110,7 +110,7 @@
  title: أدلة المهام
 - sections:
  - local: fast_tokenizers
-    title: استخدم مجزئيات النصوص السريعة من 🤗 Tokenizers 
+    title: استخدم مجزئيات النصوص السريعة من 🤗 Tokenizers
  - local: multilingual
    title: الاستدلال باستخدام نماذج متعددة اللغات
  - local: create_a_model
@@ -129,8 +129,6 @@
    title: التصدير إلى TFLite
  - local: torchscript
    title: التصدير إلى TorchScript
-  - local: benchmarks
-    title: المعايير
  - local: notebooks
    title: دفاتر الملاحظات مع الأمثلة
  - local: community
@@ -883,7 +881,7 @@
 #     - local: internal/pipelines_utils
 #       title: مرافق خطوط الأنابيب
 #     - local: internal/tokenization_utils
-#       title: مرافق مقسم النصوص 
+#       title: مرافق مقسم النصوص
 #     - local: internal/trainer_utils
 #       title: مرافق المدرب
 #     - local: internal/generation_utils
--- a/docs/source/ar/benchmarks.md
+++ b/docs/source/ar/benchmarks.md
@@ -1,352 +0,0 @@
-# معايير الأداء
-<Tip warning={true}>
-
-أدوات قياس الأداء من Hugging Face أصبحت قديمة،ويُنصح باستخدام مكتبات خارجية لقياس سرعة وتعقيد الذاكرة لنماذج Transformer.
-
-</Tip>
-
-[[open-in-colab]]
-
-لنلق نظرة على كيفية تقييم أداء نماذج 🤗 Transformers، وأفضل الممارسات، ومعايير الأداء المتاحة بالفعل.
-
-يُمكن العثور على دفتر ملاحظات يشرح بالتفصيل كيفية قياس أداء نماذج 🤗 Transformers [هنا](https://github.com/huggingface/notebooks/tree/main/examples/benchmark.ipynb).
-
-## كيفية قياس أداء نماذج 🤗 Transformers
-
-تسمح الفئتان [`PyTorchBenchmark`] و [`TensorFlowBenchmark`] بتقييم أداء نماذج 🤗 Transformers بمرونة. تتيح لنا فئات التقييم قياس الأداء قياس _الاستخدام الأقصى للذاكرة_ و _الوقت اللازم_ لكل من _الاستدلال_ و _التدريب_.
-
-<Tip>
-
-هنا، ييُعرَّف _الاستدلال_ بأنه تمريرة أمامية واحدة، ويتم تعريف _التدريب_ بأنه تمريرة أمامية واحدة وتمريرة خلفية واحدة.
-
-</Tip>
-
-تتوقع فئات تقييم الأداء [`PyTorchBenchmark`] و [`TensorFlowBenchmark`] كائنًا من النوع [`PyTorchBenchmarkArguments`] و [`TensorFlowBenchmarkArguments`]، على التوالي، للتنفيذ. [`PyTorchBenchmarkArguments`] و [`TensorFlowBenchmarkArguments`] هي فئات بيانات وتحتوي على جميع التكوينات ذات الصلة لفئة تقييم الأداء المقابلة. في المثال التالي، يتم توضيح كيفية تقييم أداء نموذج BERT من النوع _bert-base-cased_.
-
-<frameworkcontent>
-<pt>
-  
-```py
->>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments
-
->>> args = PyTorchBenchmarkArguments(models=["google-bert/bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
->>> benchmark = PyTorchBenchmark(args)
-```
-</pt>
-<tf>
-  
-```py
->>> from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments
-
->>> args = TensorFlowBenchmarkArguments(
-...     models=["google-bert/bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512]
-... )
->>> benchmark = TensorFlowBenchmark(args)
-```
-</tf>
-</frameworkcontent>
-
-هنا، يتم تمرير ثلاثة معامﻻت إلى فئات بيانات حجة قياس الأداء، وهي `models` و `batch_sizes` و `sequence_lengths`. المعامل `models` مطلوبة وتتوقع `قائمة` من بمعرّفات النموذج من [مركز النماذج](https://huggingface.co/models) تحدد معامﻻت القائمة `batch_sizes` و `sequence_lengths` حجم `input_ids` الذي يتم قياس أداء النموذج عليه. هناك العديد من المعلمات الأخرى التي يمكن تكوينها عبر فئات بيانات معال قياس الأداء. لمزيد من التفاصيل حول هذه المعلمات، يمكنك إما الرجوع مباشرة إلى الملفات `src/transformers/benchmark/benchmark_args_utils.py`، `src/transformers/benchmark/benchmark_args.py` (لـ PyTorch) و `src/transformers/benchmark/benchmark_args_tf.py` (لـ Tensorflow). أو، بدلاً من ذلك، قم بتشغيل أوامر shell التالية من المجلد الرئيسي لطباعة قائمة وصفية بجميع المعلمات القابلة للتكوين لـ PyTorch و Tensorflow على التوالي.
-
-<frameworkcontent>
-<pt>
-  
-```bash
-python examples/pytorch/benchmarking/run_benchmark.py --help
-```
-
-يُمكن ببساطة تشغيل كائن التقييم الذي تم تهيئته عن طريق استدعاء `benchmark.run()`.
-
-```py
->>> results = benchmark.run()
->>> print(results)
-====================       INFERENCE - SPEED - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length     Time in s                  
--------------------------------------------------------------------------------
-google-bert/bert-base-uncased          8               8             0.006     
-google-bert/bert-base-uncased          8               32            0.006     
-google-bert/bert-base-uncased          8              128            0.018     
-google-bert/bert-base-uncased          8              512            0.088     
--------------------------------------------------------------------------------
-
-====================      INFERENCE - MEMORY - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length    Memory in MB 
--------------------------------------------------------------------------------
-google-bert/bert-base-uncased          8               8             1227
-google-bert/bert-base-uncased          8               32            1281
-google-bert/bert-base-uncased          8              128            1307
-google-bert/bert-base-uncased          8              512            1539
--------------------------------------------------------------------------------
-
-====================        ENVIRONMENT INFORMATION         ====================
-
- transformers_version: 2.11.0
- framework: PyTorch
- use_torchscript: False
- framework_version: 1.4.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 08:58:43.371351
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
-```
-</pt>
-<tf>
-  
-```bash
-python examples/tensorflow/benchmarking/run_benchmark_tf.py --help
-```
-
-يُمكن بعد ذلك تشغيل كائن قياس الأداء الذي تم تهيئته عن طريق استدعاء `benchmark.run()`.
-
-```py
->>> results = benchmark.run()
->>> print(results)
->>> results = benchmark.run()
->>> print(results)
-====================       INFERENCE - SPEED - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length     Time in s                  
--------------------------------------------------------------------------------
-google-bert/bert-base-uncased          8               8             0.005
-google-bert/bert-base-uncased          8               32            0.008
-google-bert/bert-base-uncased          8              128            0.022
-google-bert/bert-base-uncased          8              512            0.105
--------------------------------------------------------------------------------
-
-====================      INFERENCE - MEMORY - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length    Memory in MB 
--------------------------------------------------------------------------------
-google-bert/bert-base-uncased          8               8             1330
-google-bert/bert-base-uncased          8               32            1330
-google-bert/bert-base-uncased          8              128            1330
-google-bert/bert-base-uncased          8              512            1770
--------------------------------------------------------------------------------
-
-====================        ENVIRONMENT INFORMATION         ====================
-
- transformers_version: 202.11.0
- framework: Tensorflow
- use_xla: False
- framework_version: 2.2.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 09:26:35.617317
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
-```
-</tf>
-</frameworkcontent>
-
-بشكل افتراضي، يتم تقييم _الوقت_ و _الذاكرة المطلوبة_ لـ _الاستدلال_. في مثال المخرجات أعلاه، يُظهر القسمان الأولان النتيجة المقابلة لـ _وقت الاستدلال_ و _ذاكرة الاستدلال_. بالإضافة إلى ذلك، يتم طباعة جميع المعلومات ذات الصلة حول بيئة الحوسبة، على سبيل المثال نوع وحدة معالجة الرسومات (GPU)، والنظام، وإصدارات المكتبة، وما إلى ذلك، في القسم الثالث تحت _معلومات البيئة_. يمكن حفظ هذه المعلومات بشكل اختياري في ملف _.csv_ عند إضافة المعامل `save_to_csv=True` إلى [`PyTorchBenchmarkArguments`] و [`TensorFlowBenchmarkArguments`] على التوالي. في هذه الحالة، يتم حفظ كل قسم في ملف _.csv_ منفصل. يمكن اختيارًا تحديد مسار كل ملف _.csv_ عبر فئات بيانات معامل قياس الأداء.
-
-بدلاً من تقييم النماذج المدربة مسبقًا عبر معرّف النموذج، على سبيل المثال `google-bert/bert-base-uncased`، يُمكن للمستخدم بدلاً من ذلك قياس أداء تكوين عشوائي لأي فئة نموذج متاحة. في هذه الحالة، يجب إدراج "قائمة" من التكوينات مع معامل قياس الأداء كما هو موضح أدناه.
-
-<frameworkcontent>
-<pt>
-  
-```py
->>> from transformers import PyTorchBenchmark، PyTorchBenchmarkArguments، BertConfig
-
->>> args = PyTorchBenchmarkArguments(
-...     models=["bert-base"، "bert-384-hid"، "bert-6-lay"]، batch_sizes=[8]، sequence_lengths=[8، 32، 128، 512]
-... )
->>> config_base = BertConfig()
->>> config_384_hid = BertConfig(hidden_size=384)
->>> config_6_lay = BertConfig(num_hidden_layers=6)
-
->>> benchmark = PyTorchBenchmark(args، configs=[config_base، config_384_hid، config_6_lay])
->>> benchmark.run()
-====================       INFERENCE - SPEED - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length       Time in s                  
--------------------------------------------------------------------------------
-bert-base                  8              128            0.006
-bert-base                  8              512            0.006
-bert-base                  8              128            0.018     
-bert-base                  8              512            0.088     
-bert-384-hid              8               8             0.006     
-bert-384-hid              8               32            0.006     
-bert-384-hid              8              128            0.011     
-bert-384-hid              8              512            0.054     
-bert-6-lay                 8               8             0.003     
-bert-6-lay                 8               32            0.004     
-bert-6-lay                 8              128            0.009     
-bert-6-lay                 8              512            0.044
--------------------------------------------------------------------------------
-
-====================      INFERENCE - MEMORY - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length      Memory in MB
-## نتائج اختبار الأداء
-
-في هذا القسم، يتم قياس _وقت الاستدلال_ و _الذاكرة المطلوبة_ للاستدلال، لمختلف تكوينات `BertModel`. يتم عرض النتائج في جدول، مع تنسيق مختلف قليلاً لكل من PyTorch و TensorFlow.
-
--------------------------------------------------------------------------------
-| اسم النموذج | حجم الدفعة | طول التسلسل | الذاكرة بالميغابايت |
--------------------------------------------------------------------------------
-| bert-base | 8 | 8 | 1277 |
-| bert-base | 8 | 32 | 1281 |
-| bert-base | 8 | 128 | 1307 |
-| bert-base | 8 | 512 | 1539 |
-| bert-384-hid | 8 | 8 | 1005 |
-| bert-384-hid | 8 | 32 | 1027 |
-| bert-384-hid | 8 | 128 | 1035 |
-| bert-384-hid | 8 | 512 | 1255 |
-| bert-6-lay | 8 | 8 | 1097 |
-| bert-6-lay | 8 | 32 | 1101 |
-| bert-6-lay | 8 | 128 | 1127 |
-| bert-6-lay | 8 | 512 | 1359 |
--------------------------------------------------------------------------------
-
-==================== معلومات البيئة ====================
-
- transformers_version: 2.11.0
- framework: PyTorch
- use_torchscript: False
- framework_version: 1.4.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 09:35:25.143267
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
-```
-</pt>
-<tf>
-  
-```py
->>> from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments, BertConfig
-
->>> args = TensorFlowBenchmarkArguments(
-...     models=["bert-base", "bert-384-hid", "bert-6-lay"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512]
-... )
->>> config_base = BertConfig()
->>> config_384_hid = BertConfig(hidden_size=384)
->>> config_6_lay = BertConfig(num_hidden_layers=6)
-
->>> benchmark = TensorFlowBenchmark(args, configs=[config_base, config_384_hid, config_6_lay])
->>> benchmark.run()
-==================== نتائج السرعة في الاستدلال ====================
--------------------------------------------------------------------------------
-| اسم النموذج | حجم الدفعة | طول التسلسل | الوقت بالثانية |
--------------------------------------------------------------------------------
-| bert-base | 8 | 8 | 0.005 |
-| bert-base | 8 | 32 | 0.008 |
-| bert-base | 8 | 128 | 0.022 |
-| bert-base | 8 | 512 | 0.106 |
-| bert-384-hid | 8 | 8 | 0.005 |
-| bert-384-hid | 8 | 32 | 0.007 |
-| bert-384-hid | 8 | 128 | 0.018 |
-| bert-384-hid | 8 | 512 | 0.064 |
-| bert-6-lay | 8 | 8 | 0.002 |
-| bert-6-lay | 8 | 32 | 0.003 |
-| bert-6-lay | 8 | 128 | 0.0011 |
-| bert-6-lay | 8 | 512 | 0.074 |
--------------------------------------------------------------------------------
-
-==================== نتائج الذاكرة في الاستدلال ====================
--------------------------------------------------------------------------------
-| اسم النموذج | حجم الدفعة | طول التسلسل | الذاكرة بالميغابايت |
--------------------------------------------------------------------------------
-| اسم النموذج | حجم الدفعة | طول التسلسل | الذاكرة بالميغابايت |
--------------------------------------------------------------------------------
-| bert-base | 8 | 8 | 1330 |
-| bert-base | 8 | 32 | 1330 |
-| bert-base | 8 | 128 | 1330 |
-| bert-base | 8 | 512 | 1770 |
-| bert-384-hid | 8 | 8 | 1330 |
-| bert-384-hid | 8 | 32 | 1330 |
-| bert-384-hid | 8 | 128 | 1330 |
-| bert-384-hid | 8 | 512 | 1540 |
-| bert-6-lay | 8 | 8 | 1330 |
-| bert-6-lay | 8 | 32 | 1330 |
-| bert-6-lay | 8 | 128 | 1330 |
-| bert-6-lay | 8 | 512 | 1540 |
--------------------------------------------------------------------------------
-
-==================== معلومات البيئة ====================
-
- transformers_version: 2.11.0
- framework: Tensorflow
- use_xla: False
- framework_version: 2.2.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 09:38:15.487125
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
-```
-</tf>
-</frameworkcontent>
-
-مرة أخرى، يتم قياس _وقت الاستدلال_ و _الذاكرة المطلوبة_ للاستدلال، ولكن هذه المرة لتكوينات مخصصة لـ `BertModel`. يمكن أن تكون هذه الميزة مفيدة بشكل خاص عند اتخاذ قرار بشأن التكوين الذي يجب تدريب النموذج عليه.
-
-## أفضل الممارسات في اختبار الأداء
-
-يسرد هذا القسم بعض أفضل الممارسات التي يجب مراعاتها عند إجراء اختبار الأداء لنموذج ما.
-
- حالياً، يتم دعم اختبار الأداء على جهاز واحد فقط. عند إجراء الاختبار على وحدة معالجة الرسوميات (GPU)، يوصى بأن يقوم المستخدم بتحديد الجهاز الذي يجب تشغيل التعليمات البرمجية عليه من خلال تعيين متغير البيئة `CUDA_VISIBLE_DEVICES` في الشل، على سبيل المثال `export CUDA_VISIBLE_DEVICES=0` قبل تشغيل التعليمات البرمجية.
- يجب تعيين الخيار `no_multi_processing` إلى `True` فقط لأغراض الاختبار والتصحيح. ولضمان قياس الذاكرة بدقة، يوصى بتشغيل كل اختبار ذاكرة في عملية منفصلة والتأكد من تعيين `no_multi_processing` إلى `True`.
- يجب دائمًا ذكر معلومات البيئة عند مشاركة نتائج تقييم النموذج. يُمكن أن تختلف النتائج اختلافًا كبيرًا بين أجهزة GPU المختلفة وإصدارات المكتبات، وما إلى ذلك، لذلك فإن نتائج الاختبار بمفردها ليست مفيدة جدًا للمجتمع.
-
-## مشاركة نتائج اختبار الأداء الخاص بك
-
-في السابق، تم إجراء اختبار الأداء لجميع النماذج الأساسية المتاحة (10 في ذلك الوقت) لقياس _وقت الاستدلال_، عبر العديد من الإعدادات المختلفة: باستخدام PyTorch، مع TorchScript وبدونها، باستخدام TensorFlow، مع XLA وبدونه. تم إجراء جميع هذه الاختبارات على وحدات المعالجة المركزية (CPU) (باستثناء XLA TensorFlow) ووحدات معالجة الرسوميات (GPU).
-
-يتم شرح هذا النهج بالتفصيل في [منشور المدونة هذا](https://medium.com/huggingface/benchmarking-transformers-pytorch-and-tensorflow-e2917fb891c2) وتتوفر النتائج [هنا](https://docs.google.com/spreadsheets/d/1sryqufw2D0XlUH4sq3e9Wnxu5EAQkaohzrJbd5HdQ_w/edit?usp=sharing).
-
-مع أدوات اختبار الأداء الجديدة، أصبح من الأسهل من أي وقت مضى مشاركة نتائج اختبار الأداء الخاص بك مع المجتمع:
-
- [نتائج اختبار الأداء في PyTorch](https://github.com/huggingface/transformers/tree/main/examples/pytorch/benchmarking/README.md).
- [نتائج اختبار الأداء في TensorFlow](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/benchmarking/README.md).
--- a/docs/source/en/_toctree.yml
+++ b/docs/source/en/_toctree.yml
@@ -139,8 +139,6 @@
    title: Export to TFLite
  - local: torchscript
    title: Export to TorchScript
-  - local: benchmarks
-    title: Benchmarks
  - local: notebooks
    title: Notebooks with examples
  - local: community
--- a/docs/source/en/benchmarks.md
+++ b/docs/source/en/benchmarks.md
@@ -1,387 +0,0 @@
-<!--Copyright 2020 The HuggingFace Team. All rights reserved.
-
-Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
-the License. You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
-an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
-specific language governing permissions and limitations under the License.
-
-⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
-rendered properly in your Markdown viewer.
-
-->
-
-# Benchmarks
-
-<Tip warning={true}>
-
-Hugging Face's Benchmarking tools are deprecated and it is advised to use external Benchmarking libraries to measure the speed 
-and memory complexity of Transformer models.
-
-</Tip>
-
-[[open-in-colab]]
-
-Let's take a look at how 🤗 Transformers models can be benchmarked, best practices, and already available benchmarks.
-
-A notebook explaining in more detail how to benchmark 🤗 Transformers models can be found [here](https://github.com/huggingface/notebooks/tree/main/examples/benchmark.ipynb).
-
-## How to benchmark 🤗 Transformers models
-
-The classes [`PyTorchBenchmark`] and [`TensorFlowBenchmark`] allow to flexibly benchmark 🤗 Transformers models. The benchmark classes allow us to measure the _peak memory usage_ and _required time_ for both _inference_ and _training_.
-
-<Tip>
-
-Here, _inference_ is defined by a single forward pass, and _training_ is defined by a single forward pass and
-backward pass.
-
-</Tip>
-
-The benchmark classes [`PyTorchBenchmark`] and [`TensorFlowBenchmark`] expect an object of type [`PyTorchBenchmarkArguments`] and
-[`TensorFlowBenchmarkArguments`], respectively, for instantiation. [`PyTorchBenchmarkArguments`] and [`TensorFlowBenchmarkArguments`] are data classes and contain all relevant configurations for their corresponding benchmark class. In the following example, it is shown how a BERT model of type _bert-base-cased_ can be benchmarked.
-
-<frameworkcontent>
-<pt>
-```py
->>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments
-
->>> args = PyTorchBenchmarkArguments(models=["google-bert/bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
->>> benchmark = PyTorchBenchmark(args)
-```
-</pt>
-<tf>
-```py
->>> from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments
-
->>> args = TensorFlowBenchmarkArguments(
-...     models=["google-bert/bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512]
-... )
->>> benchmark = TensorFlowBenchmark(args)
-```
-</tf>
-</frameworkcontent>
-
-Here, three arguments are given to the benchmark argument data classes, namely `models`, `batch_sizes`, and
-`sequence_lengths`. The argument `models` is required and expects a `list` of model identifiers from the
-[model hub](https://huggingface.co/models) The `list` arguments `batch_sizes` and `sequence_lengths` define
-the size of the `input_ids` on which the model is benchmarked. There are many more parameters that can be configured
-via the benchmark argument data classes. For more detail on these one can either directly consult the files
-`src/transformers/benchmark/benchmark_args_utils.py`, `src/transformers/benchmark/benchmark_args.py` (for PyTorch)
-and `src/transformers/benchmark/benchmark_args_tf.py` (for Tensorflow). Alternatively, running the following shell
-commands from root will print out a descriptive list of all configurable parameters for PyTorch and Tensorflow
-respectively.
-
-<frameworkcontent>
-<pt>
-```bash
-python examples/pytorch/benchmarking/run_benchmark.py --help
-```
-
-An instantiated benchmark object can then simply be run by calling `benchmark.run()`.
-
-```py
->>> results = benchmark.run()
->>> print(results)
-====================       INFERENCE - SPEED - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length     Time in s                  
--------------------------------------------------------------------------------
-google-bert/bert-base-uncased          8               8             0.006     
-google-bert/bert-base-uncased          8               32            0.006     
-google-bert/bert-base-uncased          8              128            0.018     
-google-bert/bert-base-uncased          8              512            0.088     
--------------------------------------------------------------------------------
-
-====================      INFERENCE - MEMORY - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length    Memory in MB 
--------------------------------------------------------------------------------
-google-bert/bert-base-uncased          8               8             1227
-google-bert/bert-base-uncased          8               32            1281
-google-bert/bert-base-uncased          8              128            1307
-google-bert/bert-base-uncased          8              512            1539
--------------------------------------------------------------------------------
-
-====================        ENVIRONMENT INFORMATION         ====================
-
- transformers_version: 2.11.0
- framework: PyTorch
- use_torchscript: False
- framework_version: 1.4.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 08:58:43.371351
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
-```
-</pt>
-<tf>
-```bash
-python examples/tensorflow/benchmarking/run_benchmark_tf.py --help
-```
-
-An instantiated benchmark object can then simply be run by calling `benchmark.run()`.
-
-```py
->>> results = benchmark.run()
->>> print(results)
->>> results = benchmark.run()
->>> print(results)
-====================       INFERENCE - SPEED - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length     Time in s                  
--------------------------------------------------------------------------------
-google-bert/bert-base-uncased          8               8             0.005
-google-bert/bert-base-uncased          8               32            0.008
-google-bert/bert-base-uncased          8              128            0.022
-google-bert/bert-base-uncased          8              512            0.105
--------------------------------------------------------------------------------
-
-====================      INFERENCE - MEMORY - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length    Memory in MB 
--------------------------------------------------------------------------------
-google-bert/bert-base-uncased          8               8             1330
-google-bert/bert-base-uncased          8               32            1330
-google-bert/bert-base-uncased          8              128            1330
-google-bert/bert-base-uncased          8              512            1770
--------------------------------------------------------------------------------
-
-====================        ENVIRONMENT INFORMATION         ====================
-
- transformers_version: 2.11.0
- framework: Tensorflow
- use_xla: False
- framework_version: 2.2.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 09:26:35.617317
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
-```
-</tf>
-</frameworkcontent>
-
-By default, the _time_ and the _required memory_ for _inference_ are benchmarked. In the example output above the first
-two sections show the result corresponding to _inference time_ and _inference memory_. In addition, all relevant
-information about the computing environment, _e.g._ the GPU type, the system, the library versions, etc... are printed
-out in the third section under _ENVIRONMENT INFORMATION_. This information can optionally be saved in a _.csv_ file
-when adding the argument `save_to_csv=True` to [`PyTorchBenchmarkArguments`] and
-[`TensorFlowBenchmarkArguments`] respectively. In this case, every section is saved in a separate
-_.csv_ file. The path to each _.csv_ file can optionally be defined via the argument data classes.
-
-Instead of benchmarking pre-trained models via their model identifier, _e.g._ `google-bert/bert-base-uncased`, the user can
-alternatively benchmark an arbitrary configuration of any available model class. In this case, a `list` of
-configurations must be inserted with the benchmark args as follows.
-
-<frameworkcontent>
-<pt>
-```py
->>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments, BertConfig
-
->>> args = PyTorchBenchmarkArguments(
-...     models=["bert-base", "bert-384-hid", "bert-6-lay"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512]
-... )
->>> config_base = BertConfig()
->>> config_384_hid = BertConfig(hidden_size=384)
->>> config_6_lay = BertConfig(num_hidden_layers=6)
-
->>> benchmark = PyTorchBenchmark(args, configs=[config_base, config_384_hid, config_6_lay])
->>> benchmark.run()
-====================       INFERENCE - SPEED - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length       Time in s                  
--------------------------------------------------------------------------------
-bert-base                  8              128            0.006
-bert-base                  8              512            0.006
-bert-base                  8              128            0.018     
-bert-base                  8              512            0.088     
-bert-384-hid              8               8             0.006     
-bert-384-hid              8               32            0.006     
-bert-384-hid              8              128            0.011     
-bert-384-hid              8              512            0.054     
-bert-6-lay                 8               8             0.003     
-bert-6-lay                 8               32            0.004     
-bert-6-lay                 8              128            0.009     
-bert-6-lay                 8              512            0.044
--------------------------------------------------------------------------------
-
-====================      INFERENCE - MEMORY - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length      Memory in MB 
--------------------------------------------------------------------------------
-bert-base                  8               8             1277
-bert-base                  8               32            1281
-bert-base                  8              128            1307     
-bert-base                  8              512            1539     
-bert-384-hid              8               8             1005     
-bert-384-hid              8               32            1027     
-bert-384-hid              8              128            1035     
-bert-384-hid              8              512            1255     
-bert-6-lay                 8               8             1097     
-bert-6-lay                 8               32            1101     
-bert-6-lay                 8              128            1127     
-bert-6-lay                 8              512            1359
--------------------------------------------------------------------------------
-
-====================        ENVIRONMENT INFORMATION         ====================
-
- transformers_version: 2.11.0
- framework: PyTorch
- use_torchscript: False
- framework_version: 1.4.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 09:35:25.143267
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
-```
-</pt>
-<tf>
-```py
->>> from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments, BertConfig
-
->>> args = TensorFlowBenchmarkArguments(
-...     models=["bert-base", "bert-384-hid", "bert-6-lay"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512]
-... )
->>> config_base = BertConfig()
->>> config_384_hid = BertConfig(hidden_size=384)
->>> config_6_lay = BertConfig(num_hidden_layers=6)
-
->>> benchmark = TensorFlowBenchmark(args, configs=[config_base, config_384_hid, config_6_lay])
->>> benchmark.run()
-====================       INFERENCE - SPEED - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length       Time in s                  
--------------------------------------------------------------------------------
-bert-base                  8               8             0.005
-bert-base                  8               32            0.008
-bert-base                  8              128            0.022
-bert-base                  8              512            0.106
-bert-384-hid              8               8             0.005
-bert-384-hid              8               32            0.007
-bert-384-hid              8              128            0.018
-bert-384-hid              8              512            0.064
-bert-6-lay                 8               8             0.002
-bert-6-lay                 8               32            0.003
-bert-6-lay                 8              128            0.0011
-bert-6-lay                 8              512            0.074
--------------------------------------------------------------------------------
-
-====================      INFERENCE - MEMORY - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length      Memory in MB 
--------------------------------------------------------------------------------
-bert-base                  8               8             1330
-bert-base                  8               32            1330
-bert-base                  8              128            1330
-bert-base                  8              512            1770
-bert-384-hid              8               8             1330
-bert-384-hid              8               32            1330
-bert-384-hid              8              128            1330
-bert-384-hid              8              512            1540
-bert-6-lay                 8               8             1330
-bert-6-lay                 8               32            1330
-bert-6-lay                 8              128            1330
-bert-6-lay                 8              512            1540
--------------------------------------------------------------------------------
-
-====================        ENVIRONMENT INFORMATION         ====================
-
- transformers_version: 2.11.0
- framework: Tensorflow
- use_xla: False
- framework_version: 2.2.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 09:38:15.487125
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
-```
-</tf>
-</frameworkcontent>
-
-Again, _inference time_ and _required memory_ for _inference_ are measured, but this time for customized configurations
-of the `BertModel` class. This feature can especially be helpful when deciding for which configuration the model
-should be trained.
-
-
-## Benchmark best practices
-
-This section lists a couple of best practices one should be aware of when benchmarking a model.
-
- Currently, only single device benchmarking is supported. When benchmarking on GPU, it is recommended that the user
-  specifies on which device the code should be run by setting the `CUDA_VISIBLE_DEVICES` environment variable in the
-  shell, _e.g._ `export CUDA_VISIBLE_DEVICES=0` before running the code.
- The option `no_multi_processing` should only be set to `True` for testing and debugging. To ensure accurate
-  memory measurement it is recommended to run each memory benchmark in a separate process by making sure
-  `no_multi_processing` is set to `False`.
- One should always state the environment information when sharing the results of a model benchmark. Results can vary
-  heavily between different GPU devices, library versions, etc., so benchmark results on their own are not very
-  useful for the community.
-
-
-## Sharing your benchmark
-
-Previously, all available core models (10 at the time) were benchmarked for _inference time_ across many different
-settings: using PyTorch, with and without TorchScript, using TensorFlow, with and without XLA. All of those tests were
-done across CPUs (except for TensorFlow XLA) and GPUs.
-
-The approach is detailed in the [following blogpost](https://medium.com/huggingface/benchmarking-transformers-pytorch-and-tensorflow-e2917fb891c2) and the results are
-available [here](https://docs.google.com/spreadsheets/d/1sryqufw2D0XlUH4sq3e9Wnxu5EAQkaohzrJbd5HdQ_w/edit?usp=sharing).
-
-With the new _benchmark_ tools, it is easier than ever to share your benchmark results with the community:
-
- [PyTorch Benchmarking Results](https://github.com/huggingface/transformers/tree/main/examples/pytorch/benchmarking/README.md).
- [TensorFlow Benchmarking Results](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/benchmarking/README.md).
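
The best practices listed in the removed English page (pin the device with `CUDA_VISIBLE_DEVICES`, measure memory in a separate process, and report the environment alongside the results) can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the deleted benchmark implementation: `measure_in_subprocess`, `environment_info`, and `CHILD_SCRIPT` are hypothetical names, the list comprehension stands in for a model forward pass, and `tracemalloc` only tracks Python-level allocations, not GPU memory.

```python
import json
import os
import platform
import subprocess
import sys

# Child script run in a fresh interpreter so that its peak memory is not
# affected by earlier measurements. The list comprehension is a hypothetical
# stand-in for a model forward pass.
CHILD_SCRIPT = """
import json, time, tracemalloc
tracemalloc.start()
start = time.perf_counter()
data = [i * i for i in range({n})]
elapsed = time.perf_counter() - start
_, peak = tracemalloc.get_traced_memory()
print(json.dumps({{"n": {n}, "seconds": elapsed, "peak_bytes": peak}}))
"""


def measure_in_subprocess(n, device="0"):
    """Run one measurement in its own process, pinned to a single device."""
    # Equivalent to `export CUDA_VISIBLE_DEVICES=0` for the child only.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=device)
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT.format(n=n)],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


def environment_info():
    """Collect a few of the fields the removed tool printed with its results."""
    return {
        "python_version": platform.python_version(),
        "system": platform.system(),
        "cpu": platform.machine(),
    }
```

Calling `measure_in_subprocess(100_000)` returns a dictionary with the elapsed seconds and peak traced bytes for that run; publishing it together with `environment_info()` keeps the numbers interpretable on other machines.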
--- a/docs/source/ja/_toctree.yml
+++ b/docs/source/ja/_toctree.yml
@@ -117,8 +117,6 @@
    title: TFLite へのエクスポート
  - local: torchscript
    title: トーチスクリプトへのエクスポート
-  - local: benchmarks
-    title: ベンチマーク
  - local: community
    title: コミュニティリソース
  - local: custom_tools
--- a/docs/source/ja/benchmarks.md
+++ b/docs/source/ja/benchmarks.md
@@ -1,381 +0,0 @@
-<!--
-Copyright 2023 The HuggingFace Team. All rights reserved.
-
-Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
-the License. You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
-an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
-specific language governing permissions and limitations under the License.
-
-⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
-rendered properly in your Markdown viewer.
-->
-
-# Benchmarks
-
-<Tip warning={true}>
-
-Hugging Face's benchmarking tools are deprecated, and it is advised to use external benchmarking libraries to measure the speed and memory complexity of Transformer models.
-
-</Tip>
-
-[[open-in-colab]]
-
-Let's take a look at how 🤗 Transformers models can be benchmarked, best practices, and already available benchmarks.
-
-A notebook explaining in more detail how to benchmark 🤗 Transformers models can be found [here](https://github.com/huggingface/notebooks/tree/main/examples/benchmark.ipynb).
-
-## How to benchmark 🤗 Transformers models
-
-The classes [`PyTorchBenchmark`] and [`TensorFlowBenchmark`] allow you to flexibly benchmark 🤗 Transformers models.
-The benchmark classes can measure the _peak memory usage_ and _required time_ for both _inference_ and _training_.
-
-<Tip>
-
-Here, _inference_ is defined by a single forward pass, and _training_ is defined by a single forward pass and
-backward pass.
-
-</Tip>
-
-The benchmark classes [`PyTorchBenchmark`] and [`TensorFlowBenchmark`] expect an object of type [`PyTorchBenchmarkArguments`] and [`TensorFlowBenchmarkArguments`], respectively, for instantiation.
-[`PyTorchBenchmarkArguments`] and [`TensorFlowBenchmarkArguments`] are data classes that contain all relevant configurations for their corresponding benchmark class.
-The following example shows how a BERT model of type _bert-base-uncased_ can be benchmarked.
-
-<frameworkcontent>
-<pt>
-```py
->>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments
-
->>> args = PyTorchBenchmarkArguments(models=["google-bert/bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
->>> benchmark = PyTorchBenchmark(args)
-```
-</pt>
-<tf>
-```py
->>> from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments
-
->>> args = TensorFlowBenchmarkArguments(
-...     models=["google-bert/bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512]
-... )
->>> benchmark = TensorFlowBenchmark(args)
-```
-</tf>
-</frameworkcontent>
-
-
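-Here, three arguments are given to the benchmark argument data classes: `models`, `batch_sizes`, and
-`sequence_lengths`. The argument `models` is required and expects a `list` of model identifiers from the
-[model hub](https://huggingface.co/models). The `list` arguments `batch_sizes` and `sequence_lengths` define
-the size of the `input_ids` on which the model is benchmarked.
-Many other parameters can be configured via the benchmark argument data classes. For more detail on these, either consult the files
-`src/transformers/benchmark/benchmark_args_utils.py`,
-`src/transformers/benchmark/benchmark_args.py` (for PyTorch), and `src/transformers/benchmark/benchmark_args_tf.py` (for TensorFlow)
-directly, or run the following shell commands from the repository root to print a descriptive list of all configurable parameters for PyTorch and TensorFlow respectively.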
-
-<frameworkcontent>
-<pt>
-```bash
-python examples/pytorch/benchmarking/run_benchmark.py --help
-```
-
-An instantiated benchmark object can then simply be run by calling `benchmark.run()`.
-
-
-```py
->>> results = benchmark.run()
->>> print(results)
-====================       INFERENCE - SPEED - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length     Time in s                  
--------------------------------------------------------------------------------
-google-bert/bert-base-uncased          8               8             0.006     
-google-bert/bert-base-uncased          8               32            0.006     
-google-bert/bert-base-uncased          8              128            0.018     
-google-bert/bert-base-uncased          8              512            0.088     
--------------------------------------------------------------------------------
-
-====================      INFERENCE - MEMORY - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length    Memory in MB 
--------------------------------------------------------------------------------
-google-bert/bert-base-uncased          8               8             1227
-google-bert/bert-base-uncased          8               32            1281
-google-bert/bert-base-uncased          8              128            1307
-google-bert/bert-base-uncased          8              512            1539
--------------------------------------------------------------------------------
-
-====================        ENVIRONMENT INFORMATION         ====================
-
- transformers_version: 2.11.0
- framework: PyTorch
- use_torchscript: False
- framework_version: 1.4.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 08:58:43.371351
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
-```
-</pt>
-<tf>
-```bash
-python examples/tensorflow/benchmarking/run_benchmark_tf.py --help
-```
-
-An instantiated benchmark object can then simply be run by calling `benchmark.run()`.
-
-
-
-```py
->>> results = benchmark.run()
->>> print(results)
-====================       INFERENCE - SPEED - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length     Time in s                  
--------------------------------------------------------------------------------
-google-bert/bert-base-uncased          8               8             0.005
-google-bert/bert-base-uncased          8               32            0.008
-google-bert/bert-base-uncased          8              128            0.022
-google-bert/bert-base-uncased          8              512            0.105
--------------------------------------------------------------------------------
-
-====================      INFERENCE - MEMORY - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length    Memory in MB 
--------------------------------------------------------------------------------
-google-bert/bert-base-uncased          8               8             1330
-google-bert/bert-base-uncased          8               32            1330
-google-bert/bert-base-uncased          8              128            1330
-google-bert/bert-base-uncased          8              512            1770
--------------------------------------------------------------------------------
-
-====================        ENVIRONMENT INFORMATION         ====================
-
- transformers_version: 2.11.0
- framework: Tensorflow
- use_xla: False
- framework_version: 2.2.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 09:26:35.617317
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
-```
-</tf>
-</frameworkcontent>
-
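-By default, the _time_ and the _required memory_ for _inference_ are benchmarked.
-In the example output above, the first two sections show the results corresponding to _inference time_ and _inference memory_.
-In addition, all relevant information about the computing environment,
-_e.g._ the GPU type, the system, and the library versions, is printed out under _ENVIRONMENT INFORMATION_. This information can optionally be saved to a _.csv_ file
-by adding the argument `save_to_csv=True` to [`PyTorchBenchmarkArguments`]
-and [`TensorFlowBenchmarkArguments`]. In this case, every section is saved in a separate _.csv_ file.
-The path to each _.csv_ file can optionally be defined via the argument data classes.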
-
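-Instead of benchmarking pre-trained models via their model identifier, _e.g._ `google-bert/bert-base-uncased`, you can also benchmark an arbitrary configuration of any available model class. In this case, a `list` of configurations must be passed in together with the benchmark arguments.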
-
-
-<frameworkcontent>
-<pt>
-```py
->>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments, BertConfig
-
->>> args = PyTorchBenchmarkArguments(
-...     models=["bert-base", "bert-384-hid", "bert-6-lay"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512]
-... )
->>> config_base = BertConfig()
->>> config_384_hid = BertConfig(hidden_size=384)
->>> config_6_lay = BertConfig(num_hidden_layers=6)
-
->>> benchmark = PyTorchBenchmark(args, configs=[config_base, config_384_hid, config_6_lay])
->>> benchmark.run()
-====================       INFERENCE - SPEED - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length       Time in s                  
--------------------------------------------------------------------------------
-bert-base                  8               8             0.006
-bert-base                  8               32            0.006
-bert-base                  8              128            0.018     
-bert-base                  8              512            0.088     
-bert-384-hid              8               8             0.006     
-bert-384-hid              8               32            0.006     
-bert-384-hid              8              128            0.011     
-bert-384-hid              8              512            0.054     
-bert-6-lay                 8               8             0.003     
-bert-6-lay                 8               32            0.004     
-bert-6-lay                 8              128            0.009     
-bert-6-lay                 8              512            0.044
--------------------------------------------------------------------------------
-
-====================      INFERENCE - MEMORY - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length      Memory in MB 
--------------------------------------------------------------------------------
-bert-base                  8               8             1277
-bert-base                  8               32            1281
-bert-base                  8              128            1307     
-bert-base                  8              512            1539     
-bert-384-hid              8               8             1005     
-bert-384-hid              8               32            1027     
-bert-384-hid              8              128            1035     
-bert-384-hid              8              512            1255     
-bert-6-lay                 8               8             1097     
-bert-6-lay                 8               32            1101     
-bert-6-lay                 8              128            1127     
-bert-6-lay                 8              512            1359
--------------------------------------------------------------------------------
-
-====================        ENVIRONMENT INFORMATION         ====================
-
- transformers_version: 2.11.0
- framework: PyTorch
- use_torchscript: False
- framework_version: 1.4.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 09:35:25.143267
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
-```
-</pt>
-<tf>
-```py
->>> from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments, BertConfig
-
->>> args = TensorFlowBenchmarkArguments(
-...     models=["bert-base", "bert-384-hid", "bert-6-lay"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512]
-... )
->>> config_base = BertConfig()
->>> config_384_hid = BertConfig(hidden_size=384)
->>> config_6_lay = BertConfig(num_hidden_layers=6)
-
->>> benchmark = TensorFlowBenchmark(args, configs=[config_base, config_384_hid, config_6_lay])
->>> benchmark.run()
-====================       INFERENCE - SPEED - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length       Time in s                  
--------------------------------------------------------------------------------
-bert-base                  8               8             0.005
-bert-base                  8               32            0.008
-bert-base                  8              128            0.022
-bert-base                  8              512            0.106
-bert-384-hid              8               8             0.005
-bert-384-hid              8               32            0.007
-bert-384-hid              8              128            0.018
-bert-384-hid              8              512            0.064
-bert-6-lay                 8               8             0.002
-bert-6-lay                 8               32            0.003
-bert-6-lay                 8              128            0.0011
-bert-6-lay                 8              512            0.074
--------------------------------------------------------------------------------
-
-====================      INFERENCE - MEMORY - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length      Memory in MB 
--------------------------------------------------------------------------------
-bert-base                  8               8             1330
-bert-base                  8               32            1330
-bert-base                  8              128            1330
-bert-base                  8              512            1770
-bert-384-hid              8               8             1330
-bert-384-hid              8               32            1330
-bert-384-hid              8              128            1330
-bert-384-hid              8              512            1540
-bert-6-lay                 8               8             1330
-bert-6-lay                 8               32            1330
-bert-6-lay                 8              128            1330
-bert-6-lay                 8              512            1540
--------------------------------------------------------------------------------
-
-====================        ENVIRONMENT INFORMATION         ====================
-
- transformers_version: 2.11.0
- framework: Tensorflow
- use_xla: False
- framework_version: 2.2.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 09:38:15.487125
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
-```
-</tf>
-</frameworkcontent>
-
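-Again, _inference time_ and _required memory_ for _inference_ are measured, but this time for customized configurations of the `BertModel` class.
-
-This feature can be especially helpful when deciding which configuration a model should be trained with.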
-
-## Benchmark best practices
-
-This section lists a couple of best practices one should be aware of when benchmarking a model.
-
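- Currently, only single-device benchmarking is supported. When benchmarking on a GPU, it is recommended that the user specify on which device the code should run.
-  This can be done by setting the `CUDA_VISIBLE_DEVICES` environment variable in the shell, _e.g._ by running `export CUDA_VISIBLE_DEVICES=0` before running the code.
- The option `no_multi_processing` should only be set to `True` for testing and debugging. To ensure accurate memory measurement, it is recommended to run each memory benchmark in a separate process by making sure `no_multi_processing` is set to `False`.
- One should always state the environment information when sharing the results of a model benchmark. Results can vary heavily between different GPU devices, library versions, etc., so benchmark results on their own are not very useful for the community.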
-
-## Sharing your benchmark
-
-Previously, all available core models (10 at the time) were benchmarked for _inference time_ across many different settings: using PyTorch, with and without TorchScript, and using TensorFlow, with and without XLA. All of those tests were done across CPUs (except for TensorFlow XLA) and GPUs.
-
-The approach is detailed in the [following blog post](https://medium.com/huggingface/benchmarking-transformers-pytorch-and-tensorflow-e2917fb891c2) and the results are available [here](https://docs.google.com/spreadsheets/d/1sryqufw2D0XlUH4sq3e9Wnxu5EAQkaohzrJbd5HdQ_w/edit?usp=sharing).
-
-With the new benchmark tools, it is easier than ever to share your benchmark results with the community:
-
- [PyTorch Benchmarking Results](https://github.com/huggingface/transformers/tree/main/examples/pytorch/benchmarking/README.md).
- [TensorFlow Benchmarking Results](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/benchmarking/README.md).
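
The removed page describes saving each result section to its own _.csv_ file via `save_to_csv=True`. The idea can be sketched with the standard `csv` module, assuming results are held as lists of row dictionaries. The function name `save_sections_to_csv`, the section names, and the column names below are hypothetical and are not taken from the deleted code; the sample values come from the tables in the removed page.

```python
import csv
import os


def save_sections_to_csv(sections, directory):
    """Write each result section to `<directory>/<section name>.csv`.

    `sections` maps a section name to a non-empty list of row dictionaries
    that all share the same keys.
    """
    paths = {}
    for name, rows in sections.items():
        path = os.path.join(directory, f"{name}.csv")
        with open(path, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
        paths[name] = path
    return paths


# Example rows mirroring the first row of each printed result table.
EXAMPLE_SECTIONS = {
    "inference_time": [
        {"model": "bert-base", "batch_size": 8, "sequence_length": 8, "result": 0.006},
    ],
    "inference_memory": [
        {"model": "bert-base", "batch_size": 8, "sequence_length": 8, "result": 1277},
    ],
}
```

Keeping one file per section means the time and memory tables can be loaded and compared independently, which is the behaviour the page attributes to the original tool.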
--- a/docs/source/ko/_toctree.yml
+++ b/docs/source/ko/_toctree.yml
@@ -127,8 +127,6 @@
    title: TFLite로 내보내기
  - local: torchscript
    title: TorchScript로 내보내기
-  - local: in_translation
-    title: (번역중) Benchmarks
  - local: in_translation
    title: (번역중) Notebooks with examples
  - local: community
@@ -152,7 +150,7 @@
  - local: in_translation
    title: (번역중) AQLM
  - local: in_translation
-    title: (번역중) VPTQ 
+    title: (번역중) VPTQ
  - local: quantization/quanto
    title: Quanto
  - local: quantization/eetq
--- a/docs/source/ms/_toctree.yml
+++ b/docs/source/ms/_toctree.yml
@@ -95,8 +95,6 @@
      title: Eksport ke ONNX
    - local: torchscript
      title: Eksport ke TorchScript
-    - local: benchmarks
-      title: Penanda aras
    - local: Buku nota dengan contoh
      title: Notebooks with examples
    - local: Sumber komuniti
--- a/docs/source/zh/_toctree.yml
+++ b/docs/source/zh/_toctree.yml
@@ -52,8 +52,6 @@
    title: 导出为 TFLite
  - local: torchscript
    title: 导出为 TorchScript
-  - local: benchmarks
-    title: 对模型进行基准测试
  - local: gguf
    title: 与 GGUF 格式的互操作性
  - local: tiktoken
@@ -166,7 +164,4 @@
    - local: internal/time_series_utils
      title: 时序数据工具
    title: 内部辅助工具
-  title: 应用程序接口 (API) 
-
-
-
+  title: 应用程序接口 (API)
--- a/docs/source/zh/benchmarks.md
+++ b/docs/source/zh/benchmarks.md
@@ -1,377 +0,0 @@
-<!--Copyright 2020 The HuggingFace Team. All rights reserved.
-
-Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
-the License. You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
-an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
-specific language governing permissions and limitations under the License.
-
-⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
-rendered properly in your Markdown viewer.
-
-->
-
-# Benchmarks
-
-<Tip warning={true}>
-
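-Hugging Face's benchmarking tools are no longer maintained, and it is advised to use external benchmarking libraries to measure the speed
-and memory complexity of Transformer models.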
-
-</Tip>
-
-[[open-in-colab]]
-
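-Let's take a look at how 🤗 Transformers models can be benchmarked, recommended practices, and already available benchmarks.
-
-A notebook explaining in more detail how to benchmark 🤗 Transformers models can be found [here](https://github.com/huggingface/notebooks/tree/main/examples/benchmark.ipynb).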
-
-
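-## How to benchmark 🤗 Transformers models
-
-The classes [`PyTorchBenchmark`] and [`TensorFlowBenchmark`] allow you to flexibly benchmark 🤗 Transformers models. The benchmark classes can measure the **peak memory usage** and **required time** for both **inference** and **training**.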
-
-<Tip>
-
-Here, **inference** is defined by a single forward pass, and **training** is defined by a single forward pass and backward pass.
-
-</Tip>
-
-
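-The benchmark classes [`PyTorchBenchmark`] and [`TensorFlowBenchmark`] expect an object of type [`PyTorchBenchmarkArguments`] and [`TensorFlowBenchmarkArguments`], respectively, for instantiation. These are data classes that contain all relevant configurations for their corresponding benchmark class.
-
-The following example shows how a BERT model of type **bert-base-uncased** can be benchmarked: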
-
-<frameworkcontent>
-<pt>
-```py
->>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments
-
->>> args = PyTorchBenchmarkArguments(models=["google-bert/bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
->>> benchmark = PyTorchBenchmark(args)
-```
-</pt>
-<tf>
-```py
->>> from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments
-
->>> args = TensorFlowBenchmarkArguments(
-...     models=["google-bert/bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512]
-... )
->>> benchmark = TensorFlowBenchmark(args)
-```
-</tf>
-</frameworkcontent>
-
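-Here, three arguments are given to the benchmark argument data classes, namely `models`, `batch_sizes`, and `sequence_lengths`. The argument `models` is required and expects a list of model identifiers from the [model hub](https://huggingface.co/models). The list arguments `batch_sizes` and `sequence_lengths` define the batch size and sequence length of the `input_ids` on which the model is benchmarked.
-
-These are only some of the main parameters; many others can be configured via the benchmark argument data classes. For more detail, consult the following files directly: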
-
-* `src/transformers/benchmark/benchmark_args_utils.py`
-* `src/transformers/benchmark/benchmark_args.py` (for PyTorch)
-* `src/transformers/benchmark/benchmark_args_tf.py` (for TensorFlow)
-  
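-Alternatively, run the following command from the repository root to print a descriptive list of all configurable parameters for PyTorch and TensorFlow respectively:
-``` bash python examples/pytorch/benchmarking/run_benchmark.py --help ```
-These commands list all configurable parameters, which gives you more flexibility when benchmarking.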
-
-
-
-<frameworkcontent>
-<pt>
-
-The following code runs the benchmark configured through `PyTorchBenchmarkArguments` (models, batch sizes, and sequence lengths) by calling `benchmark.run()`.
-
-```py
->>> results = benchmark.run()
->>> print(results)
-====================       INFERENCE - SPEED - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length     Time in s                  
--------------------------------------------------------------------------------
-google-bert/bert-base-uncased          8               8             0.006     
-google-bert/bert-base-uncased          8               32            0.006     
-google-bert/bert-base-uncased          8              128            0.018     
-google-bert/bert-base-uncased          8              512            0.088     
--------------------------------------------------------------------------------
-
-====================      INFERENCE - MEMORY - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length    Memory in MB 
--------------------------------------------------------------------------------
-google-bert/bert-base-uncased          8               8             1227
-google-bert/bert-base-uncased          8               32            1281
-google-bert/bert-base-uncased          8              128            1307
-google-bert/bert-base-uncased          8              512            1539
--------------------------------------------------------------------------------
-
-====================        ENVIRONMENT INFORMATION         ====================
-
- transformers_version: 2.11.0
- framework: PyTorch
- use_torchscript: False
- framework_version: 1.4.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 08:58:43.371351
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
-```
-</pt>
-<tf>
-```bash
-python examples/tensorflow/benchmarking/run_benchmark_tf.py --help
-```
-
-An instantiated benchmark object can then simply be run by calling `benchmark.run()`.
-
-```py
->>> results = benchmark.run()
->>> print(results)
-====================       INFERENCE - SPEED - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length     Time in s                  
--------------------------------------------------------------------------------
-google-bert/bert-base-uncased          8               8             0.005
-google-bert/bert-base-uncased          8               32            0.008
-google-bert/bert-base-uncased          8              128            0.022
-google-bert/bert-base-uncased          8              512            0.105
--------------------------------------------------------------------------------
-
-====================      INFERENCE - MEMORY - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length    Memory in MB 
--------------------------------------------------------------------------------
-google-bert/bert-base-uncased          8               8             1330
-google-bert/bert-base-uncased          8               32            1330
-google-bert/bert-base-uncased          8              128            1330
-google-bert/bert-base-uncased          8              512            1770
--------------------------------------------------------------------------------
-
-====================        ENVIRONMENT INFORMATION         ====================
-
- transformers_version: 2.11.0
- framework: Tensorflow
- use_xla: False
- framework_version: 2.2.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 09:26:35.617317
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
-```
-</tf>
-</frameworkcontent>
-
-
-
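-By default, the **time** and the **required memory** for inference are benchmarked. In the example output above, the first two sections show the results corresponding to **inference time** and **inference memory**. In addition, all relevant information about the computing environment (for example the GPU type, the system, and the library versions) is printed out in the third section under **ENVIRONMENT INFORMATION**. This information can optionally be saved to a .csv file by adding the argument `save_to_csv=True` to [`PyTorchBenchmarkArguments`] and [`TensorFlowBenchmarkArguments`]. In this case, every section is saved in a separate .csv file. The path to each .csv file can also be defined via the argument data classes.
-
-
-Instead of benchmarking pre-trained models via their model identifier (such as `google-bert/bert-base-uncased`), you can also benchmark an arbitrary configuration of any available model class. In this case, a list of configurations must be passed in together with the benchmark arguments, as follows: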
-
-<frameworkcontent>
-<pt>
-```py
->>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments, BertConfig
-
->>> args = PyTorchBenchmarkArguments(
-...     models=["bert-base", "bert-384-hid", "bert-6-lay"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512]
-... )
->>> config_base = BertConfig()
->>> config_384_hid = BertConfig(hidden_size=384)
->>> config_6_lay = BertConfig(num_hidden_layers=6)
-
->>> benchmark = PyTorchBenchmark(args, configs=[config_base, config_384_hid, config_6_lay])
->>> benchmark.run()
-====================       INFERENCE - SPEED - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length       Time in s                  
--------------------------------------------------------------------------------
-bert-base                  8               8             0.006
-bert-base                  8               32            0.006
-bert-base                  8              128            0.018     
-bert-base                  8              512            0.088     
-bert-384-hid              8               8             0.006     
-bert-384-hid              8               32            0.006     
-bert-384-hid              8              128            0.011     
-bert-384-hid              8              512            0.054     
-bert-6-lay                 8               8             0.003     
-bert-6-lay                 8               32            0.004     
-bert-6-lay                 8              128            0.009     
-bert-6-lay                 8              512            0.044
--------------------------------------------------------------------------------
-
-====================      INFERENCE - MEMORY - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length      Memory in MB 
--------------------------------------------------------------------------------
-bert-base                  8               8             1277
-bert-base                  8               32            1281
-bert-base                  8              128            1307     
-bert-base                  8              512            1539     
-bert-384-hid              8               8             1005     
-bert-384-hid              8               32            1027     
-bert-384-hid              8              128            1035     
-bert-384-hid              8              512            1255     
-bert-6-lay                 8               8             1097     
-bert-6-lay                 8               32            1101     
-bert-6-lay                 8              128            1127     
-bert-6-lay                 8              512            1359
--------------------------------------------------------------------------------
-
-====================        ENVIRONMENT INFORMATION         ====================
-
- transformers_version: 2.11.0
- framework: PyTorch
- use_torchscript: False
- framework_version: 1.4.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 09:35:25.143267
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
-```
-</pt>
-<tf>
-```py
->>> from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments, BertConfig
-
->>> args = TensorFlowBenchmarkArguments(
-...     models=["bert-base", "bert-384-hid", "bert-6-lay"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512]
-... )
->>> config_base = BertConfig()
->>> config_384_hid = BertConfig(hidden_size=384)
->>> config_6_lay = BertConfig(num_hidden_layers=6)
-
->>> benchmark = TensorFlowBenchmark(args, configs=[config_base, config_384_hid, config_6_lay])
->>> benchmark.run()
-====================       INFERENCE - SPEED - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length       Time in s                  
--------------------------------------------------------------------------------
-bert-base                  8               8             0.005
-bert-base                  8               32            0.008
-bert-base                  8              128            0.022
-bert-base                  8              512            0.106
-bert-384-hid              8               8             0.005
-bert-384-hid              8               32            0.007
-bert-384-hid              8              128            0.018
-bert-384-hid              8              512            0.064
-bert-6-lay                 8               8             0.002
-bert-6-lay                 8               32            0.003
-bert-6-lay                 8              128            0.0011
-bert-6-lay                 8              512            0.074
--------------------------------------------------------------------------------
-
-====================      INFERENCE - MEMORY - RESULT       ====================
--------------------------------------------------------------------------------
-Model Name             Batch Size     Seq Length      Memory in MB 
--------------------------------------------------------------------------------
-bert-base                  8               8             1330
-bert-base                  8               32            1330
-bert-base                  8              128            1330
-bert-base                  8              512            1770
-bert-384-hid              8               8             1330
-bert-384-hid              8               32            1330
-bert-384-hid              8              128            1330
-bert-384-hid              8              512            1540
-bert-6-lay                 8               8             1330
-bert-6-lay                 8               32            1330
-bert-6-lay                 8              128            1330
-bert-6-lay                 8              512            1540
--------------------------------------------------------------------------------
-
-====================        ENVIRONMENT INFORMATION         ====================
-
- transformers_version: 2.11.0
- framework: Tensorflow
- use_xla: False
- framework_version: 2.2.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 09:38:15.487125
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
-```
-</tf>
-</frameworkcontent>
-
-
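- **Inference time** and **required memory for inference** are measured again, this time for customized configurations of the `BertModel` class. This feature is especially helpful when deciding which configuration a model should be trained with.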
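Since the benchmark classes used above are deprecated, a comparison of this kind can be approximated with external tooling. The following is a minimal sketch using only the Python standard library, assuming the workload can be wrapped in a plain callable. The names `time_callable`, `fake_forward`, and `benchmark_grid`, as well as the configuration labels, are hypothetical stand-ins and not part of 🤗 Transformers.

```python
import timeit
from typing import Callable, Dict, List, Tuple


def time_callable(fn: Callable[[], object], repeat: int = 5, number: int = 10) -> float:
    """Return the best average wall-clock time (in seconds) of one call to fn."""
    # Taking the minimum over several repeats reduces noise from other processes.
    runs = timeit.repeat(fn, repeat=repeat, number=number)
    return min(runs) / number


def fake_forward(hidden_size: int, seq_length: int) -> int:
    """Hypothetical stand-in for a model forward pass; replace with a real call."""
    return sum(i * i for i in range(hidden_size * seq_length // 64))


def benchmark_grid(
    configs: Dict[str, int], sequence_lengths: List[int]
) -> List[Tuple[str, int, float]]:
    """Time every (configuration, sequence length) pair and collect the results."""
    rows = []
    for name, hidden_size in configs.items():
        for seq_length in sequence_lengths:
            # The lambda is called inside this iteration, so it sees the current values.
            seconds = time_callable(lambda: fake_forward(hidden_size, seq_length))
            rows.append((name, seq_length, seconds))
    return rows


if __name__ == "__main__":
    results = benchmark_grid({"base-768-hid": 768, "small-384-hid": 384}, [8, 32, 128])
    for name, seq_length, seconds in results:
        print(f"{name:<16}{seq_length:>6}{seconds:>12.6f}")
```

To time a real model, swap `fake_forward` for a function that runs one forward pass. For GPU workloads the device would also need to be synchronized before the timer is read, which this sketch does not do.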
-
-
-## Benchmark best practices
-This section lists a few recommended practices to keep in mind when benchmarking a model:
-
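-* Currently, only single-device benchmarking is supported. When benchmarking on GPU, it is recommended to specify the device the code should run on by setting the `CUDA_VISIBLE_DEVICES` environment variable, e.g. `export CUDA_VISIBLE_DEVICES=0`, before running the code.
-* The `no_multi_processing` option should only be set to `True` for testing and debugging. To ensure accurate memory measurement, run each memory benchmark in a separate process, which means making sure `no_multi_processing` is not set to `True`.
-* Always state the environment information when sharing the results of a model benchmark. Results can vary widely between GPU devices, library versions, and so on, so benchmark results on their own are of limited use to the community.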
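The practices above can be sketched without the deprecated benchmark classes. The example below is only an illustration under stated assumptions: the list allocation in `CHILD_CODE` is a placeholder workload, `measure_in_subprocess` and `environment_info` are hypothetical helper names, and `tracemalloc` only tracks Python-level allocations, so it says nothing about GPU memory.

```python
import json
import os
import platform
import subprocess
import sys
from datetime import datetime

# Code executed in the child process. The list allocation is a hypothetical
# stand-in for loading and running a model.
CHILD_CODE = """
import json, tracemalloc
tracemalloc.start()
data = [0] * 1_000_000  # placeholder workload
_, peak = tracemalloc.get_traced_memory()
print(json.dumps({"peak_bytes": peak}))
"""


def measure_in_subprocess(device: str = "0") -> dict:
    """Run the workload in a fresh interpreter so earlier runs cannot skew it."""
    # Restrict the child to a single GPU; the variable has no effect without CUDA.
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": device}
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


def environment_info() -> dict:
    """Collect basic environment details to publish next to the results."""
    return {
        "python_version": platform.python_version(),
        "system": platform.system(),
        "cpu": platform.machine(),
        "date": datetime.now().date().isoformat(),
    }


if __name__ == "__main__":
    report = {"memory": measure_in_subprocess(), "environment": environment_info()}
    print(json.dumps(report, indent=2))
```

Starting a new interpreter for each measurement keeps allocations from one run out of the next, and storing the environment dictionary alongside the numbers makes shared results easier to compare.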
-
-
-## Sharing your benchmark
-
-Previously, all available core models (10 at the time) were benchmarked for **inference time** across many different settings: PyTorch with and without TorchScript, and TensorFlow with and without XLA. All tests were run on both CPU (except for TensorFlow XLA) and GPU.
-
-The approach is described in detail in [this blog post](https://medium.com/huggingface/benchmarking-transformers-pytorch-and-tensorflow-e2917fb891c2), and the results are available [here](https://docs.google.com/spreadsheets/d/1sryqufw2D0XlUH4sq3e9Wnxu5EAQkaohzrJbd5HdQ_w/edit?usp=sharing).
-
-
-With the new **benchmark** tools, sharing your benchmark results is easier than ever!
-
- [PyTorch Benchmarking Results](https://github.com/huggingface/transformers/tree/main/examples/pytorch/benchmarking/README.md)
- [TensorFlow Benchmarking Results](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/benchmarking/README.md)
-
-