Convert model files from rst to mdx (#14865)
* First pass
* Apply suggestions from code review
* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
72
docs/source/model_doc/ibert.mdx
Normal file
@@ -0,0 +1,72 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# I-BERT

## Overview

The I-BERT model was proposed in [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by
Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney and Kurt Keutzer. It's a quantized version of RoBERTa running
inference up to four times faster.

The abstract from the paper is the following:

*Transformer based models, like BERT and RoBERTa, have achieved state-of-the-art results in many Natural Language
Processing tasks. However, their memory footprint, inference latency, and power consumption are prohibitive for
efficient inference at the edge, and even at the data center. While quantization can be a viable solution for this,
previous work on quantizing Transformer based models use floating-point arithmetic during inference, which cannot
efficiently utilize integer-only logical units such as the recent Turing Tensor Cores, or traditional integer-only ARM
processors. In this work, we propose I-BERT, a novel quantization scheme for Transformer based models that quantizes
the entire inference with integer-only arithmetic. Based on lightweight integer-only approximation methods for
nonlinear operations, e.g., GELU, Softmax, and Layer Normalization, I-BERT performs an end-to-end integer-only BERT
inference without any floating point calculation. We evaluate our approach on GLUE downstream tasks using
RoBERTa-Base/Large. We show that for both cases, I-BERT achieves similar (and slightly higher) accuracy as compared to
the full-precision baseline. Furthermore, our preliminary implementation of I-BERT shows a speedup of 2.4 - 4.0x for
INT8 inference on a T4 GPU system as compared to FP32 inference. The framework has been developed in PyTorch and has
been open-sourced.*

This model was contributed by [kssteven](https://huggingface.co/kssteven). The original code can be found [here](https://github.com/kssteven418/I-BERT).
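
As a rough illustration of what "integer-only arithmetic" in the abstract refers to, the sketch below quantizes two small float vectors to INT8 with a shared symmetric scale and computes their dot product using integers only. It is a generic, simplified example and not the I-BERT implementation: the helper names `quantize` and `int_dot` and the sample values are hypothetical, and I-BERT's integer approximations of GELU, Softmax, and Layer Normalization are not covered here.

```python
# Illustrative sketch only: symmetric INT8 quantization with an integer-only
# dot product. This is NOT the I-BERT code; all names here are hypothetical.


def quantize(values, num_bits=8):
    """Map floats to signed integers that share one symmetric scale factor."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def int_dot(q_a, q_b):
    """Dot product computed with integer multiplies and adds only."""
    return sum(a * b for a, b in zip(q_a, q_b))


activations = [0.5, -1.0, 0.25, 2.0]
weights = [1.0, 0.5, -2.0, 0.75]

q_act, s_act = quantize(activations)
q_w, s_w = quantize(weights)

# The accumulation stays in integers; the two scales are applied once at the end.
acc = int_dot(q_act, q_w)
approx = acc * s_act * s_w
exact = sum(a * w for a, w in zip(activations, weights))

print(f"integer accumulator: {acc}")
print(f"dequantized result: {approx:.4f} (float reference: {exact:.4f})")
```

The dequantized result should land close to the float reference, with a small error introduced by rounding to 8 bits.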

## IBertConfig

[[autodoc]] IBertConfig

## IBertModel

[[autodoc]] IBertModel
    - forward

## IBertForMaskedLM

[[autodoc]] IBertForMaskedLM
    - forward

## IBertForSequenceClassification

[[autodoc]] IBertForSequenceClassification
    - forward

## IBertForMultipleChoice

[[autodoc]] IBertForMultipleChoice
    - forward

## IBertForTokenClassification

[[autodoc]] IBertForTokenClassification
    - forward

## IBertForQuestionAnswering

[[autodoc]] IBertForQuestionAnswering
    - forward