Doc styling (#8067)
* Important files * Styling them all * Revert "Styling them all" This reverts commit 7d029395fdae8513b8281cbc2a6c239f8093503e. * Syling them for realsies * Fix syntax error * Fix benchmark_utils * More fixes * Fix modeling auto and script * Remove new line * Fixes * More fixes * Fix more files * Style * Add FSMT * More fixes * More fixes * More fixes * More fixes * Fixes * More fixes * More fixes * Last fixes * Make sphinx happy
This commit is contained in:
@@ -14,23 +14,23 @@ The abstract from the paper is the following:
|
||||
*Natural Language Processing (NLP) has recently achieved great success by using huge pre-trained models with hundreds
|
||||
of millions of parameters. However, these models suffer from heavy model sizes and high latency such that they cannot
|
||||
be deployed to resource-limited mobile devices. In this paper, we propose MobileBERT for compressing and accelerating
|
||||
the popular BERT model. Like the original BERT, MobileBERT is task-agnostic, that is, it can be generically applied
|
||||
to various downstream NLP tasks via simple fine-tuning. Basically, MobileBERT is a thin version of BERT_LARGE, while
|
||||
equipped with bottleneck structures and a carefully designed balance between self-attentions and feed-forward
|
||||
networks. To train MobileBERT, we first train a specially designed teacher model, an inverted-bottleneck incorporated
|
||||
BERT_LARGE model. Then, we conduct knowledge transfer from this teacher to MobileBERT. Empirical studies show that
|
||||
MobileBERT is 4.3x smaller and 5.5x faster than BERT_BASE while achieving competitive results on well-known
|
||||
benchmarks. On the natural language inference tasks of GLUE, MobileBERT achieves a GLUEscore o 77.7
|
||||
(0.6 lower than BERT_BASE), and 62 ms latency on a Pixel 4 phone. On the SQuAD v1.1/v2.0 question answering task,
|
||||
MobileBERT achieves a dev F1 score of 90.0/79.2 (1.5/2.1 higher than BERT_BASE).*
|
||||
the popular BERT model. Like the original BERT, MobileBERT is task-agnostic, that is, it can be generically applied to
|
||||
various downstream NLP tasks via simple fine-tuning. Basically, MobileBERT is a thin version of BERT_LARGE, while
|
||||
equipped with bottleneck structures and a carefully designed balance between self-attentions and feed-forward networks.
|
||||
To train MobileBERT, we first train a specially designed teacher model, an inverted-bottleneck incorporated BERT_LARGE
|
||||
model. Then, we conduct knowledge transfer from this teacher to MobileBERT. Empirical studies show that MobileBERT is
|
||||
4.3x smaller and 5.5x faster than BERT_BASE while achieving competitive results on well-known benchmarks. On the
|
||||
natural language inference tasks of GLUE, MobileBERT achieves a GLUEscore o 77.7 (0.6 lower than BERT_BASE), and 62 ms
|
||||
latency on a Pixel 4 phone. On the SQuAD v1.1/v2.0 question answering task, MobileBERT achieves a dev F1 score of
|
||||
90.0/79.2 (1.5/2.1 higher than BERT_BASE).*
|
||||
|
||||
Tips:
|
||||
|
||||
- MobileBERT is a model with absolute position embeddings so it's usually advised to pad the inputs on
|
||||
the right rather than the left.
|
||||
- MobileBERT is similar to BERT and therefore relies on the masked language modeling (MLM) objective.
|
||||
It is therefore efficient at predicting masked tokens and at NLU in general, but is not optimal for
|
||||
text generation. Models trained with a causal language modeling (CLM) objective are better in that regard.
|
||||
- MobileBERT is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather
|
||||
than the left.
|
||||
- MobileBERT is similar to BERT and therefore relies on the masked language modeling (MLM) objective. It is therefore
|
||||
efficient at predicting masked tokens and at NLU in general, but is not optimal for text generation. Models trained
|
||||
with a causal language modeling (CLM) objective are better in that regard.
|
||||
|
||||
The original code can be found `here <https://github.com/google-research/mobilebert>`__.
|
||||
|
||||
|
||||
Reference in New Issue
Block a user