ADD BORT (#9813)
* tests: add integration tests for new Bort model * bort: add conversion script from Gluonnlp to Transformers 🚀 * bort: minor cleanup (BORT -> Bort) * add docs * make fix-copies * clean doc a bit * correct docs * Update docs/source/model_doc/bort.rst Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update docs/source/model_doc/bort.rst Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * correct dialogpt doc * correct link * Update docs/source/model_doc/bort.rst * Update docs/source/model_doc/dialogpt.rst Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * make style Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
This commit is contained in:
46
docs/source/model_doc/bort.rst
Normal file
46
docs/source/model_doc/bort.rst
Normal file
@@ -0,0 +1,46 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
BORT
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The BORT model was proposed in `Optimal Subarchitecture Extraction for BERT <https://arxiv.org/abs/2010.10499>`__ by
|
||||
Adrian de Wynter and Daniel J. Perry. It is an optimal subset of architectural parameters for the BERT, which the
|
||||
authors refer to as "Bort".
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*We extract an optimal subset of architectural parameters for the BERT architecture from Devlin et al. (2018) by
|
||||
applying recent breakthroughs in algorithms for neural architecture search. This optimal subset, which we refer to as
|
||||
"Bort", is demonstrably smaller, having an effective (that is, not counting the embedding layer) size of 5.5% the
|
||||
original BERT-large architecture, and 16% of the net size. Bort is also able to be pretrained in 288 GPU hours, which
|
||||
is 1.2% of the time required to pretrain the highest-performing BERT parametric architectural variant, RoBERTa-large
|
||||
(Liu et al., 2019), and about 33% of that of the world-record, in GPU hours, required to train BERT-large on the same
|
||||
hardware. It is also 7.9x faster on a CPU, as well as being better performing than other compressed variants of the
|
||||
architecture, and some of the non-compressed variants: it obtains performance improvements of between 0.3% and 31%,
|
||||
absolute, with respect to BERT-large, on multiple public natural language understanding (NLU) benchmarks.*
|
||||
|
||||
Tips:
|
||||
|
||||
- BORT's model architecture is based on BERT, so one can refer to :doc:`BERT's documentation page <bert>` for the
|
||||
model's API as well as usage examples.
|
||||
- BORT uses the RoBERTa tokenizer instead of the BERT tokenizer, so one can refer to :doc:`RoBERTa's documentation page
|
||||
<roberta>` for the tokenizer's API as well as usage examples.
|
||||
- BORT requires a specific fine-tuning algorithm, called `Agora
|
||||
<https://adewynter.github.io/notes/bort_algorithms_and_applications.html#fine-tuning-with-algebraic-topology>`__ ,
|
||||
that is sadly not open-sourced yet. It would be very useful for the community, if someone tries to implement the
|
||||
algorithm to make BORT fine-tuning work.
|
||||
|
||||
The original code can be found `here <https://github.com/alexa/bort/>`__.
|
||||
Reference in New Issue
Block a user