Fixed spelling of training (#4416)
@@ -6,7 +6,7 @@ Overview
 
 The ALBERT model was proposed in `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations <https://arxiv.org/abs/1909.11942>`_
 by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. It presents
-two parameter-reduction techniques to lower memory consumption and increase the trainig speed of BERT:
+two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT:
 
 - Splitting the embedding matrix into two smaller matrices
 - Using repeating layers split among groups