@@ -27,7 +27,7 @@ The abstract from the paper is the following:
 Tips:
 
 - MEGA can perform quite well with relatively few parameters. See Appendix D in the MEGA paper for examples of architectural specs which perform well in various settings. If using MEGA as a decoder, be sure to set `bidirectional=False` to avoid errors with default bidirectional.
-- Mega-chunk is a variant of mega that reduces time and spaces complexity from quadratic to linear. Utilize chunking with MegaConfiig.use_chunking and control chunk size with MegaConfig.chunk_size
+- Mega-chunk is a variant of mega that reduces time and spaces complexity from quadratic to linear. Utilize chunking with MegaConfig.use_chunking and control chunk size with MegaConfig.chunk_size
 
 This model was contributed by [mnaylor](https://huggingface.co/mnaylor).
 The original code can be found [here](https://github.com/facebookresearch/mega).
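The "quadratic to linear" claim in the changed Mega-chunk line can be illustrated with some simple counting. The sketch below is not MEGA or `transformers` code: it only counts pairwise attention scores, assuming that chunked attention restricts each token to its own fixed-size chunk and that the sequence is padded to a multiple of the chunk size. The helper names `full_attention_scores` and `chunked_attention_scores` are hypothetical.

```python
import math


def full_attention_scores(seq_len: int) -> int:
    """Number of pairwise scores when every token attends to every token."""
    return seq_len * seq_len


def chunked_attention_scores(seq_len: int, chunk_size: int) -> int:
    """Number of pairwise scores when attention stays inside fixed-size chunks.

    Assumption: the sequence is padded up to a multiple of chunk_size,
    so every chunk holds exactly chunk_size tokens.
    """
    num_chunks = math.ceil(seq_len / chunk_size)
    return num_chunks * chunk_size * chunk_size


# Compare how the two counts grow as the sequence length doubles.
for n in (1024, 2048, 4096):
    print(n, full_attention_scores(n), chunked_attention_scores(n, chunk_size=128))
```

With the chunk size held fixed, doubling the sequence length quadruples the full count but only doubles the chunked count, which is the linear behaviour the tip describes.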