Rebase ESM PR and update all file formats (#19055)
* Rebase ESM PR and update all file formats
* Fix test relative imports
* Add __init__.py to the test dir
* Disable gradient checkpointing
* Remove references to TFESM... FOR NOW >:|
* Remove completed TODOs from tests
* Convert docstrings to mdx, fix-copies from BERT
* fix-copies for the README and index
* Update ESM's __init__.py to the modern format
* Add to _toctree.yml
* Ensure we correctly copy the pad_token_id from the original ESM model
* Ensure we correctly copy the pad_token_id from the original ESM model
* Tiny grammar nitpicks
* Make the layer norm after embeddings an optional flag
* Make the layer norm after embeddings an optional flag
* Update the conversion script to handle other model classes
* Remove token_type_ids entirely, fix attention_masking and add checks to convert_esm.py
* Break the copied from link from BertModel.forward to remove token_type_ids
* Remove debug array saves
* Begin ESM-2 porting
* Add a hacky workaround for the precision issue in original repo
* Code cleanup
* Remove unused checkpoint conversion code
* Remove unused checkpoint conversion code
* Fix copyright notices
* Get rid of all references to the TF weights conversion
* Remove token_type_ids from the tests
* Fix test code
* Update src/transformers/__init__.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/__init__.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update README.md
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add credit
* Remove `*args` and `**kwargs` in rotary embedding
* Assertively remove asserts
* Replace einsum with torch.outer()
* Fix docstring formatting
* Remove assertions in tokenization
* Add paper citation to ESMModel docstring
* Move vocab list to single line
* Remove ESMLayer from init
* Add Facebook copyrights
* Clean up RotaryEmbedding docstring
* Fix docstring formatting
* Fix docstring for config object
* Add explanation for new config methods
* make fix-copies
* Rename all the ESM- classes to Esm-
* Update conversion script to allow pushing to hub
* Update tests to point at my repo for now
* Set config properly for tests
* Remove the gross hack that forced loss of precision in inv_freq and instead copy the data from the model being converted
* make fixup
* Update expected values for slow tests
* make fixup
* Remove EsmForCausalLM for now
* Remove EsmForCausalLM for now
* Fix padding idx test
* Updated README and docs with ESM-1b and ESM-2 separately (#19221)
* Updated README and docs with ESM-1b and ESM-2 separately
* Update READMEs, longer entry with 3 citations
* make fix-copies

Co-authored-by: Your Name <you@example.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Tom Sercu <tsercu@fb.com>
Co-authored-by: Your Name <you@example.com>
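
Several of the squashed commits touch the rotary position embedding (dropping `*args`/`**kwargs`, replacing `einsum` with `torch.outer()`, copying `inv_freq` from the source checkpoint). The stdlib sketch below shows what that outer product is for, assuming the standard rotary schedule `inv_freq[k] = 1 / 10000**(2k / dim)`: `outer(positions, inv_freq)` yields the rotation angle for every (position, channel pair), the same matrix that `torch.einsum("i,j->ij", ...)` produces. All function names are hypothetical, and the sketch rotates adjacent channel pairs for readability, whereas the `transformers` module may pair channels differently (for example, first half with second half).

```python
import math


def inv_freq(dim: int, base: float = 10000.0) -> list[float]:
    # Assumed standard rotary schedule: one frequency per pair of channels.
    return [1.0 / base ** (i / dim) for i in range(0, dim, 2)]


def outer(a: list[float], b: list[float]) -> list[list[float]]:
    # Same matrix as torch.outer(a, b) or torch.einsum("i,j->ij", a, b).
    return [[x * y for y in b] for x in a]


def rotate(vec: list[float], angles: list[float]) -> list[float]:
    # Rotate adjacent channel pairs (x0, x1), (x2, x3), ... by one angle each.
    out: list[float] = []
    for k, theta in enumerate(angles):
        x, y = vec[2 * k], vec[2 * k + 1]
        c, s = math.cos(theta), math.sin(theta)
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


dim = 8
positions = [float(p) for p in range(6)]
angles = outer(positions, inv_freq(dim))  # angles[p][k] = p * inv_freq[k]

query = [0.3, -1.2, 0.5, 0.7, -0.4, 0.9, 1.1, -0.2]
key = [1.0, 0.1, -0.6, 0.4, 0.8, -0.3, 0.2, 0.5]

# Query/key pairs that are two positions apart produce the same score.
s1 = dot(rotate(query, angles[3]), rotate(key, angles[1]))
s2 = dot(rotate(query, angles[5]), rotate(key, angles[3]))
```

The two scores agree (up to rounding) because rotating both vectors makes their dot product depend only on the offset between positions; that relative-position property is why the angle table is precomputed once with an outer product.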

@@ -241,6 +241,8 @@
  title: Encoder Decoder Models
- local: model_doc/ernie
  title: ERNIE
- local: model_doc/esm
  title: ESM
- local: model_doc/flaubert
  title: FlauBERT
- local: model_doc/fnet

@@ -90,6 +90,7 @@ The documentation is organized into five sections:
1. **[ELECTRA](model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[EncoderDecoder](model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[ERNIE](model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
1. **[ESM](model_doc/esm)** (from Meta AI) are transformer protein language models. **ESM-1b** was released with the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. **ESM-1v** was released with the paper [Language models enable zero-shot prediction of the effects of mutations on protein function](https://doi.org/10.1101/2021.07.09.450648) by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. **ESM-2** was released with the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
1. **[FlauBERT](model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
1. **[FLAVA](model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
1. **[FNet](model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.

@@ -239,6 +240,7 @@ Flax), PyTorch, and/or TensorFlow.
| ELECTRA | ✅ | ✅ | ✅ | ✅ | ✅ |
| Encoder decoder | ❌ | ❌ | ✅ | ✅ | ✅ |
| ERNIE | ❌ | ❌ | ✅ | ❌ | ❌ |
| ESM | ✅ | ❌ | ✅ | ❌ | ❌ |
| FairSeq Machine-Translation | ✅ | ❌ | ✅ | ❌ | ❌ |
| FlauBERT | ✅ | ❌ | ✅ | ✅ | ❌ |
| FLAVA | ❌ | ❌ | ✅ | ❌ | ❌ |

docs/source/en/model_doc/esm.mdx (Normal file, 109 lines)
@@ -0,0 +1,109 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# ESM

## Overview

This page provides code and pre-trained weights for Transformer protein language models from Meta AI's Fundamental
AI Research team, including the state-of-the-art ESM-2 and the previously released ESM-1b and ESM-1v. Transformer
protein language models were introduced in the paper [Biological structure and function emerge from scaling
unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by
Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott,
C. Lawrence Zitnick, Jerry Ma, and Rob Fergus.
The first version of this paper was [preprinted in 2019](https://www.biorxiv.org/content/10.1101/622803v1?versioned=true).

ESM-2 outperforms all tested single-sequence protein language models across a range of structure prediction tasks
and enables atomic-resolution structure prediction.
It was released with the paper [Language models of protein sequences at the scale of evolution enable accurate
structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie,
Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido and Alexander Rives.

The abstract from "Biological structure and function emerge from scaling unsupervised learning to 250
million protein sequences" is the following:

*In the field of artificial intelligence, a combination of scale in data and model capacity enabled by unsupervised
learning has led to major advances in representation learning and statistical generation. In the life sciences, the
anticipated growth of sequencing promises unprecedented data on natural sequence diversity. Protein language modeling
at the scale of evolution is a logical step toward predictive and generative artificial intelligence for biology. To
this end, we use unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250
million protein sequences spanning evolutionary diversity. The resulting model contains information about biological
properties in its representations. The representations are learned from sequence data alone. The learned representation
space has a multiscale organization reflecting structure from the level of biochemical properties of amino acids to
remote homology of proteins. Information about secondary and tertiary structure is encoded in the representations and
can be identified by linear projections. Representation learning produces features that generalize across a range of
applications, enabling state-of-the-art supervised prediction of mutational effect and secondary structure and
improving state-of-the-art features for long-range contact prediction.*

The abstract from "Language models of protein sequences at the scale of evolution enable accurate structure
prediction" is the following:

*Large language models have recently been shown to develop emergent capabilities with scale, going beyond
simple pattern matching to perform higher level reasoning and generate lifelike images and text. While
language models trained on protein sequences have been studied at a smaller scale, little is known about
what they learn about biology as they are scaled up. In this work we train models up to 15 billion parameters,
the largest language models of proteins to be evaluated to date. We find that as models are scaled they learn
information enabling the prediction of the three-dimensional structure of a protein at the resolution of
individual atoms. We present ESMFold for high accuracy end-to-end atomic level structure prediction directly
from the individual sequence of a protein. ESMFold has similar accuracy to AlphaFold2 and RoseTTAFold for
sequences with low perplexity that are well understood by the language model. ESMFold inference is an
order of magnitude faster than AlphaFold2, enabling exploration of the structural space of metagenomic
proteins in practical timescales.*

Tips:

- ESM models are trained with a masked language modeling (MLM) objective.
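
Concretely, an MLM objective hides some residues of the input sequence and trains the model to predict them from the surrounding context. The stdlib sketch below illustrates one common way to build such training pairs, the BERT-style recipe (select about 15% of positions; of those, replace 80% with `<mask>`, 10% with a random residue, and leave 10% unchanged), which is also what `DataCollatorForLanguageModeling` in `transformers` does by default. Treat the recipe, the 20-letter alphabet and `mask_sequence` as illustrative assumptions, not a description of how the ESM checkpoints were trained.

```python
import random

AMINO_ACIDS = "LAGVSERTIDPKQNFYMHWC"  # the 20 standard residues (illustrative)
MASK = "<mask>"


def mask_sequence(seq: str, rng: random.Random, p: float = 0.15):
    """BERT-style masking of one protein sequence (assumed recipe).

    Returns (inputs, labels): labels hold the original residue at selected
    positions and None elsewhere, so only selected positions are scored.
    """
    inputs, labels = [], []
    for residue in seq:
        if rng.random() < p:
            labels.append(residue)  # the model must recover this residue
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK)  # 80%: replace with the mask token
            elif r < 0.9:
                inputs.append(rng.choice(AMINO_ACIDS))  # 10%: random residue
            else:
                inputs.append(residue)  # 10%: leave the residue unchanged
        else:
            labels.append(None)  # not selected: no loss at this position
            inputs.append(residue)
    return inputs, labels


seq = "MKTAYIAKQRQISFVKSHFSRQ"
inputs, labels = mask_sequence(seq, random.Random(0))
```

Keeping 10% of selected residues unchanged means the model cannot assume that every unmasked residue is correct, so it has to build useful representations for all positions.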

The original code can be found [here](https://github.com/facebookresearch/esm) and was
developed by the Fundamental AI Research team at Meta AI.
This model was contributed to Hugging Face by [jasonliu](https://huggingface.co/jasonliu)
and [Matt](https://huggingface.co/Rocketknight1).

## EsmConfig

[[autodoc]] EsmConfig
    - all

## EsmTokenizer

[[autodoc]] EsmTokenizer
    - build_inputs_with_special_tokens
    - get_special_tokens_mask
    - create_token_type_ids_from_sequences
    - save_vocabulary
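
`build_inputs_with_special_tokens` and `get_special_tokens_mask` add special tokens around a tokenized protein and flag which positions they occupy. The stdlib sketch below mimics that for a single sequence (`<cls>`, then the residues, then `<eos>`), and then pads a small batch and builds the matching attention mask. It assumes one token per residue and ESM-style `<cls>`, `<pad>`, `<eos>` and `<unk>` tokens; the toy vocabulary, its IDs and all helper names are hypothetical, so real IDs must come from the checkpoint's vocabulary file rather than from this code.

```python
SPECIALS = ["<cls>", "<pad>", "<eos>", "<unk>"]
RESIDUES = list("LAGVSERTIDPKQNFYMHWC")
# Hypothetical ID assignment, for illustration only.
VOCAB = {tok: i for i, tok in enumerate(SPECIALS + RESIDUES)}
CLS, PAD, EOS, UNK = (VOCAB[t] for t in SPECIALS)


def encode(seq: str) -> list[int]:
    # One token per residue; residues outside this toy vocabulary become <unk>.
    body = [VOCAB.get(ch, UNK) for ch in seq]
    return [CLS] + body + [EOS]  # single-sequence layout: <cls> ... <eos>


def special_tokens_mask(num_residues: int) -> list[int]:
    # 1 marks an added special token, 0 marks a residue.
    return [1] + [0] * num_residues + [1]


def pad_batch(seqs: list[str]) -> tuple[list[list[int]], list[list[int]]]:
    encoded = [encode(s) for s in seqs]
    width = max(len(e) for e in encoded)
    input_ids = [e + [PAD] * (width - len(e)) for e in encoded]
    # Attend to every real token (special tokens included), ignore right padding.
    attention_mask = [[1] * len(e) + [0] * (width - len(e)) for e in encoded]
    return input_ids, attention_mask


input_ids, attention_mask = pad_batch(["MKT", "MK"])
```

Deriving the attention mask from padding, rather than from a separate segment input, matches the fact that ESM uses no token type IDs.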

## EsmModel

[[autodoc]] EsmModel
    - forward

## EsmForMaskedLM

[[autodoc]] EsmForMaskedLM
    - forward
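
`EsmForMaskedLM` outputs `logits` with one score per vocabulary token at every position. One way such scores are used is the masked-marginal rule from the ESM-1v paper (Meier et al., *Language models enable zero-shot prediction of the effects of mutations on protein function*): mask the mutated site, then compare the log-probability of the mutant residue with that of the wild-type residue. The sketch below assumes you have already extracted the logit row for that masked site and have a residue-to-index mapping; the toy numbers and helper names are hypothetical.

```python
import math


def log_softmax(logits: list[float]) -> list[float]:
    # Numerically stable log-softmax over one vocabulary row.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def masked_marginal(logits_at_site: list[float], vocab: dict[str, int],
                    wild_type: str, mutant: str) -> float:
    """log p(mutant) - log p(wild type) at one masked site; > 0 favors the mutant."""
    logp = log_softmax(logits_at_site)
    return logp[vocab[mutant]] - logp[vocab[wild_type]]


# Toy values standing in for one row of the model's `logits` output.
vocab = {"A": 0, "G": 1, "V": 2}
logits_at_site = [2.0, 0.5, 1.0]
score = masked_marginal(logits_at_site, vocab, wild_type="A", mutant="V")
```

Because both terms share the same softmax normalizer, the score reduces to the raw logit difference at that site; computing the full log-softmax is still handy when you also want per-residue log-probabilities.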

## EsmForSequenceClassification

[[autodoc]] EsmForSequenceClassification
    - forward

## EsmForTokenClassification

[[autodoc]] EsmForTokenClassification
    - forward
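
`EsmForTokenClassification` predicts one label per token, which for protein tasks such as per-residue secondary structure means one label per residue. Since the tokenizer also adds `<cls>` and `<eos>`, and batching adds padding, per-residue labels need placeholders at those positions. This sketch assumes one token per residue and fills placeholders with `-100`, the default `ignore_index` of PyTorch's `CrossEntropyLoss`; the three-state label set and `align_labels` are hypothetical.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss
LABELS = {"H": 0, "E": 1, "C": 2}  # hypothetical 3-state secondary structure labels


def align_labels(residue_labels: str, padded_length: int) -> list[int]:
    """Map per-residue labels onto <cls> residues... <eos> <pad>... positions."""
    ids = [LABELS[lab] for lab in residue_labels]
    aligned = [IGNORE_INDEX] + ids + [IGNORE_INDEX]  # <cls> and <eos> are not scored
    if padded_length < len(aligned):
        raise ValueError("padded_length is shorter than the tokenized sequence")
    # Padding positions are not scored either.
    return aligned + [IGNORE_INDEX] * (padded_length - len(aligned))


labels = align_labels("HHEC", padded_length=8)
```

If a tokenizer ever split residues into several tokens, this one-to-one alignment would no longer hold and labels would have to be mapped through the tokenizer's offsets instead.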