Add REALM (#13292)
* REALM initial commit * Retriever OK (Update new_gelu). * Encoder prediction score OK * Encoder pretrained model OK * Update retriever comments * Update docs, tests, and imports * Prune unused models * Make embedder as a module `RealmEmbedder` * Add RealmRetrieverOutput * Update tokenization * Pass all tests in test_modeling_realm.py * Prune RealmModel * Update docs * Add training test. * Remove completed TODO * Style & Quality * Prune `RealmModel` * Fixup * Changes: 1. Remove RealmTokenizerFast 2. Update docstrings 3. Add a method to RealmTokenizer to handle candidates tokenization. * Fix up * Style * Add tokenization tests * Update `from_pretrained` tests * Apply suggestions * Style & Quality * Copy BERT model * Fix comment to avoid docstring copying * Make RealmBertModel private * Fix bug * Style * Basic QA * Save * Complete reader logits * Add searcher * Complete searcher & reader * Move block records init to constructor * Fix training bug * Add some outputs to RealmReader * Add finetuned checkpoint variable names parsing * Fix bug * Update REALM config * Add RealmForOpenQA * Update convert_tfrecord logits * Fix bugs * Complete imports * Update docs * Update naming * Add brute-force searcher * Pass realm model tests * Style * Exclude RealmReader from common tests * Fix * Fix * convert docs * up * up * more make style * up * upload * up * Fix * Update src/transformers/__init__.py * adapt testing * change modeling code * fix test * up * up * up * correct more * make retriever work * update * make style * finish main structure * Resolve merge conflict * Make everything work * Style * Fixup * Fixup * Update training test * fix retriever * remove hardcoded path * Fix * Fix modeling test * Update model links * Initial retrieval test * Fix modeling test * Complete retrieval tests * Fix * style * Fix tests * Fix docstring example * Minor fix of retrieval test * Update license headers and docs * Apply suggestions from code review * Style * Apply suggestions from code review * Add an example to RealmEmbedder * Fix Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
This commit is contained in:
committed by
GitHub
parent
b25067d807
commit
22454ae492
@@ -35,6 +35,7 @@ PATH_TO_DOC = "docs/source"
|
||||
# Update this list with models that are supposed to be private.
|
||||
PRIVATE_MODELS = [
|
||||
"DPRSpanPredictor",
|
||||
"RealmBertModel",
|
||||
"T5Stack",
|
||||
"TFDPRSpanPredictor",
|
||||
]
|
||||
@@ -73,6 +74,10 @@ IGNORE_NON_TESTED = PRIVATE_MODELS.copy() + [
|
||||
"PegasusDecoderWrapper", # Building part of bigger (tested) model.
|
||||
"DPREncoder", # Building part of bigger (tested) model.
|
||||
"ProphetNetDecoderWrapper", # Building part of bigger (tested) model.
|
||||
"RealmBertModel", # Building part of bigger (tested) model.
|
||||
"RealmReader", # Not regular model.
|
||||
"RealmScorer", # Not regular model.
|
||||
"RealmForOpenQA", # Not regular model.
|
||||
"ReformerForMaskedLM", # Needs to be setup as decoder.
|
||||
"Speech2Text2DecoderWrapper", # Building part of bigger (tested) model.
|
||||
"TFDPREncoder", # Building part of bigger (tested) model.
|
||||
@@ -129,6 +134,10 @@ IGNORE_NON_AUTO_CONFIGURED = PRIVATE_MODELS.copy() + [
|
||||
"RagModel",
|
||||
"RagSequenceForGeneration",
|
||||
"RagTokenForGeneration",
|
||||
"RealmEmbedder",
|
||||
"RealmForOpenQA",
|
||||
"RealmScorer",
|
||||
"RealmReader",
|
||||
"TFDPRReader",
|
||||
"TFGPT2DoubleHeadsModel",
|
||||
"TFOpenAIGPTDoubleHeadsModel",
|
||||
|
||||
Reference in New Issue
Block a user