[LayoutLMv3] Add TensorFlow implementation (#18678)
Co-authored-by: Esben Toke Christensen <esben.christensen@visma.com>
Co-authored-by: Lasse Reedtz <lasse.reedtz@visma.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
committed by GitHub
parent 7320d95d98
commit de8548ebf3
@@ -38,7 +38,7 @@ The documentation is organized in five parts:
 - **GET STARTED** contains a quick tour and installation instructions to get up and running with 🤗 Transformers.
 - **TUTORIALS** are a great place to begin if you are new to our library. This section will help you gain the basic skills you need to start using 🤗 Transformers.
 - **HOW-TO GUIDES** will show you how to achieve a specific goal like fine-tuning a pretrained model for language modeling or how to create a custom model head.
-- **CONCEPTUAL GUIDES** provides more discussion and explanation of the underlying concepts and ideas behind models, tasks, and the design philosophy of 🤗 Transformers.
+- **CONCEPTUAL GUIDES** provides more discussion and explanation of the underlying concepts and ideas behind models, tasks, and the design philosophy of 🤗 Transformers.
 - **API** describes each class and function, grouped in:

 - **MAIN CLASSES** for the main classes exposing the important APIs of the library.
@@ -245,7 +245,7 @@ Flax), PyTorch, and/or TensorFlow.
 | ImageGPT | ❌ | ❌ | ✅ | ❌ | ❌ |
 | LayoutLM | ✅ | ✅ | ✅ | ✅ | ❌ |
 | LayoutLMv2 | ✅ | ✅ | ✅ | ❌ | ❌ |
-| LayoutLMv3 | ✅ | ✅ | ✅ | ❌ | ❌ |
+| LayoutLMv3 | ✅ | ✅ | ✅ | ✅ | ❌ |
 | LED | ✅ | ✅ | ✅ | ✅ | ❌ |
 | LeViT | ❌ | ❌ | ✅ | ❌ | ❌ |
 | Longformer | ✅ | ✅ | ✅ | ✅ | ❌ |
@@ -26,18 +26,18 @@ Tips:

 - In terms of data processing, LayoutLMv3 is identical to its predecessor [LayoutLMv2](layoutlmv2), except that:
 - images need to be resized and normalized with channels in regular RGB format. LayoutLMv2 on the other hand normalizes the images internally and expects the channels in BGR format.
-- text is tokenized using byte-pair encoding (BPE), as opposed to WordPiece.
+- text is tokenized using byte-pair encoding (BPE), as opposed to WordPiece.
 Due to these differences in data preprocessing, one can use [`LayoutLMv3Processor`] which internally combines a [`LayoutLMv3FeatureExtractor`] (for the image modality) and a [`LayoutLMv3Tokenizer`]/[`LayoutLMv3TokenizerFast`] (for the text modality) to prepare all data for the model.
-- Regarding usage of [`LayoutLMv3Processor`], we refer to the [usage guide](layoutlmv2#usage-layoutlmv2processor) of its predecessor.
+- Regarding usage of [`LayoutLMv3Processor`], we refer to the [usage guide](layoutlmv2#usage-layoutlmv2processor) of its predecessor.
 - Demo notebooks for LayoutLMv3 can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/LayoutLMv3).
 - Demo scripts can be found [here](https://github.com/huggingface/transformers/tree/main/examples/research_projects/layoutlmv3).

 <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/layoutlmv3_architecture.png"
-alt="drawing" width="600"/>
+alt="drawing" width="600"/>

 <small> LayoutLMv3 architecture. Taken from the <a href="https://arxiv.org/abs/2204.08387">original paper</a>. </small>

-This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/microsoft/unilm/tree/master/layoutlmv3).
+This model was contributed by [nielsr](https://huggingface.co/nielsr). The TensorFlow version of this model was added by [chriskoo](https://huggingface.co/chriskoo), [tokec](https://huggingface.co/tokec), and [lre](https://huggingface.co/lre). The original code can be found [here](https://github.com/microsoft/unilm/tree/master/layoutlmv3).


 ## LayoutLMv3Config
@@ -84,3 +84,23 @@ This model was contributed by [nielsr](https://huggingface.co/nielsr). The origi

 [[autodoc]] LayoutLMv3ForQuestionAnswering
 - forward
+
+## TFLayoutLMv3Model
+
+[[autodoc]] TFLayoutLMv3Model
+- call
+
+## TFLayoutLMv3ForSequenceClassification
+
+[[autodoc]] TFLayoutLMv3ForSequenceClassification
+- call
+
+## TFLayoutLMv3ForTokenClassification
+
+[[autodoc]] TFLayoutLMv3ForTokenClassification
+- call
+
+## TFLayoutLMv3ForQuestionAnswering
+
+[[autodoc]] TFLayoutLMv3ForQuestionAnswering
+- call
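The tips in the LayoutLMv3 documentation say that images must arrive already normalized and in RGB channel order, whereas LayoutLMv2 expects BGR and normalizes internally. The sketch below illustrates that difference on a tiny in-memory image using only the standard library. It is not the library's feature extractor: the helper names `bgr_to_rgb` and `normalize_rgb` are hypothetical, the mean and standard deviation of 0.5 per channel are assumed values, and resizing is omitted.

```python
from typing import List, Tuple

# An image is a list of rows; each pixel is a 3-tuple of channel values.
Image = List[List[Tuple[float, float, float]]]

# Assumed normalization constants (hypothetical defaults for this sketch).
MEAN = (0.5, 0.5, 0.5)
STD = (0.5, 0.5, 0.5)


def bgr_to_rgb(image: Image) -> Image:
    """Reverse the channel order of every pixel (BGR -> RGB)."""
    return [[(r, g, b) for (b, g, r) in row] for row in image]


def normalize_rgb(image: Image) -> Image:
    """Scale 0-255 values to [0, 1], then apply (x - mean) / std per channel."""
    return [
        [
            tuple((c / 255.0 - m) / s for c, m, s in zip(pixel, MEAN, STD))
            for pixel in row
        ]
        for row in image
    ]


# A 1x2 image stored in BGR order: a pure blue pixel and a pure red pixel.
bgr_image = [[(255, 0, 0), (0, 0, 255)]]

# Convert to RGB first, then normalize outside the model.
rgb_image = bgr_to_rgb(bgr_image)
pixel_values = normalize_rgb(rgb_image)

print(rgb_image)
print(pixel_values)
```

In practice, [`LayoutLMv3Processor`] is described above as handling both the image and text preparation, so a manual step like this would only matter when images come from a source that yields BGR arrays.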