[Docs] Model_doc structure/clarity improvements (#26876)
* first batch of structure improvements for model_docs * second batch of structure improvements for model_docs * more structure improvements for model_docs * more structure improvements for model_docs * structure improvements for cv model_docs * more structural refactoring * addressed feedback about image processors
This commit is contained in:
@@ -29,12 +29,10 @@ alt="drawing" width="600"/>
|
||||
|
||||
<small> MGP-STR architecture. Taken from the <a href="https://arxiv.org/abs/2209.03592">original paper</a>. </small>
|
||||
|
||||
Tips:
|
||||
MGP-STR is trained on two synthetic datasets [MJSynth]((http://www.robots.ox.ac.uk/~vgg/data/text/)) (MJ) and SynthText(http://www.robots.ox.ac.uk/~vgg/data/scenetext/) (ST) without fine-tuning on other datasets. It achieves state-of-the-art results on six standard Latin scene text benchmarks, including 3 regular text datasets (IC13, SVT, IIIT) and 3 irregular ones (IC15, SVTP, CUTE).
|
||||
This model was contributed by [yuekun](https://huggingface.co/yuekun). The original code can be found [here](https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/OCR/MGP-STR).
|
||||
|
||||
- MGP-STR is trained on two synthetic datasets [MJSynth]((http://www.robots.ox.ac.uk/~vgg/data/text/)) (MJ) and SynthText(http://www.robots.ox.ac.uk/~vgg/data/scenetext/) (ST) without fine-tuning on other datasets. It achieves state-of-the-art results on six standard Latin scene text benchmarks, including 3 regular text datasets (IC13, SVT, IIIT) and 3 irregular ones (IC15, SVTP, CUTE).
|
||||
- This model was contributed by [yuekun](https://huggingface.co/yuekun). The original code can be found [here](https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/OCR/MGP-STR).
|
||||
|
||||
## Inference
|
||||
## Inference example
|
||||
|
||||
[`MgpstrModel`] accepts images as input and generates three types of predictions, which represent textual information at different granularities.
|
||||
The three types of predictions are fused to give the final prediction result.
|
||||
@@ -46,7 +44,7 @@ into a single instance to both extract the input features and decode the predict
|
||||
|
||||
- Step-by-step Optical Character Recognition (OCR)
|
||||
|
||||
``` py
|
||||
```py
|
||||
>>> from transformers import MgpstrProcessor, MgpstrForSceneTextRecognition
|
||||
>>> import requests
|
||||
>>> from PIL import Image
|
||||
|
||||
Reference in New Issue
Block a user