Update old existing feature extractor references (#24552)

* Update old existing feature extractor references

* Typo

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

* Address comments from review - update 'feature extractor'
Co-authored by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
This commit is contained in:
amyeroberts
2023-06-29 10:17:36 +01:00
committed by GitHub
parent 10c2ac7bc6
commit ae454f41d4
138 changed files with 762 additions and 743 deletions

View File

@@ -150,23 +150,23 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
## Usage: LayoutLMv2Processor
The easiest way to prepare data for the model is to use [`LayoutLMv2Processor`], which internally
combines a feature extractor ([`LayoutLMv2FeatureExtractor`]) and a tokenizer
([`LayoutLMv2Tokenizer`] or [`LayoutLMv2TokenizerFast`]). The feature extractor
combines a image processor ([`LayoutLMv2ImageProcessor`]) and a tokenizer
([`LayoutLMv2Tokenizer`] or [`LayoutLMv2TokenizerFast`]). The image processor
handles the image modality, while the tokenizer handles the text modality. A processor combines both, which is ideal
for a multi-modal model like LayoutLMv2. Note that you can still use both separately, if you only want to handle one
modality.
```python
from transformers import LayoutLMv2FeatureExtractor, LayoutLMv2TokenizerFast, LayoutLMv2Processor
from transformers import LayoutLMv2ImageProcessor, LayoutLMv2TokenizerFast, LayoutLMv2Processor
feature_extractor = LayoutLMv2FeatureExtractor() # apply_ocr is set to True by default
image_processor = LayoutLMv2ImageProcessor() # apply_ocr is set to True by default
tokenizer = LayoutLMv2TokenizerFast.from_pretrained("microsoft/layoutlmv2-base-uncased")
processor = LayoutLMv2Processor(feature_extractor, tokenizer)
processor = LayoutLMv2Processor(image_processor, tokenizer)
```
In short, one can provide a document image (and possibly additional data) to [`LayoutLMv2Processor`],
and it will create the inputs expected by the model. Internally, the processor first uses
[`LayoutLMv2FeatureExtractor`] to apply OCR on the image to get a list of words and normalized
[`LayoutLMv2ImageProcessor`] to apply OCR on the image to get a list of words and normalized
bounding boxes, as well to resize the image to a given size in order to get the `image` input. The words and
normalized bounding boxes are then provided to [`LayoutLMv2Tokenizer`] or
[`LayoutLMv2TokenizerFast`], which converts them to token-level `input_ids`,
@@ -176,7 +176,7 @@ which are turned into token-level `labels`.
[`LayoutLMv2Processor`] uses [PyTesseract](https://pypi.org/project/pytesseract/), a Python
wrapper around Google's Tesseract OCR engine, under the hood. Note that you can still use your own OCR engine of
choice, and provide the words and normalized boxes yourself. This requires initializing
[`LayoutLMv2FeatureExtractor`] with `apply_ocr` set to `False`.
[`LayoutLMv2ImageProcessor`] with `apply_ocr` set to `False`.
In total, there are 5 use cases that are supported by the processor. Below, we list them all. Note that each of these
use cases work for both batched and non-batched inputs (we illustrate them for non-batched inputs).
@@ -184,7 +184,7 @@ use cases work for both batched and non-batched inputs (we illustrate them for n
**Use case 1: document image classification (training, inference) + token classification (inference), apply_ocr =
True**
This is the simplest case, in which the processor (actually the feature extractor) will perform OCR on the image to get
This is the simplest case, in which the processor (actually the image processor) will perform OCR on the image to get
the words and normalized bounding boxes.
```python
@@ -205,7 +205,7 @@ print(encoding.keys())
**Use case 2: document image classification (training, inference) + token classification (inference), apply_ocr=False**
In case one wants to do OCR themselves, one can initialize the feature extractor with `apply_ocr` set to
In case one wants to do OCR themselves, one can initialize the image processor with `apply_ocr` set to
`False`. In that case, one should provide the words and corresponding (normalized) bounding boxes themselves to
the processor.