Add DocumentQuestionAnswering pipeline (#18414)

* [WIP] Skeleton of VisualQuestionAnweringPipeline extended to support LayoutLM-like models

* Fixup

* Use the full encoding

* Basic refactoring to DocumentQuestionAnsweringPipeline

* Cleanup

* Improve args, docs, and implement preprocessing

* Integrate OCR

* Refactor question_answering pipeline

* Use refactored QA code in the document qa pipeline

* Fix tests

* Some small cleanups

* Use a string type annotation for Image.Image

* Update encoding with image features

* Wire through the basic docs

* Handle invalid response

* Handle empty word_boxes properly

* Docstring fix

* Integrate Donut model

* Fixup

* Incorporate comments

* Address comments

* Initial incorporation of tests

* Address Comments

* Change assert to ValueError

* Comments

* Wrap `score` in float to make it JSON serializable

* Incorporate AutoModeLForDocumentQuestionAnswering changes

* Fixup

* Rename postprocess function

* Fix auto import

* Applying comments

* Improve docs

* Remove extra assets and add copyright

* Address comments

Co-authored-by: Ankur Goyal <ankur@impira.com>
This commit is contained in:
Ankur Goyal
2022-09-07 10:38:49 -07:00
committed by GitHub
parent 3059d80d80
commit 2ef7742117
18 changed files with 962 additions and 139 deletions

View File

@@ -25,6 +25,7 @@ There are two categories of pipeline abstractions to be aware about:
- [`AudioClassificationPipeline`]
- [`AutomaticSpeechRecognitionPipeline`]
- [`ConversationalPipeline`]
- [`DocumentQuestionAnsweringPipeline`]
- [`FeatureExtractionPipeline`]
- [`FillMaskPipeline`]
- [`ImageClassificationPipeline`]
@@ -342,6 +343,12 @@ That should enable you to do all the custom code you want.
- __call__
- all
### DocumentQuestionAnsweringPipeline
[[autodoc]] DocumentQuestionAnsweringPipeline
- __call__
- all
### FeatureExtractionPipeline
[[autodoc]] FeatureExtractionPipeline

View File

@@ -114,6 +114,10 @@ Likewise, if your `NewModel` is a subclass of [`PreTrainedModel`], make sure its
[[autodoc]] AutoModelForTableQuestionAnswering
## AutoModelForDocumentQuestionAnswering
[[autodoc]] AutoModelForDocumentQuestionAnswering
## AutoModelForImageClassification
[[autodoc]] AutoModelForImageClassification
@@ -214,6 +218,10 @@ Likewise, if your `NewModel` is a subclass of [`PreTrainedModel`], make sure its
[[autodoc]] TFAutoModelForTableQuestionAnswering
## TFAutoModelForDocumentQuestionAnswering
[[autodoc]] TFAutoModelForDocumentQuestionAnswering
## TFAutoModelForTokenClassification
[[autodoc]] TFAutoModelForTokenClassification