VQA task guide (#25244)

* initial commit * semi-finished task guide draft * image link * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/tasks/visual_question_answering.md Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * feedback addressed * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * nits addressed --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-08-09 08:29:06 -04:00
parent eb3ded16f7
commit f2a43c7383
2 changed files with 403 additions and 0 deletions
--- a/docs/source/en/_toctree.yml
+++ b/docs/source/en/_toctree.yml
@@ -75,6 +75,8 @@
        title: Image captioning
      - local: tasks/document_question_answering
        title: Document Question Answering
+      - local: tasks/visual_question_answering
+        title: Visual Question Answering
      - local: tasks/text-to-speech
        title: Text to speech
    title: Multimodal