Is GPT-3 all you need for Visual Question Answering in Cultural Heritage?

07/25/2022
by Pietro Bongini, et al.

The use of Deep Learning and Computer Vision in the Cultural Heritage domain has become highly relevant in the last few years, with many applications such as audio smart guides, interactive museums and augmented reality. All these technologies require large amounts of data to work effectively and be useful for the user. In the context of artworks, such data is annotated by experts in an expensive and time-consuming process. In particular, for each artwork, an image of the artwork and a description sheet have to be collected in order to perform common tasks like Visual Question Answering. In this paper we propose a method for Visual Question Answering that generates a description sheet at runtime, which can be used to answer both visual and contextual questions about the artwork, completely avoiding the need for the image and the annotation process. For this purpose, we investigate the use of GPT-3 for generating descriptions of artworks, analyzing the quality of the generated descriptions through captioning metrics. Finally, we evaluate the performance on Visual Question Answering and captioning tasks.
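
As a rough illustration of what a captioning metric measures when a generated description is compared with an expert-written one, the sketch below computes clipped unigram precision, the core quantity behind BLEU-1 (without the brevity penalty). The abstract does not say which captioning metrics the authors use, so this choice is an assumption; the function name and the two example sentences are hypothetical and are not taken from the paper.

```python
from collections import Counter


def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision of a candidate text against one reference.

    Illustrative only: whitespace tokenization, lowercasing, no brevity penalty.
    """
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0
    cand_counts = Counter(cand_tokens)
    ref_counts = Counter(reference.lower().split())
    # Credit each candidate word at most as many times as it occurs in the reference.
    matched = sum(min(count, ref_counts[word]) for word, count in cand_counts.items())
    return matched / len(cand_tokens)


# Hypothetical example texts (not from the paper).
generated = "a portrait of a woman painted in oil"
expert = "an oil portrait of a seated woman"
score = unigram_precision(generated, expert)
```

A higher score means more of the generated wording also appears in the expert description; it says nothing about factual correctness, which is why such metrics are usually reported alongside task results like question answering accuracy.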

