LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection

07/26/2022
by Zhuo Chen, et al.

Visual question answering (VQA) often requires an understanding of visual concepts and language semantics, which in turn relies on external knowledge. Most existing methods exploit pre-trained language models and/or unstructured text, but the knowledge in these resources is often incomplete and noisy. Some methods instead use knowledge graphs (KGs), which often contain intensive structured knowledge, but this line of research is still quite preliminary. In this paper, we propose LaKo, a knowledge-driven VQA method via Late Knowledge-to-Text Injection. To effectively incorporate an external KG, we convert triples into text and propose a late injection mechanism. Finally, we address VQA as a text generation task with an effective encoder-decoder paradigm. In evaluations on OKVQA datasets, our method achieves state-of-the-art results.
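
The abstract's pipeline (verbalize KG triples, retrieve the relevant ones, and feed them as text to an encoder-decoder generator) can be illustrated with a minimal sketch. This is one possible reading, not the authors' implementation: the verbalization template, the token-overlap retrieval score, the `question: ... context: ... knowledge: ...` input layout, and every function name below are assumptions made for illustration. The abstract does not specify any of them.

```python
from typing import List, Set, Tuple

# A KG triple as (head, relation, tail); this layout is an assumption.
Triple = Tuple[str, str, str]


def triple_to_text(triple: Triple) -> str:
    """Verbalize a triple with a naive template (hypothetical)."""
    head, relation, tail = triple
    return f"{head} {relation.replace('_', ' ')} {tail}."


def _tokens(text: str) -> Set[str]:
    """Lowercased word set with surrounding punctuation stripped."""
    stripped = (tok.strip(".,?!").lower() for tok in text.split())
    return {tok for tok in stripped if tok}


def retrieve(query: str, triples: List[Triple], k: int = 2) -> List[str]:
    """Rank verbalized triples by token overlap with the query.

    Token overlap stands in for whatever retriever a real system would use.
    """
    query_tokens = _tokens(query)
    sentences = [triple_to_text(t) for t in triples]
    ranked = sorted(
        sentences,
        key=lambda s: len(query_tokens & _tokens(s)),
        reverse=True,
    )
    return ranked[:k]


def build_generator_input(
    question: str, caption: str, triples: List[Triple], k: int = 2
) -> str:
    """Inject retrieved knowledge as plain text at the input-building stage.

    The returned string would be passed to a text-to-text generator;
    the field labels are illustrative only.
    """
    knowledge = retrieve(f"{question} {caption}", triples, k)
    return (
        f"question: {question} "
        f"context: {caption} "
        f"knowledge: {' '.join(knowledge)}"
    )


if __name__ == "__main__":
    toy_kg = [
        ("banana", "is_a", "fruit"),
        ("dog", "capable_of", "bark"),
        ("banana", "has_property", "yellow"),
    ]
    print(build_generator_input(
        "What color is the banana?", "a banana on a table", toy_kg
    ))
```

In this sketch the image is represented only by a caption string, and the knowledge enters the model as ordinary text next to the question, so the answer can be produced by any sequence-to-sequence generator without a dedicated graph encoder.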


