Table and Image Generation for Investigating Knowledge of Entities in Pre-trained Vision and Language Models

06/03/2023
by Hidetaka Kamigaito, et al.

In this paper, we propose a table and image generation task to verify how knowledge about entities acquired from natural language is retained in Vision & Language (V&L) models. The task consists of two parts: the first is to generate a table containing knowledge about an entity and its related image, and the second is to generate an image from an entity with a caption and a table containing related knowledge of the entity. In both parts, the model must know the entities involved in order to perform the generation properly. We created the Wikipedia Table and Image Generation (WikiTIG) dataset from about 200,000 infoboxes in English Wikipedia articles to perform the proposed tasks. We evaluated performance on the tasks with respect to the above research question using the V&L model OFA, which has achieved state-of-the-art results in multiple tasks. Experimental results show that OFA forgets part of its entity knowledge through the pre-training it undergoes as a complement to improve performance on image-related tasks.

research
02/27/2022

A Simple but Effective Pluggable Entity Lookup Table for Pre-trained Language Models

Pre-trained language models (PLMs) cannot well recall rich factual knowl...
research
05/05/2022

Entity Cloze By Date: What LMs Know About Unseen Entities

Language models (LMs) are typically trained once on a large-scale corpus...
research
04/28/2022

Instilling Type Knowledge in Language Models via Multi-Task QA

Understanding human language often necessitates understanding entities a...
research
08/18/2021

Table Caption Generation in Scholarly Documents Leveraging Pre-trained Language Models

This paper addresses the problem of generating table captions for schola...
research
09/09/2019

Improving Neural Question Generation using World Knowledge

In this paper, we propose a method for incorporating world knowledge (li...
research
02/22/2023

Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities

Large-scale multi-modal pre-training models such as CLIP and PaLI exhibi...
research
02/03/2022

Towards Coherent and Consistent Use of Entities in Narrative Generation

Large pre-trained language models (LMs) have demonstrated impressive cap...
