Integrate Image Representation to Text Model on Sentence Level: a Semi-supervised Framework

12/01/2019
by   Lisai Zhang, et al.
0

Integrating visual features has been proved useful in language representation learning. Nevertheless, in most existing multi-modality models, alignment of visual and textual data is prerequisite. In this paper, we propose a novel semi-supervised visual integration framework for sentence level language representation. The uniqueness include: 1) the integration is conducted via a semi-supervised approach, which can bring image to textual NLU tasks by pre-training a visualization network, 2) visual representations are dynamically integrated in both training and predicting stages. To verify the efficacy of the proposed framework, we conduct the experiments on the SemEval 2018 Task 11 and reach new state-of-the-art on this reading comprehension task. Considering that the visual integration framework only requires image database, and no extra alignment is required for training and prediction, it provides an efficient and feasible method for multi-modality language learning.

READ FULL TEXT

page 4

page 6

research
09/04/2021

LAViTeR: Learning Aligned Visual and Textual Representations Assisted by Image and Caption Generation

Pre-training visual and textual representations from large-scale image-t...
research
01/29/2022

MVPTR: Multi-Stage Vision-Language Pre-Training via Multi-Level Semantic Alignment

In this paper, we propose a Multi-stage Vision-language Pre-TRaining (MV...
research
02/23/2021

Enhanced Modality Transition for Image Captioning

Image captioning model is a cross-modality knowledge discovery task, whi...
research
12/20/2020

Transductive Visual Verb Sense Disambiguation

Verb Sense Disambiguation is a well-known task in NLP, the aim is to fin...
research
06/11/2020

Large-Scale Adversarial Training for Vision-and-Language Representation Learning

We present VILLA, the first known effort on large-scale adversarial trai...
research
06/04/2019

A Cross-Sentence Latent Variable Model for Semi-Supervised Text Sequence Matching

We present a latent variable model for predicting the relationship betwe...
research
08/20/2018

Multi-Perspective Context Aggregation for Semi-supervised Cloze-style Reading Comprehension

Cloze-style reading comprehension has been a popular task for measuring ...

Please sign up or login with your details

Forgot password? Click here to reset