Accurate Word Representations with Universal Visual Guidance

12/30/2020
by Zhuosheng Zhang et al.

Word representation is a fundamental component of neural language understanding models. Recently, pre-trained language models (PrLMs) have offered a new and performant way to obtain contextualized word representations by leveraging sequence-level context in modeling. Although PrLMs generally yield more accurate contextualized word representations than non-contextualized models do, they are still limited to textual context alone and receive no complementary cues from other modalities. This paper therefore proposes a visual representation method that explicitly enhances conventional word embeddings with multiple-aspect senses drawn from visual guidance. Specifically, we build a small-scale word-image dictionary from a multimodal seed dataset, in which each word corresponds to diverse related images. The texts and their paired images are encoded in parallel, followed by an attention layer that integrates the multimodal representations. We show that the method substantially improves the accuracy of disambiguation. Experiments on 12 natural language understanding and machine translation tasks further verify the effectiveness and generalization capability of the proposed approach.
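The abstract only outlines the fusion step: text tokens and dictionary-retrieved images are encoded separately, then combined through attention. Below is a minimal PyTorch sketch of what such an attention-based fusion layer could look like. The module name `VisualGuidedFusion`, the dimensions, head count, and residual design are illustrative assumptions, not details confirmed by the paper.

```python
import torch
import torch.nn as nn

class VisualGuidedFusion(nn.Module):
    """Sketch: fuse PrLM hidden states with per-word image features
    via cross-attention (assumed architecture, not the authors' code)."""

    def __init__(self, text_dim=768, image_dim=2048, num_heads=8):
        super().__init__()
        # Project image features into the text embedding space.
        self.img_proj = nn.Linear(image_dim, text_dim)
        # Cross-attention: text tokens as queries, images as keys/values.
        self.attn = nn.MultiheadAttention(text_dim, num_heads,
                                          batch_first=True)
        self.norm = nn.LayerNorm(text_dim)

    def forward(self, text_states, image_feats):
        # text_states: (batch, seq_len, text_dim), from a PrLM encoder
        # image_feats: (batch, num_images, image_dim), retrieved per word
        #              from the word-image dictionary
        img = self.img_proj(image_feats)
        attended, _ = self.attn(query=text_states, key=img, value=img)
        # Residual fusion of textual and attended visual representations.
        return self.norm(text_states + attended)

# Usage with placeholder tensors standing in for real encoder outputs:
fusion = VisualGuidedFusion()
text_states = torch.randn(2, 16, 768)   # e.g., BERT-base hidden states
image_feats = torch.randn(2, 5, 2048)   # e.g., pooled CNN image features
out = fusion(text_states, image_feats)  # -> (2, 16, 768)
```

The design choice here is standard multimodal practice: projecting images into the text space and adding a residual connection lets the layer fall back to the purely textual representation when the retrieved images carry little signal.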

