Visually Grounded Compound PCFGs

09/25/2020
by Yanpeng Zhao, et al.

Exploiting visual groundings for language understanding has recently been drawing much attention. In this work, we study visually grounded grammar induction and learn a constituency parser from both unlabeled text and its visual groundings. Existing work on this task (Shi et al., 2019) optimizes a parser via Reinforce and derives the learning signal only from the alignment of images and sentences. While their model is relatively accurate overall, its error distribution is very uneven, with low performance on certain constituent types (e.g., 26.2 recall on noun phrases, NPs). This is not surprising, as the learning signal is likely insufficient for deriving all aspects of phrase-structure syntax and the gradient estimates are noisy. We show that, using an extension of the probabilistic context-free grammar model, we can do fully differentiable end-to-end visually grounded learning. Additionally, this enables us to complement the image-text alignment loss with a language modeling objective. On the MSCOCO test captions, our model establishes a new state of the art, outperforming its non-grounded version and thus confirming the effectiveness of visual groundings in constituency grammar induction. It also substantially outperforms the previous grounded model, with the largest improvements on more 'abstract' categories (e.g., +55.1 recall).
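To make the training setup described above concrete, here is a minimal, hypothetical sketch of how a fully differentiable objective combining a compound-PCFG language-modeling loss with an image-text alignment loss might look in PyTorch. The class and attribute names (VisuallyGroundedCPCFG, the parser interface, feature dimensions) are illustrative assumptions, not the authors' actual implementation or API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisuallyGroundedCPCFG(nn.Module):
    """Hypothetical sketch: joint LM + image-text alignment training.

    `parser` is assumed to be a compound PCFG whose inside pass is
    differentiable and which returns (lm_loss, span_feats, span_probs).
    """

    def __init__(self, parser, img_dim=2048, txt_dim=512, joint_dim=512, margin=0.2):
        super().__init__()
        self.parser = parser
        self.img_proj = nn.Linear(img_dim, joint_dim)
        self.txt_proj = nn.Linear(txt_dim, joint_dim)
        self.margin = margin

    def matching_score(self, img_feats, span_feats, span_probs):
        # Score an image-caption pair by the expected similarity between the
        # image and the caption's spans, weighting each span by its marginal
        # probability under the induced grammar.
        img = F.normalize(self.img_proj(img_feats), dim=-1)      # (B, D)
        spans = F.normalize(self.txt_proj(span_feats), dim=-1)   # (B, S, D)
        sims = torch.einsum('bd,bsd->bs', img, spans)            # (B, S)
        return (span_probs * sims).sum(dim=-1)                   # (B,)

    def forward(self, captions, img_feats):
        # 1) Language-modeling term: loss from the compound PCFG, computed
        #    with the differentiable inside algorithm (no Reinforce needed).
        lm_loss, span_feats, span_probs = self.parser(captions)

        # 2) Alignment term: hinge loss contrasting matched pairs against
        #    mismatched pairs formed by shifting images within the batch.
        pos = self.matching_score(img_feats, span_feats, span_probs)
        neg = self.matching_score(img_feats.roll(1, dims=0), span_feats, span_probs)
        align_loss = F.relu(self.margin - pos + neg).mean()

        # Both terms are differentiable, so the whole objective can be
        # optimized end to end with ordinary gradient descent.
        return lm_loss.mean() + align_loss
```

The key point the sketch tries to convey is that, unlike the Reinforce-based approach, both loss terms here admit exact gradients, so the alignment signal and the language-modeling signal can be combined in a single end-to-end objective.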


