VLGrammar: Grounded Grammar Induction of Vision and Language

03/24/2021
by Yining Hong, et al.

Cognitive grammar suggests that the acquisition of language grammar is grounded in visual structures. Grammar is an essential representation of natural language, and it is also ubiquitous in vision, where it represents hierarchical part-whole structure. In this work, we study grounded grammar induction of vision and language in a joint learning framework. Specifically, we present VLGrammar, a method that uses compound probabilistic context-free grammars (compound PCFGs) to induce the language grammar and the image grammar simultaneously. We propose a novel contrastive learning framework to guide the joint learning of both modules. To provide a benchmark for the grounded grammar induction task, we collect a large-scale dataset, PartIt, which contains human-written sentences that describe part-level semantics for 3D objects. Experiments on the PartIt dataset show that VLGrammar outperforms all baselines in both image grammar induction and language grammar induction. The learned grammar also benefits related downstream tasks: it improves unsupervised image clustering accuracy by 30% and performs well in image retrieval and text retrieval. Notably, the induced grammar generalizes readily to unseen categories.
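For readers unfamiliar with the formalism, the following is a minimal sketch of a plain PCFG and the inside algorithm that scores a sentence under it. It is not the VLGrammar model: the toy grammar, the rule probabilities, and the names `BINARY`, `LEXICAL`, and `inside_probability` are all assumptions made for illustration. A compound PCFG would additionally make the rule probabilities depend on a per-sentence latent variable, which this sketch omits.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (not taken from the paper).
# Binary rules: (parent, left child, right child) -> probability.
BINARY = {
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 1.0,
    ("NP", "Det", "N"): 0.6,
}
# Lexical rules: (parent, word) -> probability.
LEXICAL = {
    ("NP", "legs"): 0.4,
    ("Det", "the"): 1.0,
    ("N", "chair"): 0.5,
    ("N", "legs"): 0.5,
    ("V", "has"): 1.0,
}


def inside_probability(words, root="S"):
    """Return the total probability that `root` derives `words`,
    summed over all parse trees (the inside algorithm)."""
    n = len(words)
    # chart[(i, j, A)] = probability that nonterminal A derives words[i:j]
    chart = defaultdict(float)

    # Base case: spans of width 1 are filled from the lexical rules.
    for i, w in enumerate(words):
        for (parent, word), p in LEXICAL.items():
            if word == w:
                chart[(i, i + 1, parent)] += p

    # Recursive case: combine two adjacent sub-spans with a binary rule.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point
                for (parent, left, right), p in BINARY.items():
                    p_left = chart.get((i, k, left), 0.0)
                    p_right = chart.get((k, j, right), 0.0)
                    if p_left and p_right:
                        chart[(i, j, parent)] += p * p_left * p_right

    return chart.get((0, n, root), 0.0)


if __name__ == "__main__":
    print(inside_probability("the chair has legs".split()))
```

In grammar induction the rule probabilities are not given as above but are learned, typically by maximizing this sentence likelihood over a corpus; the same dynamic program can in principle be applied to a sequence of image parts instead of words.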

