A Corpus for Reasoning About Natural Language Grounded in Photographs

11/01/2018
by Alane Suhr, et al.

We introduce a new dataset for joint reasoning about language and vision. The data contains 107,296 examples of English sentences paired with web photographs. The task is to determine whether a natural language caption is true about a photograph. We present an approach for finding visually complex images and crowdsourcing linguistically diverse captions. Qualitative analysis shows the data requires complex reasoning about quantities, comparisons, and relationships between objects. Evaluation of state-of-the-art visual reasoning methods shows the data is a challenge for current methods.
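
The task described above is a binary true/false decision per caption-photograph pair. As a minimal sketch of that framing (not the authors' code), the snippet below models one example and scores a majority-class baseline by accuracy. The `Example` class, its field names, the image identifiers, and the three toy captions are all hypothetical and do not come from the dataset.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One caption-photograph pair (hypothetical schema, for illustration only)."""
    caption: str   # English sentence
    image_id: str  # made-up identifier for the paired photograph
    label: bool    # True if the caption is true about the photograph


def majority_label(examples):
    """Return the most frequent label among the given examples."""
    counts = Counter(ex.label for ex in examples)
    return counts.most_common(1)[0][0]


def accuracy(predictions, examples):
    """Fraction of predictions that match the gold labels."""
    correct = sum(p == ex.label for p, ex in zip(predictions, examples))
    return correct / len(examples)


# Toy examples invented for this sketch; they are not taken from the corpus.
toy = [
    Example("There are two dogs.", "img-0", True),
    Example("One bottle is taller than the other.", "img-1", False),
    Example("The left bowl is empty.", "img-2", True),
]

baseline = majority_label(toy)
predictions = [baseline] * len(toy)
score = accuracy(predictions, toy)
```

A baseline of this kind ignores both the caption and the image, so it gives a floor against which a visual reasoning model's accuracy can be compared.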

11/29/2018
Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments
We study the problem of jointly reasoning about language and vision thro...

10/02/2017
Visual Reasoning with Natural Language
Natural language provides a widely accessible and expressive interface f...

04/18/2018
Object Ordering with Bidirectional Matchings for Visual Reasoning
Visual reasoning with compositional natural language instructions, e.g.,...

06/04/2019
How Large Are Lions? Inducing Distributions over Quantitative Attributes
Most current NLP systems have little knowledge about quantitative attrib...

03/31/2013
A cookbook of translating English to Xapi
The Xapagy cognitive architecture had been designed to perform narrative...

11/20/2018
QuaRel: A Dataset and Models for Answering Questions about Qualitative Relationships
Many natural language questions require recognizing and reasoning with q...

09/30/2020
TaxiNLI: Taking a Ride up the NLU Hill
Pre-trained Transformer-based neural architectures have consistently ach...