
"Show me the cup": Reference with Continuous Representations

by Gemma Boleda, et al.

One of the most basic functions of language is to refer to objects in a shared scene. Modeling reference with continuous representations is challenging because it requires individuation, i.e., tracking and distinguishing an arbitrary number of referents. We introduce a neural network model that, given a definite description and a set of objects represented by natural images, points to the intended object if the expression has a unique referent, or indicates a failure if it does not. The model, trained directly on reference acts, is competitive with a manually engineered pipeline for the same task, both when referents are purely visual and when they are characterized by a combination of visual and linguistic properties.
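To make the task's input and output concrete, here is a minimal sketch of the decision it describes: point to an object when exactly one candidate matches the description, and signal failure otherwise. This is not the paper's neural model. It is a hand-written thresholding rule, closer in spirit to an engineered pipeline, and the names `cosine` and `resolve_reference`, the toy vectors, and the threshold of 0.8 are all assumptions chosen for illustration.

```python
from math import sqrt
from typing import Optional, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def resolve_reference(
    query: Vector,
    objects: Sequence[Vector],
    threshold: float = 0.8,  # hypothetical cut-off, not taken from the paper
) -> Optional[int]:
    """Return the index of the unique matching object, or None on failure.

    Failure covers both cases named in the abstract's task definition:
    no object matches the description, or more than one object does.
    """
    matches = [
        i for i, obj in enumerate(objects) if cosine(query, obj) >= threshold
    ]
    return matches[0] if len(matches) == 1 else None


# Toy stand-ins for a description embedding and image embeddings (assumed).
cup = [1.0, 0.0, 0.1]
another_cup = [0.9, 0.1, 0.0]
dog = [0.0, 1.0, 0.0]

unique = resolve_reference(cup, [dog, cup])                 # one cup in the scene
ambiguous = resolve_reference(cup, [cup, another_cup, dog])  # two cups
missing = resolve_reference(cup, [dog])                      # no cup
```

With these toy vectors, `unique` is the index of the cup, while `ambiguous` and `missing` are both `None`. The hard part the abstract points to, individuation, is handled here by an explicit count of matches, which is exactly the kind of discrete step a model over continuous representations has to learn rather than be given.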



Communicating Semantics: Reference by Description

Messages often refer to entities such as people, places and events. Corr...

Living a discrete life in a continuous world: Reference with distributed representations

Reference is a crucial property of language that allows us to connect li...

Reproducibility Report for "Learning To Count Objects In Natural Images For Visual Question Answering"

This is the reproducibility report for the paper "Learning To Count Obje...

Characterization of Visual Object Representations in Rat Primary Visual Cortex

For most animal species, quick and reliable identification of visual obj...

Language Grounding with 3D Objects

Seemingly simple natural language requests to a robot are generally unde...

Resolving References to Objects in Photographs using the Words-As-Classifiers Model

A common use of language is to refer to visually present objects. Modell...

Overestimation learning with guarantees

We describe a complete method that learns a neural network which is guar...

Code Repositories


From sense to reference
