Grounding Symbols in Multi-Modal Instructions

06/01/2017
by Yordan Hristov, et al.

As robots begin to cohabit with humans in semi-structured environments, the need arises to understand instructions involving rich variability, for instance, learning to ground symbols in the physical world. Realistically, this task must cope with small datasets consisting of a particular user's contextual assignment of meaning to terms. We present a method for processing a raw stream of cross-modal input, i.e., linguistic instructions, visual perception of a scene, and a concurrent trace of 3D eye-tracking fixations, to produce a segmentation of objects with a corresponding association to high-level concepts. To test our framework, we present experiments in a table-top object-manipulation scenario. Our results show that our model learns the user's notion of colour and shape from a small number of physical demonstrations, generalising to identify physical referents for novel combinations of the words.
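The abstract describes associating words such as colour and shape terms with perceptual features of fixated objects, learned from only a few demonstrations. The following is a minimal illustrative sketch of that idea, not the authors' actual pipeline: each word is grounded as the mean feature vector of the objects it co-occurred with, and a novel word combination (e.g. "blue cube", never seen as a pair during training) is resolved to the nearest candidate object. All function names and the toy feature encoding are invented for illustration.

```python
# Illustrative sketch only (not the paper's implementation): ground each
# word as the mean of the feature vectors it co-occurred with, then
# resolve a novel word combination to the nearest candidate object.
import numpy as np

def learn_groundings(demonstrations):
    """demonstrations: list of (words, feature_vector) pairs, where the
    feature vector concatenates simple colour and shape cues."""
    per_word = {}
    for words, feats in demonstrations:
        for w in words:
            per_word.setdefault(w, []).append(np.asarray(feats, float))
    # prototype per word = mean of its associated feature vectors
    return {w: np.mean(v, axis=0) for w, v in per_word.items()}

def resolve_referent(instruction_words, candidates, groundings):
    """Return the index of the candidate object whose features are
    closest to the combined prototypes of the instruction's words."""
    target = np.mean([groundings[w] for w in instruction_words], axis=0)
    dists = [np.linalg.norm(np.asarray(c, float) - target)
             for c in candidates]
    return int(np.argmin(dists))

# toy features: [red, green, blue, roundness]
demos = [
    (["red", "ball"],   [0.9, 0.1, 0.1, 0.9]),
    (["blue", "ball"],  [0.1, 0.1, 0.9, 0.9]),
    (["red", "cube"],   [0.9, 0.1, 0.1, 0.1]),
    (["green", "cube"], [0.1, 0.9, 0.1, 0.1]),
]
groundings = learn_groundings(demos)

# "blue cube" is a novel combination: both words were seen, but never together
objects = [[0.9, 0.1, 0.1, 0.1],   # a red cube
           [0.1, 0.1, 0.9, 0.1]]  # a blue cube
print(resolve_referent(["blue", "cube"], objects, groundings))  # → 1
```

The sketch deliberately keeps the feature space hand-crafted; the paper instead segments objects and learns the relevant perceptual dimensions from the raw cross-modal stream, with eye-tracking fixations supplying the word-to-object pairing that the `demos` list hard-codes here.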


