Comparing Trajectory and Vision Modalities for Verb Representation

03/08/2023
by Dylan Ebert, et al.

Three-dimensional trajectories, or the 3D position and rotation of objects over time, have been shown to encode key aspects of verb semantics (e.g., the meanings of roll vs. slide). However, most multimodal models in NLP use 2D images as representations of the world. Given the importance of 3D space in formal models of verb semantics, we expect that these 2D images would result in impoverished representations that fail to capture nuanced differences in meaning. This paper tests this hypothesis directly in controlled experiments. We train self-supervised image and trajectory encoders, and then evaluate them on the extent to which each learns to differentiate verb concepts. Contrary to our initial expectations, we find that 2D visual modalities perform similarly well to 3D trajectories. While further work should be conducted on this question, our initial findings challenge the conventional wisdom that richer environment representations necessarily translate into better representation learning for language.

