
METEOR: Learning Memory and Time Efficient Representations from Multi-modal Data Streams

by   Amila Silva, et al.

Many learning tasks involve multi-modal data streams, where continuous data from different modalities convey a comprehensive description of objects. A major challenge in this context is how to efficiently interpret multi-modal information in complex environments. This has motivated numerous studies on learning unsupervised representations from multi-modal data streams. These studies aim to understand higher-level contextual information (e.g., a Twitter message) by jointly learning embeddings for the lower-level semantic units in different modalities (e.g., text, user, and location of a Twitter message). However, these methods directly associate each low-level semantic unit with a continuous embedding vector, which results in high memory requirements. Hence, deploying and continuously learning such models on low-memory devices (e.g., mobile devices) becomes a problem. To address this problem, we present METEOR, a novel MEmory and Time Efficient Online Representation learning technique, which: (1) learns compact representations for multi-modal data by sharing parameters within semantically meaningful groups and preserves the domain-agnostic semantics; (2) can be accelerated using parallel processes to accommodate different stream rates while capturing the temporal changes of the units; and (3) can be easily extended to capture implicit/explicit external knowledge related to multi-modal data streams. We evaluate METEOR using two types of multi-modal data streams (i.e., social media streams and shopping transaction streams) to demonstrate its ability to adapt to different domains. Our results show that METEOR preserves the quality of the representations while reducing memory usage by around 80% compared to conventional memory-intensive embeddings.
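The memory saving described in point (1) comes from letting many low-level semantic units share a small set of group-level parameters instead of giving every unit its own embedding vector. The following is a minimal illustrative sketch of that general idea, not METEOR's actual algorithm: the `GroupSharedEmbeddings` class and its hash-based grouping are hypothetical stand-ins for the learned, semantically meaningful groups described in the paper.

```python
import numpy as np

class GroupSharedEmbeddings:
    """Toy sketch of group-shared embeddings: all units assigned to the
    same group share one d-dimensional vector, so memory scales with the
    number of groups rather than the vocabulary size."""

    def __init__(self, num_groups, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Only num_groups x dim parameters, versus vocab_size x dim
        # for a conventional per-unit embedding table.
        self.group_vectors = rng.normal(size=(num_groups, dim))
        self.num_groups = num_groups

    def group_of(self, unit):
        # Hypothetical stand-in for a learned grouping: hash the unit
        # identifier into one of the groups.
        return hash(unit) % self.num_groups

    def embed(self, unit):
        # Every unit in the same group returns the same shared vector.
        return self.group_vectors[self.group_of(unit)]

# Example: 100,000 units served by only 64 shared vectors.
emb = GroupSharedEmbeddings(num_groups=64, dim=16)
vec = emb.embed("some_twitter_user")
```

With a vocabulary of 100,000 units and 64 groups, the parameter count drops from 100,000 x 16 to 64 x 16, which is the kind of trade-off (compactness versus per-unit expressiveness) that motivates learning the groups carefully rather than hashing at random.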


Image-Text Multi-Modal Representation Learning by Adversarial Backpropagation

We present a novel method for image-text multi-modal representation learni...

Learning Multi-Modal Nonlinear Embeddings: Performance Bounds and an Algorithm

While many approaches exist in the literature to learn representations f...

Multi-modal gated recurrent units for image description

Using a natural language sentence to describe the content of an image is...

Logographic Information Aids Learning Better Representations for Natural Language Inference

Statistical language models conventionally implement representation lear...

Zero-Shot Multi-Modal Artist-Controlled Retrieval and Exploration of 3D Object Sets

When creating 3D content, highly specialized skills are generally needed...

Bharatanatyam Dance Transcription using Multimedia Ontology and Machine Learning

Indian Classical Dance is an over 5000 years' old multi-modal language f...

Learning Cross-Scale Visual Representations for Real-Time Image Geo-Localization

Robot localization remains a challenging task in GPS denied environments...