Embed Everything: A Method for Efficiently Co-Embedding Multi-Modal Spaces

10/09/2021
by Sarah Di, et al.

Any general artificial intelligence system must be able to interpret, operate on, and produce data in a multi-modal latent space that can represent audio, imagery, text, and more. In the last decade, deep neural networks have seen remarkable success on unimodal data distributions, while transfer learning techniques have driven a massive expansion of model reuse across related domains. However, training multi-modal networks from scratch remains expensive and elusive, while heterogeneous transfer learning (HTL) techniques remain relatively underdeveloped. In this paper, we propose a novel and cost-effective HTL strategy for co-embedding multi-modal spaces. Our method avoids cost inefficiencies by preprocessing embeddings with pretrained models for all components, without passing gradients through these models. We demonstrate the use of this system in a joint image-audio embedding task. Our method has wide-reaching applications, as successfully bridging the gap between different latent spaces could provide a framework for the promised "universal" embedding.
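The core idea the abstract describes — precompute embeddings with frozen pretrained encoders, then train only small projection heads into a shared space — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the array shapes, head sizes, and the use of a symmetric InfoNCE-style contrastive loss are assumptions, and the "precomputed embeddings" are stand-in random arrays in place of real frozen-encoder outputs.

```python
import numpy as np

# Stand-ins for embeddings precomputed by frozen pretrained encoders
# (e.g., an image model and an audio model). No gradients would ever
# flow through those encoders; only the projection heads are trained.
rng = np.random.default_rng(0)
n, d_img, d_aud, d_joint = 8, 512, 128, 64
image_emb = rng.normal(size=(n, d_img))  # precomputed image embeddings
audio_emb = rng.normal(size=(n, d_aud))  # precomputed audio embeddings

# Small trainable projection heads mapping each modality into the
# shared joint space (here: a single linear layer per modality).
W_img = rng.normal(size=(d_img, d_joint)) / np.sqrt(d_img)
W_aud = rng.normal(size=(d_aud, d_joint)) / np.sqrt(d_aud)

def project(x, W):
    """Project into the joint space and L2-normalize rows,
    so dot products between modalities are cosine similarities."""
    z = x @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def contrastive_loss(z_a, z_b, temperature=0.07):
    """Symmetric InfoNCE: row i of each modality is the positive
    pair for row i of the other; all other rows are negatives."""
    logits = z_a @ z_b.T / temperature
    def xent(l):
        # cross-entropy of the diagonal under a row-wise softmax
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))
    return 0.5 * (xent(logits) + xent(logits.T))

z_img = project(image_emb, W_img)
z_aud = project(audio_emb, W_aud)
loss = contrastive_loss(z_img, z_aud)
```

Because the expensive encoders run only once as a preprocessing step, each training iteration touches just the two projection matrices, which is what makes this HTL strategy cheap relative to training a multi-modal network end to end.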


