HyperLearn: A Distributed Approach for Representation Learning in Datasets With Many Modalities

09/19/2019
by Devanshu Arya, et al.

Multimodal datasets contain an enormous amount of relational information, which grows exponentially as new modalities are introduced. Learning representations in such a scenario is inherently complex due to the presence of multiple heterogeneous information channels. These channels can encode both (a) inter-relations between items of different modalities and (b) intra-relations between items of the same modality. Encoding multimedia items into a continuous low-dimensional semantic space such that both types of relations are captured and preserved is extremely challenging, especially if the goal is a unified end-to-end learning framework. Two key challenges must be addressed: 1) the framework must merge complex intra- and inter-relations without losing valuable information, and 2) the learning model must be invariant to the addition of new and potentially very different modalities. In this paper, we propose a flexible framework that can scale to data streams from many modalities. To that end, we introduce a hypergraph-based model for data representation and deploy Graph Convolutional Networks to fuse relational information within and across modalities. Our approach provides an efficient solution for distributing otherwise extremely computationally expensive, or even infeasible, training processes across multiple GPUs without sacrificing accuracy. Moreover, adding a new modality to our model requires only an additional GPU while keeping computational time unchanged, which brings representation learning to truly multimodal datasets. We demonstrate the feasibility of our approach in experiments on multimedia datasets featuring second-, third- and fourth-order relations.
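To make the hypergraph-plus-graph-convolution idea concrete, the sketch below implements a standard spectral hypergraph-convolution layer, X' = ReLU(Dv^{-1/2} H De^{-1} H^T Dv^{-1/2} X Θ), where H is the item-by-hyperedge incidence matrix. This is a generic formulation with identity hyperedge weights, not the paper's distributed HyperLearn architecture; the toy incidence matrix, feature sizes, and ReLU activation are illustrative assumptions.

```python
import numpy as np

def hypergraph_conv(X, H, Theta):
    """One spectral hypergraph-convolution layer (generic sketch):
    X' = ReLU(Dv^{-1/2} H De^{-1} H^T Dv^{-1/2} X Theta),
    assuming identity hyperedge weights for simplicity."""
    dv = H.sum(axis=1)                       # vertex degrees
    de = H.sum(axis=0)                       # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(dv, 1e-12)))
    De_inv = np.diag(1.0 / np.maximum(de, 1e-12))
    # Normalized vertex-to-vertex propagation through hyperedges
    A = Dv_inv_sqrt @ H @ De_inv @ H.T @ Dv_inv_sqrt
    return np.maximum(A @ X @ Theta, 0.0)    # ReLU activation

# Toy example: 4 items, 2 hyperedges (e.g. one intra-modal, one cross-modal)
H = np.array([[1, 0],
              [1, 1],   # this item participates in both hyperedges
              [0, 1],
              [1, 0]], dtype=float)          # incidence matrix (items x edges)
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))              # item features
Theta = rng.standard_normal((8, 16))         # learnable projection
Z = hypergraph_conv(X, H, Theta)
print(Z.shape)                               # (4, 16)
```

Because a hyperedge can connect any number of items, the same operator handles the second-, third- and fourth-order relations mentioned above without changing the layer definition.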


Related research:

- Hierarchical Cross-Modality Semantic Correlation Learning Model for Multimodal Summarization (12/16/2021): Multimodal summarization with multimodal output (MSMO) generates a summa...
- MultiBench: Multiscale Benchmarks for Multimodal Representation Learning (07/15/2021): Learning multimodal representations involves integrating information fro...
- A Universal Model for Cross Modality Mapping by Relational Reasoning (02/26/2021): With the aim of matching a pair of instances from two different modaliti...
- Geometric Multimodal Deep Learning with Multi-Scaled Graph Wavelet Convolutional Network (11/26/2021): Multimodal data provide complementary information of a natural phenomeno...
- Multimodal similarity-preserving hashing (07/06/2012): We introduce an efficient computational framework for hashing data belon...
- HMS: Hierarchical Modality Selection for Efficient Video Recognition (04/20/2021): Videos are multimodal in nature. Conventional video recognition pipeline...
- Multimodal Representation Learning using Deep Multiset Canonical Correlation (04/03/2019): We propose Deep Multiset Canonical Correlation Analysis (dMCCA) as an ex...
