Deep Multi-Modal Sets

03/03/2020
by   Austin Reiter, et al.

Many vision-related tasks benefit from reasoning over multiple modalities to leverage complementary views of data in an attempt to learn robust embedding spaces. Most deep learning-based methods rely on a late-fusion technique whereby multiple feature types are encoded and concatenated, and then a multilayer perceptron (MLP) combines the fused embedding to make predictions. This has several limitations: it unnaturally enforces that all features be present at all times, and it constrains each feature modality to a fixed number of occurrences at any given time. Furthermore, as more modalities are added, the concatenated embedding grows. To mitigate this, we propose Deep Multi-Modal Sets: a technique that represents a collection of features as an unordered set rather than one long, ever-growing fixed-size vector. The set is constructed so that we have invariance both to permutations of the feature modalities and to the cardinality of the set. We also show that with particular choices in our model architecture, we can yield interpretable feature importance, such that at inference time we can observe which modalities contribute most to the prediction. With this in mind, we demonstrate a scalable, multi-modal framework that reasons over different modalities to learn various types of tasks. We demonstrate new state-of-the-art performance on two multi-modal datasets (Ads-Parallelity [34] and MM-IMDb [1]).
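The set-based fusion idea described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the shared linear-plus-tanh encoder and element-wise max pooling are assumptions chosen to show the two invariances (to modality ordering and to set cardinality) and the simple interpretability signal that max pooling affords.

```python
import numpy as np

def encode(features, W):
    # Assumed encoder: project a modality's raw feature vector
    # into a shared embedding space (the paper's encoders may differ).
    return np.tanh(features @ W)

def set_pool(embeddings):
    # Element-wise max over the set of modality embeddings:
    # invariant to both the order and the number of set members.
    return np.max(np.stack(embeddings), axis=0)

rng = np.random.default_rng(0)
dim_in, dim_emb = 8, 4
W = rng.normal(size=(dim_in, dim_emb))

# Two examples with different numbers of available modalities.
mods_a = [rng.normal(size=dim_in) for _ in range(3)]
mods_b = [rng.normal(size=dim_in) for _ in range(5)]

pooled_a = set_pool([encode(m, W) for m in mods_a])
pooled_b = set_pool([encode(m, W) for m in mods_b])

# Permuting the set leaves the pooled embedding unchanged.
pooled_a_perm = set_pool([encode(m, W) for m in reversed(mods_a)])
assert np.allclose(pooled_a, pooled_a_perm)

# Both pooled embeddings have the same fixed size despite
# different set cardinalities, so no concatenation growth.
assert pooled_a.shape == pooled_b.shape == (dim_emb,)

# With max pooling, the argmax over the set shows which modality
# "won" each embedding dimension -- a simple interpretability signal.
winners = np.argmax(np.stack([encode(m, W) for m in mods_a]), axis=0)
```

A downstream MLP would then consume the fixed-size pooled vector regardless of how many modalities were present for a given example.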

