XFlow: 1D-2D Cross-modal Deep Neural Networks for Audiovisual Classification

09/02/2017
by Cătălina Cangea, et al.

We propose two multimodal deep learning architectures that allow for cross-modal dataflow (XFlow) between the feature extractors, thereby extracting more interpretable features and obtaining a better representation than through unimodal learning, for the same amount of training data. These models can usefully exploit correlations between audio and visual data, which have different dimensionalities and are therefore nontrivially exchangeable. Our work improves on existing multimodal deep learning methodologies in two essential ways: (1) it presents a novel method for performing cross-modality (before features are learned from individual modalities) and (2) it extends the previously proposed cross-connections, which only transfer information between streams that process compatible data. Both cross-modal architectures outperformed their baselines (by up to 7.5%).
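As a rough illustration of the kind of cross-modal dataflow the abstract describes, the PyTorch sketch below adds a pair of cross-connections between a 1D (audio) stream and a 2D (visual) stream, reshaping each stream's features to the other's dimensionality before injecting them. This is only a minimal sketch under assumed layer shapes and names (CrossModalBlock, audio_to_visual, visual_to_audio are illustrative, not taken from the paper), not the authors' actual XFlow architecture.

# Minimal PyTorch sketch of a 1D-2D cross-modal connection.
# NOT the exact XFlow architecture: layer shapes, names and the
# reshape-based exchange below are illustrative assumptions only.
import torch
import torch.nn as nn


class CrossModalBlock(nn.Module):
    """One stage of an audio (1D) and visual (2D) stream with cross-connections.

    Features from each stream are transformed to match the other stream's
    dimensionality before being added to it, so information can flow both ways.
    """

    def __init__(self, audio_ch=16, visual_ch=16, visual_size=32, audio_len=64):
        super().__init__()
        self.audio_conv = nn.Conv1d(audio_ch, audio_ch, kernel_size=3, padding=1)
        self.visual_conv = nn.Conv2d(visual_ch, visual_ch, kernel_size=3, padding=1)

        # 1D -> 2D: project audio features to a flat vector, reshape to a feature map.
        self.audio_to_visual = nn.Linear(audio_ch * audio_len,
                                         visual_ch * visual_size * visual_size)
        # 2D -> 1D: project flattened visual features to an audio-shaped sequence.
        self.visual_to_audio = nn.Linear(visual_ch * visual_size * visual_size,
                                         audio_ch * audio_len)
        self.audio_ch, self.audio_len = audio_ch, audio_len
        self.visual_ch, self.visual_size = visual_ch, visual_size

    def forward(self, audio, visual):
        # audio:  (batch, audio_ch, audio_len)
        # visual: (batch, visual_ch, visual_size, visual_size)
        a = torch.relu(self.audio_conv(audio))
        v = torch.relu(self.visual_conv(visual))

        batch = a.size(0)
        # Cross-modal flow: each stream receives a transformed copy of the other.
        v_from_a = self.audio_to_visual(a.reshape(batch, -1)).reshape(
            batch, self.visual_ch, self.visual_size, self.visual_size)
        a_from_v = self.visual_to_audio(v.reshape(batch, -1)).reshape(
            batch, self.audio_ch, self.audio_len)

        return a + a_from_v, v + v_from_a


if __name__ == "__main__":
    block = CrossModalBlock()
    audio = torch.randn(2, 16, 64)       # e.g. MFCC-like 1D audio features
    visual = torch.randn(2, 16, 32, 32)   # e.g. frame-level 2D visual features
    a_out, v_out = block(audio, visual)
    print(a_out.shape, v_out.shape)  # (2, 16, 64) and (2, 16, 32, 32)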

Related research

07/20/2022 · Cross-Modal Contrastive Representation Learning for Audio-to-Image Generation
Multiple modalities for certain information provide a variety of perspec...

04/19/2019 · EmbraceNet: A robust deep learning architecture for multimodal classification
Classification using multimodal data arises in many machine learning app...

07/01/2023 · SHARCS: Shared Concept Space for Explainable Multimodal Learning
Multimodal learning is an essential paradigm for addressing complex real...

08/14/2019 · Harmonized Multimodal Learning with Gaussian Process Latent Variable Models
Multimodal learning aims to discover the relationship between multiple m...

06/17/2022 · Multimodal Attention-based Deep Learning for Alzheimer's Disease Diagnosis
Alzheimer's Disease (AD) is the most common neurodegenerative disorder w...

07/02/2023 · Deep Cross-Modal Steganography Using Neural Representations
Steganography is the process of embedding secret data into another messa...

10/13/2020 · Does my multimodal model learn cross-modal interactions? It's harder to tell than you might think!
Modeling expressive cross-modal interactions seems crucial in multimodal...
