Multimodal Deep Learning

01/12/2023
by Cem Akkus, et al.

This book is the result of a seminar in which we reviewed multimodal approaches and attempted to create a solid overview of the field, starting with the current state-of-the-art approaches in the two subfields of Deep Learning individually. Further, modeling frameworks are discussed where one modality is transformed into the other, as well as models in which one modality is utilized to enhance representation learning for the other. To conclude the second part, architectures with a focus on handling both modalities simultaneously are introduced. Finally, we also cover other modalities as well as general-purpose multimodal models, which are able to handle different tasks on different modalities within one unified architecture. One interesting application (Generative Art) eventually caps off this booklet.

research
06/04/2020

MHVAE: a Human-Inspired Deep Hierarchical Generative Model for Multimodal Representation Learning

Humans are able to create rich representations of their external reality...
research
03/12/2023

One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale

This paper proposes a unified diffusion framework (dubbed UniDiffuser) t...
research
03/02/2022

HighMMT: Towards Modality and Task Generalization for High-Modality Representation Learning

Learning multimodal representations involves discovering correspondences...
research
11/14/2022

PMR: Prototypical Modal Rebalance for Multimodal Learning

Multimodal learning (MML) aims to jointly exploit the common priors of d...
research
12/23/2020

Private-Shared Disentangled Multimodal VAE for Learning of Hybrid Latent Representations

Multi-modal generative models represent an important family of deep mode...
research
04/05/2023

Explaining Multimodal Data Fusion: Occlusion Analysis for Wilderness Mapping

Jointly harnessing complementary features of multi-modal input data in a...
research
06/25/2023

Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input

The ability to model intra-modal and inter-modal interactions is fundame...
