A Review on Methods and Applications in Multimodal Deep Learning

02/18/2022
by   Jabeen Summaira, et al.

Deep learning has been applied across a wide range of applications and has become increasingly popular in recent years. The goal of multimodal deep learning (MMDL) is to create models that can process and link information from multiple modalities. Despite extensive progress in unimodal learning, it still cannot cover all aspects of human learning. Multimodal learning supports better understanding and analysis when various senses are engaged in processing information. This paper focuses on multiple types of modalities, i.e., image, video, text, audio, body gestures, facial expressions, and physiological signals. It provides a detailed analysis of the baseline approaches and an in-depth study of advances in multimodal deep learning applications over the last five years (2017 to 2021). A fine-grained taxonomy of multimodal deep learning methods is proposed, elaborating on different applications in more depth. Lastly, the main issues are highlighted separately for each domain, along with possible future research directions.


research
05/24/2021

Recent Advances and Trends in Multimodal Deep Learning: A Review

Deep Learning has implemented a wide range of applications and has becom...
research
05/22/2022

Deep Learning for Visual Speech Analysis: A Survey

Visual speech, referring to the visual domain of speech, has attracted i...
research
11/10/2019

Multimodal Intelligence: Representation Learning, Information Fusion, and Applications

Deep learning has revolutionized speech recognition, image recognition, ...
research
06/08/2021

What Makes Multimodal Learning Better than Single (Provably)

The world provides us with data of multiple modalities. Intuitively, mod...
research
10/16/2020

New Ideas and Trends in Deep Multimodal Content Understanding: A Review

The focus of this survey is on the analysis of two modalities of multimo...
research
10/05/2022

Vision+X: A Survey on Multimodal Learning in the Light of Data

We are perceiving and communicating with the world in a multisensory man...
research
05/02/2023

Multimodal Neural Databases

The rise in loosely-structured data available through text, images, and ...
