Recent Advances and Trends in Multimodal Deep Learning: A Review

05/24/2021
by Jabeen Summaira, et al.

Deep learning has been applied to a wide range of applications and has become increasingly popular in recent years. The goal of multimodal deep learning is to create models that can process and link information from multiple modalities. Despite extensive progress in unimodal learning, it still cannot cover all aspects of human learning. Multimodal learning supports better understanding and analysis because several senses are engaged in processing information. This paper focuses on multiple types of modalities, i.e., image, video, text, audio, body gestures, facial expressions, and physiological signals. It provides a detailed analysis of past and current baseline approaches and an in-depth study of recent advancements in multimodal deep learning applications. A fine-grained taxonomy of multimodal deep learning applications is proposed, elaborating on the individual applications in more depth. The architectures and datasets used in these applications are also discussed, along with their evaluation metrics. Finally, the main issues are highlighted separately for each domain, along with possible future research directions.

research
02/18/2022

A Review on Methods and Applications in Multimodal Deep Learning

Deep Learning has implemented a wide range of applications and has becom...
research
10/16/2020

New Ideas and Trends in Deep Multimodal Content Understanding: A Review

The focus of this survey is on the analysis of two modalities of multimo...
research
07/29/2021

Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions

Multimodal deep learning systems which employ multiple modalities like t...
research
06/09/2023

Multimodal Explainable Artificial Intelligence: A Comprehensive Review of Methodological Advances and Future Research Directions

The current study focuses on systematically analyzing the recent advance...
research
06/06/2018

Rethinking Radiology: An Analysis of Different Approaches to BraTS

This paper discusses the deep learning architectures currently used for ...
research
05/11/2022

Deep Depth Completion: A Survey

Depth completion aims at predicting dense pixel-wise depth from a sparse...
research
05/14/2019

Strong and Simple Baselines for Multimodal Utterance Embeddings

Human language is a rich multimodal signal consisting of spoken words, f...
