New Ideas and Trends in Deep Multimodal Content Understanding: A Review

10/16/2020
by Wei Chen, et al.

This survey focuses on two modalities in multimodal deep learning: image and text. Unlike classic reviews of deep learning, where monomodal image classifiers such as VGG, ResNet and Inception are central topics, this paper examines recent multimodal deep models and structures, including auto-encoders, generative adversarial nets and their variants. These models go beyond simple image classifiers in that they can perform uni-directional (e.g. image captioning, image generation) and bi-directional (e.g. cross-modal retrieval, visual question answering) multimodal tasks. We also analyze two aspects of the challenge of achieving better content understanding in deep multimodal applications. We then introduce current ideas and trends in deep multimodal feature learning, such as feature embedding approaches and objective function design, which are crucial to overcoming these challenges. Finally, we outline several promising directions for future research.


