Multi-modal Deep Analysis for Multimedia

10/11/2019
by   Wenwu Zhu, et al.

With the rapid development of the Internet and multimedia services over the past decade, a huge amount of user-generated and service provider-generated multimedia data has become available. These data are heterogeneous and multi-modal in nature, which poses great challenges for processing and analysis. Multi-modal data consist of a mixture of data types from different modalities, such as text, images, video, and audio. In this article, we present a deep and comprehensive overview of multi-modal analysis in multimedia. We introduce two scientific research problems: data-driven correlational representation and knowledge-guided fusion for multimedia analysis. We investigate these two problems from the following aspects: 1) multi-modal correlational representation, i.e., multi-modal fusion of data across different modalities, and 2) multi-modal data and knowledge fusion, i.e., multi-modal fusion of data with domain knowledge. More specifically, on data-driven correlational representation, we highlight three important categories of methods: multi-modal deep representation, multi-modal transfer learning, and multi-modal hashing. On knowledge-guided fusion, we discuss approaches for fusing knowledge with data and four exemplar applications that require various kinds of domain knowledge: multi-modal visual question answering, multi-modal video summarization, multi-modal visual pattern mining, and multi-modal recommendation. Finally, we present our insights and future research directions.

research
09/17/2020

Multi-modal Summarization for Video-containing Documents

Summarization of multimedia data becomes increasingly significant as it ...
research
06/15/2020

Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry and Fusion

With the development of web technology, multi-modal or multi-view data h...
research
05/27/2014

A Topic Model Approach to Multi-Modal Similarity

Calculating similarities between objects defined by many heterogeneous d...
research
06/20/2022

Explicit and implicit models in infrared and visible image fusion

Infrared and visible images, as multi-modal image pairs, show significan...
research
07/28/2020

Families In Wild Multimedia (FIW-MM): A Multi-Modal Database for Recognizing Kinship

Recognizing kinship - a soft biometric with vast applications - in photo...
research
03/25/2022

Deep Multi-modal Fusion of Image and Non-image Data in Disease Diagnosis and Prognosis: A Review

The rapid development of diagnostic technologies in healthcare is leadin...
research
08/17/2023

A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation

Body language (BL) refers to the non-verbal communication expressed thro...
