Multi-modal Deep Analysis for Multimedia

10/11/2019
by Wenwu Zhu, et al.

With the rapid development of the Internet and multimedia services over the past decade, a huge amount of user-generated and service provider-generated multimedia data has become available. These data are heterogeneous and multi-modal in nature, which poses great challenges for processing and analysis. Multi-modal data consist of a mixture of data types from different modalities, such as text, images, video, and audio. In this article, we present a deep and comprehensive overview of multi-modal analysis in multimedia. We introduce two scientific research problems: data-driven correlational representation and knowledge-guided fusion for multimedia analysis. We investigate these two problems from the following aspects: 1) multi-modal correlational representation: multi-modal fusion of data across different modalities, and 2) multi-modal data and knowledge fusion: multi-modal fusion of data with domain knowledge. More specifically, on data-driven correlational representation, we highlight three important categories of methods: multi-modal deep representation, multi-modal transfer learning, and multi-modal hashing. On knowledge-guided fusion, we discuss approaches for fusing knowledge with data and four exemplar applications that require various kinds of domain knowledge: multi-modal visual question answering, multi-modal video summarization, multi-modal visual pattern mining, and multi-modal recommendation. Finally, we present our insights and future research directions.

09/17/2020

Multi-modal Summarization for Video-containing Documents

Summarization of multimedia data becomes increasingly significant as it ...
06/15/2020

Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry and Fusion

With the development of web technology, multi-modal or multi-view data h...
05/27/2014

A Topic Model Approach to Multi-Modal Similarity

Calculating similarities between objects defined by many heterogeneous d...
07/28/2020

Families In Wild Multimedia (FIW-MM): A Multi-Modal Database for Recognizing Kinship

Recognizing kinship - a soft biometric with vast applications - in photo...
06/10/2019

Commuting Conditional GANs for Robust Multi-Modal Fusion

This paper presents a data driven approach to multi-modal fusion, where ...
03/09/2021

A Discriminative Vectorial Framework for Multi-modal Feature Representation

Due to the rapid advancements of sensory and computing technology, multi...
04/08/2021

Software/Hardware Co-design for Multi-modal Multi-task Learning in Autonomous Systems

Optimizing the quality of result (QoR) and the quality of service (QoS) ...