Multi-modal Deep Analysis for Multimedia

by Wenwu Zhu, et al.

With the rapid development of Internet and multimedia services in the past decade, a huge amount of user-generated and service provider-generated multimedia data has become available. These data are heterogeneous and multi-modal in nature, which poses great challenges for processing and analysis. Multi-modal data consist of a mixture of data types from different modalities, such as text, images, video, and audio. In this article, we present a comprehensive overview of multi-modal analysis in multimedia. We introduce two scientific research problems: data-driven correlational representation and knowledge-guided fusion for multimedia analysis. We investigate them from two aspects: 1) multi-modal correlational representation, i.e., multi-modal fusion of data across different modalities, and 2) multi-modal data and knowledge fusion, i.e., multi-modal fusion of data with domain knowledge. More specifically, on data-driven correlational representation, we highlight three important categories of methods: multi-modal deep representation, multi-modal transfer learning, and multi-modal hashing. On knowledge-guided fusion, we discuss approaches for fusing knowledge with data and four exemplar applications that require various kinds of domain knowledge: multi-modal visual question answering, multi-modal video summarization, multi-modal visual pattern mining, and multi-modal recommendation. Finally, we present our insights and future research directions.





