Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

09/07/2022
by Paul Pu Liang, et al.

Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design computer agents with intelligent capabilities such as understanding, reasoning, and learning through integrating multiple communicative modalities, including linguistic, acoustic, visual, tactile, and physiological messages. With the recent interest in video understanding, embodied autonomous agents, text-to-image generation, and multisensor fusion in application domains such as healthcare and robotics, multimodal machine learning has brought unique computational and theoretical challenges to the machine learning community, given the heterogeneity of data sources and the interconnections often found between modalities. However, the breadth of progress in multimodal research has made it difficult to identify the common themes and open questions in the field. By synthesizing a broad range of application domains and theoretical frameworks from both historical and recent perspectives, this paper provides an overview of the computational and theoretical foundations of multimodal machine learning. We start by defining two key principles, modality heterogeneity and interconnections, that have driven subsequent innovations, and propose a taxonomy of six core technical challenges covering historical and recent trends: representation, alignment, reasoning, generation, transference, and quantification. Recent technical achievements are presented through the lens of this taxonomy, allowing researchers to understand the similarities and differences across new approaches. We end by motivating several open problems for future research as identified by our taxonomy.

research
07/29/2021

Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions

Multimodal deep learning systems which employ multiple modalities like t...
research
09/18/2021

Multimodal Classification: Current Landscape, Taxonomy and Future Directions

Multimodal classification research has been gaining popularity in many d...
research
07/15/2021

MultiBench: Multiscale Benchmarks for Multimodal Representation Learning

Learning multimodal representations involves integrating information fro...
research
12/31/2019

Intrinsic motivations and open-ended learning

There is a growing interest and literature on intrinsic motivations and ...
research
11/29/2022

Multimodal Learning for Multi-Omics: A Survey

With advanced imaging, sequencing, and profiling technologies, multiple ...
research
01/15/2023

AutoFraudNet: A Multimodal Network to Detect Fraud in the Auto Insurance Industry

In the insurance industry detecting fraudulent claims is a critical task...
research
08/30/2018

MMDF2018 Workshop Report

Driven by the recent advances in smart, miniaturized, and mass produced ...
