What is Multimodality?

03/10/2021
by Letitia Parcalabescu, et al.

Recent years have seen rapid developments in multimodal machine learning, which combines, e.g., vision, text, or speech. In this position paper, we explain how the field relies on outdated definitions of multimodality that prove unfit for the machine learning era. We propose a new task-relative definition of (multi)modality in the context of multimodal machine learning, one that focuses on the representations and information relevant to a given machine learning task. With this definition, we aim to provide a missing foundation for multimodal research, which is an important component of language grounding and a crucial milestone towards NLU.

research
06/17/2018

Multimodal Grounding for Language Processing

This survey discusses how recent developments in multimodal processing f...
research
01/29/2023

Global Flood Prediction: a Multimodal Machine Learning Approach

Flooding is one of the most destructive and costly natural disasters, an...
research
06/10/2023

Modality Influence in Multimodal Machine Learning

Multimodal Machine Learning has emerged as a prominent research directio...
research
08/07/2019

Recent Trends in Deep Learning Based Personality Detection

In the recent times, automatic detection of human personality traits has...
research
10/05/2022

Vision+X: A Survey on Multimodal Learning in the Light of Data

We are perceiving and communicating with the world in a multisensory man...
research
09/18/2023

Multimodal Foundation Models: From Specialists to General-Purpose Assistants

This paper presents a comprehensive survey of the taxonomy and evolution...
research
07/18/2023

Multimodal Machine Learning for Extraction of Theorems and Proofs in the Scientific Literature

Scholarly articles in mathematical fields feature mathematical statement...
