Unsupervised Multimodal Language Representations using Convolutional Autoencoders

10/06/2021
by   Panagiotis Koromilas, et al.

Multimodal Language Analysis is a demanding research area, since it must satisfy two requirements: combining different modalities and capturing temporal information. In recent years, several works have been proposed in the area, mostly centered on supervised learning for downstream tasks. In this paper we propose extracting unsupervised Multimodal Language representations that are universal and can be applied to different tasks. To this end, we map the word-level aligned multimodal sequences to 2-D matrices and then use Convolutional Autoencoders to learn embeddings by combining multiple datasets. Extensive experimentation on Sentiment Analysis (MOSEI) and Emotion Recognition (IEMOCAP) indicates that the learned representations achieve near-state-of-the-art performance using only a Logistic Regression classifier for downstream classification. It is also shown that our method is extremely lightweight and can easily be generalized to other tasks and unseen data with a small performance drop and almost the same number of parameters. The proposed multimodal representation models are open-sourced and will help grow the applicability of Multimodal Language analysis.
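The preprocessing step described above (mapping word-level aligned multimodal sequences to 2-D matrices) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature dimensions (300-d word embeddings, 74-d acoustic features, 35-d visual features) are assumptions based on features commonly used with these datasets, and the random arrays stand in for real extracted features.

```python
import numpy as np

# Hypothetical word-aligned features for one utterance of T = 4 words.
# The dimensions are illustrative assumptions, not the paper's exact sizes.
T = 4
text = np.random.rand(T, 300)    # e.g. word embeddings per word
audio = np.random.rand(T, 74)    # e.g. acoustic features per word
visual = np.random.rand(T, 35)   # e.g. visual features per word

# Concatenate the modality features at each aligned word step, producing
# a T x D matrix that can be treated as a 2-D "image" and fed to a
# convolutional autoencoder to learn a fixed-size multimodal embedding.
multimodal_matrix = np.concatenate([text, audio, visual], axis=1)
print(multimodal_matrix.shape)  # (4, 409)
```

Because the modalities are aligned at the word level, each row of the matrix holds all information about one word, so 2-D convolutions can capture both temporal structure (across rows) and cross-modal structure (across columns).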


07/29/2017

Benchmarking Multimodal Sentiment Analysis

We propose a framework for multimodal sentiment analysis and emotion rec...
07/11/2018

Seq2Seq2Sentiment: Multimodal Sequence to Sequence Models for Sentiment Analysis

Multimodal machine learning is a core research area spanning the languag...
08/10/2022

An Empirical Exploration of Cross-domain Alignment between Language and Electroencephalogram

Electroencephalography (EEG) and language have been widely explored inde...
10/22/2020

MTGAT: Multimodal Temporal Graph Attention Networks for Unaligned Human Multimodal Language Sequences

Human communication is multimodal in nature; it is through multiple moda...
11/13/2019

Learning Relationships between Text, Audio, and Video via Deep Canonical Correlation for Multimodal Language Analysis

Multimodal language analysis often considers relationships between featu...
07/01/2023

S-Omninet: Structured Data Enhanced Universal Multimodal Learning Architecture

Multimodal multitask learning has attracted an increasing interest in re...
11/23/2018

Words Can Shift: Dynamically Adjusting Word Representations Using Nonverbal Behaviors

Humans convey their intentions through the usage of both verbal and nonv...
