Transformer-based Self-supervised Multimodal Representation Learning for Wearable Emotion Recognition

03/29/2023
by Yujin Wu, et al.

Recently, wearable emotion recognition based on peripheral physiological signals has attracted widespread attention owing to its less invasive nature and its applicability in real-life scenarios. However, effectively fusing multimodal data remains a challenging problem, and traditional fully supervised approaches are prone to overfitting given limited labeled data. To address these issues, we propose a novel self-supervised learning (SSL) framework for wearable emotion recognition, in which efficient multimodal fusion is realized with temporal convolution-based modality-specific encoders and a transformer-based shared encoder, capturing both intra-modal and inter-modal correlations. Large amounts of unlabeled data are automatically labeled via five signal transformations, and the proposed SSL model is pre-trained with signal transformation recognition as a pretext task, allowing the extraction of generalized multimodal representations for emotion-related downstream tasks. For evaluation, the proposed SSL model was first pre-trained on a large-scale self-collected physiological dataset, and the resulting encoder was subsequently frozen or fine-tuned on three public supervised emotion recognition datasets. Ultimately, our SSL-based method achieved state-of-the-art results on various emotion classification tasks, and the proposed model proved more accurate and robust than fully supervised methods in low-data regimes.
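The pretext task described above can be sketched compactly. The following PyTorch snippet is a minimal illustration, not the paper's implementation: the five transformations shown (jittering, scaling, negation, time flipping, segment permutation), the layer sizes, and the two stand-in modality streams are assumptions made for the example, since the abstract does not specify them. Only the overall structure follows the text: temporal convolution-based modality-specific encoders feed a transformer-based shared encoder, which is trained to recognize which transformation was applied to an unlabeled window.

```python
# Minimal sketch of the signal-transformation-recognition pretext task.
# Transforms, dimensions, and modality count are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Five candidate signal transformations (assumed set; the paper's own five may differ).
def jitter(x):    return x + 0.05 * torch.randn_like(x)            # additive Gaussian noise
def scale(x):     return x * torch.empty(x.size(0), 1, 1).uniform_(0.7, 1.3)
def negate(x):    return -x
def time_flip(x): return torch.flip(x, dims=[-1])
def permute(x):                                                    # shuffle four time segments
    segs = torch.chunk(x, 4, dim=-1)
    order = torch.randperm(len(segs))
    return torch.cat([segs[i] for i in order], dim=-1)

TRANSFORMS = [jitter, scale, negate, time_flip, permute]

class ModalityEncoder(nn.Module):
    """Temporal convolution encoder for one physiological modality."""
    def __init__(self, in_ch, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_ch, dim, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=7, padding=3), nn.ReLU(),
        )
    def forward(self, x):                   # x: (batch, channels, time)
        return self.net(x).transpose(1, 2)  # -> (batch, time, dim) tokens

class SSLModel(nn.Module):
    """Modality-specific encoders plus a shared transformer encoder."""
    def __init__(self, modality_channels=(1, 1), dim=64, n_transforms=len(TRANSFORMS)):
        super().__init__()
        self.encoders = nn.ModuleList(ModalityEncoder(c, dim) for c in modality_channels)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.shared = nn.TransformerEncoder(layer, num_layers=2)  # fuses all modality tokens
        self.head = nn.Linear(dim, n_transforms)                  # pretext classifier
    def forward(self, signals):             # list of (batch, channels, time), one per modality
        tokens = torch.cat([enc(x) for enc, x in zip(self.encoders, signals)], dim=1)
        z = self.shared(tokens).mean(dim=1) # pooled multimodal representation
        return self.head(z)

# One pre-training step: pseudo-label a batch by the transformation applied to it.
model = SSLModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
eda = torch.randn(8, 1, 256)                # stand-ins for two unlabeled wearable signals
bvp = torch.randn(8, 1, 256)
label = torch.randint(len(TRANSFORMS), (1,)).item()
t = TRANSFORMS[label]
logits = model([t(eda), t(bvp)])            # same transform applied to every modality
loss = F.cross_entropy(logits, torch.full((8,), label))
opt.zero_grad(); loss.backward(); opt.step()
```

After pre-training, the transform-recognition head would be discarded, and the encoders frozen or fine-tuned on labeled emotion data, mirroring the evaluation protocol described in the abstract.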
