Tailor Versatile Multi-modal Learning for Multi-label Emotion Recognition

01/15/2022
by Yi Zhang, et al.

Multi-modal Multi-label Emotion Recognition (MMER) aims to identify multiple human emotions from heterogeneous visual, audio, and text modalities. Previous methods mainly focus on projecting the modalities into a common latent space and learning a single identical representation for all labels, which neglects the diversity of each modality and fails to capture richer, label-specific semantic information from different perspectives. Besides, the dependencies between modalities and labels have not been fully exploited. In this paper, we propose versaTile multi-modAl learning for multI-labeL emOtion Recognition (TAILOR), which refines multi-modal representations and enhances the discriminative capacity of each label. Specifically, we design an adversarial multi-modal refinement module to sufficiently explore the commonality among different modalities while strengthening the diversity of each modality. To further exploit label-modal dependence, we devise a BERT-like cross-modal encoder that gradually fuses private and common modality representations in a granularity-descending manner, as well as a label-guided decoder that adaptively generates a tailored representation for each label under the guidance of label semantics. In addition, we conduct experiments on the benchmark MMER dataset CMU-MOSEI in both aligned and unaligned settings, which demonstrate the superiority of TAILOR over state-of-the-art methods. Code is available at https://github.com/kniter1/TAILOR.
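The core idea of the label-guided decoder, producing one tailored representation per label instead of a single shared one, can be sketched with plain cross-attention. The sketch below is illustrative only: the shapes, variable names, and single-head attention are assumptions for a minimal numpy demo, not the paper's actual implementation (which uses learned, BERT-like transformer layers).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)

# Hypothetical shapes: T fused multi-modal tokens of dimension d, L emotion labels.
T, L, d = 8, 6, 16
fused_tokens = rng.standard_normal((T, d))   # stand-in for the cross-modal encoder output
label_embed = rng.standard_normal((L, d))    # stand-in for learned label semantic embeddings

# Label-guided cross-attention: each label embedding queries the fused tokens,
# yielding a label-specific ("tailored") representation per label.
attn = softmax(label_embed @ fused_tokens.T / np.sqrt(d), axis=-1)  # (L, T)
tailored = attn @ fused_tokens                                      # (L, d)

# Multi-label prediction: each label is scored from its own tailored representation.
w = rng.standard_normal((L, d))              # per-label classifier weights (assumed)
logits = (tailored * w).sum(axis=-1)         # (L,)
probs = 1 / (1 + np.exp(-logits))            # independent sigmoid per label
print(tailored.shape, probs.shape)           # (6, 16) (6,)
```

Because every label attends to the fused tokens with its own query, labels that depend on different modalities (e.g. a visually expressed emotion versus a textually expressed one) can weight the token sequence differently, which is the motivation behind tailoring representations per label.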

Related research

11/03/2020 · Robust Latent Representations via Cross-Modal Translation and Alignment
Multi-modal learning relates information across observation modalities o...

02/18/2022 · Is Cross-Attention Preferable to Self-Attention for Multi-Modal Emotion Recognition?
Humans express their emotions via facial expressions, voice intonation a...

04/17/2021 · Semi-Supervised Multi-Modal Multi-Instance Multi-Label Deep Network with Optimal Transport
Complex objects are usually with multiple labels, and can be represented...

07/20/2023 · MSQNet: Actor-agnostic Action Recognition with Multi-modal Query
Existing action recognition methods are typically actor-specific due to ...

03/26/2023 · Collaborative Noisy Label Cleaner: Learning Scene-aware Trailers for Multi-modal Highlight Detection in Movies
Movie highlights stand out of the screenplay for efficient browsing and ...

07/27/2019 · Many could be better than all: A novel instance-oriented algorithm for Multi-modal Multi-label problem
With the emergence of diverse data collection techniques, objects in rea...

03/09/2021 · A Discriminative Vectorial Framework for Multi-modal Feature Representation
Due to the rapid advancements of sensory and computing technology, multi...
