MFAS: Emotion Recognition through Multiple Perspectives Fusion Architecture Search Emulating Human Cognition

06/12/2023
by   Haiyang Sun, et al.
Speech emotion recognition aims to identify and analyze emotional states in target speech as humans do. Accurate emotion recognition can greatly benefit a wide range of human-machine interaction tasks. Inspired by the human process of understanding emotions, we demonstrate that, compared to quantized modeling, understanding speech content from a continuous perspective, akin to human comprehension, enables the model to capture more comprehensive emotional information. Additionally, considering that humans adjust their perception of emotional words in textual semantics based on certain cues present in speech, we design a novel search space and search for the optimal fusion strategy for the two types of information. Experimental results further validate the significance of this perception adjustment. Building on these observations, we propose a novel framework called Multiple perspectives Fusion Architecture Search (MFAS). Specifically, we utilize continuous-based knowledge to capture speech semantics and quantization-based knowledge to learn textual semantics. Then, we search for the optimal fusion strategy for them. Experimental results demonstrate that MFAS surpasses existing models in comprehensively capturing speech emotion information and can automatically adjust the fusion strategy.
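
The abstract does not specify the search space or the search algorithm, so the following is only an illustrative sketch of one common way such a fusion search can be set up: a continuous relaxation in which each candidate fusion operation gets an architecture weight, the outputs are blended by a softmax over those weights, and the highest-weighted operation is kept afterwards. The candidate operations (`fuse_sum`, `fuse_product`, `fuse_max`), the function names, and the fixed `alphas` values are all assumptions made for illustration and are not taken from MFAS; in a real search the weights would be learned jointly with the model rather than set by hand.

```python
import math

# Hypothetical candidate fusion operations (assumed for illustration,
# not taken from the paper). Each combines a speech-semantic vector
# with a textual-semantic vector of the same length.
def fuse_sum(speech, text):
    return [s + t for s, t in zip(speech, text)]

def fuse_product(speech, text):
    return [s * t for s, t in zip(speech, text)]

def fuse_max(speech, text):
    return [max(s, t) for s, t in zip(speech, text)]

CANDIDATES = {"sum": fuse_sum, "product": fuse_product, "max": fuse_max}

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_fusion(speech, text, alphas):
    """Blend all candidate fusion outputs, weighted by softmax(alphas)."""
    weights = softmax(alphas)
    outputs = [op(speech, text) for op in CANDIDATES.values()]
    return [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(len(speech))
    ]

def select_fusion(alphas):
    """Return the name of the candidate with the largest weight."""
    names = list(CANDIDATES)
    best = max(range(len(alphas)), key=lambda i: alphas[i])
    return names[best]

if __name__ == "__main__":
    speech_vec = [0.2, -0.5, 1.0]   # toy continuous-perspective features
    text_vec = [1.0, 0.5, -1.0]     # toy quantization-perspective features
    print(mixed_fusion(speech_vec, text_vec, [0.0, 0.0, 0.0]))
    print(select_fusion([0.1, 2.0, -1.0]))
```

With equal weights the blend is simply the average of the three candidate outputs; as one weight grows, the blend approaches that single operation, which is what makes a discrete choice recoverable at the end.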


research
03/21/2018

Speech Emotion Recognition Considering Local Dynamic Features

Recently, increasing attention has been directed to the study of the spe...
research
03/25/2022

EmotionNAS: Two-stream Architecture Search for Speech Emotion Recognition

Speech emotion recognition (SER) is a crucial research topic in human-co...
research
08/17/2023

Decoding Emotions: A comprehensive Multilingual Study of Speech Models for Speech Emotion Recognition

Recent advancements in transformer-based speech representation models ha...
research
07/20/2023

Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition

This paper presents a paradigm that adapts general large-scale pretraine...
research
03/31/2022

Neural Architecture Search for Speech Emotion Recognition

Deep neural networks have brought significant advancements to speech emo...
research
11/14/2022

Temporal Modeling Matters: A Novel Temporal Emotional Modeling Approach for Speech Emotion Recognition

Speech emotion recognition (SER) plays a vital role in improving the int...
research
05/23/2023

Improving Speech Emotion Recognition Performance using Differentiable Architecture Search

Speech Emotion Recognition (SER) is a critical enabler of emotion-aware ...
