ADFF: Attention Based Deep Feature Fusion Approach for Music Emotion Recognition

04/12/2022
by Zi Huang, et al.

Music emotion recognition (MER), a sub-task of music information retrieval (MIR), has developed rapidly in recent years. However, learning affect-salient features remains a challenge. In this paper, we propose an end-to-end attention-based deep feature fusion (ADFF) approach for MER. Taking only the log Mel-spectrogram as input, this method uses an adapted VGGNet as the spatial feature learning module (SFLM) to obtain spatial features at different levels. These features are then fed into a squeeze-and-excitation (SE) attention-based temporal feature learning module (TFLM) to obtain multi-level emotion-related spatial-temporal features (ESTFs), which can discriminate emotions well in the final emotion space. In addition, a novel data-processing scheme is devised that cuts the single-channel input into multiple channels, improving computational efficiency while preserving MER quality. Experiments show that, on the R2 score, our proposed method achieves improvements of 10.43% and […] for valence and arousal, respectively, over the state-of-the-art model, and it also performs better on datasets of different scales and in multi-task learning.
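The abstract names the pipeline's pieces without giving their details, so what follows is only a minimal NumPy sketch under stated assumptions. It is not the authors' implementation. `split_into_channels` is a hypothetical reading of the "cut the single-channel input into multi-channel" step: it assumes equal-length time segments stacked as channels, with leftover frames dropped. `squeeze_excitation` is the generic squeeze-and-excitation block (global average pooling, a two-layer fully connected bottleneck, sigmoid gating), not the paper's exact TFLM. The weights are random placeholders, and the random array stands in for a real log Mel-spectrogram.

```python
import numpy as np


def split_into_channels(spec: np.ndarray, n_channels: int) -> np.ndarray:
    """Cut a single-channel (n_mels, n_frames) spectrogram into n_channels
    equal-length time segments stacked as channels.

    Hypothetical scheme for illustration: trailing frames that do not fill
    a whole segment are dropped.
    """
    n_mels, n_frames = spec.shape
    seg_len = n_frames // n_channels
    if seg_len == 0:
        raise ValueError("spectrogram has fewer frames than channels")
    trimmed = spec[:, : seg_len * n_channels]
    # (n_mels, n_channels, seg_len) -> (n_channels, n_mels, seg_len)
    return trimmed.reshape(n_mels, n_channels, seg_len).transpose(1, 0, 2)


def squeeze_excitation(x, w1, b1, w2, b2):
    """Generic squeeze-and-excitation gating on a (C, H, W) feature map.

    w1: (C // r, C) and w2: (C, C // r) are the bottleneck FC weights,
    with r the reduction ratio. Returns the rescaled map and the channel weights.
    """
    z = x.mean(axis=(1, 2))                   # squeeze: global average pool -> (C,)
    s = np.maximum(0.0, w1 @ z + b1)          # excitation FC 1 + ReLU -> (C // r,)
    a = 1.0 / (1.0 + np.exp(-(w2 @ s + b2)))  # excitation FC 2 + sigmoid -> (C,)
    return x * a[:, None, None], a            # reweight each channel


rng = np.random.default_rng(0)
log_mel = rng.standard_normal((128, 1000))    # stand-in for a real log Mel-spectrogram
x = split_into_channels(log_mel, n_channels=4)
C, r = x.shape[0], 2
w1, b1 = rng.standard_normal((C // r, C)), np.zeros(C // r)  # random placeholder weights
w2, b2 = rng.standard_normal((C, C // r)), np.zeros(C)
y, a = squeeze_excitation(x, w1, b1, w2, b2)
print(x.shape, y.shape)  # (4, 128, 250) (4, 128, 250)
```

Cutting along time keeps the full frequency axis in every channel while shortening each one. That is one plausible way such a cut could reduce computation. The actual segment length and channel count are not stated in the abstract.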

research
01/06/2021

Transformer-based approach towards music emotion recognition from lyrics

The task of identifying emotions from a given music track has been an ac...
research
08/22/2020

A Efficient Multimodal Framework for Large Scale Emotion Recognition by Fusing Music and Electrodermal Activity Signals

Considerable attention has been paid for physiological signal-based emot...
research
03/31/2022

MMER: Multimodal Multi-task learning for Emotion Recognition in Spoken Utterances

Emotion Recognition (ER) aims to classify human utterances into differen...
research
09/15/2022

Self-Relation Attention and Temporal Awareness for Emotion Recognition via Vocal Burst

The technical report presents our emotion recognition pipeline for high-...
research
03/12/2023

Focus on Change: Mood Prediction by Learning Emotion Changes via Spatio-Temporal Attention

While emotion and mood interchangeably used, they differ in terms of dur...
research
11/19/2020

Deep Residual Local Feature Learning for Speech Emotion Recognition

Speech Emotion Recognition (SER) is becoming a key role in global busine...
research
02/28/2018

Pop Music Highlighter: Marking the Emotion Keypoints

The goal of music highlight extraction is to get a short consecutive seg...