
Is Everything Fine, Grandma? Acoustic and Linguistic Modeling for Robust Elderly Speech Emotion Recognition

09/07/2020
by Gizem Soğancıoğlu, et al.

Acoustic and linguistic analysis for elderly emotion recognition is an under-studied and challenging research direction, but it is essential for creating digital assistants for the elderly, as well as for unobtrusive telemonitoring of elderly people in their residences for mental healthcare purposes. This paper presents our contribution to the INTERSPEECH 2020 Computational Paralinguistics Challenge (ComParE) - Elderly Emotion Sub-Challenge, which comprises two ternary classification tasks for arousal and valence recognition. We propose a bi-modal framework, where these tasks are modeled using state-of-the-art acoustic and linguistic features, respectively. In this study, we demonstrate that exploiting task-specific dictionaries and resources can boost the performance of linguistic models when the amount of labeled data is small. Observing a large mismatch between the development and test set performances of various models, we also propose alternative training and decision fusion strategies to better estimate and improve generalization performance.
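
To make the idea of decision fusion concrete, the following is a minimal sketch of one common variant: a weighted average of the class probabilities produced by an acoustic model and a linguistic model for a three-class task. It is not the fusion strategy used in the paper, which the abstract does not specify. The label names, the function names `fuse_decisions` and `predict`, and the `weight` parameter are all assumptions made for illustration.

```python
from typing import List, Sequence

# Assumed names for the three classes of a ternary task (illustrative only).
LABELS = ("low", "medium", "high")


def fuse_decisions(
    acoustic: Sequence[float],
    linguistic: Sequence[float],
    weight: float = 0.5,
) -> List[float]:
    """Weighted average of two per-class probability vectors.

    `weight` is the share given to the acoustic model; the linguistic
    model receives `1 - weight`. This is a generic late-fusion rule,
    not the method described in the paper.
    """
    if len(acoustic) != len(linguistic):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    fused = [weight * a + (1.0 - weight) * l for a, l in zip(acoustic, linguistic)]
    total = sum(fused)
    if total <= 0.0:
        raise ValueError("fused scores must have a positive sum")
    # Renormalize so the fused scores sum to one.
    return [p / total for p in fused]


def predict(
    acoustic: Sequence[float],
    linguistic: Sequence[float],
    weight: float = 0.5,
) -> str:
    """Return the label with the highest fused probability."""
    fused = fuse_decisions(acoustic, linguistic, weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return LABELS[best]


if __name__ == "__main__":
    acoustic_probs = [0.6, 0.3, 0.1]    # hypothetical acoustic model output
    linguistic_probs = [0.1, 0.2, 0.7]  # hypothetical linguistic model output
    print(predict(acoustic_probs, linguistic_probs, weight=0.5))
    print(predict(acoustic_probs, linguistic_probs, weight=0.9))
```

In a sketch like this, the fusion weight would typically be chosen on held-out data; when development and test performance diverge, as the abstract reports, a weight tuned on a single development split may not transfer well.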

