Pre-trained Model Representations and their Robustness against Noise for Speech Emotion Analysis

03/03/2023
by   Vikramjit Mitra, et al.

Pre-trained model representations have demonstrated state-of-the-art performance in speech recognition, natural language processing, and other applications. Models such as Bidirectional Encoder Representations from Transformers (BERT) and Hidden-Unit BERT (HuBERT) have enabled the generation of lexical and acoustic representations that benefit speech recognition applications. We investigated the use of pre-trained model representations for estimating dimensional emotions, such as activation, valence, and dominance, from speech. We observed that while valence may rely heavily on lexical representations, activation and dominance rely mostly on acoustic information. In this work, we used multi-modal fusion of representations from pre-trained models to generate state-of-the-art speech emotion estimates, and we showed a 100% relative improvement on valence estimation compared to standard acoustic and lexical baselines. Finally, we investigated the robustness of pre-trained model representations against noise and reverberation degradation and noticed that lexical and acoustic representations are impacted differently. We found that lexical representations are more robust to distortions than acoustic representations, and demonstrated that knowledge distillation from a multi-modal model helps improve the noise robustness of acoustic-based models.
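
The abstract describes fusing pre-trained acoustic and lexical representations to regress dimensional emotions, and distilling that multi-modal model into an acoustic-only student. Below is a minimal illustrative sketch of such a setup, assuming utterance-level embeddings (e.g., mean-pooled HuBERT and BERT [CLS] vectors) have already been extracted; the embedding sizes, hidden width, CCC-based loss, and distillation weighting are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch (not the paper's exact architecture): late fusion of
# utterance-level acoustic and lexical embeddings for dimensional emotion
# regression, with a CCC-based loss and a simple distillation objective.
# Embedding sizes, hidden width, and the distillation weight are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionEmotionRegressor(nn.Module):
    """Concatenates acoustic (e.g., pooled HuBERT) and lexical (e.g., BERT [CLS])
    embeddings and regresses activation, valence, and dominance."""
    def __init__(self, acoustic_dim=768, lexical_dim=768, hidden_dim=256):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(acoustic_dim + lexical_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim, 3),  # [activation, valence, dominance]
        )

    def forward(self, acoustic_emb, lexical_emb):
        return self.head(torch.cat([acoustic_emb, lexical_emb], dim=-1))

def ccc_loss(pred, target, eps=1e-8):
    """1 - concordance correlation coefficient, averaged over the three dimensions."""
    pred_mean, target_mean = pred.mean(dim=0), target.mean(dim=0)
    pred_var = ((pred - pred_mean) ** 2).mean(dim=0)
    target_var = ((target - target_mean) ** 2).mean(dim=0)
    cov = ((pred - pred_mean) * (target - target_mean)).mean(dim=0)
    ccc = 2.0 * cov / (pred_var + target_var + (pred_mean - target_mean) ** 2 + eps)
    return (1.0 - ccc).mean()

def distillation_loss(student_pred, teacher_pred, target, alpha=0.5):
    """Acoustic-only student fits the labels while mimicking the fused teacher."""
    return alpha * ccc_loss(student_pred, target) + \
           (1.0 - alpha) * F.mse_loss(student_pred, teacher_pred)

# Example with random stand-in embeddings and labels:
model = FusionEmotionRegressor()
acoustic, lexical = torch.randn(8, 768), torch.randn(8, 768)
labels = torch.rand(8, 3)
loss = ccc_loss(model(acoustic, lexical), labels)
```

In this sketch, the acoustic-only student is trained both to fit the emotion labels and to match the fused teacher's predictions, which mirrors the paper's finding that distillation from a multi-modal model improves the noise robustness of acoustic-based models.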

