ConcealNet: An End-to-end Neural Network for Packet Loss Concealment in Deep Speech Emotion Recognition

05/15/2020
by Mostafa M. Mohamed, et al.

Packet loss is a common problem in data transmission, including the transmission of speech data. It can affect a wide range of applications that stream audio, such as streaming services or speech emotion recognition (SER). Packet Loss Concealment (PLC) refers to any technique for dealing with packet loss. Simple PLC baselines are 0-substitution and linear interpolation. In this paper, we present a concealment wrapper that can be used with stacked recurrent neural cells. The resulting concealment cell yields a recurrent neural network (ConcealNet) that performs real-time, step-wise, end-to-end PLC at inference time. Extending this network with an end-to-end emotion prediction network produces a model that performs SER end-to-end on audio with lost frames. The proposed model is compared against the aforementioned baselines. Additionally, a bidirectional variant with better performance is utilised. For evaluation, we chose the public RECOLA dataset, given its long audio tracks with continuous emotion labels. ConcealNet is evaluated on the reconstruction of the audio and on the quality of the emotions subsequently predicted from the reconstructed audio. The proposed ConcealNet model has shown considerable improvement, for both audio reconstruction and the corresponding emotion prediction, in environments without long-duration losses, even when the losses occur frequently.
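To make the two baselines concrete, the sketch below implements 0-substitution and linear interpolation on a flat list of audio samples with a per-sample loss mask, plus a generic step-wise concealment loop. Several details are assumptions, not taken from the paper: interpolation here runs per sample between the last received sample before a gap and the first received sample after it (holding the edge value when a gap touches either end), and `conceal_stepwise` and `repeat_last_frame` are hypothetical helpers. `repeat_last_frame` is a simple frame-repetition heuristic standing in for a trained recurrent predictor; it is not ConcealNet. The loop only illustrates the idea of step-wise concealment, where each lost frame is replaced by a prediction made from the frames output so far.

```python
from typing import Callable, List, Sequence

Frame = List[float]


def expand_mask(frame_lost: Sequence[bool], frame_len: int) -> List[bool]:
    """Turn a per-frame loss mask into a per-sample mask."""
    return [lost for lost in frame_lost for _ in range(frame_len)]


def zero_substitution(x: Sequence[float], lost: Sequence[bool]) -> List[float]:
    """Baseline 1: replace every lost sample with 0."""
    return [0.0 if m else v for v, m in zip(x, lost)]


def linear_interpolation(x: Sequence[float], lost: Sequence[bool]) -> List[float]:
    """Baseline 2 (assumed variant): bridge each gap linearly between its
    neighbouring received samples; hold the edge value at signal boundaries."""
    y = list(x)
    n = len(x)
    i = 0
    while i < n:
        if not lost[i]:
            i += 1
            continue
        start = i                      # first lost sample of this gap
        while i < n and lost[i]:
            i += 1
        end = i                        # first received sample after the gap, or n
        left = x[start - 1] if start > 0 else None
        right = x[end] if end < n else None
        gap = end - start
        if left is None and right is None:
            fill = [0.0] * gap         # nothing received at all
        elif left is None:
            fill = [right] * gap       # gap at the start: hold next value
        elif right is None:
            fill = [left] * gap        # gap at the end: hold previous value
        else:
            span = gap + 1             # distance from sample start-1 to sample end
            fill = [left + (right - left) * (k + 1) / span for k in range(gap)]
        y[start:end] = fill
    return y


def repeat_last_frame(history: List[Frame], frame_len: int) -> Frame:
    """Hypothetical stand-in predictor: repeat the last output frame (or silence)."""
    return list(history[-1]) if history else [0.0] * frame_len


def conceal_stepwise(
    frames: Sequence[Frame],
    frame_lost: Sequence[bool],
    predict_next: Callable[[List[Frame], int], Frame],
) -> List[Frame]:
    """Hypothetical step-wise loop: received frames pass through, lost frames
    are replaced by a prediction from the frames output so far."""
    history: List[Frame] = []
    for frame, lost in zip(frames, frame_lost):
        current = predict_next(history, len(frame)) if lost else list(frame)
        history.append(current)
    return history
```

For example, with samples `[1.0, 2.0, 9.0, 9.0, 5.0, 6.0]` where the two `9.0` samples are lost, 0-substitution yields `[1.0, 2.0, 0.0, 0.0, 5.0, 6.0]` and linear interpolation yields `[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]`. Because the step-wise loop only looks at past outputs, it can run in real time, which matches the abstract's description of a unidirectional model; a bidirectional variant would also need future frames.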
