Towards Universal End-to-End Affect Recognition from Multilingual Speech by ConvNets

01/19/2019
by Dario Bertero, et al.

We propose an end-to-end affect recognition approach using a Convolutional Neural Network (CNN) that handles multiple languages, with applications to emotion and personality recognition from speech. We lay the foundation of a universal model that is trained on multiple languages at once. Because affect is shared across all languages, the model can leverage information common to them and improve performance on each one. We obtained an average improvement of 12.8 over the same model trained on each language alone. The approach is end-to-end because it takes narrow-band raw waveforms directly as input. This lets it accept audio recorded from any source and avoids the overhead and information loss of feature extraction. It outperforms a similar CNN that uses spectrograms as input by 12.8. Analysis of the network parameters and layer activations shows that the network learns to extract significant features in the first layer, in particular pitch, energy, and contour variations. Subsequent convolutional layers instead capture language-specific representations through the analysis of supra-segmental features. Our model is an important step toward a fully universal affect recognizer that can recognize additional descriptors, such as stress, and toward its future integration into affective interactive systems.
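The abstract describes the model only at a high level. To illustrate what taking raw waveforms directly as input means, here is a minimal pure-Python sketch of one convolutional layer sliding over raw audio samples, followed by ReLU, global max pooling over time, and a softmax classifier. It is not the authors' architecture: the 8 kHz sampling rate (a common choice for narrow-band audio), the number and length of the filters, the stride, the three output classes, and the names `conv1d`, `relu`, `softmax`, and `forward` are all hypothetical, and the weights are random rather than trained.

```python
import math
import random

SAMPLE_RATE = 8000  # assumed narrow-band sampling rate in Hz (hypothetical choice)


def conv1d(signal, kernel, stride=1):
    # "Valid" 1-D cross-correlation, which is what CNN conv layers compute.
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def relu(xs):
    return [max(0.0, x) for x in xs]


def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(waveform, kernels, weights, biases, stride):
    # Each kernel slides directly over the raw samples: no spectrogram or
    # hand-crafted feature extraction. ReLU, then global max pooling over
    # time, yields one activation per kernel.
    pooled = [max(relu(conv1d(waveform, k, stride))) for k in kernels]
    # Linear layer on the pooled activations: one logit per (hypothetical) class.
    logits = [sum(w * p for w, p in zip(row, pooled)) + b
              for row, b in zip(weights, biases)]
    return pooled, softmax(logits)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical sizes: 80 taps is 10 ms at 8 kHz.
    n_kernels, kernel_len, n_classes = 4, 80, 3
    kernels = [[rng.gauss(0, 0.1) for _ in range(kernel_len)] for _ in range(n_kernels)]
    weights = [[rng.gauss(0, 0.1) for _ in range(n_kernels)] for _ in range(n_classes)]
    biases = [0.0] * n_classes
    # A 0.25 s synthetic 200 Hz tone stands in for a speech clip in any language.
    wave = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE // 4)]
    pooled, probs = forward(wave, kernels, weights, biases, stride=40)
    print(len(pooled), [round(p, 3) for p in probs])
```

Because the filters operate directly on samples, the same weights apply unchanged to audio in any language. In a multilingual setup like the one described above, training would mix clips from all languages into one training set. Learning the filters requires a training loop (typically in a deep learning framework), which this sketch omits.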
