SAR-Net: A End-to-End Deep Speech Accent Recognition Network

11/25/2020
by   Wei Wang, et al.
0

This paper proposes a end-to-end deep network to recognize kinds of accents under the same language, where we develop and transfer the deep architecture in speaker-recognition area to accent classification task for learning utterance-level accent representation. Compared with the individual-level feature in speaker-recognition, accent recognition throws a more challenging issue in acquiring compact group-level features for the speakers with the same accent, hence a good discriminative accent feature space is desired. Our deep framework adopts multitask-learning mechanism and mainly consists of three modules: a shared CNNs and RNNs based front-end encoder, a core accent recognition branch, and an auxiliary speech recognition branch, where we take speech spectrogram as input. More specifically, with the sequential descriptors learned from a shared encoder, the accent recognition branch first condenses all descriptors into an embedding vector, and then explores different discriminative loss functions which are popular in face recognition domain to enhance embedding discrimination. Additionally, due to the accent is a speaking-related timbre, adding speech recognition branch effectively curbs the over-fitting phenomenon in accent recognition during training. We show that our network without any data-augment preproccessings is significantly ahead of the baseline system on the accent classification track in the Accented English Speech Recognition Challenge 2020 (AESRC2020), where the state-of-the-art loss function Circle-Loss achieves the best discriminative optimization for accent representation.

READ FULL TEXT

page 1

page 3

page 8

research
10/07/2020

Domain Adversarial Neural Networks for Dysarthric Speech Recognition

Speech recognition systems have improved dramatically over the last few ...
research
04/01/2022

Filter-based Discriminative Autoencoders for Children Speech Recognition

Children speech recognition is indispensable but challenging due to the ...
research
02/19/2021

End-to-End Neural Systems for Automatic Children Speech Recognition: An Empirical Study

A key desiderata for inclusive and accessible speech recognition technol...
research
04/05/2017

Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition

End-to-end training of deep learning-based models allows for implicit le...
research
01/14/2022

Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition

Automatic recognition of disordered speech remains a highly challenging ...
research
05/15/2018

A Purely End-to-end System for Multi-speaker Speech Recognition

Recently, there has been growing interest in multi-speaker speech recogn...
research
05/27/2022

Contrastive Siamese Network for Semi-supervised Speech Recognition

This paper introduces contrastive siamese (c-siam) network, an architect...

Please sign up or login with your details

Forgot password? Click here to reset