Accent Recognition with Hybrid Phonetic Features

05/05/2021
by Zhan Zhang, et al.

The performance of voice-controlled systems is usually degraded by accented speech. To make these systems more robust, frontend accent recognition (AR) technologies have received increased attention in recent years. Because accent is a high-level abstract feature that is closely tied to language knowledge, AR is more challenging than language-agnostic audio classification tasks. In this paper, we use an auxiliary automatic speech recognition (ASR) task to extract language-related phonetic features. Furthermore, we propose a hybrid structure that incorporates the embeddings of both a fixed acoustic model and a trainable acoustic model, making the language-related acoustic features more robust. We conduct several experiments on the Accented English Speech Recognition Challenge (AESRC) 2020 dataset. The results show that our approach obtains 6.57 on the validation set. We also obtain 7.28 on the test set for this competition, showing the merits of the proposed method.
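The hybrid structure described in the abstract can be illustrated with a minimal forward-pass sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two "acoustic models" are stand-in random linear projections (`W_FIXED`, `W_TRAINABLE`), the embeddings are combined by concatenation, frames are mean-pooled over time, and the dimensions and the number of accent classes (`NUM_ACCENTS`) are arbitrary.

```python
import math
import random

random.seed(0)

# Illustrative sizes only; none of these values come from the paper.
FEAT_DIM, EMB_DIM, NUM_ACCENTS = 4, 3, 8


def rand_matrix(rows, cols):
    """Build a rows x cols matrix of uniform random values in [-1, 1]."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def linear(weights, vec):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


# Hypothetical stand-ins for the two acoustic models.
W_FIXED = rand_matrix(EMB_DIM, FEAT_DIM)      # would stay frozen during AR training
W_TRAINABLE = rand_matrix(EMB_DIM, FEAT_DIM)  # would be updated during AR training
W_CLASSIFIER = rand_matrix(NUM_ACCENTS, 2 * EMB_DIM)


def hybrid_embedding(frame):
    """Concatenate the per-frame embeddings of both encoders (assumed fusion)."""
    return linear(W_FIXED, frame) + linear(W_TRAINABLE, frame)


def accent_posteriors(frames):
    """Mean-pool hybrid embeddings over time, then apply a softmax classifier."""
    embeddings = [hybrid_embedding(f) for f in frames]
    pooled = [sum(col) / len(embeddings) for col in zip(*embeddings)]
    logits = linear(W_CLASSIFIER, pooled)
    peak = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


utterance = rand_matrix(10, FEAT_DIM)  # 10 frames of fake acoustic features
probs = accent_posteriors(utterance)
print(len(probs), round(sum(probs), 6))
```

In a real system the two projections would be full ASR encoders, and only the trainable branch and the classifier would receive gradient updates. The sketch shows only how the two embedding streams might be fused before classification.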


