A Hierarchical Context-aware Modeling Approach for Multi-aspect and Multi-granular Pronunciation Assessment

05/29/2023
by   Fu-An Chao, et al.

Automatic Pronunciation Assessment (APA) plays a vital role in Computer-assisted Pronunciation Training (CAPT) when evaluating a second language (L2) learner's speaking proficiency. However, an apparent downside of most de facto methods is that they model the different speech granularities in parallel without accounting for the hierarchical and local contextual relationships among them. In light of this, a novel hierarchical approach is proposed in this paper for multi-aspect and multi-granular APA. Specifically, we first introduce the notion of sup-phonemes to explore more subtle semantic traits of L2 speakers. Second, a depth-wise separable convolution layer is exploited to better encapsulate the local context cues at the sub-word level. Finally, we use a score-restraint attention pooling mechanism to predict the sentence-level scores and optimize the component models with a multitask learning (MTL) framework. Extensive experiments carried out on a publicly available benchmark dataset, viz. speechocean762, demonstrate the efficacy of our approach relative to several cutting-edge baselines.
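To make two of the building blocks named in the abstract more concrete, the following is a minimal NumPy sketch of a 1-D depth-wise separable convolution over a sequence of sub-word feature vectors, followed by attention pooling into a single sentence-level vector. It is an illustration under stated assumptions, not the authors' implementation: the function names, shapes, and random inputs are hypothetical, and the pooling shown is plain softmax attention, since the abstract does not specify how the score-restraint variant is defined.

```python
import numpy as np


def depthwise_separable_conv1d(x, depthwise_kernels, pointwise_weights):
    """Hypothetical helper: depth-wise separable 1-D convolution.

    x:                 (T, C_in)     sequence of T feature vectors
    depthwise_kernels: (K, C_in)     one length-K filter per input channel (K odd)
    pointwise_weights: (C_in, C_out) 1x1 convolution that mixes channels
    Uses stride 1 and zero "same" padding, and computes cross-correlation,
    as deep learning frameworks usually do.
    """
    num_steps, num_channels = x.shape
    kernel_size = depthwise_kernels.shape[0]
    pad = kernel_size // 2
    padded = np.pad(x, ((pad, pad), (0, 0)))

    # Depth-wise step: every channel is filtered independently over time.
    depthwise = np.zeros((num_steps, num_channels))
    for t in range(num_steps):
        window = padded[t:t + kernel_size]  # (K, C_in)
        depthwise[t] = np.sum(window * depthwise_kernels, axis=0)

    # Point-wise step: combine channels at each time step.
    return depthwise @ pointwise_weights


def attention_pool(h, query):
    """Hypothetical helper: plain softmax attention pooling.

    h: (T, D) sequence of hidden vectors; query: (D,) scoring vector.
    Returns the pooled (D,) vector and the (T,) attention weights.
    """
    scores = h @ query
    scores = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ h, weights


# Toy usage with made-up sizes: 6 sub-word units, 4 input and 8 output channels.
rng = np.random.default_rng(0)
features = rng.normal(size=(6, 4))
local_context = depthwise_separable_conv1d(
    features,
    depthwise_kernels=rng.normal(size=(3, 4)),
    pointwise_weights=rng.normal(size=(4, 8)),
)
sentence_vector, attn = attention_pool(local_context, query=rng.normal(size=8))
```

With kernel size K, the separable layer uses K·C_in + C_in·C_out weights, compared with K·C_in·C_out for a standard convolution of the same shape, which is one common reason for choosing it when modeling local context.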
