Neural Representations for Modeling Variation in English Speech

11/25/2020
by Martijn Bartelds, et al.

Variation in speech is often represented and investigated using phonetic transcriptions, but transcribing speech is time-consuming and error-prone. To create reliable representations of speech independent of phonetic transcriptions, we investigate the extraction of acoustic embeddings from several self-supervised neural models. We use these representations to compute word-based pronunciation differences between non-native and native speakers of English, and evaluate these differences by comparing them with human native-likeness judgments. We show that Transformer-based speech representations lead to significant performance gains over the use of phonetic transcriptions, and find that, when Transformer models are used as feature extractors, representations from one or more middle layers are more effective than those from the final layer. We also demonstrate that these neural speech representations capture not only segmental differences but also intonational and durational differences, which cannot be represented by the discrete symbols used in phonetic transcriptions.
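The abstract does not spell out how the word-based pronunciation differences are computed or evaluated. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each word recording has already been converted into a sequence of frame-level embedding vectors (for example, the hidden states of one Transformer layer). It further assumes that two such sequences are compared with length-normalized dynamic time warping (DTW), and that a non-native speaker's difference for a word is the mean distance to several native reference recordings of that word. Agreement with native-likeness ratings is measured here with a Pearson correlation, which is also an assumption. All function names (`dtw_distance`, `pronunciation_difference`, `pearson`, `best_layer`) are hypothetical.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # one embedding vector per audio frame (assumed input format)


def euclidean(a: Frame, b: Frame) -> float:
    """Euclidean distance between two frame embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(x: List[Frame], y: List[Frame]) -> float:
    """Length-normalized DTW distance between two non-empty embedding sequences.

    Accumulates frame distances along the cheapest alignment path and
    divides by the number of steps on that path, so longer words are not
    automatically judged as more different.
    """
    n, m = len(x), len(y)
    # cost[i][j]: minimal accumulated cost of aligning x[:i] with y[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    # steps[i][j]: number of alignment steps on the path behind cost[i][j]
    steps = [[0] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(x[i - 1], y[j - 1])
            # Predecessors: diagonal match, advance in x only, advance in y only
            best_cost, best_steps = min(
                (cost[i - 1][j - 1], steps[i - 1][j - 1]),
                (cost[i - 1][j], steps[i - 1][j]),
                (cost[i][j - 1], steps[i][j - 1]),
            )
            cost[i][j] = best_cost + d
            steps[i][j] = best_steps + 1
    return cost[n][m] / steps[n][m]


def pronunciation_difference(nonnative: List[Frame], natives: List[List[Frame]]) -> float:
    """Mean DTW distance from one non-native word recording to native references."""
    return sum(dtw_distance(nonnative, ref) for ref in natives) / len(natives)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)


def best_layer(distances_per_layer: List[Sequence[float]], ratings: Sequence[float]) -> int:
    """Index of the layer whose distances correlate most strongly (in absolute value) with ratings."""
    scores = [abs(pearson(d, ratings)) for d in distances_per_layer]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for real model features.
    word = [[0.0, 0.0], [1.0, 1.0]]
    refs = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]]
    print(pronunciation_difference(word, refs))
```

Native-likeness should rise as acoustic distance to native speech falls, so a useful representation is expected to produce a strong negative correlation. This is why `best_layer` compares absolute correlations. Running it over per-layer distances is one way to check whether middle layers outperform the final layer for a given model.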
