Automatic Dialect Density Estimation for African American English

04/03/2022
by Alexander Johnson, et al.

In this paper, we explore automatic prediction of dialect density for African American English (AAE), where dialect density is defined as the percentage of words in an utterance that contain characteristics of the non-standard dialect. We investigate several acoustic and language modeling features, including the commonly used X-vector representation and ComParE feature set, as well as prosodic information and information extracted from ASR transcripts of the audio files. To address the limited availability of labeled data, we use a weakly supervised model to project prosodic and X-vector features into low-dimensional, task-relevant representations. An XGBoost model is then used to predict the speaker's dialect density from these features and to show which features are most significant during inference. We evaluate the utility of these features both alone and in combination for the given task. This work, which does not rely on hand-labeled transcripts, is performed on audio segments from the CORAAL database. We show a significant correlation between our predicted and ground-truth dialect density measures for AAE speech in this database and propose this work as a tool for explaining and mitigating bias in speech technology.
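To make the definition above concrete, the following is a minimal sketch of how a dialect density measure could be computed and compared against predictions. It is illustrative only and is not the authors' implementation. The per-word boolean flags, the function names `dialect_density` and `pearson_r`, and the choice of Pearson correlation are all assumptions; the abstract does not state the annotation format or which correlation coefficient is reported.

```python
from math import sqrt
from typing import Sequence


def dialect_density(word_flags: Sequence[bool]) -> float:
    """Return the percentage of words flagged as containing dialect features.

    `word_flags` is a hypothetical annotation format: one boolean per word
    in the utterance, True if the word shows a feature of the dialect.
    """
    if not word_flags:
        raise ValueError("utterance has no words")
    return 100.0 * sum(bool(flag) for flag in word_flags) / len(word_flags)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences.

    Assumption: Pearson is used here only as one common way to compare
    predicted and ground-truth densities.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical five-word utterance in which two words carry dialect features.
flags = [False, True, False, False, True]
print(dialect_density(flags))  # 40.0
```

Under these assumptions, a list of per-utterance ground-truth densities and a list of model predictions could be passed to `pearson_r` to obtain a single agreement score.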


