Analysis and Tuning of a Voice Assistant System for Dysfluent Speech

06/18/2021
by   Vikramjit Mitra, et al.

Dysfluencies and variations in speech pronunciation can severely degrade speech recognition performance, and for many individuals with moderate-to-severe speech disorders, voice-operated systems do not work. Current speech recognition systems are trained primarily on data from fluent speakers and, as a consequence, do not generalize well to speech with dysfluencies such as sound or word repetitions, sound prolongations, or audible blocks. This work focuses on a quantitative analysis of a consumer speech recognition system for individuals who stutter, and on production-oriented approaches for improving performance on common voice assistant tasks (e.g., "what is the weather?"). At baseline, this system introduces a significant number of insertion and substitution errors, resulting in intended speech Word Error Rates (isWER) that are 13.64% worse (absolute) for individuals with fluency disorders. We show that by simply tuning the decoding parameters in an existing hybrid speech recognition system, one can improve isWER by 24% (relative) for individuals with fluency disorders. Tuning these parameters translates to 3.6% better domain recognition and 1.7% better intent recognition relative to the default setup for the 18 study participants across all stuttering severities.
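The abstract reports isWER both as an absolute gap and as a relative improvement, so a small worked example may help keep the two apart. The sketch below is not the authors' scoring tool: it assumes that isWER is an ordinary word error rate computed against a transcript of what the speaker intended to say, and it uses a plain word-level edit distance. The function names, the example utterance, and the 0.40 and 0.30 error rates are hypothetical and chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def absolute_difference(baseline: float, tuned: float) -> float:
    """Gap between two error rates, in the same units as the rates."""
    return baseline - tuned


def relative_improvement(baseline: float, tuned: float) -> float:
    """Reduction in error rate as a fraction of the baseline error rate."""
    return (baseline - tuned) / baseline


if __name__ == "__main__":
    # Hypothetical utterance: the speaker intended the reference, and word
    # repetitions show up as two inserted words in the recognizer output.
    intended = "what is the weather"
    recognized = "what what is is the weather"
    print(word_error_rate(intended, recognized))  # 0.5 (2 insertions / 4 words)

    # Hypothetical error rates, not results from the paper.
    print(absolute_difference(0.40, 0.30))   # about 0.10 (10 points absolute)
    print(relative_improvement(0.40, 0.30))  # about 0.25 (25% relative)
```

Under this reading, a figure quoted as "(absolute)" is a difference between two error rates, while a figure quoted as "(relative)" is that difference divided by the baseline rate, which is why the two kinds of figure cannot be compared directly.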
