Stuttering Speech Disfluency Prediction using Explainable Attribution Vectors of Facial Muscle Movements

by   Arun Das, et al.

Speech disorders such as stuttering disrupt the normal fluency of speech through involuntary repetitions, prolongations, and blocking of sounds and syllables. In addition to these disruptions to speech fluency, most adults who stutter (AWS) also exhibit numerous observable secondary behaviors before, during, and after a stuttering moment, often involving the facial muscles. Recent studies have explored automatic detection of stuttering using Artificial Intelligence (AI) based algorithms applied to signals such as respiratory rate and audio recorded during speech utterance. However, most methods require controlled environments and/or invasive wearable sensors, and are unable to explain why a decision (fluent vs. stuttered) was made. We hypothesize that pre-speech facial activity in AWS, which can be captured non-invasively, contains enough information to accurately classify the upcoming utterance as either fluent or stuttered. To this end, this paper proposes a novel explainable AI (XAI) assisted convolutional neural network (CNN) classifier that predicts near-future stuttering by learning temporal facial muscle movement patterns of AWS and explains which facial muscles and actions are important. Statistical analyses reveal a significantly high prevalence of cheek muscles (p < 0.005) and lip muscles (p < 0.005) in predicting stuttering, and show behavior indicative of arousal and anticipation to speak. The temporal study of these upper and lower facial muscles may facilitate early detection of stuttering, promote automated assessment of stuttering, and find application in behavioral therapies by providing automatic, non-invasive feedback in real time.
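The pipeline the abstract describes — a temporal classifier over pre-speech facial muscle activity, paired with per-muscle attribution — can be illustrated with a minimal sketch. This is not the paper's model: the action-unit channels, window length, untrained random weights, and occlusion-based attribution below are all illustrative assumptions standing in for the trained CNN and its XAI attribution vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: T pre-speech video frames, each described by the
# intensities of C facial action-unit (AU) channels, e.g. cheek raiser,
# lip presser. Values here are random stand-ins for real AU intensities.
T, C = 60, 8                  # ~2 s at 30 fps, 8 AU channels (illustrative)
x = rng.random((C, T))        # one pre-speech AU intensity sequence

# Toy temporal-convolution "classifier": one length-5 kernel per channel,
# max-pooled over time, summed into a scalar logit. Weights are random,
# not trained -- a stand-in for the paper's CNN.
W = rng.normal(size=(C, 5))
b = -0.5

def score(seq):
    """Scalar logit: convolve each AU channel in time, pool, then sum."""
    feats = [np.convolve(seq[c], W[c], mode="valid").max() for c in range(C)]
    return float(np.sum(feats) + b)

def predict(seq):
    return "stuttered" if score(seq) > 0 else "fluent"

# Occlusion-based attribution: zero out one AU channel at a time and
# record how much the logit drops. The result is a per-muscle attribution
# vector -- a simple, model-agnostic analogue of the paper's XAI vectors.
base = score(x)
attribution = np.array([
    base - score(np.where(np.arange(C)[:, None] == c, 0.0, x))
    for c in range(C)
])

top_au = int(np.argmax(np.abs(attribution)))  # most influential AU channel
```

In this sketch a large positive entry in `attribution` means the corresponding facial muscle channel pushed the decision toward "stuttered", which is the kind of evidence the statistical analysis of cheek and lip muscles summarizes.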




