Speaker-independent classification of phonetic segments from raw ultrasound in child speech

07/01/2019
by Manuel Sam Ribeiro, et al.

Ultrasound tongue imaging (UTI) provides a convenient way to visualize the vocal tract during speech production. UTI is increasingly being used for speech therapy, making it important to develop automatic methods to assist various time-consuming manual tasks currently performed by speech therapists. A key challenge is to generalize the automatic processing of ultrasound tongue images to previously unseen speakers. In this work, we investigate the classification of phonetic segments (tongue shapes) from raw ultrasound recordings under several training scenarios: speaker-dependent, multi-speaker, speaker-independent, and speaker-adapted. We observe that models underperform when applied to data from speakers not seen at training time. However, when provided with minimal additional speaker information, such as the mean ultrasound frame, the models generalize better to unseen speakers.
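The speaker-adapted scenario mentioned above relies on the mean ultrasound frame as extra speaker information. The abstract does not say how that frame is fed to the model, so the following is only a minimal sketch under one assumption: the per-speaker mean is stacked with each raw frame as a second input channel. The function names `mean_frame` and `add_speaker_channel` and the array shapes are hypothetical and chosen for illustration.

```python
import numpy as np


def mean_frame(frames: np.ndarray) -> np.ndarray:
    """Average all frames of one speaker: (n_frames, H, W) -> (H, W)."""
    return frames.mean(axis=0)


def add_speaker_channel(frames: np.ndarray, speaker_mean: np.ndarray) -> np.ndarray:
    """Pair every frame with the speaker mean: (n, H, W) -> (n, 2, H, W).

    Assumption: the mean frame is supplied as a second channel. Other
    conditioning schemes are equally plausible.
    """
    tiled = np.broadcast_to(speaker_mean, frames.shape)
    return np.stack([frames, tiled], axis=1)


# Toy data standing in for the raw ultrasound frames of one unseen speaker.
rng = np.random.default_rng(0)
frames = rng.random((5, 4, 6)).astype(np.float32)

inputs = add_speaker_channel(frames, mean_frame(frames))
print(inputs.shape)
```

Channel 0 holds the original frames and channel 1 repeats the same speaker mean for every frame, so a classifier can compare each tongue shape against that speaker's average image.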

research
06/08/2021

Neural Speaker Embeddings for Ultrasound-based Silent Speech Interfaces

Articulatory-to-acoustic mapping seeks to reconstruct speech from a reco...
research
07/01/2019

Ultrasound tongue imaging for diarization and alignment of child speech therapy sessions

We investigate the automatic processing of child speech therapy sessions...
research
06/28/2023

Enrollment-stage Backdoor Attacks on Speaker Recognition Systems via Adversarial Ultrasound

Automatic Speaker Recognition Systems (SRSs) have been widely used in vo...
research
05/30/2023

Adaptation of Tongue Ultrasound-Based Silent Speech Interfaces Using Spatial Transformer Networks

Thanks to the latest deep learning algorithms, silent speech interfaces ...
research
08/05/2022

Chronological Self-Training for Real-Time Speaker Diarization

Diarization partitions an audio stream into segments based on the voices...
research
07/01/2019

Synchronising audio and ultrasound by learning cross-modal embeddings

Audiovisual synchronisation is the task of determining the time offset b...
research
08/06/2020

Quantification of Transducer Misalignment in Ultrasound Tongue Imaging

In speech production research, different imaging modalities have been em...
