Automatic audiovisual synchronisation for ultrasound tongue imaging

05/31/2021
by Aciel Eshky, et al.

Ultrasound tongue imaging is used to visualise the intra-oral articulators during speech production. It is utilised in a range of applications, including speech and language therapy and phonetics research. Ultrasound and speech audio are recorded simultaneously, and in order to use this data correctly, the two modalities must be synchronised. Synchronisation is achieved using specialised hardware at recording time, but this approach can fail in practice, resulting in data of limited usability. In this paper, we address the problem of automatically synchronising ultrasound and audio after data collection. We first investigate the tolerance of expert ultrasound users to synchronisation errors in order to find the thresholds for error detection. We use these thresholds to define accuracy scoring boundaries for evaluating our system. We then describe our approach for automatic synchronisation, which is driven by a self-supervised neural network that exploits the correlation between the two signals to synchronise them. We train our model on data from multiple domains with different speaker characteristics, different equipment, and different recording environments, and achieve an accuracy >92.4% on held-out in-domain data. Finally, we introduce a novel resource, the Cleft dataset, which we gathered with a new clinical subgroup and for which hardware synchronisation proved unreliable. We apply our model to this out-of-domain data and evaluate its performance subjectively with expert users. Results show that users prefer our model's output over the original hardware output 79.3% of the time. These results demonstrate the strength of our approach and its ability to generalise to data from new domains.
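
The abstract does not include implementation detail, but the core idea it describes (a self-supervised network that exploits the correlation between ultrasound and audio, and uses it to choose the best time offset) can be sketched roughly as follows. This is an illustrative PyTorch sketch, not the authors' model: the encoder architectures, the use of MFCC-style audio features, the window sizes, and the candidate-offset search are all assumptions made for the example.

```python
# Illustrative sketch of cross-modal synchronisation (not the paper's code):
# two encoders map short windows of ultrasound and audio into a shared
# embedding space; training pulls together windows from the same time point
# and pushes apart artificially shifted pairs; at test time the candidate
# offset whose windows agree best on average is selected.

import torch
import torch.nn as nn


class UltrasoundEncoder(nn.Module):
    """Encodes a short stack of ultrasound frames (frames, H, W) to an embedding."""
    def __init__(self, in_frames=5, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_frames, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, emb_dim),
        )

    def forward(self, x):
        return self.net(x)


class AudioEncoder(nn.Module):
    """Encodes a matching window of audio features, e.g. MFCCs (1, n_mfcc, T)."""
    def __init__(self, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, emb_dim),
        )

    def forward(self, x):
        return self.net(x)


def contrastive_loss(e_us, e_au, label, margin=1.0):
    """Self-supervised training signal: label=1 for windows taken from the same
    time point, label=0 for artificially offset (negative) pairs."""
    d = torch.norm(e_us - e_au, dim=1)
    return (label * d.pow(2)
            + (1 - label) * torch.clamp(margin - d, min=0).pow(2)).mean()


def predict_offset(us_windows, au_windows_by_offset, us_enc, au_enc):
    """Pick the candidate offset whose audio windows are, on average, closest
    to the ultrasound windows in the shared embedding space."""
    with torch.no_grad():
        e_us = us_enc(us_windows)                      # (N, emb_dim)
        best_offset, best_dist = None, float("inf")
        for offset, au_windows in au_windows_by_offset.items():
            e_au = au_enc(au_windows)                  # (N, emb_dim)
            mean_dist = torch.norm(e_us - e_au, dim=1).mean().item()
            if mean_dist < best_dist:
                best_offset, best_dist = offset, mean_dist
        return best_offset
```

In this sketch, `au_windows_by_offset` maps each candidate offset to the audio windows extracted at that shift; the predicted synchronisation offset is the one minimising the average embedding distance, which mirrors the "exploit the correlation between the two signals" idea described in the abstract.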


