Adaptation of Tongue Ultrasound-Based Silent Speech Interfaces Using Spatial Transformer Networks

05/30/2023
by László Tóth, et al.

Thanks to the latest deep learning algorithms, silent speech interfaces (SSI) are now able to synthesize intelligible speech from articulatory movement data under certain conditions. However, the resulting models are rather speaker-specific, making a quick switch between users troublesome. Even for the same speaker, these models perform poorly cross-session, i.e., after dismounting and re-mounting the recording equipment. To aid quick speaker and session adaptation of ultrasound tongue imaging-based SSI models, we extend our deep networks with a spatial transformer network (STN) module, capable of performing an affine transformation on the input images. Although the STN part takes up only about 10% [...] module might allow to reduce MSE by 88% [...] the whole network. The improvement is even larger (around 92%) [...] the network to different recording sessions from the same speaker.
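
To make the affine transformation step concrete, here is a minimal NumPy sketch of the warping operation an STN applies to an input image. It is an illustration under stated assumptions, not the authors' implementation: the function name `affine_warp`, the normalised [-1, 1] coordinate convention, bilinear sampling, and zero padding are all choices made for this example, and the 2x3 matrix `theta` is supplied by hand, whereas in an STN it would be predicted by a small localisation network.

```python
import numpy as np

def affine_warp(image, theta):
    """Warp a 2-D image with a 2x3 affine matrix `theta` (hypothetical helper).

    Output pixel positions are normalised to [-1, 1], mapped through `theta`
    to source positions, and sampled bilinearly with zero padding.
    """
    h, w = image.shape
    # Normalised output grid: x runs across columns, y across rows.
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # shape (3, h*w)
    src_x, src_y = np.asarray(theta, dtype=float) @ grid       # source coordinates
    # Convert normalised source coordinates back to pixel indices.
    px = (src_x + 1) * (w - 1) / 2
    py = (src_y + 1) * (h - 1) / 2
    x0 = np.floor(px).astype(int)
    y0 = np.floor(py).astype(int)
    out = np.zeros(h * w)
    # Accumulate the four bilinear neighbours of every source position.
    for dy in (0, 1):
        for dx in (0, 1):
            xi, yi = x0 + dx, y0 + dy
            weight = (1 - np.abs(px - xi)) * (1 - np.abs(py - yi))
            valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
            out[valid] += weight[valid] * image[yi[valid], xi[valid]]
    return out.reshape(h, w)

# Example: the identity matrix leaves the image unchanged,
# while a small x-offset shifts the sampled content sideways.
image = np.arange(20, dtype=float).reshape(4, 5)
identity = [[1, 0, 0], [0, 1, 0]]
shifted = affine_warp(image, [[1, 0, 0.5], [0, 1, 0]])
```

Because the whole warp is controlled by the six entries of `theta`, adapting only the module that produces them touches far fewer parameters than retraining the full network, which is the property the abstract relies on.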


