SottoVoce: An Ultrasound Imaging-Based Silent Speech Interaction Using Deep Neural Networks

03/03/2023
by Naoki Kimura, et al.
The availability of digital devices operated by voice is expanding rapidly. However, the applications of voice interfaces are still restricted. For example, speaking in public places can annoy the people nearby, and confidential information should not be spoken aloud. Environmental noise may also reduce the accuracy of speech recognition. To address these limitations, we propose a system that detects a user's unvoiced utterances. Using internal information observed by an ultrasonic imaging sensor attached to the underside of the jaw, the system recognizes the content of an utterance without the user actually voicing it. Our proposed deep neural network model obtains acoustic features from a sequence of ultrasound images. We confirmed that audio signals generated by our system can control existing smart speakers. We also observed that users can learn to adjust their oral movements to improve the accuracy with which their speech is recognized.
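
The data flow described in the abstract (a sequence of ultrasound images in, one acoustic feature vector per frame out) can be illustrated with a minimal sketch. The abstract does not specify the network architecture, so everything here is an assumption made for illustration: the frame size, the context window, the feature dimension, and the use of a single random linear layer as a stand-in for the trained deep neural network. The names `stack_context` and `linear_map` are hypothetical helpers, not part of the authors' system.

```python
import random


def stack_context(frames, context=2):
    """Concatenate each frame with `context` neighbours on each side.

    Indices past either end of the sequence are clamped to the nearest
    valid frame, so every output vector has the same length.
    """
    n = len(frames)
    stacked = []
    for t in range(n):
        window = []
        for k in range(t - context, t + context + 1):
            window.extend(frames[min(max(k, 0), n - 1)])
        stacked.append(window)
    return stacked


def linear_map(vectors, weights):
    """Apply one dense layer (no bias, no nonlinearity) to each vector.

    This is only a placeholder for a trained network.
    """
    return [[sum(w * x for w, x in zip(row, v)) for row in weights]
            for v in vectors]


# Hypothetical sizes, chosen only to keep the example small.
NUM_FRAMES, PIXELS, CONTEXT, FEATURE_DIM = 10, 16, 2, 4

rng = random.Random(0)
# Each "ultrasound frame" is a flattened list of pixel intensities.
frames = [[rng.random() for _ in range(PIXELS)] for _ in range(NUM_FRAMES)]
# Random weights stand in for learned parameters.
weights = [[rng.uniform(-1, 1) for _ in range(PIXELS * (2 * CONTEXT + 1))]
           for _ in range(FEATURE_DIM)]

features = linear_map(stack_context(frames, CONTEXT), weights)
print(len(features), len(features[0]))  # 10 4
```

In this sketch each output row plays the role of one acoustic feature frame. In a full system such frames would be passed to a separate synthesis step to produce the audio signal played to a smart speaker; that step is not shown.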

research
05/28/2021

Voice Activity Detection for Ultrasound-based Silent Speech Interfaces using Convolutional Neural Networks

Voice Activity Detection (VAD) is not easy task when the input audio sig...
research
11/19/2020

TaL: a synchronised multi-speaker corpus of ultrasound tongue imaging, audio, and lip videos

We present the Tongue and Lips corpus (TaL), a multi-speaker corpus of a...
research
07/12/2022

NEC: Speaker Selective Cancellation via Neural Enhanced Ultrasound Shadowing

In this paper, we propose NEC (Neural Enhanced Cancellation), a defense ...
research
02/27/2021

Silent versus modal multi-speaker speech recognition from ultrasound and video

We investigate multi-speaker speech recognition from ultrasound images o...
research
03/20/2020

Detecting Mismatch between Text Script and Voice-over Using Utterance Verification Based on Phoneme Recognition Ranking

The purpose of this study is to detect the mismatch between text script ...
research
01/12/2021

Practical Speech Re-use Prevention in Voice-driven Services

Voice-driven services (VDS) are being used in a variety of applications ...
research
08/12/2021

Deep Neural Network Voice Activity Detector for Downsampled Audio Data: An Experiment Report

Sociometric badges are an emerging technology for study how teams intera...
