LipLearner: Customizable Silent Speech Interactions on Mobile Devices

02/12/2023
by Zixiong Su, et al.

Silent speech interfaces are a promising technology that enables private communication in natural language. However, previous approaches only support a small and inflexible vocabulary, which limits expressiveness. We leverage contrastive learning to learn efficient lipreading representations, enabling few-shot command customization with minimal user effort. Our model exhibits high robustness to different lighting, posture, and gesture conditions on an in-the-wild dataset. For 25-command classification, an F1-score of 0.8947 is achievable using only one shot, and performance can be further boosted by adaptively learning from more data. This generalizability allowed us to develop a mobile silent speech interface empowered with on-device fine-tuning and visual keyword spotting. A user study demonstrated that with LipLearner, users could define their own commands with high reliability, guaranteed by an online incremental learning scheme. Subjective feedback indicated that our system provides essential functionalities for customizable silent speech interactions with high usability and learnability.
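The abstract does not detail how few-shot customization works, but a common approach with contrastive representations is prototype-based classification: each user-defined command is enrolled from one or more embedding "shots", and a query is matched to the command whose mean embedding is most similar. The sketch below is an illustrative assumption, not the authors' implementation; the embedding vectors stand in for the output of a pretrained lipreading encoder.

```python
import numpy as np

def normalize(v):
    """L2-normalize a vector so dot products become cosine similarities."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

class FewShotCommandClassifier:
    """Prototype-based few-shot classifier over lipreading embeddings.

    Enrolling more shots per command refines its prototype, which is
    one simple way to realize incremental learning from user feedback.
    """

    def __init__(self):
        self.shots = {}  # command name -> list of normalized embeddings

    def enroll(self, command, embedding):
        """Add one shot (an embedding of a silently spoken command)."""
        self.shots.setdefault(command, []).append(
            normalize(np.asarray(embedding, dtype=float)))

    def classify(self, embedding):
        """Return (best-matching command, cosine similarity to its prototype)."""
        q = normalize(np.asarray(embedding, dtype=float))
        best, best_sim = None, -1.0
        for command, embeddings in self.shots.items():
            prototype = normalize(np.mean(embeddings, axis=0))
            sim = float(np.dot(q, prototype))
            if sim > best_sim:
                best, best_sim = command, sim
        return best, best_sim

# One-shot enrollment with toy 3-d embeddings (a real encoder would
# produce higher-dimensional vectors from lip-motion video).
clf = FewShotCommandClassifier()
clf.enroll("play music", [1.0, 0.1, 0.0])
clf.enroll("stop", [0.0, 1.0, 0.2])
cmd, sim = clf.classify([0.9, 0.2, 0.0])
```

With a well-trained contrastive encoder, embeddings of the same command cluster tightly, which is why even a single enrolled shot per command can support reliable classification.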


