Visual gesture variability between talkers in continuous visual speech

10/03/2017
by Helen L Bear, et al.

The recent adoption of deep learning methods in machine lipreading research gives us two options for improving system performance: either we develop end-to-end systems holistically, or we experiment to further our understanding of the visual speech signal. The latter option is more difficult, but this knowledge would enable researchers both to improve systems and to apply the new knowledge to other domains such as speech therapy. One challenge in lipreading systems is the correct labeling of the classifiers. These labels map an estimated function between visemes on the lips and the phonemes uttered. Here we ask whether such maps are speaker-dependent. Prior work investigated isolated word recognition from speaker-dependent (SD) visemes; we extend this to continuous speech. Benchmarked against SD results and the isolated-word performance, we test with RMAV dataset speakers and observe that with continuous speech, the trajectory between visemes has a greater negative effect on the speaker differentiation.


