Deformation Flow Based Two-Stream Network for Lip Reading

03/12/2020
by Jingyun Xiao, et al.

Lip reading is the task of recognizing speech content by analyzing movements in the lip region while people speak. Observing the continuity between adjacent frames during speech, and the consistency of motion patterns across different speakers when they pronounce the same phoneme, we model lip movements during speech as a sequence of apparent deformations in the lip region. Specifically, we introduce a Deformation Flow Network (DFN) to learn the deformation flow between adjacent frames, which directly captures the motion information within the lip region. The learned deformation flow is then combined with the original grayscale frames in a two-stream network to perform lip reading. Unlike previous two-stream networks, we let the two streams learn from each other during training by introducing a bidirectional knowledge distillation loss that trains the two branches jointly. Owing to the complementary cues provided by the two branches, the two-stream network shows a substantial improvement over either branch used alone. We present a thorough experimental evaluation on two large-scale lip reading benchmarks, with detailed analysis. The results are consistent with our motivation and show that our method achieves state-of-the-art or comparable performance on these two challenging datasets.
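To make the idea of a bidirectional knowledge distillation loss concrete, here is a minimal sketch in plain Python. The abstract does not give the exact formulation, so this assumes a common choice: a KL divergence between the temperature-softened class distributions of the two branches, applied in both directions. The function names, the temperature value, and the use of KL divergence are all illustrative assumptions, not the authors' implementation.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def bidirectional_kd_loss(gray_logits, flow_logits, temperature=2.0):
    """Hypothetical symmetric distillation term between two branches.

    gray_logits: class scores from the grayscale-frame branch.
    flow_logits: class scores from the deformation-flow branch.
    Assumption: each branch is pulled toward the other's softened output.
    In a real training loop the "teacher" side of each term would usually
    be treated as a constant (no gradient); that detail is omitted here.
    """
    p_gray = softmax(gray_logits, temperature)
    p_flow = softmax(flow_logits, temperature)
    gray_learns_from_flow = kl_divergence(p_flow, p_gray)
    flow_learns_from_gray = kl_divergence(p_gray, p_flow)
    return gray_learns_from_flow + flow_learns_from_gray
```

Under these assumptions the term is zero when both branches predict the same distribution and grows as they disagree. In practice it would be added to each branch's ordinary classification loss with a weighting factor, which is another detail the abstract leaves unspecified.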


