LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild

10/16/2018
by   Shuang Yang, et al.

Large-scale datasets have repeatedly proven their fundamental importance in several research fields, especially for early progress on emerging topics. In this paper, we focus on the problem of visual speech recognition, also known as lipreading, which has received increasing interest in recent years. We present a naturally-distributed large-scale benchmark for lip reading in the wild, named LRW-1000, which contains 1000 classes with about 745,187 samples from more than 2000 individual speakers. Each class corresponds to the syllables of a Mandarin word composed of one or several Chinese characters. To the best of our knowledge, it is the largest word-level lipreading dataset and also the only public large-scale Mandarin lipreading dataset. The dataset aims to cover the "natural" variability across different speech modes and imaging conditions, so that it incorporates the challenges encountered in practical applications. The benchmark varies widely in several respects, including the number of samples in each class, video resolution, lighting conditions, and speakers' attributes such as pose, age, gender, and make-up. Besides a detailed description of the dataset and its collection pipeline, we evaluate popular lipreading methods and perform a thorough analysis of the results from several aspects. The results demonstrate the consistency and challenges of our dataset, which may open up new promising directions for future work. The dataset and corresponding code will be available on the web for research use.


research
10/14/2021

Sub-word Level Lip Reading With Visual Attention

The goal of this paper is to learn strong lip reading models that can re...
research
04/08/2023

Word-level Persian Lipreading Dataset

Lip-reading has made impressive progress in recent years, driven by adva...
research
01/16/2023

OLKAVS: An Open Large-Scale Korean Audio-Visual Speech Dataset

Inspired by humans comprehending speech in a multi-modal manner, various...
research
10/15/2021

Advances and Challenges in Deep Lip Reading

Driven by deep learning techniques and large-scale datasets, recent year...
research
11/16/2016

Lip Reading Sentences in the Wild

The goal of this work is to recognise phrases and sentences being spoken...
research
11/15/2020

Learn an Effective Lip Reading Model without Pains

Lip reading, also known as visual speech recognition, aims to recognize ...
research
04/15/2019

Synthesising 3D Facial Motion from "In-the-Wild" Speech

Synthesising 3D facial motion from speech is a crucial problem manifesti...
