A large-scale multimodal dataset of human speech recognition

03/15/2023
by   Yao Ge, et al.

In recent years, privacy-preserving detection of small-scale motion has attracted growing research interest in remote sensing for speech recognition. These new modalities are used to enhance and restore speech information from speakers by drawing on multiple types of data. In this paper, we present a dataset that contains 7.5-GHz channel impulse response (CIR) data from ultra-wideband (UWB) radars, 77-GHz frequency-modulated continuous-wave (FMCW) data from a millimetre-wave (mmWave) radar, and laser data. In addition, a depth camera is used to record the subject's lip landmarks and voice. Approximately 400 minutes of annotated speech profiles are provided, collected from 20 participants speaking 5 vowels, 15 words and 16 sentences. The dataset has been validated and has potential for research on lip reading and multimodal speech recognition.

research
01/20/2021

VOTE400(Voide Of The Elderly 400 Hours): A Speech Dataset to Study Voice Interface for Elderly-Care

This paper introduces a large-scale Korean speech dataset, called VOTE40...
research
08/07/2020

CUCHILD: A Large-Scale Cantonese Corpus of Child Speech for Phonology and Articulation Assessment

This paper describes the design and development of CUCHILD, a large-scal...
research
02/06/2020

Continuous Silent Speech Recognition using EEG

In this paper we explore continuous silent speech recognition using elec...
research
04/09/2018

Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition

Describes an audio dataset of spoken words designed to help train and ev...
research
02/06/2020

Towards Mind Reading

In this paper we explore mind reading or continuous silent speech recogn...
research
06/18/2021

Analysis and Tuning of a Voice Assistant System for Dysfluent Speech

Dysfluencies and variations in speech pronunciation can severely degrade...
research
12/05/2020

SpeakingFaces: A Large-Scale Multimodal Dataset of Voice Commands with Visual and Thermal Video Streams

We present SpeakingFaces as a publicly-available large-scale multimodal ...
