Bilingual Speech Recognition by Estimating Speaker Geometry from Video Data

12/26/2021
by Luis Sanchez Tapia, et al.

Speech recognition is very challenging in student learning environments that are characterized by significant cross-talk and background noise. To address this problem, we present a bilingual speech recognition system that uses an interactive video analysis system to estimate the 3D speaker geometry for realistic audio simulations. We demonstrate the use of our system in generating a complex audio dataset that contains significant cross-talk and background noise and that approximates real-life classroom recordings. We then test our proposed system with real-life recordings. In terms of the distance of the speakers from the microphone, our interactive video analysis system obtained a better average error rate of 10.83% [...] to 33.12% [...] 27.92% [...]. In terms of 9 important keywords, our approach gave an average sensitivity of 38% compared to 24% [...] average specificity of 90% [...]. On average, sensitivity improved from 24% to 38%. On the other hand, specificity remained high for both methods (90% [...]).
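
For readers unfamiliar with the keyword metrics quoted in the abstract, the following is a minimal sketch of how sensitivity and specificity can be computed for a single keyword. It assumes the audio is split into segments, each labeled with whether the keyword was actually spoken and whether the recognizer reported it. The function name `keyword_rates` and the example labels are hypothetical; they are not taken from the paper and do not reproduce its results.

```python
import math


def keyword_rates(spoken, detected):
    """Return (sensitivity, specificity) for one keyword.

    spoken[i]   -- True if the keyword occurs in segment i (ground truth)
    detected[i] -- True if the recognizer reported the keyword in segment i
    Both lists are hypothetical per-segment labels of equal length.
    """
    if len(spoken) != len(detected):
        raise ValueError("spoken and detected must have the same length")

    tp = sum(1 for s, d in zip(spoken, detected) if s and d)          # hits
    fn = sum(1 for s, d in zip(spoken, detected) if s and not d)      # misses
    tn = sum(1 for s, d in zip(spoken, detected) if not s and not d)  # correct rejections
    fp = sum(1 for s, d in zip(spoken, detected) if not s and d)      # false alarms

    # Sensitivity: fraction of true keyword occurrences that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    # Specificity: fraction of keyword-free segments correctly left unflagged.
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    return sensitivity, specificity


# Made-up labels for illustration only (not data from the paper).
spoken = [True, True, True, True, False, False, False, False, False, False]
detected = [True, True, False, False, False, False, False, False, False, True]
sens, spec = keyword_rates(spoken, detected)
```

An average over several keywords, as reported in the abstract, would then be the mean of the per-keyword values, assuming each keyword is weighted equally.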


