The JHU ASR System for VOiCES from a Distance Challenge 2019

01/20/2021
by Yiming Wang, et al.

This paper describes the system developed by the JHU team for automatic speech recognition (ASR) in the VOiCES from a Distance Challenge 2019, which focuses on single-channel distant/far-field audio under noisy conditions. We participated in the Fixed Condition track, where systems are trained only on an 80-hour subset of the LibriSpeech corpus provided by the organizers. The training data was first augmented with both background noise and simulated reverberation. We then trained factorized TDNN acoustic models that differed only in their use of i-vectors for adaptation. Both systems used RNN language models trained on original and reversed text for rescoring. We submitted three systems: the system with i-vectors, which obtained 19.4% WER on the development set; the system without i-vectors, which obtained 19.0% WER; and their lattice-level fusion, which obtained 17.8% WER. On the evaluation set, our best system achieved 23.9% WER.
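The augmentation step mentioned in the abstract (mixing in background noise and convolving with simulated room impulse responses) can be illustrated with a short, self-contained sketch. This is not the authors' actual Kaldi-based pipeline; the function names, the toy impulse response, and the SNR value below are assumptions made purely for illustration.

```python
# Minimal sketch of the two augmentations described above: additive background
# noise at a chosen SNR and convolution with a (simulated) room impulse response.
# Illustrative only -- the paper's real recipe is Kaldi-style data augmentation.
import numpy as np


def add_noise(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `noise` into `speech` at the requested signal-to-noise ratio (dB)."""
    # Tile or truncate the noise to match the speech length.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[: len(speech)]
    speech_power = np.mean(speech ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12
    # Scale the noise so that 10*log10(speech_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10.0)))
    return speech + scale * noise


def add_reverb(speech: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve the clean signal with a room impulse response (simulated reverb)."""
    return np.convolve(speech, rir)[: len(speech)]


# Example: augment a stand-in "clean" utterance with both reverberation and noise.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)                                   # 1 s at 16 kHz
noise = rng.standard_normal(8000)                                    # background noise
rir = np.exp(-np.linspace(0, 8, 4000)) * rng.standard_normal(4000)   # toy decaying RIR
augmented = add_noise(add_reverb(clean, rir), noise, snr_db=10.0)
```

In practice the noise segments and impulse responses would come from real noise recordings and room simulation, and each training utterance would be augmented with several randomly chosen noise/RIR/SNR combinations.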
