Deep Speech: Scaling up end-to-end speech recognition

12/17/2014
by Awni Hannun, et al.

We present a state-of-the-art speech recognition system developed using end-to-end deep learning. Our architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, our system does not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learns a function that is robust to such effects. We do not need a phoneme dictionary, nor even the concept of a "phoneme." Key to our approach is a well-optimized RNN training system that uses multiple GPUs, as well as a set of novel data synthesis techniques that allow us to efficiently obtain a large amount of varied data for training. Our system, called Deep Speech, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set. Deep Speech also handles challenging noisy environments better than widely used, state-of-the-art commercial speech systems.
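For readers unfamiliar with how such benchmark figures are computed, the following is a minimal sketch of word error rate (WER), assuming the Switchboard Hub5'00 number above is a WER, which is the usual metric for that benchmark. The function name and the whitespace tokenization are illustrative assumptions, not taken from the paper; official scoring tools also apply text normalization that is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (not from the paper): tokenizes on whitespace and
    counts substitutions, insertions, and deletions with equal cost.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Under this definition, an error of 16.0% corresponds to roughly 16 word-level edits per 100 reference words. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.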
