Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

12/08/2015
by Dario Amodei, et al.

We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech, two vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning allows us to handle a diverse variety of speech including noisy environments, accents and different languages. Key to our approach is our application of HPC techniques, resulting in a 7x speedup over our previous system. Because of this efficiency, experiments that previously took weeks now run in days. This enables us to iterate more quickly to identify superior architectures and algorithms. As a result, in several cases, our system is competitive with the transcription of human workers when benchmarked on standard datasets. Finally, using a technique called Batch Dispatch with GPUs in the data center, we show that our system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.

research
12/17/2014

Deep Speech: Scaling up end-to-end speech recognition

We present a state-of-the-art speech recognition system developed using ...
research
04/03/2022

Deep Speech Based End-to-End Automated Speech Recognition (ASR) for Indian-English Accents

Automated Speech Recognition (ASR) is an interdisciplinary application o...
research
12/07/2022

Low-Resource End-to-end Sanskrit TTS using Tacotron2, WaveGlow and Transfer Learning

End-to-end text-to-speech (TTS) systems have been developed for European...
research
06/24/2019

SylNet: An Adaptable End-to-End Syllable Count Estimator for Speech

Automatic syllable count estimation (SCE) is used in a variety of applic...
research
05/11/2017

Reducing Bias in Production Speech Models

Replacing hand-engineered pipelines with end-to-end deep learning system...
research
09/11/2020

RECOApy: Data recording, pre-processing and phonetic transcription for end-to-end speech-based applications

Deep learning enables the development of efficient end-to-end speech pro...
research
11/06/2017

Improved training for online end-to-end speech recognition systems

Achieving high accuracy with end-to-end speech recognizers requires care...
