Attention based end to end Speech Recognition for Voice Search in Hindi and English

by Raviraj Joshi, et al.

We describe our work on automatic speech recognition (ASR) for the voice search functionality of the Flipkart e-Commerce platform. Starting with the Listen-Attend-Spell (LAS) deep learning architecture, we build upon and expand the model design and attention mechanisms to incorporate innovative approaches including multi-objective training, multi-pass training, and external rescoring using language models and phoneme-based losses. Using these modifications, we report a relative word error rate (WER) improvement of 15.7% over LAS models. Overall, we report an improvement of 36.9% over the phoneme-CTC system. The paper also provides an overview of the different components that can be tuned in a LAS-based system.
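The improvements quoted in the abstract are relative WER reductions, not absolute differences in WER. As a minimal sketch of how such a figure is typically computed, the Python below implements word-level edit distance WER and the relative improvement formula. The function names, the example query, and the WER values 0.20 and 0.15 are hypothetical illustrations, not code or results from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent, measured against the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical voice search query with one substituted word out of four.
wer = word_error_rate("red shoes for men", "red shoe for men")

# Hypothetical WERs: a drop from 0.20 to 0.15 is 5 points absolute,
# which corresponds to a 25% relative improvement.
gain = relative_improvement(0.20, 0.15)
```

Because the measure is relative, the same absolute drop in WER yields a larger percentage when the baseline WER is lower, which is worth keeping in mind when comparing the two figures above against different baselines.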


page 1

page 2

page 3

page 4

