Attention based end to end Speech Recognition for Voice Search in Hindi and English

11/15/2021

We describe our work on automatic speech recognition (ASR) for the voice search functionality on the Flipkart e-Commerce platform. Starting with the deep learning architecture of Listen-Attend-Spell (LAS), we build upon and expand the model design and attention mechanisms to incorporate approaches including multi-objective training, multi-pass training, and external rescoring using language models and phoneme-based losses. We report a relative word error rate (WER) improvement of 15.7% from these modifications. Overall, we report an improvement of 36.9% over the phoneme-CTC system. The paper also provides an overview of the different components that can be tuned in a LAS-based system.
