Deliberation Model Based Two-Pass End-to-End Speech Recognition

03/17/2020
by Ke Hu, et al.

End-to-end (E2E) models have made rapid progress in automatic speech recognition (ASR) and perform competitively relative to conventional models. To further improve quality, a two-pass model has been proposed that rescores streamed hypotheses using the non-streaming Listen, Attend and Spell (LAS) model while maintaining reasonable latency. That model attends to acoustics to rescore hypotheses, as opposed to a class of neural correction models that use only first-pass text hypotheses. In this work, we propose to attend to both acoustics and first-pass hypotheses using a deliberation network. A bidirectional encoder is used to extract context information from first-pass hypotheses. The proposed deliberation model achieves a 12% relative WER reduction compared to LAS rescoring in Google Voice Search (VS) tasks, and a 23% reduction on a proper noun test set. Compared to a large conventional model, our best model performs 21% relatively better for VS. In terms of computational complexity, the deliberation decoder has a larger size than the LAS decoder, and hence requires more computations in second-pass decoding.
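The idea of attending to both acoustics and first-pass hypotheses can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain dot-product attention, toy two-dimensional vectors, and concatenation of the two context vectors, and the names `attend` and `deliberation_context` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys):
    """Dot-product attention: attention-weighted average of `keys`."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(keys[0])
    return [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]


def deliberation_context(decoder_state, acoustic_frames, hypothesis_states):
    """Attend to both sources and concatenate the two context vectors.

    Assumption: concatenation is used here only for illustration.
    """
    acoustic_ctx = attend(decoder_state, acoustic_frames)
    hypothesis_ctx = attend(decoder_state, hypothesis_states)
    return acoustic_ctx + hypothesis_ctx  # list concatenation


# Toy inputs (made up): one decoder state, three acoustic encoder frames,
# and two encoded first-pass hypothesis tokens.
decoder_state = [1.0, 0.0]
acoustic_frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
hypothesis_states = [[0.2, 0.8], [0.9, 0.1]]

context = deliberation_context(decoder_state, acoustic_frames, hypothesis_states)
print(len(context))
```

In this sketch the second-pass decoder would consume `context` at each step. A rescoring-only model would use just the acoustic half, and a text-only correction model just the hypothesis half.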

research
12/10/2020

Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition

In this paper, we present a novel two-pass approach to unify streaming a...
research
03/31/2023

Lego-Features: Exporting modular encoder features for streaming and deliberation ASR

In end-to-end (E2E) speech recognition models, a representational tight-...
research
10/10/2021

Have best of both worlds: two-pass hybrid and E2E cascading framework for speech recognition

Hybrid and end-to-end (E2E) systems have their individual advantages, wi...
research
08/30/2020

Parallel Rescoring with Transformer for Streaming On-Device Speech Recognition

Recent advances of end-to-end models have outperformed conventional mode...
research
01/27/2021

Transformer Based Deliberation for Two-Pass Speech Recognition

Interactive speech recognition systems must generate words quickly while...
research
07/27/2018

A Comparison of Techniques for Language Model Integration in Encoder-Decoder Speech Recognition

Attention-based recurrent neural encoder-decoder models present an elega...
research
11/15/2021

Attention based end to end Speech Recognition for Voice Search in Hindi and English

We describe here our work with automatic speech recognition (ASR) in the...
