Transformer Based Deliberation for Two-Pass Speech Recognition

by   Ke Hu, et al.

Interactive speech recognition systems must generate words quickly while also producing accurate results. Two-pass models excel at these requirements by employing a first-pass decoder that quickly emits words, and a second-pass decoder that requires more context but is more accurate. Previous work has established that a deliberation network can be an effective second-pass model. The model attends to two kinds of inputs at once: encoded audio frames and the hypothesis text from the first-pass model. In this work, we explore using transformer layers instead of long-short term memory (LSTM) layers for deliberation rescoring. In transformer layers, we generalize the "encoder-decoder" attention to attend to both encoded audio and first-pass text hypotheses. The output context vectors are then combined by a merger layer. Compared to LSTM-based deliberation, our best transformer deliberation achieves 7 computation. We also compare against non-deliberation transformer rescoring, and find a 9


page 1

page 2

page 3

page 4


Parallel Rescoring with Transformer for Streaming On-Device Speech Recognition

Recent advances of end-to-end models have outperformed conventional mode...

Deliberation Model Based Two-Pass End-to-End Speech Recognition

End-to-end (E2E) models have made rapid progress in automatic speech rec...

Have best of both worlds: two-pass hybrid and E2E cascading framework for speech recognition

Hybrid and end-to-end (E2E) systems have their individual advantages, wi...

On Comparison of Encoders for Attention based End to End Speech Recognition in Standalone and Rescoring Mode

The streaming automatic speech recognition (ASR) models are more popular...

LSTM-LM with Long-Term History for First-Pass Decoding in Conversational Speech Recognition

LSTM language models (LSTM-LMs) have been proven to be powerful and yiel...

Efficient Segmental Cascades for Speech Recognition

Discriminative segmental models offer a way to incorporate flexible feat...