Global Normalization for Streaming Speech Recognition in a Modular Framework

05/26/2022
by Ehsan Variani, et al.

We introduce the Globally Normalized Autoregressive Transducer (GNAT) to address the label bias problem in streaming speech recognition. Our solution admits a tractable, exact computation of the denominator for sequence-level normalization. Through theoretical and empirical results, we demonstrate that switching to a globally normalized model greatly reduces the word error rate gap between streaming and non-streaming speech recognition models (by more than 50% on the LibriSpeech dataset). The model is developed in a modular framework that encompasses all the common neural speech recognition models. The modularity of this framework enables controlled comparison of modeling choices and the creation of new models.
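
To make the phrase "exact computation of the denominator" concrete, the following is a minimal Python sketch of sequence-level (global) normalization on a toy problem. It is not the GNAT model. It assumes exactly one label per frame (no blank symbol or alignment lattice) and a score that depends only on the frame and the previous label. The names `scores`, `start`, `log_denominator_dp`, and `log_denominator_brute_force` are hypothetical and chosen for illustration only. Under those assumptions, the sum over all label sequences can be computed with a forward-style dynamic program instead of enumerating every sequence.

```python
import itertools
import math
import random


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_score(scores, start, seq):
    """Unnormalized log-score of one label sequence.

    scores[t][prev][cur] is the (assumed) log-score of emitting label
    `cur` at frame t when the previous label was `prev`.
    """
    prev, total = start, 0.0
    for t, cur in enumerate(seq):
        total += scores[t][prev][cur]
        prev = cur
    return total


def log_denominator_brute_force(scores, start):
    """Log of the global normalizer by enumerating every sequence."""
    num_labels = len(scores[0])
    all_seqs = itertools.product(range(num_labels), repeat=len(scores))
    return log_sum_exp([sequence_score(scores, start, s) for s in all_seqs])


def log_denominator_dp(scores, start):
    """Same quantity via dynamic programming over the previous label.

    Cost is O(T * V^2) instead of O(V^T), which is what makes the exact
    denominator tractable under the finite-context assumption.
    """
    num_labels = len(scores[0])
    # alpha[cur] = log-sum of scores of all prefixes ending in label `cur`.
    alpha = [scores[0][start][cur] for cur in range(num_labels)]
    for t in range(1, len(scores)):
        alpha = [
            log_sum_exp([alpha[prev] + scores[t][prev][cur]
                         for prev in range(num_labels)])
            for cur in range(num_labels)
        ]
    return log_sum_exp(alpha)


def global_log_prob(scores, start, seq):
    """Globally normalized log-probability of a full label sequence."""
    return sequence_score(scores, start, seq) - log_denominator_dp(scores, start)


if __name__ == "__main__":
    rng = random.Random(0)
    # 4 frames, 3 labels, arbitrary unnormalized scores.
    toy = [[[rng.uniform(-2, 2) for _ in range(3)] for _ in range(3)]
           for _ in range(4)]
    print(log_denominator_dp(toy, 0), log_denominator_brute_force(toy, 0))
```

In this sketch the per-step scores are never normalized locally. Only the total score of a complete sequence is divided by the sum over all sequences, which is the distinction between global and local normalization that the abstract refers to.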

research
07/20/2023

Globally Normalising the Transducer for Streaming Speech Recognition

The Transducer (e.g. RNN-Transducer or Conformer-Transducer) generates a...
research
04/05/2021

Streaming Multi-talker Speech Recognition with Joint Speaker Identification

In multi-talker scenarios such as meetings and conversations, speech pro...
research
09/09/2020

VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition

We introduce VoiceFilter-Lite, a single-channel source separation model ...
research
06/10/2021

U2++: Unified Two-pass Bidirectional End-to-end Model for Speech Recognition

The unified streaming and non-streaming two-pass (U2) end-to-end model f...
research
11/03/2020

Dynamic latency speech recognition with asynchronous revision

In this work we propose an inference technique, asynchronous revision, t...
research
03/19/2016

Globally Normalized Transition-Based Neural Networks

We introduce a globally normalized transition-based neural network model...
