Streaming Speech-to-Confusion Network Speech Recognition

06/02/2023
by Denis Filimonov, et al.

In interactive automatic speech recognition (ASR) systems, low-latency requirements limit the amount of search space that can be explored during decoding, particularly in end-to-end neural ASR. In this paper, we present a novel streaming ASR architecture that outputs a confusion network while maintaining limited latency, as needed for interactive applications. We show that 1-best results of our model are on par with a comparable RNN-T system, while the richer hypothesis set allows second-pass rescoring to achieve 10-20% lower word error rate on the LibriSpeech task. We also show that our model outperforms a strong RNN-T baseline on a far-field voice assistant task.
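A confusion network represents the hypothesis space as a sequence of time slots, each holding competing word candidates with posterior probabilities; the 1-best hypothesis is obtained by taking the highest-probability word in each slot. As a minimal illustrative sketch (the slot structure, probabilities, and `<eps>` placeholder below are hypothetical, not the paper's actual data format):

```python
# Sketch: a confusion network as a list of "slots", each slot a dict
# mapping candidate words to posterior probabilities. The 1-best
# hypothesis takes the argmax word per slot, skipping epsilon arcs.

def one_best(confusion_network, eps="<eps>"):
    """Return the 1-best word sequence from a confusion network."""
    hyp = []
    for slot in confusion_network:
        word, _prob = max(slot.items(), key=lambda kv: kv[1])
        if word != eps:  # epsilon means "no word in this slot"
            hyp.append(word)
    return hyp

cn = [
    {"the": 0.9, "<eps>": 0.1},
    {"cat": 0.6, "hat": 0.4},
    {"<eps>": 0.7, "sat": 0.3},
]
print(one_best(cn))  # ['the', 'cat']
```

A second-pass rescorer can exploit the same structure by scoring alternative paths through the slots (e.g., "the hat sat"), which is the richer hypothesis set the abstract refers to.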


