WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit

03/29/2022
by BinBin Zhang, et al.
We recently released WeNet, a production-oriented end-to-end speech recognition toolkit that introduces a unified two-pass (U2) framework and a built-in runtime to handle streaming and non-streaming decoding in a single model. To further improve ASR performance and meet a wider range of production requirements, this paper presents WeNet 2.0 with four important updates. (1) We propose U2++, a unified two-pass framework with bidirectional attention decoders, which adds a right-to-left attention decoder to incorporate future context, improving both the representational ability of the shared encoder and performance during the rescoring stage. (2) We introduce an n-gram-based language model and a WFST-based decoder into WeNet 2.0, promoting the use of rich text data in production scenarios. (3) We design a unified contextual biasing framework, which leverages user-specific context (e.g., contact lists) to provide rapid adaptation for production and improves ASR accuracy both with and without a language model. (4) We design a unified IO to support large-scale data for effective model training. In summary, WeNet 2.0 achieves up to 10% relative recognition performance improvement over the original WeNet on various corpora and makes available several important production-oriented features.
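
The rescoring stage in update (1) can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything below is an assumption made for illustration: the `Hypothesis` fields, the `combined_score` and `rescore` helpers, the linear interpolation between the two decoders, and the `ctc_weight` and `reverse_weight` values are hypothetical and are not taken from the paper or from the WeNet codebase. The sketch only shows the general idea that a right-to-left decoder score can change which first-pass candidate is selected.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One first-pass candidate with its (hypothetical) log-probability scores."""
    tokens: List[str]
    ctc_score: float  # log-prob from the first-pass CTC search
    l2r_score: float  # log-prob from the left-to-right attention decoder
    r2l_score: float  # log-prob from the right-to-left attention decoder


def combined_score(h: Hypothesis,
                   ctc_weight: float = 0.5,
                   reverse_weight: float = 0.3) -> float:
    """Assumed combination: interpolate the two decoders, then add weighted CTC."""
    attention = (1.0 - reverse_weight) * h.l2r_score + reverse_weight * h.r2l_score
    return attention + ctc_weight * h.ctc_score


def rescore(nbest: List[Hypothesis],
            ctc_weight: float = 0.5,
            reverse_weight: float = 0.3) -> Hypothesis:
    """Return the candidate with the highest combined score."""
    return max(nbest, key=lambda h: combined_score(h, ctc_weight, reverse_weight))


# Illustrative scores only; these numbers are made up for the example.
nbest = [
    Hypothesis(["call", "jon"], ctc_score=-1.0, l2r_score=-2.0, r2l_score=-6.0),
    Hypothesis(["call", "john"], ctc_score=-1.2, l2r_score=-2.1, r2l_score=-2.0),
]
best_bidirectional = rescore(nbest)
best_l2r_only = rescore(nbest, reverse_weight=0.0)
```

Setting `reverse_weight` to zero in this sketch reduces it to left-to-right-only rescoring, which makes it easy to compare the two choices on the same candidate list.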

research
02/02/2021

WeNet: Production First and Production Ready End-to-End Speech Recognition Toolkit

In this paper, we present a new open source, production first and produc...
research
12/10/2020

Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition

In this paper, we present a novel two-pass approach to unify streaming a...
research
12/05/2018

End-to-end contextual speech recognition using class language models and a token passing decoder

End-to-end modeling (E2E) of automatic speech recognition (ASR) blends a...
research
09/18/2019

Espresso: A Fast End-to-end Neural Speech Recognition Toolkit

We present Espresso, an open-source, modular, extensible end-to-end neur...
research
02/18/2022

End-to-end contextual asr based on posterior distribution adaptation for hybrid ctc/attention system

End-to-end (E2E) speech recognition architectures assemble all component...
research
01/17/2023

Two Stage Contextual Word Filtering for Context bias in Unified Streaming and Non-streaming Transducer

It is difficult for an end-to-end (E2E) ASR system to recognize words su...
research
05/28/2023

RASR2: The RWTH ASR Toolkit for Generic Sequence-to-sequence Speech Recognition

Modern public ASR tools usually provide rich support for training variou...
