CAT: A CTC-CRF based ASR Toolkit Bridging the Hybrid and the End-to-end Approaches towards Data Efficiency and Low Latency

05/27/2020
by Keyu An, et al.

In this paper, we present a new open-source toolkit for speech recognition, named CAT (CTC-CRF based ASR Toolkit). CAT inherits the data efficiency of the hybrid approach and the simplicity of the end-to-end (E2E) approach, providing a full-fledged implementation of CTC-CRFs and complete training and testing scripts for a number of English and Chinese benchmarks. Experiments show that CAT obtains state-of-the-art results, which are comparable to the fine-tuned hybrid models in Kaldi but with a much simpler training pipeline. Compared to existing non-modularized E2E models, CAT performs better on limited-scale datasets, demonstrating its data efficiency. Furthermore, we propose a new method called contextualized soft forgetting, which enables CAT to do streaming ASR without accuracy degradation. We hope CAT, especially the CTC-CRF based framework and software, will be of broad interest to the community and can be further explored and improved.


