FunASR: A Fundamental End-to-End Speech Recognition Toolkit

05/18/2023
by   Zhifu Gao, et al.
0

This paper introduces FunASR, an open-source speech recognition toolkit designed to bridge the gap between academic research and industrial applications. FunASR offers models trained on large-scale industrial corpora and the ability to deploy them in applications. The toolkit's flagship model, Paraformer, is a non-autoregressive end-to-end speech recognition model that has been trained on a manually annotated Mandarin speech recognition dataset that contains 60,000 hours of speech. To improve the performance of Paraformer, we have added timestamp prediction and hotword customization capabilities to the standard Paraformer backbone. In addition, to facilitate model deployment, we have open-sourced a voice activity detection model based on the Feedforward Sequential Memory Network (FSMN-VAD) and a text post-processing punctuation model based on the controllable time-delay Transformer (CT-Transformer), both of which were trained on industrial corpora. These functional modules provide a solid foundation for building high-precision long audio speech recognition services. Compared to other models trained on open datasets, Paraformer demonstrates superior performance.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/07/2020

KoSpeech: Open-Source Toolkit for End-to-End Korean Speech Recognition

We present KoSpeech, an open-source software, which is modular and exten...
research
02/02/2021

WeNet: Production First and Production Ready End-to-End Speech Recognition Toolkit

In this paper, we present a new open source, production first and produc...
research
10/26/2020

Recent Developments on ESPnet Toolkit Boosted by Conformer

In this study, we present recent developments on ESPnet: End-to-End Spee...
research
12/18/2018

wav2letter++: The Fastest Open-source Speech Recognition System

This paper introduces wav2letter++, the fastest open-source deep learnin...
research
07/31/2017

Familia: An Open-Source Toolkit for Industrial Topic Modeling

Familia is an open-source toolkit for pragmatic topic modeling in indust...
research
04/05/2021

SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition

In the English speech-to-text (STT) machine learning task, acoustic mode...
research
03/03/2020

Controllable Time-Delay Transformer for Real-Time Punctuation Prediction and Disfluency Detection

With the increased applications of automatic speech recognition (ASR) in...

Please sign up or login with your details

Forgot password? Click here to reset