Achieving Timestamp Prediction While Recognizing with Non-Autoregressive End-to-End ASR Model

01/29/2023
by Xian Shi, et al.

Conventional ASR systems use frame-level phoneme posteriors to perform forced alignment (FA) and provide timestamps, while end-to-end ASR systems, especially attention-based encoder-decoder (AED) ones, lack this ability. This paper proposes to perform timestamp prediction (TP) while recognizing, using the continuous integrate-and-fire (CIF) mechanism in the non-autoregressive ASR model Paraformer. To address the bias in CIF firing positions, we apply post-processing strategies including fire-delay and silence insertion. In addition, we propose scaled-CIF to smooth the weights of the CIF output, which proves beneficial for both the ASR and TP tasks. Accumulated averaging shift (AAS) and diarization error rate (DER) are adopted to measure timestamp quality, and we compare these metrics between the proposed system and a conventional hybrid forced-alignment system. Experiments on a test set with manually marked timestamps show that the proposed optimization methods significantly improve the accuracy of CIF timestamps, reducing AAS and DER by 66.7% and 82.1%, respectively. Compared with Kaldi forced alignment trained on the same data, the optimized CIF timestamps achieve a 12.3% relative AAS reduction.
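To make the idea of "timestamp prediction while recognizing" concrete, here is a minimal Python sketch, not the authors' implementation, of how CIF firing positions can be turned into token timestamps and scored. Several details are assumptions for illustration only: per-frame CIF weights are given as a plain list, the firing threshold is 1.0, each encoder frame covers 60 ms (`frame_shift_ms`, a hypothetical parameter), a token spans from the previous firing frame to its own, fire-delay is modeled as a fixed shift of `fire_delay` frames (hypothetical), and AAS is taken to be the mean absolute shift of token start and end times against a reference. Silence insertion, scaled-CIF and DER are not shown.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start_ms, end_ms) of one token


def cif_fire_positions(alphas: List[float], threshold: float = 1.0) -> List[int]:
    """Return the frame indices at which CIF fires.

    Weights are accumulated frame by frame; each time the running sum reaches
    `threshold`, a token boundary is emitted at the current frame and the
    surplus weight is carried over to the next token.
    """
    fires: List[int] = []
    acc = 0.0
    for t, a in enumerate(alphas):
        acc += a
        while acc >= threshold:
            fires.append(t)
            acc -= threshold
    return fires


def cif_timestamps(alphas: List[float], frame_shift_ms: int = 60,
                   fire_delay: int = 0, threshold: float = 1.0) -> List[Span]:
    """Convert CIF fire positions into per-token (start, end) times in ms.

    Assumption (hypothetical): fire-delay is a fixed shift of every fire
    position by `fire_delay` frames, clipped to the last frame; a token ends
    just after its (delayed) fire frame and starts where the previous one ended.
    """
    last = len(alphas) - 1
    fires = [min(f + fire_delay, last) for f in cif_fire_positions(alphas, threshold)]
    spans: List[Span] = []
    start = 0
    for f in fires:
        end = f + 1  # exclusive end frame
        spans.append((start * frame_shift_ms, end * frame_shift_ms))
        start = end
    return spans


def aas(pred: List[Span], ref: List[Span]) -> float:
    """Assumed AAS: mean absolute start/end shift (ms) over all tokens."""
    if len(pred) != len(ref) or not ref:
        raise ValueError("pred and ref must be non-empty and token-aligned")
    total = sum(abs(p_start - r_start) + abs(p_end - r_end)
                for (p_start, p_end), (r_start, r_end) in zip(pred, ref))
    return total / (2 * len(ref))
```

For example, with weights `[0.25, 0.5, 0.25, 0.0, 0.5, 0.5, 0.75, 0.25]` the sketch fires at frames 2, 5 and 7, giving spans (0, 180), (180, 360) and (360, 480) ms; with `fire_delay=1` the spans become (0, 240), (240, 420) and (420, 480) ms. Against a hypothetical reference of (0, 220), (220, 400), (400, 480) ms, the assumed AAS drops from about 26.7 ms to about 13.3 ms, which illustrates why shifting early fire positions can help.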

research
02/27/2023

Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator

We propose an end-to-end ASR system that can be trained on transcribed s...
research
05/06/2021

Reducing Streaming ASR Model Delay with Self Alignment

Reducing prediction delay for streaming end-to-end ASR models with minim...
research
05/18/2023

Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System

Estimating confidence scores for recognition results is a classic task i...
research
04/10/2021

Boundary and Context Aware Training for CIF-based Non-Autoregressive End-to-end ASR

Continuous integrate-and-fire (CIF) based models, which use a soft and m...
research
05/21/2023

Hystoc: Obtaining word confidences for fusion of end-to-end ASR systems

End-to-end (e2e) systems have recently gained wide popularity in automat...
research
05/19/2020

Fast, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces

In this work, we first show that on the widely used LibriSpeech benchmar...
research
08/07/2023

SeACo-Paraformer: A Non-Autoregressive ASR System with Flexible and Effective Hotword Customization Ability

Hotword customization is one of the important issues remained in ASR fie...