Hystoc: Obtaining word confidences for fusion of end-to-end ASR systems

05/21/2023
by Karel Beneš, et al.

End-to-end (e2e) systems have recently gained wide popularity in automatic speech recognition. However, these systems generally do not provide well-calibrated word-level confidences. In this paper, we propose Hystoc, a simple method for obtaining word-level confidences from hypothesis-level scores. Hystoc is an iterative alignment procedure that turns the hypotheses from the n-best output of an ASR system into a confusion network. Word-level confidences are then obtained as posterior probabilities in the individual bins of the confusion network. We show that Hystoc provides confidences that correlate well with the accuracy of the ASR hypothesis. Furthermore, we show that utilizing Hystoc in the fusion of multiple e2e ASR systems increases the gains from the fusion by up to 1% absolute WER on the Spanish RTVE2020 dataset. Finally, we experiment with using Hystoc for direct fusion of n-best outputs from multiple systems, but we achieve only minor gains when fusing very similar systems.
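The idea of turning an n-best list into a confusion network can be illustrated with a minimal Python sketch. This is not the authors' implementation: the softmax with a `temperature` parameter, the unit-cost Levenshtein alignment, the `<eps>` symbol, and all function names are assumptions made for illustration only. Each hypothesis is aligned to the bins built so far, and its posterior mass is added to the matching word (or to `<eps>` when it skips a bin).

```python
import math

EPS = "<eps>"  # hypothetical symbol meaning "no word in this bin"


def hypothesis_posteriors(log_scores, temperature=1.0):
    """Softmax over hypothesis-level log scores (temperature is an assumed knob)."""
    scaled = [s / temperature for s in log_scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def align(bins, words):
    """Unit-cost Levenshtein alignment of a word sequence to the current bins.

    Returns (bin_index or None, word or None) pairs in left-to-right order.
    A word already present in a bin matches that bin at zero cost.
    """
    n, m = len(bins), len(words)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if words[j - 1] in bins[i - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,
                             cost[i - 1][j] + 1,
                             cost[i][j - 1] + 1)
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        sub = 0 if i > 0 and j > 0 and words[j - 1] in bins[i - 1] else 1
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + sub:
            pairs.append((i - 1, words[j - 1]))  # word goes into an existing bin
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((i - 1, None))          # hypothesis skips this bin
            i -= 1
        else:
            pairs.append((None, words[j - 1]))   # word needs a new bin
            j -= 1
    pairs.reverse()
    return pairs


def build_confusion_network(nbest, temperature=1.0):
    """nbest: list of (word_list, log_score). Returns a list of {word: posterior} bins."""
    posteriors = hypothesis_posteriors([score for _, score in nbest], temperature)
    bins, seen_mass = [], 0.0
    for (words, _), p in zip(nbest, posteriors):
        new_bins = []
        for bin_idx, word in align(bins, words):
            if bin_idx is None:
                # Earlier hypotheses had no word here, so their mass goes to EPS.
                new_bin = {EPS: seen_mass} if seen_mass > 0 else {}
                new_bin[word] = p
                new_bins.append(new_bin)
            else:
                current = bins[bin_idx]
                key = word if word is not None else EPS
                current[key] = current.get(key, 0.0) + p
                new_bins.append(current)
        bins = new_bins
        seen_mass += p
    return bins


def best_path(bins):
    """Most probable entry of each bin together with its confidence."""
    return [max(b.items(), key=lambda kv: kv[1]) for b in bins]


nbest = [("a b c".split(), -1.0), ("a x c".split(), -2.0), ("a c".split(), -3.0)]
for word, confidence in best_path(build_confusion_network(nbest)):
    print(word, round(confidence, 3))
```

In this toy input all three hypotheses agree on the first and last word, so those bins carry the full probability mass, while the middle bin splits its mass between `b`, `x`, and `<eps>`. How the hypothesis scores are normalized and how ties in the alignment are resolved are design choices that the abstract does not specify.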

research
06/09/2023

Improving Frame-level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition

End-to-end (E2E) systems have shown comparable performance to hybrid sys...
research
01/06/2022

Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model

Despite the rapid progress of end-to-end (E2E) automatic speech recognit...
research
04/17/2019

Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation

Conventional automatic speech recognition (ASR) systems trained from fra...
research
08/08/2018

End-to-end Speech Recognition with Word-based RNN Language Models

This paper investigates the impact of word-based RNN language models (RN...
research
06/20/2023

Timestamped Embedding-Matching Acoustic-to-Word CTC ASR

In this work, we describe a novel method of training an embedding-matchi...
research
01/29/2023

Achieving Timestamp Prediction While Recognizing with Non-Autoregressive End-to-End ASR Model

Conventional ASR systems use frame-level phoneme posterior to conduct fo...
research
07/19/2017

Fast and Accurate OOV Decoder on High-Level Features

This work proposes a novel approach to out-of-vocabulary (OOV) keyword s...