On Word Error Rate Definitions and their Efficient Computation for Multi-Speaker Speech Recognition Systems

11/29/2022
by Thilo von Neumann, et al.

We present a general framework for computing the word error rate (WER) of ASR systems that take recordings containing multiple speakers as input and produce multiple output word sequences (MIMO). Such systems are typically required for applications such as meeting transcription. We provide an efficient implementation based on a dynamic programming search in a multi-dimensional Levenshtein distance tensor, under the constraint that each reference utterance must be matched consistently with one hypothesis output. This also yields an efficient implementation of the ORC WER, whose computation previously suffered from exponential complexity. We give an overview of commonly used WER definitions for multi-speaker scenarios and show that they are specializations of the above MIMO WER, each tuned to a particular application scenario. We conclude with a discussion of the pros and cons of the various WER definitions and a recommendation on when to use which.
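
To make the constraint on reference utterances concrete, the following is a minimal sketch of the ORC WER in its naive, exponential form, i.e. the baseline the abstract says was previously costly, not the efficient dynamic programming search the paper proposes. It assumes that each reference utterance is assigned to exactly one hypothesis output, that utterances keep their original order within an output, and that the score is the minimum total word-level Levenshtein distance divided by the number of reference words. The function names `levenshtein` and `orc_wer_bruteforce` are hypothetical and chosen only for illustration.

```python
from itertools import product


def levenshtein(ref, hyp):
    """Word-level edit distance between two lists of tokens."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1]


def orc_wer_bruteforce(ref_utterances, hyp_channels):
    """Naive ORC WER: try every assignment of utterances to outputs.

    Assumption: utterances are given in temporal order, and that order is
    kept inside each output. The loop visits C**U assignments for
    C outputs and U utterances, so it is only usable on tiny inputs.
    """
    refs = [u.split() for u in ref_utterances]
    hyps = [c.split() for c in hyp_channels]
    total_ref_words = sum(len(u) for u in refs)
    if total_ref_words == 0:
        raise ValueError("reference must contain at least one word")

    best_errors, best_assignment = None, None
    for assignment in product(range(len(hyps)), repeat=len(refs)):
        # Concatenate the utterances assigned to each output, in order.
        merged = [[] for _ in hyps]
        for utterance, channel in zip(refs, assignment):
            merged[channel].extend(utterance)
        errors = sum(levenshtein(m, h) for m, h in zip(merged, hyps))
        if best_errors is None or errors < best_errors:
            best_errors, best_assignment = errors, assignment
    return best_errors / total_ref_words, best_assignment


if __name__ == "__main__":
    refs = ["hello world", "how are you", "fine thanks"]
    hyps = ["hello word fine thanks", "how are you"]
    print(orc_wer_bruteforce(refs, hyps))
```

In this toy example the first and third utterances are assigned to the first output and the second utterance to the second output, leaving a single substitution among seven reference words. The exhaustive loop over assignments is what makes this form exponential in the number of utterances.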

research
09/17/2019

Simultaneous Speech Recognition and Speaker Diarization for Monaural Dialogue Recordings with Target-Speaker Acoustic Models

This paper investigates the use of target-speaker automatic speech recog...
research
01/06/2021

Hypothesis Stitcher for End-to-End Speaker-attributed ASR on Long-form Multi-talker Recordings

An end-to-end (E2E) speaker-attributed automatic speech recognition (SA-...
research
09/14/2023

Aligning Speakers: Evaluating and Visualizing Text-based Diarization Using Efficient Multiple Sequence Alignment (Extended Version)

This paper presents a novel evaluation approach to text-based speaker di...
research
07/21/2023

MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription Systems

MeetEval is an open-source toolkit to evaluate all kinds of meeting tran...
research
05/23/2023

BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR

The recently proposed serialized output training (SOT) simplifies multi-...
research
06/13/2022

Toward Zero Oracle Word Error Rate on the Switchboard Benchmark

The "Switchboard benchmark" is a very well-known test set in automatic s...
