DOVER: A Method for Combining Diarization Outputs

09/17/2019
by Andreas Stolcke, et al.

Speech recognition and other natural language tasks have long benefited from voting-based algorithms as a method to aggregate outputs from several systems to achieve a higher accuracy than any of the individual systems. Diarization, the task of segmenting an audio stream into speaker-homogeneous and co-indexed regions, has so far not seen the benefit of this strategy because the structure of the task does not lend itself to a simple voting approach. This paper presents DOVER (diarization output voting error reduction), an algorithm for weighted voting among diarization hypotheses, in the spirit of the ROVER algorithm for combining speech recognition hypotheses. We evaluate the algorithm for diarization of meeting recordings with multiple microphones, and find that it consistently reduces diarization error rate over the average of results from individual channels, and often improves on the single best channel chosen by an oracle.
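
The idea of voting among diarization hypotheses can be illustrated with a minimal sketch. It is not the paper's exact procedure: it assumes every hypothesis is a list of per-frame speaker labels of equal length (None for non-speech), it uses the first hypothesis as the anchor, and it swaps in a simple greedy label mapping based on frame co-occurrence. The function names `map_labels` and `vote` and the `unmapped:` prefix are hypothetical.

```python
from collections import Counter, defaultdict


def map_labels(reference, hypothesis):
    """Greedily rename hypothesis speakers to the reference speakers they overlap most."""
    # Count frames where a hypothesis speaker co-occurs with a reference speaker.
    overlap = Counter(
        (h, r)
        for h, r in zip(hypothesis, reference)
        if h is not None and r is not None
    )
    mapping, used = {}, set()
    # Largest co-occurrence first; each reference label is claimed at most once.
    for (h, r), _count in overlap.most_common():
        if h not in mapping and r not in used:
            mapping[h] = r
            used.add(r)
    # Speakers without a partner keep a distinct name so they cannot collide.
    return [
        None if h is None else mapping.get(h, f"unmapped:{h}")
        for h in hypothesis
    ]


def vote(hypotheses, weights=None):
    """Weighted per-frame vote after mapping all labels into a common label space."""
    weights = weights or [1.0] * len(hypotheses)
    anchor = hypotheses[0]  # assumption: the first hypothesis defines the label space
    aligned = [anchor] + [map_labels(anchor, h) for h in hypotheses[1:]]
    result = []
    for frame in zip(*aligned):
        scores = defaultdict(float)
        for label, weight in zip(frame, weights):
            scores[label] += weight
        result.append(max(scores, key=scores.get))
    return result


# Three hypothetical channel outputs with arbitrary, unrelated speaker names.
hyp_a = ["A", "A", "B", "B", None]
hyp_b = ["x", "x", "y", "y", None]
hyp_c = ["1", "1", "1", "2", None]
combined = vote([hyp_a, hyp_b, hyp_c])
```

The label-mapping step is what distinguishes this from word-level voting: speaker names are arbitrary in each hypothesis, so they have to be brought into a common label space before votes can be counted. Passing `weights` lets a more trusted hypothesis outvote the others.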

research
06/17/2019

On combining features for single-channel robust speech recognition in reverberant environments

This paper addresses the combination of complementary parallel speech re...
research
11/03/2020

DOVER-Lap: A Method for Combining Overlap-aware Diarization Outputs

Several advances have been made recently towards handling overlapping sp...
research
01/06/2021

Hypothesis Stitcher for End-to-End Speaker-attributed ASR on Long-form Multi-talker Recordings

An end-to-end (E2E) speaker-attributed automatic speech recognition (SA-...
research
10/22/2020

Microsoft Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2020

This paper describes the Microsoft speaker diarization system for monaur...
research
12/26/2021

Bilingual Speech Recognition by Estimating Speaker Geometry from Video Data

Speech recognition is very challenging in student learning environments ...
research
09/10/2021

Large-vocabulary Audio-visual Speech Recognition in Noisy Environments

Audio-visual speech recognition (AVSR) can effectively and significantly...
research
10/24/2019

Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm

Speaker diarization based on bottom-up clustering of speech segments by ...
