WER we are and WER we think we are

10/07/2020
by Piotr Szymański, et al.

Natural language processing of conversational speech requires the availability of high-quality transcripts. In this paper, we express our skepticism towards the recent reports of very low Word Error Rates (WERs) achieved by modern Automatic Speech Recognition (ASR) systems on benchmark datasets. We outline several problems with popular benchmarks and compare three state-of-the-art commercial ASR systems on an internal dataset of real-life spontaneous human conversations and the HUB'05 public benchmark. We show that the WERs on these data are significantly higher than the best reported results. We formulate a set of guidelines which may aid in the creation of real-life, multi-domain datasets with high-quality annotations for training and testing of robust ASR systems.
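
For context (this illustration is not part of the paper's abstract): WER is the standard ASR metric, defined as the number of word-level substitutions, deletions, and insertions needed to turn the recognizer's hypothesis into the reference transcript, divided by the reference length. A minimal Python sketch, with a hypothetical `wer` helper computed via Levenshtein alignment over words:

```python
# Illustrative sketch of the standard WER definition (not code from the paper).
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])  # substitution or match
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution plus one deletion over 5 reference words -> WER = 0.4
print(wer("we express our skepticism towards", "we expressed our skepticism"))
```

Because insertions count as errors, WER can exceed 1.0, and the reported number depends heavily on how transcripts are tokenized and normalized before scoring, which is one source of the benchmark problems the paper discusses.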


Related research

research · 10/27/2022
Simulating realistic speech overlaps improves multi-talker ASR
Multi-talker automatic speech recognition (ASR) has been studied to gene...

research · 04/01/2022
PriMock57: A Dataset Of Primary Care Mock Consultations
Recent advances in Automatic Speech Recognition (ASR) have made it possi...

research · 10/16/2021
ASR4REAL: An extended benchmark for speech models
Popular ASR benchmarks such as Librispeech and Switchboard are limited i...

research · 02/25/2022
Language technology practitioners as language managers: arbitrating data bias and predictive bias in ASR
Despite the fact that variation is a fundamental characteristic of natur...

research · 04/13/2019
M2H-GAN: A GAN-based Mapping from Machine to Human Transcripts for Speech Understanding
Deep learning is at the core of recent spoken language understanding (SL...

research · 12/18/2018
Multiple topic identification in human/human conversations
The paper deals with the automatic analysis of real-life telephone conve...
