Earnings-21: A Practical Benchmark for ASR in the Wild

04/22/2021
by Miguel Del Rio, et al.
Commonly used speech corpora inadequately challenge academic and commercial ASR systems. In particular, speech corpora lack the metadata needed for detailed analysis and WER measurement. In response, we present Earnings-21, a 39-hour corpus of earnings calls containing entity-dense speech from nine different financial sectors. This corpus is intended to benchmark ASR systems in the wild, with special attention to named entity recognition. We benchmark four commercial ASR models, two internal models built with open-source tools, and an open-source LibriSpeech model, and discuss their differences in performance on Earnings-21. Using our recently released fstalign tool, we provide a candid analysis of each model's recognition capabilities under different partitions. Our analysis finds that ASR accuracy for certain NER categories is poor, presenting a significant impediment to transcript comprehension and usage. Earnings-21 bridges academic and commercial ASR system evaluation and enables further research on entity modeling and WER on real-world audio.
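
For readers unfamiliar with the metric, the sketch below shows how word error rate (WER) is conventionally computed: the word-level edit distance between a reference transcript and an ASR hypothesis, divided by the number of reference words. It is a minimal illustration, not the fstalign implementation; the function names `word_errors` and `wer` and the sample sentences are assumptions made up for this example, and text normalization is reduced to whitespace splitting.

```python
def word_errors(reference, hypothesis):
    """Return (edit_distance, reference_length) counted in words.

    The edit distance is the minimum number of word substitutions,
    deletions, and insertions needed to turn the reference into the
    hypothesis (standard Levenshtein distance over word tokens).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1], len(ref)


def wer(pairs):
    """Pooled WER over an iterable of (reference, hypothesis) pairs."""
    total_errors = 0
    total_words = 0
    for reference, hypothesis in pairs:
        errors, words = word_errors(reference, hypothesis)
        total_errors += errors
        total_words += words
    if total_words == 0:
        raise ValueError("WER is undefined for an empty reference")
    return total_errors / total_words


# Hypothetical sample data, not taken from Earnings-21.
sample = [
    ("revenue grew four percent", "revenue grew for percent"),
    ("acme corp reported earnings", "acme reported earnings"),
]
sample_wer = wer(sample)
```

Scoring a partition, such as a single financial sector or entity category, amounts to calling `wer` on only the pairs that belong to it. Pooling errors and word counts before dividing keeps long utterances from being underweighted relative to short ones.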

research
03/29/2022

Earnings-22: A Practical Benchmark for Accents in the Wild

Modern automatic speech recognition (ASR) systems have achieved superhum...
research
05/22/2020

End-to-end Named Entity Recognition from English Speech

Named entity recognition (NER) from text has been a widely studied probl...
research
05/30/2018

End-to-end named entity extraction from speech

Named entity recognition (NER) is among SLU tasks that usually extract s...
research
11/26/2019

ATCSpeech: a multilingual pilot-controller speech corpus from real Air Traffic Control environment

Automatic Speech Recognition (ASR) is greatly developed in recent years,...
research
03/30/2021

MediaSpeech: Multilanguage ASR Benchmark and Dataset

The performance of automated speech recognition (ASR) systems is well kn...
research
08/31/2018

AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale

AISHELL-1 is by far the largest open-source speech corpus available for ...
research
04/06/2021

EasyCall corpus: a dysarthric speech dataset

This paper introduces a new dysarthric speech command dataset in Italian...
