Evaluating Automatic Speech Recognition in an Incremental Setting

02/23/2023
by Ryan Whetten, et al.

The increasing reliability of automatic speech recognition has led to its widespread everyday use. For research purposes, however, it is often unclear which model to choose for a task, particularly when speed is required as well as accuracy. In this paper, we systematically evaluate six speech recognizers on English test data using metrics including word error rate, latency, and the number of updates to already recognized words. We also propose and compare two methods for streaming audio into recognizers for incremental recognition. We further propose Revokes per Second as a new metric for evaluating incremental recognition and demonstrate that it provides insights into overall model performance. We find that, generally, local recognizers are faster and require fewer updates than cloud-based recognizers. Finally, we find Meta's Wav2Vec model to be the fastest and Mozilla's DeepSpeech model to be the most stable in its predictions.
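
As a rough illustration of two of the metrics named above, the sketch below computes word error rate with a standard word-level edit distance and counts updates to already recognized words across a sequence of partial hypotheses. The abstract does not define Revokes per Second, so the definition used here (words withdrawn between successive partial outputs, divided by audio duration) is an assumption, and the function names `word_error_rate`, `count_revokes`, and `revokes_per_second` are hypothetical rather than taken from the paper.

```python
from __future__ import annotations


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def count_revokes(partials: list[str]) -> int:
    """Count words withdrawn between successive partial hypotheses.

    ASSUMPTION: a word counts as revoked when it falls after the longest
    common prefix of the old and new hypothesis. The paper may define
    revokes differently.
    """
    revokes = 0
    for before, after in zip(partials, partials[1:]):
        old, new = before.split(), after.split()
        common = 0
        for old_word, new_word in zip(old, new):
            if old_word != new_word:
                break
            common += 1
        revokes += len(old) - common
    return revokes


def revokes_per_second(partials: list[str], audio_seconds: float) -> float:
    """ASSUMED definition: total revokes divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return count_revokes(partials) / audio_seconds
```

For example, given the partial outputs `["the", "the cat", "the cap sat", "the cat sat down"]`, the second update withdraws "cat" and the third withdraws "cap" and "sat", so `count_revokes` returns 3 under this definition.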

