Mi-Go: Test Framework which uses YouTube as Data Source for Evaluating Speech Recognition Models like OpenAI's Whisper

09/01/2023
by Tomasz Wojnar, et al.

This article introduces Mi-Go, a novel testing framework for evaluating the performance and adaptability of general-purpose speech recognition machine learning models across diverse real-world scenarios. The framework uses YouTube as a rich and continuously updated data source that covers multiple languages, accents, dialects, speaking styles, and audio quality levels. To demonstrate the effectiveness of the framework, the Whisper model, developed by OpenAI, was used as the test object. All Whisper model versions were tested on a total of 124 YouTube videos. The results underscore the utility of YouTube as a testing platform for speech recognition models, helping to verify their robustness, accuracy, and adaptability to diverse languages and acoustic conditions. Additionally, by contrasting machine-generated transcriptions with human-made subtitles, the Mi-Go framework can help pinpoint potential misuse of YouTube subtitles, such as using them for Search Engine Optimization.
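
The comparison between a machine-generated transcription and a human-made subtitle can be illustrated with a short sketch. The abstract does not state which metric Mi-Go uses, so the example below assumes word error rate (WER), a common choice for speech recognition, computed as the word-level edit distance divided by the number of reference words. The function name `word_error_rate` and the lowercase, whitespace-based tokenization are illustrative assumptions, not part of the framework.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate of `hypothesis` against `reference`.

    Assumption: WER is used here only as an illustrative metric; the
    abstract does not say how Mi-Go scores transcriptions.
    """
    # Naive normalization (assumed): lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    subtitle = "the cat sat on the mat"   # human-made subtitle (made-up text)
    transcript = "the cat sat on mat"     # machine transcription (made-up text)
    print(word_error_rate(subtitle, transcript))
```

Under this assumption, a video whose human-made subtitles differ sharply from the transcription (a WER close to or above 1.0) would be a candidate for closer inspection, for example for subtitles written for search ranking rather than as a faithful transcript.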

research
10/07/2021

WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition

In this paper, we present WenetSpeech, a multi-domain Mandarin corpus co...
research
11/08/2019

Recurrent Neural Network Transducer for Audio-Visual Speech Recognition

This work presents a large-scale audio-visual speech recognition system ...
research
05/10/2021

What shall we do with an hour of data? Speech recognition for the un- and under-served languages of Common Voice

This technical report describes the methods and results of a three-week ...
research
04/07/2015

Voice based self help System: User Experience Vs Accuracy

In general, self help systems are being increasingly deployed by service...
research
06/13/2021

GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio

This paper introduces GigaSpeech, an evolving, multi-domain English spee...
research
12/01/2017

Visual Features for Context-Aware Speech Recognition

Automatic transcriptions of consumer-generated multi-media content such ...