A Reference-less Quality Metric for Automatic Speech Recognition via Contrastive-Learning of a Multi-Language Model with Self-Supervision

06/21/2023
by   Kamer Ali Yuksel, et al.

The common standard for evaluating the quality of automatic speech recognition (ASR) systems is a reference-based metric such as the Word Error Rate (WER), which is computed against manual ground-truth transcriptions that are time-consuming and expensive to obtain. This work proposes a multi-language referenceless quality metric, which allows the performance of different ASR models to be compared on a speech dataset without ground-truth transcriptions. To estimate the quality of ASR hypotheses, a pre-trained language model (LM) is fine-tuned with contrastive learning in a self-supervised manner. In experiments conducted on several unseen test datasets consisting of outputs from top commercial ASR engines in various languages, the proposed referenceless metric obtains a much higher correlation with WER scores and their ranks than the perplexity metric from the state-of-the-art multi-lingual LM in all experiments, and it also reduces WER by more than 7% when used for ensembling hypotheses. The fine-tuned model and experiments are made available for reproducibility: https://github.com/aixplain/NoRefER
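For readers unfamiliar with the reference-based baseline, the following is a minimal sketch of the standard WER definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It is not taken from the paper or its repository; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, and real evaluations usually normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance / number of reference words.

    Assumption: words are obtained by plain whitespace splitting, with no
    text normalization (illustrative only, not the paper's pipeline).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix processed so far
    # and the first j hypothesis words (starts with the empty prefix).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because this computation needs a `reference` string for every utterance, it cannot be applied to unlabeled audio, which is the gap a referenceless metric is meant to fill.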

