TRScore: A Novel GPT-based Readability Scorer for ASR Segmentation and Punctuation model evaluation and selection

10/27/2022
by   Piyush Behre, et al.
0

Punctuation and Segmentation are key to readability in Automatic Speech Recognition (ASR), often evaluated using F1 scores that require high-quality human transcripts and do not reflect readability well. Human evaluation is expensive, time-consuming, and suffers from large inter-observer variability, especially in conversational speech devoid of strict grammatical structures. Large pre-trained models capture a notion of grammatical structure. We present TRScore, a novel readability measure using the GPT model to evaluate different segmentation and punctuation systems. We validate our approach with human experts. Additionally, our approach enables quantitative assessment of text post-processing techniques such as capitalization, inverse text normalization (ITN), and disfluency on overall readability, which traditional word error rate (WER) and slot error rate (SER) metrics fail to capture. TRScore is strongly correlated to traditional F1 and human readability scores, with Pearson's correlation coefficients of 0.67 and 0.98, respectively. It also eliminates the need for human transcriptions for model selection.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/21/2022

Assessing ASR Model Quality on Disordered Speech using BERTScore

Word Error Rate (WER) is the primary metric used to assess automatic spe...
research
02/27/2023

Diacritic Recognition Performance in Arabic ASR

We present an analysis of diacritic recognition performance in Arabic Au...
research
06/02/2020

An ASR Guided Speech Intelligibility Measure for TTS Model Selection

The perceptual quality of neural text-to-speech (TTS) is highly dependen...
research
02/22/2021

Generating Human Readable Transcript for Automatic Speech Recognition with Pre-trained Language Model

Modern Automatic Speech Recognition (ASR) systems can achieve high perfo...
research
09/14/2021

Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition

This paper is a study of performance-efficiency trade-offs in pre-traine...
research
04/14/2022

Constructing Open Cloze Tests Using Generation and Discrimination Capabilities of Transformers

This paper presents the first multi-objective transformer model for cons...

Please sign up or login with your details

Forgot password? Click here to reset