HTEC: Human Transcription Error Correction

09/18/2023
by Hanbo Sun, et al.

High-quality human transcription is essential for training and improving Automatic Speech Recognition (ASR) models. A recent study <cit.> found that every 1% [...] approximately 2% [...] Transcription errors are inevitable even for highly trained annotators. However, few studies have explored human transcription correction. Error correction methods for other problems, such as ASR error correction and grammatical error correction, do not perform sufficiently well on this problem. Therefore, we propose HTEC (Human Transcription Error Correction). HTEC consists of two stages: Trans-Checker, an error detection model that predicts and masks erroneous words, and Trans-Filler, a sequence-to-sequence generative model that fills the masked positions. We propose a holistic list of correction operations, including four novel operations that handle deletion errors. We further propose a variant of embeddings that incorporates phoneme information into the input of the transformer. HTEC outperforms other methods by a large margin and surpasses human annotators by 2.2% [...] deployed HTEC to assist human annotators and showed that HTEC is particularly effective as a co-pilot, improving transcription quality by 15.1% [...] sacrificing transcription velocity.
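
The two-stage flow described in the abstract (detect and mask erroneous words, then fill the masked positions) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `<mask>` token, the function names, and the toy vocabulary-based checker and lookup-based filler. These stand in for the trained Trans-Checker and Trans-Filler models and are not the authors' implementation. In particular, this toy filler keeps the sequence length fixed, whereas a sequence-to-sequence model could also insert or remove words.

```python
from typing import Callable, List, Optional

MASK = "<mask>"  # hypothetical mask token, chosen for this sketch

Checker = Callable[[List[str]], List[bool]]  # True marks a suspected error
Filler = Callable[[List[str]], List[str]]    # replaces MASK tokens with words


def mask_errors(words: List[str], flags: List[bool]) -> List[str]:
    """Replace every word flagged as erroneous with the mask token."""
    return [MASK if bad else w for w, bad in zip(words, flags)]


def correct(words: List[str], checker: Checker, filler: Filler) -> List[str]:
    """Stage 1: detect and mask suspected errors. Stage 2: fill the masks."""
    flags = checker(words)
    masked = mask_errors(words, flags)
    return filler(masked)


# Toy stand-ins for the two models (assumptions, not trained models).
VOCAB = {"turn", "on", "the", "kitchen", "lights"}
FOLLOWERS = {"the": "kitchen"}  # most likely next word given the previous one


def toy_checker(words: List[str]) -> List[bool]:
    """Flag any word that is outside a small known vocabulary."""
    return [w not in VOCAB for w in words]


def toy_filler(masked: List[str]) -> List[str]:
    """Fill each mask from the previous word; leave it masked if unknown."""
    out: List[str] = []
    for tok in masked:
        if tok == MASK:
            prev: Optional[str] = out[-1] if out else None
            out.append(FOLLOWERS.get(prev, MASK))
        else:
            out.append(tok)
    return out


if __name__ == "__main__":
    transcript = "turn on the kitchin lights".split()
    print(" ".join(correct(transcript, toy_checker, toy_filler)))
```

Separating detection from generation means only the flagged positions are rewritten, so words the checker accepts pass through unchanged.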

