CopyNE: Better Contextual ASR by Copying Named Entities

05/22/2023
by Shilin Zhou, et al.

Recent years have seen remarkable progress in automatic speech recognition (ASR). However, traditional token-level ASR models still struggle to transcribe entities accurately because of homophonic and near-homophonic tokens. This paper introduces CopyNE, an approach that uses a span-level copying mechanism to improve how ASR transcribes entities. CopyNE can copy all tokens of an entity at once, which avoids the homophone and near-homophone errors that arise when the tokens of an entity are predicted one at a time. Experiments on the Aishell and ST-cmds datasets show that CopyNE significantly reduces character error rate (CER) and named entity CER (NE-CER), especially in entity-rich scenarios. Even against the strong Whisper baseline, CopyNE still achieves notable reductions in CER and NE-CER. Qualitative comparisons with previous approaches show that CopyNE handles entities better, improving the overall accuracy of ASR.
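For readers unfamiliar with the metric, CER is conventionally computed as the character-level edit distance between the reference and the hypothesis, divided by the reference length. The snippet below is a minimal sketch of that conventional definition, not the authors' evaluation script; the function names `edit_distance` and `cer` are illustrative. NE-CER is not sketched, since the abstract does not specify how entity spans are aligned for scoring.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# A homophone-style error: one wrong character in a three-character name
# counts as one substitution over three reference characters.
name_cer = cer("张三丰", "章三丰")
```

A single homophone substitution inside a short entity therefore contributes a large error relative to the entity's length, which is the kind of mistake the abstract says token-by-token prediction is prone to.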

research
06/21/2021

A Discriminative Entity-Aware Language Model for Virtual Assistants

High-quality automatic speech recognition (ASR) is essential for virtual...
research
05/29/2023

Retraining-free Customized ASR for Enharmonic Words Based on a Named-Entity-Aware Model and Phoneme Similarity Estimation

End-to-end automatic speech recognition (E2E-ASR) has the potential to i...
research
06/09/2023

Record Deduplication for Entity Distribution Modeling in ASR Transcripts

Voice digital assistants must keep up with trending search queries. We r...
research
12/30/2022

Memory Augmented Lookup Dictionary based Language Modeling for Automatic Speech Recognition

Recent studies have shown that using an external Language Model (LM) ben...
research
06/01/2023

AfriNames: Most ASR models "butcher" African Names

Useful conversational agents must accurately capture named entities to m...
research
03/29/2022

Locality Matters: A Locality-Biased Linear Attention for Automatic Speech Recognition

Conformer has shown a great success in automatic speech recognition (ASR...
research
10/07/2019

A Case Study on Combining ASR and Visual Features for Generating Instructional Video Captions

Instructional videos get high-traffic on video sharing platforms, and pr...
