Fast and Accurate Capitalization and Punctuation for Automatic Speech Recognition Using Transformer and Chunk Merging

08/07/2019
by Binh Nguyen, et al.

In recent years, studies on automatic speech recognition (ASR) have shown outstanding results, reaching human parity on short speech segments. However, standardizing ASR output, for example by restoring capitalization and punctuation in long-speech transcriptions, remains difficult. Output without punctuation or casing is harder for readers to understand and also causes difficulties for downstream natural language processing models such as named entity recognition (NER), part-of-speech (POS) tagging, and semantic parsing. In this paper, we propose a method to restore punctuation and capitalization for long-speech ASR transcription. The method is based on Transformer models and chunk merging, which allows us to (1) build a single model that performs punctuation and capitalization restoration in one pass, and (2) decode in parallel while improving prediction accuracy. Experiments on the British National Corpus show that the proposed approach outperforms existing methods in both accuracy and decoding speed.
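
As a rough illustration of the chunk-merging idea, the sketch below assumes one plausible reading: the long transcript is split into overlapping chunks, each chunk is labeled independently (so the chunks could be decoded in parallel), and within each overlap the first half of the labels is taken from the left chunk and the second half from the right chunk. The function names `split_chunks`, `merge_chunks`, and `toy_tagger`, the chunk size, and the overlap width are all hypothetical; `toy_tagger` merely stands in for the Transformer model and is not the method from the paper.

```python
from typing import List, Tuple


def split_chunks(words: List[str], size: int, overlap: int) -> List[Tuple[int, List[str]]]:
    """Split `words` into chunks of at most `size` words.

    Consecutive chunks share `overlap` words. Each chunk is returned
    together with its start offset in the original sequence.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    start = 0
    while True:
        chunks.append((start, words[start:start + size]))
        if start + size >= len(words):
            break  # this chunk already reaches the end of the transcript
        start += step
    return chunks


def merge_chunks(tagged: List[Tuple[int, List[str]]], overlap: int) -> List[str]:
    """Merge per-chunk outputs back into one sequence.

    Assumed policy: inside each overlap, keep the first half from the left
    chunk and the second half from the right chunk, so that tokens near a
    chunk boundary (which have little context on one side) are discarded.
    """
    merged: List[str] = []
    for start, labels in tagged:
        if not merged:
            merged.extend(labels)
            continue
        cut = start + overlap // 2        # absolute position where the right chunk takes over
        del merged[cut:]                  # drop the left chunk's tail of the overlap
        merged.extend(labels[cut - start:])
    return merged


def toy_tagger(chunk: List[str]) -> List[str]:
    """Hypothetical stand-in for the model: one output token per input word."""
    return [w.upper() for w in chunk]


if __name__ == "__main__":
    words = "hello how are you today i am fine thanks".split()
    chunks = split_chunks(words, size=4, overlap=2)
    # Chunks are independent, so this loop could be batched or run in parallel.
    tagged = [(start, toy_tagger(chunk)) for start, chunk in chunks]
    print(" ".join(merge_chunks(tagged, overlap=2)))
```

Because each chunk is processed without reference to the others, the per-chunk calls can be batched on a GPU or spread across workers; only the final merge is sequential, and it is linear in the transcript length.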


