Improved Training for End-to-End Streaming Automatic Speech Recognition Model with Punctuation

06/02/2023
by Hanbyul Kim, et al.

Predicting punctuated text is crucial for automatic speech recognition because it improves readability and affects downstream natural language processing tasks. In streaming scenarios, predicting punctuation in real time is particularly desirable but technically challenging. In this work, we propose a method for predicting punctuated text from input speech using a chunk-based Transformer encoder trained with Connectionist Temporal Classification (CTC) loss. Training the acoustic model on long sequences, formed by concatenating input and target sequences, lets it learn the punctuation marks attached to the ends of sentences more effectively. Additionally, by combining CTC losses computed on chunks and on whole utterances, we improved both the F1 score of punctuation prediction and the Word Error Rate (WER).
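
The abstract names two training ideas without giving details: concatenating input and target sequences into longer samples, and combining chunk-level and utterance-level CTC losses. The sketch below illustrates one possible reading of both in plain Python. The function names `concat_utterances` and `combine_losses`, the `max_frames` budget, the greedy merging policy, and the interpolation `weight` are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Frame = List[float]                        # one acoustic feature vector (hypothetical layout)
Utterance = Tuple[List[Frame], List[str]]  # (frames, punctuated target tokens)


def concat_utterances(utts: Sequence[Utterance], max_frames: int) -> List[Utterance]:
    """Greedily merge consecutive utterances into longer training samples.

    Assumption: utterances are merged in order until the frame budget would
    be exceeded. An utterance longer than max_frames is kept as its own sample.
    """
    merged: List[Utterance] = []
    frames: List[Frame] = []
    tokens: List[str] = []
    for utt_frames, utt_tokens in utts:
        # Close the current sample if this utterance would exceed the budget.
        if frames and len(frames) + len(utt_frames) > max_frames:
            merged.append((frames, tokens))
            frames, tokens = [], []
        # Concatenate both the input frames and the target tokens.
        frames = frames + list(utt_frames)
        tokens = tokens + list(utt_tokens)
    if frames:
        merged.append((frames, tokens))
    return merged


def combine_losses(chunk_loss: float, utterance_loss: float, weight: float = 0.5) -> float:
    """Linearly interpolate two loss values; the weighting scheme is an assumption."""
    return weight * chunk_loss + (1.0 - weight) * utterance_loss


# Toy data: three short utterances with dummy one-dimensional frames.
utts = [
    ([[0.0]] * 3, ["hello", "."]),
    ([[0.0]] * 2, ["how", "are", "you", "?"]),
    ([[0.0]] * 4, ["fine", "."]),
]
merged = concat_utterances(utts, max_frames=5)
```

With this toy input, the first two utterances fit within the five-frame budget and become one sample whose target keeps the sentence-final "." in the middle of the sequence, which is the situation the concatenation is meant to expose the model to. In a real setup the two loss values passed to `combine_losses` would come from a CTC implementation applied per chunk and per utterance.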


research
10/21/2020

Towards End-to-End Training of Automatic Speech Recognition for Nigerian Pidgin

Nigerian Pidgin remains one of the most popular languages in West Africa...
research
08/02/2023

Careful Whisper – leveraging advances in automatic speech recognition for robust and interpretable aphasia subtype classification

This paper presents a fully automated approach for identifying speech an...
research
07/07/2022

End-to-end Speech-to-Punctuated-Text Recognition

Conventional automatic speech recognition systems do not produce punctua...
research
12/03/2020

End to End ASR System with Automatic Punctuation Insertion

Recent Automatic Speech Recognition systems have been moving towards end...
research
10/24/2019

Recognizing long-form speech using streaming end-to-end models

All-neural end-to-end (E2E) automatic speech recognition (ASR) systems t...
research
02/12/2020

Attentional Speech Recognition Models Misbehave on Out-of-domain Utterances

We discuss the problem of echographic transcription in autoregressive se...
research
05/21/2020

Large scale evaluation of importance maps in automatic speech recognition

In this paper, we propose a metric that we call the structured saliency ...
