Incorporating External POS Tagger for Punctuation Restoration

06/12/2021
by Ning Shi et al.

Punctuation restoration is an important post-processing step in automatic speech recognition (ASR). Among the kinds of external information that can help, part-of-speech (POS) taggers provide informative tags that indicate each input token's syntactic role, and such tags have been shown to benefit the punctuation restoration task. In this work, we incorporate an external POS tagger and fuse its predicted labels into the existing language model to provide syntactic information. In addition, we propose sequence boundary sampling (SBS) to learn punctuation positions more efficiently as a sequence tagging task. Experimental results show that our methods consistently obtain performance gains and achieve a new state of the art on the common IWSLT benchmark. Further ablation studies show that both large pre-trained language models and the external POS tagger play essential roles in improving the model's performance.
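To make the sequence tagging framing concrete, here is a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: the label set O/COMMA/PERIOD/QUESTION (a common choice for IWSLT-style setups), the hypothetical helpers `to_tagging_example`, `sample_window`, and `fuse_pos`, the reading of boundary sampling as "cut training windows at random start positions", and concatenation as the way POS information is fused with language-model token vectors.

```python
import random
import re

# Assumed label set: each word is tagged with the punctuation mark that follows it.
PUNCT_TO_LABEL = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}


def to_tagging_example(text):
    """Strip punctuation and label each word with the mark that follows it ("O" if none)."""
    words, labels = [], []
    # Tokenize into word tokens and the three punctuation marks we model.
    for tok in re.findall(r"[\w']+|[,.?]", text.lower()):
        if tok in PUNCT_TO_LABEL:
            if labels:  # attach the mark to the preceding word
                labels[-1] = PUNCT_TO_LABEL[tok]
        else:
            words.append(tok)
            labels.append("O")
    return words, labels


def sample_window(words, labels, size, rng):
    """Hypothetical boundary sampling: cut a fixed-size training window at a random
    start, so sentence boundaries land at varying positions inside the window."""
    if len(words) <= size:
        return words, labels
    start = rng.randrange(len(words) - size + 1)
    return words[start:start + size], labels[start:start + size]


def fuse_pos(token_vectors, pos_tags, pos_embeddings):
    """One simple fusion choice (assumption): concatenate each token's
    language-model vector with an embedding of its predicted POS tag."""
    return [vec + pos_embeddings[tag] for vec, tag in zip(token_vectors, pos_tags)]
```

For example, `to_tagging_example("Hello, how are you? I am fine.")` yields the words `['hello', 'how', 'are', 'you', 'i', 'am', 'fine']` with labels `['COMMA', 'O', 'O', 'QUESTION', 'O', 'O', 'PERIOD']`. A real system would obtain POS tags from an external tagger and token vectors from a pre-trained language model; concatenation is only one of several possible fusion strategies.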
