Four-in-One: A Joint Approach to Inverse Text Normalization, Punctuation, Capitalization, and Disfluency for Automatic Speech Recognition

10/26/2022
by Sharman Tan, et al.

Features such as punctuation, capitalization, and formatting of entities are important for readability, understanding, and natural language processing tasks. However, Automatic Speech Recognition (ASR) systems produce spoken-form text devoid of formatting, and tagging approaches to formatting address just one or two features at a time. In this paper, we unify spoken-to-written text conversion via a two-stage process: First, we use a single transformer tagging model to jointly produce token-level tags for inverse text normalization (ITN), punctuation, capitalization, and disfluencies. Then, we apply the tags to generate written-form text and use weighted finite state transducer (WFST) grammars to format tagged ITN entity spans. Despite joining four models into one, our unified tagging approach matches or outperforms task-specific models across all four tasks on benchmark test sets across several domains.
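
The second stage described above (applying token-level tags to produce written-form text) can be illustrated with a minimal sketch. The abstract does not specify the tag inventory, so everything here is an assumption made for illustration: the `TaggedToken` fields, the `B-`/`I-` span labels, the `U`/`A` capitalization codes, and the `format_entity` helper are all hypothetical. In particular, `format_entity` is a toy dictionary lookup standing in for the WFST grammars the paper uses, and the tags are written by hand rather than predicted by a transformer.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TaggedToken:
    """One spoken-form token with four hypothetical tags (assumed scheme)."""
    text: str
    itn: str = "O"           # "O", "B-<TYPE>", or "I-<TYPE>" for ITN entity spans
    punct: str = ""          # punctuation mark to append after the token
    cap: str = "O"           # "O" = unchanged, "U" = capitalize first letter, "A" = all caps
    disfluent: bool = False  # True if the token should be dropped

# Tiny number lexicon; a toy substitute for a real WFST grammar.
_NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
}

def format_entity(entity_type: str, words: List[str]) -> str:
    """Format a tagged ITN span. Only handles simple additive numbers."""
    if entity_type == "NUM" and all(w in _NUMBER_WORDS for w in words):
        return str(sum(_NUMBER_WORDS[w] for w in words))
    # Unknown entity type or word: fall back to the spoken form.
    return " ".join(words)

def apply_tags(tokens: List[TaggedToken]) -> str:
    """Turn tagged spoken-form tokens into written-form text."""
    # Drop disfluent tokens before any other processing.
    kept = [t for t in tokens if not t.disfluent]
    out = []
    i = 0
    while i < len(kept):
        tok = kept[i]
        if tok.itn.startswith("B-"):
            # Collect the whole entity span, then format it as one unit.
            etype = tok.itn[2:]
            j = i + 1
            while j < len(kept) and kept[j].itn == "I-" + etype:
                j += 1
            span = kept[i:j]
            text = format_entity(etype, [t.text for t in span])
            cap, punct = span[0].cap, span[-1].punct
            i = j
        else:
            text, cap, punct = tok.text, tok.cap, tok.punct
            i += 1
        if cap == "U":
            text = text[:1].upper() + text[1:]
        elif cap == "A":
            text = text.upper()
        out.append(text + punct)
    return " ".join(out)

# Hand-tagged example for "um i paid twenty five dollars on friday".
example = [
    TaggedToken("um", disfluent=True),
    TaggedToken("i", cap="U"),
    TaggedToken("paid"),
    TaggedToken("twenty", itn="B-NUM"),
    TaggedToken("five", itn="I-NUM"),
    TaggedToken("dollars"),
    TaggedToken("on"),
    TaggedToken("friday", cap="U", punct="."),
]
result = apply_tags(example)
print(result)  # I paid 25 dollars on Friday.
```

Keeping all four tags on one token record mirrors the idea of a single joint tagger: one pass over the tokens is enough to drop disfluencies, format entity spans, and apply capitalization and punctuation.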

research
04/11/2021

NeMo Inverse Text Normalization: From Development To Production

Inverse text normalization (ITN) converts spoken-domain automatic speech...
research
11/07/2022

Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition

Automatic Speech Recognition (ASR) systems typically yield output in lex...
research
07/29/2022

Thutmose Tagger: Single-pass neural model for Inverse Text Normalization

Inverse text normalization (ITN) is an essential post-processing step in...
research
08/23/2021

A Unified Transformer-based Framework for Duplex Text Normalization

Text normalization (TN) and inverse text normalization (ITN) are essenti...
research
02/01/2022

Transformer-based Models of Text Normalization for Speech Applications

Text normalization, or the process of transforming text into a consisten...
research
07/17/2017

To Normalize, or Not to Normalize: The Impact of Normalization on Part-of-Speech Tagging

Does normalization help Part-of-Speech (POS) tagging accuracy on noisy, ...
research
07/20/2022

Improving Data Driven Inverse Text Normalization using Data Augmentation

Inverse text normalization (ITN) is used to convert the spoken form outp...
