Dynamic Alignment Mask CTC: Improved Mask-CTC with Aligned Cross Entropy

03/14/2023
by Xulong Zhang, et al.

Because they predict all target tokens in parallel, non-autoregressive models greatly improve decoding efficiency for speech recognition compared with traditional autoregressive models. In this work, we present dynamic alignment Mask CTC, which introduces two methods: (1) Aligned Cross Entropy (AXE), which finds, via dynamic programming, the monotonic alignment that minimizes the cross-entropy loss, and (2) Dynamic Rectification, which creates new training samples by replacing some masks with model-predicted tokens. AXE ignores the absolute positional alignment between the prediction and the ground-truth sentence and instead focuses on matching tokens in relative order. Dynamic rectification exposes the model to unmasked but potentially wrong tokens, even when they are predicted with high confidence. Our experiments on the WSJ dataset demonstrate that both the AXE loss and the rectification method improve the WER of Mask CTC.
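To make the two components concrete, below is a minimal NumPy sketch of an AXE-style dynamic program and of dynamic rectification, based only on the description in the abstract. The function names (`axe_loss`, `dynamic_rectification`), the `<eps>`/`blank_id` handling, the `skip_target_penalty`, and the `rectify_ratio` are illustrative assumptions, not the authors' implementation.

```python
# Sketch of an AXE-style loss: a dynamic program over monotonic alignments
# between target tokens and prediction positions. The skip-target penalty is a
# fixed hyperparameter here, which is a simplification of the paper's loss.
import numpy as np

def axe_loss(log_probs: np.ndarray, target: list, blank_id: int,
             skip_target_penalty: float = 5.0) -> float:
    """Minimal summed cross-entropy over monotonic alignments.

    log_probs: (T, V) per-position log-probabilities, V includes an <eps> id.
    target:    list of target token ids (length m).
    """
    T, _ = log_probs.shape
    m = len(target)
    INF = float("inf")

    # A[i, j]: minimal cost of aligning the first i target tokens
    # with the first j prediction positions.
    A = np.full((m + 1, T + 1), INF)
    A[0, 0] = 0.0
    for j in range(1, T + 1):                      # only skipped predictions
        A[0, j] = A[0, j - 1] - log_probs[j - 1, blank_id]
    for i in range(1, m + 1):                      # only skipped targets
        A[i, 0] = A[i - 1, 0] + skip_target_penalty

    for i in range(1, m + 1):
        for j in range(1, T + 1):
            align = A[i - 1, j - 1] - log_probs[j - 1, target[i - 1]]  # match y_i to position j
            skip_pred = A[i, j - 1] - log_probs[j - 1, blank_id]       # position j emits <eps>
            skip_tgt = A[i - 1, j] + skip_target_penalty               # y_i left unaligned
            A[i, j] = min(align, skip_pred, skip_tgt)

    return float(A[m, T])


# Sketch of dynamic rectification: refill part of the masked positions with the
# model's own (possibly wrong) predictions to build a new training input. The
# rectify_ratio and helper structure are assumptions for illustration.
def dynamic_rectification(masked_ids: np.ndarray, log_probs: np.ndarray,
                          mask_id: int, rectify_ratio: float = 0.5,
                          rng=None) -> np.ndarray:
    """Replace a random subset of <mask> positions with the model's argmax tokens."""
    rng = rng or np.random.default_rng()
    ids = masked_ids.copy()
    mask_positions = np.where(ids == mask_id)[0]
    n_rectify = int(len(mask_positions) * rectify_ratio)
    if n_rectify == 0:
        return ids
    chosen = rng.choice(mask_positions, size=n_rectify, replace=False)
    ids[chosen] = log_probs[chosen].argmax(axis=-1)  # fill with model predictions
    return ids
```

In this reading of the abstract, the AXE cost would stand in for Mask CTC's position-wise cross-entropy, while the rectified sequence serves as an additional, noisier training sample; both are paraphrases of the description above rather than the released training recipe.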
