Non-autoregressive Transformer with Unified Bidirectional Decoder for Automatic Speech Recognition

09/14/2021
by Chuan-Fei Zhang, et al.

Non-autoregressive (NAR) transformer models have been studied intensively in automatic speech recognition (ASR), and a substantial number of them use the causal mask to limit token dependencies. However, the causal mask is designed for the left-to-right decoding process of the non-parallel autoregressive (AR) transformer, which is inappropriate for the parallel NAR transformer because it ignores right-to-left contexts. Some models have been proposed to exploit right-to-left contexts with an extra decoder, but these methods increase model complexity. To tackle these problems, we propose a new non-autoregressive transformer with a unified bidirectional decoder (NAT-UBD), which can exploit left-to-right and right-to-left contexts simultaneously. However, direct use of bidirectional contexts causes information leakage: the decoder output at a given position can be affected by the character fed in at that same position. To avoid information leakage, we propose a novel attention mask and modify the vanilla query, key, and value matrices of NAT-UBD. Experimental results verify that NAT-UBD can achieve character error rates (CERs) of 5.0% on the Aishell1 dev/test sets, outperforming all previous NAR transformer models. Moreover, NAT-UBD can run 49.8x faster than the AR transformer baseline when decoding in a single step.
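To make the masking idea concrete, here is a minimal sketch in PyTorch (not taken from the paper) that contrasts the standard causal mask with a hypothetical bidirectional mask blocking each query position from attending to the key/value at the same position, which is one straightforward way to prevent the same-position leakage described above. The actual NAT-UBD design also modifies the query, key, and value matrices, which this sketch does not reproduce; all function names below are illustrative.

```python
import torch
import torch.nn.functional as F

def causal_mask(seq_len: int) -> torch.Tensor:
    # Standard left-to-right mask used by AR transformers:
    # position i may attend only to positions <= i.
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

def bidirectional_no_diag_mask(seq_len: int) -> torch.Tensor:
    # Hypothetical anti-leakage mask: position i may attend to every
    # position except i itself, so the output at i sees both left and
    # right contexts but cannot simply copy the input character at i.
    return ~torch.eye(seq_len, dtype=torch.bool)

def masked_attention(q, k, v, mask):
    # Scaled dot-product attention with a boolean mask
    # (True = may attend, False = blocked).
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Usage: self-attention over a toy sequence with each mask.
seq_len, d_model = 5, 8
x = torch.randn(1, seq_len, d_model)
out_causal = masked_attention(x, x, x, causal_mask(seq_len))
out_bidir = masked_attention(x, x, x, bidirectional_no_diag_mask(seq_len))
print(bidirectional_no_diag_mask(seq_len).int())
```

In the bidirectional mask, every row keeps at least one unblocked position (for sequences longer than one token), so the softmax remains well defined while same-position leakage is cut off.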


