Focus on the present: a regularization method for the ASR source-target attention layer

11/02/2020
by   Nanxin Chen, et al.

This paper introduces a novel method to diagnose the source-target attention in state-of-the-art end-to-end speech recognition models with joint connectionist temporal classification (CTC) and attention training. Our method is based on the fact that both CTC and source-target attention act on the same encoder representations. To understand what the attention does, CTC is applied to the attention outputs to compute token posteriors. We found that the source-target attention heads are able to predict several tokens ahead of the current one. Inspired by this observation, we propose a new regularization method that leverages CTC to make source-target attention focus more on the frames corresponding to the output token currently being predicted by the decoder. Experiments show stable relative improvements of up to 7% and 13% with the proposed regularization on TED-LIUM 2 and LibriSpeech.
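
The diagnostic idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the single-head attention matrix, the linear CTC output layer (`w_ctc`, `b_ctc`), and the penalty (mean negative log posterior of the current target token) are all assumptions made for illustration, since the abstract does not give the exact loss. The sketch passes attention-weighted encoder frames through the CTC output layer and scores how strongly each decoder step points at its own token.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def ctc_posteriors_from_attention(attn, enc, w_ctc, b_ctc):
    """Token posteriors obtained by applying a CTC output layer to attention outputs.

    attn:  (L, T) attention weights, one row per decoder step (rows sum to 1)
    enc:   (T, D) encoder representations shared by CTC and attention
    w_ctc: (D, V) hypothetical CTC projection weights
    b_ctc: (V,)   hypothetical CTC projection bias
    """
    context = attn @ enc                      # (L, D) attention outputs
    return softmax(context @ w_ctc + b_ctc)   # (L, V) token posteriors


def focus_penalty(attn, enc, w_ctc, b_ctc, targets):
    """Assumed regularizer: mean negative log posterior of the current target token."""
    post = ctc_posteriors_from_attention(attn, enc, w_ctc, b_ctc)
    picked = post[np.arange(len(targets)), targets]
    return float(-np.log(picked + 1e-12).mean())


# Toy setup: 4 frames, 3 tokens; frames 0-1 carry token 0, frame 2 token 1, frame 3 token 2.
enc = np.array([[1.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
w_ctc = 5.0 * np.eye(3)
b_ctc = np.zeros(3)
targets = np.array([0, 1, 2])

# Attention focused on the frames of the token being predicted.
attn_focused = np.array([[0.5, 0.5, 0.0, 0.0],
                         [0.0, 0.0, 1.0, 0.0],
                         [0.0, 0.0, 0.0, 1.0]])
# Attention that looks one token ahead, as in the reported observation.
attn_ahead = np.array([[0.0, 0.0, 1.0, 0.0],
                       [0.0, 0.0, 0.0, 1.0],
                       [0.0, 0.0, 0.0, 1.0]])

penalty_focused = focus_penalty(attn_focused, enc, w_ctc, b_ctc, targets)
penalty_ahead = focus_penalty(attn_ahead, enc, w_ctc, b_ctc, targets)
```

In this toy case the look-ahead attention receives a larger penalty than the focused one, which is the behavior a regularizer of this kind would be expected to encourage. In a real model such a term would presumably be added to the joint CTC and attention training loss with a tunable weight.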


