Weak-Attention Suppression For Transformer Based Speech Recognition

05/18/2020
by Yangyang Shi, et al.

Transformers, originally proposed for natural language processing (NLP) tasks, have recently achieved great success in automatic speech recognition (ASR). However, unlike text units, adjacent acoustic units (i.e., frames) are highly correlated, while long-distance dependencies between them are weak. This suggests that ASR will likely benefit from sparse and localized attention. In this paper, we propose Weak-Attention Suppression (WAS), a method that dynamically induces sparsity in attention probabilities. We demonstrate that WAS leads to consistent Word Error Rate (WER) improvement over strong transformer baselines. On the widely used LibriSpeech benchmark, our proposed method reduced WER by 10% for streaming transformers, resulting in a new state-of-the-art among streaming models. Further analysis shows that WAS learns to suppress attention over non-critical and redundant continuous acoustic frames, and is more likely to suppress past frames rather than future ones, indicating the importance of lookahead in attention-based ASR models.
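The core idea can be illustrated with a minimal sketch. The assumptions here: for each query position, a dynamic threshold is computed as the mean of that row's attention probabilities minus γ times their standard deviation, entries below the threshold are zeroed, and the row is renormalized. The function name and the γ value are illustrative, not taken from the paper, and the paper's exact formulation (e.g., suppressing pre-softmax logits instead) may differ.

```python
import numpy as np

def weak_attention_suppression(attn_probs, gamma=0.5):
    """Zero out weak attention probabilities and renormalize each row.

    attn_probs: array of shape (..., num_keys), rows summing to 1.
    gamma: hyperparameter controlling how aggressively to suppress.
    Threshold per query row: mean - gamma * std (an assumption of
    this sketch, mirroring the dynamic-threshold idea in the abstract).
    """
    mean = attn_probs.mean(axis=-1, keepdims=True)
    std = attn_probs.std(axis=-1, keepdims=True)
    threshold = mean - gamma * std
    # Suppress entries below the per-row threshold.
    suppressed = np.where(attn_probs < threshold, 0.0, attn_probs)
    # Renormalize surviving entries so each row sums to 1 again.
    return suppressed / suppressed.sum(axis=-1, keepdims=True)

# One query attending over five acoustic frames: the two strong
# entries survive; the weak ones are zeroed and mass redistributed.
probs = np.array([[0.02, 0.08, 0.40, 0.45, 0.05]])
sparse = weak_attention_suppression(probs, gamma=0.5)
```

Because the threshold adapts to each row's own statistics, sharply peaked attention rows lose more of their weak tail than diffuse rows, which is what "dynamically induces sparsity" refers to.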

Related research

- 05/16/2018 — A Comparison of Modeling Units in Sequence-to-Sequence Speech Recognition with the Transformer on Mandarin Chinese
  The choice of modeling units is critical to automatic speech recognition...
- 11/08/2020 — Stochastic Attention Head Removal: A Simple and Effective Method for Improving Automatic Speech Recognition with Transformers
  Recently, Transformers have shown competitive automatic speech recogniti...
- 10/29/2022 — XNOR-FORMER: Learning Accurate Approximations in Long Speech Transformers
  Transformers are among the state of the art for many tasks in speech, vi...
- 10/22/2019 — Transformer-based Acoustic Modeling for Hybrid Speech Recognition
  We propose and evaluate transformer-based acoustic models (AMs) for hybr...
- 02/27/2023 — Low latency transformers for speech processing
  The transformer is a widely-used building block in modern neural network...
- 04/26/2019 — Transformers with convolutional context for ASR
  The recent success of transformer networks for neural machine translatio...
- 10/27/2017 — Acoustic Landmarks Contain More Information About the Phone String than Other Frames
  Most mainstream Automatic Speech Recognition (ASR) systems consider all ...
