
Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory

by Chunyang Wu, et al.

Transformer-based acoustic modeling has achieved great success for both hybrid and sequence-to-sequence speech recognition. However, it requires access to the full sequence, and the computational cost grows quadratically with respect to the input sequence length. These factors limit its adoption for streaming applications. In this work, we propose a novel augmented-memory self-attention, which attends on a short segment of the input sequence and a bank of memories. The memory bank stores the embedding information for all the processed segments. On the LibriSpeech benchmark, our proposed method outperforms all the existing streamable transformer methods by a large margin and achieves over 15% relative error reduction compared with the widely used LC-BLSTM baseline. Our findings are also confirmed on some large internal datasets.
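The mechanism described above can be sketched in a few lines of NumPy: the input is split into short segments, each segment's queries attend over the current segment plus a bank of one summary embedding per previously processed segment, and a summary of the segment is then appended to the bank. This is a minimal single-head illustration, not the paper's implementation; the random projection matrices and the mean-pooled summary stand in for learned parameters and the paper's learned segment summarization.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def augmented_memory_attention(x, segment_len=4, d=8, rng=None):
    """Single-head sketch: each segment attends to itself plus a bank
    holding one summary vector per previously processed segment."""
    rng = rng or np.random.default_rng(0)
    # Hypothetical projections, randomly initialized for illustration.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
    memory_bank = []   # list of (1, d) summary embeddings of past segments
    outputs = []
    for start in range(0, len(x), segment_len):
        seg = x[start:start + segment_len]                     # (s, d)
        # Keys/values cover the memory bank plus the current segment,
        # so cost per step stays bounded instead of growing quadratically
        # with the full sequence length.
        ctx = np.vstack(memory_bank + [seg]) if memory_bank else seg
        q, k, v = seg @ Wq, ctx @ Wk, ctx @ Wv
        attn = softmax(q @ k.T / np.sqrt(d), axis=-1)          # (s, n+s)
        outputs.append(attn @ v)
        # Store a fixed-size summary of this segment (mean pooling here;
        # the paper uses a learned summarization).
        memory_bank.append(seg.mean(axis=0, keepdims=True))
    return np.vstack(outputs)

x = np.random.default_rng(1).standard_normal((12, 8))
y = augmented_memory_attention(x, segment_len=4, d=8)
print(y.shape)  # (12, 8)
```

Note that each step's attention span is `segment_len` plus the number of stored summaries, so per-segment cost grows only with the number of past segments in the bank rather than with the raw frame count.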




A Transformer with Interleaved Self-attention and Convolution for Hybrid Acoustic Models

Transformer with self-attention has achieved great success in the area o...

Self-Attentional Acoustic Models

Self-attention is a method of encoding sequences of vectors by relating ...

Transformer-based Acoustic Modeling for Hybrid Speech Recognition

We propose and evaluate transformer-based acoustic models (AMs) for hybr...

Highway Transformer: Self-Gating Enhanced Self-Attentive Networks

Self-attention mechanisms have made striking state-of-the-art (SOTA) pro...

s-Transformer: Segment-Transformer for Robust Neural Speech Synthesis

Neural end-to-end text-to-speech (TTS), which adopts either a recurrent...

Why self-attention is Natural for Sequence-to-Sequence Problems? A Perspective from Symmetries

In this paper, we show that structures similar to self-attention are nat...

Relative Positional Encoding for Speech Recognition and Direct Translation

Transformer models are powerful sequence-to-sequence architectures that ...