Low latency transformers for speech processing

02/27/2023
by Jianbo Ma et al.

The transformer is a widely-used building block in modern neural networks. However, when applied to audio data, the transformer's acausal behaviour, which we term Acausal Attention (AA), has generally limited its application to offline tasks. In this paper we introduce Streaming Attention (SA), which operates causally with fixed latency and requires less compute and memory than AA to train. Next, we introduce Low Latency Streaming Attention (LLSA), a method which combines multiple SA layers without latency build-up proportional to the layer count. Comparative analyses of AA, SA and LLSA on Automatic Speech Recognition (ASR) and Speech Emotion Recognition (SER) tasks are presented. The results show that causal SA-based networks with fixed latencies of a few seconds (e.g. 1.8 seconds) and LLSA networks with latencies as short as 300 ms can perform comparably with acausal (AA) networks. We conclude that SA and LLSA methods retain many of the benefits of conventional acausal transformers, but with latency characteristics that make them practical to run in real-time streaming applications.
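The abstract describes streaming attention as causal attention with a fixed latency, in contrast to acausal attention, where every query can attend to the full sequence. A minimal mask-based sketch of this idea is shown below; the `lookahead` window, function names, and mask formulation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def streaming_attention_mask(seq_len, lookahead):
    """Boolean mask: query i may attend to key j iff j <= i + lookahead.

    lookahead=0 gives strictly causal attention; a small positive
    lookahead trades a fixed amount of latency for future context.
    (Illustrative assumption, not the paper's exact formulation.)
    """
    idx = np.arange(seq_len)
    return idx[None, :] <= idx[:, None] + lookahead

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention restricted by a boolean mask."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)  # disallowed positions get zero weight
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
T, d = 6, 4
q = k = v = rng.standard_normal((T, d))
mask = streaming_attention_mask(T, lookahead=1)
out = masked_attention(q, k, v, mask)
```

Note that naively stacking N such layers compounds the lookahead, giving a total latency proportional to N, which is the build-up the paper's LLSA method is designed to avoid.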


Related research

10/07/2020  Transformer Transducer: One Model Unifying Streaming and Non-streaming Speech Recognition
05/19/2020  Exploring Transformers for Large-Scale Speech Recognition
10/21/2020  Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition
11/27/2018  Cloud based Real-Time and Low Latency Scientific Event Analysis
03/29/2022  Dynamic Latency for CTC-Based Streaming Automatic Speech Recognition With Emformer
10/27/2020  Transformer in action: a comparative study of transformer-based acoustic models for large scale speech recognition applications
05/18/2020  Weak-Attention Suppression For Transformer Based Speech Recognition
