Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition

05/08/2023
by   Dima Rekesh, et al.

Conformer-based models have become the dominant end-to-end architecture for speech processing tasks. In this work, we propose a carefully redesigned Conformer with a new down-sampling schema. The proposed model, named Fast Conformer, is 2.8x faster than the original Conformer while preserving state-of-the-art accuracy on Automatic Speech Recognition benchmarks. We also replace the original Conformer's global attention with limited-context attention post-training, enabling transcription of hour-long audio, and we further improve long-form transcription by adding a global token. Combined with a Transformer decoder, Fast Conformer also outperforms the original Conformer in both accuracy and speed on Speech Translation and Spoken Language Understanding.
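The mechanism the abstract names, limited-context attention augmented with a single global token, can be sketched as follows. This is an illustrative single-head version, not the authors' implementation; the function name, window size, and global-token initialization are assumptions made for the example.

```python
# A minimal sketch (not the paper's code) of limited-context self-attention
# with one global token. Window size and global-token init are assumptions.
import torch
import torch.nn.functional as F

def limited_context_attention(x, window=64, use_global_token=True):
    """Single-head self-attention restricted to a local window.

    x: (T, d) sequence of frame embeddings.
    window: each frame attends to +/- `window` neighbors.
    use_global_token: prepend one token that attends to, and is attended
    by, every frame, approximating the global token the abstract mentions.
    """
    if use_global_token:
        g = x.mean(dim=0, keepdim=True)        # global-token init (assumption)
        x = torch.cat([g, x], dim=0)
    T, d = x.shape
    # For brevity, use x directly as queries/keys/values (no projections).
    scores = x @ x.t() / d ** 0.5              # (T, T) attention logits
    idx = torch.arange(T)
    local = (idx[:, None] - idx[None, :]).abs() <= window
    if use_global_token:
        local[0, :] = True                     # global token sees every frame
        local[:, 0] = True                     # every frame sees the global token
    scores = scores.masked_fill(~local, float('-inf'))
    out = F.softmax(scores, dim=-1) @ x
    return out[1:] if use_global_token else out

# Usage: 1,000 frames of 256-dim features.
y = limited_context_attention(torch.randn(1000, 256), window=64)
print(y.shape)  # torch.Size([1000, 256])
```

Note that this sketch still materializes a full (T, T) score matrix for clarity; a production implementation would compute only the banded entries, so memory grows linearly with sequence length rather than quadratically, which is what makes hour-long transcription feasible.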


Related research

09/18/2023 · Investigating End-to-End ASR Architectures for Long Form Audio Transcription
This paper presents an overview and evaluation of some of the end-to-end...

10/11/2021 · SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition
The Transformer architecture has been well adopted as a dominant archite...

01/14/2021 · Fast offline Transformer-based end-to-end automatic speech recognition for real-world applications
Many real-world applications require to convert speech files into text w...

05/18/2023 · A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks
Conformer, a convolution-augmented Transformer variant, has become the d...

07/06/2022 · Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
Conformer has proven to be effective in many speech processing tasks. It...

05/14/2018 · RETURNN as a Generic Flexible Neural Toolkit with Application to Translation and Speech Recognition
We compare the fast training and decoding speed of RETURNN of attention ...

08/12/2020 · End-to-End Neural Transformer Based Spoken Language Understanding
Spoken language understanding (SLU) refers to the process of inferring t...
