A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks

05/18/2023
by Yifan Peng, et al.

Conformer, a convolution-augmented Transformer variant, has become the de facto encoder architecture for speech processing due to its superior performance in various tasks, including automatic speech recognition (ASR), speech translation (ST), and spoken language understanding (SLU). Recently, a new encoder called E-Branchformer has outperformed Conformer on the LibriSpeech ASR benchmark, making it promising for more general speech applications. This work compares E-Branchformer and Conformer through extensive experiments using different types of end-to-end sequence-to-sequence models. Results demonstrate that E-Branchformer achieves comparable or better performance than Conformer on almost all evaluation sets across 15 ASR, 2 ST, and 3 SLU benchmarks, while being more stable during training. We will release our training configurations and pre-trained models for reproducibility, which can benefit the speech community.
