E-Branchformer: Branchformer with Enhanced merging for speech recognition

09/30/2022
by   Kwangyoun Kim, et al.

Conformer, which combines convolution and self-attention sequentially to capture both local and global information, has shown remarkable performance and is currently regarded as the state of the art for automatic speech recognition (ASR). Several other studies have explored integrating convolution and self-attention, but they have not managed to match Conformer's performance. The recently introduced Branchformer achieves performance comparable to Conformer by using dedicated parallel branches of convolution and self-attention and merging the local and global context from each branch. In this paper, we propose E-Branchformer, which enhances Branchformer by applying an effective merging method and stacking additional point-wise modules. E-Branchformer sets a new state-of-the-art word error rate (WER) of 1.81 on the LibriSpeech test-clean set, along with a new best result on test-other, without using any external training data.
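The abstract describes merging the local (convolution) and global (self-attention) branch outputs with an enhanced merge module. As a rough illustration only, here is a minimal NumPy sketch of one plausible reading of such a merge: concatenate the two branch outputs along the feature dimension, mix neighboring frames with a depthwise convolution over time (with a residual connection), and project back to the model dimension. The helper names, shapes, and the exact placement of the residual are assumptions, not the paper's definition.

```python
import numpy as np

def depthwise_conv1d(x, kernel):
    """Depthwise 1-D convolution over time with 'same' padding.
    x: (T, D) sequence, kernel: (K, D) -- one filter per channel."""
    T, D = x.shape
    K = kernel.shape[0]
    pad = K // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for t in range(T):
        # each channel is convolved independently with its own filter
        out[t] = np.sum(xp[t:t + K] * kernel, axis=0)
    return out

def enhanced_merge(local_out, global_out, dw_kernel, proj_w):
    """Sketch of a branch-merge step (assumed structure):
    concat the branches, add a depthwise-conv term that mixes
    adjacent frames, then linearly project back to d_model."""
    z = np.concatenate([local_out, global_out], axis=-1)  # (T, 2D)
    z = z + depthwise_conv1d(z, dw_kernel)                # residual around the depthwise conv
    return z @ proj_w                                     # (T, D)
```

Usage: with branch outputs of shape `(T, D)`, a depthwise kernel of shape `(K, 2D)`, and a projection matrix of shape `(2D, D)`, the merged output has shape `(T, D)` and can feed the next encoder layer.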


Related research

07/06/2022
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
Conformer has proven to be effective in many speech processing tasks. It...

05/22/2023
GNCformer: Enhanced Self-attention for Automatic Speech Recognition
In this paper, an Enhanced Self-Attention (ESA) mechanism has been put fo...

08/31/2021
Efficient Conformer: Progressive Downsampling and Grouped Attention for Automatic Speech Recognition
The recently proposed Conformer architecture has shown state-of-the-art ...

08/22/2021
Generalizing RNN-Transducer to Out-Domain Audio via Sparse Self-Attention Layers
Recurrent neural network transducers (RNN-T) are a promising end-to-end ...

02/17/2022
MLP-ASR: Sequence-length Agnostic All-MLP Architectures for Speech Recognition
We propose multi-layer perceptron (MLP)-based architectures suitable for...

01/24/2020
A Branching and Merging Convolutional Network with Homogeneous Filter Capsules
We present a convolutional neural network design with additional branche...

09/01/2022
Attention Enhanced Citrinet for Speech Recognition
Citrinet is an end-to-end convolutional Connectionist Temporal Classific...
