Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding

07/06/2022
by Yifan Peng, et al.

Conformer has proven to be effective in many speech processing tasks. It combines the benefits of extracting local dependencies using convolutions and global dependencies using self-attention. Inspired by this, we propose a more flexible, interpretable, and customizable encoder alternative, Branchformer, with parallel branches for modeling dependencies at various ranges in end-to-end speech processing. In each encoder layer, one branch employs self-attention or its variant to capture long-range dependencies, while the other branch utilizes an MLP module with convolutional gating (cgMLP) to extract local relationships. We conduct experiments on several speech recognition and spoken language understanding benchmarks. Results show that our model outperforms both Transformer and cgMLP. It also matches or outperforms state-of-the-art results achieved by Conformer. Furthermore, we show various strategies to reduce computation thanks to the two-branch architecture, including the ability to vary inference complexity within a single trained model. The weights learned for merging branches indicate how local and global dependencies are utilized in different layers, which benefits model design.
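To make the two-branch idea concrete, below is a minimal PyTorch sketch of such a layer: one branch applies self-attention for global context, the other an MLP with convolutional gating for local context, and the two outputs are merged with learned weights. This is an illustration under assumptions, not the authors' implementation; the module names (ConvGatingMLP, BranchformerLayerSketch), hidden sizes, kernel width, and the scalar merge scheme are all illustrative choices.

```python
import torch
import torch.nn as nn

class ConvGatingMLP(nn.Module):
    """MLP branch with convolutional gating (cgMLP-style); sizes are illustrative."""
    def __init__(self, d_model: int, d_hidden: int = 1024, kernel_size: int = 31):
        super().__init__()
        self.proj_in = nn.Linear(d_model, d_hidden)
        self.act = nn.GELU()
        self.gate_norm = nn.LayerNorm(d_hidden // 2)
        # Depthwise convolution derives the gate from local context.
        self.depthwise_conv = nn.Conv1d(
            d_hidden // 2, d_hidden // 2, kernel_size,
            padding=kernel_size // 2, groups=d_hidden // 2)
        self.proj_out = nn.Linear(d_hidden // 2, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model)
        x = self.act(self.proj_in(x))
        value, gate = x.chunk(2, dim=-1)
        gate = self.gate_norm(gate).transpose(1, 2)       # (batch, channels, time) for conv
        gate = self.depthwise_conv(gate).transpose(1, 2)  # back to (batch, time, channels)
        return self.proj_out(value * gate)

class BranchformerLayerSketch(nn.Module):
    """Two parallel branches: self-attention (global) and cgMLP (local),
    merged with learned weights and a residual connection."""
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp_norm = nn.LayerNorm(d_model)
        self.cgmlp = ConvGatingMLP(d_model)
        # Softmax-normalized scalars; after training they hint at how much
        # this layer relies on global vs. local context.
        self.merge_weights = nn.Parameter(torch.zeros(2))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.attn_norm(x)
        global_out, _ = self.attn(y, y, y)         # long-range dependencies
        local_out = self.cgmlp(self.mlp_norm(x))   # local dependencies
        w = torch.softmax(self.merge_weights, dim=0)
        return x + w[0] * global_out + w[1] * local_out

x = torch.randn(2, 100, 256)                       # (batch, frames, features)
print(BranchformerLayerSketch()(x).shape)          # torch.Size([2, 100, 256])
```

Because the branches are independent, one can be skipped or thinned at inference time, which is the intuition behind the variable-complexity strategies mentioned above.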

