InterFormer: Interactive Local and Global Features Fusion for Automatic Speech Recognition

05/24/2023
by Zhi-Hao Lai, et al.

Both local and global features are essential for automatic speech recognition (ASR). Several recent methods have shown that simply combining local and global features can further improve ASR performance. However, these methods pay little attention to the interaction between local and global features, and their serial architectures are too rigid to reflect the relationships between them. To address these issues, this paper proposes InterFormer, which fuses local and global features interactively to learn a better representation for ASR. Specifically, we combine a convolution block with a transformer block in a parallel design. In addition, we propose a bidirectional feature interaction module (BFIM) and a selective fusion module (SFM) to implement the interaction and the fusion of local and global features, respectively. Extensive experiments on public ASR datasets demonstrate the effectiveness of InterFormer and its superior performance over other Transformer and Conformer models.
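The abstract names the components of the block but not their internals, so the following NumPy sketch shows only one plausible way to wire a parallel local/global block with an interaction step and a fusion step. It is not the authors' implementation. A depthwise 1-D convolution and single-head self-attention stand in for the convolution and transformer blocks, while `interact` (mutual sigmoid gating) and `selective_fuse` (a per-channel softmax over the two branches, in the style of selective-kernel networks) are hypothetical stand-ins for BFIM and SFM. All function names, parameter names, and shapes are illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def local_branch(x, kernel):
    """Depthwise 1-D convolution over time (local features).
    x: (T, D); kernel: (K, D) with odd K; 'same' zero padding keeps length T."""
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([(xp[t:t + k] * kernel).sum(axis=0) for t in range(x.shape[0])])


def global_branch(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over all frames (global features)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (T, T) frame-to-frame scores
    return softmax(scores, axis=-1) @ v


def interact(local, glob, w_l2g, w_g2l):
    """Hypothetical stand-in for BFIM: each branch is gated by the other one."""
    new_local = local * sigmoid(glob @ w_g2l)  # global context modulates local features
    new_glob = glob * sigmoid(local @ w_l2g)   # local detail modulates global features
    return new_local, new_glob


def selective_fuse(local, glob, w_sel):
    """Hypothetical stand-in for SFM: per-channel softmax weights over the two branches."""
    pooled = (local + glob).mean(axis=0)             # (D,) utterance-level summary
    logits = (pooled @ w_sel).reshape(2, -1)         # (2, D), one row per branch
    weights = softmax(logits, axis=0)                # weights sum to 1 per channel
    return weights[0] * local + weights[1] * glob


def parallel_block(x, params):
    """Parallel local/global branches, then interaction, fusion, and a residual add."""
    local = local_branch(x, params["kernel"])
    glob = global_branch(x, params["wq"], params["wk"], params["wv"])
    local, glob = interact(local, glob, params["w_l2g"], params["w_g2l"])
    return x + selective_fuse(local, glob, params["w_sel"])


# Usage with random weights: 50 frames of 16-dimensional features.
rng = np.random.default_rng(0)
T, D = 50, 16
params = {
    "kernel": rng.standard_normal((3, D)) * 0.1,
    "wq": rng.standard_normal((D, D)) * 0.1,
    "wk": rng.standard_normal((D, D)) * 0.1,
    "wv": rng.standard_normal((D, D)) * 0.1,
    "w_l2g": rng.standard_normal((D, D)) * 0.1,
    "w_g2l": rng.standard_normal((D, D)) * 0.1,
    "w_sel": rng.standard_normal((D, 2 * D)) * 0.1,
}
x = rng.standard_normal((T, D))
y = parallel_block(x, params)
print(y.shape)  # (50, 16)
```

In this sketch both branches read the same input side by side, so the interaction step, rather than layer order, determines how local and global cues affect each other; a serial stack such as the Conformer block instead applies self-attention and convolution one after the other.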



research
03/23/2023

Beyond Universal Transformer: block reusing with adaptor in Transformer for automatic speech recognition

Transformer-based models have recently made significant achievements in ...
research
07/12/2021

GiT: Graph Interactive Transformer for Vehicle Re-identification

Transformers are more and more popular in computer vision, which treat a...
research
05/22/2023

An Enhanced Res2Net with Local and Global Feature Fusion for Speaker Verification

Effective fusion of multi-scale features is crucial for improving speake...
research
09/14/2023

A Novel Local-Global Feature Fusion Framework for Body-weight Exercise Recognition with Pressure Mapping Sensors

We present a novel local-global feature fusion framework for body-weight...
research
10/31/2022

FusionFormer: Fusing Operations in Transformer for Efficient Streaming Speech Recognition

The recently proposed Conformer architecture which combines convolution ...
research
08/03/2023

Local-Global Temporal Fusion Network with an Attention Mechanism for Multiple and Multiclass Arrhythmia Classification

Clinical decision support systems (CDSSs) have been widely utilized to s...
research
01/04/2022

DigNet: Digging Clues from Local-Global Interactive Graph for Aspect-level Sentiment Classification

In aspect-level sentiment classification (ASC), state-of-the-art models ...
