DeepAI AI Chat
Log In Sign Up

MLP-ASR: Sequence-length agnostic all-MLP architectures for speech recognition

by   Jin Sakuma, et al.

We propose multi-layer perceptron (MLP)-based architectures suitable for variable length input. MLP-based architectures, recently proposed for image classification, can only be used for inputs of a fixed, pre-defined size. However, many types of data are naturally variable in length, for example, acoustic signals. We propose three approaches to extend MLP-based architectures for use with sequences of arbitrary length. The first one uses a circular convolution applied in the Fourier domain, the second applies a depthwise convolution, and the final relies on a shift operation. We evaluate the proposed architectures on an automatic speech recognition task with the Librispeech and Tedlium2 corpora. The best proposed MLP-based architectures improves WER by 1.0 / 0.9 test-clean/test-other set, and 0.8 / 1.1 the size of self-attention-based architecture.


page 1

page 2

page 3

page 4


Efficient conformer-based speech recognition with linear attention

Recently, conformer-based end-to-end automatic speech recognition, which...

Efficient conformer: Progressive downsampling and grouped attention for automatic speech recognition

The recently proposed Conformer architecture has shown state-of-the-art ...

Conformer-based Hybrid ASR System for Switchboard Dataset

The recently proposed conformer architecture has been successfully used ...

E-Branchformer: Branchformer with Enhanced merging for speech recognition

Conformer, combining convolution and self-attention sequentially to capt...

Gaussian Kernelized Self-Attention for Long Sequence Data and Its Application to CTC-based Speech Recognition

Self-attention (SA) based models have recently achieved significant perf...

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

The recently proposed Conformer model has become the de facto backbone m...