SepTr: Separable Transformer for Audio Spectrogram Processing

03/17/2022
by   Nicolae-Catalin Ristea, et al.

Following the successful application of vision transformers to multiple computer vision tasks, these models have drawn the attention of the signal processing community. This is because signals are often represented as spectrograms (e.g., via the Discrete Fourier Transform), which can be fed directly to vision transformers. However, naively applying transformers to spectrograms is suboptimal. Since the two axes represent distinct dimensions, namely frequency and time, we argue that a better approach is to separate the attention dedicated to each axis. To this end, we propose the Separable Transformer (SepTr), an architecture that employs two transformer blocks in a sequential manner, the first attending to tokens within the same frequency bin, and the second attending to tokens within the same time interval. We conduct experiments on three benchmark data sets, showing that our separable architecture outperforms conventional vision transformers and other state-of-the-art methods. Unlike standard transformers, SepTr linearly scales the number of trainable parameters with the input size, thus having a lower memory footprint. Our code is available as open source at https://github.com/ristea/septr.
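The sequential two-block scheme described above can be sketched in PyTorch: tokens form an (frequency, time) grid, the first block runs self-attention along the time axis within each frequency bin, and the second runs it along the frequency axis within each time frame. The class name, layer choices, and hyper-parameters below are illustrative assumptions, not the authors' exact configuration; see the linked repository for the reference implementation.

```python
import torch
import torch.nn as nn


class SeparableTransformerSketch(nn.Module):
    """Illustrative sketch of separable attention over spectrogram tokens.

    Block 1 attends among tokens sharing a frequency bin (along time);
    block 2 attends among tokens sharing a time frame (along frequency).
    Hyper-parameters here are assumptions for demonstration only.
    """

    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        def make_block() -> nn.TransformerEncoderLayer:
            return nn.TransformerEncoderLayer(
                d_model=dim, nhead=heads,
                dim_feedforward=2 * dim, batch_first=True)
        self.time_block = make_block()  # within one frequency bin
        self.freq_block = make_block()  # within one time frame

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, freq_bins, time_frames, dim)
        b, f, t, d = x.shape
        # 1) attend along time: one sequence of length t per frequency bin
        x = self.time_block(x.reshape(b * f, t, d)).reshape(b, f, t, d)
        # 2) attend along frequency: one sequence of length f per time frame
        x = x.transpose(1, 2).reshape(b * t, f, d)
        x = self.freq_block(x).reshape(b, t, f, d).transpose(1, 2)
        return x  # same shape as the input


model = SeparableTransformerSketch()
spec_tokens = torch.randn(2, 8, 10, 64)  # (batch, freq, time, dim)
out = model(spec_tokens)
print(out.shape)  # torch.Size([2, 8, 10, 64])
```

Note the efficiency argument this factorization rests on: full attention over all F·T spectrogram tokens costs O((F·T)^2), whereas the separable variant runs attention over sequences of length T and then length F, so each block only ever compares tokens along a single axis.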

