Efficient Training of Audio Transformers with Patchout

10/11/2021
by   Khaled Koutini, et al.
0

The great success of transformer-based models in natural language processing (NLP) has led to various attempts at adapting these architectures to other domains such as vision and audio. Recent work has shown that transformers can outperform Convolutional Neural Networks (CNNs) on vision and audio tasks. However, one of the main shortcomings of transformer models, compared to the well-established CNNs, is the computational complexity. Compute and memory complexity grow quadratically with the input length. Therefore, there has been extensive work on optimizing transformers, but often at the cost of lower predictive performance. In this work, we propose a novel method to optimize and regularize transformers on audio spectrograms. The proposed models achieve a new state-of-the-art performance on Audioset and can be trained on a single consumer-grade GPU. Furthermore, we propose a transformer model that outperforms CNNs in terms of both performance and training speed.

READ FULL TEXT
research
11/09/2022

Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation

Audio Spectrogram Transformer models rule the field of Audio Tagging, ou...
research
07/13/2021

CMT: Convolutional Neural Networks Meet Vision Transformers

Vision transformers have been successfully applied to image recognition ...
research
11/25/2022

Learning General Audio Representations with Large-Scale Training of Patchout Audio Transformers

The success of supervised deep learning methods is largely due to their ...
research
10/13/2021

Study of positional encoding approaches for Audio Spectrogram Transformers

Transformers have revolutionized the world of deep learning, specially i...
research
09/21/2021

Audiomer: A Convolutional Transformer for Keyword Spotting

Transformers have seen an unprecedented rise in Natural Language Process...
research
05/29/2023

Streaming Audio Transformers for Online Audio Tagging

Transformers have emerged as a prominent model framework for audio taggi...
research
06/01/2023

Adapting a ConvNeXt model to audio classification on AudioSet

In computer vision, convolutional neural networks (CNN) such as ConvNeXt...

Please sign up or login with your details

Forgot password? Click here to reset