CAT: Causal Audio Transformer for Audio Classification

03/14/2023
by   Xiaoyu Liu, et al.

Attention-based Transformers have been increasingly applied to audio classification because of their global receptive field and their ability to handle long-term dependencies. However, existing frameworks, which are mainly extended from Vision Transformers, are not perfectly compatible with audio signals. In this paper, we introduce the Causal Audio Transformer (CAT), which combines Multi-Resolution Multi-Feature (MRMF) feature extraction with an acoustic attention block to model audio more effectively. In addition, we propose a causal module that alleviates over-fitting, helps with knowledge transfer, and improves interpretability. CAT achieves classification performance higher than or comparable to the state of the art on the ESC50, AudioSet, and UrbanSound8K datasets, and it can be easily generalized to other Transformer-based models.

research
02/02/2022

HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection

Audio classification is an important task of mapping audio samples into ...
research
11/23/2022

ASiT: Audio Spectrogram vIsion Transformer for General Audio Representation

Vision transformers, which were originally developed for natural languag...
research
03/23/2023

LMCodec: A Low Bitrate Speech Codec With Causal Transformer Models

We introduce LMCodec, a causal neural speech codec that provides high qu...
research
12/20/2022

Visual Transformers for Primates Classification and Covid Detection

We apply the vision transformer, a deep machine learning model built aro...
research
10/13/2021

Study of positional encoding approaches for Audio Spectrogram Transformers

Transformers have revolutionized the world of deep learning, specially i...
research
09/21/2021

Audiomer: A Convolutional Transformer for Keyword Spotting

Transformers have seen an unprecedented rise in Natural Language Process...
research
06/01/2023

Masked Autoencoders with Multi-Window Attention Are Better Audio Learners

Several recent works have adapted Masked Autoencoders (MAEs) for learnin...
