Visual Transformers for Primates Classification and Covid Detection

12/20/2022
by Steffen Illium, et al.

We apply the vision transformer, a deep machine learning model built around the attention mechanism, to mel-spectrogram representations of raw audio recordings. When adding mel-based data augmentation techniques and sample weighting, we achieve comparable performance on both ComParE21 tasks (PRS and CCS), outperforming most single-model baselines. We further introduce overlapping vertical patching and evaluate the influence of parameter configurations.

Index Terms: audio classification, attention, mel-spectrogram, unbalanced datasets, computational paralinguistics
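The overlapping vertical patching mentioned above can be illustrated with a short sketch. The idea, as described in the abstract, is to cut the mel-spectrogram into patches that span the full frequency axis, with a stride smaller than the patch width so neighbouring patches overlap. The function below is a hypothetical illustration (patch width, stride, and the flattening step are our assumptions, not the paper's exact configuration):

```python
import numpy as np

def vertical_patches(spec, width=4, stride=2):
    """Split a mel-spectrogram of shape (n_mels, n_frames) into
    vertical patches spanning the full frequency axis.
    A stride smaller than the width yields overlapping patches."""
    n_mels, n_frames = spec.shape
    starts = range(0, n_frames - width + 1, stride)
    patches = np.stack([spec[:, t:t + width] for t in starts])
    # flatten each patch into a token vector for a linear projection
    return patches.reshape(len(patches), -1)

# toy spectrogram: 8 mel bins x 10 time frames
spec = np.arange(80, dtype=float).reshape(8, 10)
tokens = vertical_patches(spec, width=4, stride=2)
print(tokens.shape)  # (4, 32): 4 overlapping patches of 8x4 values each
```

Each flattened patch would then be linearly projected and fed to the transformer as one token, analogous to the square image patches of the standard ViT.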


Related research

08/12/2018 · Sample Mixed-Based Data Augmentation for Domestic Audio Tagging
Audio tagging has attracted increasing attention since last decade and h...

03/24/2022 · Transformers Meet Visual Learning Understanding: A Comprehensive Review
Dynamic attention mechanism and global modeling ability make Transformer...

03/14/2023 · CAT: Causal Audio Transformer for Audio Classification
The attention-based Transformers have been increasingly applied to audio...

11/23/2022 · Data Augmentation Vision Transformer for Fine-grained Image Classification
Recently, the vision transformer (ViT) has made breakthroughs in image r...

02/02/2022 · HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection
Audio classification is an important task of mapping audio samples into ...

07/19/2022 · COVID-19 Detection from Respiratory Sounds with Hierarchical Spectrogram Transformers
Monitoring of prevalent airborne diseases such as COVID-19 characteristi...

09/14/2023 · Efficient Face Detection with Audio-Based Region Proposals
Robot vision often involves a large computational load due to large imag...
