AVES: Animal Vocalization Encoder based on Self-Supervision

10/26/2022
by Masato Hagiwara, et al.

The lack of annotated training data in bioacoustics hinders the use of large-scale neural network models trained in a supervised way. To leverage large amounts of unannotated audio data, we propose AVES (Animal Vocalization Encoder based on Self-Supervision), a self-supervised, transformer-based audio representation model for encoding animal vocalizations. We pretrain AVES on a diverse set of unannotated audio datasets and fine-tune it for downstream bioacoustics tasks. Comprehensive experiments on a suite of classification and detection tasks show that AVES outperforms all of the strong baselines, and even the supervised "topline" models trained on annotated audio classification datasets. The results also suggest that curating a small training subset related to the downstream tasks is an efficient way to train high-quality audio representation models. We open-source our models at <https://github.com/earthspecies/aves>.
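To make the "pretrained encoder plus downstream task head" setup more concrete, here is a minimal sketch of how a classification head could sit on top of frame-level representations. It is not the AVES code or API: `fake_encoder` is a hypothetical stand-in (a fixed random projection) for a pretrained encoder, and the 16 kHz sample rate, 320-sample frame size, 8-dimensional embeddings, mean pooling over time, and 3 output classes are all assumptions chosen only for illustration.

```python
import math
import random


def fake_encoder(waveform, frame_size=320, dim=8, seed=0):
    """Hypothetical stand-in for a pretrained encoder (NOT the AVES model).

    Splits the waveform into non-overlapping frames and maps each frame
    to a dim-sized vector with a fixed random projection.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(frame_size)] for _ in range(dim)]
    frames = [waveform[i:i + frame_size]
              for i in range(0, len(waveform) - frame_size + 1, frame_size)]
    return [[sum(w * x for w, x in zip(row, frame)) for row in weights]
            for frame in frames]


def mean_pool(frame_embeddings):
    """Average frame-level embeddings over time into one clip-level vector."""
    n = len(frame_embeddings)
    dim = len(frame_embeddings[0])
    return [sum(frame[d] for frame in frame_embeddings) / n for d in range(dim)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(clip_vector, head_weights, head_bias):
    """Linear classification head followed by softmax."""
    logits = [sum(w * x for w, x in zip(row, clip_vector)) + b
              for row, b in zip(head_weights, head_bias)]
    return softmax(logits)


# One second of synthetic audio (assumed 16 kHz sample rate).
waveform = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]

frames = fake_encoder(waveform)    # frame-level representations
clip_vector = mean_pool(frames)    # clip-level representation

# Untrained head with small random weights; fine-tuning would learn these
# (and optionally update the encoder) from labeled downstream data.
rng = random.Random(1)
num_classes = 3
head_weights = [[rng.uniform(-0.1, 0.1) for _ in clip_vector]
                for _ in range(num_classes)]
head_bias = [0.0] * num_classes

probs = classify(clip_vector, head_weights, head_bias)
```

In a real setup the stand-in encoder would be replaced by the released pretrained weights, and the head (and optionally the encoder) would be trained on the annotated downstream dataset.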


