Transferable Models for Bioacoustics with Human Language Supervision

08/09/2023
by David Robinson, et al.

Passive acoustic monitoring offers a scalable, non-invasive method for tracking global biodiversity and anthropogenic impacts on species. Although deep learning has become a vital tool for processing this data, current models are inflexible, typically cover only a handful of species, and are limited by data scarcity. In this work, we propose BioLingual, a new model for bioacoustics based on contrastive language-audio pretraining. We first aggregate bioacoustic archives into a language-audio dataset, called AnimalSpeak, with over a million audio-caption pairs holding information on species, vocalization context, and animal behavior. After training on this dataset to connect language and audio representations, our model can identify over a thousand species' calls across taxa, complete bioacoustic tasks zero-shot, and retrieve animal vocalization recordings from natural text queries. When fine-tuned, BioLingual sets a new state-of-the-art on nine tasks in the Benchmark of Animal Sounds. Given its broad taxa coverage and ability to be flexibly queried in human language, we believe this model opens new paradigms in ecological monitoring and research, including free-text search on the world's acoustic monitoring archives. We open-source our models, dataset, and code.
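The zero-shot step described above can be illustrated with a minimal sketch. It assumes an audio encoder and a text encoder have already mapped a recording and several candidate captions into a shared embedding space, as contrastive language-audio pretraining is designed to do. This is not the BioLingual implementation: the three-dimensional embeddings, the caption wording, the function names, and the temperature value of 0.07 are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(audio_emb, caption_embs, temperature=0.07):
    """Score one audio embedding against candidate caption embeddings.

    Returns the best-matching caption and a softmax distribution over
    all captions. The temperature is an assumed value, not one taken
    from the paper.
    """
    captions = list(caption_embs)
    logits = [cosine(audio_emb, caption_embs[c]) / temperature for c in captions]
    # Subtract the maximum logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = {c: e / total for c, e in zip(captions, exps)}
    best = max(probs, key=probs.get)
    return best, probs


# Made-up embeddings standing in for real text-encoder outputs.
caption_embs = {
    "the sound of a humpback whale": [0.9, 0.1, 0.0],
    "the sound of a tree frog": [0.1, 0.9, 0.1],
    "the sound of a barn owl": [0.0, 0.2, 0.9],
}
# Made-up embedding standing in for a real audio-encoder output.
audio_emb = [0.2, 0.8, 0.1]

best, probs = zero_shot_classify(audio_emb, caption_embs)
print(best)
```

No classifier is trained for the candidate labels in this sketch: a new species is added by writing one more caption. Text-to-audio retrieval reverses the roles, ranking many audio embeddings against a single query caption.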


