Detecting human and non-human vocal productions in large scale audio recordings

02/14/2023
by Guillem Bonafos, et al.

We propose an automatic data processing pipeline to extract vocal productions from large-scale natural audio recordings. Through a series of computational steps (windowing, creation of a noise class, data augmentation, re-sampling, transfer learning, Bayesian optimisation), it automatically trains a neural network to detect various types of natural vocal productions in a noisy data stream without requiring a large sample of labeled data. We test it on two different data sets, one from a group of Guinea baboons recorded at a primate research center and one from human babies recorded at home. The pipeline trains a model on 72 and 77 minutes of labeled audio recordings, with an accuracy of 94.58% [...] continuous recordings, and it creates two new databases of 38.8 and 35.2 hours, respectively. We discuss the strengths and limitations of this approach, which can be applied to any massive audio recording.
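The first two steps named in the abstract, windowing and the creation of a noise class, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 1 s window, the 0.5 s hop, the `"noise"` label, and the rule that a window takes the label of the first annotation it overlaps are all assumptions chosen for illustration.

```python
import numpy as np


def frame_signal(signal, sample_rate, window_s=1.0, hop_s=0.5):
    """Split a 1-D signal into fixed-length, overlapping windows.

    window_s and hop_s are illustrative values, not the paper's settings.
    Returns the stacked windows and their start times in seconds.
    """
    win = int(round(window_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    if len(signal) < win:
        return np.empty((0, win), dtype=signal.dtype), np.empty(0)
    n = 1 + (len(signal) - win) // hop
    frames = np.stack([signal[i * hop: i * hop + win] for i in range(n)])
    starts = np.arange(n) * hop / sample_rate
    return frames, starts


def label_windows(starts, window_s, annotations, noise_label="noise"):
    """Label each window with the first annotation it overlaps.

    annotations is a list of (start_s, end_s, label) tuples. Windows that
    overlap no annotation fall into the (assumed) noise class.
    """
    labels = []
    for s in starts:
        e = s + window_s
        label = noise_label
        for a_start, a_end, a_label in annotations:
            if s < a_end and a_start < e:  # the two intervals overlap
                label = a_label
                break
        labels.append(label)
    return labels


# Hypothetical example: 4 s of synthetic audio at 100 Hz, one annotated call.
rng = np.random.default_rng(0)
audio = rng.standard_normal(400)
frames, starts = frame_signal(audio, sample_rate=100)
labels = label_windows(starts, 1.0, [(1.2, 1.6, "bark")])
```

Treating every unannotated window as noise gives the classifier an explicit negative class, which is what lets a model trained on short labeled excerpts be run over long continuous recordings.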

research
05/21/2023

Towards Robust Family-Infant Audio Analysis Based on Unsupervised Pretraining of Wav2vec 2.0 on Large-Scale Unlabeled Family Audio

To perform automatic family audio analysis, past studies have collected ...
research
01/05/2023

Automatic Sound Event Detection and Classification of Great Ape Calls Using Neural Networks

We present a novel approach to automatically detect and classify great a...
research
02/03/2021

Building population models for large-scale neural recordings: opportunities and pitfalls

Modern extracellular recording technologies now enable simultaneous reco...
research
08/24/2022

Improving Natural-Language-based Audio Retrieval with Transfer Learning and Audio Text Augmentations

The absence of large labeled datasets remains a significant challenge in...
research
11/12/2017

Deep Networks tag the location of bird vocalisations on audio spectrograms

This work focuses on reliable detection and segmentation of bird vocaliz...
research
09/30/2019

DiPCo – Dinner Party Corpus

We present a speech data corpus that simulates a "dinner party" scenario...
research
01/06/2022

Implementing simple spectral denoising for environmental audio recordings

This technical report details changes applied to a noise filter to facil...
