Contrastive Environmental Sound Representation Learning

07/18/2022
by   Peter Ochieng, et al.

Machine hearing of environmental sound is an important problem in the audio recognition domain. It gives a machine the ability to discriminate between different input sounds, which in turn guides its decision making. In this work we use a self-supervised contrastive technique and a shallow 1D CNN to extract distinctive audio features (audio representations) without using any explicit annotations. We generate representations of a given audio clip from both its raw waveform and its spectrogram, and evaluate whether the proposed learner is agnostic to the type of audio input. We further use canonical correlation analysis (CCA) to fuse the representations from the two input types and demonstrate that the fused global feature yields a more robust representation of the audio signal than either individual representation. The proposed technique is evaluated on both ESC-50 and UrbanSound8K. The results show that the proposed technique is able to extract most features of the environmental audio and gives an improvement of 12.8
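
To make the contrastive idea concrete, here is a minimal sketch of one common self-supervised objective, the normalized temperature-scaled cross-entropy (NT-Xent) loss. The abstract does not say which contrastive loss the authors use, so the choice of NT-Xent, the function name `nt_xent_loss`, and the `temperature` default are all assumptions made for illustration. The inputs stand for embeddings of two views of the same batch of audio clips, for example the outputs of an encoder applied to two augmentations of each clip.

```python
import numpy as np


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent loss for paired embeddings (illustrative, not the paper's code).

    z_a[i] and z_b[i] are embeddings of two views of the same clip and form
    a positive pair; every other embedding in the batch acts as a negative.
    """
    n = z_a.shape[0]

    # Stack both views and L2-normalize so dot products are cosine similarities.
    z = np.concatenate([z_a, z_b], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)

    # Pairwise similarities scaled by the temperature (assumed hyperparameter).
    sim = (z @ z.T) / temperature

    # An embedding must not be compared with itself.
    np.fill_diagonal(sim, -np.inf)

    # Row i (first view) has its positive at row i + n, and vice versa.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Average negative log-probability assigned to the positive pair.
    return -log_prob[np.arange(2 * n), positives].mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clips = rng.normal(size=(8, 16))  # stand-in for encoder outputs
    noisy_views = clips + 0.01 * rng.normal(size=clips.shape)
    unrelated = rng.normal(size=clips.shape)
    print("matched views:  ", nt_xent_loss(clips, noisy_views))
    print("unrelated views:", nt_xent_loss(clips, unrelated))
```

The loss is lower when the two views of each clip map to nearby embeddings and higher when the paired embeddings are unrelated, which is the signal that lets an encoder learn without labels.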

research
04/07/2023

Anomalous Sound Detection using Audio Representation with Machine ID based Contrastive Learning Pretraining

Existing contrastive learning methods for anomalous sound detection refi...
research
04/07/2021

Contrastive Learning of Global and Local Audio-Visual Representations

Contrastive learning has delivered impressive results in many audio-visu...
research
08/14/2023

Active Bird2Vec: Towards End-to-End Bird Sound Monitoring with Transformers

We propose a shift towards end-to-end learning in bird sound monitoring ...
research
03/20/2022

A Study on Robustness to Perturbations for Representations of Environmental Sound

Audio applications involving environmental sound analysis increasingly u...
research
08/14/2020

Self-supervised Contrastive Video-Speech Representation Learning for Ultrasound

In medical imaging, manual annotations can be expensive to acquire and s...
research
07/17/2021

Learning De-identified Representations of Prosody from Raw Audio

We propose a method for learning de-identified prosody representations f...
research
10/19/2022

Audio Tampering Detection Based on Shallow and Deep Feature Representation Learning

Digital audio tampering detection can be used to verify the authenticity...
