SoundDet: Polyphonic Sound Event Detection and Localization from Raw Waveform

06/13/2021
by Yuhang He, et al.
We present SoundDet, an end-to-end trainable and lightweight framework for polyphonic moving sound event detection and localization. Prior methods typically approach this problem by preprocessing the raw waveform into time-frequency representations, which are more amenable to processing with well-established image processing pipelines. Prior methods also detect in a segment-wise manner, leading to incomplete and partial detections. SoundDet takes a novel approach: it directly consumes the raw, multichannel waveform and treats the spatio-temporal sound event as a complete "sound-object" to be detected. Specifically, SoundDet consists of a backbone neural network and two parallel heads for temporal detection and spatial localization, respectively. Given the high sampling rate of the raw waveform, the backbone network first learns a bank of phase-sensitive and frequency-selective filters to explicitly retain direction-of-arrival information, whilst being more computationally and parametrically efficient than standard 1D/2D convolution. A dense sound event proposal map is then constructed to handle the challenge of predicting events with widely varying temporal durations. Accompanying the dense proposal map are a temporal overlapness map and a motion smoothness map, which measure a proposal's confidence of being an event from the perspectives of temporal detection accuracy and movement consistency. Incorporating the two maps ensures that SoundDet is trained in a spatio-temporally unified manner. Experimental results on the public DCASE dataset show the advantage of SoundDet under both the segment-based and our newly proposed event-based evaluation systems.
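To make the dense proposal map and its temporal overlapness score more concrete, here is a minimal sketch. The abstract does not define either map precisely, so this assumes that every (start frame, end frame) pair is a candidate proposal and that overlapness is the temporal intersection-over-union with a ground-truth event. The names `temporal_iou` and `dense_overlap_map` are hypothetical and are not taken from the paper.

```python
def temporal_iou(a, b):
    """Intersection-over-union of two half-open frame intervals (start, end)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def dense_overlap_map(num_frames, event):
    """Score every candidate proposal against one ground-truth event.

    grid[s][e] holds the temporal IoU of the proposal covering frames
    s..e (inclusive) with `event`, given as a half-open (start, end) pair.
    Cells with e < s are not valid proposals and stay at 0.0.
    This is an assumed stand-in for the paper's overlapness map.
    """
    grid = [[0.0] * num_frames for _ in range(num_frames)]
    for s in range(num_frames):
        for e in range(s, num_frames):
            grid[s][e] = temporal_iou((s, e + 1), event)
    return grid


# A 6-frame clip with one event occupying frames 2, 3 and 4.
grid = dense_overlap_map(6, (2, 5))
best = max(
    ((s, e) for s in range(6) for e in range(s, 6)),
    key=lambda p: grid[p[0]][p[1]],
)
print(best, grid[best[0]][best[1]])
```

Because every start and end pair gets its own cell, short and long events are scored on the same grid, which is one plausible reading of how a dense map copes with widely varying event durations.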

research
11/26/2019

A two-step system for sound event localization and detection

Sound event detection and sound event localization requires different fe...
research
06/29/2021

DCASE 2021 Task 3: Spectrotemporally-aligned Features for Polyphonic Sound Event Localization and Detection

Sound event localization and detection consists of two subtasks which ar...
research
05/09/2018

End-to-End Polyphonic Sound Event Detection Using Convolutional Recurrent Neural Networks with Learned Time-Frequency Representation Input

Sound event detection systems typically consist of two stages: extractin...
research
10/01/2021

SALSA: Spatial Cue-Augmented Log-Spectrogram Features for Polyphonic Sound Event Localization and Detection

Sound event localization and detection (SELD) consists of two subtasks, ...
research
04/03/2019

End-to-end Binaural Sound Localisation from the Raw Waveform

A novel end-to-end binaural sound localisation approach is proposed whic...
research
06/21/2022

A Multi-grained based Attention Network for Semi-supervised Sound Event Detection

Sound event detection (SED) is an interesting but challenging task due t...
