You Only Hear Once: A YOLO-like Algorithm for Audio Segmentation and Sound Event Detection

09/01/2021
by Satvik Venkatesh, et al.

Audio segmentation and sound event detection are crucial topics in machine listening that aim to detect acoustic classes and their respective boundaries. They are useful for audio-content analysis, speech recognition, audio indexing, and music information retrieval. In recent years, most research articles have adopted segmentation-by-classification, a technique that divides audio into small frames and classifies each frame individually. In this paper, we present a novel approach called You Only Hear Once (YOHO), which is inspired by the YOLO algorithm popularly adopted in computer vision. Instead of frame-based classification, we convert the detection of acoustic boundaries into a regression problem. This is done by having separate output neurons to detect the presence of an audio class and predict its start and end points. YOHO obtained a higher F-measure and lower error rate than the state-of-the-art Convolutional Recurrent Neural Network on multiple datasets. As YOHO is purely a convolutional neural network and has no recurrent layers, it is faster during inference. In addition, as this approach is more end-to-end and predicts acoustic boundaries directly, it is significantly quicker during post-processing and smoothing.
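
To make the regression formulation concrete, the following is a minimal sketch of how an output of this kind could be decoded into timed segments. It is not the authors' implementation. The output layout (one `(presence, start, end)` triple per time bin and class, with start and end expressed as fractions of the bin), the function names `decode_segments` and `merge_segments`, and the parameters `bin_duration`, `threshold`, and `max_gap` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, float, float]  # (class index, start in seconds, end in seconds)


def decode_segments(
    output: Sequence[Sequence[Tuple[float, float, float]]],
    bin_duration: float = 1.0,  # assumed length of one output time bin, in seconds
    threshold: float = 0.5,     # assumed presence threshold
) -> List[Segment]:
    """Turn per-bin (presence, start, end) triples into absolute-time segments.

    Assumed layout: output[t][c] = (presence, start, end), where start and end
    are fractions of bin t in the range [0, 1].
    """
    segments: List[Segment] = []
    for t, bin_preds in enumerate(output):
        bin_start = t * bin_duration
        for c, (presence, start, end) in enumerate(bin_preds):
            # Skip classes judged absent and degenerate (empty) intervals.
            if presence < threshold or end <= start:
                continue
            segments.append(
                (c, bin_start + start * bin_duration, bin_start + end * bin_duration)
            )
    return segments


def merge_segments(segments: Sequence[Segment], max_gap: float = 1e-6) -> List[Segment]:
    """Join same-class segments that touch or are separated by at most max_gap seconds."""
    merged: List[Segment] = []
    for c, start, end in sorted(segments):
        if merged and merged[-1][0] == c and start - merged[-1][2] <= max_gap:
            prev_c, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_c, prev_start, max(prev_end, end))
        else:
            merged.append((c, start, end))
    return merged


# Two time bins, two classes; values are made up for illustration.
example_output = [
    [(0.9, 0.5, 1.0), (0.1, 0.0, 0.0)],
    [(0.8, 0.0, 0.5), (0.7, 0.25, 0.75)],
]
raw_segments = decode_segments(example_output, bin_duration=1.0)
final_segments = merge_segments(raw_segments)
```

Because each bin yields boundaries directly, the only post-processing in this sketch is a single pass that joins adjacent same-class segments, which illustrates why a regression output can need less smoothing than per-frame labels.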


research
11/01/2021

Evaluating robustness of You Only Hear Once(YOHO) Algorithm on noisy audios in the VOICe Dataset

Sound event detection (SED) in machine listening entails identifying the...
research
06/17/2019

Evaluation of post-processing algorithms for polyphonic sound event detection

Sound event detection (SED) aims at identifying audio events (audio tagg...
research
08/20/2018

R-CRNN: Region-based Convolutional Recurrent Neural Network for Audio Event Detection

This paper proposes a Region-based Convolutional Recurrent Neural Networ...
research
02/19/2021

Artificially Synthesising Data for Audio Classification and Segmentation to Improve Speech and Music Detection in Radio Broadcast

Segmenting audio into homogeneous sections such as music and speech help...
research
03/30/2018

Conditional End-to-End Audio Transforms

We present an end-to-end method for transforming audio from one style to...
research
05/25/2023

SoundSieve: Seconds-Long Audio Event Recognition on Intermittently-Powered Systems

A fundamental problem of every intermittently-powered sensing system is ...
research
02/13/2019

Recurrent Neural Networks with Stochastic Layers for Acoustic Novelty Detection

In this paper, we adapt Recurrent Neural Networks with Stochastic Layers...
