Improving Post-Processing of Audio Event Detectors Using Reinforcement Learning

08/19/2022
by   Petros Giannakopoulos, et al.

We apply post-processing to the class probability distribution outputs of audio event classification models and use reinforcement learning to jointly discover the optimal parameters for the stages of a post-processing stack, such as the classification thresholds and the kernel sizes of the median filters used to smooth model predictions. To achieve this, we define a reinforcement learning environment in which: 1) a state is the class probability distribution provided by the model for a given audio sample, 2) an action is the choice of a candidate optimal value for each parameter of the post-processing stack, and 3) the reward is based on the classification accuracy metric we aim to optimize, which in our case is the audio event-based macro F1-score. We apply our post-processing to the outputs of two audio event classification models submitted to the DCASE Task4 2020 challenge. We find that using reinforcement learning to discover the optimal per-class parameters of the post-processing stack improves the audio event-based macro F1-score (the main metric used in the DCASE challenge to compare audio event classification accuracy) by 4-5% compared to using the same post-processing stack with manually tuned parameters.
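
The state, action, and reward described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: every function name, the order of the stack (threshold first, then median filter), the frame-level tolerance, and the simplified event-matching F1 are assumptions made for illustration, and the reinforcement learning agent that proposes actions is left out entirely. The sketch only shows how one candidate action (a threshold and kernel size per class) is scored against reference events.

```python
from statistics import median


def median_filter(seq, kernel_size):
    """Sliding-window median; borders are padded by repeating the edge value.
    kernel_size is assumed to be odd."""
    if kernel_size <= 1:
        return list(seq)
    half = kernel_size // 2
    padded = [seq[0]] * half + list(seq) + [seq[-1]] * half
    return [median(padded[i:i + kernel_size]) for i in range(len(seq))]


def post_process(probs, threshold, kernel_size):
    """Hypothetical two-stage stack: binarize frame probabilities, then smooth."""
    binary = [1 if p >= threshold else 0 for p in probs]
    return median_filter(binary, kernel_size)


def extract_events(frames):
    """Convert a 0/1 frame sequence into (onset, offset) pairs, offset exclusive."""
    events, start = [], None
    for i, active in enumerate(frames):
        if active and start is None:
            start = i
        elif not active and start is not None:
            events.append((start, i))
            start = None
    if start is not None:
        events.append((start, len(frames)))
    return events


def event_f1(pred, ref, tol=1):
    """Simplified stand-in for an event-based F1: a predicted event counts as a
    hit if an unused reference event has onset and offset within tol frames."""
    unmatched = list(ref)
    tp = 0
    for onset, offset in pred:
        for r in unmatched:
            if abs(onset - r[0]) <= tol and abs(offset - r[1]) <= tol:
                tp += 1
                unmatched.remove(r)
                break
    fp, fn = len(pred) - tp, len(ref) - tp
    denom = 2 * tp + fp + fn
    # Assumption: no predicted and no reference events counts as a perfect score.
    return 2 * tp / denom if denom else 1.0


def reward(state, action, reference):
    """Macro-averaged F1 over classes for one clip.
    state: {class: frame probabilities}; action: {class: (threshold, kernel_size)};
    reference: {class: [(onset, offset), ...]}."""
    scores = []
    for cls, probs in state.items():
        threshold, kernel_size = action[cls]
        pred = extract_events(post_process(probs, threshold, kernel_size))
        scores.append(event_f1(pred, reference[cls]))
    return sum(scores) / len(scores)


# Toy data, invented for this example.
state = {
    "speech": [0.1, 0.8, 0.9, 0.2, 0.9, 0.8, 0.1, 0.1],
    "dog": [0.1, 0.1, 0.6, 0.1, 0.1, 0.1, 0.1, 0.1],
}
reference = {"speech": [(1, 6)], "dog": []}

no_smoothing = {"speech": (0.5, 1), "dog": (0.5, 1)}
with_smoothing = {"speech": (0.5, 3), "dog": (0.5, 3)}

print(reward(state, no_smoothing, reference))    # 0.0
print(reward(state, with_smoothing, reference))  # 1.0
```

In this toy clip, a kernel size of 3 bridges the one-frame gap in the speech event and suppresses the isolated one-frame dog detection, so the second action earns the higher reward. An agent in such an environment would be searching for per-class actions of this kind.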

research
06/17/2019

Evaluation of post-processing algorithms for polyphonic sound event detection

Sound event detection (SED) aims at identifying audio events (audio tagg...
research
09/24/2020

Effects of Word-frequency based Pre- and Post- Processings for Audio Captioning

The system we used for Task 6 (Automated Audio Captioning) of the Detecti...
research
09/28/2021

When in Doubt: Improving Classification Performance with Alternating Normalization

We introduce Classification with Alternating Normalization (CAN), a non-...
research
04/08/2019

Duration robust sound event detection

Task 4 of the Dcase2018 challenge demonstrated that substantially more r...
research
01/23/2023

Optimising complexity of CNN models for resource constrained devices: QRS detection case study

Traditional DL models are complex and resource hungry and thus, care nee...
research
01/22/2021

Effects of Pre- and Post-Processing on type-based Embeddings in Lexical Semantic Change Detection

Lexical semantic change detection is a new and innovative research field...
research
11/14/2020

Towards transformation-resilient provenance detection of digital media

Advancements in deep generative models have made it possible to synthesi...
