Improving Polyphonic Sound Event Detection on Multichannel Recordings with the Sørensen-Dice Coefficient Loss and Transfer Learning

by Karn N. Watcharasupat, et al.

The Sørensen–Dice coefficient has recently seen rising popularity as a loss function (also known as Dice loss) due to its robustness in tasks where the number of negative samples significantly exceeds that of positive samples, such as semantic segmentation, natural language processing, and sound event detection. Conventional training of polyphonic sound event detection systems with binary cross-entropy loss often results in suboptimal detection performance because the training is overwhelmed by updates from negative samples. In this paper, we investigated the effect of the Dice loss, intra- and inter-modal transfer learning, data augmentation, and recording formats on the performance of polyphonic sound event detection systems with multichannel inputs. Our analysis showed that polyphonic sound event detection systems trained with Dice loss consistently outperformed those trained with cross-entropy loss across different training settings and recording formats in terms of F1 score and error rate. We achieved further performance gains via the use of transfer learning and an appropriate combination of different data augmentation techniques.
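To make the class-imbalance argument concrete, the following is a minimal sketch of a soft Dice loss next to binary cross-entropy, written with NumPy. It is not taken from the paper: the function names `dice_loss` and `bce_loss`, the smoothing term `eps`, the unsquared denominator, and the toy frame counts are all assumptions chosen for illustration, and the authors' exact formulation may differ.

```python
import numpy as np


def dice_loss(probs, targets, eps=1e-7):
    """Soft Dice loss: 1 - 2*sum(p*y) / (sum(p) + sum(y)).

    Assumed formulation for illustration; `eps` avoids division by zero.
    """
    probs = np.asarray(probs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    intersection = np.sum(probs * targets)
    denominator = np.sum(probs) + np.sum(targets)
    return float(1.0 - (2.0 * intersection + eps) / (denominator + eps))


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over all frames."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    targets = np.asarray(targets, dtype=float)
    per_frame = -(targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs))
    return float(np.mean(per_frame))


# Toy example (hypothetical numbers): 1000 frames, only 10 contain the event.
targets = np.zeros(1000)
targets[:10] = 1.0

# A degenerate model that predicts "inactive" almost everywhere.
always_negative = np.full(1000, 0.01)

bce_value = bce_loss(always_negative, targets)
dice_value = dice_loss(always_negative, targets)
perfect_value = dice_loss(targets, targets)

print(f"BCE  (always negative): {bce_value:.3f}")
print(f"Dice (always negative): {dice_value:.3f}")
print(f"Dice (perfect output):  {perfect_value:.3f}")
```

In this toy setting the degenerate predictor receives a small cross-entropy value, because the many negative frames dominate the mean, while its Dice loss stays close to 1, because the overlap with the few positive frames is almost zero. This is one way to read the abstract's claim that cross-entropy training can be overwhelmed by negative samples.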




Sound Event Detection Using Duration Robust Loss Function

Many methods of sound event detection (SED) based on machine learning re...

Performance Evaluation of Deep Transfer Learning on Multiclass Identification of Common Weed Species in Cotton Production Systems

Precision weed management offers a promising solution for sustainable cr...

Dice Loss for Data-imbalanced NLP Tasks

Many NLP tasks such as tagging and machine reading comprehension are fac...

Sound Event Localization and Detection Using Activity-Coupled Cartesian DOA Vector and RD3net

Our systems submitted to the DCASE2020 task 3: Sound Event Localization ...

Audiovisual transfer learning for audio tagging and sound event detection

We study the merit of transfer learning for two sound recognition proble...

Mixing between the Cross Entropy and the Expectation Loss Terms

The cross entropy loss is widely used due to its effectiveness and solid...

Simpson's Bias in NLP Training

In most machine learning tasks, we evaluate a model M on a given data po...