Effects of Word-frequency based Pre- and Post- Processings for Audio Captioning

by Daiki Takeuchi, et al.

The system we used for Task 6 (Automated Audio Captioning) of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge combines three elements for audio captioning: data augmentation, multi-task learning, and post-processing. The system received the highest evaluation scores, but it has not yet been clarified which of the individual elements contributed most to its performance. Here, to assess their contributions, we first conducted an element-wise ablation study on our system to estimate to what extent each element is effective. We then conducted a detailed module-wise ablation study to further clarify the key processing modules for improving accuracy. The results show that data augmentation and post-processing significantly improve the score in our system. In particular, mix-up data augmentation and beam search in post-processing improve SPIDEr by 0.8 and 1.6 points, respectively.
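For readers unfamiliar with mix-up, the following is a minimal sketch of the generic technique: two training examples and their targets are blended with a weight drawn from a Beta distribution. It is not taken from the authors' system. The function name `mixup`, the `alpha` default, and the representation of targets as one-hot vectors are all assumptions made for illustration, and the abstract does not say how mix-up was applied to caption targets.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two (features, target) pairs with a Beta(alpha, alpha) weight.

    Hypothetical helper for illustration only; features and targets are
    plain lists of floats of matching length.
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    # Convex combination of the two inputs and of the two targets.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Usage: mix two toy examples with one-hot targets (assumed format).
rng = random.Random(0)
x, y, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], rng=rng)
```

With a small `alpha`, the sampled weight tends to lie near 0 or 1, so most mixed examples stay close to one of the two originals.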



