DeepAI AI Chat
Log In Sign Up

Effects of Word-frequency based Pre- and Post- Processings for Audio Captioning

09/24/2020
by   Daiki Takeuchi, et al.
0

The system we used for Task 6 (Automated Audio Captioning)of the Detection and Classification of Acoustic Scenes and Events(DCASE) 2020 Challenge combines three elements, namely, dataaugmentation, multi-task learning, and post-processing, for audiocaptioning. The system received the highest evaluation scores, butwhich of the individual elements most fully contributed to its perfor-mance has not yet been clarified. Here, to asses their contributions,we first conducted an element-wise ablation study on our systemto estimate to what extent each element is effective. We then con-ducted a detailed module-wise ablation study to further clarify thekey processing modules for improving accuracy. The results showthat data augmentation and post-processing significantly improvethe score in our system. In particular, mix-up data augmentationand beam search in post-processing improve SPIDEr by 0.8 and 1.6points, respectively.

READ FULL TEXT

page 3

page 4

page 5

06/17/2019

Evaluation of post-processing algorithms for polyphonic sound event detection

Sound event detection (SED) aims at identifying audio events (audio tagg...
07/01/2020

The NTT DCASE2020 Challenge Task 6 system: Automated Audio Captioning with Keywords and Sentence Length Estimation

This technical report describes the system participating to the Detectio...
08/19/2022

Improving Post-Processing of Audio Event Detectors Using Reinforcement Learning

We apply post-processing to the class probability distribution outputs o...
05/11/2022

A Comprehensive Survey of Automated Audio Captioning

Automated audio captioning, a task that mimics human perception as well ...
04/08/2021

Lone Pine at SemEval-2021 Task 5: Fine-Grained Detection of Hate Speech Using BERToxic

This paper describes our approach to the Toxic Spans Detection problem (...
11/04/2018

Char2char Generation with Reranking for the E2E NLG Challenge

This paper describes our submission to the E2E NLG Challenge. Recently, ...
11/14/2020

Towards transformation-resilient provenance detection of digital media

Advancements in deep generative models have made it possible to synthesi...