Effects of Word-frequency based Pre- and Post- Processings for Audio Captioning

09/24/2020
by Daiki Takeuchi, et al.

The system we used for Task 6 (Automated Audio Captioning) of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge combines three elements, namely, data augmentation, multi-task learning, and post-processing, for audio captioning. The system received the highest evaluation scores, but which of the individual elements contributed most to its performance has not yet been clarified. Here, to assess their contributions, we first conducted an element-wise ablation study on our system to estimate to what extent each element is effective. We then conducted a detailed module-wise ablation study to further clarify the key processing modules for improving accuracy. The results show that data augmentation and post-processing significantly improve the score in our system. In particular, mix-up data augmentation and beam search in post-processing improve SPIDEr by 0.8 and 1.6 points, respectively.
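To make the two modules highlighted above more concrete, the following is a minimal sketch of generic mix-up augmentation and generic beam search decoding. It is not the authors' implementation: the function names `mixup` and `beam_search`, the `alpha` and `beam_size` defaults, and the toy next-token model are all assumptions chosen for illustration, and the actual system may apply these ideas differently (for example, to audio features and caption embeddings).

```python
import math
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def mixup(
    x1: Sequence[float],
    x2: Sequence[float],
    y1: Sequence[float],
    y2: Sequence[float],
    alpha: float = 0.2,  # assumed value, not taken from the paper
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], List[float], float]:
    """Blend two (input, target) pairs with a weight drawn from Beta(alpha, alpha)."""
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


def beam_search(
    step_fn: Callable[[List[str]], Dict[str, float]],
    bos: str,
    eos: str,
    beam_size: int = 3,
    max_len: int = 10,
) -> List[str]:
    """Return the highest-scoring token sequence found with a fixed-width beam.

    step_fn is a hypothetical model interface: it maps a token prefix to a
    dictionary of next-token log-probabilities.
    """
    beams: List[Tuple[List[str], float]] = [([bos], 0.0)]
    finished: List[Tuple[List[str], float]] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for tokens, score in beams:
            for tok, logp in step_fn(tokens).items():
                candidates.append((tokens + [tok], score + logp))
        # Keep only the beam_size best expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == eos:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished hypotheses at max_len
    return max(finished, key=lambda c: c[1])[0]


def toy_step(prefix: List[str]) -> Dict[str, float]:
    """Toy next-token model where the greedy first choice is not globally best."""
    table = {
        "<s>": {"a": 0.6, "b": 0.4},
        "a": {"</s>": 0.55, "c": 0.45},
        "b": {"</s>": 0.9, "c": 0.1},
        "c": {"</s>": 1.0},
    }
    return {tok: math.log(p) for tok, p in table[prefix[-1]].items()}


greedy = beam_search(toy_step, "<s>", "</s>", beam_size=1)
wider = beam_search(toy_step, "<s>", "</s>", beam_size=2)
```

In the toy model, a beam of width 1 commits to the locally most probable first token, while a beam of width 2 keeps the alternative alive long enough to find a sequence with higher total probability. This is the general reason beam search can improve caption scores over greedy decoding.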


