Effects of Word-frequency based Pre- and Post- Processings for Audio Captioning

by   Daiki Takeuchi, et al.

The system we used for Task 6 (Automated Audio Captioning)of the Detection and Classification of Acoustic Scenes and Events(DCASE) 2020 Challenge combines three elements, namely, dataaugmentation, multi-task learning, and post-processing, for audiocaptioning. The system received the highest evaluation scores, butwhich of the individual elements most fully contributed to its perfor-mance has not yet been clarified. Here, to asses their contributions,we first conducted an element-wise ablation study on our systemto estimate to what extent each element is effective. We then con-ducted a detailed module-wise ablation study to further clarify thekey processing modules for improving accuracy. The results showthat data augmentation and post-processing significantly improvethe score in our system. In particular, mix-up data augmentationand beam search in post-processing improve SPIDEr by 0.8 and 1.6points, respectively.



There are no comments yet.


page 3

page 4

page 5


Evaluation of post-processing algorithms for polyphonic sound event detection

Sound event detection (SED) aims at identifying audio events (audio tagg...

The NTT DCASE2020 Challenge Task 6 system: Automated Audio Captioning with Keywords and Sentence Length Estimation

This technical report describes the system participating to the Detectio...

Effect of Post-processing on Contextualized Word Representations

Post-processing of static embedding has beenshown to improve their perfo...

Lone Pine at SemEval-2021 Task 5: Fine-Grained Detection of Hate Speech Using BERToxic

This paper describes our approach to the Toxic Spans Detection problem (...

Effects of Pre- and Post-Processing on type-based Embeddings in Lexical Semantic Change Detection

Lexical semantic change detection is a new and innovative research field...

A Simple Post-Processing Technique for Improving Readability Assessment of Texts using Word Mover's Distance

Assessing the proper difficulty levels of reading materials or texts in ...

Towards transformation-resilient provenance detection of digital media

Advancements in deep generative models have made it possible to synthesi...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.