An Encoder-Decoder Based Audio Captioning System With Transfer and Reinforcement Learning

08/05/2021
by Xinhao Mei, et al.

Automated audio captioning aims to describe the content of audio data in natural language. This paper presents an audio captioning system with an encoder-decoder architecture, in which the decoder predicts words from audio features extracted by the encoder. To mitigate the problems caused by data scarcity, transfer learning is introduced, using either an upstream audio-related task or a large in-domain dataset. In addition, the evaluation metrics are incorporated into the optimization of the model through reinforcement learning, which helps address both the “exposure bias” caused by the “teacher forcing” training strategy and the mismatch between the evaluation metrics and the loss function. The resulting system was ranked 3rd in DCASE 2021 Task 6. Ablation studies are carried out to investigate how much each element of the proposed system contributes to the final performance. The results show that the proposed techniques significantly improve the evaluation metric scores; however, reinforcement learning may adversely affect the quality of the generated captions.
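The abstract does not name the reinforcement learning algorithm used to optimize the evaluation metrics. A common choice in captioning is self-critical sequence training: a REINFORCE update in which the reward of a sampled caption is compared against the reward of the greedily decoded caption, used as a baseline. The sketch below assumes that approach. `unigram_f1` is a hypothetical stand-in for real captioning metrics such as CIDEr-D or SPIDEr, and both function names are illustrative, not the authors' code.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def unigram_f1(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """Hypothetical stand-in metric: unigram F1 between two token lists.

    Real systems would use CIDEr-D, SPIDEr, etc.; this toy score only shows
    that the reward is a non-differentiable, sequence-level quantity.
    """
    if not candidate or not reference:
        return 0.0
    # Counter & Counter keeps the minimum count of each shared token.
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def self_critical_loss(
    sampled_log_probs: Sequence[float],
    sampled: Sequence[str],
    greedy: Sequence[str],
    reference: Sequence[str],
) -> Tuple[float, float]:
    """REINFORCE loss using the greedy caption's score as the baseline.

    sampled_log_probs: log-probabilities the decoder gave each sampled token.
    Returns (loss, advantage). Minimizing the loss raises the probability of
    sampled captions that score better than the greedy caption and lowers it
    for captions that score worse.
    """
    advantage = unigram_f1(sampled, reference) - unigram_f1(greedy, reference)
    loss = -advantage * sum(sampled_log_probs)
    return loss, advantage


if __name__ == "__main__":
    reference = "a dog barks while cars pass by".split()
    greedy = "a dog barks".split()
    sampled = "a dog barks while cars pass".split()
    # Assume the decoder gave each sampled token probability 0.5.
    log_probs = [math.log(0.5)] * len(sampled)
    loss, advantage = self_critical_loss(log_probs, sampled, greedy, reference)
    print(f"advantage={advantage:.4f} loss={loss:.4f}")
```

In a full system, `sampled_log_probs` would be tensors produced by the decoder while sampling, and the loss would be backpropagated through them. Here they are plain floats, so the example only shows how the metric reward and the greedy baseline combine into a training signal that, unlike teacher forcing, is computed on the model's own outputs.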

research
05/13/2021

Audio Captioning with Composition of Acoustic and Semantic Information

Generating audio captions is a new research area that combines audio and...
research
04/06/2023

Efficient Audio Captioning Transformer with Patchout and Text Guidance

Automated audio captioning is a multi-modal translation task that aims to g...
research
07/21/2021

CL4AC: A Contrastive Loss for Audio Captioning

Automated Audio captioning (AAC) is a cross-modal translation task that ...
research
09/01/2023

CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding

Automated Audio Captioning (AAC) involves generating natural language de...
research
06/30/2017

Automated Audio Captioning with Recurrent Neural Networks

We present the first approach to automated audio captioning. We employ a...
research
02/23/2021

Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning

Automated audio captioning (AAC) aims at generating summarizing descript...
research
05/12/2022

Automated Audio Captioning: an Overview of Recent Progress and New Challenges

Automated audio captioning is a cross-modal translation task that aims t...