Improving the Performance of Automated Audio Captioning via Integrating the Acoustic and Semantic Information

10/12/2021
by Zhongjie Ye, et al.

Automated audio captioning (AAC) has developed rapidly in recent years. It combines acoustic signal processing and natural language processing to generate human-readable sentences for audio clips. Current models are generally based on the neural encoder-decoder architecture, and their decoders mainly use acoustic information extracted by a CNN-based encoder. However, they ignore semantic information that could help an AAC model generate meaningful descriptions. This paper proposes a novel approach to automated audio captioning that incorporates both semantic and acoustic information. Specifically, our audio captioning model consists of two sub-modules. (1) The pre-trained keyword encoder initializes its parameters from a pre-trained ResNet38 and is then trained with extracted keywords as labels. (2) The multi-modal attention decoder is an LSTM-based decoder that contains semantic and acoustic attention modules. Experiments demonstrate that our proposed model achieves state-of-the-art performance on the Clotho dataset. Our code can be found at https://github.com/WangHelin1997/DCASE2021_Task6_PKU
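
To make the idea of a decoder with separate semantic and acoustic attention modules more concrete, here is a minimal sketch of one decoding step. It is not the authors' implementation: the abstract does not specify the attention form, so plain dot-product attention is assumed, and the names `attend` and `fused_context`, the toy vectors, and the choice to concatenate the two contexts are all hypothetical. In a real model the fused context would feed an LSTM step, which is omitted here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, features):
    """Dot-product attention (an assumption, not taken from the paper).

    Returns the weighted sum of `features` and the attention weights.
    """
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [
        sum(w * feat[i] for w, feat in zip(weights, features))
        for i in range(dim)
    ]
    return context, weights


def fused_context(decoder_state, acoustic_feats, keyword_embs):
    """Attend over acoustic frames and keyword embeddings separately,
    then concatenate the two context vectors (hypothetical fusion)."""
    acoustic_ctx, acoustic_w = attend(decoder_state, acoustic_feats)
    semantic_ctx, semantic_w = attend(decoder_state, keyword_embs)
    return acoustic_ctx + semantic_ctx, acoustic_w, semantic_w


# Toy inputs: a 2-d decoder state, three acoustic frames, two keyword embeddings.
state = [1.0, 0.0]
acoustic = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
keywords = [[2.0, 0.0], [0.0, 2.0]]
ctx, a_w, s_w = fused_context(state, acoustic, keywords)
```

Here `ctx` has four entries (two acoustic, two semantic), and each set of weights sums to one, so the decoder can draw on either source of information at each step.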

research
03/06/2022

Leveraging Pre-trained BERT for Audio Captioning

Audio captioning aims at using natural language to describe the content ...
research
04/18/2022

Automated Audio Captioning using Audio Event Clues

Audio captioning is an important research area that aims to generate mea...
research
04/07/2023

Graph Attention for Automated Audio Captioning

State-of-the-art audio captioning methods typically use the encoder-deco...
research
02/23/2021

Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning

Automated audio captioning (AAC) aims at generating summarizing descript...
research
10/14/2021

Evaluating Off-the-Shelf Machine Listening and Natural Language Models for Automated Audio Captioning

Automated audio captioning (AAC) is the task of automatically generating...
research
07/01/2020

A Transformer-based Audio Captioning Model with Keyword Estimation

One of the problems with automated audio captioning (AAC) is the indeter...
research
01/28/2022

Automatic Audio Captioning using Attention weighted Event based Embeddings

Automatic Audio Captioning (AAC) refers to the task of translating audio...
