Graph Attention for Automated Audio Captioning

04/07/2023
by   Feiyang Xiao, et al.
0

State-of-the-art audio captioning methods typically use the encoder-decoder structure with pretrained audio neural networks (PANNs) as encoders for feature extraction. However, the convolution operation used in PANNs is limited in capturing the long-time dependencies within an audio signal, thereby leading to potential performance degradation in audio captioning. This letter presents a novel method using graph attention (GraphAC) for encoder-decoder based audio captioning. In the encoder, a graph attention module is introduced after the PANNs to learn contextual association (i.e. the dependency among the audio features over different time frames) through an adjacency graph, and a top-k mask is used to mitigate the interference from noisy nodes. The learnt contextual association leads to a more effective feature representation with feature node aggregation. As a result, the decoder can predict important semantic information about the acoustic scene and events based on the contextual associations learned from the audio signal. Experimental results show that GraphAC outperforms the state-of-the-art methods with PANNs as the encoders, thanks to the incorporation of the graph attention module into the encoder for capturing the long-time dependencies within the audio signal. The source code is available at https://github.com/LittleFlyingSheep/GraphAC.

READ FULL TEXT

page 1

page 4

research
07/21/2021

Audio Captioning Transformer

Audio captioning aims to automatically generate a natural language descr...
research
01/10/2022

Local Information Assisted Attention-free Decoder for Audio Captioning

Automated audio captioning (AAC) aims to describe audio data with captio...
research
10/12/2021

Improving the Performance of Automated Audio Captioning via Integrating the Acoustic and Semantic Information

Automated audio captioning (AAC) has developed rapidly in recent years, ...
research
05/30/2023

Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning

Automated audio captioning (AAC) which generates textual descriptions of...
research
03/30/2018

Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present

Recently, caption generation with an encoder-decoder framework has been ...
research
04/18/2022

Caption Feature Space Regularization for Audio Captioning

Audio captioning aims at describing the content of audio clips with huma...
research
12/15/2021

Dense Video Captioning Using Unsupervised Semantic Information

We introduce a method to learn unsupervised semantic visual information ...

Please sign up or login with your details

Forgot password? Click here to reset