Local Information Assisted Attention-free Decoder for Audio Captioning

01/10/2022
by Feiyang Xiao, et al.

Automated audio captioning (AAC) aims to describe audio data with natural-language captions. Most existing AAC methods adopt an encoder-decoder structure, in which an attention-based mechanism (e.g., a Transformer decoder) is a popular choice for the decoder that predicts captions from audio features. Such attention-based decoders can capture global information from the audio features; however, their ability to extract local information can be limited, which may degrade the quality of the generated captions. In this paper, we present an AAC method with an attention-free decoder: an encoder based on PANNs extracts the audio features, and the attention-free decoder is designed to introduce local information. The proposed method enables the effective use of both global and local information from audio signals. Experiments show that our method outperforms state-of-the-art methods with the standard attention-based decoder in Task 6 of the DCASE 2021 Challenge.
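The abstract does not say how the decoder introduces local information, so the following is only an illustrative sketch of the global/local distinction, not the authors' method. It assumes frame-level features are plain lists of floats, takes "global" to mean a mean over the whole clip, and takes "local" to mean a mean over short non-overlapping windows. The names `global_summary`, `local_summaries`, and `window` are hypothetical.

```python
from typing import List

Frame = List[float]  # one feature vector per audio frame (assumed representation)


def global_summary(frames: List[Frame]) -> Frame:
    """Average over all frames: a single vector describing the whole clip."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


def local_summaries(frames: List[Frame], window: int) -> List[Frame]:
    """Average over consecutive non-overlapping windows: one vector per segment."""
    return [
        global_summary(frames[i:i + window])
        for i in range(0, len(frames), window)
    ]


# Toy clip: 4 frames, 2 feature dimensions; a short event appears in the last two frames.
frames = [[0.0, 1.0], [0.0, 1.0], [4.0, 1.0], [4.0, 1.0]]

clip_level = global_summary(frames)                # the event is blurred into the average
segment_level = local_summaries(frames, window=2)  # the event stays visible in one segment
```

In this toy case the clip-level vector smooths the short event away, while the segment-level vectors keep it separate. That is the kind of detail a decoder relying only on a global summary could miss.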

research
07/21/2021

Audio Captioning Transformer

Audio captioning aims to automatically generate a natural language descr...
research
05/13/2021

Audio Captioning with Composition of Acoustic and Semantic Information

Generating audio captions is a new research area that combines audio and...
research
05/30/2023

Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning

Automated audio captioning (AAC) which generates textual descriptions of...
research
10/10/2022

Automated Audio Captioning via Fusion of Low- and High- Dimensional Features

Automated audio captioning (AAC) aims to describe the content of an audi...
research
04/07/2023

Graph Attention for Automated Audio Captioning

State-of-the-art audio captioning methods typically use the encoder-deco...
research
09/20/2022

Language-based Audio Retrieval Task in DCASE 2022 Challenge

Language-based audio retrieval is a task, where natural language textual...
research
02/23/2021

Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning

Automated audio captioning (AAC) aims at generating summarizing descript...