Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning

02/23/2021
by   Xuenan Xu, et al.
0

Automated audio captioning (AAC) aims at generating summarizing descriptions for audio clips. Multitudinous concepts are described in an audio caption, ranging from local information such as sound events to global information like acoustic scenery. Currently, the mainstream paradigm for AAC is the end-to-end encoder-decoder architecture, expecting the encoder to learn all levels of concepts embedded in the audio automatically. This paper first proposes a topic model for audio descriptions, comprehensively analyzing the hierarchical audio topics that are commonly covered. We then explore a transfer learning scheme to access local and global information. Two source tasks are identified to respectively represent local and global information, being Audio Tagging (AT) and Acoustic Scene Classification (ASC). Experiments are conducted on the AAC benchmark dataset Clotho and Audiocaps, amounting to a vast increase in all eight metrics with topic transfer learning. Further, it is discovered that local information and abstract representation learning are more crucial to AAC than global information and temporal relationship learning.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/18/2022

Automated Audio Captioning using Audio Event Clues

Audio captioning is an important research area that aims to generate mea...
research
01/10/2022

Local Information Assisted Attention-free Decoder for Audio Captioning

Automated audio captioning (AAC) aims to describe audio data with captio...
research
10/12/2021

Improving the Performance of Automated Audio Captioning via Integrating the Acoustic and Semantic Information

Automated audio captioning (AAC) has developed rapidly in recent years, ...
research
05/11/2022

A Comprehensive Survey of Automated Audio Captioning

Automated audio captioning, a task that mimics human perception as well ...
research
08/05/2021

An Encoder-Decoder Based Audio Captioning System With Transfer and Reinforcement Learning

Automated audio captioning aims to use natural language to describe the ...
research
01/28/2022

Automatic Audio Captioning using Attention weighted Event based Embeddings

Automatic Audio Captioning (AAC) refers to the task of translating audio...
research
02/25/2019

Audio Caption: Listen and Tell

Increasing amount of research has shed light on machine perception of au...

Please sign up or login with your details

Forgot password? Click here to reset