Temporal Sub-sampling of Audio Feature Sequences for Automated Audio Captioning

07/06/2020
by   Khoa Nguyen, et al.
0

Audio captioning is the task of automatically creating a textual description for the contents of a general audio signal. Typical audio captioning methods rely on deep neural networks (DNNs), where the target of the DNN is to map the input audio sequence to an output sequence of words, i.e. the caption. Though, the length of the textual description is considerably less than the length of the audio signal, for example 10 words versus some thousands of audio feature vectors. This clearly indicates that an output word corresponds to multiple input feature vectors. In this work we present an approach that focuses on explicitly taking advantage of this difference of lengths between sequences, by applying a temporal sub-sampling to the audio input sequence. We employ a sequence-to-sequence method, which uses a fixed-length vector as an output from the encoder, and we apply temporal sub-sampling between the RNNs of the encoder. We evaluate the benefit of our approach by employing the freely available dataset Clotho and we evaluate the impact of different factors of temporal sub-sampling. Our results show an improvement to all considered metrics.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/21/2020

WaveTransformer: A Novel Architecture for Audio Captioning Based on Learning Temporal and Time-Frequency Information

Automated audio captioning (AAC) is a novel task, where a method takes a...
research
10/21/2019

Clotho: An Audio Captioning Dataset

Audio captioning is the novel task of general audio content description ...
research
06/27/2020

Listen carefully and tell: an audio captioning system based on residual learning and gammatone audio representation

Automated audio captioning is machine listening task whose goal is to de...
research
06/30/2017

Automated Audio Captioning with Recurrent Neural Networks

We present the first approach to automated audio captioning. We employ a...
research
07/01/2020

The NTT DCASE2020 Challenge Task 6 system: Automated Audio Captioning with Keywords and Sentence Length Estimation

This technical report describes the system participating to the Detectio...
research
07/16/2021

Continual Learning for Automated Audio Captioning Using The Learning Without Forgetting Approach

Automated audio captioning (AAC) is the task of automatically creating t...
research
11/20/2022

An Algorithm for Routing Vectors in Sequences

We propose a routing algorithm that takes a sequence of vectors and comp...

Please sign up or login with your details

Forgot password? Click here to reset