Listen carefully and tell: an audio captioning system based on residual learning and gammatone audio representation

06/27/2020
by Sergi Perez-Castanos, et al.

Automated audio captioning is a machine listening task whose goal is to describe an audio signal using free text. An automated audio captioning system accepts an audio signal as input and outputs a textual description, that is, the caption of the signal. This task can be useful in many applications, such as automatic content description or machine-to-machine interaction. In this work, an automated audio captioning system based on residual learning in the encoder phase is proposed. The encoder phase is implemented via different Residual Network configurations. The decoder phase, which creates the caption, uses recurrent layers plus an attention mechanism. The audio representation chosen is Gammatone. Results show that the proposed framework surpasses the baseline system in the challenge results.
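
To make the Gammatone representation more concrete, the sketch below computes centre frequencies for a gammatone filterbank spaced evenly on the ERB-rate scale, using the Glasberg and Moore approximation. It is only an illustration under stated assumptions: the abstract does not give the filterbank settings, so the number of bands (64), the frequency range (50 Hz to 22050 Hz), and all function names are hypothetical and are not taken from the paper.

```python
import math


def hz_to_erb_rate(f_hz):
    """Map frequency in Hz to the ERB-rate scale (Glasberg and Moore approximation)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) / 0.00437


def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth in Hz of the auditory filter centred at f_hz."""
    return 24.7 * (0.00437 * f_hz + 1.0)


def gammatone_center_freqs(n_bands, f_min, f_max):
    """Return n_bands centre frequencies equally spaced on the ERB-rate scale."""
    if n_bands < 2:
        raise ValueError("n_bands must be at least 2")
    lo = hz_to_erb_rate(f_min)
    hi = hz_to_erb_rate(f_max)
    step = (hi - lo) / (n_bands - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(n_bands)]


# Assumed settings, chosen only for illustration (not stated in the abstract).
center_freqs = gammatone_center_freqs(n_bands=64, f_min=50.0, f_max=22050.0)
bandwidths = [erb_bandwidth(f) for f in center_freqs]
```

Each band of a Gammatone spectrogram would then be the frame-wise energy at the output of a gammatone filter with one of these centre frequencies and bandwidths. Bands are narrow and closely spaced at low frequencies and wider at high frequencies, which is the main difference from a linearly spaced spectrogram.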


research
07/08/2022

Automated Audio Captioning and Language-Based Audio Retrieval

This project involved participation in the DCASE 2022 Competition (Task ...
research
10/21/2019

Clotho: An Audio Captioning Dataset

Audio captioning is the novel task of general audio content description ...
research
10/21/2020

WaveTransformer: A Novel Architecture for Audio Captioning Based on Learning Temporal and Time-Frequency Information

Automated audio captioning (AAC) is a novel task, where a method takes a...
research
05/30/2023

Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning

Automated audio captioning (AAC) which generates textual descriptions of...
research
06/05/2020

Audio Captioning using Gated Recurrent Units

Audio captioning is a recently proposed task for automatically generatin...
research
10/28/2022

Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention

Audio captioning is the task of generating captions that describe the co...
research
07/06/2020

Temporal Sub-sampling of Audio Feature Sequences for Automated Audio Captioning

Audio captioning is the task of automatically creating a textual descrip...
