Audio Captioning using Gated Recurrent Units

06/05/2020
by   Ayşegül Özkaya Eren, et al.

Audio captioning is a recently proposed task that aims to automatically generate a textual description of a given audio clip. In this study, a novel deep network architecture with audio embeddings is presented to predict audio captions. To extract audio features in addition to log Mel energies, the VGGish audio embedding model is used to explore the usability of audio embeddings in the audio captioning task. The proposed architecture encodes the audio and text input modalities separately and combines them before the decoding stage. Audio encoding is performed by a bidirectional Gated Recurrent Unit (BiGRU), while a GRU is used for text encoding. We evaluate the model on the recently published audio captioning dataset Clotho and compare the experimental results with the literature. The results show that the proposed BiGRU-based deep model outperforms state-of-the-art results.
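The encode-separately-then-combine idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the dimensions, the random weights, the use of final hidden states as summaries, and concatenation as the fusion step are all assumptions made for illustration, and the names `GRUCell`, `encode_audio_bigru`, `encode_text_gru`, and `fuse` are hypothetical. Only the Python standard library is used, and the decoder is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Plain GRU cell with update gate z and reset gate r (biases omitted)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One input matrix W and one recurrent matrix U per gate/candidate.
        self.W = {k: rand_matrix(hidden_size, input_size, rng) for k in "zrh"}
        self.U = {k: rand_matrix(hidden_size, hidden_size, rng) for k in "zrh"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in
                zip(matvec(self.W["h"], x), matvec(self.U["h"], reset_h))]
        # Convention used here: new state = (1 - z) * old state + z * candidate.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def run(self, sequence):
        """Process a whole sequence and return the final hidden state."""
        h = [0.0] * self.hidden_size
        for x in sequence:
            h = self.step(x, h)
        return h


def encode_audio_bigru(frames, fwd_cell, bwd_cell):
    """Bidirectional encoding: forward pass and backward pass, concatenated."""
    return fwd_cell.run(frames) + bwd_cell.run(list(reversed(frames)))


def encode_text_gru(token_embeddings, cell):
    """Unidirectional encoding of the partial caption."""
    return cell.run(token_embeddings)


def fuse(audio_vec, text_vec):
    """Assumed fusion step: simple concatenation ahead of a decoder."""
    return audio_vec + text_vec


# Assumed toy sizes; real features (log Mel energies, VGGish embeddings) are larger.
AUDIO_DIM, TEXT_DIM, HIDDEN = 8, 6, 4
rng = random.Random(0)
fwd, bwd = GRUCell(AUDIO_DIM, HIDDEN, rng), GRUCell(AUDIO_DIM, HIDDEN, rng)
txt = GRUCell(TEXT_DIM, HIDDEN, rng)

frames = [[rng.uniform(-1, 1) for _ in range(AUDIO_DIM)] for _ in range(10)]
words = [[rng.uniform(-1, 1) for _ in range(TEXT_DIM)] for _ in range(3)]

audio_vec = encode_audio_bigru(frames, fwd, bwd)
text_vec = encode_text_gru(words, txt)
joint = fuse(audio_vec, text_vec)
print(len(audio_vec), len(text_vec), len(joint))  # prints "8 4 12"
```

The audio summary is twice the hidden size because the forward and backward states are concatenated; a decoder would then map the joint vector to a distribution over the next caption word.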

research
05/13/2021

Audio Captioning with Composition of Acoustic and Semantic Information

Generating audio captions is a new research area that combines audio and...
research
10/04/2021

Audio Captioning Using Sound Event Detection

This technical report proposes an audio captioning system for DCASE 2021...
research
06/27/2020

Listen carefully and tell: an audio captioning system based on residual learning and gammatone audio representation

Automated audio captioning is a machine listening task whose goal is to de...
research
01/28/2022

Automatic Audio Captioning using Attention weighted Event based Embeddings

Automatic Audio Captioning (AAC) refers to the task of translating audio...
research
10/21/2020

WaveTransformer: A Novel Architecture for Audio Captioning Based on Learning Temporal and Time-Frequency Information

Automated audio captioning (AAC) is a novel task, where a method takes a...
research
09/18/2023

Synth-AC: Enhancing Audio Captioning with Synthetic Supervision

Data-driven approaches hold promise for audio captioning. However, the d...
research
06/30/2017

Automated Audio Captioning with Recurrent Neural Networks

We present the first approach to automated audio captioning. We employ a...
