Audio Captioning using Pre-Trained Large-Scale Language Model Guided by Audio-based Similar Caption Retrieval

12/14/2020
by Yuma Koizumi, et al.

The goal of audio captioning is to translate input audio into a natural-language description. One problem in audio captioning is the lack of training data, because audio-caption pairs are difficult to collect by crawling the web. To overcome this problem, we propose using a pre-trained large-scale language model. Since audio cannot be fed directly into such a language model, we use guidance captions retrieved from a training dataset based on similarities that may exist between different audio clips. The caption for the input audio is then generated by the pre-trained language model while referring to the guidance captions. Experimental results show that (i) the proposed method succeeded in using a pre-trained language model for audio captioning, and (ii) the oracle performance of the pre-trained model-based caption generator was clearly better than that of the conventional method trained from scratch.
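The retrieval step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each audio clip has already been mapped to a fixed-length embedding vector, and it uses cosine similarity as a stand-in for whatever audio similarity the paper actually uses. The function names, the toy embeddings, and the `top_k` parameter are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_guidance_captions(query_embedding, training_pairs, top_k=2):
    """Return the captions of the top_k training clips most similar to the query.

    training_pairs is a list of (embedding, caption) tuples; the embeddings are
    assumed to come from some audio encoder that is not shown here.
    """
    scored = [
        (cosine_similarity(query_embedding, embedding), caption)
        for embedding, caption in training_pairs
    ]
    # Highest similarity first.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [caption for _, caption in scored[:top_k]]


# Toy training set: hand-made 3-dimensional "embeddings" for illustration only.
training_pairs = [
    ([1.0, 0.0, 0.0], "a dog barks in the distance"),
    ([0.9, 0.1, 0.0], "a dog is barking loudly"),
    ([0.0, 1.0, 0.0], "rain falls on a roof"),
    ([0.0, 0.0, 1.0], "a car engine starts"),
]

query_embedding = [1.0, 0.0, 0.1]
guidance = retrieve_guidance_captions(query_embedding, training_pairs, top_k=2)
print(guidance)
```

In a full system, the retrieved guidance captions would then be passed to the pre-trained language model as context for generating the final caption. That generation step is omitted here because the abstract does not specify how the model is conditioned.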


research
03/06/2022

Leveraging Pre-trained BERT for Audio Captioning

Audio captioning aims at using natural language to describe the content ...
research
03/30/2023

Prefix tuning for automated audio captioning

Audio captioning aims to generate text descriptions from environmental s...
research
08/24/2022

Interpreting Song Lyrics with an Audio-Informed Pre-trained Language Model

Lyric interpretations can help people understand songs and their lyrics ...
research
09/07/2023

Zero-Shot Audio Captioning via Audibility Guidance

The task of audio captioning is similar in essence to tasks such as imag...
research
08/12/2022

An investigation on selecting audio pre-trained models for audio captioning

Audio captioning is a task that generates description of audio based on ...
research
10/14/2021

Evaluating Off-the-Shelf Machine Listening and Natural Language Models for Automated Audio Captioning

Automated audio captioning (AAC) is the task of automatically generating...
research
11/06/2022

I Hear Your True Colors: Image Guided Audio Generation

We propose Im2Wav, an image guided open-domain audio generation system. ...
