Audio Difference Learning for Audio Captioning

09/15/2023
by Tatsuya Komatsu, et al.

This study introduces audio difference learning, a novel training paradigm for improving audio captioning. The core idea is to build a feature representation space that preserves the relationships between audio clips, so that captions describing fine-grained audio content can be generated. The method takes a reference audio alongside the input audio and transforms both into feature representations with a shared encoder. Captions are then generated from the differential features to describe the difference between the two. In addition, a technique is proposed in which the input audio is mixed with additional audio and the additional audio is used as the reference. The difference between the mixed audio and the reference audio then reverts to the original input audio, so the original input's caption can serve as the caption for the difference, eliminating the need for additional difference annotations. In experiments using the Clotho and ESC50 datasets, the proposed method improved the SPIDEr score by 7% compared to conventional methods.
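The mixing technique described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: `toy_encoder` is a hypothetical linear stand-in for the shared encoder, the "difference" is taken as a plain feature subtraction, and the waveforms are random noise. With a linear encoder the differential feature matches the input's own feature exactly; a learned, nonlinear encoder would only approximate this, which is presumably what training on such pairs encourages.

```python
import numpy as np


def toy_encoder(wave, weights):
    """Hypothetical shared encoder: split the waveform into frames, apply a linear map."""
    frames = wave.reshape(-1, weights.shape[0])  # (n_frames, frame_len)
    return frames @ weights                      # (n_frames, feat_dim)


def differential_features(input_wave, extra_wave, weights):
    """Mix the input with extra audio, use the extra audio as the reference,
    and return the feature difference (assumed here to be a subtraction)."""
    mixed_wave = input_wave + extra_wave
    mixed_feat = toy_encoder(mixed_wave, weights)
    ref_feat = toy_encoder(extra_wave, weights)
    return mixed_feat - ref_feat


rng = np.random.default_rng(0)
frame_len, feat_dim, n_frames = 160, 8, 10
weights = rng.standard_normal((frame_len, feat_dim))

# Random noise stands in for a captioned clip and an additional clip.
input_wave = rng.standard_normal(frame_len * n_frames)
extra_wave = rng.standard_normal(frame_len * n_frames)

diff_feat = differential_features(input_wave, extra_wave, weights)
input_feat = toy_encoder(input_wave, weights)

# The differential feature can be paired with the input's existing caption.
print(np.allclose(diff_feat, input_feat))
```

In this toy setting the training target for `diff_feat` would simply be the caption already attached to `input_wave`, so no new annotation is required.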


research
09/18/2023

RECAP: Retrieval-Augmented Audio Captioning

We present RECAP (REtrieval-Augmented Audio CAPtioning), a novel and eff...
research
08/23/2023

Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement

We proposed Audio Difference Captioning (ADC) as a new extension task of...
research
09/06/2023

Detecting False Alarms and Misses in Audio Captions

Metrics to evaluate audio captions simply provide a score without much e...
research
03/15/2023

Blind Estimation of Audio Processing Graph

Musicians and audio engineers sculpt and transform their sounds by conne...
research
11/14/2022

Is my automatic audio captioning system so bad? spider-max: a metric to consider several caption candidates

Automatic Audio Captioning (AAC) is the task that aims to describe an au...
research
04/18/2022

Caption Feature Space Regularization for Audio Captioning

Audio captioning aims at describing the content of audio clips with huma...
research
02/25/2019

Audio Caption: Listen and Tell

Increasing amount of research has shed light on machine perception of au...
