What does a Car-ssette tape tell?

05/31/2019
by Xuenan Xu, et al.

Captioning has attracted much attention in image and video understanding, while little work has examined audio captioning. This paper contributes a manually annotated dataset of car scenes, extending a previously published hospital audio captioning dataset. We propose an encoder-decoder model with pretrained word embeddings and an additional sentence loss. The model accelerates training and generates semantically correct, unique sentences that do not appear in the training data. We test the model on the new car dataset, the previous Hospital Dataset, and the Joint Dataset, and the results indicate that it generalizes across different scenes. Further, we propose a better objective evaluation metric, the BERT similarity score. It measures semantic-level similarity and compensates for a drawback of N-gram-based metrics such as BLEU, namely that they assign high scores to sentences that merely share words. The new metric shows a higher correlation with human evaluation. However, although detailed audio captions can now be generated automatically, human annotations still outperform model captions in many respects.
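The N-gram drawback mentioned above can be illustrated with clipped n-gram precision, the core quantity that BLEU combines across n-gram orders. The sketch below is a simplified illustration only: it uses whitespace tokenization, omits BLEU's brevity penalty and geometric averaging, and the caption sentences are hypothetical examples invented here, not drawn from the car or hospital datasets. It does not implement the BERT similarity score.

```python
from collections import Counter


def ngram_precision(candidate: str, reference: str, n: int) -> float:
    """Clipped n-gram precision of a candidate caption against one reference.

    Simplified BLEU ingredient: whitespace tokenization, single reference,
    no brevity penalty, no averaging over n-gram orders.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(cand_ngrams.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    overlap = sum(min(count, ref_ngrams[g]) for g, count in cand_ngrams.items())
    return overlap / total


# Hypothetical captions, for illustration only.
reference = "the car accelerates and the engine roars"
opposite_meaning = "the car decelerates and the engine stops"  # shares most words
paraphrase = "a vehicle speeds up with a loud motor"            # similar meaning, no shared words

print(f"{ngram_precision(opposite_meaning, reference, 1):.3f}")  # → 0.714
print(f"{ngram_precision(opposite_meaning, reference, 2):.3f}")  # → 0.500
print(f"{ngram_precision(paraphrase, reference, 1):.3f}")        # → 0.000
```

The caption with the opposite meaning scores well because it reuses the reference's words, while the paraphrase scores zero. A semantic, embedding-based comparison such as the BERT similarity score is intended to avoid this failure mode.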
