A Transformer-based Audio Captioning Model with Keyword Estimation

07/01/2020
by Yuma Koizumi, et al.

One of the problems with automated audio captioning (AAC) is the indeterminacy in selecting the words that correspond to an audio event or scene. Because a single acoustic event or scene can be described with several different words, the number of possible captions grows combinatorially, which makes training difficult. To solve this problem, we propose a Transformer-based audio-captioning model with keyword estimation, called TRACKE. TRACKE addresses word-selection indeterminacy within the main task of AAC while simultaneously performing the sub-task of acoustic event detection/acoustic scene classification (i.e., keyword estimation). It estimates keywords, a set of words corresponding to the audio events/scenes in the input audio, and generates the caption while referring to the estimated keywords, which reduces word-selection indeterminacy. Experimental results on a public AAC dataset indicate that TRACKE achieved state-of-the-art performance and successfully estimated both the caption and its keywords.
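The idea of estimating keywords first and then conditioning caption generation on them can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the architecture or the loss, so the keyword vocabulary, the sigmoid-plus-threshold selection rule, the binary cross-entropy keyword loss, the `weight` parameter, and all function names below are assumptions made purely for illustration.

```python
import math

# Hypothetical keyword vocabulary; the real one is not given in the abstract.
KEYWORD_VOCAB = ["bird", "rain", "car", "speech"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def estimate_keywords(logits, vocab, top_k=2, threshold=0.5):
    """Assumed selection rule: treat keyword estimation as multi-label
    classification and keep at most top_k words whose probability is
    at least `threshold`."""
    probs = [sigmoid(z) for z in logits]
    ranked = sorted(zip(vocab, probs), key=lambda wp: wp[1], reverse=True)
    return [word for word, p in ranked[:top_k] if p >= threshold]


def keyword_bce(logits, targets):
    """Binary cross-entropy averaged over the keyword vocabulary
    (an assumed choice of sub-task loss)."""
    eps = 1e-12
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(logits)


def joint_loss(caption_loss, keyword_loss, weight=1.0):
    """Train the main task and the sub-task together with a weighted sum
    (the weighting scheme is an assumption)."""
    return caption_loss + weight * keyword_loss


def build_decoder_context(keywords, audio_frames):
    """Place keyword tokens ahead of the audio frames so that a caption
    decoder could attend to both (one possible way to 'refer to' keywords)."""
    return [("kw", w) for w in keywords] + [("audio", f) for f in audio_frames]


# Toy values standing in for the outputs of an audio encoder.
logits = [2.0, 1.0, -3.0, -0.5]
keywords = estimate_keywords(logits, KEYWORD_VOCAB)
context = build_decoder_context(keywords, audio_frames=[0, 1, 2])
loss = joint_loss(caption_loss=1.5, keyword_loss=keyword_bce(logits, [1, 1, 0, 0]))
```

In this sketch the decoder sees a small set of candidate words before generating, which is one way the space of plausible captions could be narrowed; how TRACKE actually injects keywords is described in the full paper, not here.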

