CAPTDURE: Captioned Sound Dataset of Single Sources

05/28/2023
by Yuki Okamoto, et al.

Conventional studies on caption-based environmental sound separation and synthesis have trained models on datasets of multiple-source sounds paired with captions. For multiple-source sounds, however, it is difficult to collect detailed captions for each individual sound source, such as the number of sound occurrences and the timbre. As a result, models trained on conventional captioned sound datasets have difficulty extracting only a single-source target sound. In this work, we constructed CAPTDURE, a dataset of single-source sounds with captions, which can be used in various tasks such as environmental sound separation and synthesis. The dataset consists of 1,044 sounds and 4,902 captions. We evaluated the performance of environmental sound extraction using our dataset. The experimental results show that captions for single-source sounds are effective in extracting only the single-source target sound from a mixture.

research
12/01/2021

Environmental Sound Extraction Using Onomatopoeia

Onomatopoeia, which is a character sequence that phonetically imitates a...
research
07/09/2020

RWCP-SSD-Onomatopoeia: Onomatopoeic Word Dataset for Environmental Sound Synthesis

Environmental sound synthesis is a technique for generating a natural en...
research
04/29/2023

Environmental sound conversion from vocal imitations and sound event labels

One way of expressing an environmental sound is using vocal imitations, ...
research
08/27/2019

Overview of Tasks and Investigation of Subjective Evaluation Methods in Environmental Sound Synthesis and Conversion

Synthesizing and converting environmental sounds have the potential for ...
research
03/22/2023

Dual-Quaternions: Theory and Applications in Sound

Sound is a fundamental and rich source of information; playing a key rol...
research
10/17/2022

Visual onoma-to-wave: environmental sound synthesis from visual onomatopoeias and sound-source images

We propose a method for synthesizing environmental sounds from visually ...
research
07/25/2022

ConceptBeam: Concept Driven Target Speech Extraction

We propose a novel framework for target speech extraction based on seman...
