While direction of arrival (DOA) of sound events is generally estimated ...
Many existing works on voice conversion (VC) tasks use automatic speech ...
This paper presents the crossing scheme (X-scheme) for improving the per...
Image-to-image translation and voice conversion enable the generation of...
The primary goal of an emotional voice conversion (EVC) system is to convert...
Recent years have seen progress beyond domain-specific sound separation ...
Many existing works on singing voice conversion (SVC) require clean reco...
Recent progress in deep generative models has improved the quality of ne...
In this paper we propose a novel generative approach, DiffRoll, to tackl...
In this paper, we propose a model to perform style transfer of speech to...
This report presents the Sony-TAu Realistic Spatial Soundscapes 2022 (ST...
Sound event localization and detection (SELD) involves identifying the d...
Recording and annotating real sound events for a sound event localizatio...
This paper deals with the problem of informed source separation (ISS), w...
Audio steganography aims at concealing secret information in carrier aud...
This report describes our systems submitted to the DCASE2021 challenge t...
Automatic transcription of monophonic/polyphonic music is a challenging ...
Conventional singing voice conversion (SVC) methods often suffer from op...
Tasks that involve high-resolution dense prediction require a modeling o...
Neural-network (NN)-based methods show high performance in sound event l...
Despite the excellent performance of neural-network-based audio source s...
Music source separation involves a large input field to model a long-ter...
Our systems submitted to the DCASE2020 task 3: Sound Event Localization ...
Despite recent advances in voice separation methods, many challenges rem...
In this paper we propose a method of single-channel speaker-independent ...
Deep neural networks have become an indispensable technique for audio so...
This paper deals with the problem of audio source separation. To handle ...
We propose a new deep network for audio event recognition, called AENet....
Phonemic or phonetic sub-word units are the most commonly used atomic el...