Naoya Takahashi

research

∙ 06/15/2023

STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events

While direction of arrival (DOA) of sound events is generally estimated ...

5 Kazuki Shimada, et al. ∙

research

∙ 05/24/2023

Iteratively Improving Speech Recognition and Voice Conversion

Many existing works on voice conversion (VC) tasks use automatic speech ...

0 Mayank Kumar Singh, et al. ∙

research

∙ 05/13/2023

The Whole Is Greater than the Sum of Its Parts: Improving DNN-based Music Source Separation

This paper presents the crossing scheme (X-scheme) for improving the per...

0 Ryosuke Sawata, et al. ∙

research

∙ 02/27/2023

Cross-modal Face- and Voice-style Transfer

Image-to-image translation and voice conversion enable the generation of...

2 Naoya Takahashi, et al. ∙

research

∙ 02/21/2023

Nonparallel Emotional Voice Conversion For Unseen Speaker-Emotion Pairs Using Dual Domain Adversarial Network & Virtual Domain Pairing

Primary goal of an emotional voice conversion (EVC) system is to convert...

6 Nirmesh Shah, et al. ∙

research

∙ 12/14/2022

CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos

Recent years have seen progress beyond domain-specific sound separation ...

6 Hao-Wen Dong, et al. ∙

research

∙ 10/20/2022

Robust One-Shot Singing Voice Conversion

Many existing works on singing voice conversion (SVC) require clean reco...

0 Naoya Takahashi, et al. ∙

research

∙ 10/14/2022

Hierarchical Diffusion Models for Singing Voice Neural Vocoder

Recent progress in deep generative models has improved the quality of ne...

5 Naoya Takahashi, et al. ∙

research

∙ 10/11/2022

DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability

In this paper we propose a novel generative approach, DiffRoll, to tackl...

16 Kin Wai Cheuk, et al. ∙

research

∙ 08/26/2022

Leveraging Symmetrical Convolutional Transformer Networks for Speech to Singing Voice Style Transfer

In this paper, we propose a model to perform style transfer of speech to...

6 Shrutina Agarwal, et al. ∙

research

∙ 06/04/2022

STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events

This report presents the Sony-TAu Realistic Spatial Soundscapes 2022 (ST...

0 Archontis Politis, et al. ∙

research

∙ 10/14/2021

Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training

Sound event localization and detection (SELD) involves identifying the d...

0 Kazuki Shimada, et al. ∙

research

∙ 10/13/2021

Spatial Data Augmentation with Simulated Room Impulse Responses for Sound Event Localization and Detection

Recording and annotating real sound events for a sound event localizatio...

0 Yuichiro Koyama, et al. ∙

research

∙ 10/11/2021

Amicable examples for informed source separation

This paper deals with the problem of informed source separation (ISS), w...

0 Naoya Takahashi, et al. ∙

research

∙ 10/11/2021

Source Mixing and Separation Robust Audio Steganography

Audio steganography aims at concealing secret information in carrier aud...

0 Naoya Takahashi, et al. ∙

research

∙ 06/21/2021

Ensemble of ACCDOA- and EINV2-based Systems with D3Nets and Impulse Response Simulation for Sound Event Localization and Detection

This report describes our systems submitted to the DCASE2021 challenge t...

0 Kazuki Shimada, et al. ∙

research

∙ 02/17/2021

End-to-end lyrics Recognition with Voice to Singing Style Transfer

Automatic transcription of monophonic/polyphonic music is a challenging ...

23 Sakya Basak, et al. ∙

research

∙ 01/18/2021

Hierarchical disentangled representation learning for singing voice conversion

Conventional singing voice conversion (SVC) methods often suffer from op...

20 Naoya Takahashi, et al. ∙

research

∙ 11/21/2020

Densely connected multidilated convolutional networks for dense prediction tasks

Tasks that involve high-resolution dense prediction require a modeling o...

8 Naoya Takahashi, et al. ∙

research

∙ 10/29/2020

ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection

Neural-network (NN)-based methods show high performance in sound event l...

0 Kazuki Shimada, et al. ∙

research

∙ 10/07/2020

Adversarial attacks on audio source separation

Despite the excellent performance of neural-network-based audio source s...

0 Naoya Takahashi, et al. ∙

research

∙ 10/05/2020

D3Net: Densely connected multidilated DenseNet for music source separation

Music source separation involves a large input field to model a long-ter...

0 Naoya Takahashi, et al. ∙

research

∙ 06/22/2020

Sound Event Localization and Detection Using Activity-Coupled Cartesian DOA Vector and RD3net

Our systems submitted to the DCASE2020 task 3: Sound Event Localization ...

0 Kazuki Shimada, et al. ∙

research

∙ 11/29/2019

Improving Voice Separation by Incorporating End-to-end Speech Recognition

Despite recent advances in voice separation methods, many challenges rem...

0 Naoya Takahashi, et al. ∙

research

∙ 04/05/2019

Recursive speech separation for unknown number of speakers

In this paper we propose a method of single-channel speaker-independent ...

0 Naoya Takahashi, et al. ∙

research

∙ 05/07/2018

MMDenseLSTM: An efficient combination of convolutional and recurrent neural networks for audio source separation

Deep neural networks have become an indispensable technique for audio so...

0 Naoya Takahashi, et al. ∙

research

∙ 06/29/2017

Multi-scale Multi-band DenseNets for Audio Source Separation

This paper deals with the problem of audio source separation. To handle ...

0 Naoya Takahashi, et al. ∙

research

∙ 01/03/2017

AENet: Learning Deep Audio Features for Video Analysis

We propose a new deep network for audio event recognition, called AENet....

0 Naoya Takahashi, et al. ∙

research

∙ 06/15/2016

Automatic Pronunciation Generation by Utilizing a Semi-supervised Deep Neural Networks

Phonemic or phonetic sub-word units are the most commonly used atomic el...

0 Naoya Takahashi, et al. ∙
