The sound of my voice: speaker representation loss for target voice separation

11/06/2019
by Seongkyu Mun, et al.

Content and style representations have been widely studied in the field of style transfer. In this paper, we propose a new loss function for audio source separation that uses a speaker content representation, which we call the speaker representation loss (SRL). The objective is to extract the 'sound of my voice' from the noisy input and also to remove it from the residual components. Compared to conventional spectral reconstruction, our proposed framework maximizes the use of target speaker information by minimizing the distance between the content representation of the target speaker and that of the source separation output. We also propose a triplet SRL as an additional criterion to remove the target speaker information from the residual spectrogram output. We adopt the VoiceFilter framework to evaluate source separation performance on the VCTK database, and we achieve improved performance over the baseline loss function without any additional network parameters.
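The abstract does not give the exact form of the losses, so the following is only a minimal sketch of the idea under stated assumptions. It assumes speaker embeddings have already been computed by some speaker encoder, uses cosine distance as the distance measure, and uses a hinge-style triplet with a hypothetical `margin` value. The function names and the toy vectors are illustrative and are not taken from the paper.

```python
import numpy as np

def cosine_distance(a, b, eps=1e-8):
    """Cosine distance between two embedding vectors (assumed distance measure)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    return 1.0 - float(cos_sim)

def speaker_representation_loss(target_emb, output_emb):
    """SRL sketch: pull the separated output's embedding toward the target speaker's."""
    return cosine_distance(target_emb, output_emb)

def triplet_srl(target_emb, output_emb, residual_emb, margin=0.5):
    """Triplet SRL sketch: the output should be closer to the target speaker
    than the residual is, by at least `margin` (hypothetical value)."""
    d_pos = cosine_distance(target_emb, output_emb)    # anchor vs. separated output
    d_neg = cosine_distance(target_emb, residual_emb)  # anchor vs. residual
    return max(0.0, d_pos - d_neg + margin)

# Toy 3-dimensional embeddings, for illustration only.
target = np.array([1.0, 0.0, 0.0])
good_output = np.array([0.9, 0.1, 0.0])
residual = np.array([0.0, 1.0, 0.0])

srl_value = speaker_representation_loss(target, good_output)
triplet_value = triplet_srl(target, good_output, residual)
swapped_value = triplet_srl(target, residual, good_output)
```

In this toy setup the triplet term is zero when the residual carries no target speaker information, and becomes positive when the roles of output and residual are swapped. In a real system such a term would presumably be combined with the spectral reconstruction loss and computed on differentiable tensors rather than NumPy arrays.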
