Lightweight Dual-channel Target Speaker Separation for Mobile Voice Communication

06/05/2021
by   Yuanyuan Bao, et al.

There is a strong need to deploy target speaker separation (TSS) models on mobile devices, where model size and computational complexity are limited. To better perform TSS for mobile voice communication, we first build a dual-channel dataset for this specific scenario, LibriPhone. To better mimic the real-case scenario, instead of simulating from a single-channel dataset, LibriPhone is made by simultaneously replaying pairs of utterances from LibriSpeech through two professional artificial heads and recording them with the two built-in microphones of a mobile device. We then propose a lightweight time-frequency domain separation model, LSTM-Former, which is based on the LSTM framework and uses a scale-invariant source-to-noise ratio (SI-SNR) loss. In the experiments on LibriPhone, we compare the dual-channel LSTM-Former with a single-channel version that uses one randomly chosen channel of LibriPhone. The results show that the dual-channel LSTM-Former outperforms the single-channel LSTM-Former by a relative 25%. This provides a solution for the TSS task on mobile devices: playing back and recording multiple data sources in real application scenarios to obtain real dual-channel data can help a lightweight model achieve higher performance.
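For readers unfamiliar with SI-SNR, the following is a minimal sketch of the commonly used definition, written in plain Python for illustration. It is not the authors' implementation: the function names `si_snr` and `si_snr_loss`, the `eps` stabilizer, and the mean-removal step are assumptions based on the usual formulation, and the abstract does not say whether the loss is computed per utterance or per frame.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between an estimated and a reference signal.

    Both arguments are equal-length sequences of float samples.
    `eps` is an assumed small constant that avoids division by zero.
    """
    n = len(target)
    if n == 0 or len(estimate) != n:
        raise ValueError("signals must be non-empty and of equal length")

    # Remove the mean so that a DC offset does not affect the measure.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]

    # Project the estimate onto the target to get the "target" component.
    dot_et = sum(a * b for a, b in zip(e, t))
    energy_t = sum(b * b for b in t)
    scale = dot_et / (energy_t + eps)
    s_target = [scale * b for b in t]

    # Whatever is left after the projection is treated as noise.
    e_noise = [a - p for a, p in zip(e, s_target)]

    num = sum(p * p for p in s_target)
    den = sum(q * q for q in e_noise)
    return 10.0 * math.log10((num + eps) / (den + eps))


def si_snr_loss(estimate, target):
    """Training loss: negative SI-SNR, so that minimizing it maximizes SI-SNR."""
    return -si_snr(estimate, target)
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate by a constant leaves the value essentially unchanged, which is why the measure is called scale-invariant.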


