Deep Ad-hoc Beamforming Based on Speaker Extraction for Target-Dependent Speech Separation

by Ziye Yang, et al.

Recently, research on ad-hoc microphone arrays with deep learning has drawn much attention, especially in speech enhancement and separation. Because an ad-hoc microphone array may cover such a large area that multiple speakers may be located far apart and talk independently, target-dependent speech separation, which aims to extract a target speaker from mixed speech, is important for extracting and tracing a specific speaker in the ad-hoc array. However, this technique has not been explored yet. In this paper, we propose deep ad-hoc beamforming based on speaker extraction, which is to our knowledge the first work on target-dependent speech separation based on ad-hoc microphone arrays and deep learning. The algorithm contains three components. First, we propose a supervised channel selection framework based on speaker extraction, where the estimated utterance-level SNRs of the target speech are used as the basis for channel selection. Second, we apply the selected channels to a deep learning based MVDR algorithm, where a single-channel speaker extraction algorithm is applied to each selected channel to estimate the mask of the target speech. We conducted an extensive experiment on a WSJ0-adhoc corpus. Experimental results demonstrate the effectiveness of the proposed method.
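
The two stages described above, SNR-based channel selection followed by mask-based MVDR, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names `select_channels` and `mask_based_mvdr` are hypothetical, the top-k selection rule is only one possible way to use the estimated SNRs, and the per-channel masks are simply averaged across channels, which is an assumption the abstract does not confirm. The beamformer uses the standard reference-channel MVDR formulation, w = (Φn⁻¹ Φs) u / trace(Φn⁻¹ Φs), and the SNR estimates and masks are assumed to come from a single-channel speaker extraction network that is not shown here.

```python
import numpy as np

def select_channels(est_snr_db, k):
    """Return the (sorted) indices of the k channels with the highest
    estimated utterance-level SNR. Top-k is an assumed selection rule."""
    order = np.argsort(est_snr_db)[::-1]      # descending SNR
    return np.sort(order[:k])

def mask_based_mvdr(stft, masks, ref=0, eps=1e-8):
    """Mask-based MVDR on the selected channels.

    stft  : complex array (C, F, T), multichannel mixture STFT
    masks : real array (C, F, T) in [0, 1], target-speech mask per channel
    ref   : index of the reference channel
    Returns the enhanced single-channel STFT of shape (F, T).
    """
    C, F, T = stft.shape
    out = np.zeros((F, T), dtype=complex)
    for f in range(F):
        Y = stft[:, f, :]                     # (C, T)
        m_s = masks[:, f, :].mean(axis=0)     # assumed: average mask over channels
        m_n = 1.0 - m_s                       # complementary noise mask
        # Mask-weighted spatial covariance matrices of speech and noise
        phi_s = (m_s * Y) @ Y.conj().T / (m_s.sum() + eps)
        phi_n = (m_n * Y) @ Y.conj().T / (m_n.sum() + eps)
        phi_n = phi_n + eps * np.eye(C)       # small diagonal loading
        num = np.linalg.solve(phi_n, phi_s)   # phi_n^{-1} phi_s
        w = num[:, ref] / (np.trace(num) + eps)
        out[f] = w.conj() @ Y                 # w^H y for every frame
    return out
```

In use, one would first call `select_channels` on the per-channel SNR estimates, index the multichannel STFT and the masks with the returned channel indices, and pass the result to `mask_based_mvdr`.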







Frame-level multi-channel speaker verification with large-scale ad-hoc microphone arrays

Ad-hoc microphone arrays have received attention, in which the number and...

Deep Ad-hoc Beamforming

Deep learning based speech enhancement methods face two problems. First,...

Continuous Speech Separation with Ad Hoc Microphone Arrays

Speech separation has been shown effective for multi-talker speech recog...

End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation

An important problem in ad-hoc microphone speech separation is how to gu...

Implicit Filter-and-sum Network for Multi-channel Speech Separation

Various neural network architectures have been proposed in recent years ...

Scaling sparsemax based channel selection for speech recognition with ad-hoc microphone arrays

Recently, speech recognition with ad-hoc microphone arrays has received ...

Attention-based multi-channel speaker verification with ad-hoc microphone arrays

Recently, ad-hoc microphone array has been widely studied. Unlike tradit...