PickNet: Real-Time Channel Selection for Ad Hoc Microphone Arrays

01/24/2022
by   Takuya Yoshioka, et al.
0

This paper proposes PickNet, a neural network model for real-time channel selection for an ad hoc microphone array consisting of multiple recording devices like cell phones. Assuming at most one person to be vocally active at each time point, PickNet identifies the device that is spatially closest to the active person for each time frame by using a short spectral patch of just hundreds of milliseconds. The model is applied to every time frame, and the short time frame signals from the selected microphones are concatenated across the frames to produce an output signal. As the personal devices are usually held close to their owners, the output signal is expected to have higher signal-to-noise and direct-to-reverberation ratios on average than the input signals. Since PickNet utilizes only limited acoustic context at each time frame, the system using the proposed model works in real time and is robust to changes in acoustic conditions. Speech recognition-based evaluation was carried out by using real conversational recordings obtained with various smartphones. The proposed model yielded significant gains in word error rate with limited computational cost over systems using a block-online beamformer and a single distant microphone.

READ FULL TEXT
research
03/03/2021

Continuous Speech Separation with Ad Hoc Microphone Arrays

Speech separation has been shown effective for multi-talker speech recog...
research
10/31/2018

Latent variable approach to diarization of audio recordings using ad-hoc randomly placed mobile devices

Diarization of audio recordings from ad-hoc mobile devices using spatial...
research
03/29/2021

Scaling sparsemax based channel selection for speech recognition with ad-hoc microphone arrays

Recently, speech recognition with ad-hoc microphone arrays has received ...
research
07/03/2023

Spatial-temporal Graph Based Multi-channel Speaker Verification With Ad-hoc Microphone Arrays

The performance of speaker verification degrades significantly in advers...
research
11/03/2018

Deep Ad-hoc Beamforming

Deep learning based speech enhancement methods face two problems. First,...
research
08/21/2023

LibriWASN: A Data Set for Meeting Separation, Diarization, and Recognition with Asynchronous Recording Devices

We present LibriWASN, a data set whose design follows closely the LibriC...
research
04/06/2021

Learning to Rank Microphones for Distant Speech Recognition

Fully exploiting ad-hoc microphone networks for distant speech recogniti...

Please sign up or login with your details

Forgot password? Click here to reset