Deep Ad-hoc Beamforming
Deep-learning-based speech enhancement methods face two problems. First, their performance is strongly affected by the distance between the speech source and the microphones. Second, unlike conventional methods, deep-learning-based multichannel methods do not show significant performance improvement over their single-channel counterparts. To address these problems, we propose deep ad-hoc beamforming, the first deep-learning-based multichannel speech enhancement method for ad-hoc microphone arrays. It is designed for scenarios in which microphones are placed randomly in a room and work collaboratively, and it aims to pick up speech with equally good quality throughout the area covered by the array. Its core idea is to reweight the estimated speech signals when conducting beamforming, using weights produced by a neural network as estimates of the signal-to-noise ratios (SNRs) at the microphones of the array. We conducted an experiment in a scenario where the speech source is in the far field, at a random location that is unknown to the microphones. Results show that our method outperforms representative deep-learning-based speech enhancement methods by a large margin.
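The reweighting idea can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the per-microphone speech estimates are already time-aligned and of equal length, takes linear-scale SNR values supplied directly (standing in for the neural network's estimates), and forms a normalized SNR-weighted sum across channels. The function name `snr_weighted_combine` and the optional `top_k` channel-selection parameter are hypothetical, introduced only for illustration.

```python
from typing import List, Optional, Sequence


def snr_weighted_combine(
    signals: Sequence[Sequence[float]],
    snr_estimates: Sequence[float],
    top_k: Optional[int] = None,
) -> List[float]:
    """Combine per-microphone speech estimates, weighting each by its estimated SNR.

    Hypothetical sketch: assumes the signals are time-aligned and equally long,
    and that snr_estimates are non-negative linear-scale SNRs (stand-ins for
    the outputs of a neural network).
    """
    if not signals:
        raise ValueError("need at least one channel")
    if len(signals) != len(snr_estimates):
        raise ValueError("need one SNR estimate per channel")
    length = len(signals[0])
    if any(len(s) != length for s in signals):
        raise ValueError("all channels must have the same length")
    if any(snr < 0 for snr in snr_estimates):
        raise ValueError("SNR estimates must be non-negative")

    # Rank channels by estimated SNR; optionally keep only the top_k best
    # (an assumed channel-selection step, not taken from the source).
    order = sorted(range(len(signals)), key=lambda i: snr_estimates[i], reverse=True)
    kept = order if top_k is None else order[:top_k]

    total = sum(snr_estimates[i] for i in kept)
    if total == 0:
        raise ValueError("selected channels have zero total SNR")
    # Normalized weights: channels with higher estimated SNR contribute more.
    weights = {i: snr_estimates[i] / total for i in kept}

    # Weighted sum across the kept channels, sample by sample.
    return [sum(weights[i] * signals[i][t] for i in kept) for t in range(length)]


if __name__ == "__main__":
    mics = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
    snrs = [3.0, 1.0, 0.0]
    print(snr_weighted_combine(mics, snrs))  # [1.5, 2.5]
```

Normalizing the weights to sum to one keeps the output on the same scale as the inputs, so a channel whose estimated SNR is zero simply drops out of the combination.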