BeamTransformer: Microphone Array-based Overlapping Speech Detection

09/09/2021
by Siqi Zheng, et al.

We propose BeamTransformer, an efficient architecture that combines the beamformer's strength in spatial filtering with the transformer's capability in contextual sequence modeling. BeamTransformer seeks to optimize the modeling of sequential relationships among signals from different spatial directions. Overlapping speech detection is one task where such optimization is beneficial. In this paper we apply BeamTransformer to detect overlapping speech segments. Compared to a single-channel approach, BeamTransformer excels at learning the relationships among different beam sequences and is hence able to make predictions not only from the acoustic signals but also from the localization of the source. The results indicate that successfully incorporating microphone array signals can lead to remarkable gains. Moreover, BeamTransformer goes one step further, as speech from overlapping speakers is internally separated into different beams.
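
The abstract does not say which beamformer produces the beam sequences. As a rough illustration of the general idea only, the sketch below assumes a fixed frequency-domain delay-and-sum beamformer on a linear array under a far-field model. The function name `delay_and_sum`, the array geometry, and the speed-of-sound constant are all hypothetical choices for this example, not the authors' implementation. It turns multichannel audio into one signal per look direction, which is the kind of per-direction sequence a downstream sequence model could consume.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # metres per second (assumed value for room-temperature air)


def delay_and_sum(signals, mic_positions, angles_deg, sample_rate):
    """Steer a linear array toward several directions (illustrative sketch).

    signals:       array of shape (n_mics, n_samples)
    mic_positions: array of shape (n_mics,), positions along the array axis in metres
    angles_deg:    look directions in degrees, measured from the array axis
    Returns an array of shape (n_beams, n_samples), one beam per direction.
    """
    n_mics, n_samples = signals.shape
    spectrum = np.fft.rfft(signals, axis=1)                    # (n_mics, n_bins)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sample_rate)    # (n_bins,)

    beams = []
    for angle in np.deg2rad(np.asarray(angles_deg, dtype=float)):
        # Far-field plane wave: arrival delay at each microphone relative to
        # the origin (negative means the wavefront reaches that mic earlier).
        delays = -mic_positions * np.cos(angle) / SPEED_OF_SOUND
        # Phase shifts that undo those delays, so a source in the look
        # direction adds up coherently across microphones.
        steering = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
        aligned = spectrum * steering
        beams.append(np.fft.irfft(aligned.mean(axis=0), n=n_samples))
    return np.stack(beams)
```

Because the delays are applied as phase shifts on the whole signal, the alignment is circular; a practical system would more likely work frame by frame on a short-time Fourier transform.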

