Continuous Speech Separation with Conformer

by Sanyuan Chen, et al.

Continuous speech separation plays a vital role in complicated speech-related tasks such as conversation transcription. The separation model extracts single-speaker signals from mixed speech. In this paper, we use transformer and conformer networks in lieu of recurrent neural networks in the separation system, as we believe that capturing global information with self-attention is crucial for speech separation. Evaluated on the LibriCSS dataset, the conformer separation model achieves state-of-the-art results, with a relative 23.5% word error rate (WER) reduction in the utterance-wise evaluation and a 15.4% reduction in the continuous evaluation.
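To make the idea concrete, here is a minimal, untrained NumPy sketch of self-attention-based mask estimation for separation. All names and shapes are illustrative assumptions, not the paper's implementation: the actual model stacks conformer blocks (self-attention plus convolution modules) and is trained with permutation invariant training, whereas this toy shows only the core mechanism the abstract emphasizes, i.e. every time frame attending to all others before per-speaker masks are applied to the mixture.

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over time frames."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Softmax over the key (time) axis: each frame attends to every frame,
    # giving the global context an RNN would have to accumulate step by step.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def separate(mix_mag, params, n_speakers=2):
    """Mask-based separation: predict one mask per speaker, apply to the mixture."""
    h = self_attention(mix_mag, *params["attn"])
    masks = 1.0 / (1.0 + np.exp(-(h @ params["out"])))  # sigmoid, in [0, 1]
    # Reshape to (speakers, frames, freq) and mask the mixture magnitude.
    masks = masks.reshape(mix_mag.shape[0], n_speakers, mix_mag.shape[1])
    return masks.transpose(1, 0, 2) * mix_mag[None]

# Toy mixture: 50 time frames, 64 frequency bins (random weights, untrained).
frames, freq = 50, 64
params = {
    "attn": [rng.standard_normal((freq, freq)) * 0.1 for _ in range(3)],
    "out": rng.standard_normal((freq, 2 * freq)) * 0.1,
}
mix = np.abs(rng.standard_normal((frames, freq)))
est = separate(mix, params)
print(est.shape)  # (2, 50, 64): one magnitude estimate per speaker
```

Because the masks are sigmoid-bounded in [0, 1], each estimated magnitude never exceeds the mixture magnitude, which is the usual inductive bias of masking-based separation front ends.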


Related papers
Continuous speech separation: dataset and analysis

This paper describes a dataset and protocols for evaluating continuous s...

TransMask: A Compact and Fast Speech Separation Model Based on Transformer

Speech separation is an important problem in speech processing, which ta...

Tiny-Sepformer: A Tiny Time-Domain Transformer Network for Speech Separation

Time-domain Transformer neural networks have proven their superiority in...

VarArray: Array-Geometry-Agnostic Continuous Speech Separation

Continuous speech separation using a microphone array was shown to be pr...

Continuous Speech Separation with Recurrent Selective Attention Network

While permutation invariant training (PIT) based continuous speech separ...

Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speaker Separation

We propose multi-microphone complex spectral mapping, a simple way of ap...

Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer

With its strong modeling capacity that comes from a multi-head and multi...