TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context

10/08/2021
by Nithin Rao Koluguri, et al.

In this paper, we propose TitaNet, a novel neural network architecture for extracting speaker representations. We employ 1D depth-wise separable convolutions with Squeeze-and-Excitation (SE) layers with global context, followed by a channel-attention-based statistics pooling layer, to map variable-length utterances to a fixed-length embedding (t-vector). TitaNet is a scalable architecture and achieves state-of-the-art performance on the speaker verification task, with an equal error rate (EER) of 0.68% on the trial file, and on speaker diarization tasks, with a diarization error rate (DER) of 1.73%. Furthermore, we investigate various sizes of TitaNet and present a light TitaNet-S model with only 6M parameters that achieves near state-of-the-art results in diarization tasks.
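To make the pooling step concrete, here is a minimal pure-Python sketch of attention-weighted statistics pooling, the general technique that turns a variable number of frames into a fixed-length vector. It is an illustration under stated assumptions, not the TitaNet implementation: the function name `attentive_stats_pooling` is hypothetical, and the attention scores are passed in directly, whereas in a trained model they would come from a learned attention network.

```python
import math


def attentive_stats_pooling(features, scores):
    """Pool T frames of C channels into a vector of length 2 * C.

    features: list of T frames, each a list of C floats.
    scores:   unnormalized attention scores with the same T x C shape
              (assumed given here; a real model would learn to produce them).
    """
    num_frames = len(features)
    num_channels = len(features[0])
    pooled_mean, pooled_std = [], []
    for c in range(num_channels):
        # Softmax over the time axis for this channel (max-subtracted for stability).
        column = [scores[t][c] for t in range(num_frames)]
        peak = max(column)
        exps = [math.exp(s - peak) for s in column]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Attention-weighted mean and standard deviation over time.
        mean = sum(w * features[t][c] for t, w in enumerate(weights))
        var = sum(w * (features[t][c] - mean) ** 2 for t, w in enumerate(weights))
        pooled_mean.append(mean)
        pooled_std.append(math.sqrt(max(var, 0.0)))
    # Concatenating means and standard deviations gives a length that
    # depends only on C, not on the number of frames T.
    return pooled_mean + pooled_std


# Two "utterances" of different lengths, both with C = 2 channels.
short_utt = [[1.0, 2.0], [3.0, 4.0]]
long_utt = [[0.5, 1.0], [1.5, 2.0], [2.5, 3.0], [3.5, 4.0], [4.5, 5.0]]

# All-zero scores give uniform attention, i.e. plain mean and std pooling.
short_vec = attentive_stats_pooling(short_utt, [[0.0, 0.0]] * 2)
long_vec = attentive_stats_pooling(long_utt, [[0.0, 0.0]] * 5)
```

Both `short_vec` and `long_vec` have four entries even though the inputs have two and five frames, which is the property the abstract describes as mapping variable-length utterances to a fixed-length embedding.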
