Triplet Network with Attention for Speaker Diarization

08/04/2018
by Huan Song, et al.
In automatic speech processing systems, speaker diarization is a crucial front-end component that separates segments belonging to different speakers. Inspired by the recent success of deep neural networks (DNNs) in semantic inference, triplet loss-based architectures have been successfully applied to this problem. However, existing work uses conventional i-vectors as the input representation and builds simple fully connected networks for metric learning, and so does not fully leverage the modeling power of DNN architectures. This paper investigates the importance of learning effective representations directly from the sequences in metric learning pipelines for speaker diarization. More specifically, we propose to employ attention models to learn the embeddings and the metric jointly in an end-to-end fashion. Experiments are conducted on the CALLHOME conversational speech corpus. The diarization results demonstrate that, besides providing a unified model, the proposed approach improves performance over existing approaches.
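
For readers unfamiliar with the triplet loss mentioned above, the following is a minimal sketch of its standard margin-based form, not the authors' implementation. The squared Euclidean distance, the margin of 1.0, the toy two-dimensional embeddings, and the function names are all assumptions chosen for illustration; the abstract does not specify these details.

```python
from typing import Sequence


def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two embeddings (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,  # hypothetical margin, not taken from the paper
) -> float:
    """Standard hinge-style triplet loss for a single triplet.

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin`; otherwise it is positive.
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings: anchor and positive stand in for the same speaker,
# negative for a different speaker.
anchor = [0.0, 1.0]
positive = [0.0, 0.5]
negative = [2.0, 1.0]

# Well-separated triplet: the margin is already satisfied.
easy = triplet_loss(anchor, positive, negative)

# Swapping the roles gives a violated triplet with a positive loss.
hard = triplet_loss(anchor, negative, positive)
```

In a learned pipeline, the embeddings would come from the network rather than being fixed lists, and the loss would be averaged over many sampled triplets.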
