Improving speaker turn embedding by crossmodal transfer learning from face embedding

07/10/2017
by Nam Le, et al.

Learning speaker turn embeddings has shown considerable improvement in situations where conventional speaker modeling approaches fail. However, this improvement is relatively limited compared to the gain observed in face embedding learning, which has proven very successful for face verification and clustering tasks. Assuming that faces and voices from the same identities share some latent properties (such as age, gender, and ethnicity), we propose three transfer learning approaches that leverage knowledge from the face domain (learned from thousands of images and identities) for tasks in the speaker domain. These approaches, namely target embedding transfer, relative distance transfer, and clustering structure transfer, use the structure of the source face embedding space at different granularities to regularize the target speaker turn embedding space, each acting as an additional term in the optimization objective. Our methods are evaluated on two public broadcast corpora and yield promising advances over competitive baselines in verification and audio clustering tasks, especially when dealing with short speaker utterances. The analysis of the results also gives insight into the characteristics of the embedding spaces and shows their potential applications.
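The abstract names the three transfer approaches but does not give their formulas. The sketch below is therefore only an illustration of what "using the face embedding space at different granularities" could look like as regularization terms. It assumes paired speaker and face embeddings of the same dimension, one row per identity. The function names, the squared Euclidean distance, and the particular form of each term are assumptions made for this example, not the authors' definitions.

```python
import numpy as np


def pairwise_sq_dists(x):
    """Squared Euclidean distance between every pair of rows in x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def target_embedding_loss(speaker, face):
    """Finest granularity (assumed form): pull each speaker embedding
    toward the face embedding of the same identity."""
    return float(np.mean(np.sum((speaker - face) ** 2, axis=1)))


def relative_distance_loss(speaker, face):
    """Intermediate granularity (assumed form): make distances between
    identities in speaker space match those in face space."""
    gap = pairwise_sq_dists(speaker) - pairwise_sq_dists(face)
    return float(np.mean(gap ** 2))


def clustering_structure_loss(speaker, face_clusters):
    """Coarsest granularity (assumed form): identities that share a face
    cluster should lie close together in speaker space."""
    total = 0.0
    for cluster_id in np.unique(face_clusters):
        members = speaker[face_clusters == cluster_id]
        centroid = members.mean(axis=0)
        total += np.sum((members - centroid) ** 2)
    return float(total / len(speaker))


# Toy data: three identities with 2-D embeddings (hypothetical values).
face = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
speaker = face + 1.0                 # same geometry, shifted position
face_clusters = np.array([0, 0, 1])  # hypothetical face cluster labels

print(target_embedding_loss(speaker, face))
print(relative_distance_loss(speaker, face))
print(clustering_structure_loss(speaker, face_clusters))
```

In this toy setup the shifted speaker embeddings are penalized by the target term but not by the relative distance term, which depends only on distances between identities. In training, any such term would be added, with a weight, to the usual speaker embedding loss.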


