
Look who's not talking

by Youngki Kwon, et al.

The objective of this work is speaker diarisation of speech recordings 'in the wild'. Determining speech segments is a crucial part of diarisation systems, and mistakes at this stage account for a large proportion of errors. In this paper, we present a simple but effective solution for speech activity detection based on speaker embeddings. In particular, we discover that the norm of the speaker embedding is an extremely effective indicator of speech activity. The method does not require an independent model for speech activity detection, and therefore allows speaker diarisation to be performed using a unified representation for both speaker modelling and speech activity detection. We perform a number of experiments on in-house and public datasets, in which our method outperforms popular baselines.
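The idea of using the embedding norm as a speech activity indicator can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `speech_activity_from_norms`, the toy two-dimensional embeddings, and the threshold value are all hypothetical, and the sketch assumes that larger norms correspond to speech, which the abstract does not state explicitly.

```python
import numpy as np


def speech_activity_from_norms(embeddings, threshold):
    """Mark each window as speech when its embedding L2 norm exceeds a threshold.

    embeddings: array of shape (num_windows, dim), one embedding per window.
    threshold:  hypothetical decision threshold; in practice it would have
                to be tuned on held-out data.
    Returns (mask, norms), where mask[i] is True for windows judged as speech.
    """
    norms = np.linalg.norm(embeddings, axis=1)  # L2 norm of each row
    return norms > threshold, norms


# Toy embeddings (illustrative values only, not taken from any real model).
embeddings = np.array([
    [3.0, 4.0],   # large norm: treated as speech under the assumption above
    [0.3, 0.4],   # small norm: treated as non-speech
    [6.0, 8.0],
    [0.0, 0.1],
])

mask, norms = speech_activity_from_norms(embeddings, threshold=1.0)
print(mask)
```

Because the mask is derived from the same embeddings that would be used for speaker modelling, no separate speech activity detection model is needed in this sketch, which mirrors the unified representation described in the abstract.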



