Look who's not talking

11/30/2020
by Youngki Kwon, et al.
The objective of this work is speaker diarisation of speech recordings 'in the wild'. Determining which segments contain speech is a crucial part of diarisation systems, and mistakes at this stage account for a large proportion of errors. In this paper, we present a simple but effective solution for speech activity detection based on speaker embeddings. In particular, we find that the norm of the speaker embedding is an extremely effective indicator of speech activity. The method does not require an independent model for speech activity detection, and therefore allows speaker diarisation to be performed using a unified representation for both speaker modelling and speech activity detection. We perform a number of experiments on in-house and public datasets, in which our method outperforms popular baselines.
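The idea of using the embedding norm as a speech activity indicator can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the norm is turned into a decision, so the fixed threshold, the assumption that a larger norm means speech, the window hop, and all function names below are hypothetical choices made only for illustration. The embeddings are assumed to come from some speaker model applied to consecutive, equally spaced windows.

```python
import math
from typing import List, Sequence, Tuple


def embedding_norms(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Return the L2 norm of each per-window speaker embedding."""
    return [math.sqrt(sum(x * x for x in emb)) for emb in embeddings]


def speech_mask(norms: Sequence[float], threshold: float) -> List[bool]:
    """Mark a window as speech when its norm reaches the threshold.

    Assumption: higher norm indicates speech. The threshold is a
    hypothetical tuning parameter, not a value from the paper.
    """
    return [n >= threshold for n in norms]


def mask_to_segments(mask: Sequence[bool], hop_s: float) -> List[Tuple[float, float]]:
    """Merge consecutive speech windows into (start, end) times in seconds.

    Assumption: window i covers [i * hop_s, (i + 1) * hop_s).
    """
    segments: List[Tuple[float, float]] = []
    start = None
    for i, is_speech in enumerate(mask):
        if is_speech and start is None:
            start = i  # a speech run begins here
        elif not is_speech and start is not None:
            segments.append((start * hop_s, i * hop_s))
            start = None
    if start is not None:  # close a run that reaches the end of the recording
        segments.append((start * hop_s, len(mask) * hop_s))
    return segments


# Toy embeddings standing in for the output of a speaker model.
toy_embeddings = [[0.1, 0.0], [3.0, 4.0], [6.0, 8.0], [0.2, 0.1], [5.0, 0.0]]
norms = embedding_norms(toy_embeddings)
segments = mask_to_segments(speech_mask(norms, threshold=2.0), hop_s=0.5)
print(segments)
```

Because the same embeddings that yield the norms can also be used for speaker clustering, a pipeline built this way would need no separate speech activity detection model, which is the unification the abstract describes.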

08/17/2021

Look Who's Talking: Active Speaker Detection in the Wild

In this work, we present a novel audio-visual dataset for active speaker...
06/23/2021

Enrollment-less training for personalized voice activity detection

We present a novel personalized voice activity detection (PVAD) learning...
12/02/2019

Speaker detection in the wild: Lessons learned from JSALT 2019

This paper presents the problems and solutions addressed at the JSALT wo...
08/10/2020

Barometers Can Hear, and Sense Finger Taps

Most modern smartphones are equipped with a barometer to sample air pres...
07/15/2021

Improving Security in McAdams Coefficient-Based Speaker Anonymization by Watermarking Method

Speaker anonymization aims to suppress speaker individuality to protect ...
10/29/2019

On Investigation of Unsupervised Speech Factorization Based on Normalization Flow

Speech signals are complex composites of various information, including ...
12/01/2020

A Unified Deep Speaker Embedding Framework for Mixed-Bandwidth Speech Data

This paper proposes a unified deep speaker embedding framework for model...