Dynamic Recognition of Speakers for Consent Management by Contrastive Embedding Replay

05/17/2022
by Arash Shahmansoori, et al.

Voice assistants record sound and can overhear conversations. A consent management mechanism is therefore desirable so that users can state whether or not they wish to be recorded. Consent management can be implemented using speaker recognition: users who do not give consent enrol their voice, and all subsequent recordings of these users are not processed. Building speaker-recognition-based consent management is challenging because of the dynamic nature of the problem, the scalability required for a large number of speakers, and the need for fast speaker recognition with high accuracy. This paper describes a speaker-recognition-based consent management system that addresses these challenges. Fully supervised batch contrastive learning is applied to learn the underlying speaker equivariance inductive bias during training on the set of speakers who express recording dissent. Speakers who do not provide consent are grouped in buckets that are trained continuously. The embeddings are learned contrastively for the speakers in their buckets during training and later act as a replay buffer for classification. The buckets are registered progressively during training, and a novel multi-strided random sampling of the contrastive embedding replay buffer is proposed. In each iteration, buckets are trained contrastively for only a few steps and progressively replayed for classification, which leads to fast convergence. An algorithm for fast, dynamic registration and removal of speakers in buckets is described. The evaluation results show that the proposed approach provides the desired fast and dynamic solution for consent management and outperforms existing approaches in convergence speed, adaptive capability, and verification performance during inference.
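The abstract does not give the exact training loss. A common formulation of "fully supervised batch contrastive learning" is the supervised contrastive (SupCon) loss of Khosla et al. (2020), in which every other utterance of the same speaker in a batch counts as a positive and all remaining utterances count as negatives. The NumPy sketch below assumes that formulation; the function name `supcon_loss` and the temperature of 0.1 are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def supcon_loss(embeddings: np.ndarray, labels: np.ndarray, temperature: float = 0.1) -> float:
    """SupCon-style loss over one batch (assumed formulation, batch size >= 2).

    embeddings: (N, D) speaker embeddings; labels: (N,) integer speaker ids.
    """
    # L2-normalise so that dot products are cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    # An anchor is never compared with itself.
    sim = np.where(self_mask, -np.inf, sim)

    # Numerically stable log-softmax over each anchor's row.
    row_max = np.max(sim, axis=1, keepdims=True)
    log_norm = np.log(np.sum(np.exp(sim - row_max), axis=1, keepdims=True))
    log_prob = sim - row_max - log_norm

    # Positives: other utterances from the same speaker.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0  # anchors without a positive are skipped

    # Average log-probability assigned to the positives of each anchor.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    mean_log_prob_pos = pos_log_prob[valid] / pos_counts[valid]
    return float(-mean_log_prob_pos.mean())


if __name__ == "__main__":
    emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    print(supcon_loss(emb, np.array([0, 0, 1, 1])))  # labels match the geometry: small loss
    print(supcon_loss(emb, np.array([0, 1, 0, 1])))  # labels contradict it: large loss
```

The loss is lowest when embeddings of the same speaker cluster together and away from other speakers, which is the property the buckets' embeddings need in order to serve later as a replay buffer for classification.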
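To make the consent-gating idea concrete, the sketch below keeps each dissenting speaker's enrolment embeddings in fixed-size buckets, supports dynamic registration and removal, and skips processing of a recording whose embedding matches an enrolled speaker. It is a hypothetical illustration only: the class `ConsentBuckets`, the bucket size, the cosine threshold of 0.7, and the nearest-centroid matching are all assumptions, and it does not implement the paper's classifier or its multi-strided random sampling.

```python
from typing import Dict, List, Optional

import numpy as np


class ConsentBuckets:
    """Hypothetical registry of dissenting speakers grouped into fixed-size buckets."""

    def __init__(self, bucket_size: int = 4, threshold: float = 0.7):
        self.bucket_size = bucket_size
        self.threshold = threshold  # assumed cosine-similarity acceptance threshold
        # Each bucket maps speaker id -> stored (normalised) embeddings, i.e. its replay buffer.
        self.buckets: List[Dict[str, np.ndarray]] = []

    @staticmethod
    def _normalise(x: np.ndarray) -> np.ndarray:
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    def register(self, speaker: str, embeddings: np.ndarray) -> int:
        """Store (k, D) enrolment embeddings for a speaker; return the bucket index used."""
        stored = self._normalise(embeddings)
        for idx, bucket in enumerate(self.buckets):
            if speaker in bucket:  # re-enrolment updates the existing entry
                bucket[speaker] = stored
                return idx
        for idx, bucket in enumerate(self.buckets):
            if len(bucket) < self.bucket_size:  # first bucket with free room
                bucket[speaker] = stored
                return idx
        self.buckets.append({speaker: stored})  # all buckets full: open a new one
        return len(self.buckets) - 1

    def remove(self, speaker: str) -> bool:
        """Drop a speaker (e.g. consent now given); delete the bucket if it becomes empty."""
        for idx, bucket in enumerate(self.buckets):
            if speaker in bucket:
                del bucket[speaker]
                if not bucket:
                    self.buckets.pop(idx)
                return True
        return False

    def identify(self, embedding: np.ndarray) -> Optional[str]:
        """Return the best-matching dissenting speaker, or None if no score passes the threshold."""
        query = self._normalise(embedding)
        best_id, best_score = None, -1.0
        for bucket in self.buckets:
            for speaker, stored in bucket.items():
                centroid = self._normalise(stored.mean(axis=0))
                score = float(centroid @ query)
                if score > best_score:
                    best_id, best_score = speaker, score
        return best_id if best_score >= self.threshold else None

    def should_process(self, embedding: np.ndarray) -> bool:
        """A recording is processed only if it matches no dissenting speaker."""
        return self.identify(embedding) is None
```

For example, after `reg.register("alice", np.array([[1.0, 0.0, 0.0]]))`, a call to `reg.should_process(np.array([0.95, 0.05, 0.0]))` returns `False`, and once `reg.remove("alice")` is called the same recording would be processed again. Grouping speakers into buckets keeps registration and removal local to one small group, which is the kind of scalability the abstract motivates.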
