Speaker Diarization as a Fully Online Learning Problem in MiniVox

06/08/2020
by   Baihan Lin, et al.
1

We proposed a novel AI framework to conduct real-time multi-speaker diarization and recognition without prior registration and pretraining in a fully online learning setting. Our contributions are two-fold. First, we proposed a new benchmark to evaluate the rarely studied fully online speaker diarization problem. We built upon existing datasets of real world utterances to automatically curate MiniVox, an experimental environment which generates infinite configurations of continuous multi-speaker speech stream. Secondly, we considered the practical problem of online learning with episodically revealed rewards and introduced a solution based on semi-supervised and self-supervised learning methods. Lastly, we provided a workable web-based recognition system which interactively handles the cold start problem of new user's addition by transferring representations of old arms to new ones with an extendable contextual bandit. We demonstrated that our proposed method obtained robust performance in the online MiniVox framework.

READ FULL TEXT
research
09/17/2020

Online Semi-Supervised Learning in Contextual Bandits with Episodic Reward

We considered a novel practical problem of online learning with episodic...
research
02/21/2023

A Reinforcement Learning Framework for Online Speaker Diarization

Speaker diarization is a task to label an audio or video recording with ...
research
10/23/2020

Online Semi-Supervised Learning with Bandit Feedback

We formulate a new problem at the intersectionof semi-supervised learnin...
research
11/06/2020

Self-Supervised Learning from Contrastive Mixtures for Personalized Speech Enhancement

This work explores how self-supervised learning can be universally used ...
research
02/06/2021

Speaker attribution with voice profiles by graph-based semi-supervised learning

Speaker attribution is required in many real-world applications, such as...
research
06/18/2022

Semi-supervised Time Domain Target Speaker Extraction with Attention

In this work, we propose Exformer, a time-domain architecture for target...
research
01/27/2018

IRSA Transmission Optimization via Online Learning

In this work, we propose a new learning framework for optimising transmi...

Please sign up or login with your details

Forgot password? Click here to reset