A Review of Speaker Diarization: Recent Advances with Deep Learning

01/24/2021
by Tae Jin Park, et al.

Speaker diarization is the task of labeling audio or video recordings with classes corresponding to speaker identity, or, in short, the task of identifying "who spoke when". In the early years, speaker diarization algorithms were developed for speech recognition on multi-speaker audio recordings to enable speaker-adaptive processing. Over time, diarization also gained value as a stand-alone application that provides speaker-specific meta information for downstream tasks such as audio retrieval. Over the past decade, deep learning has driven revolutionary changes in research and practice across speech application domains, and speaker diarization has advanced more rapidly as a result. In this paper, we review not only the historical development of speaker diarization technology but also recent advancements in neural speaker diarization approaches. We also discuss how speaker diarization systems have been integrated with speech recognition applications. We further discuss how the recent surge of deep learning is leading the way toward jointly modeling these two components so that they complement each other. Given these exciting technical trends, we believe that a survey consolidating the recent developments in neural methods is a valuable contribution to the community, and that it will facilitate further progress toward more efficient speaker diarization.
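The following minimal Python sketch makes the "who spoke when" labeling concrete. It is not taken from the paper. It assumes that diarization output is stored as time segments, each tagged with a speaker label, and it allows segments to overlap so that more than one speaker can be active at once. The names `Segment`, `speakers_at` and `frame_labels`, and the 0.5 s frame shift, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical container for one diarized region of a recording."""
    start: float  # segment start time in seconds (inclusive)
    end: float    # segment end time in seconds (exclusive)
    speaker: str  # speaker class label, e.g. "A"


def speakers_at(segments, t):
    """Answer "who spoke when" at time t; overlapping speech yields several labels."""
    return sorted({s.speaker for s in segments if s.start <= t < s.end})


def frame_labels(segments, duration, frame_shift=0.5):
    """Convert segments to per-frame speaker sets, sampled at each frame's start time.

    The 0.5 s default frame shift is an illustrative assumption; practical
    systems typically use much finer frames.
    """
    n_frames = int(round(duration / frame_shift))
    return [speakers_at(segments, i * frame_shift) for i in range(n_frames)]


if __name__ == "__main__":
    # Two speakers who overlap between 1.5 s and 2.0 s.
    segments = [Segment(0.0, 2.0, "A"), Segment(1.5, 3.0, "B")]
    print(speakers_at(segments, 1.75))        # ['A', 'B']
    print(frame_labels(segments, 3.0))        # [['A'], ['A'], ['A'], ['A', 'B'], ['B'], ['B']]
```

The segment list is a compact way to store the result. The frame-level view resembles the per-frame speaker-activity labels that some neural diarization models are trained to predict.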
