Audiovisual speaker diarization of TV series

12/18/2018
by Xavier Bost, et al.

Speaker diarization may be difficult to achieve when applied to narrative films, where speakers usually talk in adverse acoustic conditions: background music, sound effects, and wide variations in intonation may hide the inter-speaker variability and make audio-based speaker diarization approaches error-prone. On the other hand, such fictional movies exhibit strong regularities at the image level, particularly within dialogue scenes. In this paper, we propose to perform speaker diarization within dialogue scenes of TV series by combining the audio and video modalities: speaker diarization is first performed with each modality separately; the two resulting partitions of the instance set are then optimally matched, and the remaining instances, corresponding to cases of disagreement between the two modalities, are finally processed. When applied to fictional films, this multimodal approach outperforms approaches that rely on a single modality.
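The abstract does not say how the two partitions are matched. The following is a minimal sketch under stated assumptions, not the authors' method: it assumes that each modality outputs one cluster label per speech instance, and that "optimally matched" means choosing the one-to-one mapping between audio and video clusters that maximizes the number of instances on which both modalities agree. The names `match_partitions` and `candidate_mappings` are hypothetical. The sketch only separates agreeing from disagreeing instances, because the abstract does not describe how the disagreeing ones are processed.

```python
from collections import Counter
from itertools import permutations


def candidate_mappings(audio_clusters, video_clusters):
    """Yield every one-to-one mapping between the smaller and the larger cluster set."""
    if len(audio_clusters) <= len(video_clusters):
        for perm in permutations(video_clusters, len(audio_clusters)):
            yield dict(zip(audio_clusters, perm))
    else:
        for perm in permutations(audio_clusters, len(video_clusters)):
            yield dict(zip(perm, video_clusters))


def match_partitions(audio_labels, video_labels):
    """Match audio clusters to video clusters so that agreement is maximal.

    Hypothetical helper. Returns (mapping, agreed, disagreed): the
    audio -> video cluster mapping, then the indices of instances on which
    the two modalities agree or disagree under that mapping.
    """
    if len(audio_labels) != len(video_labels):
        raise ValueError("both partitions must label the same instances")
    # Contingency counts: number of instances in each (audio, video) cluster pair.
    overlap = Counter(zip(audio_labels, video_labels))
    audio_clusters = sorted(set(audio_labels))
    video_clusters = sorted(set(video_labels))

    # Exhaustive search is cheap for the few speakers of a dialogue scene;
    # for larger problems the Hungarian algorithm solves the same assignment task.
    mapping = max(
        candidate_mappings(audio_clusters, video_clusters),
        key=lambda m: sum(overlap[(a, v)] for a, v in m.items()),
    )

    agreed, disagreed = [], []
    for i, (a, v) in enumerate(zip(audio_labels, video_labels)):
        if a in mapping and mapping[a] == v:
            agreed.append(i)
        else:
            disagreed.append(i)
    return mapping, agreed, disagreed


audio = ["A", "A", "B", "B", "A", "B"]
video = ["x", "x", "y", "y", "y", "x"]
print(match_partitions(audio, video))
# → ({'A': 'x', 'B': 'y'}, [0, 1, 2, 3], [4, 5])
```

In this toy example, instances 4 and 5 are the cases of disagreement that would be left for the final processing step. Brute force is used here only to keep the sketch dependency-free; it grows factorially with the number of clusters.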
