Better Together: Dialogue Separation and Voice Activity Detection for Audio Personalization in TV

03/23/2023
by Matteo Torcoli, et al.

In TV services, dialogue-level personalization is key to meeting user preferences and needs. When dialogue and background sounds are not separately available from the production stage, Dialogue Separation (DS) can estimate them to enable personalization. DS has been shown to provide clear benefits for the end user. Still, the estimated signals are not perfect, and some leakage can be introduced. This is undesirable, especially during passages without dialogue. We propose to combine DS and Voice Activity Detection (VAD), both recently proposed for TV audio. When their combination suggests dialogue inactivity, background components leaking into the dialogue estimate are reassigned to the background estimate. A clear improvement in audio quality is shown for dialogue-free signals, with no performance drop when dialogue is active. A post-processed VAD estimate with improved detection accuracy is also generated. We conclude that DS and VAD can improve each other and are better used together.
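The reassignment step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how DS and VAD are combined, so everything below is an assumption made for illustration: the frame-wise processing, the function name `reassign_leakage`, the two thresholds, and the rule that a frame counts as dialogue-active only when both the VAD probability and the energy of the DS dialogue estimate are high enough. It is not the authors' method.

```python
from typing import List, Sequence, Tuple

Frame = List[float]


def reassign_leakage(
    dialogue_frames: Sequence[Sequence[float]],
    background_frames: Sequence[Sequence[float]],
    vad_probs: Sequence[float],
    vad_threshold: float = 0.5,      # hypothetical value, not from the paper
    energy_threshold: float = 1e-4,  # hypothetical value, not from the paper
) -> Tuple[List[Frame], List[Frame], List[bool]]:
    """Move the dialogue estimate into the background when dialogue seems inactive.

    Assumed combination rule: a frame is dialogue-active only if the VAD
    probability AND the mean energy of the DS dialogue estimate both reach
    their thresholds.
    """
    new_dialogue: List[Frame] = []
    new_background: List[Frame] = []
    active_flags: List[bool] = []
    for d, b, p in zip(dialogue_frames, background_frames, vad_probs):
        # Mean energy of the dialogue estimate in this frame.
        energy = sum(x * x for x in d) / max(len(d), 1)
        active = p >= vad_threshold and energy >= energy_threshold
        if active:
            # Dialogue present: keep both estimates unchanged.
            new_dialogue.append(list(d))
            new_background.append(list(b))
        else:
            # Dialogue absent: treat the dialogue estimate as leakage and
            # add it to the background, so dialogue + background still
            # sums to the original mixture.
            new_dialogue.append([0.0] * len(d))
            new_background.append([x + y for x, y in zip(d, b)])
        active_flags.append(active)
    return new_dialogue, new_background, active_flags
```

In this sketch the returned `active_flags` play the role of a combined, post-processed activity decision. A practical system would likely smooth the decision over time and cross-fade at the transitions rather than switch abruptly from one frame to the next.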


