Exploring speaker enrolment for few-shot personalisation in emotional vocalisation prediction

In this work, we explore a novel few-shot personalisation architecture for emotional vocalisation prediction. The core contribution is an 'enrolment' encoder, which uses two unlabelled samples of the target speaker to adjust the output of the emotion encoder. The adjustment is based on dot-product attention and thus effectively functions as a form of 'soft' feature selection. The emotion and enrolment encoders are based on two standard audio architectures: CNN14 and CNN10. The two encoders are further guided to forget or learn auxiliary emotion and/or speaker information. Our best approach achieves a CCC of .650 on the ExVo Few-Shot dev set, a 2.5% relative increase over our baseline CNN14 CCC of .634.
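For readers unfamiliar with the metric, the following is a minimal sketch of how a concordance correlation coefficient (CCC) can be computed for a single emotion dimension, and how the reported 2.5% figure follows from the two scores as a relative gain. It uses the standard CCC definition. The function names `ccc` and `relative_gain` are illustrative assumptions rather than the authors' code, and how scores are aggregated across emotion dimensions is not stated here.

```python
import numpy as np


def ccc(y_true, y_pred):
    """Concordance correlation coefficient between two 1-D sequences.

    Standard definition:
        2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)
    using population (biased) variance and covariance.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = ((y_true - mean_t) * (y_pred - mean_p)).mean()
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline` (hypothetical helper)."""
    return (new - baseline) / baseline


# Scores quoted in the abstract: best system vs. CNN14 baseline.
gain_percent = 100.0 * relative_gain(0.650, 0.634)
print(f"{gain_percent:.1f}% relative increase")
```

Unlike Pearson correlation, CCC also penalises differences in mean and scale between predictions and labels, so a prediction that is perfectly correlated but shifted by a constant scores below 1.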

