Improving the Naturalness of Simulated Conversations for End-to-End Neural Diarization

04/24/2022
by Natsuo Yamashita, et al.

This paper investigates a method for simulating natural conversation in the model training of end-to-end neural diarization (EEND). Due to the lack of any annotated real conversational dataset, EEND is usually pretrained on a large-scale simulated conversational dataset first and then adapted to the target real dataset. Simulated datasets play an essential role in the training of EEND, but as yet there has been insufficient investigation into an optimal simulation method. We thus propose a method to simulate natural conversational speech. In contrast to conventional methods, which simply combine the speech of multiple speakers, our method takes turn-taking into account. We define four types of speaker transition and sequentially arrange them to simulate natural conversations. The dataset simulated using our method was found to be statistically similar to the real dataset in terms of the silence and overlap ratios. The experimental results on two-speaker diarization using the CALLHOME and CSJ datasets showed that the simulated dataset contributes to improving the performance of EEND.
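The idea of arranging utterances sequentially and then comparing silence and overlap ratios can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the four transition types, so the sketch models a transition simply as a choice of next speaker plus a signed time offset (positive for a silent gap, negative for overlap). The function names `arrange` and `silence_overlap_ratios`, and the definition of both ratios as fractions of total recording length, are assumptions made for illustration.

```python
from typing import List, Tuple

# (speaker id, start time in seconds, end time in seconds)
Segment = Tuple[int, float, float]


def arrange(durations: List[float],
            transitions: List[Tuple[int, float]],
            first_speaker: int = 0) -> List[Segment]:
    """Place utterances one after another on a shared timeline.

    transitions[i] = (speaker, offset) says how utterance i+1 follows
    utterance i: offset > 0 inserts silence, offset < 0 creates overlap.
    This signed-offset model is an assumption, not the paper's definition.
    """
    if len(transitions) != len(durations) - 1:
        raise ValueError("need exactly one transition between consecutive utterances")
    segments: List[Segment] = [(first_speaker, 0.0, durations[0])]
    for dur, (speaker, offset) in zip(durations[1:], transitions):
        prev_end = segments[-1][2]
        start = max(0.0, prev_end + offset)
        segments.append((speaker, start, start + dur))
    return segments


def silence_overlap_ratios(segments: List[Segment]) -> Tuple[float, float]:
    """Return (silence ratio, overlap ratio) relative to the total length.

    Silence: no speaker active. Overlap: two or more distinct speakers active.
    """
    total = max(end for _, _, end in segments)
    # Speaker activity can only change at segment boundaries.
    points = sorted({0.0} | {t for _, s, e in segments for t in (s, e)})
    silence = overlap = 0.0
    for left, right in zip(points, points[1:]):
        mid = (left + right) / 2
        active = {spk for spk, s, e in segments if s <= mid < e}
        if not active:
            silence += right - left
        elif len(active) >= 2:
            overlap += right - left
    return silence / total, overlap / total


# Three utterances: a 0.5 s gap before speaker 1, then speaker 0
# starts 0.5 s before speaker 1 has finished.
conversation = arrange([2.0, 3.0, 1.0], [(1, 0.5), (0, -0.5)])
silence_ratio, overlap_ratio = silence_overlap_ratios(conversation)
```

Computing the same two ratios on the reference annotations of a real corpus would give a simple way to check how closely a simulated dataset matches it, which is the kind of comparison the abstract reports.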

research
07/24/2019

Cross-Attention End-to-End ASR for Two-Party Conversations

We present an end-to-end speech recognition model that learns interactio...
research
06/20/2021

Encoder-Decoder Based Attractor Calculation for End-to-End Neural Diarization

This paper investigates an end-to-end neural diarization (EEND) method f...
research
04/02/2022

From Simulated Mixtures to Simulated Conversations as Training Data for End-to-End Neural Diarization

End-to-end neural diarization (EEND) is nowadays one of the most promine...
research
11/12/2022

Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization

End-to-end diarization presents an attractive alternative to standard ca...
research
05/29/2023

An Experimental Review of Speaker Diarization methods with application to Two-Speaker Conversational Telephone Speech recordings

We performed an experimental review of current diarization systems for t...
research
06/03/2020

Improving Speaker Identification using Network Knowledge in Criminal Conversational Data

Criminal investigations rely on the collection of conversational data. T...
research
06/24/2023

Improving End-to-End Neural Diarization Using Conversational Summary Representations

Speaker diarization is a task concerned with partitioning an audio recor...
