Implicit spoken language diarization

06/22/2023
by Jagabandhu Mishra, et al.

Spoken language diarization (LD) and related tasks are mostly explored using the phonotactic approach. Phonotactic approaches mostly model language explicitly and hence require intermediate phoneme modeling and transcribed data. Alternatively, the ability of deep learning approaches to model temporal dynamics may enable implicit modeling of language information through deep embedding vectors. Hence, this work first explores the available speaker diarization frameworks, which capture speaker information implicitly, for performing LD tasks. The performance of the LD system on synthetic code-switch data using the end-to-end x-vector approach is 6.78 for practical data is 22.50 Jaccard error rate (JER), respectively. The performance degradation is attributed to data imbalance and is resolved to some extent by using pre-trained wave2vec embeddings, which provide a relative improvement of 30.74
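Since the abstract reports results in terms of JER, a minimal sketch of how a Jaccard error rate can be computed may help. It is a simplified frame-level illustration, not the authors' evaluation code. It assumes that the reference and hypothesis are equal-length sequences of per-frame language labels, and that labels are matched by name (real diarization scoring tools usually work on timed segments and search for an optimal label mapping). The function name `jaccard_error_rate` and the labels `"hi"` and `"en"` are hypothetical.

```python
def jaccard_error_rate(reference, hypothesis):
    """Frame-level Jaccard error rate in percent (simplified sketch).

    Assumption: both inputs are equal-length sequences of per-frame
    language labels, and a label means the same language in both.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    languages = sorted(set(reference))
    if not languages:
        raise ValueError("reference must contain at least one frame")

    per_language_errors = []
    for lang in languages:
        ref_frames = {i for i, label in enumerate(reference) if label == lang}
        hyp_frames = {i for i, label in enumerate(hypothesis) if label == lang}
        union = ref_frames | hyp_frames          # never empty: lang occurs in reference
        intersection = ref_frames & hyp_frames
        # Error for this language is 1 minus the Jaccard index.
        per_language_errors.append(1.0 - len(intersection) / len(union))

    # Average over the languages present in the reference.
    return 100.0 * sum(per_language_errors) / len(per_language_errors)


# Hypothetical 10-frame utterance with one language switch.
reference = ["hi"] * 6 + ["en"] * 4
hypothesis = ["hi"] * 5 + ["en"] * 5   # switch detected one frame early
jer = jaccard_error_rate(reference, hypothesis)
```

Because the error is averaged per language rather than per frame, a language that occupies only a small share of the utterance weighs as much as the dominant one, which makes this kind of metric sensitive to errors on the minority language.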

research
08/21/2023

Implicit Self-supervised Language Representation for Spoken Language Diarization

In a code-switched (CS) scenario, the use of spoken language diarization...
research
05/24/2023

LMs with a Voice: Spoken Language Modeling beyond Speech Tokens

We present SPECTRON, a novel approach to adapting pre-trained language m...
research
09/23/2020

The importance of fillers for text representations of speech transcripts

While being an essential component of spoken language, fillers (e.g."um"...
research
02/10/2023

Spoken language change detection inspired by speaker change detection

Spoken language change detection (LCD) refers to identifying the languag...
research
09/30/2019

Non-native Speaker Verification for Spoken Language Assessment

Automatic spoken language assessment systems are becoming more popular i...
research
07/19/2020

Meta-learning with Latent Space Clustering in Generative Adversarial Network for Speaker Diarization

The performance of most speaker diarization systems with x-vector embedd...
research
09/07/2021

Text-Free Prosody-Aware Generative Spoken Language Modeling

Speech pre-training has primarily demonstrated efficacy on classificatio...
