Multimodal Modeling For Spoken Language Identification

09/19/2023
by Shikhar Bharadwaj, et al.

Spoken language identification is the task of automatically predicting the language spoken in a given utterance. Conventionally, it is modeled as a speech-based language identification task. Prior techniques have been constrained to a single modality; however, in the case of video data, there is a wealth of other metadata that may be beneficial for this task. In this work, we propose MuSeLI, a Multimodal Spoken Language Identification method, which explores the use of various metadata sources to enhance language identification. Our study reveals that metadata such as video title, description, and geographic location provide substantial information for identifying the spoken language of a multimedia recording. We conduct experiments using two diverse public datasets of YouTube videos and obtain state-of-the-art results on the language identification task. We additionally conduct an ablation study that describes the distinct contribution of each modality to language recognition.

research
10/22/2020

Rediscovering the Slavic Continuum in Representations Emerging from Neural Models of Spoken Language Identification

Deep neural networks have been employed for various spoken language reco...
research
10/14/2021

Speech Toxicity Analysis: A New Spoken Language Processing Task

Toxic speech, also known as hate speech, is regarded as one of the cruci...
research
06/07/2021

SIGTYP 2021 Shared Task: Robust Spoken Language Identification

While language identification is a fundamental speech and language proce...
research
03/02/2021

Listen, Read, and Identify: Multimodal Singing Language Identification of Music

We propose a multimodal singing language classification model that uses ...
research
09/14/2023

CiwaGAN: Articulatory information exchange

Humans encode information into sounds by controlling articulators and de...
research
12/15/2022

You were saying? – Spoken Language in the V3C Dataset

This paper presents an analysis of the distribution of spoken language i...
research
02/27/2023

Language identification as improvement for lip-based biometric visual systems

Language has always been one of humanity's defining characteristics. Vis...
