Listen, Read, and Identify: Multimodal Singing Language Identification of Music

03/02/2021
by   Keunwoo Choi, et al.
0

We propose a multimodal singing language classification model that uses both audio content and textual metadata. LRID-Net, the proposed model, takes an audio signal and a language probability vector estimated from the metadata and outputs the probabilities of the target languages. Optionally, LRID-Net is facilitated with modality dropouts to handle a missing modality. In the experiment, we trained several LRID-Nets with varying modality dropout configuration and test them with various combinations of input modalities. The experiment results demonstrate that using multimodal input improves the performance. The results also suggest that adopting modality dropout does not degrade performance of the model when there are full modality inputs while enabling the model to handle missing modality cases to some extent.

READ FULL TEXT

page 4

page 5

research
03/06/2023

Multimodal Prompting with Missing Modalities for Visual Recognition

In this paper, we tackle two challenges in multimodal learning for visua...
research
07/26/2023

Visual Prompt Flexible-Modal Face Anti-Spoofing

Recently, vision transformer based multimodal learning methods have been...
research
11/13/2015

Symbol Grounding Association in Multimodal Sequences with Missing Elements

In this paper, we extend a symbolic association framework for being able...
research
10/19/2019

Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual Zeroshot Classification and Retrieval of Videos

We present an audio-visual multimodal approach for the task of zeroshot ...
research
09/19/2023

Multimodal Modeling For Spoken Language Identification

Spoken language identification refers to the task of automatically predi...
research
08/19/2019

A unified representation network for segmentation with missing modalities

Over the last few years machine learning has demonstrated groundbreaking...
research
03/12/2021

Orthogonal Statistical Inference for Multimodal Data Analysis

Multimodal imaging has transformed neuroscience research. While it prese...

Please sign up or login with your details

Forgot password? Click here to reset