Automatically Identifying Language Family from Acoustic Examples in Low Resource Scenarios

12/01/2020
by Peter Wu, et al.

Existing multilingual speech NLP work focuses on a relatively small subset of languages, so current linguistic understanding of languages stems predominantly from classical approaches. In this work, we propose a method for analyzing language similarity with deep learning. Namely, we train a model on the Wilderness dataset and investigate how its latent space compares with language families established by classical methods. Our approach provides a new direction for cross-lingual data augmentation in any speech-based NLP task.
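The abstract does not say how the latent space is compared with classical language families. One simple way to make such a comparison concrete is sketched below, under stated assumptions: each utterance has already been mapped to a fixed-length embedding by some trained model, each language is summarized by the mean (centroid) of its utterance embeddings, and we measure how often a language's nearest neighbor by cosine similarity belongs to the same family. The function names, the centroid summary, and the nearest-neighbor agreement score are all hypothetical illustrations, not the authors' method, and the toy vectors are invented for demonstration only.

```python
import math
from collections import defaultdict


def centroid(vectors):
    # Component-wise mean of a list of equal-length embedding vectors.
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def cosine(a, b):
    # Cosine similarity between two non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def language_centroids(utterance_embeddings):
    # Hypothetical input: a list of (language_code, embedding) pairs.
    grouped = defaultdict(list)
    for lang, vec in utterance_embeddings:
        grouped[lang].append(vec)
    return {lang: centroid(vecs) for lang, vecs in grouped.items()}


def family_agreement(centroids, family_of):
    # Fraction of languages whose most similar other language
    # (by centroid cosine similarity) is in the same language family.
    langs = list(centroids)
    hits = 0
    for lang in langs:
        nearest = max(
            (other for other in langs if other != lang),
            key=lambda other: cosine(centroids[lang], centroids[other]),
        )
        if family_of[lang] == family_of[nearest]:
            hits += 1
    return hits / len(langs)


# Invented toy embeddings (not data from the paper).
TOY_EMBEDDINGS = [
    ("spa", [1.0, 0.1]),
    ("spa", [0.9, 0.2]),
    ("ita", [0.95, 0.15]),
    ("fin", [0.1, 1.0]),
    ("est", [0.2, 0.9]),
]
TOY_FAMILIES = {
    "spa": "Indo-European",
    "ita": "Indo-European",
    "fin": "Uralic",
    "est": "Uralic",
}

if __name__ == "__main__":
    cents = language_centroids(TOY_EMBEDDINGS)
    print(family_agreement(cents, TOY_FAMILIES))  # prints 1.0
```

A score near 1.0 would mean that nearest neighbors in the embedding space mostly agree with the classical family labels. Richer comparisons, such as hierarchical clustering of the centroids checked against a family tree, follow the same pattern.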


