Low-Resource Contextual Topic Identification on Speech

07/17/2018
by Chunxi Liu et al.

In topic identification (topic ID) on real-world unstructured audio, an audio instance containing a variable number of topic shifts is first broken into sequential segments, and each segment is classified independently. We first present a general-purpose method for topic ID on spoken segments in low-resource languages, using a cascade of universal acoustic modeling, translation lexicons into English, and English-language topic classification. Next, instead of classifying each segment independently, we demonstrate that exploiting the contextual dependencies across sequential segments can provide large improvements. In particular, we propose an attention-based contextual model that leverages the contexts in a selective manner. We test both our contextual and non-contextual models on four LORELEI languages, and on all but one of them our attention-based contextual model significantly outperforms the context-independent models.
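To make the idea of selective context concrete, the sketch below shows one plausible form such an attention-based contextual classifier could take: each segment attends over the other segments of the same audio document, and the attended context is concatenated with the segment's own features before topic classification. This is a minimal illustration rather than the paper's implementation; the module name, the additive attention scorer, the diagonal masking, and the 128-dimensional segment features (e.g., a bag of translated English words produced by the cascade) are all assumptions.

# A minimal sketch (not the authors' implementation) of an attention-based
# contextual topic classifier over sequential segment representations.
import torch
import torch.nn as nn

class ContextualTopicClassifier(nn.Module):
    def __init__(self, feat_dim: int, hidden_dim: int, num_topics: int):
        super().__init__()
        # Additive (Bahdanau-style) attention scorer over sibling segments
        # (an assumption; the paper's exact scoring function may differ).
        self.query = nn.Linear(feat_dim, hidden_dim)
        self.key = nn.Linear(feat_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)
        # The classifier sees the segment itself plus its attended context.
        self.classifier = nn.Linear(2 * feat_dim, num_topics)

    def forward(self, segments: torch.Tensor) -> torch.Tensor:
        # segments: (num_segments, feat_dim), all from one audio document.
        n = segments.size(0)
        q = self.query(segments)                                  # (n, hidden)
        k = self.key(segments)                                    # (n, hidden)
        # Pairwise scores: relevance of segment j as context for segment i.
        scores = self.score(torch.tanh(q.unsqueeze(1) + k.unsqueeze(0))).squeeze(-1)  # (n, n)
        # Mask the diagonal so a segment attends only to its siblings.
        scores = scores.masked_fill(torch.eye(n, dtype=torch.bool), float("-inf"))
        weights = torch.softmax(scores, dim=-1)                   # (n, n)
        context = weights @ segments                              # (n, feat_dim)
        return self.classifier(torch.cat([segments, context], dim=-1))  # (n, num_topics) logits

# Usage: 12 segments with hypothetical 128-dim features; random inputs only verify shapes.
model = ContextualTopicClassifier(feat_dim=128, hidden_dim=64, num_topics=10)
logits = model(torch.randn(12, 128))
print(logits.shape)  # torch.Size([12, 10])

Masking the diagonal forces each segment to draw its context only from the other segments of the same document, which is one simple way to realize the selective use of context described above.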

Related research

07/19/2017 · Language Transfer of Audio Word2Vec: Learning Audio Segment Representations without Target Language Data
Audio Word2Vec offers vector representations of fixed dimensionality for...

02/23/2018 · The JHU Speech LOREHLT 2017 System: Cross-Language Transfer for Situation-Frame Detection
We describe the system our team used during NIST's LoReHLT (Low Resource...

07/03/2023 · Multilingual Contextual Adapters To Improve Custom Word Recognition In Low-resource Languages
Connectionist Temporal Classification (CTC) models are popular for their...

08/14/2021 · Findings of the LoResMT 2021 Shared Task on COVID and Sign Language for Low-resource Languages
We present the findings of the LoResMT 2021 shared task which focuses on...

10/09/2019 · Spoken Language Identification using ConvNets
Language Identification (LI) is an important first step in several speec...

06/05/2023 · BeAts: Bengali Speech Acts Recognition using Multimodal Attention Fusion
Spoken languages often utilise intonation, rhythm, intensity, and struct...
