Attentive Temporal Pooling for Conformer-based Streaming Language Identification in Long-form Speech

02/24/2022
by Quan Wang, et al.

In this paper, we introduce a novel language identification system based on conformer layers. We propose an attentive temporal pooling mechanism that lets the model carry information across long-form audio in a recurrent form, so that inference can be performed in a streaming fashion. Additionally, we introduce a simple domain adaptation mechanism for adapting an existing language identification model to a new domain where the prior language distribution is different. We perform a comparative study of different model topologies under different model size constraints, and find that conformer-based models outperform LSTM-based and transformer-based models. Our experiments also show that attentive temporal pooling and domain adaptation significantly improve model accuracy.
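The recurrent form of attentive temporal pooling can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulation, so everything here is an assumption for illustration: the pooled vector is taken to be a softmax-weighted average of frame embeddings, the per-frame attention scores are supplied by the caller rather than computed by a learned layer, and the names StreamingAttentivePool and batch_attentive_pool are hypothetical. The point is only that a running numerator and denominator are enough state to update the pooled output one frame at a time.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class StreamingAttentivePool:
    """Running softmax-weighted average of frame embeddings (hypothetical helper)."""

    dim: int
    max_score: float = -math.inf  # largest score seen so far, for numerical stability
    denom: float = 0.0            # running sum of exp(score - max_score)
    numer: List[float] = field(default_factory=list)  # running weighted sum of frames

    def __post_init__(self) -> None:
        self.numer = [0.0] * self.dim

    def update(self, frame: Sequence[float], score: float) -> List[float]:
        """Consume one frame and its attention score; return the pooled vector so far."""
        new_max = max(self.max_score, score)
        # Rescale the old accumulators so they are expressed relative to new_max.
        scale = math.exp(self.max_score - new_max) if self.denom > 0.0 else 0.0
        weight = math.exp(score - new_max)
        self.numer = [scale * n + weight * x for n, x in zip(self.numer, frame)]
        self.denom = scale * self.denom + weight
        self.max_score = new_max
        return [n / self.denom for n in self.numer]


def batch_attentive_pool(frames: Sequence[Sequence[float]],
                         scores: Sequence[float]) -> List[float]:
    """Non-streaming reference: softmax over all scores, then a weighted average."""
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * f[d] for w, f in zip(weights, frames)) / total
            for d in range(len(frames[0]))]


# Usage: feed frames one at a time; the last streaming output matches the batch result.
frames = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
scores = [0.1, 1.5, -0.3]
pool = StreamingAttentivePool(dim=2)
for frame, score in zip(frames, scores):
    pooled = pool.update(frame, score)
```

Because the state is a fixed-size pair of accumulators, memory use in this sketch does not grow with the length of the audio, which is the property that makes streaming inference over long-form speech practical.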


