Domain Attentive Fusion for End-to-end Dialect Identification with Unknown Target Domain

by Suwon Shon, et al.

End-to-end deep learning language or dialect identification systems operate on a spectrogram or other acoustic features and directly generate identification scores for each class. An important issue for end-to-end systems is to have some knowledge of the application domain, because the system can be vulnerable to use cases that were not seen in the training phase; such a scenario is often referred to as a domain mismatched condition. In general, we assume that there is enough variation in the training dataset to expose the system to multiple domains. In this work, we study how best to make use of a training dataset in order to have maximum effectiveness on unknown target domains. Our goal is to process the input without any knowledge of the target domain while preserving robust performance on other domains as well. To accomplish this objective, we propose a domain attentive fusion approach for end-to-end dialect/language identification systems. To help with experimentation, we collect a dataset from three different domains and create experimental protocols for a domain mismatched condition. The results of our proposed approach, tested on a variety of broadcast and YouTube data, show a significant performance gain compared to traditional approaches, even without any prior target domain information.




