A Novel Learnable Dictionary Encoding Layer for End-to-End Language Identification

04/02/2018
by Weicheng Cai, et al.

A novel learnable dictionary encoding layer is proposed in this paper for end-to-end language identification. It is in line with the conventional GMM i-vector approach both theoretically and practically. We imitate the mechanism of traditional GMM training and supervector encoding on top of a CNN. The proposed layer accumulates high-order statistics from variable-length input sequences and generates a fixed-dimensional, utterance-level vector representation. Unlike conventional methods, our approach provides an end-to-end learning framework in which the inherent dictionaries are learned directly from the loss function. The dictionaries and the encoding representation for the classifier are learned jointly. The representation is orderless and therefore appropriate for language identification. We conducted a preliminary experiment on the NIST LRE07 closed-set task, and the results show that the proposed dictionary encoding layer achieves a significant error reduction compared with simple average pooling.
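
The abstract does not spell out the layer's equations, so the following is only a minimal NumPy sketch, assuming a Deep-TEN-style residual encoder: each frame x_t is softly assigned to dictionary components mu_c with weights given by a softmax over -s_c * ||x_t - mu_c||^2 (the smoothing factors s_c are an assumption here), and each component accumulates weighted, normalised residuals, much like centred first-order statistics in a GMM supervector. The function names `lde_encode` and `average_pool` are hypothetical. Only the forward pass is shown; in an end-to-end system mu and s would be trained by backpropagation together with the CNN and the classifier.

```python
import numpy as np

def lde_encode(x, mu, s, eps=1e-8):
    """Dictionary encoding forward pass (illustrative sketch, not the paper's code).

    x:  (T, D) frame-level features; T may differ between utterances
    mu: (C, D) dictionary components (learnable in a real model)
    s:  (C,)   smoothing factors (learnable in a real model)
    returns a fixed-size (C * D,) utterance-level vector
    """
    # Residual between every frame and every component: shape (T, C, D)
    r = x[:, None, :] - mu[None, :, :]
    # Soft assignment: softmax over components of -s_c * ||r_tc||^2, shape (T, C)
    logits = -s[None, :] * np.sum(r ** 2, axis=-1)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)            # each row sums to 1
    # Weighted residual sum per component, normalised by its total weight
    # (eps guards against a component receiving no weight at all)
    e = np.einsum("tc,tcd->cd", w, r) / (w.sum(axis=0)[:, None] + eps)
    return e.reshape(-1)                         # concatenate the C residual vectors

def average_pool(x):
    """Baseline: temporal average pooling to a fixed (D,) vector."""
    return x.mean(axis=0)

# Utterances of different lengths map to vectors of the same size.
rng = np.random.default_rng(0)
mu = rng.standard_normal((4, 3))
s = np.ones(4)
x_short = rng.standard_normal((50, 3))
x_long = rng.standard_normal((200, 3))
print(lde_encode(x_short, mu, s).shape, lde_encode(x_long, mu, s).shape)  # (12,) (12,)
```

Because the frames enter only through sums over t, shuffling them leaves the output unchanged (up to floating-point rounding), which is the orderless property the abstract mentions. In this sketch, a dictionary with a single component at the origin reduces exactly (up to eps) to average pooling, which is one way to see the layer as a generalisation of that baseline.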

04/02/2018

Insights into End-to-End Learning Scheme for Language Identification

A novel interpretable end-to-end learning scheme for language identifica...
04/14/2018

Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System

In this paper, we explore the encoding/pooling layer and loss function i...
12/08/2016

Deep TEN: Texture Encoding Network

We propose a Deep Texture Encoding Network (Deep-TEN) with a novel Encod...
09/09/2018

End-to-end Language Identification using NetFV and NetVLAD

In this paper, we apply the NetFV and NetVLAD layers for the end-to-end ...
02/20/2019

Utterance-level end-to-end language identification using attention-based CNN-BLSTM

In this paper, we present an end-to-end language identification framewor...
06/19/2019

Spatial Pyramid Encoding with Convex Length Normalization for Text-Independent Speaker Verification

In this paper, we propose a new pooling method called spatial pyramid en...
04/25/2018

Learnable Histogram: Statistical Context Features for Deep Neural Networks

Statistical features, such as histogram, Bag-of-Words (BoW) and Fisher V...