Enhancing Chinese Intent Classification by Dynamically Integrating Character Features into Word Embeddings with Ensemble Techniques

05/23/2018
by Ruixi Lin, et al.

Intent classification has been widely researched on English data with deep learning approaches based on neural networks and word embeddings. The challenge for Chinese intent classification stems from the fact that, unlike English, where most words are made up of the 26 letters of a phonological alphabet, Chinese is logographic: a Chinese character is a more basic semantic unit that can be informative on its own, and its meaning does not vary much across contexts. Chinese word embeddings alone can be inadequate for representing words, and pre-trained embeddings may not align well with the task at hand. To address this inadequacy and leverage Chinese character information, we propose a low-effort, generic way to dynamically integrate character-embedding-based feature maps with word-embedding-based inputs; the resulting word-character embeddings are stacked with a contextual information extraction module to further incorporate context information for prediction. On top of the proposed model, we employ an ensemble method to combine single models and obtain the final result. The approach is data-independent and does not rely on external sources such as pre-trained word embeddings. The proposed model outperforms baseline models and existing methods.
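The word-character integration and ensembling described above can be illustrated with a minimal pure-Python sketch. The abstract does not specify the exact operators, so several choices here are assumptions: character features come from a max-pooled 1-D convolution with ReLU over each word's character embeddings, the feature map is concatenated with the word vector (the paper may combine them differently), and the ensemble is a simple majority vote over single-model predictions. The function names and toy embeddings are hypothetical, and the contextual information extraction module and the classifier itself are omitted.

```python
from collections import Counter
from typing import Dict, List, Sequence

Vector = List[float]


def char_feature_map(char_vecs: Sequence[Vector], filters: Sequence[Sequence[Vector]],
                     width: int) -> Vector:
    """Max-pooled 1-D convolution (with ReLU) over one word's character embeddings.

    Each filter is `width` weight vectors; the result has one feature per filter.
    Words shorter than `width` are right-padded with zero vectors.
    """
    dim = len(char_vecs[0])
    padded = list(char_vecs) + [[0.0] * dim for _ in range(max(0, width - len(char_vecs)))]
    features = []
    for filt in filters:
        best = float("-inf")
        for start in range(len(padded) - width + 1):
            # Dot product of the filter with the current window of characters.
            s = sum(w * x
                    for k in range(width)
                    for w, x in zip(filt[k], padded[start + k]))
            best = max(best, max(0.0, s))  # ReLU, then max-pool over windows
        features.append(best)
    return features


def word_char_embedding(word: str, word_emb: Dict[str, Vector],
                        char_emb: Dict[str, Vector],
                        filters: Sequence[Sequence[Vector]], width: int) -> Vector:
    """ASSUMPTION: integrate by concatenating the word vector with the char feature map."""
    chars = [char_emb[c] for c in word]
    return list(word_emb[word]) + char_feature_map(chars, filters, width)


def majority_vote(predictions: Sequence[str]) -> str:
    """ASSUMPTION: ensemble single models by majority vote over predicted intent labels.

    Ties go to the label seen first (Counter.most_common keeps first-encountered order).
    """
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy embeddings for the word 订票 ("book tickets").
    char_emb = {"订": [1.0, 0.0], "票": [0.0, 1.0]}
    word_emb = {"订票": [0.5, 0.5, 0.5]}
    filters = [[[1.0, 0.0], [0.0, 1.0]]]  # a single width-2 filter
    print(word_char_embedding("订票", word_emb, char_emb, filters, width=2))  # [0.5, 0.5, 0.5, 2.0]
    print(majority_vote(["flight", "train", "flight"]))  # flight
```

Here the embeddings and filters are fixed toy values; in a trained model they would be learned parameters, and the word-character vectors would feed the contextual module before classification.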

research
08/26/2015

Component-Enhanced Chinese Character Embeddings

Distributed word representations are very useful for capturing semantic ...
research
11/07/2019

Enhancing Pre-trained Chinese Character Representation with Word-aligned Attention

Most Chinese pre-trained encoders take a character as a basic unit and l...
research
11/13/2017

Convolutional Neural Network with Word Embeddings for Chinese Word Segmentation

Character-based sequence labeling framework is flexible and efficient fo...
research
11/18/2016

Word and Document Embeddings based on Neural Network Approaches

Data representation is a fundamental task in machine learning. The repre...
research
06/03/2019

Chinese Embedding via Stroke and Glyph Information: A Dual-channel View

Recent studies have consistently given positive hints that morphology is...
research
11/25/2019

Chinese Spelling Error Detection Using a Fusion Lattice LSTM

Spelling error detection serves as a crucial preprocessing in many natur...
research
10/16/2018

Subword Semantic Hashing for Intent Classification on Small Datasets

In this paper, we introduce the use of Semantic Hashing as embedding for...