Word and Document Embeddings based on Neural Network Approaches

11/18/2016
by   Siwei Lai, et al.

Data representation is a fundamental task in machine learning, and the way data is represented affects the performance of the whole machine learning system. Historically, data representation has relied on feature engineering, with researchers designing better features for specific tasks. Recently, the rapid development of deep learning and representation learning has brought new inspiration to various domains. In natural language processing, the most widely used feature representation is the Bag-of-Words model. This model suffers from data sparsity and cannot preserve word order. Other features, such as part-of-speech tags or more complex syntactic features, are in most cases suited only to specific tasks. This thesis focuses on word representation and document representation. We compare the existing systems and present our new model. First, for generating word embeddings, we make comprehensive comparisons among existing word embedding models. In terms of theory, we work out the relationship between the two most important models, i.e., Skip-gram and GloVe. In our experiments, we analyze three key factors in generating word embeddings: the model construction, the training corpus, and the parameter design. We evaluate word embeddings on three types of tasks, which we argue cover the existing uses of word embeddings. Through theory and practical experiments, we present some guidelines for generating a good word embedding. Second, for Chinese character and word representation, we introduce the joint training of Chinese characters and words. ... Third, for document representation, we analyze the existing document representation models, including recursive NNs, recurrent NNs and convolutional NNs. We point out the drawbacks of these models and present our new model, the recurrent convolutional neural network. ...
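The word-order limitation of the Bag-of-Words model mentioned in the abstract can be seen in a small sketch. This example is not taken from the thesis: the two sentences, the whitespace tokenizer, and the helper name `bag_of_words` are assumptions chosen only for illustration.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Return a word-count vector over `vocabulary` (hypothetical helper).

    Tokenization is a plain lowercase whitespace split, which is an
    assumption made to keep the sketch short.
    """
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


# Two sentences with the same words in a different order.
sentences = ["dog bites man", "man bites dog"]

# Vocabulary built from the sentences themselves, sorted for a stable order.
vocabulary = sorted({word for s in sentences for word in s.lower().split()})

vectors = [bag_of_words(s, vocabulary) for s in sentences]

# Both sentences map to the same count vector, so the order is lost.
print(vocabulary)
print(vectors)
print(vectors[0] == vectors[1])
```

With a realistic vocabulary of tens of thousands of words, each such vector would also be almost entirely zeros, which is the sparsity problem the abstract refers to.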

research
02/23/2019

VCWE: Visual Character-Enhanced Word Embeddings

Chinese is a logographic writing system, and the shape of Chinese charac...
research
10/04/2016

Chinese Event Extraction Using DeepNeural Network with Word Embedding

A lot of prior work on event extraction has exploited a variety of featu...
research
11/11/2017

Learning Document Embeddings With CNNs

We propose a new model for unsupervised document embedding. Existing app...
research
04/20/2018

A Deep Representation Empowered Distant Supervision Paradigm for Clinical Information Extraction

Objective: To automatically create large labeled training datasets and r...
research
11/13/2017

Targeted Advertising Based on Browsing History

Audience interest, demography, purchase behavior and other possible clas...
research
05/23/2018

Enhancing Chinese Intent Classification by Dynamically Integrating Character Features into Word Embeddings with Ensemble Techniques

Intent classification has been widely researched on English data with de...
research
06/13/2019

Character n-gram Embeddings to Improve RNN Language Models

This paper proposes a novel Recurrent Neural Network (RNN) language mode...
