KeyVec: Key-semantics Preserving Document Representations

09/27/2017
by Bin Bi, et al.

Previous studies have demonstrated the empirical success of word embeddings in various applications. In this paper, we investigate the problem of learning distributed representations for text documents, which many machine learning algorithms take as input for a number of NLP tasks. We propose a neural network model, KeyVec, which learns document representations with the goal of preserving the key semantics of the input text. This enables the learned low-dimensional vectors to retain the topics and important information from the documents, which then flow to downstream tasks. Our empirical evaluations show the superior quality of KeyVec representations in two different document understanding tasks.

