Improve Document Embedding for Text Categorization Through Deep Siamese Neural Network

05/31/2020
by Erfaneh Gharavi, et al.

Due to the growing amount of data on the internet, finding a highly informative, low-dimensional representation of text is one of the main challenges for efficient natural language processing tasks, including text classification. Such a representation should capture the semantic information of the text while retaining its relevance to document classification, so that documents with similar topics are mapped to nearby regions of the vector space. To obtain representations for large texts, we propose using deep Siamese neural networks, which jointly learn document representations so that each document's topical relevance is embedded in the distributed representation. Our Siamese network consists of two multi-layer perceptron sub-networks. We evaluate the representation on the text categorization task using the BBC news dataset. The results show that the proposed representations outperform conventional and state-of-the-art representations for text classification on this dataset.
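To make the architecture in the abstract concrete, the following is a minimal sketch of a Siamese pair of multi-layer perceptrons, not the authors' implementation. The abstract does not state the input features, layer sizes, activation, similarity measure, or training loss, so everything here is an assumption chosen for illustration: the two sub-networks share one set of weights (the usual Siamese convention), activations are tanh, similarity is cosine, and the loss is a simple contrastive loss with a hypothetical `margin`. All function names are hypothetical.

```python
import math
import random


def init_mlp(sizes, seed=0):
    """Create weights for a small MLP. `sizes` (e.g. [in, hidden, out]) is an assumption."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def embed(layers, x):
    """Forward pass of one sub-network: document features -> low-dimensional embedding."""
    h = x
    for weights, biases in layers:
        h = [math.tanh(sum(w * v for w, v in zip(row, h)) + b)
             for row, b in zip(weights, biases)]
    return h


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def siamese_similarity(layers, x1, x2):
    """Both documents pass through the SAME weights (assumed weight sharing)."""
    return cosine(embed(layers, x1), embed(layers, x2))


def contrastive_loss(sim, same_topic, margin=0.5):
    """Assumed loss: same-topic pairs are pulled toward similarity 1,
    different-topic pairs are penalized only when similarity exceeds `margin`."""
    if same_topic:
        return (1.0 - sim) ** 2
    return max(0.0, sim - margin) ** 2
```

In use, a pair of document feature vectors (for example, bag-of-words or averaged word vectors, both assumptions here) would be passed to `siamese_similarity`, and `contrastive_loss` would be minimized over labeled same-topic and different-topic pairs. After training, `embed` alone would supply the document representation for a downstream classifier.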

research
04/22/2020

Every Document Owns Its Structure: Inductive Text Classification via Graph Neural Networks

Text classification is fundamental in natural language processing (NLP),...
research
05/14/2016

Rationale-Augmented Convolutional Neural Networks for Text Classification

We present a new Convolutional Neural Network (CNN) model for text class...
research
03/04/2020

SeMemNN: A Semantic Matrix-Based Memory Neural Network for Text Classification

Text categorization is the task of assigning labels to documents written...
research
10/14/2021

Compressibility of Distributed Document Representations

Contemporary natural language processing (NLP) revolves around learning ...
research
12/23/2016

"What is Relevant in a Text Document?": An Interpretable Machine Learning Approach

Text documents can be described by a number of abstract concepts such as...
research
11/08/2018

Doc2Im: document to image conversion through self-attentive embedding

Text classification is a fundamental task in NLP applications. Latest re...
research
02/21/2018

Matching Long Text Documents via Graph Convolutional Networks

Identifying the relationship between two text objects is a core research...
