An Intelligent CNN-VAE Text Representation Technology Based on Text Semantics for Comprehensive Big Data

08/28/2020
by   Genggeng Liu, et al.
2

In the era of big data, a large number of text data generated by the Internet has given birth to a variety of text representation methods. In natural language processing (NLP), text representation transforms text into vectors that can be processed by computer without losing the original semantic information. However, these methods are difficult to effectively extract the semantic features among words and distinguish polysemy in language. Therefore, a text feature representation model based on convolutional neural network (CNN) and variational autoencoder (VAE) is proposed to extract the text features and apply the obtained text feature representation on the text classification tasks. CNN is used to extract the features of text vector to get the semantics among words and VAE is introduced to make the text feature space more consistent with Gaussian distribution. In addition, the output of the improved word2vec model is employed as the input of the proposed model to distinguish different meanings of the same word in different contexts. The experimental results show that the proposed model outperforms in k-nearest neighbor (KNN), random forest (RF) and support vector machine (SVM) classification algorithms.

READ FULL TEXT

page 1

page 5

page 7

page 10

research
04/02/2019

Short Text Classification Improved by Feature Space Extension

With the explosive development of mobile Internet, short text has been a...
research
07/02/2021

Transformer-F: A Transformer network with effective methods for learning universal sentence representation

The Transformer model is widely used in natural language processing for ...
research
10/18/2018

TS-CNN: Text Steganalysis from Semantic Space Based on Convolutional Neural Network

Steganalysis has been an important research topic in cybersecurity that ...
research
08/21/2023

Feature Extraction Using Deep Generative Models for Bangla Text Classification on a New Comprehensive Dataset

The selection of features for text classification is a fundamental task ...
research
07/21/2020

Human Abnormality Detection Based on Bengali Text

In the field of natural language processing and human-computer interacti...
research
01/12/2019

A Speech Act Classifier for Persian Texts and its Application in Identify Speech Act of Rumors

Speech Acts (SAs) are one of the important areas of pragmatics, which gi...
research
06/01/2021

Distribution Matching for Rationalization

The task of rationalization aims to extract pieces of input text as rati...

Please sign up or login with your details

Forgot password? Click here to reset