Character-Based Text Classification using Top Down Semantic Model for Sentence Representation

05/29/2017
by Zhenzhou Wu, et al.

Despite the success of deep learning on many fronts, especially image and speech, its performance in text classification is often still no better than a simple linear SVM on an n-gram TF-IDF representation, especially on smaller datasets. Deep learning tends to emphasize sentence-level semantics when learning a representation with models such as recurrent or recursive neural networks; the success of TF-IDF, however, suggests that a bag-of-words type of representation has its own strengths. Taking advantage of both representations, we present a model called TDSM (Top Down Semantic Model) for extracting a sentence representation that considers both word-level semantics, by linearly combining the words with attention weights, and sentence-level semantics, with a BiLSTM, and we use it for text classification. We apply the model to characters, and our results show that it outperforms all the character-based and word-based convolutional neural network models of Zhang et al. (2015) across seven different datasets with only 1% of their parameters. We also demonstrate that this model beats traditional linear models on TF-IDF vectors on small, well-edited datasets such as news articles, where deep learning models typically fall short.
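To make the word-level half of this description concrete, here is a minimal sketch of a sentence vector formed as a linear combination of word vectors under attention weights. It is an illustration only, not the authors' implementation: the names `softmax` and `attention_pool`, the toy vectors, and the choice to pass the attention scores in as plain numbers are all assumptions. The abstract does not specify how the scores are produced or how the BiLSTM enters that computation, so that part is left out.

```python
import math


def softmax(scores):
    """Turn arbitrary real scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(word_vectors, scores):
    """Return (sentence_vector, weights).

    sentence_vector is the weighted sum of word_vectors, with weights
    given by a softmax over the (hypothetical) attention scores.
    """
    if len(word_vectors) != len(scores):
        raise ValueError("need exactly one score per word vector")
    weights = softmax(scores)
    dim = len(word_vectors[0])
    sentence = [0.0] * dim
    for w, vec in zip(weights, word_vectors):
        for i, x in enumerate(vec):
            sentence[i] += w * x
    return sentence, weights


# Toy example: three 2-dimensional word vectors with equal scores,
# so the result reduces to a plain average (bag-of-words style).
word_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 0.0]
sentence_vector, weights = attention_pool(word_vectors, scores)
```

With equal scores the pooling is an unweighted mean of the word vectors; unequal scores shift the sentence vector toward the words with higher scores, which is the sense in which the weights let sentence-level context modulate a bag-of-words style sum.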

research
09/04/2015

Character-level Convolutional Networks for Text Classification

This article offers an empirical exploration on the use of character-lev...
research
08/23/2021

Semantic-Preserving Adversarial Text Attacks

Deep neural networks (DNNs) are known to be vulnerable to adversarial im...
research
02/26/2019

Semantic Hilbert Space for Text Representation Learning

Capturing the meaning of sentences has long been a challenging task. Cur...
research
02/13/2018

Sentence Boundary Detection for French with Subword-Level Information Vectors and Convolutional Neural Networks

In this work we tackle the problem of sentence boundary detection applie...
research
06/24/2021

byteSteady: Fast Classification Using Byte-Level n-Gram Embeddings

This article introduces byteSteady – a fast model for classification usi...
research
04/17/2021

Attacking Text Classifiers via Sentence Rewriting Sampler

Most adversarial attack methods on text classification are designed to c...
research
05/02/2016

Compositional Sentence Representation from Character within Large Context Text

This paper describes a Hierarchical Composition Recurrent Network (HCRN)...
