On Tree-structured Multi-stage Principal Component Analysis (TMPCA) for Text Classification

07/22/2018
by Yuanhang Su, et al.

This paper proposes a novel sequence-to-vector (seq2vec) embedding method for text classification, called tree-structured multi-stage principal component analysis (TMPCA). Unlike conventional word-to-vector embedding methods, TMPCA performs dimension reduction at the sequence level and requires no labeled training data. It also preserves the sequential structure of the input sequences. We show mathematically that TMPCA is computationally efficient and that it facilitates sequence-based text classification by preserving strong mutual information between its input and output. Experimental results further demonstrate that a dense (fully connected) network trained on TMPCA-preprocessed data outperforms state-of-the-art fastText and other neural-network-based solutions.
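The abstract does not spell out the individual stages, so the following is only a minimal sketch of how a tree-structured, multi-stage PCA could reduce a sequence to a single vector. It assumes that the sequence length is a power of two, that each stage concatenates non-overlapping pairs of neighbouring vectors, and that a PCA fitted without labels projects each pair back to the original dimension. The names `fit_pca` and `tmpca_fit_transform` are hypothetical and are not taken from the paper.

```python
import numpy as np

def fit_pca(X, k):
    """Fit a PCA on X (n_samples, n_features) and keep the k leading components."""
    mean = X.mean(axis=0)
    # The covariance matrix is symmetric, so eigh is the appropriate solver.
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns eigenvalues in ascending order; pick the k largest.
    order = np.argsort(eigvals)[::-1][:k]
    return mean, eigvecs[:, order]

def tmpca_fit_transform(seqs):
    """Reduce seqs of shape (n_seqs, N, D) to (n_seqs, D); N must be a power of two.

    Assumed scheme: every stage halves the sequence length, giving log2(N) stages.
    """
    n, N, D = seqs.shape
    assert N & (N - 1) == 0, "sequence length must be a power of two"
    stages = []
    x = seqs
    while x.shape[1] > 1:
        # Concatenate each non-overlapping pair of neighbours:
        # (n, M, D) -> (n, M/2, 2D), then flatten to one row per pair.
        flat = x.reshape(n, x.shape[1] // 2, 2 * D).reshape(-1, 2 * D)
        mean, comps = fit_pca(flat, D)        # unsupervised: no labels used
        x = ((flat - mean) @ comps).reshape(n, -1, D)
        stages.append((mean, comps))
    return x[:, 0, :], stages

# Usage with random stand-ins for word embeddings (illustrative data only).
rng = np.random.default_rng(0)
embedded = rng.normal(size=(50, 8, 4))        # 50 sequences, 8 tokens, 4 dims
vectors, stages = tmpca_fit_transform(embedded)
```

Because each stage merges neighbours in order, the position of every token determines which branch of the tree it passes through, which is one way the sequential structure could be retained. The resulting fixed-length vectors could then be fed to a dense classifier.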

research
01/20/2018

Efficient Text Classification Using Tree-structured Multi-linear Principal Component Analysis

A novel text data dimension reduction technique, called the tree-structu...
research
11/05/2014

Multilinear Principal Component Analysis Network for Tensor Object Classification

The recently proposed principal component analysis network (PCANet) has ...
research
01/13/2019

Image retrieval method based on CNN and dimension reduction

An image retrieval method based on convolution neural network and dimens...
research
04/28/2021

Interpretable Embedding Procedure Knowledge Transfer via Stacked Principal Component Analysis and Graph Neural Network

Knowledge distillation (KD) is one of the most useful techniques for lig...
research
04/13/2020

ProFormer: Towards On-Device LSH Projection Based Transformers

At the heart of text based neural models lay word representations, which...
research
06/24/2021

byteSteady: Fast Classification Using Byte-Level n-Gram Embeddings

This article introduces byteSteady – a fast model for classification usi...
