Learning Interpretable and Discrete Representations with Adversarial Training for Unsupervised Text Classification

04/28/2020
by   Yau-Shian Wang, et al.

Learning continuous representations from unlabeled textual data has been increasingly studied as a way to benefit semi-supervised learning. Although discrete representations are easier to interpret, learning them from unlabeled textual data has not been widely explored because of the difficulty of training. This work proposes TIGAN, which learns to encode texts into two disentangled representations: a discrete code and a continuous noise, where the discrete code represents interpretable topics and the noise controls the variance within those topics. The discrete code learned by TIGAN can be used for unsupervised text classification. Compared to other unsupervised baselines, TIGAN achieves superior performance on six different corpora, and its performance is on par with a recently proposed weakly-supervised text classification method. The topical words extracted to represent the latent topics show that TIGAN learns coherent and highly interpretable topics.
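The abstract describes encoding each text into a discrete topic code plus a continuous noise vector. A minimal sketch of such a two-branch encoder is shown below; the projection matrices, dimensions, and the Gumbel-softmax relaxation (a common way to train discrete latents) are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, tau=0.5):
    """Sample a near-one-hot topic code via the Gumbel-softmax relaxation
    (an assumed stand-in for whatever discrete sampler TIGAN actually uses)."""
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0, 1) noise
    y = (logits + g) / tau
    e = np.exp(y - y.max())  # numerically stable softmax
    return e / e.sum()

# Hypothetical setup: a 32-dim text embedding, 6 latent topics, 8-dim noise.
emb = rng.normal(size=32)           # stand-in for a learned text embedding
W_topic = rng.normal(size=(6, 32))  # projection to topic logits
W_noise = rng.normal(size=(8, 32))  # projection to continuous noise

code = gumbel_softmax(W_topic @ emb)  # soft, near-one-hot topic assignment
topic = int(code.argmax())            # discrete topic: the classification label
noise = W_noise @ emb                 # continuous intra-topic variation
```

At inference time only the argmax of the discrete code is needed to assign a text to a topic, which is what makes the representation directly usable for unsupervised classification.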


Related research:

- 04/19/2023 · ESimCSE Unsupervised Contrastive Learning Jointly with UDA Semi-Supervised Learning for Large Label System Text Classification Mode — The challenges faced by text classification with large tag systems in na...
- 02/25/2019 · Efficient Path Prediction for Semi-Supervised and Weakly Supervised Hierarchical Text Classification — Hierarchical text classification has many real-world applications. Howev...
- 09/11/2018 · Topic Memory Networks for Short Text Classification — Many classification models work poorly on short texts due to data sparsi...
- 12/21/2022 · Text classification in shipping industry using unsupervised models and Transformer based supervised models — Obtaining labelled data in a particular context could be expensive and t...
- 09/15/2021 · Disentangling Generative Factors in Natural Language with Discrete Variational Autoencoders — The ability of learning disentangled representations represents a major ...
- 04/15/2021 · Consistency Training with Virtual Adversarial Discrete Perturbation — We propose an effective consistency training framework that enforces a t...
- 07/07/2011 · Text Classification: A Sequential Reading Approach — We propose to model the text classification process as a sequential deci...
