Semi-supervised Text Regression with Conditional Generative Adversarial Networks

10/02/2018 · by Tao Li, et al. · Purdue University, California Institute of Technology

Enormous online textual information provides intriguing opportunities for understanding social and economic semantics. In this paper, we propose a novel text regression model based on a conditional generative adversarial network (GAN), with an attempt to associate textual data with social outcomes in a semi-supervised manner. Besides its promising predictive capability, our model has two advantages: (i) it works with unbalanced datasets of limited labelled data, which aligns with real-world scenarios; and (ii) predictions are obtained in an end-to-end framework, without explicitly selecting high-level representations. Finally, we point out related datasets for experiments and future research directions.


I. Introduction

With millions of textual items uploaded every day, the Internet embeds tremendous data on social and economic phenomena and has attracted consistent interest not only from sociologists and economists but also from statisticians and computer scientists. For example, [1] forecasted movie revenues using online reviews; based on social media data, [2] monitored flu pandemics and [3] predicted election results.

To the best of our knowledge, the concept of text regression was first introduced by [4], who described it as: given a piece of text, predict a real-world continuous quantity associated with the text's meaning. They applied a linear model to estimate financial risks directly from financial reports and claimed a significant improvement over previous methods. Subsequently, several linear text regression models were proposed; to name a few: [5, 6, 7].

Although easy to interpret and implement, linear models rely heavily on specific selections of high-level textual representations and fail to properly capture complicated distributions. Recent successes of deep neural networks in the field of computer vision (e.g., [8] and [9]) encourage researchers to explore their potential in natural language processing. Unlike image synthesis, using deep networks for natural language generation (NLG) is notoriously difficult [10], as the feature space of a sentence is discrete and thereby discontinuous and non-differentiable. [11] attacked this issue by using one-hot vectors obtained from a softmax function for backpropagation. [12] used ranking scores instead of real/fake predictions in the objective function of the discriminator.
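
To make the discrete sampling step differentiable, the idea behind [11] is often approximated in practice with a straight-through estimator: the forward pass uses a hard one-hot vector while gradients flow through the underlying softmax probabilities. The Python sketch below illustrates this general technique; it is a hypothetical illustration, not the exact formulation of [11].

    import torch
    import torch.nn.functional as F

    def straight_through_onehot(logits, tau=1.0):
        # Soft distribution over the vocabulary (tau controls sharpness).
        probs = F.softmax(logits / tau, dim=-1)
        # Discrete choice; argmax itself is non-differentiable.
        index = probs.argmax(dim=-1, keepdim=True)
        hard = torch.zeros_like(probs).scatter_(-1, index, 1.0)
        # Forward pass yields the hard one-hot vector; the backward pass
        # propagates gradients through the soft probabilities.
        return hard - probs.detach() + probs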

Our idea of using GANs for text regression was inspired by recent advances in NLG (e.g., [13] and [14]). We further shift the focus from realistic language synthesis to the generation of adversarial samples from an LSTM [15], which competes against a discriminator for regression (see Figure 1). The performance of our model is underpinned by deep neural networks' power to capture complicated distributions, especially when trained in an adversarial manner. The capability of training with limited supervision also opens up promising future applications.

The rest of the paper is organized as follows: in Section II we discuss existing text regression techniques and previous work on semi-supervised learning with GANs; the model is detailed in Section III; we conclude the paper in Section IV with future work.

Fig. 1: Architecture of the TR-GAN model.

II. Related Work

II-A. Text Regression

Previous attempts at text regression mainly focused on linear models. [4] applied support vector regression (SVR) [16] to financial reports to predict the volatility of stock returns, a widely used measure of financial risk, and reported a significant improvement over the state of the art. To correlate movies' online reviews with the corresponding revenues, [1] extracted high-level features from textual reviews and incorporated them into an elastic net model [17]. [3] exploited a multi-task learning scheme that combines textual data with user profiles for voting intention prediction. As mentioned earlier, linear models are sometimes oversimplified and fail to properly capture real-world scenarios. [18] proposed the first non-linear model for text regression, a deep convolutional neural network, which surpassed the previous state of the art even with limited supervision.

II-B. Semi-supervised Learning

Semi-supervised learning tackles the problem of learning a mapping between data and labels when only a small subset of the labels is available. Earlier generative approaches to semi-supervised learning considered Gaussian mixture models [19] and non-parametric density models [20], but suffered from limited scalability and inference accuracy. More recently, [21] addressed this problem by developing stochastic variational inference algorithms for joint optimization of model and variational parameters.

Since generative adversarial networks (GANs) have been shown to be promising in generating realistic images [22], several approaches have been proposed to use GANs in semi-supervised learning. [23] extends the discriminator to a multi-class classifier whose objective minimizes prediction certainty on generated images, while the generator aims to maximize the same objective. [24] augments the class discriminator with an additional label that marks generated images as fake. These works have shown that incorporating adversarial objectives can make classifier learning robust and data efficient. While previous works mainly focus on the classification setting, in our work we extend GAN-based semi-supervised learning to the regression task.

III. The TR-GAN Model

In this section, we detail the conditional generative adversarial network for text regression in a semi-supervised setting (TR-GAN). We first introduce the word embedding method.

III-A. Word Embedding

Word embedding methods learn a high-dimensional representation for each word, thereby incorporating semantic information that cannot be captured by a single token. In our work, we adopt a pretrained word embedding for each word in the text input. Each document in the data can then be represented by an n × d matrix, where n is the number of words in the document and d is the dimension of the word embeddings in the pretrained model.
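
As a concrete illustration, the Python sketch below builds such an n × d document matrix from pretrained embeddings. The function names and the whitespace-separated GloVe file format are assumptions made for the example, not part of our specification.

    import numpy as np

    def load_glove(path):
        # Parse a whitespace-separated embedding file: "word v1 v2 ... vd".
        vectors = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip().split(" ")
                vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
        return vectors

    def document_matrix(tokens, vectors, d):
        # Out-of-vocabulary words fall back to a zero vector in this sketch.
        rows = [vectors.get(tok, np.zeros(d, dtype=np.float32)) for tok in tokens]
        return np.stack(rows)  # shape: (n, d)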

III-B. Model Architecture

As illustrated in Figure 1, the network architecture is a conditional GAN with a generator and a discriminator. A long short-term memory network (LSTM) [15] is deployed as the generator for natural language. Since word embeddings are fed into the LSTM, the generator acts as an LSTM-based sentence decoder. The discriminator is a convolutional neural network (CNN) [25] in which several residual blocks [26] are followed by batch normalization and a non-linear activation function. Two fully connected layers then produce the outputs for adversarial learning and the regression task, respectively.
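
For concreteness, the following PyTorch sketch shows one plausible instantiation of this architecture. All dimensions, the number of residual blocks, and the ReLU activation are illustrative assumptions rather than our exact configuration.

    import torch
    import torch.nn as nn

    class ResBlock(nn.Module):
        # A simple 1-D residual block: conv -> BN -> ReLU -> conv -> BN, plus skip.
        def __init__(self, channels):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv1d(channels, channels, kernel_size=5, padding=2),
                nn.BatchNorm1d(channels), nn.ReLU(),
                nn.Conv1d(channels, channels, kernel_size=5, padding=2),
                nn.BatchNorm1d(channels))
        def forward(self, x):
            return torch.relu(x + self.body(x))

    class Generator(nn.Module):
        # LSTM sentence decoder: a noise vector seeds the initial hidden state,
        # and the LSTM emits one embedding-sized vector per time step. (A full
        # decoder would feed back its own outputs autoregressively; zeros are
        # used as inputs here to keep the sketch short.)
        def __init__(self, noise_dim=100, hidden_dim=256, embed_dim=300, seq_len=40):
            super().__init__()
            self.seq_len, self.embed_dim = seq_len, embed_dim
            self.init_h = nn.Linear(noise_dim, hidden_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.to_embed = nn.Linear(hidden_dim, embed_dim)
        def forward(self, z):
            h0 = torch.tanh(self.init_h(z)).unsqueeze(0)
            c0 = torch.zeros_like(h0)
            start = torch.zeros(z.size(0), self.seq_len, self.embed_dim, device=z.device)
            out, _ = self.lstm(start, (h0, c0))
            return self.to_embed(out)  # (batch, seq_len, embed_dim)

    class Discriminator(nn.Module):
        # CNN over the n x d document matrix with two fully connected heads:
        # an adversarial (real/fake) logit and a regression output.
        def __init__(self, embed_dim=300, channels=128):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(embed_dim, channels, kernel_size=5, padding=2),
                ResBlock(channels), ResBlock(channels),
                nn.AdaptiveAvgPool1d(1))
            self.adv_head = nn.Linear(channels, 1)
            self.reg_head = nn.Linear(channels, 1)
        def forward(self, x):  # x: (batch, seq_len, embed_dim)
            feats = self.features(x.transpose(1, 2)).squeeze(-1)
            return self.adv_head(feats), self.reg_head(feats)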

The objective function adopts the mean absolute error (MAE) for the regression task and an adversarial loss for sequence generation. Not only can this model generate realistic sentences through the optimized generator, but the discriminator is also trained as a regression model for multiple prediction tasks (e.g., auto sales prediction, public opinion tracking, and even epidemiological surveillance from social media), which are of great interest to a wide range of stakeholders.
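
A minimal sketch of how the two loss terms might be combined during training follows. Since the exact form of the adversarial loss is not fixed above, the standard binary cross-entropy GAN objective and an illustrative regression weight are assumed.

    import torch
    import torch.nn.functional as F

    def discriminator_step(D, real_docs, real_labels, fake_docs, reg_weight=1.0):
        # Adversarial term: separate real documents from generated ones.
        adv_real, reg_real = D(real_docs)
        adv_fake, _ = D(fake_docs.detach())
        adv_loss = (
            F.binary_cross_entropy_with_logits(adv_real, torch.ones_like(adv_real))
            + F.binary_cross_entropy_with_logits(adv_fake, torch.zeros_like(adv_fake)))
        # Regression term: MAE (L1) on the labelled subset only, which is
        # what makes the setting semi-supervised.
        reg_loss = F.l1_loss(reg_real.squeeze(-1), real_labels)
        return adv_loss + reg_weight * reg_loss

    def generator_step(D, fake_docs):
        # Non-saturating generator objective: fool the adversarial head.
        adv_fake, _ = D(fake_docs)
        return F.binary_cross_entropy_with_logits(adv_fake, torch.ones_like(adv_fake))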

IV. Future Work

We are excited about the idea of using GANs for text regression. Given the nature of the TR-GAN model, it is not challenging to find an experimental dataset; for example, [27] collected 50,000 textual comments below YouTube videos, among which 20,000 are labelled by state-of-the-art algorithms and 1,000 are labelled manually. We are also interested in inspecting what the generated language looks like, given that the existing literature on using GANs for NLG tends to report numerical metrics rather than the generated samples themselves.

Acknowledgments

We thank Hao Peng and Kantapon Kaewtip for insightful discussions. The idea of this work originally came out during discussions of [28] and [29].

References