With millions of textual posts uploaded every day, the Internet embeds tremendous data on social and economic phenomena and has attracted consistent interest not only from sociologists and economists but also from statisticians and computer scientists. For example, prior work forecasted movie revenues using online reviews; based on social media data, researchers monitored flu pandemics and predicted election results.
To the best of our knowledge, the concept of text regression was first introduced in an earlier study, whose authors described it as: given a piece of text, predict a real-world continuous quantity associated with the text's meaning. They applied a linear model to estimate financial risks directly from financial reports and claimed a significant outperformance compared to previous methods. Subsequently, several linear text regression models were proposed; to name a few: [5, 6, 7].
Although easy to interpret and implement, linear models rely heavily on specific selections of high-level textual representations and fail to properly capture complicated distributions. Deep neural networks have seen recent successes in the field of computer vision, but generative adversarial networks cannot be applied to natural language directly, as the feature space of a sentence is discrete and thereby discontinuous and non-differentiable. One line of work attacked this issue by using one-hot vectors obtained from a softmax function for backpropagation; another used ranking scores instead of real/fake predictions as the objective function of the discriminator.
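The one-hot-via-softmax workaround is commonly realized with the Gumbel-softmax trick. The following numpy snippet is an illustrative sketch of that idea (not the cited authors' code): a temperature `tau` trades off between a smooth, differentiable sample and a nearly one-hot vector.

```python
import numpy as np

def gumbel_softmax(logits, tau, rng):
    """Draw a 'soft one-hot' sample from a categorical distribution
    parameterized by `logits`. As tau -> 0 the sample approaches a
    hard one-hot vector; larger tau gives smoother, more useful gradients."""
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + gumbel) / tau
    y = np.exp(y - y.max())          # numerically stable softmax
    return y / y.sum()

rng = np.random.default_rng(0)
logits = np.array([2.0, 0.5, 0.1])
soft = gumbel_softmax(logits, tau=5.0, rng=rng)   # smooth sample
hard = gumbel_softmax(logits, tau=0.1, rng=rng)   # nearly one-hot sample
```

Both samples live on the probability simplex, so downstream layers can treat them like (relaxed) word choices while gradients still flow through the softmax.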
Our idea of using GANs for text regression was inspired by recent advances in natural language generation (NLG). We further shift the focus from realistic language synthesis to the generation of adversarial samples from an LSTM, which competes against a discriminator for regression (see Figure 1). The performance of our model is supported by deep neural networks' power to capture complicated distributions, especially when trained in an adversarial manner. The capability of training with limited supervision also facilitates promising future applications.
II. Related Work
II-A. Text Regression
Previous attempts at text regression mainly focused on linear models. One study adopted support vector regression (SVR) on financial reports to predict the volatility of stock returns, a widely used measure of financial risk, and reported a significant outperformance compared to state-of-the-art methods. To correlate movies' online reviews with corresponding revenues, another extracted high-level features of textual reviews and incorporated them into an elastic net model. A third exploited a multi-task learning scheme that leverages textual data together with user profiles for voting-intention prediction. As mentioned earlier, linear models are sometimes oversimplified and fail to properly capture real-world scenarios.
Later work proposed the first non-linear model, a deep convolutional neural network, for text regression, which surpassed the previous state of the art even with limited supervision.
II-B. Semi-supervised Learning
Semi-supervised learning tackles the problem of learning a mapping between data and labels when only a small subset of the labels is available. Earlier approaches based on generative models consider Gaussian mixture models and non-parametric density models, but suffer from limitations in scalability and inference accuracy. More recently, this problem has been addressed by developing stochastic variational inference algorithms for joint optimization of model and variational parameters.
Since generative adversarial networks (GANs) have been shown to be promising in generating realistic images, several approaches have been proposed to use GANs in semi-supervised learning. One approach extends the discriminator to a multi-class classifier whose objective is to minimize prediction certainty on generated images, while the generator aims to maximize the same objective. Another augments the class discriminator with an additional "fake" label for generated images. These works have shown that incorporating adversarial objectives can make classifier learning robust and data efficient. While previous works mainly focus on the classification setting, in our work we extend GAN-based semi-supervised learning to the regression task.
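The "extra fake label" construction can be sketched numerically. In this illustrative numpy example (the class count K and all logits are made up for the sketch, not taken from the cited papers), the discriminator outputs K + 1 scores, and the three data regimes each contribute a different loss term.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

K = 3  # hypothetical number of real classes; index K is the "fake" class
logits_labeled   = np.array([2.0, 0.1, -1.0, 0.0])  # real example, label known (class 0)
logits_unlabeled = np.array([0.5, 0.3,  0.2, -2.0]) # real example, label unknown
logits_generated = np.array([0.0, 0.0,  0.0, 3.0])  # sample from the generator

p_lab = softmax(logits_labeled)
p_unl = softmax(logits_unlabeled)
p_gen = softmax(logits_generated)

loss_labeled   = -np.log(p_lab[0])      # cross-entropy on the known true class
loss_unlabeled = -np.log(1 - p_unl[K])  # real data: any class but "fake"
loss_generated = -np.log(p_gen[K])      # generated data: the "fake" class
```

The unlabeled term is what makes the scheme semi-supervised: real-but-unlabeled examples still push probability mass away from the fake class.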
III. The TR-GAN Model
In this section, we detail the conditional generative adversarial network for text regression in a semi-supervised setting (TR-GAN). We first introduce the word embedding method.
III-A. Word Embedding
The word embedding method learns a high-dimensional representation for each word, thereby incorporating semantic information that cannot be captured by a single token. In our work, we adopt pretrained word embeddings for each word in the text input. Each document in the data can then be represented by an n × d matrix, where n is the number of words in the document and d is the dimension of the word embeddings in the pretrained model.
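Concretely, the lookup can be sketched as follows. The toy embedding table here stands in for a real pretrained model such as word2vec or GloVe (where d is typically 100-300), and mapping unknown words to a zero vector is just one common convention, assumed for this sketch.

```python
import numpy as np

# Toy stand-in for a pretrained embedding table.
d = 4
pretrained = {
    "stock":  np.array([0.1, 0.3, -0.2, 0.5]),
    "prices": np.array([0.0, 0.2,  0.4, -0.1]),
    "fell":   np.array([-0.3, 0.1, 0.2, 0.0]),
}
unk = np.zeros(d)  # unknown words map to a zero vector

def embed(document):
    """Map a whitespace-tokenized document to an n x d matrix."""
    tokens = document.lower().split()
    return np.stack([pretrained.get(t, unk) for t in tokens])

X = embed("Stock prices fell sharply")
print(X.shape)  # (4, 4): n = 4 tokens, d = 4 dimensions
```

The resulting n × d matrix is exactly the input format the discriminator consumes.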
III-B. Model Architecture
As illustrated in Figure 1, the network architecture is a conditional GAN with a generator and a discriminator. A long short-term memory network (LSTM) is deployed as the generator for natural language: as the embedding is fed into the LSTM, the generator acts as an LSTM-based sentence decoder. The discriminator is a convolutional neural network (CNN), in which several residual blocks are followed by batch normalization and a nonlinear activation function. Two fully connected layers are then appended for adversarial learning and the regression task, respectively.
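The discriminator's data flow can be traced at the shape level in numpy. This is a simplified sketch under our stated assumptions (an n × d embedding input, residual convolution blocks, global pooling, and two linear heads); the layer sizes are illustrative, not the paper's exact configuration, and batch normalization is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, c = 20, 50, 32   # tokens, embedding dim, conv channels (illustrative)

def conv1d(x, w):
    """'Same'-padded 1-D convolution over the token axis.
    x: (n, c_in), w: (k, c_in, c_out)."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([np.tensordot(xp[i:i + k], w, axes=([0, 1], [0, 1]))
                     for i in range(x.shape[0])])

def res_block(x, w1, w2):
    """Residual block: x + conv(relu(conv(x))); the skip preserves shape."""
    h = np.maximum(conv1d(x, w1), 0.0)
    return x + conv1d(h, w2)

x  = rng.normal(size=(n, d))           # embedded document
w0 = rng.normal(size=(3, d, c)) * 0.1  # input projection
w1 = rng.normal(size=(3, c, c)) * 0.1
w2 = rng.normal(size=(3, c, c)) * 0.1

h = conv1d(x, w0)          # (n, c)
h = res_block(h, w1, w2)   # (n, c)
h = h.mean(axis=0)         # global average pool -> (c,)

w_adv = rng.normal(size=(c, 1)) * 0.1  # head for adversarial learning
w_reg = rng.normal(size=(c, 1)) * 0.1  # head for the regression task
adv_score, reg_pred = float(h @ w_adv), float(h @ w_reg)
```

The key structural point is that both heads share one pooled feature vector, so adversarial training and regression regularize the same representation.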
The objective function adopts mean absolute error (MAE) for the regression task and an adversarial loss for sequence generation. Not only can this model generate realistic sentences through the optimized generator, but the discriminator is also trained as a regression model for multiple prediction tasks (e.g., auto-sales prediction, public-opinion tracking, and even epidemiological surveillance from social media), which are of great interest to a wide range of stakeholders.
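The combined objective can be written out numerically. In this sketch we pair MAE with the standard GAN cross-entropy loss and a weighting term `lam`; the specific adversarial loss form and the weight are assumptions for illustration, and all numbers are made up.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error for the regression head."""
    return np.mean(np.abs(y_true - y_pred))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

y_true = np.array([3.2, 1.0, 4.5])      # ground-truth quantities (labeled subset)
y_pred = np.array([3.0, 1.4, 4.1])      # discriminator's regression outputs
d_real = sigmoid(np.array([2.0, 1.5]))  # D's scores on real sentences
d_fake = sigmoid(np.array([-1.0, 0.5])) # D's scores on generated sentences

lam = 1.0  # relative weight of the regression term (a free hyperparameter)
loss_D = (-np.mean(np.log(d_real)) - np.mean(np.log(1 - d_fake))
          + lam * mae(y_true, y_pred))   # discriminator: real/fake + regression
loss_G = -np.mean(np.log(d_fake))        # generator: fool the discriminator
```

Only labeled examples contribute to the MAE term, while every sentence, labeled or not, contributes to the adversarial terms; this is what lets the model train with limited supervision.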
IV. Future Work
We are excited about the idea of using GANs for text regression. Given the nature of the TR-GAN model, it is not challenging to find an experimental dataset; for example, one annotated corpus collects 50,000 textual comments below YouTube videos, among which 20,000 are labelled by state-of-the-art algorithms and 1,000 are labelled manually. We are also interested in what the generated language looks like, given that the existing literature on using GANs for NLG rarely reports generated samples themselves, only numerical metrics.
-  M. Joshi, D. Das, K. Gimpel, and N. A. Smith, “Movie reviews and revenues: An experiment in text regression,” in Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 2010, pp. 293–296.
-  V. Lampos and N. Cristianini, “Tracking the flu pandemic by monitoring the social web,” in Cognitive Information Processing (CIP), 2010 2nd International Workshop on. IEEE, 2010, pp. 411–416.
-  V. Lampos, D. Preoţiuc-Pietro, and T. Cohn, “A user-centric model of voting intention from social media,” in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, 2013, pp. 993–1003.
-  S. Kogan, D. Levin, B. R. Routledge, J. S. Sagi, and N. A. Smith, “Predicting risk from financial reports with regression,” in Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 2009, pp. 272–280.
-  S. Volkova, G. Coppersmith, and B. Van Durme, “Inferring user political preferences from streaming communications,” in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, 2014, pp. 186–196.
-  V. Lampos, N. Aletras, D. Preoţiuc-Pietro, and T. Cohn, “Predicting and characterising user impact on twitter,” in Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, 2014, pp. 405–413.
-  D. Preoţiuc-Pietro, V. Lampos, and N. Aletras, “An analysis of the user occupational class through twitter content,” in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), vol. 1, 2015, pp. 1754–1764.
-  C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. P. Aitken, A. Tejani, J. Totz, Z. Wang et al., “Photo-realistic single image super-resolution using a generative adversarial network,” in CVPR, vol. 2, no. 3, 2017, p. 4.
-  X. Liu, T. Li, H. Peng, I. C. Ouyang, T. Kim, R. Wang, and G. Guo, “Mining semantic descriptions from data for beauty understanding,” in 2019 IEEE Winter Conference on Applications of Computer Vision, 2019.
-  T. Li, K. Fu, M. Choi, X. Liu, and Y. Chen, “Toward robust and efficient training of generative adversarial networks with bayesian approximation,” in the Approximation Theory and Machine Learning Conference, 2018.
-  M. J. Kusner and J. M. Hernández-Lobato, “Gans for sequences of discrete elements with the gumbel-softmax distribution,” arXiv preprint arXiv:1611.04051, 2016.
-  K. Lin, D. Li, X. He, Z. Zhang, and M.-T. Sun, “Adversarial ranking for language generation,” in Advances in Neural Information Processing Systems, 2017, pp. 3155–3165.
-  L. Yu, W. Zhang, J. Wang, and Y. Yu, “Seqgan: Sequence generative adversarial nets with policy gradient.” in AAAI, 2017, pp. 2852–2858.
-  J. Li, W. Monroe, T. Shi, S. Jean, A. Ritter, and D. Jurafsky, “Adversarial learning for neural dialogue generation,” arXiv preprint arXiv:1701.06547, 2017.
-  S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
-  H. Drucker, C. J. Burges, L. Kaufman, A. J. Smola, and V. Vapnik, “Support vector regression machines,” in Advances in neural information processing systems, 1997, pp. 155–161.
-  H. Zou and T. Hastie, “Regularization and variable selection via the elastic net,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 67, no. 2, pp. 301–320, 2005.
-  Z. Bitvai and T. Cohn, “Non-linear text regression with a deep convolutional neural network,” in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), vol. 2, 2015, pp. 180–185.
-  X. Zhu, “Semi-supervised learning literature survey,” Computer Science, University of Wisconsin-Madison, vol. 2, no. 3, p. 4, 2006.
-  C. Kemp, T. L. Griffiths, S. Stromsten, and J. B. Tenenbaum, “Semi-supervised learning with trees,” in Advances in neural information processing systems, 2004, pp. 257–264.
-  D. P. Kingma, S. Mohamed, D. J. Rezende, and M. Welling, “Semi-supervised learning with deep generative models,” in Advances in Neural Information Processing Systems, 2014, pp. 3581–3589.
-  I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in neural information processing systems, 2014, pp. 2672–2680.
-  J. T. Springenberg, “Unsupervised and semi-supervised learning with categorical generative adversarial networks,” arXiv preprint arXiv:1511.06390, 2015.
-  A. Odena, “Semi-supervised learning with generative adversarial networks,” arXiv preprint arXiv:1606.01583, 2016.
-  N. Kalchbrenner, E. Grefenstette, and P. Blunsom, “A convolutional neural network for modelling sentences,” arXiv preprint arXiv:1404.2188, 2014.
-  K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
-  T. Li, L. Lin, M. Choi, K. Fu, S. Gong, and J. Wang, “Youtube AV 50K: an annotated corpus for comments in autonomous vehicles,” in the 13th International Joint Symposium on Artificial Intelligence and Natural Language Processing, 2018.
-  C. Wang, S. Gong, A. Zhou, T. Li, and S. Peeta, “Cooperative adaptive cruise control for connected autonomous vehicles by factoring communication-related constraints,” arXiv preprint arXiv:1807.07232, 2018.
-  S. Gong, A. Zhou, J. Wang, T. Li, and S. Peeta, “Cooperative adaptive cruise control for a platoon of connected and autonomous vehicles considering dynamic information flow topology,” arXiv preprint arXiv:1807.02224, 2018.