Transductive Learning with String Kernels for Cross-Domain Text Classification

11/02/2018
by Radu Tudor Ionescu, et al.

For many text classification tasks, there is a major problem posed by the lack of labeled data in a target domain. Although classifiers for a target domain can be trained on labeled text data from a related source domain, the accuracy of such classifiers is usually lower in the cross-domain setting. Recently, string kernels have obtained state-of-the-art results in various text classification tasks such as native language identification or automatic essay scoring. Moreover, classifiers based on string kernels have been found to be robust to the distribution gap between different domains. In this paper, we formally describe an algorithm composed of two simple yet effective transductive learning approaches to further improve the results of string kernels in cross-domain settings. By adapting string kernels to the test set without using the ground-truth test labels, we report significantly better accuracy rates in cross-domain English polarity classification.


1 Introduction

Domain shift is a fundamental problem in machine learning that has attracted a lot of attention in the natural language processing and computer vision communities [2, 32, 29, 42, 30, 39, 11, 37, 40, 6, 13]. To understand and address this problem, which is generated by the lack of labeled data in a target domain, researchers have studied the behavior of machine learning methods in cross-domain settings [29, 12, 13] and proposed various domain adaptation techniques [28, 39, 11, 6]. In cross-domain classification, a classifier is trained on data from a source domain and tested on data from a (different) target domain. The accuracy of machine learning methods is usually lower in the cross-domain setting, due to the distribution gap between different domains. Nevertheless, several domain adaptation techniques obtain better performance by leveraging the unlabeled test data [25, 16, 5, 14, 37]. Interestingly, some recent works [13, 18] indicate that string kernels can yield robust results in the cross-domain setting without any domain adaptation. In fact, methods based on string kernels have demonstrated impressive results in various text classification tasks ranging from native language identification [36, 23, 24, 22] and authorship identification [34] to dialect identification [21, 18, 4], sentiment analysis [13, 35] and automatic essay scoring [7]. As long as a labeled training set is available, string kernels can reach state-of-the-art results in various languages including English [23, 13, 7], Arabic [17, 24, 18, 4], Chinese [35] and Norwegian [24]. Different from all these recent approaches, we use unlabeled data from the test set in a transductive setting in order to significantly increase the performance of string kernels. In our recent work [19], we proposed two transductive learning approaches combined into a unified framework that improves the results of string kernels in two different tasks. In this paper, we provide a formal and detailed description of our transductive algorithm and present results in cross-domain English polarity classification.

The paper is organized as follows. Related work on cross-domain text classification and string kernels is presented in Section 2. Section 3 presents our approach to obtain domain-adapted string kernels. The transductive transfer learning method is described in Section 4. The polarity classification experiments are presented in Section 5. Finally, we draw conclusions and discuss future work in Section 6.

2 Related Work

2.1 Cross-Domain Classification

Transfer learning (or domain adaptation) aims at building effective classifiers for a target domain when the only available labeled training data belongs to a different (source) domain. Domain adaptation techniques can be roughly divided into graph-based methods [32, 33, 6, 31], probabilistic models [42, 30], knowledge-based models [16, 3, 12] and joint optimization frameworks [28]. The transfer learning methods from the literature show promising results in a variety of real-world applications, such as image classification [28], text classification [25, 14, 42], polarity classification [32, 33, 30, 11, 31] and others [8].

General transfer learning approaches. Long et al. [28] proposed a novel transfer learning framework to model distribution adaptation and label propagation in a unified way, based on the structural risk minimization principle and the regularization theory. Shu et al. [39] proposed a method that bridges the distribution gap between the source domain and the target domain through affinity learning, by exploiting the existence of a subset of data points in the target domain that are distributed similarly to the data points in the source domain. In [37], deep learning is employed to jointly optimize the representation, the cross-domain transformation and the target label inference in an end-to-end fashion. More recently, Sun et al. [40] proposed an unsupervised domain adaptation method that minimizes the domain shift by aligning the second-order statistics of the source and target distributions, without requiring any target labels. Chang et al. [6] proposed a framework that uses a parallel corpus to calibrate domain-specific kernels into a unified kernel, leveraging graph-based label propagation between domains.

Cross-domain text classification. Joachims [25] introduced the Transductive Support Vector Machine (TSVM) framework for text classification, which takes into account a particular test set and tries to minimize the error rate for those particular test samples. Ifrim et al. [16] presented a transductive learning approach for text classification based on combining latent variable models for decomposing the topic-word space into topic-concept and concept-word spaces, and explicit knowledge models with named concepts for populating latent variables. Guo et al. [14] proposed a transductive subspace representation learning method to address domain adaptation for cross-lingual text classification. Zhuang et al. [42] presented a probabilistic model by which both the shared and distinct concepts in different domains can be learned through an Expectation-Maximization process that optimizes the data likelihood. In [1], the authors describe an algorithm that adapts a classification model by iteratively learning domain-specific features from the unlabeled test data.

Cross-domain polarity classification. In recent years, cross-domain sentiment (polarity) classification has gained popularity due to the advances in domain adaptation on one side, and to the abundance of documents from various domains available on the Web, expressing positive or negative opinions, on the other side. Some of the general domain adaptation frameworks have been applied to polarity classification [42, 1, 6], but there are also approaches that have been specifically designed for the cross-domain sentiment classification task [2, 26, 32, 33, 12, 30, 11, 13, 31]. To the best of our knowledge, Blitzer et al. [2] were the first to report results on cross-domain classification, proposing the structural correspondence learning (SCL) method and its variant based on mutual information (SCL-MI). Pan et al. [32] proposed a spectral feature alignment (SFA) algorithm to align domain-specific words from different domains into unified clusters, using domain-independent words as a bridge. Bollegala et al. [3] used cross-domain lexicon creation to generate a sentiment-sensitive thesaurus (SST) that groups different words expressing the same sentiment, using unigram and bigram features as in [2, 32]. Luo et al. [30] proposed a cross-domain sentiment classification framework based on a probabilistic model of the author's emotion state when writing. An Expectation-Maximization algorithm is then employed to solve the maximum likelihood problem and to obtain a latent emotion distribution of the author. Franco-Salvador et al. [12] combined various recent and knowledge-based approaches using a meta-learning scheme (KE-Meta), performing cross-domain polarity classification without employing any domain adaptation technique. More recently, Fernández et al. [11] introduced the Distributional Correspondence Indexing (DCI) method for domain adaptation in sentiment classification. The approach builds term representations in a vector space common to both domains, where each dimension reflects its distributional correspondence to a highly predictive term that behaves similarly across domains. A graph-based approach for sentiment classification that models the relatedness of different domains based on shared users and keywords is proposed in [31].

2.2 String Kernels

In recent years, methods based on string kernels have demonstrated remarkable performance in various text classification tasks [27, 10, 34, 23, 13, 18, 7]. String kernels represent a way of using information at the character level by measuring the similarity of strings through character n-grams. Lodhi et al. [27] used string kernels for document categorization, obtaining very good results. String kernels were also successfully used in authorship identification [34]. More recently, various combinations of string kernels reached state-of-the-art accuracy rates in native language identification [23] and Arabic dialect identification [18]. Interestingly, string kernels have been used in cross-domain settings without any domain adaptation, obtaining impressive results. For instance, Ionescu et al. [23] employed string kernels in a cross-corpus (and implicitly cross-domain) native language identification experiment, improving the state-of-the-art accuracy by a remarkable margin. Giménez-Pérez et al. [13] used string kernels for single-source and multi-source polarity classification. Remarkably, they obtained state-of-the-art performance without using knowledge from the target domain, which indicates that string kernels provide robust results in the cross-domain setting without any domain adaptation. Ionescu et al. [18] obtained the best performance in the Arabic Dialect Identification Shared Task of the 2017 VarDial Evaluation Campaign [41], with a clear improvement over the second-best method. It is important to note that the training and the test speech samples prepared for the shared task were recorded in different setups [41]; in other words, the training and the test sets are drawn from different distributions. Different from all these recent approaches [23, 13, 18], we use unlabeled data from the target domain to significantly increase the performance of string kernels in cross-domain text classification, particularly in English polarity classification.

3 Transductive String Kernels

String kernels. Kernel functions [38] capture the intuitive notion of similarity between objects in a specific domain. For example, in text mining, string kernels can be used to measure the pairwise similarity between text samples, simply based on character n-grams. Various string kernel functions have been proposed to date [27, 38, 23]. Perhaps one of the most recently introduced string kernels is the histogram intersection string kernel [23]. For two strings over an alphabet $\Sigma$, $s, t \in \Sigma^*$, the intersection string kernel is formally defined as follows:

$$k^{\cap}_{n}(s,t) = \sum_{v \in \Sigma^{n}} \min \{ \text{num}_v(s), \text{num}_v(t) \}, \quad (1)$$

where $\text{num}_v(s)$ is the number of occurrences of n-gram $v$ as a substring in $s$, and $n$ is the length of $v$. The spectrum string kernel or the presence bits string kernel can be defined in a similar fashion [23].
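To make the definitions above concrete, the following minimal Python sketch computes the intersection, spectrum and presence bits kernels over character n-grams. The function names and the default n-gram range are our own illustration choices; for the official implementation, see the open-source code of Ionescu et al. [23, 20] referenced in Section 5.

from collections import Counter

def ngram_histogram(s, n):
    """Count the occurrences of every character n-gram in the string s."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def string_kernel(s, t, n_range=(5, 6, 7, 8), mode="intersection"):
    """Compute k(s, t) by summing over the given range of n-gram lengths."""
    value = 0.0
    for n in n_range:
        hs, ht = ngram_histogram(s, n), ngram_histogram(t, n)
        for v in hs.keys() & ht.keys():   # only shared n-grams contribute
            if mode == "intersection":    # sum of min(num_v(s), num_v(t))
                value += min(hs[v], ht[v])
            elif mode == "spectrum":      # sum of num_v(s) * num_v(t)
                value += hs[v] * ht[v]
            else:                         # presence bits: 1 per shared n-gram
                value += 1
    return value

print(string_kernel("the quick brown fox", "the quiet brown dog"))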

Transductive string kernels. We present a simple and straightforward approach to produce a transductive similarity measure suitable for strings. We take the following steps to derive transductive string kernels. For a given kernel (similarity) function $k$, we first build the full kernel matrix $K$, by including the pairwise similarities of samples from both the train and the test sets. For a training set $X = \{x_1, x_2, ..., x_r\}$ of $r$ samples and a test set $Y = \{y_1, y_2, ..., y_u\}$ of $u$ samples, such that $X \cap Y = \emptyset$, each component in the full kernel matrix is defined as follows:

$$K_{ij} = k(z_i, z_j), \quad (2)$$

where $z_i$ and $z_j$ are samples from the set $Z = X \cup Y = \{z_1, z_2, ..., z_{r+u}\}$, for all $1 \leq i, j \leq r+u$. We then normalize the kernel matrix by dividing each component by the square root of the product of the two corresponding diagonal components:

$$\hat{K}_{ij} = \frac{K_{ij}}{\sqrt{K_{ii} \cdot K_{jj}}}. \quad (3)$$

We transform the normalized kernel matrix into a radial basis function (RBF) kernel matrix as follows:

$$\tilde{K}_{ij} = \exp \left( -1 + \hat{K}_{ij} \right). \quad (4)$$

Each row in the RBF kernel matrix $\tilde{K}$ is now interpreted as a feature vector. In other words, each sample $z_i$ is represented by a feature vector that contains the similarity between the respective sample and all the samples in $Z$. Since $Z$ includes the test samples as well, the feature vector is inherently adapted to the test set. Indeed, it is easy to see that the features will be different if we choose to apply the string kernel approach on a different set of test samples $Y'$, such that $Y' \neq Y$. It is important to note that through these features, the subsequent classifier will have some information about the test samples at training time. More specifically, the feature vector conveys information about how similar every test sample is to every training sample. We next consider the linear kernel, which is given by the scalar product between the new feature vectors. To obtain the final linear kernel matrix, we simply need to compute the product between the RBF kernel matrix and its transpose:

$$\dot{K} = \tilde{K} \cdot \tilde{K}^{\top}. \quad (5)$$

In this way, the samples from the test set, which are included in $Z$, are used to obtain new (transductive) string kernels that are adapted to the test set at hand.
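The construction in Equations (2)-(5) can be sketched in a few lines of Python. The sketch below assumes the string_kernel function from the previous snippet (any kernel function works) and uses the parameter-free RBF transform of Equation (4); it is an illustration under these assumptions, not the authors' released code.

import numpy as np

def transductive_kernel(train_texts, test_texts, kernel_fn):
    Z = list(train_texts) + list(test_texts)   # Z = X u Y, training samples first
    m = len(Z)
    # Equation (2): full kernel matrix over train + test samples.
    K = np.array([[kernel_fn(Z[i], Z[j]) for j in range(m)] for i in range(m)])
    # Equation (3): normalize by the square roots of the diagonal entries.
    d = np.sqrt(np.diag(K))
    K_hat = K / np.outer(d, d)
    # Equation (4): RBF transform of the normalized kernel matrix.
    K_rbf = np.exp(-1.0 + K_hat)
    # Equation (5): each row of K_rbf is a feature vector; the final
    # (transductive) kernel holds their pairwise scalar products.
    return K_rbf @ K_rbf.T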

Input: X = {x_1, x_2, ..., x_r} – the training set of r training samples and T = {t_1, t_2, ..., t_r} – the associated class labels; Y = {y_1, y_2, ..., y_u} – the set of u test samples; k – a kernel function; p – the number of test samples to be added in the second round of training; C – a binary kernel classifier.
Domain-Adapted Kernel Matrix Computation Steps:
1        Z ← X ∪ Y; m ← r + u; K ← 0_{m×m};
2        for i ∈ {1, ..., m} do
3               for j ∈ {1, ..., m} do
4                      K_{ij} ← k(z_i, z_j);                          ⊳ Equation (2)
5        for i ∈ {1, ..., m} do
6               for j ∈ {1, ..., m} do
7                      K̂_{ij} ← K_{ij} / √(K_{ii} · K_{jj});          ⊳ Equation (3)
8                      K̃_{ij} ← exp(−1 + K̂_{ij});                    ⊳ Equation (4)
9        K⁺ ← K̃ · K̃ᵀ;                                                ⊳ Equation (5)
Transductive Kernel Classifier Steps:
10       I ← (1, 2, ..., r); T* ← T;          ⊳ indexes and labels of the current training samples in Z
11       J ← (r+1, r+2, ..., r+u);            ⊳ indexes of the test samples in Z
12       P ← 0_u; S ← 0_u;                    ⊳ predicted labels and confidence scores of the test samples
13       for iter ∈ {1, 2} do
14              for each class c ∈ {1, ..., ℓ} do          ⊳ one-versus-all scheme over the ℓ classes
15                     (α_c, b_c) ← the dual weights and bias of C trained on K⁺[I][I] with the binary labels derived from T* for class c;
16                     S_c ← K⁺[J][I] · α_c + b_c;          ⊳ column vector of scores for the test samples
17              for j ∈ {1, ..., u} do
18                     P_j ← argmax_c S_c[j];               ⊳ the class with the highest score
19                     S_j ← max_c S_c[j];
20              if iter = 1 then
21                     π ← sort S in descending order and return the sorted indexes;
22                     I ← I ∪ (J[π_1], ..., J[π_p]);       ⊳ add the top p test samples to the training set
23                     T* ← T* ∪ (P[π_1], ..., P[π_p]);     ⊳ together with their predicted labels
Output: P = {p_1, p_2, ..., p_u} – the set of predicted labels for the test samples in Y.
Algorithm 1 Transductive Kernel Algorithm

4 Transductive Kernel Classifier

We next present a simple yet effective approach for adapting a one-versus-all kernel classifier trained on a source domain to a different target domain. Our transductive kernel classifier (TKC) approach is composed of two learning iterations. Our entire framework is formally described in Algorithm 1.

Notations. We use the following notations in the algorithm. Sets, arrays and matrices are written in capital letters. All collection types are considered to be indexed starting from position 1. The elements of a set S are denoted by s_i, the elements of an array A are alternatively denoted by a_i or A[i], and the elements of a matrix M are denoted by m_{ij} or M[i][j] when convenient. The sequence 1, 2, ..., n is denoted by 1..n. We use sequences to index arrays or matrices as well. For example, for an array A and two integers i and j, A[i..j] denotes the sub-array (a_i, a_{i+1}, ..., a_j). In a similar manner, M[i..j][l..h] denotes a sub-matrix of the matrix M, while M[i] returns the i-th row of M and M[][j] returns the j-th column of M. The zero matrix of m × n components is denoted by 0_{m×n}, and the square zero matrix of n × n components is denoted by 0_n. The identity matrix of n × n components is denoted by I_n.

Algorithm description. In the first stage of the algorithm (steps 1-9), we compute the domain-adapted string kernel matrix, as described in the previous section. In the first learning iteration (when iter = 1), we train several classifiers to distinguish each individual class from the rest, according to the one-versus-all (OVA) scheme. In step 15, the kernel classifier C is trained to distinguish a class c from the others, assigning a dual weight to each training sample from the source domain. The returned column vector of dual weights is denoted by α_c and the bias value is denoted by b_c. The vector α_c contains one weight per current training sample, such that the weight α_c[i] corresponds to the training sample z_{I[i]}. When the test kernel sub-matrix K⁺[J][I] is multiplied with the vector α_c in step 16, the result is a column vector of u positive or negative scores. Afterwards (step 21), the test samples are sorted in order to maximize the probability of correctly predicted labels. For each test sample y_j, we consider the score S_j (step 19) produced by the classifier for the chosen class P_j (step 18), which is selected according to the OVA scheme. The sorting is based on the hypothesis that if the classifier associates a higher score to a test sample, it means that the classifier is more confident about the predicted label of the respective test sample. Before the second learning iteration, the p test samples from the top of the sorted list are added to the training set (steps 22-23) for another round of training. As the classifier is more confident about the predicted labels of the added test samples, the chance of including noisy examples (with wrong labels) is minimized. On the other hand, the classifier has the opportunity to learn some useful domain-specific patterns of the test domain. We believe that, at least in the cross-domain setting, the added test samples bring more useful information than noise. We would like to stress that the ground-truth test labels are never used in our transductive algorithm. Although the test samples are required beforehand, their labels are not necessary. Hence, our approach is suitable in situations where unlabeled data from the target domain can be collected cheaply, and such situations appear very often in practice, considering the great amount of data available on the Web.
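For the binary polarity task, the one-versus-all scheme in Algorithm 1 reduces to a single classifier with labels in {-1, +1}, and the confidence score becomes the absolute value of the prediction. The following sketch illustrates the two learning iterations under this reduction, using scikit-learn's KernelRidge as the base classifier C (Section 5 uses Kernel Ridge Regression); the values of p and of the regularization parameter alpha are placeholders, not the settings used in our experiments.

import numpy as np
from sklearn.kernel_ridge import KernelRidge

def tkc_predict(K_full, train_labels, num_train, p=100, alpha=1.0):
    """K_full is the (r+u) x (r+u) transductive kernel, training rows first."""
    r, m = num_train, K_full.shape[0]
    idx_train = list(range(r))                 # indexes of training samples in Z
    idx_test = list(range(r, m))               # indexes of test samples in Z
    labels = np.asarray(train_labels, float)   # labels in {-1, +1}
    for iteration in range(2):                 # two learning iterations
        krr = KernelRidge(alpha=alpha, kernel="precomputed")
        krr.fit(K_full[np.ix_(idx_train, idx_train)], labels)
        scores = krr.predict(K_full[np.ix_(idx_test, idx_train)])
        if iteration == 0:
            # Add the p most confidently labeled test samples (largest
            # absolute scores) to the training set, with predicted labels.
            top = np.argsort(-np.abs(scores))[:p]
            idx_train = idx_train + [idx_test[i] for i in top]
            labels = np.concatenate([labels, np.sign(scores[top])])
    return np.sign(scores)                     # predicted labels for the test samples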

5 Polarity Classification

Data set. For the cross-domain polarity classification experiments, we use the second version of the Multi-Domain Sentiment Dataset [2]. The data set contains Amazon product reviews from four different domains: Books (B), DVDs (D), Electronics (E) and Kitchen appliances (K). Reviews contain star ratings (from 1 to 5), which are converted into binary labels as follows: reviews rated with more than 3 stars are labeled as positive, and those rated with less than 3 stars are labeled as negative. In each domain, there are 1000 positive and 1000 negative reviews.
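The label binarization can be sketched as follows; the function is a hypothetical helper of our own, and 3-star reviews fall outside both rules above.

def to_binary_label(star_rating):
    """Map a 1-5 star rating to a polarity label, following the rules above."""
    if star_rating > 3:
        return +1   # positive review
    if star_rating < 3:
        return -1   # negative review
    return None     # neutral 3-star reviews are not part of the data set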

Baselines. We compare our approach with several methods [32, 3, 12, 40, 13, 15] in two cross-domain settings. Using string kernels, Giménez-Pérez et al. [13] reported better performance than SST [3] and KE-Meta [12] in the multi-source domain setting. In addition, we compare our approach with SFA [32], CORAL [40] and TR-TrAdaBoost [15] in the single-source setting.

Table 1: Multi-source cross-domain polarity classification accuracy rates (in %) of our transductive approaches versus a state-of-the-art baseline based on string kernels [13], as well as SST [3] and KE-Meta [12], in the DEKB, BEKD, BDKE and BDEK settings. The best accuracy rates are highlighted in bold. The marker * indicates that the performance is significantly better than the best baseline string kernel according to a paired McNemar's test.

Evaluation procedure and parameters. We follow the same evaluation methodology as Giménez-Pérez et al. [13], to ensure a fair comparison. Furthermore, we use the same kernels, namely the presence bits string kernel and the intersection string kernel, and the same range of character n-grams (5-8). To compute the string kernels, we used the open-source code provided by Ionescu et al. [23, 20]. For the transductive kernel classifier, we select p unlabeled test samples to be included in the training set for the second round of training. We choose Kernel Ridge Regression [38] as the classifier and keep its regularization parameter fixed in all our experiments. Although Giménez-Pérez et al. [13] used a different classifier, namely Kernel Discriminant Analysis, we observed that Kernel Ridge Regression produces similar results when we employ the same string kernels. As Giménez-Pérez et al. [13], we evaluate our approach in two cross-domain settings. In the multi-source setting, we train the models on all domains except the one used for testing. In the single-source setting, we train the models on one of the four domains and we independently test the models on the remaining three domains.
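As a reference for the dual weights used in Algorithm 1, Kernel Ridge Regression admits a closed-form solution: the dual weight vector solves (K + λI)α = y, and there is no bias term (b = 0 in the terms of Algorithm 1). The sketch below assumes a precomputed kernel matrix; the regularization value is a placeholder.

import numpy as np

def krr_fit(K_train, y, lam=1.0):
    """Return the dual weights for a precomputed training kernel matrix."""
    n = K_train.shape[0]
    return np.linalg.solve(K_train + lam * np.eye(n), y)

def krr_predict(K_test_train, dual_weights):
    """Scores for test samples given their kernel values to the training set."""
    return K_test_train @ dual_weights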

Table 2: Single-source cross-domain polarity classification accuracy rates (in %) of our transductive approaches versus a state-of-the-art baseline based on string kernels [13], as well as SFA [32], CORAL [40] and TR-TrAdaBoost [15], in all twelve source-target combinations (DB, EB, KB, BD, ED, KD, BE, DE, KE, BK, DK and EK). The best accuracy rates are highlighted in bold. The marker * indicates that the performance is significantly better than the best baseline string kernel according to a paired McNemar's test.

Results in multi-source setting. The results for the multi-source cross-domain polarity classification setting are presented in Table 1. Both the transductive presence bits string kernel and the transductive intersection string kernel obtain better results than their original counterparts. Moreover, according to the McNemar's test [9], the results on the DVDs, the Electronics and the Kitchen target domains are significantly better than the best baseline string kernel. When we employ the transductive kernel classifier (TKC), we obtain even better results. On all domains, the accuracy rates yielded by the transductive classifier are considerably better than the best baseline. For example, on the Books domain, the accuracy of the transductive classifier based on the presence bits kernel is above the best baseline, represented by the intersection string kernel. Remarkably, the improvements brought by our transductive string kernel approach are statistically significant in all domains.
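For completeness, the paired McNemar's test [9] used throughout our evaluation can be sketched as follows; we use the common chi-squared approximation with continuity correction, which is one standard variant of the test.

import numpy as np
from scipy.stats import chi2

def mcnemar_test(gold, pred_a, pred_b):
    """p-value for the paired McNemar test comparing two classifiers."""
    gold, pred_a, pred_b = map(np.asarray, (gold, pred_a, pred_b))
    a_only = np.sum((pred_a == gold) & (pred_b != gold))  # only A is correct
    b_only = np.sum((pred_b == gold) & (pred_a != gold))  # only B is correct
    n = a_only + b_only
    if n == 0:
        return 1.0  # the two classifiers never disagree in correctness
    stat = (abs(a_only - b_only) - 1) ** 2 / n            # continuity-corrected statistic
    return chi2.sf(stat, df=1)                            # one degree of freedom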

Results in single-source setting. The results for the single-source cross-domain polarity classification setting are presented in Table 2. We considered all possible combinations of source and target domains in this experiment, and we improve the results in each and every case. Without exception, the accuracy rates reached by the transductive string kernels are significantly better than the best baseline string kernel [13], according to the McNemar's test. The highest improvements are obtained when the source domain contains Books reviews and the target domain contains Kitchen reviews. As in the multi-source setting, we obtain much better results when the transductive classifier is employed for the learning task. In all cases, the accuracy rates of the transductive classifier are considerably better than the best baseline string kernel, and in four cases (EB, ED, BK and DK) the improvements are particularly large. The improvements brought by our transductive classifier based on string kernels are statistically significant in each and every case. In comparison with SFA [32], we obtain better results in all but one case (KD). Remarkably, we surpass the other state-of-the-art approaches [40, 15] in all cases.

6 Conclusion

In this paper, we presented two domain adaptation approaches that can be used together to improve the results of string kernels in cross-domain settings. We provided empirical evidence indicating that our framework can be successfully applied in cross-domain text classification, particularly in cross-domain English polarity classification. Indeed, the polarity classification experiments demonstrate that our framework achieves better accuracy rates than other state-of-the-art methods [32, 3, 12, 40, 13, 15]. By using the same parameters across all the experiments, we showed that our transductive transfer learning framework can bring significant improvements without having to fine-tune the parameters for each individual setting. Although the framework described in this paper can be generally applied to any kernel method, we focused our work only on string kernel approaches used in text classification. In future work, we aim to combine the proposed transductive transfer learning framework with different kinds of kernels and classifiers, and employ it for other cross-domain tasks.

References

  • [1] Bhatt, S.H., Semwal, D., Roy, S.: An Iterative Similarity based Adaptation Technique for Cross-domain Text Classification. In: Proceedings of CoNLL. pp. 52–61 (2015)
  • [2] Blitzer, J., Dredze, M., Pereira, F.: Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. In: Proceedings of ACL. pp. 187–205 (2007)
  • [3] Bollegala, D., Weir, D., Carroll, J.: Cross-Domain Sentiment Classification Using a Sentiment Sensitive Thesaurus. IEEE Transactions on Knowledge and Data Engineering 25(8), 1719–1731 (2013)
  • [4] Butnaru, A.M., Ionescu, R.T.: UnibucKernel Reloaded: First Place in Arabic Dialect Identification for the Second Year in a Row. In: Proceedings of VarDial Workshop of COLING. pp. 77–87 (2018)
  • [5] Ceci, M.: Hierarchical Text Categorization in a Transductive Setting. In: Proceedings of ICDMW. pp. 184–191 (December 2008)
  • [6] Chang, W.C., Wu, Y., Liu, H., Yang, Y.: Cross-Domain Kernel Induction for Transfer Learning. In: Proceedings of AAAI. pp. 1763–1769 (February 2017)
  • [7] Cozma, M., Butnaru, A., Ionescu, R.T.: Automated essay scoring with string kernels and word embeddings. In: Proceedings of ACL. pp. 503–509 (2018)
  • [8] Daumé III, H.: Frustratingly Easy Domain Adaptation. In: Proceedings of ACL. pp. 256–263 (2007)
  • [9] Dietterich, T.G.: Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation 10(7), 1895–1923 (Oct 1998)
  • [10] Escalante, H.J., Solorio, T., Montes-y-Gómez, M.: Local Histograms of Character N-grams for Authorship Attribution. In: Proceedings of ACL: HLT. vol. 1, pp. 288–298 (2011)
  • [11] Fernández, A.M., Esuli, A., Sebastiani, F.: Distributional Correspondence Indexing for Cross-lingual and Cross-domain Sentiment Classification. Journal of Artificial Intelligence Research 55(1), 131–163 (Jan 2016)
  • [12] Franco-Salvador, M., Cruz, F.L., Troyano, J.A., Rosso, P.: Cross-domain polarity classification using a knowledge-enhanced meta-classifier. Knowledge-Based Systems 86, 46–56 (2015)
  • [13] Giménez-Pérez, R.M., Franco-Salvador, M., Rosso, P.: Single and Cross-domain Polarity Classification using String Kernels. In: Proceedings of EACL. pp. 558–563 (April 2017)
  • [14] Guo, Y., Xiao, M.: Transductive Representation Learning for Cross-Lingual Text Classification. In: Proceedings of ICDM. pp. 888–893 (December 2012)
  • [15] Huang, X., Rao, Y., Xie, H., Wong, T.L., Wang, F.L.: Cross-Domain Sentiment Classification via Topic-Related TrAdaBoost. In: Proceedings of AAAI. pp. 4939–4940 (2017)
  • [16] Ifrim, G., Weikum, G.: Transductive Learning for Text Classification Using Explicit Knowledge Models. In: Proceedings of PKDD. pp. 223–234 (2006)
  • [17] Ionescu, R.T.: A Fast Algorithm for Local Rank Distance: Application to Arabic Native Language Identification. In: Proceedings of ICONIP. vol. 9490, pp. 390–400 (2015)
  • [18] Ionescu, R.T., Butnaru, A.: Learning to Identify Arabic and German Dialects using Multiple Kernels. In: Proceedings of VarDial Workshop of EACL. pp. 200–209 (2017)
  • [19] Ionescu, R.T., Butnaru, A.M.: Improving the results of string kernels in sentiment analysis and Arabic dialect identification by adapting them to your test set. In: Proceedings of EMNLP (2018)
  • [20] Ionescu, R.T., Popescu, M.: Native Language Identification with String Kernels. In: Knowledge Transfer between Computer Vision and Text Mining, chap. 8, pp. 193–227. Advances in Computer Vision and Pattern Recognition, Springer International Publishing (2016)
  • [21] Ionescu, R.T., Popescu, M.: UnibucKernel: An Approach for Arabic Dialect Identification based on Multiple String Kernels. In: Proceedings of VarDial Workshop of COLING. pp. 135–144 (2016)
  • [22] Ionescu, R.T., Popescu, M.: Can string kernels pass the test of time in native language identification? In: Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications. pp. 224–234 (2017)
  • [23] Ionescu, R.T., Popescu, M., Cahill, A.: Can characters reveal your native language? a language-independent approach to native language identification. In: Proceedings of EMNLP. pp. 1363–1373 (October 2014)
  • [24] Ionescu, R.T., Popescu, M., Cahill, A.: String kernels for native language identification: Insights from behind the curtains. Computational Linguistics 42(3), 491–525 (2016)
  • [25] Joachims, T.: Transductive Inference for Text Classification Using Support Vector Machines. In: Proceedings of ICML. pp. 200–209 (1999)
  • [26] Li, T., Sindhwani, V., Ding, C., Zhang, Y.: Knowledge Transformation for Cross-domain Sentiment Classification. In: Proceedings of SIGIR. pp. 716–717 (2009)
  • [27] Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N., Watkins, C.J.C.H.: Text classification using string kernels. Journal of Machine Learning Research 2, 419–444 (2002)
  • [28] Long, M., Wang, J., Ding, G., Pan, S.J., Yu, P.S.: Adaptation Regularization: A General Framework for Transfer Learning. IEEE Transactions on Knowledge and Data Engineering 26(5), 1076–1089 (2014)
  • [29] Lui, M., Baldwin, T.: Cross-domain feature selection for language identification. In: Proceedings of IJCNLP. pp. 553–561 (2011)
  • [30] Luo, K.H., Deng, Z.H., Yu, H., Wei, L.C.: JEAM: A Novel Model for Cross-Domain Sentiment Classification Based on Emotion Analysis. In: Proceedings of EMNLP. pp. 2503–2508 (2015)
  • [31] Nelakurthi, A.R., Tong, H., Maciejewski, R., Bliss, N., He, J.: User-guided Cross-domain Sentiment Classification. In: Proceedings of SDM (2017)
  • [32] Pan, S.J., Ni, X., Sun, J.T., Yang, Q., Chen, Z.: Cross-domain Sentiment Classification via Spectral Feature Alignment. In: Proceedings of WWW. pp. 751–760 (2010)
  • [33] Ponomareva, N., Thelwall, M.: Semi-supervised vs. Cross-domain Graphs for Sentiment Analysis. In: Proceedings of RANLP. pp. 571–578 (September 2013)
  • [34] Popescu, M., Grozea, C.: Kernel methods and string kernels for authorship analysis. In: Proceedings of CLEF (Online Working Notes/Labs/Workshop) (September 2012)
  • [35] Popescu, M., Grozea, C., Ionescu, R.T.: HASKER: An efficient algorithm for string kernels. Application to polarity classification in various languages. In: Proceedings of KES. pp. 1755–1763 (2017)
  • [36] Popescu, M., Ionescu, R.T.: The Story of the Characters, the DNA and the Native Language. In: Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications. pp. 270–278 (June 2013)
  • [37] Sener, O., Song, H.O., Saxena, A., Savarese, S.: Learning Transferrable Representations for Unsupervised Domain Adaptation. In: Proceedings of NIPS. pp. 2110–2118 (2016)
  • [38] Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press (2004)
  • [39] Shu, L., Latecki, L.J.: Transductive Domain Adaptation with Affinity Learning. In: Proceedings of CIKM. pp. 1903–1906. ACM (2015)
  • [40] Sun, B., Feng, J., Saenko, K.: Return of Frustratingly Easy Domain Adaptation. In: Proceedings of AAAI. pp. 2058–2065 (2016)
  • [41] Zampieri, M., Malmasi, S., Ljubešić, N., Nakov, P., Ali, A., Tiedemann, J., Scherrer, Y., Aepli, N.: Findings of the VarDial Evaluation Campaign 2017. In: Proceedings of VarDial Workshop of EACL. pp. 1–15 (2017)
  • [42] Zhuang, F., Luo, P., Yin, P., He, Q., Shi, Z.: Concept Learning for Cross-domain Text Classification: A General Probabilistic Framework. In: Proceedings of IJCAI. pp. 1960–1966 (2013)