A new semi-supervised inductive transfer learning framework: Co-Transfer

08/18/2021
by   Zhe Yuan, et al.

In many practical data mining scenarios, such as network intrusion detection, Twitter spam detection, and computer-aided diagnosis, a source domain that is different from but related to a target domain is very common. In addition, a large amount of unlabeled data is available in both the source and target domains, but labeling each of them is difficult, expensive, time-consuming, and sometimes unnecessary. It is therefore important and worthwhile to fully explore the labeled and unlabeled data in the source and target domains to solve the task in the target domain. In this paper, a new semi-supervised inductive transfer learning framework named Co-Transfer is proposed. Co-Transfer first generates three TrAdaBoost classifiers for transfer learning from the source domain to the target domain, and meanwhile another three TrAdaBoost classifiers are generated for transfer learning from the target domain to the source domain, using bootstrapped samples from the original labeled data. In each round of co-transfer, each group of TrAdaBoost classifiers is refined using carefully selected newly labeled data. Finally, the group of TrAdaBoost classifiers learned to transfer from the source domain to the target domain produces the final hypothesis. Experimental results show that Co-Transfer can effectively exploit and reuse the labeled and unlabeled data in the source and target domains.



1 Introduction

Transfer learning has been widely utilized to transfer knowledge from a source domain to a related target domain, even when the two domains lie in different feature spaces or have different distributions (Pan and Yang, 2010; Zhuang et al., 2021). According to Pan and Yang (2010), transfer learning can be categorized into three sub-settings: inductive transfer learning, transductive transfer learning, and unsupervised transfer learning. In the inductive transfer learning setting, the data in the target domain are all labeled, regardless of whether the data in the source domain are labeled. In the transductive transfer learning setting, the data in the target domain are unlabeled while the data in the source domain are labeled. In the unsupervised transfer learning setting, the data in both the target and source domains are unlabeled.

Under a different setting, semi-supervised transfer learning (Abuduweili et al., 2021; Liu et al., 2012; Tang et al., 2017; Wei et al., 2019) has recently attracted attention. It handles real applications in which only part of the data in the target domain are labeled, while the data in the source domain are labeled or a model pre-trained on the source domain is available. However, it is even more common in real application scenarios that only a small amount of labeled data is available in the source and target domains, alongside a large amount of unlabeled data in both domains. For example, in computer-aided diagnosis (CAD) systems (Chebli et al., 2018), only a small number of images can be carefully diagnosed by medical experts, while a large number of unlabeled medical images are very easy to obtain. Moreover, due to aging or upgrading of equipment, the pattern of medical images collected earlier may differ from that of images collected at the current time (Mohanasundaram et al., 2019; Mustafa et al., 2021); that is, the distributions of the data collected in two different time intervals are different. Other examples include network intrusion detection and Twitter spam detection. The resulting challenge is how to transfer knowledge from the labeled and unlabeled data in the source domain to help learning on the labeled and unlabeled data in the target domain.

In this paper, to handle the problem mentioned above, a new semi-supervised inductive transfer learning paradigm named Co-Transfer is proposed. It combines semi-supervised learning with instance-based transfer learning. Co-Transfer first generates three TrAdaBoost (Dai et al., 2007) classifiers for transferring knowledge from the source domain to the target domain, and meanwhile another three TrAdaBoost classifiers are generated for transferring knowledge from the target domain to the source domain, using bootstrapped samples from the original labeled data. In each round of co-transfer, each group of TrAdaBoost classifiers is refined using the newly labeled data, of which one part is labeled by itself and the other part is labeled by the other group of TrAdaBoost classifiers. It should be noted that the newly labeled samples are carefully selected under certain conditions, following the strategy used in tri-training (Zhou and Li, 2005). Finally, the group of TrAdaBoost classifiers learned to transfer from the source domain to the target domain produces the final hypothesis via majority voting. Experiments on UCI datasets (Dua and Graff, 2017) and text classification tasks verify that Co-Transfer can effectively exploit and reuse the labeled and unlabeled data of the source and target domains.

The rest of this paper is organized as follows. Section 2 briefly reviews semi-supervised learning, instance-based transfer learning and semi-supervised transfer learning. Section 3 presents Co-Transfer. Section 4 reports the experimental results on UCI data sets and the text classification task. Finally, Section 5 concludes this paper.

2 Related work

In this section, we briefly review the related work of semi-supervised learning, instance-based transfer learning, and semi-supervised transfer learning.

2.1 Semi-supervised learning

For semi-supervised learning (Zhu, 2008; Van Engelen and Hoos, 2020; Karisani and Karisani, 2021), the main idea is to utilize a small number of labeled samples and a large amount of unlabeled samples to improve the performance of the learned hypothesis. One of the assumptions of semi-supervised learning is that the unlabeled examples follow the same distribution as the labeled ones. Many classical approaches have been proposed, and they can be divided into several categories, such as generative models (Diplaros et al., 2007; Scrucca, 2021), wrapper methods (Blum and Mitchell, 1998; Zhou and Li, 2005; Li and Zhou, 2007), low-density separation models (Cervantes et al., 2020), and graph-based methods (Shimomura et al., 2021). Among them, the wrapper method is a simple and popular approach, which first trains classifiers on the original labeled data and then uses the predictions of the resulting classifiers to generate newly labeled data. The classifiers are then retrained on these pseudo-labeled data in addition to the existing labeled data, as sketched below.
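As an illustration, the following is a minimal self-training sketch of this wrapper loop; it assumes scikit-learn-style classifiers with a predict_proba method, and the confidence threshold and function name are illustrative choices rather than details of any of the cited methods:

    import numpy as np
    from sklearn.base import clone

    def self_train(clf, X_lab, y_lab, X_unlab, threshold=0.95, max_rounds=10):
        # Wrapper-style semi-supervised learning: repeatedly pseudo-label
        # high-confidence unlabeled samples and retrain on the enlarged set.
        pool = X_unlab.copy()
        for _ in range(max_rounds):
            clf = clone(clf).fit(X_lab, y_lab)
            if len(pool) == 0:
                break
            proba = clf.predict_proba(pool)
            picked = proba.max(axis=1) >= threshold
            if not picked.any():
                break  # nothing confident enough to pseudo-label; stop
            pseudo_y = clf.classes_[proba[picked].argmax(axis=1)]
            X_lab = np.vstack([X_lab, pool[picked]])
            y_lab = np.concatenate([y_lab, pseudo_y])
            pool = pool[~picked]
        return clf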

Avrim Blum and Tom Mitchell proposed the co-training algorithm (Blum and Mitchell, 1998), which requires two sufficient and redundant views. Co-training trains two classifiers separately on the two views and uses the predictions of each classifier on unlabeled examples to augment the training set of the other. Dasgupta et al. (2002) showed that when this requirement is met, the co-trained classifiers can make fewer generalization errors by maximizing their agreement over the unlabeled data. Since two sufficient and redundant views can hardly be found in most scenarios, Zhou and Li (2005) proposed the tri-training algorithm, which does not require two such views. Later, Li and Zhou (2007) proposed the co-forest algorithm by extending tri-training to more classifiers. In tri-training and co-forest, to ensure that the newly labeled samples bring positive effects, the theory of learning from noisy examples (Angluin and Laird, 1988) is introduced. Other wrapper methods can be found in (Van Engelen and Hoos, 2020).

2.2 Instance-based transfer learning

The intuitive idea of instance-based inductive transfer methods is to reuse certain parts of the source domain data together with the target domain data to train a highly accurate classifier.

Dai et al. (2007) proposed TrAdaBoost, an extension of AdaBoost. TrAdaBoost iteratively re-weights the source domain data to reduce the effect of the bad source data while encouraging the good source data to contribute more to the target domain; the authors also gave a theoretical analysis showing that TrAdaBoost converges. Similarly, Kamishima et al. (2009) proposed TrBagg, an extension of bagging. The TrBagg training process consists of two stages: learning and filtering. In the learning stage, weak classifiers are trained on bootstrapped samples from the target and source data sets. In the filtering stage, these weak classifiers are evaluated on the target domain data: weak classifiers with low accuracy are discarded, while those with high accuracy are selected to produce the final hypothesis. Later, Shi et al. (2009) proposed the COITL algorithm for inductive transfer learning by extending the co-training method for semi-supervised learning. The key idea of COITL is to replace the step of labeling the unlabeled examples by re-weighting the source domain data. In COITL, two base learners are trained on the target domain data, and each learner is refined using the weighted source domain examples predicted by the other. In addition, a number of other instance-based inductive transfer methods have been proposed that extend the single source domain to multiple source domains (Cheng et al., 2014; Ding et al., 2016; Yao and Doretto, 2010; Yang et al., 2020).
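To make the re-weighting scheme concrete, a simplified binary-classification sketch of TrAdaBoost is given below; it follows the update rules described by Dai et al. (2007) under simplifying assumptions (shallow decision trees as weak learners, labels in {0, 1}) and is not the exact implementation used in this paper:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def tradaboost(X_src, y_src, X_tgt, y_tgt, n_rounds=20, max_depth=4):
        # Simplified TrAdaBoost: misclassified source samples are down-weighted,
        # misclassified target samples are up-weighted (labels must be 0/1).
        n_s = len(X_src)
        X = np.vstack([X_src, X_tgt])
        y = np.concatenate([y_src, y_tgt])
        w = np.ones(len(X))
        beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n_s) / n_rounds))
        learners, betas = [], []
        for _ in range(n_rounds):
            p = w / w.sum()
            h = DecisionTreeClassifier(max_depth=max_depth).fit(X, y, sample_weight=p)
            miss = (h.predict(X) != y).astype(float)
            # weighted error is measured on the target portion only
            eps = np.sum(p[n_s:] * miss[n_s:]) / np.sum(p[n_s:])
            eps = float(np.clip(eps, 1e-10, 0.499))  # keep beta_tgt well defined
            beta_tgt = eps / (1.0 - eps)
            w[:n_s] *= beta_src ** miss[:n_s]    # shrink bad source samples
            w[n_s:] *= beta_tgt ** -miss[n_s:]   # boost hard target samples
            learners.append(h)
            betas.append(beta_tgt)
        def predict(X_new):
            # weighted vote over the second half of the rounds, as in Dai et al.
            half = len(learners) // 2
            score = sum(-np.log(b) * l.predict(X_new)
                        for l, b in zip(learners[half:], betas[half:]))
            thresh = 0.5 * sum(-np.log(b) for b in betas[half:])
            return (score >= thresh).astype(int)
        return predict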

2.3 Semi-supervised transfer learning

To the best of our knowledge, semi-supervised transfer learning has not received widespread attention, and only a few related papers can be found. According to these papers, semi-supervised transfer learning always aims to tackle learning on the labeled and unlabeled data in the target domain. Depending on the form in which the source domain is used, it can be divided into two categories: either the source domain data is directly available, or the source domain data is not available but a pre-trained model is.

Liu et al. (2012) proposed a tri-training based transfer learning algorithm (TriTransfer). In TriTransfer, three initial classifiers are generated from the labeled data in the source domain and the originally labeled data in the target domain. An unlabeled example in the target domain is then labeled and added to the labeled data of a classifier if the other two classifiers agree on its label, and the classifier is retrained on the expanded labeled data set. The final classifier is a weighted combination of the three classifiers.

Tang et al. (2017) proposed a semi-supervised transfer learning algorithm that uses a Convolutional Neural Network (CNN) for Chinese character recognition. First, a CNN model is trained on massive labeled samples in the source domain. Then the CNN model is fine-tuned on the limited labeled samples in the target domain. Finally, the fine-tuned CNN model is continuously trained with the unlabeled samples in the target domain while being fine-tuned with the labeled samples in the target domain to minimize the loss.

Wei et al. (2019) proposed a semi-supervised transfer learning algorithm for image rain removal. By elaborately formulating the residual between an unlabeled rainy image and its expected network output as a parameterized rain-streak distribution, the network is trained to adapt to real, diverse rain types in an unsupervised manner by transferring from supervised synthesized rain.

Abuduweili et al. (2021) proposed a semi-supervised transfer learning algorithm that utilizes both a powerful pre-trained model from the source domain and the labeled/unlabeled data in the target domain. Adaptive knowledge consistency (AKC) between the source and target models on selected examples is utilized to transfer knowledge from the pre-trained model and help the target model generalize, while adaptive representation consistency (ARC) of the target model between labeled and unlabeled examples is utilized to adapt the representation produced by supervised learning to the real target domain.

3 Co-Transfer

3.1 Problem formulation

To formalize the definition of semi-supervised inductive transfer learning proposed in this paper, some notations are introduced.

In this paper, it is assumed that the source domain $\mathcal{D}_s$ and the target domain $\mathcal{D}_t$ have the same feature space but different distributions. Let $\mathcal{X}$ denote the feature space and $\mathcal{Y}$ denote the label space; $x \in \mathcal{X}$ is a data instance. Both $\mathcal{D}_s$ and $\mathcal{D}_t$ are divided into two parts: a labeled part and an unlabeled part. The labeled part includes a few labeled instances, $L_s = \{(x_i^s, y_i^s)\}_{i=1}^{|L_s|}$ and $L_t = \{(x_j^t, y_j^t)\}_{j=1}^{|L_t|}$, where $x_i^s$ is an instance in the source domain with corresponding class label $y_i^s$, and $x_j^t$ is an instance in the target domain with corresponding class label $y_j^t$. The unlabeled part includes the unlabeled data $U_s$ and $U_t$. Furthermore, a test data set $T$, which has the same distribution as the target domain $\mathcal{D}_t$, is available. For simplicity, in this paper, the source and target domains are assumed to share the same label space $\mathcal{Y}$. In addition, a large amount of unlabeled data in the source and target domains is available, so that usually $|U_s| \gg |L_s|$ and $|U_t| \gg |L_t|$. The objective of the proposed semi-supervised inductive transfer learning is to utilize $L_s \cup U_s$ and $L_t \cup U_t$ to learn a function $f: \mathcal{X} \to \mathcal{Y}$ on the target domain, such that $f$ can correctly predict the labels of the instances in $T$.
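For concreteness, the quantities above can be grouped in a simple container such as the following sketch; the class and field names are illustrative, not notation from the paper:

    import numpy as np
    from dataclasses import dataclass

    @dataclass
    class Domain:
        X_lab: np.ndarray    # labeled instances (rows of the feature space X)
        y_lab: np.ndarray    # their class labels in Y
        X_unlab: np.ndarray  # unlabeled instances, typically |U| >> |L|

    # source = Domain(...), target = Domain(...); a held-out test set T drawn
    # from the target distribution is used to evaluate the learned f: X -> Y.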

3.2 Co-Transfer

In Co-Transfer, two ensembles of classifiers, $H^s = \{h_1^s, h_2^s, h_3^s\}$ and $H^t = \{h_1^t, h_2^t, h_3^t\}$, are initially trained with TrAdaBoost (Dai et al., 2007). Each base classifier of $H^s$ is trained on bootstrapped samples from the original labeled data sets $L_t$ and $L_s$ for transfer learning from the target domain to the source domain. Correspondingly, each base classifier of $H^t$ is trained on bootstrapped samples from the original labeled data sets $L_s$ and $L_t$ for transfer learning from the source domain to the target domain. The bootstrap strategy thus maintains the diversity between the base classifiers within $H^s$ and $H^t$. Mutual transfer learning between the source and target domains of this kind iterates for many rounds, and each base classifier of $H^s$ and $H^t$ is refined with the newly labeled samples.

In detail, in each iteration round of Co-Transfer, for $v \in \{s, t\}$ ($s$ corresponds to the source domain, $t$ to the target domain, and $v'$ denotes the other domain), the unlabeled samples in $U_v$ are labeled and added to the original labeled data $L_v$ as target data to retrain the classifier $h_k^v$ ($k = 1, 2, 3$). An unlabeled sample in $U_v$ can be labeled if the other two classifiers $h_i^v$ and $h_j^v$ ($i, j \neq k$) agree on its label; this newly labeled data set is denoted $L_{v,k}$. Meanwhile, the unlabeled samples in $U_{v'}$ are labeled and added to the original labeled data $L_{v'}$ as source data to retrain $h_k^v$; an unlabeled sample in $U_{v'}$ can be labeled if all the base classifiers in the ensemble $H^{v'}$ agree on its label, and this newly labeled data set is denoted $L'_{v,k}$. In summary, $L_{v'} \cup L'_{v,k}$ and $L_v \cup L_{v,k}$ are employed as the source and target data, respectively, to retrain $h_k^v$ by TrAdaBoost. Note that the unlabeled samples selected by $h_k^v$ or $H^{v'}$ are not removed from $U_v$ or $U_{v'}$, so they might be selected again in the following iterations. The iteration ends when none of the $h_k^v$ changes. The final hypothesis for the target domain is produced via majority voting over $H^t$.
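In code, the two labeling predicates of one co-transfer round might be sketched as follows; the helper names are hypothetical, and the classifiers are assumed to expose an array-style predict method:

    import numpy as np

    def label_by_pair(h_i, h_j, X_unlab):
        # A sample from U_v is pseudo-labeled for h_k^v when the other two
        # classifiers of the same group agree on its label.
        p_i, p_j = h_i.predict(X_unlab), h_j.predict(X_unlab)
        mask = p_i == p_j
        return X_unlab[mask], p_i[mask]

    def label_by_ensemble(ensemble, X_unlab):
        # A sample from the other domain's unlabeled set is pseudo-labeled
        # only when every base classifier of that group's ensemble agrees.
        preds = np.array([h.predict(X_unlab) for h in ensemble])
        mask = (preds == preds[0]).all(axis=0)
        return X_unlab[mask], preds[0][mask]

Each $h_k^v$ would then be retrained with TrAdaBoost, taking the pair-labeled set as additional target data and the ensemble-labeled set as additional source data, without consuming the unlabeled pools.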

Like tri-training (Zhou and Li, 2005), the employment of the ensembles $H^s$ and $H^t$ not only avoids complicated confidence and transferability estimation methods, but also makes the labeling of unlabeled data more reliable than a single classifier would. However, although the ensemble $H^s$ or $H^t$ generalizes better than a single classifier, misclassifying an unlabeled example is inevitable. Therefore, $h_k^v$ will receive noisy examples as its source and target data, which might bias its refinement. Fortunately, Zhou and Li (2005) showed that the negative influence caused by such noise can be compensated by augmenting the labeled set with a sufficient amount of newly labeled examples under certain conditions. Thus, inspired by Zhou and Li (2005), the same strategy is adopted here to restrict the influence of noisy newly labeled samples in two cases.

The first case concerns how to select unlabeled data in $U_v$ to be labeled as target data for retraining $h_k^v$ ($k = 1, 2, 3$). Let $L_{v,k}^{r}$ and $L_{v,k}^{r-1}$ denote the sets of samples labeled for retraining $h_k^v$ in the $r$-th and $(r-1)$-th rounds of iteration, whose sizes are $|L_{v,k}^{r}|$ and $|L_{v,k}^{r-1}|$, and let $e_{v,k}^{r}$ and $e_{v,k}^{r-1}$ denote the upper bounds of the classification error rate of $h_k^v$ in the $r$-th and $(r-1)$-th rounds, respectively. Then, according to Zhou and Li (2005), the following constraint is obtained:

$e_{v,k}^{r} \, |L_{v,k}^{r}| < e_{v,k}^{r-1} \, |L_{v,k}^{r-1}|$    (1)

Even when $e_{v,k}^{r} < e_{v,k}^{r-1}$ and $|L_{v,k}^{r-1}| < |L_{v,k}^{r}|$ hold, Eq. 1 might still be violated, since $e_{v,k}^{r} |L_{v,k}^{r}| \geq e_{v,k}^{r-1} |L_{v,k}^{r-1}|$ may occur. To make Eq. 1 hold again in this case, $L_{v,k}^{r}$ must be subsampled so that $e_{v,k}^{r} |L_{v,k}^{r}| < e_{v,k}^{r-1} |L_{v,k}^{r-1}|$; that is, $L_{v,k}^{r}$ should be randomly subsampled to the size

$s_{v,k}^{r} = \left\lceil \frac{e_{v,k}^{r-1} |L_{v,k}^{r-1}|}{e_{v,k}^{r}} - 1 \right\rceil$    (2)

The second case concerns how to select unlabeled data in $U_{v'}$ to be labeled as source data for retraining $h_k^v$ ($k = 1, 2, 3$). Let $L'^{r}_{v,k}$ and $L'^{r-1}_{v,k}$ denote the sets of samples labeled in the $r$-th and $(r-1)$-th rounds of iteration, whose sizes are $|L'^{r}_{v,k}|$ and $|L'^{r-1}_{v,k}|$, and let $e'^{r}_{v,k}$ and $e'^{r-1}_{v,k}$ denote the upper bounds of the classification error rate of $H^{v'}$ in the $r$-th and $(r-1)$-th rounds, respectively. Then we have the following constraint:

$e'^{r}_{v,k} \, |L'^{r}_{v,k}| < e'^{r-1}_{v,k} \, |L'^{r-1}_{v,k}|$    (3)

Even when $e'^{r}_{v,k} < e'^{r-1}_{v,k}$ and $|L'^{r-1}_{v,k}| < |L'^{r}_{v,k}|$ hold, Eq. 3 might still be violated, since $e'^{r}_{v,k} |L'^{r}_{v,k}| \geq e'^{r-1}_{v,k} |L'^{r-1}_{v,k}|$ may occur. To make Eq. 3 hold again in this case, $L'^{r}_{v,k}$ must be subsampled so that $e'^{r}_{v,k} |L'^{r}_{v,k}| < e'^{r-1}_{v,k} |L'^{r-1}_{v,k}|$; that is, $L'^{r}_{v,k}$ should be randomly subsampled to the size

$s'^{r}_{v,k} = \left\lceil \frac{e'^{r-1}_{v,k} |L'^{r-1}_{v,k}|}{e'^{r}_{v,k}} - 1 \right\rceil$    (4)

The constraint introduced in the second case brings more reliable pseudo-labeled samples to the source domain, which makes the source-domain classifiers more accurate and, in turn, leads to more reliable pseudo-labeled samples being added to the target domain. This is very important for the iteration process of Co-Transfer.
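The selection-and-subsampling rule shared by the two cases can be sketched as follows, where err_prev and err_cur stand for the estimated error bounds of the previous and current rounds, size_prev for the size of the previous round's newly labeled set, and the function name is illustrative:

    import math
    import random

    def screen_newly_labeled(err_prev, size_prev, err_cur, newly_labeled):
        # Enforce e^r * |L^r| < e^{r-1} * |L^{r-1}| (Eqs. 1 and 3). If the
        # error bound decreased but the constraint is violated, subsample the
        # newly labeled set to the size given by Eq. 2 / Eq. 4.
        if err_cur >= err_prev or len(newly_labeled) <= size_prev:
            return None  # no safe update in this round
        if err_cur * len(newly_labeled) < err_prev * size_prev:
            return newly_labeled
        s = math.ceil(err_prev * size_prev / err_cur - 1)
        if size_prev < s < len(newly_labeled):
            return random.sample(list(newly_labeled), s)
        return None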

Input: labeled sets $L_s$, $L_t$; unlabeled sets $U_s$, $U_t$
Output: the ensemble $H^t = \{h_1^t, h_2^t, h_3^t\}$

for $v \in \{s, t\}$ do
       for $k = 1$ to $3$ do
              $A \leftarrow$ BootstrapSample($L_{v'}$);  $B \leftarrow$ BootstrapSample($L_v$);
              $h_k^v \leftarrow$ TrAdaBoost($A$, $B$);
              $e_{v,k} \leftarrow 0.5$;  $l_{v,k} \leftarrow 0$;  $e'_{v,k} \leftarrow 0.5$;  $l'_{v,k} \leftarrow 0$;

repeat
       for $v \in \{s, t\}$ do
              for $k = 1$ to $3$ do
                    $update_{v,k} \leftarrow$ false;
                    $\hat{e} \leftarrow$ EstimateError($\{h_i^v : i \neq k\}$, $L_v$);
                    if $\hat{e} < e_{v,k}$ then
                          $L_{v,k} \leftarrow \{(x, h_i^v(x)) : x \in U_v$, the two $h_i^v$ ($i \neq k$) agree on $x\}$;
                          if $\hat{e} \, |L_{v,k}| < e_{v,k} \, l_{v,k}$ then
                                 $update_{v,k} \leftarrow$ true;
                          else if $l_{v,k} > \hat{e} / (e_{v,k} - \hat{e})$ then
                                 $L_{v,k} \leftarrow$ Subsample($L_{v,k}$, $s$)  % refer to Eq. 2;
                                 $update_{v,k} \leftarrow$ true;
                    $\hat{e}' \leftarrow$ EstimateError($H^{v'}$, $L_{v'}$);
                    if $\hat{e}' < e'_{v,k}$ then
                          $L'_{v,k} \leftarrow \{(x, H^{v'}(x)) : x \in U_{v'}$, all members of $H^{v'}$ agree on $x\}$;
                          if $\hat{e}' \, |L'_{v,k}| < e'_{v,k} \, l'_{v,k}$ then
                                 $update_{v,k} \leftarrow$ true;
                          else if $l'_{v,k} > \hat{e}' / (e'_{v,k} - \hat{e}')$ then
                                 $L'_{v,k} \leftarrow$ Subsample($L'_{v,k}$, $s'$)  % refer to Eq. 4;
                                 $update_{v,k} \leftarrow$ true;
       for $v \in \{s, t\}$ and $k = 1$ to $3$ do
              if $update_{v,k}$ then
                    $h_k^v \leftarrow$ TrAdaBoost($L_{v'} \cup L'_{v,k}$, $L_v \cup L_{v,k}$);
                    $e_{v,k} \leftarrow \hat{e}$;  $l_{v,k} \leftarrow |L_{v,k}|$;  $e'_{v,k} \leftarrow \hat{e}'$;  $l'_{v,k} \leftarrow |L'_{v,k}|$;
until none of the $h_k^v$ changes;
Algorithm 1 Co-Transfer

The pseudo-code of Co-Transfer is presented in Algorithm 1. In the initialization phase, for each base classifier, bootstrapped samples are drawn by BootstrapSample from the original labeled data $L_{v'}$ and $L_v$ as the source and target data, respectively, and a TrAdaBoost classifier is trained on them. In each round, the function EstimateError estimates the classification error rate of the pair of classifiers $\{h_i^v : i \neq k\}$ and, likewise, the error rate of the hypothesis derived from the combination of the base classifiers in $H^{v'}$; both error rates are estimated on the original labeled data. In detail, the error rate of a pair (or ensemble) of classifiers is approximated by dividing the number of labeled samples on which all of them make an incorrect classification by the number of labeled samples on which their classifications agree. Unlabeled samples in $U_v$ are labeled when the two classifiers $h_i^v$ and $h_j^v$ ($i, j \neq k$) agree on their labels, while pseudo-labeled samples from $U_{v'}$ are screened by requiring consistency with the whole ensemble $H^{v'}$. When Eq. 1 (or Eq. 3) is violated, the function Subsample randomly removes examples until the size given by Eq. 2 (or Eq. 4) is reached. Finally, the newly labeled source and target data are employed to refine each base classifier of $H^s$ and $H^t$. The data flow during the training process of Co-Transfer is shown in Figure 1.
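A sketch of this error estimate, written for an arbitrary group of classifiers and assuming array-style predictions, might look as follows (the function name mirrors the EstimateError step of Algorithm 1 only informally):

    import numpy as np

    def estimate_error(classifiers, X_lab, y_lab):
        # Among the labeled samples on which all classifiers agree, count how
        # often the agreed prediction is wrong; fall back to 0.5 (chance level
        # for a binary task) when they never agree.
        preds = np.array([h.predict(X_lab) for h in classifiers])
        agree = (preds == preds[0]).all(axis=0)
        if agree.sum() == 0:
            return 0.5
        wrong = agree & (preds[0] != y_lab)
        return wrong.sum() / agree.sum()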

Figure 1: The data flow diagram of Co-Transfer training process

Co-Transfer combines instance-based transfer learning and semi-supervised learning to fully explore the knowledge in the existing data, including the knowledge of a source domain whose distribution differs from, but is related to, that of the target domain, and the knowledge carried by the unlabeled samples in the source and target domains.

4 Experiments

4.1 Datasets

Five UCI data sets (http://www.ics.uci.edu/~mlearn/MLRepository.html) and the Reuters text classification data set (http://www.daviddlewis.com/resources/testcollections/) are used in the experiments. Table 1 tabulates the detailed information of these data sets, where "attr" is the number of attributes, "|D_s|" and "|D_t|" are the sizes of the source domain and the target domain, respectively, "class" is the number of labels, "err_{s→t}" is the prediction error rate on the target domain made by a classifier trained on the source domain data, and "err_{t→s}" is the prediction error rate on the source domain made by a classifier trained on the target domain data. err_{s→t} is employed to evaluate the transferability from the source domain to the target domain, while err_{t→s} evaluates the transferability from the target domain to the source domain. Similar measurements can be found in (Shi et al., 2009).

Data set            attr   |D_s|   |D_t|   class   err_{s→t}   err_{t→s}
Mushroom            22     4608    3516    2       0.332       0.646
Waveform            21     1722    1582    2       0.171       0.135
Magic               10     9718    9302    2       0.332       0.377
Splice              60     795     740     2       0.116       0.085
Vote                16     220     215     2       0.079       0.118
Orgs vs People      4771   1237    1208    2       0.408       0.335
Orgs vs Places      4415   1016    1043    2       0.392       0.365
People vs Places    4562   1077    1077    2       0.463       0.443
Table 1: Experimental data sets

The five UCI data sets are first preprocessed for semi-supervised transfer learning. For the Mushroom, Waveform, Magic, and Splice data sets, the preprocessing follows the method of (Shi et al., 2009); for these data sets it has been shown that the source domain can assist the target domain task in learning a better hypothesis, that is, there is positive transfer. To verify the effectiveness of Co-Transfer on small-scale data sets, the Vote data set is introduced: for each instance in the Vote data set, if its fourth attribute value is 0, the instance is put into the target domain, otherwise it is added to the source domain. The preprocessed Reuters text classification data set, which can be downloaded directly and includes three data sets, namely Orgs vs People, Orgs vs Places, and People vs Places, is used to evaluate the classification accuracy of Co-Transfer on a real application task.

4.2 Experimental settings

For each data set, five-fold cross validation is employed for reliable evaluation. In each fold, the data in the target domain are divided into a target domain training set and the test set $T$. Then, the source domain data set and the target domain training set are randomly partitioned into two labeled sets $L_s$ and $L_t$ and two unlabeled sets $U_s$ and $U_t$, respectively, with a given label rate, i.e., 10%, 20%, 40%, or 50%. In addition, in each fold of cross validation, the source domain training data are randomly partitioned into $L_s$ and $U_s$ twice, and the target domain training data are randomly partitioned into $L_t$ and $U_t$ three times. Therefore, the final error rate is the average of 30 (= 5 folds × 6 partitions) test results.
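One fold of this protocol can be sketched with scikit-learn utilities as follows; the generator below is an illustrative reconstruction of the procedure, not code from the paper:

    from sklearn.model_selection import KFold, train_test_split

    def folds(X_t, y_t, X_s, y_s, label_rate=0.1, seed=0):
        # Five-fold CV on the target domain; within each fold, both domains
        # are split into labeled/unlabeled parts at the given label rate.
        kf = KFold(n_splits=5, shuffle=True, random_state=seed)
        for tr, te in kf.split(X_t):
            Xl_t, Xu_t, yl_t, _ = train_test_split(
                X_t[tr], y_t[tr], train_size=label_rate, random_state=seed)
            Xl_s, Xu_s, yl_s, _ = train_test_split(
                X_s, y_s, train_size=label_rate, random_state=seed)
            yield (Xl_s, yl_s, Xu_s), (Xl_t, yl_t, Xu_t), (X_t[te], y_t[te])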

To verify that Co-Transfer benefits not only from the labeled data but also from the unlabeled data in the source and target domains, four baseline methods are introduced for comparison. Table 2 lists the data strategy used by Co-Transfer and the baseline methods: DT, TrAdaBoost, Tri-training, and an all-labeled reference classifier (denoted All-labeled). DT, a decision tree trained only on $L_t$, is introduced as the weakest classifier for comparison, while All-labeled, which is trained on $L_s$, $L_t$, $U_s$, and $U_t$ with the ground-truth labels of all unlabeled data provided, is introduced as the best classifier for comparison. TrAdaBoost, which is trained on $L_s$ and $L_t$, is introduced to illustrate that Co-Transfer can benefit from unlabeled samples in addition to reusing the source domain data. Tri-training, which is trained on $L_t$ and $U_t$, is introduced to show that Co-Transfer can benefit from the source domain data.

Baseline        Training Data                                      Test Data
DT              L_t                                                T
TrAdaBoost      L_s, L_t                                           T
Tri-training    L_t, U_t                                           T
Co-Transfer     L_s, L_t, U_s, U_t                                 T
All-labeled     L_s, L_t, U_s, U_t (with ground-truth labels)      T
Table 2: The data strategy used by the algorithms

Since the DT and Tri-training algorithms do not require extra hyperparameters, they can be run directly. For TrAdaBoost, Co-Transfer, and All-labeled, grid search is used to find suitable values for the tree depth $d$ and the number of TrAdaBoost iterations $N$. As a result, for the UCI data sets, $d$=10 and $N$=10 are used for Mushroom, $d$=65 and $N$=4 for Waveform, $d$=35 and $N$=20 for Magic, $d$=15 and $N$=50 for Splice, and $d$=5 and $N$=50 for Vote. For the text classification data sets, $d$=60 and $N$=4 are used for Orgs vs People, $d$=30 and $N$=4 for Orgs vs Places, and $d$=60 and $N$=3 for People vs Places.
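The grid search itself is straightforward; a sketch is given below, where train_eval is assumed to train Co-Transfer with a given (d, N) pair and return its validation error rate, and the candidate grids are illustrative:

    from itertools import product

    def grid_search(train_eval, depths=(5, 10, 15, 30, 35, 60, 65),
                    n_iters=(3, 4, 10, 20, 50)):
        # Return the (d, N) pair minimizing the error reported by train_eval.
        return min(product(depths, n_iters), key=lambda dn: train_eval(*dn))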

4.3 Experimental results and analysis

Tables 3-6 tabulate the error rates of Co-Transfer and the baseline methods under different label rates. Significance is checked using a standard t-test with 95% confidence between Co-Transfer and each baseline method: a win (loss) marker indicates that Co-Transfer is significantly better (worse) than the corresponding baseline method, while a tie means that Co-Transfer is not significantly better than the compared baseline with 95% confidence. The row avg. shows the average results over all the experimental data sets.

DataSet             DT      TrAdaBoost   Tri-training             Co-Transfer              All-labeled
                                         Initial  Final   Iter    Initial  Final   Iter
Mushroom            0.004   0.006        0.008    0.004   2.4     0.118    0.005   3       0.0
Waveform            0.201   0.149        0.193    0.19    3       0.179    0.137   3       0.158
Magic               0.155   0.159        0.161    0.149   3       0.157    0.148   3.43    0.113
Splice              0.135   0.122        0.132    0.112   2.8     0.16     0.111   3       0.055
Vote                0.054   0.045        0.064    0.051   2.27    0.086    0.044   2.87    0.02
Orgs vs People      0.264   0.215        0.298    0.258   3       0.377    0.191   3.07    0.153
Orgs vs Places      0.282   0.264        0.307    0.274   3       0.384    0.234   3.17    0.179
People vs Places    0.274   0.227        0.303    0.264   3       0.417    0.206   3       0.159
avg.                0.171   0.148        0.183    0.163   2.81    0.235    0.135   3.07    0.105
Table 3: The error rates of the compared algorithms under the label rate of 10%
DataSet             DT      TrAdaBoost   Tri-training             Co-Transfer              All-labeled
                                         Initial  Final   Iter    Initial  Final   Iter
Mushroom            0.002   0.001        0.003    0.002   2.2     0.081    0.002   3       0.0
Waveform            0.179   0.146        0.17     0.169   3       0.159    0.127   3       0.159
Magic               0.149   0.147        0.146    0.139   3       0.147    0.127   3.53    0.116
Splice              0.103   0.095        0.101    0.095   2.93    0.113    0.092   3       0.054
Vote                0.047   0.036        0.047    0.039   2.2     0.045    0.039   3.07    0.018
Orgs vs People      0.213   0.182        0.22     0.198   3       0.353    0.155   3.27    0.15
Orgs vs Places      0.242   0.238        0.252    0.231   3       0.365    0.205   3.6     0.186
People vs Places    0.217   0.201        0.249    0.199   3       0.42     0.17    3.43    0.155
avg.                0.144   0.131        0.149    0.134   2.79    0.21     0.115   3.24    0.105
Table 4: The error rates of the compared algorithms under the label rate of 20%
DataSet             DT      TrAdaBoost   Tri-training             Co-Transfer              All-labeled
                                         Initial  Final   Iter    Initial  Final   Iter
Mushroom            0.0     0.0          0.001    0.0     2.07    0.051    0.0     3       0.0
Waveform            0.174   0.144        0.154    0.165   3       0.157    0.135   3       0.158
Magic               0.144   0.134        0.146    0.13    3       0.136    0.113   3.37    0.109
Splice              0.085   0.09         0.072    0.07    3       0.089    0.084   3       0.054
Vote                0.029   0.025        0.031    0.025   2.47    0.027    0.023   3       0.016
Orgs vs People      0.18    0.194        0.177    0.165   3       0.311    0.147   3.8     0.154
Orgs vs Places      0.197   0.225        0.222    0.19    3       0.352    0.176   3.8     0.182
People vs Places    0.19    0.19         0.195    0.177   3       0.38     0.142   3.67    0.155
avg.                0.125   0.125        0.125    0.115   2.82    0.188    0.102   3.33    0.104
Table 5: The error rates of the compared algorithms under the label rate of 40%
DataSet             DT      TrAdaBoost   Tri-training             Co-Transfer              All-labeled
                                         Initial  Final   Iter    Initial  Final   Iter
Mushroom            0.0     0.0          0.001    0.0     2.13    0.04     0.0     3       0.0
Waveform            0.176   0.151        0.159    0.167   3       0.151    0.138   3.03    0.159
Magic               0.143   0.136        0.15     0.13    3       0.138    0.108   3.4     0.111
Splice              0.082   0.083        0.076    0.069   3       0.088    0.076   3       0.054
Vote                0.029   0.027        0.034    0.026   2.53    0.029    0.026   3       0.016
Orgs vs People      0.17    0.2          0.175    0.162   3       0.313    0.142   4.03    0.152
Orgs vs Places      0.187   0.22         0.188    0.168   3       0.352    0.168   4       0.178
People vs Places    0.183   0.196        0.181    0.166   3       0.341    0.142   4.03    0.159
avg.                0.121   0.127        0.12     0.111   2.83    0.182    0.1     3.44    0.104
Table 6: The error rates of the compared algorithms under the label rate of 50%

In the columns of Tri-training and Co-Transfer, the Initial column shows the error rate of the hypothesis trained only on the original labeled data, which is $L_t$ for Tri-training and $L_s$ and $L_t$ for Co-Transfer. The Final column shows the error rate of the hypothesis after further refinement with the pseudo-labeled data, which come from $U_t$ for Tri-training and from $U_s$ and $U_t$ for Co-Transfer. Iter is the number of learning rounds from Initial to Final.

Tables 3-6 show that, under different label rates, Co-Transfer improves the accuracy of the learned hypothesis both by exploring the unlabeled data in the source and target domains and by reusing the knowledge of the source domain. Under every label rate, the final hypothesis learned by Co-Transfer reaches the lowest average error rate among all compared methods except All-labeled. Surprisingly, comparing average error rates shows that the accuracy of the final hypothesis learned by Co-Transfer is comparable to that of the All-labeled classifier when the label rate is 40% or 50%.

To verify that Co-Transfer can benefit from unlabeled data, we compare Co-Transfer with TrAdaBoost, which does not use unlabeled data to improve the accuracy of the learned hypothesis. From Tables 3-6, it can be observed that the average error rate of the final hypothesis learned by Co-Transfer is lower than that of the classifier learned by TrAdaBoost under all label rates. In detail, averaging over all the data sets, Co-Transfer achieves average error rates of 0.135 and 0.115 under the 10% and 20% label rates, respectively. Moreover, when the label rate increases to 40% and 50%, TrAdaBoost can no longer benefit further from source domain samples; even in this case, Co-Transfer can still explore the unlabeled examples to achieve better performance than TrAdaBoost, reaching average error rates of 0.102 and 0.1, respectively.

In addition to benefiting from unlabeled examples, Co-Transfer can also reuse the knowledge of the source domain to further improve the accuracy of the learned hypothesis. To investigate this, Tri-training, a well-known semi-supervised learning algorithm that does not benefit from the source domain, is introduced for comparison. Since both Co-Transfer and Tri-training are iterative methods, three indicators are compared: Initial, Final, and Iter. From Tables 3-6, it can be observed that the error rate of the initial hypothesis learned by Co-Transfer is higher than that of Tri-training on all the data sets under all label rates. However, the error rate of the final hypothesis learned by Co-Transfer is lower than that of Tri-training, while the average number of iterations of Co-Transfer is only slightly larger than that of Tri-training over all data sets and label rates: 3.27 for Co-Transfer versus 2.81 for Tri-training. Moreover, the average error rate of the final hypothesis learned by Co-Transfer is 2.8%, 1.9%, 1.3%, and 1.1% lower than that achieved by Tri-training under the label rates of 10%, 20%, 40%, and 50%, respectively. These results illustrate that the iterative process of Co-Transfer is more effective than that of Tri-training.

To investigate the effectiveness of Co-Transfer in practical application scenarios, such as text classification tasks, the performance of Co-Transfer on Orgs vs People, Orgs vs Places, and People vs Places is further analyzed. It can be observed from Tables 3-6 that Co-Transfer can enhance the accuracy of the learned hypothesis by exploring the knowledge of the source domain and the unlabeled data under all label rates. Compared with TrAdaBoost and DT, respectively, the error rate of the final hypothesis learned by Co-Transfer is reduced by 4.0% and 4.9% on Orgs vs People, 4.2% and 3.2% on Orgs vs Places, and 3.8% and 5.1% on People vs Places, averaged over the four label rates. Compared with Tri-training, averaged over the four label rates (10%, 20%, 40%, and 50%), Co-Transfer achieves reductions in error rate of 3.7% on Orgs vs People, 2.1% on Orgs vs Places, and 3.7% on People vs Places. These results show that Co-Transfer can not only explore the unlabeled data but also reuse the knowledge of the source domain to improve the classification accuracy of the learned hypothesis.

Figure 2: The error rate averaged under different label rates over the text classification task data sets: (a) Orgs vs People, (b) Orgs vs Places, (c) People vs Places

Finally, to gain insight into the learning process of Co-Transfer, the error rates at each learning iteration are averaged over the different label rates on the text classification task data sets Orgs vs People, Orgs vs Places, and People vs Places. Note that among Co-Transfer and the baseline methods, only Tri-training and Co-Transfer iteratively improve the learned hypothesis during training; the other three algorithms have no iterative learning process. Figure 2 depicts the changes in the average error rates of the compared algorithms from the initial iteration to the final iteration. It can be observed that the initial average error rate of Co-Transfer is higher than that of Tri-training. After the initial iteration, however, the curves of Co-Transfer always lie below those of Tri-training, and the average error rates of Co-Transfer keep decreasing as unlabeled data and source domain knowledge are utilized, converging within just a few learning iterations. These properties enable Co-Transfer to greatly reduce the manual labeling cost and to reuse pre-existing examples, avoiding the waste of knowledge.

Figure 3: The error rate averaged over all label rates with different $d$ and $N$: (a) Orgs vs People, (b) Orgs vs Places, (c) People vs Places

Note that in the previous experiments, the depth of the tree $d$ and the number of iterations $N$ were fixed in Co-Transfer. Different values of $d$ and $N$ might affect the complexity of the model and influence classification accuracy. Therefore, the error rates of Co-Transfer with different tree depths and iteration numbers are further investigated on the text classification task data sets. Averaged over all label rates, the variations of the average error rate of Co-Transfer on the text classification data sets are shown in Figure 3, where lines of different colors represent the changes of the average error rate for different tree depths $d$ under different iteration numbers $N$. It can be observed that the values of $d$ and $N$ yielding the optimal performance of Co-Transfer differ across the three data sets, which illustrates that Co-Transfer is sensitive to model parameter selection on different data sets. However, it can also be observed from Figure 3 that the error rate declines as $N$ increases on all data sets.

5 Conclusion

In this paper, a semi-supervised inductive transfer learning framework named Co-Transfer is proposed. Co-Transfer tightly combines inductive transfer learning and semi-supervised learning, so that it can reuse the existing knowledge in the source domain data and explore unlabeled data to boost the classification accuracy of the learned hypothesis. To this end, two groups of three TrAdaBoost classifiers are employed and refined to produce the final hypothesis. In each round of iteration in Co-Transfer, each group of TrAdaBoost classifiers is refined using the newly labeled data, of which one part is labeled by itself and the other part is labeled by the other group of TrAdaBoost classifiers. The newly labeled samples are carefully selected under certain conditions, following the strategy verified in tri-training (Zhou and Li, 2005). Experiments on several UCI data sets and the text classification task data set illustrate the effectiveness of Co-Transfer.

Note that in this paper only the decision tree is used as the base classifier in Co-Transfer. Other well-known algorithms, such as Naive Bayes, SVM, and neural networks, could also be employed as base classifiers; in future work, these methods can be used to test the performance of Co-Transfer with different base classifiers. Furthermore, more data sets, especially data sets from real applications, should be used to extensively evaluate Co-Transfer. The transferability between the source and target domains is also an important issue for Co-Transfer.

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (61866007), the Natural Science Foundation of Guangxi District (2018GXNSFDA138006), the Guangxi Key Laboratory of Trusted Software (KX201721), and the Image and Graphic Intelligent Processing Project of the Key Laboratory Fund (GIIP2005, GIIP201505).

References

  • A. Abuduweili, X. Li, H. Shi, C. Xu, and D. Dou (2021) Adaptive consistency regularization for semi-supervised transfer learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6923–6932.
  • D. Angluin and P. Laird (1988) Learning from noisy examples. Machine Learning 2 (4), pp. 343–370.
  • A. Blum and T. Mitchell (1998) Combining labeled and unlabeled data with co-training. In COLT: Proceedings of the Workshop on Computational Learning Theory, pp. 92–100.
  • J. Cervantes, F. Garcia-Lamont, L. Rodríguez-Mazahua, and A. Lopez (2020) A comprehensive survey on support vector machine classification: applications, challenges and trends. Neurocomputing 408, pp. 189–215.
  • A. Chebli, A. Djebbar, and H. F. Marouani (2018) Semi-supervised learning for medical application: a survey. In 2018 International Conference on Applied Smart Systems (ICASS), pp. 1–9.
  • Y. Cheng, X. Wang, and G. Cao (2014) Multi-source tri-training transfer learning. IEICE Transactions on Information and Systems 97 (6), pp. 1668–1672.
  • W. Dai, Q. Yang, G. Xue, and Y. Yu (2007) Boosting for transfer learning. In ICML '07: Proceedings of the 24th International Conference on Machine Learning, pp. 193–200.
  • S. Dasgupta, M. L. Littman, and D. McAllester (2002) PAC generalization bounds for co-training. In Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic, Vol. 1, pp. 375–382.
  • Z. Ding, M. Shao, and Y. Fu (2016) Incomplete multisource transfer learning. IEEE Transactions on Neural Networks and Learning Systems 29 (2), pp. 310–323.
  • A. Diplaros, N. Vlassis, and T. Gevers (2007) A spatially constrained generative model and an EM algorithm for image segmentation. IEEE Transactions on Neural Networks 18 (3), pp. 798–808.
  • D. Dua and C. Graff (2017) UCI machine learning repository.
  • T. Kamishima, M. Hamasaki, and S. Akaho (2009) TrBagg: a simple transfer learning method and its application to personalization in collaborative tagging. In 2009 Ninth IEEE International Conference on Data Mining, pp. 219–228.
  • P. Karisani and N. Karisani (2021) Semi-supervised text classification via self-pretraining. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining, pp. 40–48.
  • M. Li and Z. Zhou (2007) Improve computer-aided diagnosis with machine learning techniques using undiagnosed samples. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans 37 (6), pp. 1088–1098.
  • X. Liu, H. Zhang, Z. Cai, and G. Wang (2012) A tri-training based transfer learning algorithm. In 2012 IEEE 24th International Conference on Tools with Artificial Intelligence, Vol. 1, pp. 698–703.
  • R. Mohanasundaram, A. S. Malhotra, R. Arun, and P. Periasamy (2019) Deep learning and semi-supervised and transfer learning algorithms for medical imaging. In Deep Learning and Parallel Computing Environment for Bioengineering Systems, pp. 139–151.
  • B. Mustafa, A. Loh, J. Freyberg, P. MacWilliams, M. Wilson, S. M. McKinney, M. Sieniek, J. Winkens, Y. Liu, P. Bui, et al. (2021) Supervised transfer learning at scale for medical imaging. arXiv preprint arXiv:2101.05913.
  • S. J. Pan and Q. Yang (2010) A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 22 (10), pp. 1345–1359.
  • L. Scrucca (2021) A fast and efficient modal EM algorithm for Gaussian mixtures. Statistical Analysis and Data Mining: The ASA Data Science Journal.
  • Y. Shi, Z. Lan, W. Liu, and W. Bi (2009) Extending semi-supervised learning methods for inductive transfer learning. In 2009 Ninth IEEE International Conference on Data Mining, pp. 483–492.
  • L. C. Shimomura, R. S. Oyamada, M. R. Vieira, and D. S. Kaster (2021) A survey on graph-based methods for similarity searches in metric spaces. Information Systems 95, pp. 101507.
  • Y. Tang, B. Wu, L. Peng, and C. Liu (2017) Semi-supervised transfer learning for convolutional neural network based Chinese character recognition. In 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Vol. 1, pp. 441–447.
  • J. E. Van Engelen and H. H. Hoos (2020) A survey on semi-supervised learning. Machine Learning 109 (2), pp. 373–440.
  • W. Wei, D. Meng, Q. Zhao, Z. Xu, and Y. Wu (2019) Semi-supervised transfer learning for image rain removal. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3877–3886.
  • Y. Yang, X. Li, P. Wang, Y. Xia, and Q. Ye (2020) Multi-source transfer learning via ensemble approach for initial diagnosis of Alzheimer's disease. IEEE Journal of Translational Engineering in Health and Medicine 8, pp. 1–10.
  • Y. Yao and G. Doretto (2010) Boosting for transfer learning with multiple sources. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1855–1862.
  • Z. Zhou and M. Li (2005) Tri-training: exploiting unlabeled data using three classifiers. IEEE Transactions on Knowledge and Data Engineering 17 (11), pp. 1529–1541.
  • X. J. Zhu (2008) Semi-supervised learning literature survey. Technical report, Computer Sciences, University of Wisconsin-Madison.
  • F. Zhuang, Z. Qi, K. Duan, D. Xi, Y. Zhu, H. Zhu, H. Xiong, and Q. He (2021) A comprehensive survey on transfer learning. Proceedings of the IEEE 109 (1), pp. 43–76.