Transfer Learning Toolkit: Primers and Benchmarks

The transfer learning toolkit wraps the code of 17 transfer learning models and provides integrated interfaces, allowing users to run those models by calling a simple function. Entry-level researchers can easily use this toolkit and choose suitable models for real-world applications. The toolkit is written in Python and distributed under the MIT open-source license. This paper describes the current state of the toolkit, the required environment setup, and its usage.


1 Introduction

Transfer learning is a promising and important direction in machine learning. It attempts to leverage the knowledge contained in a source domain to improve the learning performance in a target domain, or to reduce the number of labeled samples required there. According to the survey by Pan and Yang PY2010TKDE , transfer learning approaches can be divided into four categories, i.e., instance-based, feature-based, parameter-based, and relational-based approaches. Instance-based approaches re-weight the instances in the source domain to help construct a learner on the target domain. Feature-based approaches aim to find a new feature representation for domain adaptation. Parameter-based approaches try to discover shared parameters or priors between the domains. Relational-based approaches build mappings of knowledge across relational domains. With the development of deep-learning techniques, a number of transfer learning models based on deep networks have been proposed TSK2018ICANN , and they have shown excellent performance on a variety of tasks.

We developed this toolkit to help entry-level researchers easily select and use representative models as baselines in comparative experiments. It contains a number of representative transfer learning models, either wrapped or implemented by ourselves with the help of existing open-source code. The current version wraps 17 models and provides unified interfaces. These 17 transfer learning models are: HIDC ZLY2013IJCAI , TriTL ZLD2014TC , CD-PLSA ZLS2010CIKM , MTrick ZLX2011SADM , SFA PNS2010WWW , mSDA CXW2012ICML , SDA GBB2011ICML , GFK G2012CVPR , SCL BMP2006EMNLP , TCA PTK2011TNN , JDA LWD2013ICCV , TrAdaBoost DYX2007ICML , LWE GFJ2008KDD , DAN LCW2015ICML , DCORAL SS2016ECCVW , MRAN ZZW2019NN , and DANN GL2015ICML GUA2016JMLR . With this toolkit, users can simply call unified functions to run the desired models. This may help them explore the transfer learning area or test whether newly designed models outperform established ones. The models in the toolkit are currently divided into five groups, i.e., deep-learning-based, feature-based, concept-based, parameter-based, and instance-based.

The rest of this paper is organized as follows. Section 2 describes the models in the toolkit by category. Section 3 introduces the environment settings. Section 4 provides a case example. Section 5 presents the conclusion and future work.

2 Models

Several definitions related to transfer learning are given below PY2010TKDE ; ZQD2019ARXIV .

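Definition 1 (Domain). A domain $\mathcal{D}$ is composed of two parts, i.e., a feature space $\mathcal{X}$ and a marginal distribution $P(X)$. In other words, $\mathcal{D} = \{\mathcal{X}, P(X)\}$. The symbol $X$ denotes an instance set, which is defined as $X = \{\mathbf{x}_i \mid \mathbf{x}_i \in \mathcal{X}, i = 1, \dots, n\}$.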

Definition 2 (Task). A task $\mathcal{T}$ consists of a label space $\mathcal{Y}$ and a decision function $f$, i.e., $\mathcal{T} = \{\mathcal{Y}, f\}$. The decision function $f$ is an implicit one, which is expected to be learned from the sample data.

Transfer learning utilizes the knowledge implied in the source domain to improve the performance of the learned decision function on the target domain PY2010TKDE ; ZQD2019ARXIV . For example, in a common scenario, abundant labeled instances are available in the source domain but only a few or even no labeled instances are available in the target domain ZQD2019ARXIV . In this setting, the goal of a transfer learning task is to build a more effective decision function on the target domain by using data from both the source and the target domains.

Some models require labeled instances, while others do not. To unify the calling functions of the models, the interface takes the following inputs:

- the source-domain instances $X_s$ and the target-domain instances $X_t$;
- the test instances $X_{test}$ and their labels $Y_{test}$;
- optionally, the labels $Y_s$ and $Y_t$ of $X_s$ and $X_t$, respectively.

Whether the two optional inputs are needed depends on the model. For example, the call to TCA PTK2011TNN does not take the target labels. In contrast, TrAdaBoost DYX2007ICML requires the labels of the instances in the target domain.

The models in the toolkit are also divided into two general groups, i.e., deep-learning-based and traditional (non-deep-learning) models. The traditional models are further categorized into feature-based, concept-based, parameter-based, and instance-based ones, corresponding to the four folders in the toolkit. In addition, each traditional model is wrapped into a class. Its hyper-parameters can be customized through the class's initialization function (constructor); any hyper-parameter that is not specified takes its default value. Table 1 shows the categories of the models contained in the toolkit.
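The sketch below illustrates the unified calling convention and the class wrapping described above under stated assumptions. It assumes a single method, here called `fit_predict`, that takes $X_s$, $X_t$, $X_{test}$, and $Y_{test}$ plus optional `Ys` and `Yt`, and returns accuracy and predictions. The hyper-parameters sit in the constructor with defaults. `WrappedModel`, `fit_predict`, `SourceCentroid`, and `PooledCentroid` are hypothetical names, not the toolkit's actual API. The two toy models only stand in for two kinds of model: one whose call omits target labels (like TCA) and one that also consumes target labels (like TrAdaBoost).

```python
import numpy as np


def nearest_centroid(X_train, y_train, X_query):
    """Label each query row with the class whose training mean is closest."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    d2 = ((X_query[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d2.argmin(axis=1)]


def _as_array(a):
    """Convert an optional label input to an array, keeping None as None."""
    return None if a is None else np.asarray(a)


class WrappedModel:
    """Hypothetical base class giving every model the same call signature."""

    requires_source_labels = True
    requires_target_labels = False

    def fit_predict(self, Xs, Xt, X_test, Y_test, Ys=None, Yt=None):
        # Ys and Yt are optional; each subclass declares which labels it needs.
        if self.requires_source_labels and Ys is None:
            raise ValueError(f"{type(self).__name__} requires source labels Ys")
        if self.requires_target_labels and Yt is None:
            raise ValueError(f"{type(self).__name__} requires target labels Yt")
        y_pred = self._predict(np.asarray(Xs, float), np.asarray(Xt, float),
                               np.asarray(X_test, float), _as_array(Ys), _as_array(Yt))
        accuracy = float(np.mean(y_pred == np.asarray(Y_test)))
        return accuracy, y_pred

    def _predict(self, Xs, Xt, X_test, Ys, Yt):
        raise NotImplementedError


class SourceCentroid(WrappedModel):
    """Toy model that needs only the source labels."""

    def __init__(self, center_domains=True):
        # Hyper-parameter with a default value; pass an argument to override it.
        self.center_domains = center_domains

    def _predict(self, Xs, Xt, X_test, Ys, Yt):
        if self.center_domains:
            # Remove each domain's mean shift; X_test is assumed to be target data.
            Xs, X_test = Xs - Xs.mean(axis=0), X_test - Xt.mean(axis=0)
        return nearest_centroid(Xs, Ys, X_test)


class PooledCentroid(WrappedModel):
    """Toy model that also needs labelled target instances (Xt aligned with Yt)."""

    requires_target_labels = True

    def _predict(self, Xs, Xt, X_test, Ys, Yt):
        # Pool labelled source and target data, then classify by nearest centroid.
        return nearest_centroid(np.vstack([Xs, Xt]), np.concatenate([Ys, Yt]), X_test)
```

Both toy models accept the same positional inputs. A missing optional label that a model actually requires is reported as an error rather than silently ignored.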

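| Category | Models |
|---|---|
| Feature-based | SFA PNS2010WWW , mSDA CXW2012ICML , SDA GBB2011ICML , GFK G2012CVPR , SCL BMP2006EMNLP , TCA PTK2011TNN , JDA LWD2013ICCV |
| Concept-based | HIDC ZLY2013IJCAI , TriTL ZLD2014TC , CD-PLSA ZLS2010CIKM , MTrick ZLX2011SADM |
| Parameter-based | LWE GFJ2008KDD |
| Instance-based | TrAdaBoost DYX2007ICML |
| Deep-learning-based | DAN LCW2015ICML , DCORAL SS2016ECCVW , MRAN ZZW2019NN , DANN GL2015ICML GUA2016JMLR |

Table 1: Categorization used in the toolkit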

Feature-based models in the toolkit include SFA PNS2010WWW , mSDA CXW2012ICML , SDA GBB2011ICML , GFK G2012CVPR , SCL BMP2006EMNLP , TCA PTK2011TNN , and JDA LWD2013ICCV . These models are representative of traditional transfer learning. They mainly focus on altering the feature representation, i.e., transforming the instances from the original feature space into a newly designed feature space. For these models, additional interfaces such as fit() and transform() are provided. It is worth mentioning that TCA PTK2011TNN , JDA LWD2013ICCV , and GFK G2012CVPR are implemented and wrapped based on the repository constructed by Wang et al. W0000XYZ . SDA GBB2011ICML and mSDA CXW2012ICML are implemented and wrapped based on MAD0000SDAE and DOU0000MSDA , respectively.
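To make the fit()/transform() pair of a feature-based model concrete, here is a minimal NumPy sketch of TCA with a linear kernel. It stacks source and target instances, builds the kernel matrix $K$, the MMD matrix $L$, and the centering matrix $H$. It then keeps the leading eigenvectors $W$ of $(KLK + \mu I)^{-1}KHK$, and embeds any point through its kernel values against the training instances. This is an illustrative re-derivation under those assumptions, not the toolkit's or Wang et al.'s code. The class name `TCASketch` and the parameter names `dim` and `mu` are hypothetical.

```python
import numpy as np


class TCASketch:
    """Linear-kernel Transfer Component Analysis with a fit()/transform() interface."""

    def __init__(self, dim=30, mu=1.0):
        self.dim = dim  # number of transfer components kept
        self.mu = mu    # trade-off (regularisation) parameter

    def fit(self, Xs, Xt):
        Xs, Xt = np.asarray(Xs, dtype=float), np.asarray(Xt, dtype=float)
        ns, nt = len(Xs), len(Xt)
        n = ns + nt
        self.X_train_ = np.vstack([Xs, Xt])
        K = self.X_train_ @ self.X_train_.T  # linear kernel on all instances
        # MMD matrix L = e e^T: 1/ns^2 (source-source), 1/nt^2 (target-target), -1/(ns*nt) otherwise
        e = np.concatenate([np.full(ns, 1.0 / ns), np.full(nt, -1.0 / nt)])
        L = np.outer(e, e)
        H = np.eye(n) - np.full((n, n), 1.0 / n)  # centering matrix
        # W = leading eigenvectors of (K L K + mu I)^{-1} K H K
        A = np.linalg.solve(K @ L @ K + self.mu * np.eye(n), K @ H @ K)
        eigvals, eigvecs = np.linalg.eig(A)
        top = np.argsort(-eigvals.real)[: self.dim]
        self.W_ = eigvecs[:, top].real
        return self

    def transform(self, X):
        # Embed points through their kernel values against the training instances.
        K_new = np.asarray(X, dtype=float) @ self.X_train_.T
        return K_new @ self.W_

    def fit_transform(self, Xs, Xt):
        self.fit(Xs, Xt)
        return self.transform(Xs), self.transform(Xt)
```

A downstream classifier is then trained on the transformed source instances and applied to the transformed target instances. Splitting fit() from transform() lets new target points be embedded without refitting.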

Concept-based models in the toolkit include HIDC ZLY2013IJCAI , TriTL ZLD2014TC , CD-PLSA ZLS2010CIKM , and MTrick ZLX2011SADM . The source code of these four models is written in MATLAB and has been wrapped by Dr. Cheng into a MATLAB toolbox, i.e., TLLibrary64 ZCL2015IJCAI . The Python interfaces in the toolkit call the entry function of that toolbox directly. Note that the MATLAB Engine API for Python (in MATLAB releases prior to R2019b) supports only Python 2.7, 3.5, and 3.6. If the Python version is higher than 3.6, these Python interfaces may not run. The MATLAB source code is also included in the toolkit, so users can call the functions directly in MATLAB.
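A Python interface that calls a MATLAB toolbox entry function might look roughly like the following sketch. It starts an engine, puts the toolbox folder and its subfolders on the MATLAB path, converts the data matrices, and calls the function. The entry-function name and its argument order are not given in the text, so `func_name`, the argument list, and the helper names `to_matlab_rows` and `call_matlab_entry` are hypothetical. `matlab.engine.start_matlab`, `matlab.double`, the `nargout` keyword, and the MATLAB functions `addpath` and `genpath` are part of MATLAB and its Engine API for Python.

```python
"""Sketch: calling a MATLAB toolbox entry function via the MATLAB Engine API for Python."""


def to_matlab_rows(X):
    """Convert a 2-D array-like into a list of float rows, a form matlab.double accepts."""
    return [[float(v) for v in row] for row in X]


def call_matlab_entry(func_name, toolbox_dir, *arrays, nargout=1):
    """Start MATLAB, add toolbox_dir (with subfolders) to the path, and call func_name.

    func_name and the order of *arrays are placeholders (hypothetical): the
    actual entry function of the concept-based toolbox is not specified here.
    """
    import matlab.engine  # installed from matlabroot/extern/engines/python

    eng = matlab.engine.start_matlab()
    try:
        eng.addpath(eng.genpath(toolbox_dir), nargout=0)
        matlab_args = [matlab.double(to_matlab_rows(a)) for a in arrays]
        return getattr(eng, func_name)(*matlab_args, nargout=nargout)
    finally:
        eng.quit()  # always shut the engine down, even if the call fails
```

The `matlab` import is deferred into the function, so the conversion helper stays usable on Python versions where the engine cannot be installed.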

The parameter-based and the instance-based models in the toolkit are LWE GFJ2008KDD and TrAdaBoost DYX2007ICML , respectively. TrAdaBoost DYX2007ICML is implemented and wrapped based on CHEN0000TAB . Note that the experimental results of the LWE model in the toolkit differ from those reported in the original paper GFJ2008KDD because a different clustering method is used. To reproduce the results in GFJ2008KDD , please use CLUTO K0000CLUTO , a data clustering software package written in C.

The deep-learning-based models in the toolkit include DAN LCW2015ICML , DCORAL SS2016ECCVW , MRAN ZZW2019NN , and DANN GL2015ICML GUA2016JMLR . Their code comes from ZW0000GIT , a repository containing a number of deep transfer learning implementations. Each deep-learning-based model in the toolkit is implemented in three files. To use a new dataset, a corresponding data-loading file is required; sample files are provided in the toolkit.
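The format of the toolkit's data-loading files is not described here. The following standard-library sketch assumes that each domain is stored in an ImageFolder-style layout, `root/<class_name>/<image file>`, as in common image domain adaptation benchmarks. It indexes the files into `(path, label)` pairs and yields shuffled mini-batches. In a PyTorch pipeline, `torchvision.datasets.ImageFolder` together with `torch.utils.data.DataLoader` plays the same role. The function names, the extension list, and the layout are assumptions.

```python
import os
import random

IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".bmp")  # assumed image file types


def index_image_folder(root):
    """Map root/<class_name>/<image> to (path, label) pairs plus a class-to-index dict."""
    classes = sorted(d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d)))
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        folder = os.path.join(root, name)
        for fname in sorted(os.listdir(folder)):
            if fname.lower().endswith(IMAGE_EXTS):  # skip non-image files
                samples.append((os.path.join(folder, fname), class_to_idx[name]))
    return samples, class_to_idx


def batches(samples, batch_size, shuffle=True, seed=0):
    """Yield lists of (path, label) pairs; the last batch may be smaller."""
    order = list(range(len(samples)))
    if shuffle:
        random.Random(seed).shuffle(order)  # reproducible shuffling
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]
```

The source and the target domain would each be indexed separately. Sorting the class folders keeps label indices consistent between the two domains when they share the same class names.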

3 Environment Setting

This section introduces the environment setup. The toolkit requires a Python environment, and Python 3.6 is recommended for two reasons. First, the code in the toolkit is written in Python 3. Second, 3.6 is the newest version supported by the official MATLAB Engine API for Python in MATLAB releases prior to R2019b, which supports only Python 2.7, 3.5, and 3.6. With Python 3.7 or higher, the Python interfaces of the concept-based models may therefore not run. For such unsupported Python versions, the MATLAB entry functions of the concept-based models are provided and can be called directly in MATLAB. To use the concept-based models, the MATLAB environment must be prepared in advance in two steps. First, add the utilities directory to the MATLAB path. Then, install the MATLAB Engine API for Python with the following commands, where matlabroot stands for the MATLAB installation folder (the value returned by the matlabroot command in MATLAB).

cd "matlabroot/extern/engines/python"
python setup.py install
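Before choosing between the Python interfaces and the MATLAB entry functions of the concept-based models, the running interpreter can be checked against the version list stated in this section. The sketch below hard-codes that list, which applies to MATLAB releases prior to R2019b; newer releases support newer Python versions, so the set may need adjusting for the installed release. The function names and the returned labels are hypothetical.

```python
import sys

# Python versions supported by the MATLAB Engine API before R2019b, per this section.
ENGINE_SUPPORTED = {(2, 7), (3, 5), (3, 6)}


def concept_models_available(version_info=None):
    """True if the concept-based Python interfaces should run on this interpreter."""
    major_minor = tuple((version_info or sys.version_info)[:2])
    return major_minor in ENGINE_SUPPORTED


def concept_model_backend(version_info=None):
    """Pick the Python interfaces when supported, otherwise fall back to MATLAB."""
    if concept_models_available(version_info):
        return "python-interface"
    return "matlab-entry-functions"
```

Calling `concept_model_backend()` with no argument checks the current interpreter; passing an explicit version tuple is useful for planning an environment in advance.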

4 Case Example

To illustrate the usage of the toolkit, TCA PTK2011TNN is used as an example of the calling process. After the running environment and the datasets are prepared, the corresponding class is imported and its parameters are initialized. Then, the entry interface is called to run the model. The code is as follows.

from TCA import TCA        # import the wrapped TCA class
model = TCA(**settings)    # settings: dict of hyper-parameters; unspecified ones keep their defaults
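The text does not state which classifier is applied after TCA or which metric Table 2 reports. A common choice in TCA experiments is a 1-nearest-neighbour classifier trained on the projected source data and scored by accuracy on the projected target data. The sketch below assumes that evaluation step; it would take the output of the model's transform() for both domains. The function name `one_nn_accuracy` is hypothetical.

```python
import numpy as np


def one_nn_accuracy(Zs, ys, Zt, yt):
    """Label each target point with its nearest source point; return (accuracy, predictions)."""
    Zs, Zt = np.asarray(Zs, dtype=float), np.asarray(Zt, dtype=float)
    # Squared Euclidean distances between every target and every source point, shape (nt, ns).
    d2 = ((Zt[:, None, :] - Zs[None, :, :]) ** 2).sum(axis=2)
    y_pred = np.asarray(ys)[d2.argmin(axis=1)]
    return float(np.mean(y_pred == np.asarray(yt))), y_pred
```

For example, with source points `[[0, 0], [10, 10]]` labelled `[0, 1]` and target points `[[1, 1], [9, 9], [0, 1]]` labelled `[0, 1, 1]`, the predictions are `[0, 1, 0]` and the accuracy is 2/3.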

Reuters-21578, a hierarchical dataset for text categorization, is used to test the performance of TCA PTK2011TNN . The results are given in Table 2. The experimental results of all the models are available in ZQD2019ARXIV .

| Model | Orgs vs Places | People vs Places | Orgs vs People |
|---|---|---|---|
| TCA | 0.7368 | 0.6065 | 0.7562 |

Table 2: Experimental results of TCA on Reuters-21578.

5 Conclusion and Future Work

This paper has introduced a new toolkit that contains 17 representative transfer learning models. The toolkit provides several unified interfaces, which make it easy for entry-level researchers to use. It is written in Python 3 and uses a MATLAB toolbox for the concept-based models. In the future, we will continue to maintain, improve, and enrich the toolkit. New representative models will be added, and the code, which is currently written in more than one language, will be unified.

If you are interested in this toolkit and use it for baselines, please cite this paper and our survey paper ZQD2019ARXIV . Also, feel free to contact us if you have any problems using this toolkit.
Contact: Fuzhen Zhuang, Keyu Duan.


  • (1) S.J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010.
  • (2) C. Tan, F. Sun, T. Kong, W. Zhang, C. Yang, and C. Liu, “A survey on deep transfer learning,” in Proc. 27th International Conference on Artificial Neural Networks, Rhodes, Oct. 2018, pp. 270–279.
  • (3) F. Zhuang, P. Luo, P. Yin, Q. He, and Z. Shi, “Concept learning for cross-domain text classification: A general probabilistic framework,” in Proc. 23rd International Joint Conference on Artificial Intelligence, Beijing, Aug. 2013, pp. 1960–1966.
  • (4) F. Zhuang, P. Luo, C. Du, Q. He, Z. Shi, and H. Xiong, “Triplex transfer learning: Exploiting both shared and distinct concepts for text classification,” IEEE T. Cybern., vol. 44, no. 7, pp. 1191–1203, Jul. 2014.
  • (5) F. Zhuang, P. Luo, Z. Shen, Q. He, Y. Xiong, Z. Shi, and H. Xiong, “Collaborative Dual-PLSA: Mining distinction and commonality across multiple domains for text classification,” in Proc. 19th ACM International Conference on Information and Knowledge Management, Toronto, Oct. 2010, pp. 359–368.
  • (6) F. Zhuang, P. Luo, H. Xiong, Q. He, Y. Xiong, and Z. Shi, “Exploiting associations between word clusters and document classes for cross-domain text categorization,” Stat. Anal. Data Min., vol. 4, no. 1, pp. 100–114, Feb. 2011.
  • (7) S.J. Pan, X. Ni, J.-T. Sun, Q. Yang, and Z. Chen, “Cross-domain sentiment classification via spectral feature alignment,” in Proc. 19th International Conference on World Wide Web, Raleigh, Apr. 2010, pp. 751–760.
  • (8) M. Chen, Z. Xu, K. Weinberger, and F. Sha, “Marginalized denoising autoencoders for domain adaptation,” in Proc. 29th International Conference on Machine Learning, Edinburgh, Jun. 2012, pp. 767–774.
  • (9) X. Glorot, A. Bordes, and Y. Bengio, “Domain adaptation for large-scale sentiment classification: A deep learning approach,” in Proc. 28th International Conference on Machine Learning, Bellevue, Jun. 2011, pp. 513–520.
  • (10) B. Gong, Y. Shi, F. Sha, and K. Grauman, “Geodesic flow kernel for unsupervised domain adaptation,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition, Providence, Jun. 2012, pp. 2066–2073.
  • (11) J. Blitzer, R. McDonald, and F. Pereira, “Domain adaptation with structural correspondence learning,” in Proc. Conference on Empirical Methods in Natural Language Processing, Sydney, Jul. 2006, pp. 120–128.
  • (12) S.J. Pan, I.W. Tsang, J.T. Kwok, and Q. Yang, “Domain adaptation via transfer component analysis,” IEEE Trans. Neural Netw., vol. 22, no. 2, pp. 199–210, Feb. 2011.
  • (13) M. Long, J. Wang, G. Ding, J. Sun, and P.S. Yu, “Transfer feature learning with joint distribution adaptation,” in Proc. IEEE International Conference on Computer Vision, Sydney, Dec. 2013, pp. 2200–2207.
  • (14) W. Dai, Q. Yang, G. Xue, and Y. Yu, “Boosting for transfer learning,” in Proc. 24th International Conference on Machine Learning, Corvallis, Jun. 2007, pp. 193–200.
  • (15) J. Gao, W. Fan, J. Jiang, and J. Han, “Knowledge transfer via multiple model local structure mapping,” in Proc. 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, Aug. 2008, pp. 283–291.
  • (16) M. Long, Y. Cao, J. Wang, and M.I. Jordan, “Learning transferable features with deep adaptation networks,” in Proc. 32nd International Conference on Machine Learning, Lille, Jul. 2015, pp. 97–105.
  • (17) B. Sun and K. Saenko, “Deep CORAL: Correlation alignment for deep domain adaptation,” in Proc. European Conference on Computer Vision Workshops, Amsterdam, Oct. 2016, pp. 443–450.
  • (18) Y. Zhu, F. Zhuang, J. Wang, J. Chen, Z. Shi, W. Wu, and Q. He, “Multi-representation adaptation network for cross-domain image classification,” Neural Netw., vol. 119, pp. 214–221, Nov. 2019.
  • (19) Y. Ganin and V. Lempitsky, “Unsupervised domain adaptation by backpropagation,” in Proc. 32nd International Conference on Machine Learning, Lille, Jul. 2015, pp. 1180–1189.
  • (20) Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky, “Domain-adversarial training of neural networks,” J. Mach. Learn. Res., vol. 17, pp. 1–35, Apr. 2016.
  • (21) F. Zhuang, Z. Qi, K. Duan, D. Xi, Y. Zhu, H. Zhu, H. Xiong, and Q. He, “A comprehensive survey on transfer learning,” 2019, arXiv:1911.02685v1.
  • (22) J. Wang et al. Everything about Transfer Learning and Domain Adaptation. [Online]. Available:
  • (23) M. Sushil. An Implementation of Stacked Denoising Autoencoder. [Online]. Available:
  • (24) D. Xu. An Implementation of Marginalized Denoising Autoencoders for Domain Adaptation. [Online]. Available:
  • (25) F. Zhuang, X. Cheng, P. Luo, S.J. Pan, and Q. He, “Supervised representation learning: Transfer learning with deep autoencoders,” in Proc. 24th International Joint Conference on Artificial Intelligence, Buenos Aires, Jul. 2015, pp. 4119–4125.
  • (26) C. Chen. An Implementation of TrAdaBoost. [Online]. Available:
  • (27) G. Karypis. CLUTO - Software for Clustering High-Dimensional Datasets. [Online]. Available:
  • (28) Y. Zhu and J. Wang. A Collection of Implementations of Deep Domain Adaptation Algorithms. [Online]. Available: