Augmenting Transfer Learning with Semantic Reasoning

05/31/2019 ∙ by Freddy Lecue, et al.

Transfer learning aims at building robust prediction models by transferring knowledge gained from one problem to another. In the semantic Web, learning tasks are enhanced with semantic representations. We exploit their semantics to augment transfer learning by dealing with when to transfer with semantic measurements and what to transfer with semantic embeddings. We further present a general framework that integrates the above measurements and embeddings with existing transfer learning algorithms for higher performance. It has been demonstrated to be robust in two real-world applications: bus delay forecasting and air quality forecasting.


1 Introduction

Transfer learning [Pan and Yang2010] aims at solving the problem of lacking training data by utilizing data from other related learning domains, each of which is referred to as a pair of a dataset and a prediction task. Transfer learning plays a critical role in real-world applications of ML, as (labelled) data is usually not large enough to train accurate and robust models. Most approaches focus on similarity in the raw data distribution, with techniques such as dynamic weighting of instances [Dai et al.2007] and model parameter sharing [Benavides-Prado et al.2017] (cf. Related Work).

Despite the large spectrum of transfer learning techniques [Weiss et al.2016], it remains challenging to assess a priori which domain and dataset to transfer from [Dai et al.2009]. To deal with such challenges, [Choi et al.2016] integrated expert feedback as a semantic representation of domain similarity for knowledge transfer, while [Lee et al.2017] evaluated graph-based representations of source and target domains. Both studies encode semantics but are limited in expressivity, which restricts domain interpretability and inhibits a good understanding of transferability. There are also efforts on Markov Logic Network (MLN) based transfer learning, using first order [Mihalkova et al.2007, Mihalkova and Mooney2009] or second order [Davis and Domingos2009, Van Haaren et al.2015] rules as declarative prediction models. However, these efforts still cannot answer questions like: What ensures a positive domain transfer? Would learning a model from road traffic congestion in London be the best for predicting congestion in Paris? Or would an air quality model transfer better?

In this paper, we propose to encode the semantics of learning tasks and domains with OWL ontologies, and provide a robust foundation to study transferability between source and target learning domains. From knowledge materialization [Nickel et al.2016], feature selection [Vicient et al.2013], predictive reasoning [Lécué and Pan2015] and stream learning [Chen et al.2017] to transfer learning explanation [Chen et al.2018], all are examples of inference tasks where the semantics of data representation are exploited for deriving a priori knowledge from pre-established statements in ML tasks.

We introduce a framework that augments transfer learning with semantics and its reasoning capability, as shown in Figure 1. It deals with (i) when to transfer, through suitable transferability measurements (i.e., the variability of the semantic learning task and consistent transferability knowledge), and (ii) what to transfer, by embedding the semantics of learning domains and tasks with a transferability vector, a consistency vector and a variability vector. In addition to exposing the semantics that drives transfer, a transfer boosting algorithm is developed to integrate the embeddings with existing transfer learning approaches for higher accuracy.

Our approach has been demonstrated to be robust, with high accuracy, on transfer learning tasks in real-world applications: (i) air quality forecasting from Beijing to Hangzhou in China, (ii) bus delay prediction from London in the UK to Dublin in Ireland, and (iii) from bus delay in London to air quality in Beijing.

Figure 1: Ontology-based Transfer Learning Augmentation.

2 Background

OWL ontologies, underpinned by Description Logics (DLs), are widely used for modeling semantics. Our work uses the DL of [Baader et al.2005] to model the semantics of learning domains and tasks, but could be applied with more expressive DLs using approximate reasoning [Pan et al.2016].

2.1 Description Logics and Ontology

A signature consists of disjoint sets of (i) atomic concepts, (ii) atomic roles, and (iii) individuals. Given a signature, complex concept expressions can be composed from the top concept, the bottom concept, atomic concepts, individuals and atomic role expressions with the standard constructs of the logic.

A DL ontology is composed of a TBox and an ABox. The TBox is a set of concept and role axioms; it supports General Concept Inclusion axioms (GCIs) and Role Inclusion axioms (RIs). The ABox is a set of class assertion axioms, role assertion axioms, and individual (in)equality axioms. Given an input ontology, we consider the closure of atomic ABox entailments (or simply the entailment closure), where each entailment is an atomic concept assertion or an atomic role assertion involving only named concepts, named roles and named individuals. Entailment reasoning in this setting is PTime-Complete.
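The entailment closure above can be sketched as a fixpoint computation. The toy sketch below assumes a hypothetical encoding of the TBox as atomic concept inclusions `(sub, sup)` and of the ABox as concept assertions `(individual, concept)`; a real EL reasoner also handles existential restrictions, conjunctions and role axioms.

```python
def entailment_closure(tbox, abox):
    """Saturate atomic concept assertions under atomic concept inclusions
    until a fixpoint is reached (forward chaining)."""
    closure = set(abox)
    changed = True
    while changed:
        changed = False
        for sub, sup in tbox:
            for ind, concept in list(closure):
                if concept == sub and (ind, sup) not in closure:
                    closure.add((ind, sup))
                    changed = True
    return closure

# Hypothetical axioms: Road is a Way, a Way is Infrastructure; ABox: Road(r1).
tbox = [("Road", "Way"), ("Way", "Infrastructure")]
abox = {("r1", "Road")}
closure = entailment_closure(tbox, abox)
# closure now also contains ("r1", "Way") and ("r1", "Infrastructure")
```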

Example 1.

(TBox and ABox Concept Assertion Axioms)
Figure 2 presents (i) a TBox where axiom (1) denotes the concept of “ways which are in a continent”, and (ii) concept assertions (8-9) stating that two individuals are roads.

(1)–(9)

Figure 2: Sample of an Ontology’s TBox and ABox .

2.2 Learning Domain and Task

To model the learning domain with ontologies, we use the Learning Sample Ontology (LSO) and the Target Entailment, as in [Chen et al.2018]. A learning domain consists of an LSO set (i.e., a dataset) and a target entailment set (i.e., a prediction task).

Definition 1.

(Learning Sample Ontology (LSO))
A learning sample ontology is an ontology annotated by property-value pairs .

The annotations act as key dimensions that uniquely identify an input sample for ML methods. When the context is clear, we also use LSO to refer to its ontology.

Example 2.

(An LSO in Context of Ireland Traffic)
Assume an LSO is annotated by the property-value pairs topic: Road, C_Way, UK. Its TBox includes static axioms like (1); its ABox includes facts that are observed in the UK.

Definition 2.

(Learning Domain and Target Entailment)
A learning domain consists of a set of LSOs that share the same TBox, and of target entailments, each of whose truth in an LSO is to be predicted. Its entailment closure is defined as the union of the entailment closures of its LSOs.

Definition 3 revisits supervised learning within a domain. In a training LSO, a target entailment is true if it is entailed by the LSO, and false otherwise. In a testing LSO, the truth of a target entailment is to be predicted instead of being inferred.

Definition 3.

(Semantic Learning Task)
Given a learning domain whose LSOs are divided into two disjoint sets, a semantic learning task within the domain is the task of identifying a function to predict the truth of each target entailment in each LSO of the second set. Here, the first set is called the training LSO set, while the second is called the testing LSO set.

Example 3.

(Semantic Learning Task)
Given a domain composed of LSOs annotated by Road, UK, and target entailments, the LSOs are divided into a training set and a testing set according to the type of roads involved. The objective is to identify a function from the training set to predict the condition of a road, namely the truth of the target entailments in each LSO of the testing set.
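As a data-structure sketch of Definitions 1-3 (the names are illustrative assumptions, not the paper's implementation), an LSO and the labelling of target entailments might look like:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LSO:
    """A learning sample ontology: identifying annotations plus the
    (precomputed) closure of its atomic ABox entailments."""
    annotations: tuple          # e.g. (("topic", "Road"), ("location", "UK"))
    entailments: frozenset = frozenset()

def label(lso, target_entailments):
    """In a training LSO, a target entailment is True iff the LSO entails it."""
    return {e: e in lso.entailments for e in target_entailments}

lso = LSO((("topic", "Road"), ("location", "UK")),
          frozenset({("r1", "Road"), ("r1", "DisruptedRoad")}))
y = label(lso, [("r1", "DisruptedRoad"), ("r1", "ClearRoad")])
# y == {("r1", "DisruptedRoad"): True, ("r1", "ClearRoad"): False}
```

In a testing LSO, the same dictionary would hold predicted rather than inferred truth values.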

2.3 Transfer Learning Across Domains

Definition 4 revisits transfer learning, where the two domains are called the source domain and the target domain, and their entailment closures are denoted accordingly.

Definition 4.

(Transfer Learning)
Given two learning domains and , where the LSOs of are divided into two disjoint sets and , transfer learning from to is a task of learning a prediction function from , , and to predict the truth of in each LSO in .

Example 4.

(Transfer Learning)
Assume is the domain in Example 3, is a domain with LSOs annotated by : Road, : IE, an example of transfer learning is to identify a function using all the LSOs of Dublin traffic and the training LSOs of London traffic () for predicting the traffic condition of road in each testing LSO of London traffic ().

We demonstrate how ontology-based descriptions can drive transfer learning from one domain to another. To this end, similarities between domains are first characterized. We adopt the variability of ABox entailments [Lécué2015] in Definition 5, where (10) reflects variant knowledge between the two domains while (11) denotes invariant knowledge.

Definition 5.

(Entailment-based Domain Variability)
Given a source learning domain and a target learning domain, the variability from the source to the target consists of the following ABox entailments:

(10)–(11)
Example 5.

(Entailment-based Domain Variability)
Let Figures 3 and 4, which capture the contexts in IE and the UK, be ontologies of the source and target domains respectively. Table 1 illustrates some variabilities of the two domains through ABox entailments. For instance, a disrupted road in the source is variant w.r.t. knowledge in the target and axioms (1), (9) and (12-15).

(12)–(15)

Figure 3: Source Domain Ontologies in Context of IE Traffic.

(16)–(21)

Figure 4: Target Domain Ontologies in Context of UK Traffic.
Table 1: Examples for Entailment-based Domain Variability.
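With entailment closures represented as sets, the variant / invariant split of Definition 5 reduces to set operations. The sketch below assumes, as a reading of (10)-(11), that invariant knowledge is the intersection of the two closures and variant knowledge their symmetric difference:

```python
def domain_variability(source_closure, target_closure):
    """Split ABox entailments into variant and invariant knowledge
    between a source and a target domain (assumed reading of (10)-(11))."""
    invariant = source_closure & target_closure      # holds in both domains
    variant = source_closure ^ target_closure        # holds in exactly one
    return variant, invariant

# Hypothetical closures: the disruption is only known in the source.
src = {("r1", "Road"), ("r1", "DisruptedRoad")}
tgt = {("r1", "Road"), ("r2", "Road")}
variant, invariant = domain_variability(src, tgt)
# invariant == {("r1", "Road")}; the other two entailments are variant
```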

3 Transferability

We present (i) the variability of semantic learning tasks and (ii) semantic transferability, as a basis for qualifying and quantifying transfer learning (i.e., when to transfer), together with (iii) indicators (i.e., what to transfer) driving transferability. They are pivotal properties, as any change in the domains, their transfer function and consistency drastically impacts the quality of derived models [Long et al.2015, Chen et al.2018].

3.1 Variability of Semantic Learning Tasks

Definition 6 extends entailment-based ontology variability (Definition 5) to capture learning task variability, using the target entailments in (10) and (11).

Definition 6.

(Variability of Semantic Learning Tasks)
Let the source and target semantic learning tasks be those of the source and target learning domains. The variability of the semantic learning tasks is defined by (22), where |·| refers to the cardinality of a set.

(22)

The variability of semantic learning tasks (22) captures the variability of the source and target domain LSOs as well as the variability of the target entailments. The higher the value, the stronger the variability. The computation of (22) is in the worst case polynomial w.r.t. the sizes of the entailment closures involved. Its evaluation requires (i) ABox entailment and (ii) basic set-theoretic operations from Definition 5, both in polynomial time [Baader et al.2005].

Example 6.

(Variability of Semantic Learning Tasks)
The variability of the learning tasks between the source and target in Example 4 is computed from the numbers of variant and invariant ABox entailments, i.e., moderate variability of the domains and none for the target variables.
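Under the same set-based reading, the task variability of (22) can be sketched as a weighted mix of the domain-level and target-entailment-level variant ratios. The weight `alpha` and the normalisation are assumptions, since the formula's symbols were not recovered:

```python
def task_variability(src_closure, tgt_closure, src_targets, tgt_targets,
                     alpha=0.5):
    """Sketch of (22): mix the variant/total ratio of the domain closures
    with that of the target entailment sets (alpha is an assumed weight)."""
    def variant_ratio(a, b):
        union = a | b
        return len(a ^ b) / len(union) if union else 0.0
    return (alpha * variant_ratio(src_closure, tgt_closure)
            + (1 - alpha) * variant_ratio(src_targets, tgt_targets))

# Moderate domain variability, identical target entailments:
# only the first term contributes.
v = task_variability({1, 2, 3}, {2, 3, 4}, {"t"}, {"t"})
# v == 0.5 * (2 / 4) == 0.25
```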

3.2 Semantic Transferability - When to Transfer?

We define semantic transferability from a source to a target semantic learning task as the existence of knowledge that is captured as ABox entailments in the source and has positive effects on the predictive quality of the prediction function of the target semantic learning task.

Definition 7.

(Semantic -Transferability)
Let , be source, target semantic learning tasks with entailment closures , . Semantic -transferability occurs from to iff

(23)–(24)

where the former is the predictive function w.r.t. the target task, and the latter is the ABox closure of the source.

Transferable knowledge from the source is used to outperform the predictive quality of the target function by a factor (23) while being new with respect to the ABox entailments of the target (24).

Example 7.

(Semantic -Transferability)
Let the semantic learning tasks be those of Example 4, and consider the ABox entailment closure of (12-15) in the source. Semantic ε-transferability occurs from source to target as (i) an ε satisfying condition (23) exists, and (ii) (24) is true, cf. Table 1. Thus, knowledge in the IE traffic context ensures transferability to the target for traffic prediction in the UK.

ABox entailments satisfying Definition 7 are denoted as transferable knowledge, while those contradicting (23) are non-transferable knowledge, as they deteriorate the predictive quality of the target function.

Example 8.

(Transferable Knowledge)
Consider two entailments: (i) one derived from (13) and (19-21), and (ii) one derived from (8), (12) and (17-18). As part of the knowledge positively impacting the quality of the prediction task, they are also separately ε-transferable knowledge with maximal ε (computation details omitted).

3.3 Consistent Transferable Knowledge

Transferring knowledge across domains can lead to inconsistency. Definition 8 captures knowledge ensuring transferability while maintaining consistency in the target domain.

Definition 8.

(Consistent Transferable Knowledge)
Let a set of ABox entailments ensure ε-transferability. It is consistent transferable knowledge from the source to the target iff its union with the target domain remains consistent.

ABox entailments violating this condition are called inconsistent transferable knowledge. They are interesting ABox entailments, as they expose knowledge contradicting the target domain while maintaining transferability. Evaluating whether knowledge is consistent transferable is in the worst case polynomial time w.r.t. the sizes of the ontologies involved.
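A consistency check in the sense of Definition 8 asks whether the target stays consistent once the transferable entailments are added. The toy check below only detects clashes against explicitly listed disjoint concept pairs, a hypothetical simplification; a DL reasoner performs the full check:

```python
def is_consistent_transfer(transferred, target_abox, disjoint_pairs):
    """Return False iff adding the transferred entailments puts some
    individual into two concepts declared disjoint in the target TBox."""
    concepts_of = {}
    for ind, concept in transferred | target_abox:
        concepts_of.setdefault(ind, set()).add(concept)
    return not any(
        c1 in cs and c2 in cs
        for c1, c2 in disjoint_pairs
        for cs in concepts_of.values()
    )

# Hypothetical disjointness: a road cannot be both disrupted and clear.
disjoint = [("DisruptedRoad", "ClearRoad")]
target = {("r1", "ClearRoad")}
# Transferring DisruptedRoad(r1) clashes; transferring Road(r2) does not.
```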

Example 9.

((In-)Consistent Transferable Knowledge)
The first entailment of Example 8 is consistent transferable knowledge in the target. On the contrary, the entailments derived from (16-18) are inconsistent with (7). Thus, that knowledge is inconsistent transferable knowledge in the target.

4 Semantic Transfer Learning

We tackle the problem of transfer learning by (i) computing semantic embeddings (i.e., how to transfer) for knowledge transfer, and (ii) determining a strategy to exploit the semantics of the learning tasks (Section 3) in Algorithm 1.

4.1 Semantic Embeddings - How to transfer?

The semantics of learning tasks exposes three levels of knowledge which are crucial for transfer learning: variability, transferability and consistency. They are encoded as embeddings through Definitions 9, 10 and 11.

Definition 9.

(Transferability Vector)
Let all distinct ABox entailments of the source and target be given. A transferability vector from the source to the target is a vector of that dimension such that each component is ε if the corresponding entailment is ε-transferable knowledge (with ε maximal), and 0 otherwise.

A transferability vector (Definition 9) adapts the concept of a feature vector [Bishop2006] in machine learning to represent the qualitative transferability, from source to target, of all ABox entailments. Each dimension captures the best transferability of a particular ABox entailment.

Example 10.

(Transferability Vector)
Suppose all distinct ABox entailments are as above. The transferability vector then follows from the ε-transferability in Example 8.

A consistency vector (Definition 10) is computed from all entailments by evaluating their (in-)consistency when transferred into the target semantic learning task. Feature vectors are bound to raw data only, while transferability and consistency vectors, with larger dimensions, embed the transferability and consistency of the data and its inferred assertions. They ensure a larger, more contextual coverage.

Definition 10.

(Consistency Vector)
Let all distinct ABox entailments of the source and target be given. A consistency vector from the source to the target is a vector of that dimension such that each component flags whether the corresponding entailment is consistent transferable knowledge or not.

The variability vector (Definition 11) is used as an indicator of the semantic variability between the two learning tasks. It is a value with an emphasis on the domain ontologies and/or the label space depending on its parameterization. We characterize any variability weight above the threshold as an inter-domain transfer learning task, and any below it as intra-domain.

Definition 11.

(Variability Vector)
Let the ABox entailments of the source and target be given. A variability vector from the source to the target is a one-dimensional vector whose component is:

(25)
Example 11.

(Variability Vector)
Applying (25) to the variability of the semantic learning tasks between the source and target of Example 6 results in a value which represents moderate variability.
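Putting Definitions 9-11 together, the embeddings for a fixed ordering of entailments can be assembled as plain vectors. The ε values, the ±1 consistency encoding and the scalar variability weight are illustrative assumptions, since the vector entries were not recovered:

```python
def semantic_embeddings(entailments, epsilon_of, consistent, variability):
    """Build (transferability, consistency, variability) embeddings over a
    fixed ordering of all distinct ABox entailments."""
    order = sorted(entailments)
    t_vec = [epsilon_of.get(e, 0.0) for e in order]   # best eps, 0 if not transferable
    c_vec = [1 if e in consistent else -1 for e in order]  # assumed +/-1 encoding
    v_vec = [variability]                             # single variability weight
    return t_vec, c_vec, v_vec

t, c, v = semantic_embeddings({"e1", "e2"}, {"e1": 0.3}, {"e1"}, 0.25)
# t == [0.3, 0.0], c == [1, -1], v == [0.25]
```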

4.2 Boosting for Semantic Transfer Learning

Algorithm 1 presents an extension of the transfer learning method TrAdaBoost [Dai et al.2007] that integrates semantic embeddings. It aims at learning a predictive function for the target task. The semantic embeddings of all entailments are computed first; they are defined through the transferability, consistency and variability effects from the source to the target domain. Then, their importance / weights are iteratively adjusted depending on the evaluation of the intermediate model, comparing estimated predictions against real values.

The base model, which can be derived from any weak learner, e.g., Logistic Regression, is built on top of all entailments in the source and target tasks. However, entailments from the source might be wrongly predicted due to task variability (Definition 6). Thus, we follow the parameterization of [Dai et al.2007] by decreasing the weights of such entailments to reduce their effects. In the next iteration, the misclassified source entailments, which are dissimilar to the target ones w.r.t. the semantic embeddings, will affect the learning process less than in the current iteration. Finally, StAdaB returns a binary hypothesis. Multi-class classification can be easily applied.

1 Input: (i) Source / target domains and semantic learning tasks , , (ii) a training LSO set of the target learning domain , (iii) all distinct ABox entailments of , (iv) a base learning alg. , (v) max. nb. iterations , (vi) .
2 Result: : A predictive function utilizing , and for .
3 begin
4        % Initialization of weights for transferability, consistency,
5        % and variability vectors of all ABox entailments in .
6        Initialization of the weights. % Computation of semantic embeddings for all entailments.
7        foreach  do % Weight computation iteration
8               % Probability distribution of the weights.
9               % Predictive function over .
10              
11               % Error computation of on .
12              
13               % Weights for reducing errors on over iteration.
14               % Weight update of source and target entailments in .
15               % using , , and results from previous iteration: .
16              
17       return Hypothesis ensemble:
18
Algorithm 1 StAdaB()

A brute-force approach would consist in generating an exponential number of models from any combination of source and target entailments. StAdaB reduces this complexity by only evaluating atomic impacts and (approximately) computing the optimal combination. As a side effect, StAdaB exposes the entailments in the source which drive transfer learning (cf. the final weight assignment of embeddings).
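The weight-update step that StAdaB inherits from TrAdaBoost [Dai et al.2007] can be sketched as follows; in the full algorithm the semantic embeddings further modulate these updates, which this sketch omits:

```python
import numpy as np

def update_weights(y_true, y_pred, w, n_source, beta_src, beta_tgt):
    """One boosting iteration in the style of TrAdaBoost: shrink the weight
    of misclassified source entailments (so dissimilar source knowledge
    fades out) and grow that of misclassified target entailments."""
    err = (y_true != y_pred).astype(float)
    w = w.copy()
    w[:n_source] *= beta_src ** err[:n_source]    # beta_src < 1: shrink on error
    w[n_source:] *= beta_tgt ** -err[n_source:]   # beta_tgt < 1: grow on error
    return w / w.sum()                            # renormalise to a distribution

y_true = np.array([1, 0, 1, 0])
y_pred = np.array([0, 0, 1, 1])   # first source and last target misclassified
w = update_weights(y_true, y_pred, np.full(4, 0.25), 2, 0.5, 0.5)
# w[0] has shrunk relative to w[1]; w[3] has grown relative to w[2]
```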

5 Experimental Results

Set-up: StAdaB is evaluated on two intra-domain transfer learning cases: (i) air quality forecasting from Beijing to Hangzhou (IBH) and (ii) traffic condition prediction from London to Dublin (ILD), and one inter-domain case: (iii) from traffic condition prediction in London to air quality forecasting in Beijing (ILB). All tasks are performed with their respective values for the variability parameterization.

Intra-Domain Beijing - Hangzhou (IBH) (air quality data: https://bit.ly/2BUxKsi): Air quality knowledge in Beijing (source) is transferred to Hangzhou (target) for forecasting the air quality index, ranging from Good, Moderate, Unhealthy, Very Unhealthy, Hazardous to Emergent. The observations include air pollutants, meteorology elements (e.g., wind speed) and weather conditions from the monitoring stations. The semantics of observations is based on a DL ontology. RDF triples are generated on a daily basis, with several months of observations used as training and testing respectively. Even though the ontologies are from the same domain, the proportions of similar concepts and roles are limited; for instance, there is no hazardous air quality concept in Hangzhou.

Intra-Domain London - Dublin Bus Delay (ILD): Bus delay knowledge in London (source) is transferred to Dublin (target) for predicting traffic conditions classified as Free (value 4), Low (3), Moderate (2), Heavy (1) and Stopped (0). Source and target domain data include bus location, delay, congestion status and weather conditions. We enrich the data using a DL domain ontology. RDF triples are generated on a daily basis, with several months of observations used as training and testing respectively. The concept and role similarities among the two ontologies were also measured.

Inter-Domain London - Beijing (ILB): Bus delay knowledge in London (source) is transferred to a very different domain: Beijing (target) for forecasting air quality index. Data and ontologies from IBH and ILD are considered. Both domains share some common and conflicting knowledge. Inconsistency might then occur. For instance, both domains have the concepts of City, weather such as Wind but are conflicting on their importance and impact on the targeted variable i.e., bus delay in London and air quality in Beijing. The concept and role similarities among the two ontologies are respectively and .

Validation: Accuracy is reported by (i) studying the impact of semantic embeddings and (ii) comparing prediction results with existing approaches. Cross-validation is used.

Semantic Impact: Table 2 reports the impact of considering semantics (cf. Sem. vs. Basic) and (in)consistency (cf. Consistency / Inconsistency) in semantic embeddings on Random Forest (RF), Stochastic Gradient Descent (SGD) and AdaBoost (AB). “Basic” models are models with no semantics attached. “Plain” models perform modelling and prediction in the target domain, i.e., no transfer learning, while “TL” refers to transferring entailments from the source. As expected, semantics positively boosts the accuracy of transfer learning for the intra-domain cases (IBH and ILD), with an average improvement across models. More surprisingly, it even improves the inter-domain case (ILB). Inconsistency has been shown to drive below-baseline accuracy. On the opposite, results are much better when considering consistency, for both the intra-domain and inter-domain cases.

Case Models RF SGD AB
Plain TL Plain TL Plain TL
IBH Basic
Sem. Consistency
Inconsistency
Cons. / Incons. +16.07% +19.23% +30.61%
Semantic / Basic +13.93% +8.18% +12.17%
ILD Basic
Sem. Consistency
Inconsistency
Cons. / Incons. +60.22% +102.86% +152.35%
Semantic / Basic +10.07% +14.70% +19.42%
ILB Basic
Sem. Consistency
Inconsistency
Cons. / Incons. +153.96% 166.25% +243.46%
Semantic / Basic +20.44% +17.33% +22.33%
Table 2: Forecasting Accuracy / Improvement over State-of-the-art Models (noted as Basic) with Consistency / Inconsistency (Consistency ratio ) based Knowledge Transfer.

(In-)Consistency Impact: Figure 5 reports the impact of (in-)consistency on transfer learning by analysing how the ratio of consistent transferable knowledge drives accuracy. Accuracy is reported for the methods in Table 2 on intra-domain (average of IBH and ILD) and inter-domain (ILB) cases. Maximal (resp. minimal) accuracy is ensured for specific ratio ranges: the more consistent transferable knowledge, the more transfer occurs. Interestingly, having only consistent (resp. inconsistent) transferable knowledge does not ensure the best (resp. worst) accuracy. This is partially due to under- (resp. over-) populating the target task with conflicting knowledge, ending up with limited transferability.

Figure 5: Forecasting Accuracy vs. Semantic Consistency.

Baseline: We compare StAdaB (with Logistic Regression as base learner) with (i) Transfer AdaBoost TrAB [Dai et al.2007], (ii) Transfer Component Analysis (TCA) [Pan et al.2011], (iii) TrSVM [Benavides-Prado et al.2017] and (iv) SemTr [Lv et al.2012], which are respectively instance-based, feature-based, parameter-based and semantic-based approaches, cf. details in Section 6. We considered the intra-domain cases IBH and ILD and the inter-domain cases ILB and ILB (i.e., ILB with the same level of semantic expressivity covered by SemTr). Results report that transfer learning has limitations in the Beijing - Hangzhou context, cf. Figure 6(a). Although our approach outperforms the other techniques, accuracy remains limited by the context, due to (i) the semantic expressivity and (ii) the data availability in Hangzhou. The results show that TrSVM and TCA reach similar results in all cases. However, our approach and TrAB tend to maximise accuracy, especially in the inter-domain cases of Figures 6(c) and 6(d), as both favour heterogeneous domains by design. Interestingly, the reduced semantic context of Figure 6(d) (i) does not favour SemTr much, (ii) has no impact for StAdaB compared to ILB, and more surprisingly (iii) does benefit TrAB. This shows that the expressivity of semantics is crucial for our approach to benefit from (in-)consistency in transfer.

(a) Intra-Domain IBH
(b) Intra-Domain ILD
(c) Inter-Domains ILB
(d) Inter-Domains ILB
Figure 6: Baseline Comparison of Forecasting Accuracy.

Lessons Learnt: Adding semantics to domains for transfer learning has clearly shown a positive impact on accuracy, especially in the context of inter-domain transfer. This demonstrates the robustness of models supporting semantics when common / conflicting knowledge is shared. The expressivity of semantics has also shown positive impacts, especially when (in-)consistency can be derived from the domain logics, although some state-of-the-art approaches benefit from a taxonomy-like knowledge structure. Our approach also demonstrates that the more semantic axioms, the more robust the model and hence the higher the accuracy, cf. Figure 6(a) vs. 6(b). Data size and axiom numbers are critical, as they drive and control the semantics of the domain and of the transfer, which improves accuracy but not scalability (not reported in the paper). Scalability is worse with more expressive DLs due to consistency checks, and with limited impact on accuracy. Enough training data in the source domain is required: logic reasoning cannot help if important data or features are not mapped to the ontology. This is crucial for training and validation of semantics in transfer learning. Our approach is as robust as other transfer learning approaches; it only differentiates itself by valuing transferability at the semantic level.

6 Related Work

Instance-based Transfer: selective reuse of source domain samples with weights [Dai et al.2007]. The positive (or negative) degree of influence of each source domain training sample in the target domain is computed. [Tan et al.2017] select data points from intermediate domains to obtain a smooth transfer between largely distant domains.

Model-based Transfer: reuse of model parameters, e.g., features (data representations). [Pan et al.2011] introduced a semi-supervised method for making source and target domains similar. The parameters can also refer to the prediction model, e.g., [Gao et al.2008] trained models in the source domain and dynamically combined them in the target domain with weights. [Benavides-Prado et al.2017] selectively shares the hypothesis components learnt by Support Vector Machines. All such approaches do not consider semantics.

Semantic-based Transfer: incorporates external knowledge to boost the above two groups. [Lv et al.2012] used semantic nets to select features with similar semantic meaning in both source and target domains. [Lee et al.2017] analyzed knowledge graph-structured data to derive similarity in data and features.

The semantic representations in these works are lightweight, with no reasoning applied. There are also efforts on Markov Logic Network (MLN) based transfer learning, using first order [Mihalkova et al.2007, Mihalkova and Mooney2009] or second order [Davis and Domingos2009, Van Haaren et al.2015] rules as declarative prediction models. However, these MLN-based approaches do not address the problem of when it is feasible to transfer. Our approach uses OWL reasoning to select transferable samples (thus addressing “when to transfer”), and then enriches the samples with embedded transferability semantics, e.g., the consistency vector. Our approach can deal with different machine learning models and is not limited to rule-based models.

7 Conclusion

We addressed the problem of transfer learning in expressive semantics settings, by exploiting semantic variability, transferability and consistency to deal with when to transfer and what to transfer for existing instance-based transfer learning methods. Our approach has been shown to be robust in both intra- and inter-domain transfer learning tasks from real-world applications in Dublin, London, Beijing and Hangzhou. As future work, we will investigate the limits and explanations of transferability with expressive semantics.

Acknowledgments

This work is funded by the SIRIUS Centre for Scalable Data Access (Research Council of Norway, project 237889), NSFC91846204 and the EU Marie Curie IAPP K-Drive project (286348).

References

  • [Baader et al.2005] Franz Baader, Sebastian Brandt, and Carsten Lutz. Pushing the EL envelope. In IJCAI, pages 364–369, 2005.
  • [Benavides-Prado et al.2017] Diana Benavides-Prado, Yun Sing Koh, and Patricia Riddle. AccGenSVM: Selectively transferring from previous hypotheses. In IJCAI, pages 1440–1446, 2017.
  • [Bishop2006] Christopher M Bishop. Pattern recognition. Machine Learning, 128:1–58, 2006.
  • [Chen et al.2017] Jiaoyan Chen, Freddy Lécué, Jeff Z Pan, and Huajun Chen. Learning from ontology streams with semantic concept drift. In IJCAI, pages 957–963, 2017.
  • [Chen et al.2018] Jiaoyan Chen, Freddy Lécué, Jeff Z. Pan, Ian Horrocks, and Huajun Chen. Knowledge-based transfer learning explanation. In KR, pages 349–358, 2018.
  • [Choi et al.2016] Jonghyun Choi, Sung Ju Hwang, Leonid Sigal, and Larry S Davis. Knowledge transfer with interactive learning of semantic relationships. In AAAI, pages 1505–1511, 2016.
  • [Dai et al.2007] Wenyuan Dai, Qiang Yang, Gui-Rong Xue, and Yong Yu. Boosting for transfer learning. In Proceedings of the 24th international conference on Machine learning, pages 193–200. ACM, 2007.
  • [Dai et al.2009] Wenyuan Dai, Yuqiang Chen, Gui-Rong Xue, Qiang Yang, and Yong Yu. Translated learning: Transfer learning across different feature spaces. In Advances in neural information processing systems, pages 353–360, 2009.
  • [Davis and Domingos2009] Jesse Davis and Pedro Domingos. Deep transfer via second-order Markov Logic. In ICML, pages 217–224, 2009.
  • [Gao et al.2008] Jing Gao, Wei Fan, Jing Jiang, and Jiawei Han. Knowledge transfer via multiple model local structure mapping. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 283–291. ACM, 2008.
  • [Lécué and Pan2015] Freddy Lécué and Jeff Z Pan. Consistent knowledge discovery from evolving ontologies. In AAAI, pages 189–195, 2015.
  • [Lécué2015] Freddy Lécué. Scalable maintenance of knowledge discovery in an ontology stream. In IJCAI, pages 1457–1463, 2015.
  • [Lee et al.2017] Jaekoo Lee, Hyunjae Kim, Jongsun Lee, and Sungroh Yoon. Transfer learning for deep learning on graph-structured data. In AAAI, pages 2154–2160, 2017.
  • [Long et al.2015] Mingsheng Long, Yue Cao, Jianmin Wang, and Michael I. Jordan. Learning transferable features with deep adaptation networks. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, pages 97–105, 2015.
  • [Lv et al.2012] Wenlong Lv, Weiran Xu, and Jun Guo. Transfer learning in classification based on semantic analysis. In Computer Science and Network Technology (ICCSNT), 2012 2nd International Conference on, pages 1336–1339. IEEE, 2012.
  • [Mihalkova and Mooney2009] L. Mihalkova and R. J. Mooney. Transfer learning from minimal target data by mapping across relational domains, 2009.
  • [Mihalkova et al.2007] Lilyana Mihalkova, Tuyen Huynh, and Raymond J. Mooney. Mapping and revising Markov Logic Networks for transfer learning. In AAAI, pages 608–614, 2007.
  • [Nickel et al.2016] Maximilian Nickel, Kevin Murphy, Volker Tresp, and Evgeniy Gabrilovich. A review of relational machine learning for knowledge graphs. Proceedings of the IEEE, 104(1):11–33, 2016.
  • [Pan and Yang2010] Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on knowledge and data engineering, 22(10):1345–1359, 2010.
  • [Pan et al.2011] Sinno Jialin Pan, Ivor W Tsang, James T Kwok, and Qiang Yang. Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks, 22(2):199–210, 2011.
  • [Pan et al.2016] Jeff Z. Pan, Yuan Ren, and Yuting Zhao. Tractable approximate deduction for OWL. Artificial Intelligence, pages 95–155, 2016.
  • [Tan et al.2017] Ben Tan, Yu Zhang, Sinno Jialin Pan, and Qiang Yang. Distant domain transfer learning. In AAAI, pages 2604–2610, 2017.
  • [Van Haaren et al.2015] Jan Van Haaren, Andrey Kolobov, and Jesse Davis. TODTLER: Two-order-deep transfer learning. In AAAI, pages 3007–3015, 2015.
  • [Vicient et al.2013] Carlos Vicient, David Sánchez, and Antonio Moreno. An automatic approach for ontology-based feature extraction from heterogeneous textual resources. Engineering Applications of Artificial Intelligence, 26(3):1092–1106, 2013.
  • [Weiss et al.2016] Karl Weiss, Taghi M Khoshgoftaar, and DingDing Wang. A survey of transfer learning. Journal of Big Data, 3(1):9, 2016.