Meta Reasoning over Knowledge Graphs

08/13/2019 ∙ by Hong Wang, et al. ∙ IBM ∙ The Regents of the University of California

The ability to reason over learned knowledge is innate to humans, and humans can easily master new reasoning rules with only a few demonstrations. While most existing studies on knowledge graph (KG) reasoning assume sufficient training examples, we study the challenging and practical problem of few-shot knowledge graph reasoning under the paradigm of meta-learning. We propose a new meta-learning framework that effectively utilizes task-specific meta information such as local graph neighbors and reasoning paths in KGs. Specifically, we design a meta-encoder that encodes the meta information into task-specific initialization parameters for different tasks. This allows our reasoning module to have diverse starting points when learning to reason over different relations, which is expected to better fit the target task. On two few-shot knowledge base completion benchmarks, we show that the augmented task-specific meta-encoder yields a much better initial point than MAML and outperforms several few-shot learning baselines.


1 Introduction

Knowledge graphs Auer et al. (2007); Bollacker et al. (2008); Vrandecic and Krötzsch (2014) represent entities' relational knowledge in the form of triples, i.e., (subject, predicate, object), and have been proven essential and helpful in various downstream applications such as question answering Yao and Durme (2014); Bordes et al. (2015); Yih et al. (2015); Yu et al. (2017). Since most existing KGs are highly incomplete, many studies Bordes et al. (2013); Trouillon et al. (2016); Lao and Cohen (2010) have addressed automatic KG completion, i.e., inferring missing triples. However, most of these studies focus only on frequent relations and ignore relations with limited training samples. In fact, a large portion of KG relations are long-tail, i.e., they have very few instances. It is therefore important to consider knowledge graph completion under the few-shot learning setting, where only limited instances are available for new tasks. Xiong et al. (2018) first propose a graph-network-based metric-learning framework for this problem, but the metric is learned on graph embeddings and their method does not provide reasoning rationales for its predictions.

In contrast, we propose a meta reasoning agent that learns to make predictions along multi-hop reasoning chains, so its predictions are fully explainable. In this problem setting, each task corresponds to a particular relation, and the goal is to infer the end entity given the start entity (i.e., the query). Following recent work Ravi and Larochelle (2017); Finn et al. (2017); Gu et al. (2018); Huang et al. (2018); Sung et al. (2018); Mishra et al. (2018) on meta-learning, we aim to learn a reasoning agent that can effectively adapt to new relations with only a few examples. This is challenging because the model must leverage its prior learning experience for fast adaptation while avoiding overfitting on the few-shot training examples. The model-agnostic meta-learning algorithm (MAML) Finn et al. (2017) is a popular and general algorithm for this problem. It aims to learn an initial model that captures the common knowledge shared across tasks so that it can adapt to a new task quickly. One problem with MAML is that it learns only a single initial model, which cannot fit a new task without further training and has limited power when the tasks are diverse Chen et al. (2019). Another problem is that MAML learns only the common knowledge shared across tasks without taking advantage of the relationships between them, since no task-specific information is used when learning the initial model.

In order to learn the relationships between tasks, the model must be aware of the identity of the current task, such as the query relation in our problem. But simply using the task identity is problematic, since there is no way to initialize the identity of a new task other than random initialization. We address this with a meta-encoder that learns the task representation from meta information that is also available for the new task. Specifically, the meta-encoder encodes the task-specific information and generates the representation of the task as part of the parameters. In this way, different tasks have different representations, and thus different initial models. Also, since the representation of the task is available, the model can leverage the relationships between different tasks. To apply this idea to our problem, we propose two meta-encoders that encode two different kinds of task-specific information. The first is a neighbor encoder that encodes the start entity and the end entity and then uses the difference between their embeddings as the task representation. However, this task-specific information is not robust when the number of neighbors is small. For that case we propose a second encoder, which encodes the path from the start entity to the end entity. On two constructed few-shot multi-hop reasoning datasets, we show that the augmented meta-encoder yields a much better initial point and outperforms several few-shot learning baselines.

The main contributions of this work include:

We introduce few-shot learning for the task of multi-hop reasoning over knowledge graphs, and present two constructed datasets for this task.

We propose to use a meta-encoder to encode task-specific information so as to generate a better task-dependent model for a new task.

We apply a neighbor encoder and a path encoder to leverage task-specific information in the multi-hop reasoning task, and experiments verify the effectiveness of the augmented meta-encoder.

2 Related Work

Reasoning over Knowledge Graphs

Knowledge graph reasoning aims to infer the existence of a query relation between two entities. There are two general approaches. Embedding-based approaches Nickel et al. (2011); Bordes et al. (2013); Yang et al. (2015); Trouillon et al. (2016); Wu et al. (2016) learn representations of the relations and entities in the KG with heuristic self-supervised loss functions, while path-search-based approaches Lao and Cohen (2010); Neelakantan et al. (2015); Xiong et al. (2017); Das et al. (2018); Chen et al. (2018); Lin et al. (2018); Shen et al. (2018) solve the problem through multi-hop reasoning, i.e., finding a reasoning path between the two entities. Despite the superior performance of embedding-based methods, they cannot capture the complex reasoning patterns in the KG and lack explainability.

Due to its explainability, multi-hop reasoning has been investigated extensively in recent years. The Path-Ranking Algorithm (PRA) Lao and Cohen (2010) is an early approach that uses random walks to collect complex path features. Gardner et al. (2013, 2014) improve upon PRA by computing feature similarity in a vector space. Wang and Cohen (2015) integrate the background KG and text with recursive random walks. Other methods use convolutional neural networks Toutanova et al. (2015) and recurrent neural networks Neelakantan et al. (2015). More recently, Xiong et al. (2017) first applied reinforcement learning to learn relational paths. Das et al. (2018) propose the more practical setting of predicting the end entity given the query relation and the start entity. Lin et al. (2018) reshape the rewards using a pre-trained embedding model. Shen et al. (2018) use Monte Carlo Tree Search to overcome the problem of sparse rewards.

Meta-learning

Meta-learning aims to achieve fast adaptation on new tasks through meta-training on a set of tasks with abundant training examples. It has been widely applied in few-shot learning settings where only limited samples are available Gu et al. (2018); Huang et al. (2018). One important category of meta-learning approaches is initialization-based methods, which aim to find a good initial model that can quickly adapt to new tasks with limited samples Finn et al. (2017); Nichol et al. (2018). However, these methods learn only a single initial model and do not leverage the relationships between tasks. Rusu et al. (2018) propose to learn a data-dependent latent generative representation of the model parameters and conduct the gradient-based adaptation procedure in this latent space. Another related work is the Relation Network Sung et al. (2018), which consists of an embedding module to encode samples and a relation module to capture the relations between samples.

3 Background

In this section, we first introduce the multi-hop reasoning task. We then extend it to the meta-learning setting and introduce MAML, a popular framework for few-shot learning.

3.1 Multi-hop Reasoning Problem

In this problem, there is a background graph $G$ and a set of query relations. Each query relation $r_q$ has its own training and testing triples of the form $(e_s, r_q, e_t)$, where $e_s$ and $e_t$ are the start entity and end entity in the KB and $r_q$ is the query relation. Given the start entity $e_s$ and the query relation $r_q$, the task is to predict the end entity $e_t$, along with a supporting reasoning path from $e_s$ to $e_t$ in $G$. The length of the path is fixed, and an additional STOP edge is added for each entity, pointing to itself, so that the model is able to stay at the end entity.

We give an example to better explain this task. Consider the relation Nationality with a training triple (Obama, Nationality, American). Given the start entity and the query relation, (Obama, Nationality), the model is expected to find a path of fixed length in the background graph from Obama to American. A general framework for this problem is to train an agent that, at each step, predicts the next relation based on the current entity, the query relation, and the visited path. Ideally, the agent would output the reasoning path (BornIn, CityIn, ProvinceIn) and predict the end entity American.
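To make the setup concrete, the following toy sketch (ours, not from the paper; the graph, entity names, and rollout function are illustrative assumptions) shows a background graph with STOP self-loops and a fixed-length rollout over a sequence of relations.

from typing import Dict, List, Tuple

Edge = Tuple[str, str]   # (relation, target entity)

# Toy background graph with a STOP self-loop on every entity; names are illustrative only.
graph: Dict[str, List[Edge]] = {
    "Obama":    [("BornIn", "Honolulu"), ("STOP", "Obama")],
    "Honolulu": [("CityIn", "Hawaii"), ("STOP", "Honolulu")],
    "Hawaii":   [("ProvinceIn", "American"), ("STOP", "Hawaii")],
    "American": [("STOP", "American")],
}

def rollout(start: str, relations: List[str]) -> str:
    """Follow a fixed-length sequence of relations from the start entity."""
    entity = start
    for rel in relations:
        targets = [t for (r, t) in graph[entity] if r == rel]
        if targets:              # invalid actions leave the agent in place
            entity = targets[0]
    return entity

# Query (Obama, Nationality, ?): the three-hop path reaches the expected answer.
assert rollout("Obama", ["BornIn", "CityIn", "ProvinceIn"]) == "American"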

3.2 Meta-learning for Multi-hop Reasoning

For the multi-hop reasoning problem, we define a task as inferring the end entity of a specific relation conditioned on the start entity, so each relation forms an individual task. In the meta-learning framework, the tasks are divided into three disjoint sets called the meta-training, meta-dev, and meta-test sets respectively. The goal of meta-learning is to train an agent that can quickly adapt to the new tasks in the meta-test set with limited data by leveraging prior learning experience.

Following the standard meta-learning setting as in Finn et al. (2017), our setting consists of two phases, meta-training and meta-test. In the meta-training phase, the agent learns on a set of meta-training tasks, each with its own training and validation set. By learning on the meta-training tasks, the agent is expected to gain knowledge about the reasoning process that helps it learn faster on new tasks.

In the meta-test phase, the trained agent is evaluated on a set of new tasks in the meta-dev/meta-test task set. Each such task has its own training and testing set, where the training set contains only limited samples. The agent is fine-tuned on each task using its training set for a fixed number of gradient steps and evaluated after each step. The macro-average over all tasks in the set is reported as the meta-learning performance. Note that the number of fine-tuning steps is chosen according to the model's performance on the meta-dev tasks and then fixed for the meta-test tasks, since the new tasks contain only limited samples, which are not sufficient for choosing a suitable number of fine-tuning steps.

3.3 MAML Framework

Let $f_\theta$ denote the reasoning model in our setting, which maps an observation to an action, i.e., the next relation to be taken. The objective of MAML Finn et al. (2017) is to find a good model initialization that can quickly adapt to new tasks after a few adaptation steps. We first introduce the objective function of MAML and then illustrate how to optimize it.

Let $\theta$ denote the parameters of the current model and $\theta'_i$ the updated parameters obtained using samples from task $T_i$. For example, with one gradient update on $T_i$ we have $\theta'_i = \theta - \alpha \nabla_\theta L_{T_i}(f_\theta)$.

The meta-objective is to optimize the performance of $f_{\theta'_i}$ across tasks sampled from the task distribution $p(T)$. More formally, the objective is $\min_\theta \sum_{T_i \sim p(T)} L_{T_i}(f_{\theta'_i})$.

To optimize this objective, we sample a batch of tasks. For each task $T_i$, two subsets of training examples, $D^{tr}_i$ and $D^{val}_i$, are sampled independently. $D^{tr}_i$ is used to compute the updated parameters $\theta'_i$, and $\theta$ is then optimized to minimize the objective on $D^{val}_i$. Formally, we have $\theta \leftarrow \theta - \beta \nabla_\theta \sum_i L_{T_i}(f_{\theta'_i}; D^{val}_i)$.

The above optimization requires second-order gradients, which are computationally expensive. In practice, a first-order update rule is commonly used instead, which performs similarly but requires much less computation Finn et al. (2017); Nichol et al. (2018): $\theta \leftarrow \theta - \beta \sum_i \nabla_{\theta'_i} L_{T_i}(f_{\theta'_i}; D^{val}_i)$.
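As a rough illustration, the following PyTorch sketch implements the first-order update described above; `model`, `loss_fn`, and the per-task data pairs are hypothetical placeholders, and the paper's actual training code may differ.

import copy
import torch

def fomaml_step(model, tasks, loss_fn, alpha=0.01, beta=0.001, k=1):
    """One first-order meta-update over a batch of tasks, each given as (d_train, d_val)."""
    meta_grads = [torch.zeros_like(p) for p in model.parameters()]
    for d_train, d_val in tasks:
        fast = copy.deepcopy(model)                    # theta_i' starts from theta
        inner_opt = torch.optim.SGD(fast.parameters(), lr=alpha)
        for _ in range(k):                             # k adaptation steps on D_tr
            inner_opt.zero_grad()
            loss_fn(fast, d_train).backward()
            inner_opt.step()
        # First-order rule: the gradient of the validation loss w.r.t. theta_i'
        # is applied directly to the initial parameters theta.
        grads = torch.autograd.grad(loss_fn(fast, d_val), list(fast.parameters()))
        meta_grads = [m + g for m, g in zip(meta_grads, grads)]
    with torch.no_grad():                              # meta-update on theta
        for p, g in zip(model.parameters(), meta_grads):
            p -= beta * g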

4 Meta-Learning of Deep Reasoners

4.1 MAML with Task-specific Initialization

MAML learns a single initial model that does not depend on any task-specific information. It works by adapting the initial model through gradient updates on the target task; in other words, the initial model learns common knowledge shared by the tasks so that it can adapt to new tasks quickly. However, MAML cannot capture the relationships between different tasks because it lacks task-specific information. One easy way to inject task information is to use the task identity, such as the embedding of the query relation in our KB reasoning problem. But this solution has two problems. First, the model will learn knowledge that only applies to a specific task, which is hard to transfer when adapting to new tasks. Second, when a new task arrives, we cannot easily initialize its task identity, e.g., the embedding of a new query relation. Therefore, we propose to use a meta-encoder to encode the task-specific information. This not only enables the model to learn the relationships between different tasks but also allows the model to adapt to a new task faster, since it can leverage the task-specific information of the new task as well.

Let $x$ and $m$ denote the input data and the task-specific information respectively. The meta-encoder $g_\phi$ encodes $m$ into a task representation $z = g_\phi(m)$, and the model $f_\theta$ takes both $x$ and $z$ as inputs, i.e., $f_\theta(x, z)$ is used for prediction. Note that we want $z$ to encode information about the whole target task instead of just the instance from which it is computed, so that it also benefits other instances within the same task, i.e., $f_\theta(x', z)$ should perform well for any instance $x'$ of the task. This is required because the task-specific information may not be available for the testing samples; for example, the end entity used as task-specific information is not available in new testing samples. To achieve this, we apply a meta-gradient method similar to MAML. Given a task $T_i$, we sample two subsets of instances $D^{tr}_i$ and $D^{val}_i$. The updated parameters $\theta'_i$ are computed using $D^{tr}_i$ together with the task representation $z_i$ derived from $D^{tr}_i$.

The meta-gradient is then computed using $D^{val}_i$ and $z_i$, with $\theta'_i$ used for initialization.

The first-order update rule is analogous to that of MAML and is applied jointly to the model parameters $\theta$ and the meta-encoder parameters $\phi$.

The details are shown in Algorithm 1. First, a batch of tasks is sampled. For each task $T_i$, we sample two subsets of instances $D^{tr}_i$ and $D^{val}_i$ and compute the meta information based on $D^{tr}_i$, which for the multi-hop reasoning problem is either the neighbors of the start and end entities or the reasoning paths between them. The updated parameters are then computed for each task (lines 7-9). In the meta-update step (line 11), we update the parameters to minimize the loss of the adapted model on the new instances $D^{val}_i$, using the task representation $z_i$.

For testing on a new task, we obtain the task representation from the few-shot samples. We then fine-tune the reasoning model and the meta-encoder on these samples. The model makes predictions on testing samples using the fine-tuned parameters and the task representation.
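A minimal sketch of this meta-test procedure might look as follows; the function and module names (`reasoner`, `meta_encoder`, `loss_fn`) are our placeholders, not the authors' implementation.

import torch

def adapt_and_predict(reasoner, meta_encoder, support, queries, loss_fn, steps=5, lr=0.01):
    """Fine-tune on the few-shot support set, then predict with the task representation."""
    params = list(reasoner.parameters()) + list(meta_encoder.parameters())
    opt = torch.optim.SGD(params, lr=lr)
    for _ in range(steps):                    # fine-tune both modules on the support set
        z = meta_encoder(support)             # task representation from the meta information
        opt.zero_grad()
        loss_fn(reasoner, support, z).backward()
        opt.step()
    with torch.no_grad():
        z = meta_encoder(support)             # reuse z for the unseen test queries
        return [reasoner(q, z) for q in queries]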

Figure 1: The model we use for meta-reasoning over knowledge graphs. (a) The general framework of the model. (b) and (c) Our neighbor encoder and path encoder, respectively.
0:  Input: p(T), the distribution of tasks; α and β, learning rates for adaptation and meta-update; k, the number of adaptations; f_θ and g_φ, the reasoning model and meta-encoder
1:  Randomly initialize θ and φ
2:  for step = 0 : M-1 do
3:     for each task T_i in a batch of tasks sampled from p(T) do
4:         Sample task instances (D_i^tr, D_i^val) from T_i
5:         Compute task-specific information m_i from D_i^tr and the task representation z_i = g_φ(m_i)
6:         Set θ_i' = θ
7:         for i = 0 : k do
8:            θ_i' ← θ_i' − α ∇ L(f_{θ_i'}(·, z_i); D_i^tr)
9:         end for
10:     end for
11:     Update (θ, φ) ← (θ, φ) − β Σ_i ∇ L(f_{θ_i'}(·, z_i); D_i^val)
12:  end for
Algorithm 1 MAML with Meta-Encoder

4.2 Model

The general framework of our model is shown in Figure 1. The original reasoning agent takes the start entity and the query relation as inputs and outputs the reasoning path and the end entity. However, this agent does not work well under the meta-learning setting, where the embedding of a new query relation is hard to initialize. Our method replaces the query relation with a meta-encoder that encodes meta information about the task which is also available for a new task. In the following parts, we describe the reasoning agent and the meta-encoder in more detail.

4.2.1 Reasoning Agent

We use the policy proposed in Das et al. (2018), called MINERVA, which formulates the problem as a reinforcement learning problem. The state is defined as the combination of the query, the answer, and the current location (an entity in the KB). Since the answer is not observed, the observation only includes the query and the current location. The actions are defined as the outgoing edges of the current location. The reward is 1 if the agent reaches the answer and 0 otherwise.

The policy uses an LSTM to encode the history information, i.e., the visited path: $h_t = \mathrm{LSTM}(h_{t-1}, [v_{r_{t-1}}; v_{e_t}])$, where $h_{t-1}$ is the previous hidden state, $v_{r_{t-1}}$ is the embedding of the relation chosen at time $t-1$, and $v_{e_t}$ is the embedding of the current entity. The hidden state $h_t$ of the LSTM is then concatenated with the embedding of the current entity and the query relation $r_q$. The action distribution is computed by applying a softmax over the matching scores between the action embeddings and a projection of the concatenated embedding, i.e., $P(a_t) = \mathrm{softmax}\big(A_t\, W_2\, \mathrm{ReLU}(W_1 [h_t; v_{e_t}; v_{r_q}])\big)$, where $A_t$ stacks the embeddings of the available actions. The model structure follows Das et al. (2018), which uses two linear layers ($W_1$ and $W_2$) to encode the observation. The next action is sampled from this action distribution.
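For illustration, a PyTorch sketch of such an LSTM policy is given below; the layer shapes and names are our assumptions in the spirit of MINERVA, not the authors' exact architecture.

import torch
import torch.nn as nn

class ReasonerPolicy(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.lstm = nn.LSTMCell(2 * dim, dim)   # input: [previous relation; current entity]
        self.w1 = nn.Linear(3 * dim, dim)       # encodes [h_t; current entity; query relation]
        self.w2 = nn.Linear(dim, 2 * dim)       # projects into the action-embedding space

    def forward(self, hc, prev_rel, cur_ent, query_rel, action_embs):
        # prev_rel, cur_ent, query_rel: (1, dim); hc: tuple of (1, dim) LSTM states;
        # action_embs: (num_actions, 2*dim), one row [relation emb; target entity emb] per edge.
        h, c = self.lstm(torch.cat([prev_rel, cur_ent], dim=-1), hc)
        obs = torch.relu(self.w1(torch.cat([h, cur_ent, query_rel], dim=-1)))
        scores = action_embs @ self.w2(obs).squeeze(0)   # matching score per outgoing edge
        return torch.softmax(scores, dim=-1), (h, c)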

4.2.2 Meta-encoder

We can regard the embedding of the query relation used in the MINERVA model above as the task identity. But when a new task arrives, there is no good way to find an initial embedding for the new query relation that fits the reasoning model well. Therefore, we need a meta-encoder that leverages meta information about the new task and generates the embedding of the query relation, based on which the model can make reasonable predictions. We introduce two task-specific encoders for this purpose: a neighbor encoder and a path encoder.

Neighbor Encoder

Given an instance, i.e., a triple $(e_s, r_q, e_t)$, we use the difference between the embeddings of the start entity and the end entity as a representation of the query relation Bordes et al. (2013). To better represent an entity, we borrow the idea of the neighbor encoder from Xiong et al. (2018). Let $N_e$ denote the neighbors of entity $e$. For each relation-entity pair $(r_k, e_k) \in N_e$, we compute a feature representation by applying a linear layer to the concatenation of the relation embedding $v_{r_k}$ and the entity embedding $v_{e_k}$. The neighbor embedding of the entity is then computed by averaging the feature representations of all its neighbors and applying an activation function. Finally, the representation of the query relation is defined as the difference between the neighbor embeddings of $e_s$ and $e_t$, analogous to TransE Bordes et al. (2013).
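A hedged sketch of this neighbor encoder is shown below; the tanh activation and the end-minus-start sign follow common TransE-style conventions and are our assumptions rather than details confirmed by the paper.

import torch
import torch.nn as nn

class NeighborEncoder(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)   # linear layer over [relation emb; entity emb]

    def encode_entity(self, rel_embs, ent_embs):
        # rel_embs, ent_embs: (num_neighbors, dim); average the projected pairs, then activate
        # (tanh is a placeholder for the unspecified activation).
        feats = self.proj(torch.cat([rel_embs, ent_embs], dim=-1))
        return torch.tanh(feats.mean(dim=0))

    def task_representation(self, start_neighbors, end_neighbors):
        # TransE-style difference; the end-minus-start sign is our assumption.
        return self.encode_entity(*end_neighbors) - self.encode_entity(*start_neighbors)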

Path Encoder

The neighbor encoder represents the start and end entities through their neighbors, so it does not work well when the number of neighbors is small. For this case we propose another encoder, the path encoder, which considers the successful paths in the graph, i.e., the reasoning paths from the start entity to the end entity for a given query relation. Since not all paths from the start entity to the end entity are meaningful, the path encoder is noisier than the neighbor encoder.

Let $P$ denote the set of all paths from the start entity $e_s$ to the end entity $e_t$. Each path $p \in P$ is a sequence of relations $(r_1, \dots, r_T)$, where $r_t$ is the relation selected at step $t$ of path $p$ and $T$ is the maximum length of a reasoning path. We use an LSTM Hochreiter and Schmidhuber (1997) to encode each path: $h_t = \mathrm{LSTM}(h_{t-1}, v_{r_t})$, where $h_t$ is the hidden state of the LSTM at step $t$ and $v_{r_t}$ is the embedding of relation $r_t$. The last hidden state $h_T$ is used as the embedding of path $p$. The final path embedding for the given triple is the average of the embeddings of all paths in $P$.
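The following sketch illustrates this path encoder; padding, batching, and other engineering details are omitted, and the module is our approximation rather than the authors' code.

import torch
import torch.nn as nn

class PathEncoder(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.lstm = nn.LSTM(input_size=dim, hidden_size=dim, batch_first=True)

    def forward(self, paths):
        # `paths` is a list of tensors, each of shape (path_len, dim): the relation embeddings
        # along one reasoning path from the start entity to the end entity.
        path_embs = []
        for p in paths:
            _, (h_n, _) = self.lstm(p.unsqueeze(0))     # h_n: (1, 1, dim), last hidden state
            path_embs.append(h_n.view(-1))
        return torch.stack(path_embs).mean(dim=0)       # average over all paths for the triple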

Dataset     # Entities   # Relations   # Triples   # Tasks   Degree (average)   Degree (median)
FB15K-237   14505        237           239266      237       20.00              14
NELL        68272        358           181109      67        3.99               1
Table 1: Statistics of the datasets. # Entities, # Relations, # Triples, and # Tasks denote the number of entities, relations, triples, and tasks in the corresponding dataset. The last two columns give the average and median outgoing degree of each entity.
Setting               Method      FB15K-237                            NELL
                                  Hits@1   Hits@3   Hits@10   MRR      Hits@1   Hits@3   Hits@10   MRR
Full Data             MINERVA     .124     .146     .187      .142     .137     .176     .202      .163
Best (Baselines)      Random      .017     .028     .043      .027     .047     .100     .165      .086
                      Transfer    .010     .012     .054      .019     .041     .070     .128      .066
                      MAML        .021     .041     .052      .035     .067     .086     .139      .087
                      MAML-Mask   .009     .023     .045      .019     .032     .054     .080      .058
Best (Ours)           Neighbor    .065     .073     .128      .080     .045     .066     .106      .064
                      Path        .041     .067     .101      .060     .108     .141     .200      .137
Initial (Baselines)   Random      .000     .000     .005      .002     .021     .074     .105      .056
                      Transfer    .000     .005     .023      .006     .037     .055     .077      .051
                      MAML        .005     .005     .023      .010     .017     .031     .054      .032
                      MAML-Mask   .000     .014     .045      .012     .021     .050     .081      .043
Initial (Ours)        Neighbor    .043     .054     .092      .056     .026     .047     .091      .045
                      Path        .000     .005     .058      .012     .082     .109     .164      .104
Table 2: Results of the few-shot experiments. For comparison, we also report the performance of MINERVA trained on these tasks with full training data (Full Data). Best denotes the best performance of each method after fine-tuning, and Initial denotes the performance of each method at the initial point. We report the average performance over the meta-test tasks; the best result for each evaluation metric is marked in bold.
Figure 2: Performance of each method as the few-shot sample size varies (panels: (a) FB15K-237, (b) NELL). The MRR of each model after fine-tuning is reported.

5 Experiments

To verify the effectiveness of the proposed methods, we compare them with several baselines on two knowledge completion datasets, FB15K-237 Toutanova et al. (2015) and NELL Mitchell et al. (2018). In the following, we describe how we construct the meta-learning setting for knowledge graph reasoning and the baselines we use, and then present the main results and additional analysis.

5.1 Datasets and Settings

We construct the meta-learning setting from two well-known knowledge completion datasets: FB15K-237 Toutanova et al. (2015) and NELL Mitchell et al. (2018). FB15K-237 is created from the original FB15K by removing various sources of test leakage. Every relation in the training set of FB15K-237 is regarded as an individual task. For the NELL dataset, we use the modified version from Xiong et al. (2018), which selects relations whose number of triples falls between a lower and an upper threshold as one-shot tasks; we use those selected tasks as our meta-learning tasks. The statistics of the two datasets are shown in Table 1.

Starting from the training, validation, and test splits of the original dataset (e.g., FB15K-237), we choose tasks with positive transfer (tasks that achieve better performance when trained together with other tasks than when trained alone) as meta-dev and meta-test tasks. More specifically, we choose tasks whose positive transfer exceeds a threshold on FB15K-237 and NELL respectively, and among these we only keep tasks with a sufficient number of samples in the dev set. The thresholds are chosen so that we obtain enough tasks with reasonable positive transfer. In this way we obtain the meta-dev/meta-test relations on FB15K-237 and NELL, and the remaining relations are used for meta-training. Each relation has its own training/test data.

5.2 Baselines and Hyper-parameters

We compare our methods with the following baselines. Random trains a separate model for each task from random initialization. Transfer learns an initial model using samples from the meta-training tasks. MAML uses the MAML training framework to learn an initial point, with the task identity (the query relation) given. MAML-Mask uses the same training framework as MAML, except that the task identity is masked by setting the query relation to the same placeholder for all tasks. Neighbor and Path denote our methods, which use the neighbor encoder and the path encoder respectively to encode the task-specific information.

We tuned the hyper-parameters for all the baselines and our methods. For Transfer, the batch size refers to the pre-training phase; for MAML, MAML-Mask, Neighbor, and Path, it refers to the meta-training phase. The number of adaptation steps used to compute the updated parameters, as well as the adaptation and meta-update learning rates, are chosen separately for FB15K-237 and NELL. Other parameters are set to the defaults in Das et al. (2018).

5.3 Results

We conduct our experiments under the few-shot learning setting, i.e., only a small number of training samples is available for each task in the meta-dev and meta-test sets. We use the mean reciprocal rank (MRR) and Hits@K to evaluate each model. For each method, we first fine-tune and test the initial model on the meta-dev tasks, through which we choose the number of fine-tuning steps, and we then fix this number for the meta-test tasks. For example, if a model performs best after a certain number of fine-tuning steps on the meta-dev tasks, it is tested after the same number of steps on the meta-test tasks. We report the best performance on the meta-test tasks for each method in Table 2 as the Best group, together with the results using full data for comparison. From the results, we can see that the neighbor encoder and path encoder achieve the best performance on the FB15K-237 and NELL datasets respectively. It is reasonable that the neighbor encoder does not perform well on NELL, since the median outgoing degree on this dataset is only 1. We also note that the path encoder outperforms the other baselines on FB15K-237, which verifies the consistent effectiveness of the task-specific encoders. In contrast, the other baselines do not differ much from the simple Random baseline, and sometimes even underperform it.
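For reference, the small helper below (ours, for illustration only) computes MRR and Hits@K from the rank of the correct end entity for each test query.

from typing import Dict, List

def evaluate(ranks: List[int], ks=(1, 3, 10)) -> Dict[str, float]:
    """Compute MRR and Hits@K given the rank of the true end entity per query (rank 1 = best)."""
    n = len(ranks)
    metrics = {"MRR": sum(1.0 / r for r in ranks) / n}
    for k in ks:
        metrics["Hits@%d" % k] = sum(r <= k for r in ranks) / n
    return metrics

# Example: three queries whose true answers are ranked 1st, 4th, and 12th.
print(evaluate([1, 4, 12]))   # MRR ≈ 0.444, Hits@1 ≈ 0.333, Hits@3 ≈ 0.333, Hits@10 ≈ 0.667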

To show that our model provides a better initial point than the others, we report the performance of the initial point without any training in Table 2 as the Initial group. The baselines have very poor initial performance on FB15K-237, which is reasonable since the model has never seen the new relation. From the results, we can see that the neighbor encoder and path encoder achieve much better initial points than the other baselines on FB15K-237 and NELL respectively. On FB15K-237 the path encoder's initial performance is only comparable to the best baseline, MAML-Mask; we think the reason is that the path encoder is noisier than the neighbor encoder, as mentioned before.

5.4 Few-shot Size

To investigate the impact of the few-shot size on model performance, we evaluate the models using various few-shot sizes. The results are shown in Figure 2. For MAML and MAML-Mask, performance remains nearly unchanged beyond a small size on the FB15K-237 dataset. On NELL, the performance of MAML is not stable, while that of MAML-Mask keeps increasing; both methods underperform the Random baseline as the size increases. The performance of Transfer increases with the few-shot size on FB15K-237, but drops sharply on NELL at one of the sizes, which indicates that it is not stable and is sensitive to noise in the data. The neighbor encoder performs best on FB15K-237 but poorly on NELL due to the small neighbor sizes. The path encoder appears less stable than the neighbor encoder, as its performance drops once on each dataset, but it achieves the best performance on NELL and the second-best performance at most of the larger sizes.

Setting (FB15K-237)   Hits@1   Hits@3   Hits@10   MRR
Encoder-1-shot        .047     .058     .117      .064
Encoder-50-shot       .049     .070     .128      .069
No-encoder            .008     .035     .084      .032
Table 3: Comparison of models with different initializations on the FB15K-237 dataset. Encoder-1-shot and Encoder-50-shot use the neighbor encoder with 1 and 50 samples respectively; No-encoder uses a random initialization. We report the average performance on the meta-test tasks; the best result for each evaluation metric is marked in bold.

5.5 Ablation Study

To verify the effectiveness of the encoder, we compare the model with task-specific initialization against the model with random initialization at the initial point. We conduct the ablation study with the neighbor encoder on the FB15K-237 dataset; the results are shown in Table 3. The three models in the table use the same reasoning model and differ only in the task representation. Encoder-1-shot and Encoder-50-shot apply the neighbor encoder to generate the task representation using 1 and 50 samples respectively, while No-encoder uses a randomly initialized representation. Comparing Encoder-1-shot with No-encoder, we see that encoding task-related information yields much better performance even with a single sample, which indicates that the generated task representations are meaningful. Moreover, a better initialization is achieved with more samples, since Encoder-50-shot outperforms Encoder-1-shot.

6 Conclusion

In this paper, we consider multi-hop reasoning over knowledge graphs under the few-shot learning setting, where only limited samples are available for new tasks. We improve upon MAML by using a meta-encoder to encode task-specific information, which allows our method to create a task-dependent initial model that better fits the target task. We propose a neighbor encoder and a path encoder for this problem. Experiments on FB15K-237 and NELL under the meta-learning setting show that our task-specific meta-encoder yields a better initial point and outperforms other baselines.

References

  • S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. G. Ives (2007) DBpedia: A nucleus for a web of open data. In The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea, November 11-15, 2007., K. Aberer, K. Choi, N. F. Noy, D. Allemang, K. Lee, L. J. B. Nixon, J. Golbeck, P. Mika, D. Maynard, R. Mizoguchi, G. Schreiber, and P. Cudré-Mauroux (Eds.), Lecture Notes in Computer Science, Vol. 4825, pp. 722–735. External Links: Link, Document Cited by: §1.
  • K. D. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor (2008) Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, Vancouver, BC, Canada, June 10-12, 2008, J. T. Wang (Ed.), pp. 1247–1250. External Links: Link, Document Cited by: §1.
  • A. Bordes, N. Usunier, S. Chopra, and J. Weston (2015) Large-scale simple question answering with memory networks. CoRR abs/1506.02075. External Links: Link, 1506.02075 Cited by: §1.
  • A. Bordes, N. Usunier, A. García-Durán, J. Weston, and O. Yakhnenko (2013) Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States., C. J. C. Burges, L. Bottou, Z. Ghahramani, and K. Q. Weinberger (Eds.), pp. 2787–2795. External Links: Link Cited by: §1, §2, §4.2.2.
  • W. Chen, Y. Liu, Z. Kira, Y. F. Wang, and J. Huang (2019) A closer look at few-shot classification. CoRR abs/1904.04232. External Links: Link, 1904.04232 Cited by: §1.
  • W. Chen, W. Xiong, X. Yan, and W. Y. Wang (2018) Variational knowledge graph reasoning. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 1 (Long Papers), M. A. Walker, H. Ji, and A. Stent (Eds.), pp. 1823–1832. External Links: Link Cited by: §2.
  • R. Das, S. Dhuliawala, M. Zaheer, L. Vilnis, I. Durugkar, A. Krishnamurthy, A. Smola, and A. McCallum (2018) Go for a walk and arrive at the answer: reasoning over paths in knowledge bases using reinforcement learning. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, External Links: Link Cited by: §2, §2, §4.2.1, §4.2.1, §5.2.
  • C. Finn, P. Abbeel, and S. Levine (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, D. Precup and Y. W. Teh (Eds.), Proceedings of Machine Learning Research, Vol. 70, pp. 1126–1135. External Links: Link Cited by: §1, §2, §3.2, §3.3, §3.3.
  • M. Gardner, P. P. Talukdar, B. Kisiel, and T. M. Mitchell (2013) Improving learning and inference in a large knowledge-base using latent syntactic cues. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, Grand Hyatt Seattle, Seattle, Washington, USA, A meeting of SIGDAT, a Special Interest Group of the ACL, pp. 833–838. External Links: Link Cited by: §2.
  • M. Gardner, P. P. Talukdar, J. Krishnamurthy, and T. M. Mitchell (2014) Incorporating vector space similarity in random walk inference over knowledge bases. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, A. Moschitti, B. Pang, and W. Daelemans (Eds.), pp. 397–406. External Links: Link Cited by: §2.
  • J. Gu, Y. Wang, Y. Chen, V. O. K. Li, and K. Cho (2018) Meta-learning for low-resource neural machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, E. Riloff, D. Chiang, J. Hockenmaier, and J. Tsujii (Eds.), pp. 3622–3631. External Links: Link Cited by: §1, §2.
  • S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural Computation 9 (8), pp. 1735–1780. External Links: Link, Document Cited by: §4.2.2.
  • P. Huang, C. Wang, R. Singh, W. Yih, and X. He (2018) Natural language to structured query generation via meta-learning. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 2 (Short Papers), M. A. Walker, H. Ji, and A. Stent (Eds.), pp. 732–738. External Links: Link Cited by: §1, §2.
  • N. Lao and W. W. Cohen (2010) Relational retrieval using a combination of path-constrained random walks. Machine Learning 81 (1), pp. 53–67. External Links: Link, Document Cited by: §1, §2, §2.
  • X. V. Lin, R. Socher, and C. Xiong (2018) Multi-hop knowledge graph reasoning with reward shaping. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, E. Riloff, D. Chiang, J. Hockenmaier, and J. Tsujii (Eds.), pp. 3243–3253. External Links: Link Cited by: §2, §2.
  • N. Mishra, M. Rohaninejad, X. Chen, and P. Abbeel (2018) A simple neural attentive meta-learner. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, External Links: Link Cited by: §1.
  • T. M. Mitchell, W. W. Cohen, E. R. H. Jr., P. P. Talukdar, B. Yang, J. Betteridge, A. Carlson, B. D. Mishra, M. Gardner, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohamed, N. Nakashole, E. A. Platanios, A. Ritter, M. Samadi, B. Settles, R. C. Wang, D. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves, and J. Welling (2018) Never-ending learning. Commun. ACM 61 (5), pp. 103–115. External Links: Link, Document Cited by: §5.1, §5.
  • A. Neelakantan, B. Roth, and A. McCallum (2015) Compositional vector space models for knowledge base inference. In 2015 AAAI Spring Symposia, Stanford University, Palo Alto, California, USA, March 22-25, 2015, External Links: Link Cited by: §2, §2.
  • A. Nichol, J. Achiam, and J. Schulman (2018) On first-order meta-learning algorithms. CoRR abs/1803.02999. External Links: Link, 1803.02999 Cited by: §2, §3.3.
  • M. Nickel, V. Tresp, and H. Kriegel (2011) A three-way model for collective learning on multi-relational data. In Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, Washington, USA, June 28 - July 2, 2011, L. Getoor and T. Scheffer (Eds.), pp. 809–816. External Links: Link Cited by: §2.
  • S. Ravi and H. Larochelle (2017) Optimization as a model for few-shot learning. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, External Links: Link Cited by: §1.
  • A. A. Rusu, D. Rao, J. Sygnowski, O. Vinyals, R. Pascanu, S. Osindero, and R. Hadsell (2018) Meta-learning with latent embedding optimization. CoRR abs/1807.05960. External Links: Link, 1807.05960 Cited by: §2.
  • Y. Shen, J. Chen, P. Huang, Y. Guo, and J. Gao (2018) M-walk: learning to walk over graphs using monte carlo tree search. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada., S. Bengio, H. M. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), pp. 6787–6798. External Links: Link Cited by: §2, §2.
  • F. Sung, Y. Yang, L. Zhang, T. Xiang, P. H. S. Torr, and T. M. Hospedales (2018) Learning to compare: relation network for few-shot learning. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 1199–1208. External Links: Link, Document Cited by: §1, §2.
  • K. Toutanova, D. Chen, P. Pantel, H. Poon, P. Choudhury, and M. Gamon (2015) Representing text for joint embedding of text and knowledge bases. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, September 17-21, 2015, L. Màrquez, C. Callison-Burch, J. Su, D. Pighin, and Y. Marton (Eds.), pp. 1499–1509. External Links: Link Cited by: §2, §5.1, §5.
  • T. Trouillon, J. Welbl, S. Riedel, É. Gaussier, and G. Bouchard (2016) Complex embeddings for simple link prediction. In Proceedings of the 33nd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, M. Balcan and K. Q. Weinberger (Eds.), JMLR Workshop and Conference Proceedings, Vol. 48, pp. 2071–2080. External Links: Link Cited by: §1, §2.
  • D. Vrandecic and M. Krötzsch (2014) Wikidata: a free collaborative knowledgebase. Commun. ACM 57 (10), pp. 78–85. External Links: Link, Document Cited by: §1.
  • W. Y. Wang and W. W. Cohen (2015) Joint information extraction and reasoning: A scalable statistical relational learning approach. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL 2015, July 26-31, 2015, Beijing, China, Volume 1: Long Papers, pp. 355–364. External Links: Link Cited by: §2.
  • J. Wu, R. Xie, Z. Liu, and M. Sun (2016) Knowledge representation via joint learning of sequential text and knowledge graphs. External Links: 1609.07075 Cited by: §2.
  • W. Xiong, T. Hoang, and W. Y. Wang (2017) DeepPath: A reinforcement learning method for knowledge graph reasoning. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017, M. Palmer, R. Hwa, and S. Riedel (Eds.), pp. 564–573. External Links: Link Cited by: §2, §2.
  • W. Xiong, M. Yu, S. Chang, X. Guo, and W. Y. Wang (2018) One-shot relational learning for knowledge graphs. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, E. Riloff, D. Chiang, J. Hockenmaier, and J. Tsujii (Eds.), pp. 1980–1990. External Links: Link Cited by: §1, §4.2.2, §5.1.
  • B. Yang, W. Yih, X. He, J. Gao, and L. Deng (2015) Embedding entities and relations for learning and inference in knowledge bases. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun (Eds.), External Links: Link Cited by: §2.
  • X. Yao and B. V. Durme (2014) Information extraction over structured data: question answering with freebase. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014, June 22-27, 2014, Baltimore, MD, USA, Volume 1: Long Papers, pp. 956–966. External Links: Link Cited by: §1.
  • W. Yih, M. Chang, X. He, and J. Gao (2015) Semantic parsing via staged query graph generation: question answering with knowledge base. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL 2015, July 26-31, 2015, Beijing, China, Volume 1: Long Papers, pp. 1321–1331. External Links: Link Cited by: §1.
  • M. Yu, W. Yin, K. S. Hasan, C. N. dos Santos, B. Xiang, and B. Zhou (2017) Improved neural relation detection for knowledge base question answering. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, R. Barzilay and M. Kan (Eds.), pp. 571–581. External Links: Link, Document Cited by: §1.