Few-Shot Knowledge Graph Completion

Knowledge graphs (KGs) serve as useful resources for various natural language processing applications. Previous KG completion approaches require a large number of training instances (i.e., head-tail entity pairs) for every relation. The real case is that for most of the relations, very few entity pairs are available. Existing work of one-shot learning limits method generalizability for few-shot scenarios and does not fully use the supervisory information; however, few-shot KG completion has not been well studied yet. In this work, we propose a novel few-shot relation learning model (FSRL) that aims at discovering facts of new relations with few-shot references. FSRL can effectively capture knowledge from heterogeneous graph structure, aggregate representations of few-shot references, and match similar entity pairs of reference set for every relation. Extensive experiments on two public datasets demonstrate that FSRL outperforms the state-of-the-art.


page 1

page 2

page 3

page 4


Adaptive Attentional Network for Few-Shot Knowledge Graph Completion

Few-shot Knowledge Graph (KG) completion is a focus of current research,...

One-Shot Relational Learning for Knowledge Graphs

Knowledge graphs (KGs) are the key components of various natural languag...

Learning Relation-Specific Representations for Few-shot Knowledge Graph Completion

Recent years have witnessed increasing interest in few-shot knowledge gr...

Exploring Entity Interactions for Few-Shot Relation Learning (Student Abstract)

Few-shot relation learning refers to infer facts for relations with a li...

Improving Hyper-Relational Knowledge Graph Completion

Different from traditional knowledge graphs (KGs) where facts are repres...

Adaptive Prototypical Networks with Label Words and Joint Representation Learning for Few-Shot Relation Classification

Relation classification (RC) task is one of fundamental tasks of informa...

Dynamic Relation Repairing for Knowledge Enhancement

Dynamic relation repair aims to efficiently validate and repair the inst...


Large-scale knowledge graphs (KGs) such as YAGO [Suchanek, Kasneci, and Weikum2007], NELL [Carlson et al.2010], and Wikidata [Vrandečić and Krötzsch2014] usually represent facts in the form of relations (edges) between (head-tail) entity pairs (nodes). This kind of graph-structured knowledge is essential for many downstream applications such as search, question answering, and semantic web. However, KGs are known for their incompleteness. In order to automate the KG completion process, many work [Nickel, Tresp, and Kriegel2011, Bordes et al.2013, Socher et al.2013, Yang et al.2015, Trouillon et al.2016, Schlichtkrull et al.2018, Dettmers et al.2018] have been proposed to infer missing relations by learning existing ones. For example, RESCAL [Nickel, Tresp, and Kriegel2011]

employs tensor factorization to capture inherent structure of multi-relational data in KGs. TransE

[Bordes et al.2013] interprets relations as translation operation on the low-dimensional embeddings of entities. And recently, G-GCN [Schlichtkrull et al.2018]

models relational structure by graph neural network.

The above methods need a good number of entity pairs for every relation. However, the frequency distributions of relations in real datasets often have long tails. A large portion of relations have only few entity pairs in KGs. It is important and challenging to deal with the relations with limited (few-shot) number of entity pairs. The few-shot scenario incurs the infeasibility of previous models which assume available, sufficient training instances for all relations.

In light of the above issue, Xiong et al. xiong2018one proposed GMatching which introduces a local neighbor encoder to learn entity embeddings. It achieves considerable performance in one-shot relation inference yet still has some limitations. First, GMatching assumes all local neighbors contribute equally to the entity embedding, whereas heterogeneous neighbors could have different impacts. For example, the embedding of “Nadella” may have more influence on the embedding of “Microsoft” than “Apple” as the company has only one CEO and a number of competitors. Thus the neighbor encoder of GMatching learns insufficient graph structure representation and impairs the model performance. Second, GMatching is designed under one-shot learning setting. Although it can be modified to few-shot case by adding a pooling layer over reference set, the general operation ignores the interaction among few-shot reference instances and limits the representation capability of reference set. Therefore, it is crucial to design a model to effectively complete relations with limited reference entity pairs.

To address the above weak points, we propose a Few-Shot Relation L

earning model (FSRL) with the purpose of learning a matching function that can effectively infer the true entity pairs given the set of few-shot reference entity pairs for each relation. To be more specific, first, we propose a relation-aware heterogeneous neighbor encoder to learn entity embeddings based on the heterogeneous graph structure and attention mechanism. It captures both different relation types and impact differences of local neighbors. Next, we design a recurrent autoencoder aggregation network to model interactions of few-shot reference entity pairs and accumulate their expression capabilities for each relation. With the aggregated embedding of reference set, we finally employ a matching network to discover similar entity pairs of reference set. The meta-training based gradient descent approach is employed to optimize model parameters. The learned model can be further applied to infer true entity pairs for any new relation without any fine-tuning step.

To summarize, our main contributions are:

  • [leftmargin=*]

  • We introduce a new few-shot KG completion problem which is different from previous work and more suitable for practical scenarios.

  • We propose a few-shot relation learning model to solve the problem. The model performs joint optimization of several learnable neural network modules.

  • We conduct extensive experiments on two public datasets. Results demonstrate that our model outperforms state-of-the-art baselines.

Related Work

Here we survey two topics relevant to this work: few-shot learning and relation learning for KGs.

Few-Shot Learning

Recent few-shot learning models have two categories: (1) metric based approaches [Koch, Zemel, and Salakhutdinov2015, Vinyals et al.2016, Snell, Swersky, and Zemel2017, Mishra et al.2018]; (2) meta-optimizer based approaches [Ravi and Larochelle2016, Finn, Abbeel, and Levine2017, Li et al.2017, Finn, Xu, and Levine2018, Lee and Choi2018, Yao et al.2019b]. The former one learns an effective metric and corresponding matching function among a set of training instances. For example, matching networks [Vinyals et al.2016] make predictions by comparing the input example with a few-shot labeled support set. Prototypical networks [Snell, Swersky, and Zemel2017]classify each sample by computing the distance to prototype representation of each class. The later one aims to quickly optimize the model parameters given the gradients on few-shot data instances. One example is the model-agnostic meta-learning (MAML) [Finn, Abbeel, and Levine2017] which trains model via a small number of gradient updates and leads to fast learning on a new task. Another example is the LSTM-based meta-learner [Ravi and Larochelle2016] that learns the exact optimization algorithm used to train another neural network classifier in the few-shot regime. Unlike the previous few-shot learning study that focus on vision [Yang et al.2018]

, imitation learning

[Duan et al.2017], spatiotemporal analysis [Yao et al.2019a]

, sentiment analysis 

[Li et al.2019] domains, we leverage few-shot learning to complete KGs.

Relation Learning for KGs

Many work have been proposed to model relational structure in KGs and automate KG completion. For example, Nickel et al. nickel2011three designed RESCAL to model inherent structure of dyadic relational data by tensor factorization. Bordes et al.

bordes2013translating proposed TransE that interprets relationships as translation operating on the low-dimensional embeddings of the entities. Unlike representing entities with single vectors, Socher

et al. socher2013reasoning developed NTN that represents entities as an average of their constituting word vectors. Later, more sophisticated models have been proposed, such as DistMul [Yang et al.2015] and ComplEx [Trouillon et al.2016]. Recently, deep neural network based models like R-GCG [Schlichtkrull et al.2018] and ConvE [Dettmers et al.2018] have been presented for further improvement. Different from those models that assume sufficient training instances are available, Xiong et al. xiong2018one presented GMatching model for one-shot relation learning in KGs. In this work, we study a practical few-shot scenario which deals with long tail or newly added relations with few-shot reference instances.


In this section, we formally define the few-shot knowledge graph completion problem and detail the corresponding few-shot learning settings.

Problem Definition

A KG is represented as a collection of triples , where and denote the entity set and relation set, respectively. The KG completion task is to either predict the tail entity given the head entity and the query relation : , or predict unseen relation between head entity and tail entity: . In this work, we focus on the former case as we want to predict the unseen facts of a given relation. Unlike previous studies that assume enough entity pairs are available for each relation, this work considers a practical scenario that few-shot entity pairs (reference set) are given. The purpose is to rank the true entity higher than false candidate entities , given few-shot reference pairs of relation . Formally, the problem is defined as follows:

Problem 1

Few-Shot Knowledge Graph Completion Given the relation and its few-shot reference entity pairs

, the task is to design a machine learning model which ranks all tail candidate entities

for each new head entity , such that the top ranked are true tail entities of .

The candidate entities set is constructed based on the entity type constraint [Xiong et al.2018], and we only consider a closed set of entities which excludes the unseen entities when predicting facts of new relations in test period.

Few-Shot Learning Settings

The purpose of this work is to design a machine learning model which could be utilized to predict the new facts with few-shot reference instances. Following the standard few-shot learning settings [Ravi and Larochelle2016, Snell, Swersky, and Zemel2017], we can access to a set of training tasks. In the problem, each training task corresponds to a KG relation with its own training/testing entity pairs data: . We denote this kind of task set as meta-training set, . To imitate the few-shot relation prediction in evaluation period, each only contains few-shot entity pairs . Besides, contains all testing entity pairs of , including true tail entities of each query and the remaining candidate entities where is an entity in . The proposed model thus could be tested on this set by ranking all candidate entities given the test query and the few-shot reference pairs in . We denote the ranking loss of relation as , where is the set of model parameters. Thus, the objective of model training is defined as:


where represents the number of tuples in . In next section, we will detail how to formulate and optimize the above objective function.

After sufficient training, the learned model can be utilized to predict facts of each new relation . This step is called the meta-testing. The relations in meta-testing are unseen from meta-training, i.e., . The same as meta-training relations, each relation in meta-testing has its own few-shot training data and testing data . These relations form a meta-testing set which is denoted as . In addition, we leave out a subset of relations in as the meta-validation set . Furthermore, the model can access to a background KG , which is a subset of that excludes all the relations in , and .


In this section, we present the detail of FSRL. FSRL consists of three major parts: (1) encoding heterogeneous neighbors for each entity; (2) aggregating few-shot reference entity pairs for each relation; (3) matching query pairs with reference set for relation prediction. Figure 1 shows the framework of FSRL.

Figure 1: (a) The framework of FSRL: it first generates entity embedding via heterogeneous neighbor encoder, then aggregates few-shot reference entity pairs and generate reference set embedding, finally employs a matching network to compute similarity score between query pair and reference set; (b) the relation-aware heterogeneous neighbor encoder for entity; (c) the recurrent autoencoder aggregation network for reference set; (d) the recurrent matching network for query pair and reference set.

Encoding Heterogeneous Neighbors

Although many work [Nickel, Tresp, and Kriegel2011, Bordes et al.2013, Yang et al.2015] have been proposed to learn entity embeddings by using relational information, Xiong et al. [Xiong et al.2018] demonstrated that explicitly encoding graph local structure (i.e., one-hop neighbors) can benefit relation prediction. The proposed neighbor encoder takes the average of feature representations of all relational neighbors as the embedding of given entity. Despite the desirable performance, it neglects the different impacts of heterogeneous neighbors which may help improve entity embedding [Zhang et al.2019]. In light of this issue, we design a relation-aware heterogeneous neighbor encoder. Specifically, we denote the set of relational neighbors () of given head entity as , where is the background knowledge graph, and represent the -th relation and corresponding tail entity of , respectively. The heterogeneous neighbor encoder should be able to encode and output a feature representation of by considering different impacts of relational neighbors . To achieve this goal, we introduce an attention module and formulate the embedding of as follows:


where denotes activation unit (we use Tanh), represents concatenation operator, , are pre-trained embeddings of and . Besides, , and (: pre-trained embedding dimension) are learnable parameters. Figure 1(b) illustrates the detail of heterogeneous neighbor encoder. According to Eq. 2, the formulation of considers the different impacts of heterogeneous relational neighbors via attention weight and leverages both embeddings of entity and relation to compute .

Aggregating Few-Shot Reference Set

The current models (e.g., GMatching) are not able to model the interactions of few-shot instances in reference set, which limits model capability. Thus, we need to design a module to effectively formulate the aggregated embedding of reference set for each relation . By applying the neighbor encoder to each entity pair , we can obtain the representation of in the form . Learning the representation of a reference set with few-shot entity pairs is challenging as it requires modeling interactions among different entity pairs and accumulating their expression capability. Inspired by the common practices in learning sentence embeddings [Conneau et al.2017] in natural language processing and aggregating node embeddings [Hamilton, Ying, and Leskovec2017] in graph neural networks, we tackle the challenge and formulate the embedding of by aggregating representations of all entity pairs in :



is an aggregation function which can be pooling operation, feed-forward neural network, etc. Motivated by the recent success of recurrent neural network aggregator in order-invariant problems such as graph embedding 

[Hamilton, Ying, and Leskovec2017], we design a recurrent autoencoder aggregator which achieves good capability. Specifically, the entity pair embeddings are sequentially fed into a recurrent autoencoder by:


where is the size of reference set (i.e., few-shot size). The hidden states and of encoder and decoder are computed by:


where and represent recurrent encoder and decoder (e.g., LSTM [Hochreiter and Schmidhuber1997]), respectively. The reconstruction loss for optimizing autoencoder is defined as:


will be incorporated to the relational ranking loss for refining the representation of each entity pair, as we will describe later. In order to formulate the embedding of reference set, we aggregate all hidden states of encoder and extend them by adding residual connection

[He et al.2016] and attention weight. Formally, is computed by:


where , and (: aggregated embedding dimension) are learnable parameters. Figure 1(c) illustrates the detail of recurrent autoencoder aggregator. The formulation of aggregates all representations of and each component in this module will make effect for better performance, as we will show in the ablation study experiment.

Matching Query and Reference Set

With the heterogeneous neighbor encoder and the reference set aggregator , we now present how to effectively match each query entity pair ( is the set of all query pairs of relation ) with the reference set . By applying and to the query entity pair and the reference set , we can obtain two embedding vectors and , respectively. In order to measure the similarity between two vectors, we employ a recurrent processor [Vinyals et al.2016] to perform multiple steps matching. The -th process step is formulated as:


where is LSTM cell [Hochreiter and Schmidhuber1997] with input , hidden state and cell state . The last hidden state after “processing” step is the refined embedding of query pair : . We use the inner product between and as the similarity score for later ranking optimization procedure. Figure 1(d) shows the detail of matching processor. This module is effective for improving model performance, as we will demonstrate in the ablation study experiment.

Objective and Model Training

For the query relation , we randomly sample a set of few positive (true) entity pairs and regard them as the reference set . The remaining positive entity pairs are utilized as positive query pairs. Besides, we construct a group of negative (false) entity pairs by polluting the tail entities. Therefore the ranking loss is formulated as:


where is standard hinge loss and is safety margin distance, and are similarity scores between query pairs and reference set . By leveraging the reconstruction loss of reference set aggregator, we define the final objective function as:


where is trade-off factor between and . To minimize and optimize model parameters, we take each relation as a task and design a batch sampling based meta-training procedure. The detail of this process is summarized in Algorithm 1.

input : Meta-training task (relation) set
Pre-trained KG embeddings
Initial model parameters , and
1 while not done do
2        Shuffle tasks (relations) in
3        for  do
4               Sample few-shot entity pairs as reference set
5               Sample a batch of query entity pairs
6               Pollute the tail entity of to get
7               Accumulate the loss by Eq. 10
8               Update parameters by Adam optimizer
10        end for
12 end while
13return Optimal model parameters , and
Algorithm 1 FSRL Meta-Training


In this section, we conduct extensive experiments to evaluate the performance of proposed model and verify the effectiveness of each component in the model. Few-shot size impact analysis and embedding visualization are also provided.

Experimental Design


We use two public datasets for experiments. The first one is based on NELL [Mitchell et al.2018], a system that continuously collects structured knowledge from webs. The second one is based on Wikidata [Vrandečić and Krötzsch2014]. Table 1 lists the statistics of two datasets. The same as [Xiong et al.2018], we select the relations with less than 500 but more than 50 triples as few-shot tasks. There are 67 and 183 tasks in NELL and Wiki data, respectively. In addition, we use 51/5/11 task relations for training/validation/testing in NELL and the division is set to 133/16/34 in Wiki.

Dataset # Ent. # Triples # Rel. # Tasks
NELL 68,545 181,109 358 67
Wiki 4,838,244 5,859,240 822 183
Table 1: Statistics of datasets. # Ent. denotes the number of all unique entities and # triples denotes the number of all relational triples. # Rel. represents the number of all relations and # Tasks represents the number of relations selected as few-shot tasks.

Baseline Methods

We consider two categories of baseline methods for comparison:

  • [leftmargin=*]

  • Relational embedding methods. This type of model learns entity/relation embeddings by modeling relational structure in KG. We employ four widely used methods: RESCAL [Nickel, Tresp, and Kriegel2011], TransE [Bordes et al.2013], DistMul [Yang et al.2015], and ComplEx [Trouillon et al.2016]. All entity pairs of background relations and training relations, as well as few-shot training entity pairs of validate and test relations are used to train models.

  • Graph neighbor encoder methods. This type of model joints graph local neighbor encoder and matching network to learn entity embeddings and predict facts of new relations. We employ state-of-the-art model GMatching [Xiong et al.2018] for comparison. Note that there are few-shot embeddings of entity pairs in reference set, we use max/mean pooling (denoted as MaxP and MeanP) to obtain the general embedding of reference set. Moreover, we also consider taking the maximum of similarity scores between a query and all references as the final ranking score of this query. Thus in total, this type of model includes three baseline methods which are denoted as GMatching (MaxP), GMatching (MeanP), and GMatching (Max).

Data: NELL Data: Wiki
Model Hits@1 Hits@5 Hits@10 MRR Hits@1 Hits@5 Hits@10 MRR
RESCAL .069/.141 .160/.313 .204/.383 .119/.223 .259/.057 .297/.090 .309/.126 .279/.081
TransE .056/.119 .112/.256 .189/.320 .104/.193 .186/.069 .352/.134 .431/.176 .273/.111
DistMult .066/.164 .123/.306 .178/.375 .109/.231 .271/.069 .419/.156 .459/.195 .339/.112
ComplEx .049/.129 .092/.223 .112/.273 .079/.185 .226/.085 .315/.117 .397/.145 .282/.106
GMatching (MaxP) .244/.198 .418/.370 .524/.464 .331/.279 .313/.095 .402/.235 .468/.324 .346/.171
GMatching (MeanP) .257/.186 .455/.360 .542/.453 .341/.267 .290/.128 407/.274 .484/.350 .352/.203
GMatching (Max) .179/.152 .391/.335 .476/.445 .273/.241 .279/.135 .396/.284 .477/.374 .342/.214
FSRL (Ours) .345/.211 .502/.433 .570/.507 .421/.318 .338/.155 .430/.327 .486/.406 .390/.241
Table 2: The overall results of all methods. GMatching is the best baseline. Our model has the best performances in all cases.

Reproducibility Settings

The above relational embedding methods can be utilized to pre-train KG embeddings, which are further used as the input for GMatching and FSRL. We select ComplEx for pre-training as GMatching and FSRL with it achieve best performances in most cases. For the proposed model, we tune hyper-parameters on the validation dataset. The embedding dimension is set to 100 and 50 for NELL and Wiki dataset, respectively. The maximum number of local neighbors in heterogeneous neighbor encoder is set to 30 for both datasets. In addition, we use LSTM as the reference set aggregator and matching processor. The dimension of LSTM’s hidden state is set to 200 and 100 for NELL and Wiki dataset, respectively. The number of recurrent steps equals 2 in matching network. We use the Adam optimizer [Kingma and Ba2015]

to update model parameters. The initial learning rate equals 0.001 and the weight decay is 0.25 for each 10k training steps. The margin distance and trade-off factor in the objective function are set to 5.0 and 0.0001, respectively. In entity candidate set construction, we set the maximum size to 1000 for both datasets. We employ Pytorch

111https://pytorch.org/ to implement our model and further conduct it on a server with GPU machines.

Evaluation Metrics

Relations and their entity pairs in training data are utilized to train the model while those of validation and test data are respectively used to tune and evaluate model. We use the top-k hit ratio (Hits@k) and the mean reciprocal rank (MRR) to evaluate performances of different methods. The k is set to 1, 5, and 10. The few-shot size is set to 3 for the following experiments. In addition, we also conduct experiment to analyze the impact of .

Results Comparison

Overall Comparison with Baselines

The performances of all models are reported in Table 2, where the best results are highlighted in bold and the best baseline results are indicated by underline. The former/later score denotes result in validation/test dataset. According to this table:

  • [leftmargin=*]

  • The graph neighbor encoder methods (GMatching) outperform the relational embedding methods, showing that incorporating graph local structure and matching network is effective for learning entity embeddings and predicting facts of new relations.

  • FSRL achieves the best performances in all cases. The average relative improvement (%) over the best baseline method is up to 34% and 15% in NELL and Wiki data, respectively. It demonstrates the effectiveness of our model. The heterogeneous neighbor encoder and recurrent autoencoder aggregation network benefit few-shot relation prediction in KGs.

Comparison Over Different Relations

Besides the overall performance for all relations, we also conduct experiments to evaluate model performance for each relation in NELL test data. Table 3 reports the results of FSRL and GMatching. The better result for each case is highlighted in bold. According to this table:

  • [leftmargin=*]

  • The results of both models on different relations are of high variance. It is reasonable since different relations have different sizes of candidate set for evaluation. The relations (e.g., relation 1) with small candidate set (or are easily to be predicted) have relative large scores and two models in these cases are compared.

  • FSRL has better performances than GMatching in most cases. It demonstrates that our model is robust for different relations and outperforms GMatching for most relations.

RId Model Hits@1 Hits@5 Hits@10 MRR
1 GMatching 0.946 1.000 1.000 0.970
FSRL 0.972 0.986 1.000 0.981
2 GMatching 0.279 0.390 0.451 0.360
FSRL 0.972 0.972 0.986 0.975
3 GMatching 0.069 0.115 0.152 0.108
FSRL 0.466 0.418 0.357 0.398
4 GMatching 0.017 0.033 0.070 0.035
FSRL 0.044 0.215 0.343 0.132
5 GMatching 0.069 0.115 0.151 0.107
FSRL 0.044 0.215 0.343 0.132
6 GMatching 0.192 0.515 0.581 0.338
FSRL 0.342 0.549 0.617 0.442
7 FSRL 0.478 0.692 0.804 0.575
GMatching 0.438 0.716 0.842 0.562
8 GMatching 0.151 0.502 0.669 0.312
FSRL 0.201 0.543 0.681 0.347
9 GMatching 0.449 0.707 0.737 0.564
FSRL 0.163 0.750 0.838 0.408
10 GMatching 0.043 0.129 0.206 0.098
FSRL 0.069 0.192 0.291 0.139
11 GMatching 0.076 0.708 0.736 0.341
FSRL 0.104 0.631 0.781 0.416
Table 3: The results of GMatching and FSRL for each relation (RId) in NELL test data.

Ablation Study

FSRL is a joint learning framework of several neural network modules. To investigate the contributions of different components, we conduct the following ablation studies from three perspectives in Table 4, where the results in NELL data are reported and best results are highlighted in bold:

  • [leftmargin=*]

  • (AS_1) We investigate the effectiveness of relation-aware heterogeneous neighbor encoder. We replace it by a mean pooling layer over all neighbors’ embeddings. As shown in the table, our model has much better performances than the variant (in AS_1), indicating the large benefit of heterogeneous neighbor encoder.

  • (AS_2) We analyze the impacts of different modules of aggregation network in (AS_2a)-(AS_2c). In (AS_2a), we replace the recurrent autoencoder aggregation with mean pooling operation. In (AS_2b), we replace the attention weight of recurrent autoencoder with a mean pooling layer. In (AS_2c), we remove the decoder part, and only use encoder for aggregation. According to the results in table, our model outperforms all of three variants in most cases, demonstrating the effect of each component in aggregation network.

  • (AS_3) We further analyze the effectiveness of matching network. We remove the LSTM cell and use inner-product between query embedding and reference embedding as similarity (ranking) score. From the results in table, our model largely outperforms the variant (in AS_3), showing that recurrent matching network has good capability in computing relevance between query and reference.

Model Hits@1 Hits@5 Hits@10 MRR
(AS_1) .121/.179 .312/.379 .432/.464 .212/.272
(AS_2a) .281/.191 .463/.414 .538/.504 .368/.297
(AS_2b) .298/.219 .480/.420 .556/.498 .382/.315
(AS_2c) .333/.226 .478/.418 .552/.499 .401/.313
(AS_3) .282/.205 .463/.394 .548/.478 .370/.299
Ours .345/.211 .502/.433 .570/.507 .421/.318
Table 4: Results of model variants in NELL data. Our model has better performance than all model variants.


Impact of Few-Shot Size

This work studies few-shot relation learning in KGs, thus we conduct experiment to analyze the impact of few-shot size . Figure 2 reports the performances of our model and GMatching (MaxP) in NELL test data with different settings of . According to the figure:

  • [leftmargin=*]

  • With the increment of , performances of both models increase. It indicates that larger reference set can produce better reference set embedding for the relation.

  • Our model consistently outperforms GMatching in different , demonstrating the stability of the proposed model for few-shot relation completion in KGs.

Figure 2: Impact of few-shot size K. Our model consistently outperforms GMatching.

Embedding Visualization

To show a better performance comparison between our model and GMatching, we visualize the 2D embeddings of positive and negative candidate entity pairs for each relation. Figure 3 shows the visualization results of our model and GMatching for two test relations of NELL data, i.e., “produced_by” and “team_coach”, which vary from each other in semantic meaning and size of positive/negative candidate set. According to the figure, both methods can distinguish embeddings of positive and negative candidates well. However, it is clear that our model achieves better performance and embeddings of two classes are clearly discriminated from each other, which further demonstrates the superior performance of our model in terms of visualization.

Figure 3: Embedding visualization of positive and negative candidates of two selected relations. Our model can clearly discriminate embeddings of these two types of candidates.


In this paper, we presented a new few-shot KG completion problem and proposed an innovative few-shot relation learning model, i.e., FSRL, to solve the problem. FSRL performs joint optimization of relation-aware heterogeneous neighbor encoder, recurrent autoencoder aggregation network and matching network. The extensive experiments on two public datasets demonstrate that FSRL can outperform state-of-the-art baseline methods. In addition, the ablation studies verify the effectiveness of each model component. As a new research problem, there are many opportunities for the next steps. The future work might consider utilizing a better model training process such as model-agnostic meta-learning or incorporating contextual information such as entity attributes or text description to improve the quality of entity embeddings.


This work was supported by the Army Research Laboratory under Cooperative Agreement Number W911NF-09-2-0053 and the National Science Foundation awards #1925607, #1629914, #1652525, #1618448 and #1849816.


  • [Bordes et al.2013] Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; and Yakhnenko, O. 2013. Translating embeddings for modeling multi-relational data. In NIPS.
  • [Carlson et al.2010] Carlson, A.; Betteridge, J.; Kisiel, B.; Settles, B.; Hruschka, E. R.; and Mitchell, T. M. 2010. Toward an architecture for never-ending language learning. In AAAI.
  • [Conneau et al.2017] Conneau, A.; Kiela, D.; Schwenk, H.; Barrault, L.; and Bordes, A. 2017. Supervised learning of universal sentence representations from natural language inference data. In EMNLP.
  • [Dettmers et al.2018] Dettmers, T.; Minervini, P.; Stenetorp, P.; and Riedel, S. 2018. Convolutional 2d knowledge graph embeddings. In AAAI.
  • [Duan et al.2017] Duan, Y.; Andrychowicz, M.; Stadie, B.; Ho, O. J.; Schneider, J.; Sutskever, I.; Abbeel, P.; and Zaremba, W. 2017. One-shot imitation learning. In NIPS.
  • [Finn, Abbeel, and Levine2017] Finn, C.; Abbeel, P.; and Levine, S. 2017. Model-agnostic meta-learning for fast adaptation of deep networks. In ICML.
  • [Finn, Xu, and Levine2018] Finn, C.; Xu, K.; and Levine, S. 2018. Probabilistic model-agnostic meta-learning. In NIPS.
  • [Hamilton, Ying, and Leskovec2017] Hamilton, W.; Ying, Z.; and Leskovec, J. 2017. Inductive representation learning on large graphs. In NIPS.
  • [He et al.2016] He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In CVPR.
  • [Hochreiter and Schmidhuber1997] Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural computation 9(8):1735–1780.
  • [Kingma and Ba2015] Kingma, D. P., and Ba, J. 2015. Adam: A method for stochastic optimization. In ICLR.
  • [Koch, Zemel, and Salakhutdinov2015] Koch, G.; Zemel, R.; and Salakhutdinov, R. 2015. Siamese neural networks for one-shot image recognition. In

    ICML deep learning workshop

  • [Lee and Choi2018] Lee, Y., and Choi, S. 2018. Gradient-based meta-learning with learned layerwise metric and subspace. In ICML.
  • [Li et al.2017] Li, Z.; Zhou, F.; Chen, F.; and Li, H. 2017. Meta-sgd: Learning to learn quickly for few shot learning. arXiv preprint arXiv:1707.09835.
  • [Li et al.2019] Li, Z.; Li, X.; Wei, Y.; Bing, L.; Zhang, Y.; and Yang, Q. 2019. Transferable end-to-end aspect-based sentiment analysis with selective adversarial learning. In EMNLP-IJCNLP.
  • [Mishra et al.2018] Mishra, N.; Rohaninejad, M.; Chen, X.; and Abbeel, P. 2018. A simple neural attentive meta-learner. ICLR.
  • [Mitchell et al.2018] Mitchell, T.; Cohen, W.; Hruschka, E.; Talukdar, P.; Yang, B.; Betteridge, J.; Carlson, A.; Dalvi, B.; Gardner, M.; Kisiel, B.; et al. 2018. Never-ending learning. Communications of the ACM 61(5):103–115.
  • [Nickel, Tresp, and Kriegel2011] Nickel, M.; Tresp, V.; and Kriegel, H.-P. 2011. A three-way model for collective learning on multi-relational data. In ICML.
  • [Ravi and Larochelle2016] Ravi, S., and Larochelle, H. 2016. Optimization as a model for few-shot learning. ICLR.
  • [Schlichtkrull et al.2018] Schlichtkrull, M.; Kipf, T. N.; Bloem, P.; Van Den Berg, R.; Titov, I.; and Welling, M. 2018. Modeling relational data with graph convolutional networks. In ESWC.
  • [Snell, Swersky, and Zemel2017] Snell, J.; Swersky, K.; and Zemel, R. 2017. Prototypical networks for few-shot learning. In NIPS.
  • [Socher et al.2013] Socher, R.; Chen, D.; Manning, C. D.; and Ng, A. 2013. Reasoning with neural tensor networks for knowledge base completion. In NIPS.
  • [Suchanek, Kasneci, and Weikum2007] Suchanek, F. M.; Kasneci, G.; and Weikum, G. 2007. Yago: a core of semantic knowledge. In WWW.
  • [Trouillon et al.2016] Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; and Bouchard, G. 2016. Complex embeddings for simple link prediction. In ICML.
  • [Vinyals et al.2016] Vinyals, O.; Blundell, C.; Lillicrap, T.; Wierstra, D.; et al. 2016. Matching networks for one shot learning. In NIPS.
  • [Vrandečić and Krötzsch2014] Vrandečić, D., and Krötzsch, M. 2014. Wikidata: a free collaborative knowledge base. Communications of the ACM 57(10):78–85.
  • [Xiong et al.2018] Xiong, W.; Yu, M.; Chang, S.; Guo, X.; and Wang, W. Y. 2018. One-shot relational learning for knowledge graphs. In EMNLP.
  • [Yang et al.2015] Yang, B.; Yih, W.; He, X.; Gao, J.; and Deng, L. 2015. Embedding entities and relations for learning and inference in knowledge bases. In ICLR.
  • [Yang et al.2018] Yang, F. S. Y.; Zhang, L.; Xiang, T.; Torr, P. H.; and Hospedales, T. M. 2018. Learning to compare: Relation network for few-shot learning. In CVPR.
  • [Yao et al.2019a] Yao, H.; Liu, Y.; Wei, Y.; Tang, X.; and Li, Z. 2019a. Learning from multiple cities: A meta-learning approach for spatial-temporal prediction. In WWW.
  • [Yao et al.2019b] Yao, H.; Wei, Y.; Huang, J.; and Li, Z. 2019b. Hierarchically structured meta-learning. In ICML.
  • [Zhang et al.2019] Zhang, C.; Song, D.; Huang, C.; Swami, A.; and Chawla, N. V. 2019. Heterogeneous graph neural network. In KDD.