Graph-augmented Learning to Rank for Querying Large-scale Knowledge Graph

Knowledge graph question answering (KGQA) based on information retrieval aims to answer a question by retrieving the answer from a large-scale knowledge graph. Most existing methods first roughly retrieve the knowledge subgraphs (KSG) that may contain candidate answers, and then search for the exact answer in the subgraph. However, the coarsely retrieved KSG may contain thousands of candidate nodes, since the knowledge graph involved in querying is often of large scale. To tackle this problem, we first propose to partition the retrieved KSG into several smaller sub-KSGs via a new subgraph partition algorithm, and then present a graph-augmented learning to rank model to select the top-ranked sub-KSGs from them. Our proposed model combines a novel subgraph matching network, which captures global interactions in both the question and the subgraphs, with an Enhanced Bilateral Multi-Perspective Matching model, which captures local interactions. Finally, we apply an answer selection model on the full KSG and on the top-ranked sub-KSGs respectively to validate the effectiveness of our proposed graph-augmented learning to rank method. The experimental results on multiple benchmark datasets demonstrate the effectiveness of our approach.

1. Introduction

With the rise of large-scale knowledge graphs (KG) such as DBpedia (Auer et al., 2007) and Freebase (Bollacker et al., 2008), question answering over knowledge graphs (KGQA), which aims to leverage the factual information in a KG to answer natural language questions, has recently attracted considerable attention. Depending on the complexity of the question, KGQA can be divided into two forms: simple and complex. Simple KGQA often requires only one hop of factual knowledge, while complex KGQA requires reasoning over a multi-hop knowledge subgraph (KSG) and selecting the correct answer among several candidate answers. In this paper, we focus on the latter, more challenging setting, i.e., complex KGQA.

Currently, most KGQA approaches resort to semantic parsing (Berant et al., 2013; Yih et al., 2015; Dong and Lapata, 2018) or retrieve-then-extract methods (Yao and Van Durme, 2014; Bordes et al., 2014). Semantic parsing methods usually translate a natural language question into a KG query and then use it to query the KG directly. However, they often rely on complex and specialised hand-crafted rules or schemas. In contrast, retrieve-then-extract methods are easier to understand and more interpretable. They first retrieve from the KG coarsely to obtain a knowledge subgraph (KSG) containing answer candidates, and then extract the target answer from the retrieved KSG. This paper follows the retrieve-then-extract line of research.

Most previous works (Bao et al., 2016; Sun et al., 2018; Lin et al., 2019; Yasunaga et al., 2021) retrieve a knowledge subgraph from the original KG by choosing topic entities (e.g., KG entities mentioned in the given question) and their few-hop neighbors. However, since the KG is often large and the initial retrieval over it is coarse-grained and heuristic, the retrieved KSG may still contain thousands of nodes, most of which are irrelevant to the given question, especially when the number of topic entities or hops increases significantly. The larger the KSG, the more difficult it is to find the correct answer in it. To reduce the size of the KSG, Sun et al. (2018) computed the similarity between the question and the relations around the topic entities and then used the personalized PageRank algorithm to select the most relevant relations. However, this method only considers the semantic similarity between the question and the relations, ignoring the structural information around each entity node. Saxena et al. (2020) directly computed knowledge embeddings on the whole retrieved KSG, which is computationally intensive.

Unlike some previous work that concentrated on improving the models’ ability to select answers on a large graph, we mainly focus on how to substantially reduce the size of the retrieved knowledge subgraph while ensuring a high answer recall rate. This requires a more refined learning to rank model. To this end, we propose to partition the KSG into several sub-KSGs and use a learning to rank model to select the sub-KSGs most relevant to the given question. In this way, traditional text matching models (Yang et al., 2016; Wang et al., 2017; Talman et al., 2019; Devlin et al., 2018) can be used to compute the similarity score between a given question and a sub-KSG. However, these sequence-based models often ignore the important structural information within the question and the sub-KSG.

To address the aforementioned issues, we propose in this paper a new knowledge subgraph partition algorithm based on single-source shortest paths, which partitions a large-scale question-specific KSG into several sub-KSGs. Furthermore, we propose a novel graph-augmented learning to rank model (G-G-E) to select the top-ranked sub-KSGs. It combines a novel subgraph matching network based on Gated Graph Sequence Neural Networks (GGNNs) (Li et al., 2015), which captures global interactions between the question and the subgraphs, with an enhanced Bilateral Multi-Perspective Matching (BiMPM) model (Wang et al., 2017), which captures local interactions between parts of the question and the subgraphs. Finally, we apply one of the state-of-the-art (SOTA) KGQA answer selection models to the original complete KSG and to the combined top-ranked sub-KSGs separately, and demonstrate that reducing the size of the answer candidate subgraph helps select the correct answer effectively and efficiently. To evaluate our approach, we conduct extensive experiments on two benchmark datasets. The experimental results show that our proposed model significantly improves subgraph ranking performance compared to existing state-of-the-art methods.

The contributions of this paper are summarized as follows:

  • We propose a new knowledge subgraph partition algorithm based on single-source shortest paths.

  • We propose a novel graph-augmented learning to rank model, which combines a novel subgraph matching network based on GGNNs and an enhanced BiMPM model.

  • Our proposed graph-augmented learning to rank model outperforms a set of SOTA ranking models.

  • Further answer selection experiments on the original complete KSG and the combined top-ranked sub-KSGs demonstrate that reducing the size of the answer candidate subgraphs can help improve the performance of answer selection.

Figure 1. An Example of Knowledge Subgraph Partition Algorithm. The areas surrounded by two dashed lines belong to two different sub-KSGs.

2. Knowledge Subgraph Partition

For better use of the ranking model, we need to partition the knowledge subgraph into several sub-KSGs. Intuitively, nodes and relationships within a sub-KSG should be closely related to each other. As shown in Fig. 1, “m.051cc” is the topic entity of the given question, and nodes on the same path from the topic entity node “m.051cc” should be partitioned into the same sub-KSG. Entity nodes in this example graph are denoted by Freebase IDs (starting with “m.”). The first sub-KSG, surrounded by the red dashed line, is about the education information of the topic entity “m.051cc” and contains the true answer node “m.0gl5_”. The second sub-KSG, surrounded by the green dashed line, is about the namesake entity “m.076hxb3” of the topic entity “m.051cc”. The second sub-KSG is also a confusing subgraph because it contains tokens like “education” and “school”, which are consistent with the context of the question. Therefore, the learning to rank model is expected to distinguish not only irrelevant sub-KSGs but also confusing ones.

In order to partition related nodes into the same sub-KSG, we propose the knowledge subgraph partition algorithm detailed in Algorithm 1. Given a question $q$ and its answer entities $\mathcal{A}_q$, we first use the retrieval method proposed by Sun et al. (2018) to obtain a question-specific KSG $\mathcal{G}_q$, which may contain thousands of answer candidate entities and relationships. Then, our proposed algorithm partitions the retrieved KSG into several sub-KSGs, which serve as inputs to the graph-augmented learning to rank model that selects the most relevant sub-KSGs.

Algorithm 1 KSG Partition
Input: question $q$ with its retrieved KSG $\mathcal{G}_q$, topic entity $e_t$, and answer entities $\mathcal{A}_q$
  Build $\mathcal{G}_q$ as a directed graph;
  Set $e_t$ as the root node;
  Find the set $\mathcal{P}$ of shortest paths to all nodes with $e_t$ as the source node;
  Set an empty set $\mathcal{S}$ to save all partitioned sub-KSGs;
  Set an empty set $\mathcal{L}$ to save the match labels of the partitioned sub-KSGs;
  for each path $p \in \mathcal{P}$ (with $v$ as its target node) do
      if $v$ has child nodes and the child nodes of $v$ are all leaf nodes then
          Partition the path from $e_t$ to $v$ as a sub-KSG $g$;
          Add the child nodes of $v$ to $g$;
          Set the match label $l$ of $g$ as 0;
          for $a$ in $\mathcal{A}_q$ do
              if there exists a path from $e_t$ to $a$ in $g$ then
                  Set $l$ as 1;
                  break;
          Add $g$ to $\mathcal{S}$;
          Add $l$ to $\mathcal{L}$;
  return $\mathcal{S}$, $\mathcal{L}$
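
Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the KSG is given as (subject, relation, object) string triples, edges are unweighted so breadth-first search yields the single-source shortest paths, and each sub-KSG is returned as the triples on the path plus the edges to the target's child nodes. All function names and the toy graph (loosely shaped like Fig. 1, with invented relation names) are hypothetical.

```python
from collections import deque


def bfs_shortest_paths(adj, source):
    """Single-source shortest paths on an unweighted digraph via BFS.

    Returns {target: [(subj, rel, obj), ...]}: the triples along one
    shortest path from `source` to every reachable node.
    """
    parent = {source: None}  # node -> (previous node, relation) in the BFS tree
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for rel, child in adj.get(node, []):
            if child not in parent:
                parent[child] = (node, rel)
                queue.append(child)
    paths = {}
    for target in parent:
        path, cur = [], target
        while parent[cur] is not None:
            prev, rel = parent[cur]
            path.append((prev, rel, cur))
            cur = prev
        paths[target] = path[::-1]
    return paths


def partition_ksg(triples, topic_entity, answers):
    """Hypothetical sketch of Algorithm 1: sub-KSGs plus 0/1 match labels."""
    adj = {}
    for s, r, o in triples:
        adj.setdefault(s, []).append((r, o))
    sub_ksgs, labels = [], []
    for target, path in bfs_shortest_paths(adj, topic_entity).items():
        children = adj.get(target, [])
        # Cut only at a target whose child nodes are all leaf nodes.
        if children and all(not adj.get(c) for _, c in children):
            sub = path + [(target, r, c) for r, c in children]
            nodes = {topic_entity} | {o for _, _, o in sub}
            # Every node in `sub` is reachable from the topic entity inside
            # `sub`, so "a path to an answer exists" reduces to membership.
            label = int(any(a in nodes for a in answers))
            sub_ksgs.append(sub)
            labels.append(label)
    return sub_ksgs, labels


# Toy KSG loosely shaped like Fig. 1 (relation and helper node names invented).
toy_ksg = [
    ("m.051cc", "education", "edu_node"),
    ("edu_node", "institution", "m.0gl5_"),
    ("edu_node", "degree", "degree_node"),
    ("m.051cc", "namesake", "m.076hxb3"),
    ("m.076hxb3", "school", "school_node"),
]
subs, labels = partition_ksg(toy_ksg, "m.051cc", {"m.0gl5_"})
print(labels)  # [1, 0]
```

On this toy graph the education branch is labelled 1 because it contains the answer node, while the namesake branch is labelled 0, mirroring the confusing second sub-KSG in Fig. 1.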

The above partition algorithm follows the intuition that the answer to a given question is usually found on a multi-hop path from the topic entity node. To keep each sub-KSG moderate in size, a path is turned into a sub-KSG only at a node whose child nodes are all leaf nodes.

3. Graph-augmented Learning to Rank

Given a question $q$ and a set of sub-KSGs $\{g_1, \dots, g_N\}$, we compute a ranking score $s(q, g_i)$ representing the relevance of $q$ and $g_i$ for subgraph ranking. The overall model architecture is shown in Figure 2. Our model consists of a graph construction module for the input question and the input triples, a BiGGNN encoder capturing global interactions, and an Enhanced BiMPM encoder capturing local interactions.

3.1. Graph Construction

Question Graph.

The question graph is a directed graph constructed by the dependency parser from Stanford CoreNLP (Manning et al., 2014; https://stanfordnlp.github.io/CoreNLP). The dependency parsing graph represents the grammatical structure of the input question. Nodes in the dependency parsing graph are the tokens in the question, and an edge indicates a head-modifier (dependency) relationship between two token nodes. In particular, we only use the connection information of the edges, not the edge labels.

Sub-Knowledge Subgraph.

A sub-KSG $g$ consists of a set of triples $\{(e_s, r, e_o)\}$, where $e_s, e_o \in \mathcal{E}$, $r \in \mathcal{R}$, and $\mathcal{E}$ and $\mathcal{R}$ denote the entity and relation sets. Each relation $r$ is regarded as an additional node. We assume there is a directed edge from the subject node $e_s$ to $r$, and another directed edge from $r$ to the object node $e_o$. In the following sections, we introduce how to calculate a relevance score $s(q, g)$ between a question $q$ and a subgraph $g$.
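
The two graph constructions can be illustrated with the minimal sketch below. It does not call a parser: it assumes the dependency parse is already available as hypothetical (head index, dependent index, label) arcs, and drops the labels as described. For sub-KSGs it assumes that every relation mention becomes its own node (relation nodes are not shared across triples), which the text does not specify. Function names are hypothetical.

```python
def build_question_graph(tokens, arcs):
    """Question graph: token nodes plus unlabeled directed dependency edges.

    `arcs` holds (head_index, dependent_index, label) tuples from a parser;
    only the head -> dependent connection is kept.
    """
    edges = [(head, dep) for head, dep, _label in arcs]
    return list(tokens), edges


def build_sub_ksg_graph(triples):
    """Sub-KSG graph with edges subject -> relation node -> object node.

    Entities with the same name share one node; each relation mention gets a
    fresh node (an assumption made for this sketch).
    """
    nodes, entity_index, edges = [], {}, []

    def entity_node(name):
        if name not in entity_index:
            entity_index[name] = len(nodes)
            nodes.append(name)
        return entity_index[name]

    for subj, rel, obj in triples:
        s = entity_node(subj)
        r = len(nodes)
        nodes.append(rel)  # relation node for this triple only
        o = entity_node(obj)
        edges += [(s, r), (r, o)]
    return nodes, edges
```

Both functions return a node list and an edge list over node indices, which is the form a graph encoder such as the BiGGNN below can consume.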

Figure 2. The Proposed G-G-E Model Architecture. The model contains two components: (1) A Subgraph Matching Networks component on the left (i.e., G-G in the figure); (2) An Enhanced BiMPM component on the right (i.e., EBiMPM in the figure).

3.2. Subgraph Matching Networks

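To better exploit the global contextual information and the structural information, we extend GGNNs from uni-directional to bi-directional embedding. Given a question graph or a sub-KSG $\mathcal{G}$ with node set $\mathcal{V}$, each node $v \in \mathcal{V}$ is initialized with its word embedding $\mathbf{h}_v^{(0)}$ (e.g., the average of the word embeddings for multi-token nodes). To calculate the representation $\mathbf{h}_v^{(k)}$ of each node $v$ at layer $k$, the BiGGNN encoder first aggregates the information of neighbouring nodes to compute aggregation vectors:

$$\mathbf{a}_{v,\mathrm{out}}^{(k)} = \sum_{u \in \mathcal{N}_{\mathrm{out}}(v)} \mathbf{W}_{\mathrm{out}} \mathbf{h}_u^{(k-1)} \tag{1}$$
$$\mathbf{a}_{v,\mathrm{in}}^{(k)} = \sum_{u \in \mathcal{N}_{\mathrm{in}}(v)} \mathbf{W}_{\mathrm{in}} \mathbf{h}_u^{(k-1)} \tag{2}$$

where $\mathcal{N}_{\mathrm{out}}(v)$ and $\mathcal{N}_{\mathrm{in}}(v)$ denote the neighbours of $v$ along outgoing and incoming edges, and $\mathbf{W}_{\mathrm{out}}$ and $\mathbf{W}_{\mathrm{in}}$ are trainable weight matrices. Then, a Gated Recurrent Unit (GRU) (Cho et al., 2014) is used to update the node representation at layer $k$ based on the aggregation vectors and the node representation at the previous layer:

$$\mathbf{a}_v^{(k)} = \mathbf{a}_{v,\mathrm{out}}^{(k)} \,\Vert\, \mathbf{a}_{v,\mathrm{in}}^{(k)} \tag{3}$$
$$\mathbf{h}_v^{(k)} = \mathrm{GRU}\big(\mathbf{h}_v^{(k-1)}, \mathbf{a}_v^{(k)}\big) \tag{4}$$

After obtaining all node representations of the graph $\mathcal{G}$ (the question graph or a sub-KSG), max pooling is applied to compute the graph embedding $\mathbf{h}_{\mathcal{G}}$:

$$\mathbf{h}_{\mathcal{G}} = \mathrm{MaxPool}\big(\{\mathbf{h}_v^{(1)} \,\Vert\, \cdots \,\Vert\, \mathbf{h}_v^{(K)} : v \in \mathcal{V}\}\big) \tag{5}$$

where $\mathcal{V}$ is the node set, $\Vert$ is the concatenation operator, and $K$ is the maximum number of layers. By stacking $K$ layers, the BiGGNN encoder is able to consider non-immediate neighbours. The concatenated representation of node $v$ is $\mathbf{h}_v = \mathbf{h}_v^{(1)} \Vert \cdots \Vert \mathbf{h}_v^{(K)} \in \mathbb{R}^{Kd}$, where $d$ is the hidden size, and the set of node representations lies in $\mathbb{R}^{|\mathcal{V}| \times Kd}$. The max pooling operation is applied over the first (node) dimension, so the graph embedding is $\mathbf{h}_{\mathcal{G}} \in \mathbb{R}^{Kd}$.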
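
For illustration, Eqs. (1)-(5) can be traced with a small NumPy forward pass. This is a sketch, not the Graph4NLP implementation used in the paper: the weights are random and untrained, the two aggregation vectors are assumed to be concatenated before a standard GRU cell, and the class name is hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class BiGGNNSketch:
    """Hypothetical NumPy forward pass of a bidirectional GGNN encoder.

    Assumptions: sum aggregation, concatenation of the outgoing and incoming
    aggregation vectors, a standard GRU cell, random untrained weights.
    """

    def __init__(self, dim, num_layers=2, seed=0):
        rng = np.random.default_rng(seed)
        init = lambda *shape: rng.normal(0.0, 0.1, size=shape)
        self.K = num_layers
        self.W_out, self.W_in = init(dim, dim), init(dim, dim)
        # GRU parameters: the input is the 2*dim aggregation vector, the state is dim.
        self.Wz, self.Wr, self.Wh = (init(dim, 2 * dim) for _ in range(3))
        self.Uz, self.Ur, self.Uh = (init(dim, dim) for _ in range(3))

    def gru(self, h, a):
        z = sigmoid(a @ self.Wz.T + h @ self.Uz.T)  # update gate
        r = sigmoid(a @ self.Wr.T + h @ self.Ur.T)  # reset gate
        h_tilde = np.tanh(a @ self.Wh.T + (r * h) @ self.Uh.T)
        return (1.0 - z) * h + z * h_tilde

    def __call__(self, h0, edges):
        """h0: (|V|, d) initial node embeddings; edges: list of (src, dst)."""
        n = h0.shape[0]
        A = np.zeros((n, n))  # A[u, v] = 1 if there is an edge u -> v
        for u, v in edges:
            A[u, v] = 1.0
        h, layers = h0, []
        for _ in range(self.K):
            a_out = A @ h @ self.W_out.T  # Eq. (1): sum over outgoing neighbours
            a_in = A.T @ h @ self.W_in.T  # Eq. (2): sum over incoming neighbours
            a = np.concatenate([a_out, a_in], axis=1)  # Eq. (3)
            h = self.gru(h, a)                          # Eq. (4)
            layers.append(h)
        node_repr = np.concatenate(layers, axis=1)      # (|V|, K*d)
        return node_repr.max(axis=0)                     # Eq. (5): (K*d,)
```

With `dim=4` and two layers, the graph embedding has 8 entries, matching the $Kd$ dimension stated above.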

3.3. Enhanced BiMPM

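Bilateral Multi-Perspective Matching (BiMPM) is a strong text matching model owing to its capacity to capture local interactions. To better learn local interactions between the question and the sub-KSG, we propose to add an attention layer and an enhanced representation layer on top of the original BiMPM model. Specifically, our proposed EBiMPM first uses a shared BiLSTM-based context representation layer to encode the two input sequences into $\mathbf{H}^q = [\mathbf{h}^q_1, \dots, \mathbf{h}^q_m] \in \mathbb{R}^{m \times d}$ and $\mathbf{H}^g = [\mathbf{h}^g_1, \dots, \mathbf{h}^g_n] \in \mathbb{R}^{n \times d}$, where $m$ and $n$ are the lengths of the input texts. Second, the newly added attention layer applies a bi-directional attention mechanism between $\mathbf{H}^q$ and $\mathbf{H}^g$. The attentive embedding of the $i$-th question token over $\mathbf{H}^g$ is computed as:

$$\hat{\mathbf{h}}^q_i = \sum_{j=1}^{n} \frac{\exp(\mathbf{h}^q_i \cdot \mathbf{h}^g_j)}{\sum_{j'=1}^{n} \exp(\mathbf{h}^q_i \cdot \mathbf{h}^g_{j'})} \, \mathbf{h}^g_j \tag{6}$$

Similarly, we can compute the attentive embedding of the $i$-th sub-KSG token over $\mathbf{H}^q$:

$$\hat{\mathbf{h}}^g_i = \sum_{j=1}^{m} \frac{\exp(\mathbf{h}^g_i \cdot \mathbf{h}^q_j)}{\sum_{j'=1}^{m} \exp(\mathbf{h}^g_i \cdot \mathbf{h}^q_{j'})} \, \mathbf{h}^q_j \tag{7}$$

The attention layer outputs the attentive embeddings $\hat{\mathbf{H}}^q \in \mathbb{R}^{m \times d}$ and $\hat{\mathbf{H}}^g \in \mathbb{R}^{n \times d}$. Third, the enhanced representation layer fuses $\mathbf{H}^q$ and $\hat{\mathbf{H}}^q$ to get the improved question representation:

$$\tilde{\mathbf{H}}^q = F\big([\mathbf{H}^q; \hat{\mathbf{H}}^q; \mathbf{H}^q \odot \hat{\mathbf{H}}^q]\big) \tag{8}$$

where $F$ is a one-layer perceptron and $\odot$ is the point-wise multiplication operation. Similarly, we can compute the enhanced subgraph representation $\tilde{\mathbf{H}}^g$:

$$\tilde{\mathbf{H}}^g = F\big([\mathbf{H}^g; \hat{\mathbf{H}}^g; \mathbf{H}^g \odot \hat{\mathbf{H}}^g]\big) \tag{9}$$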
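
The attention and enhanced representation layers of Eqs. (6)-(9) can be sketched in NumPy as below. The sketch assumes dot-product attention scores and a tanh one-layer perceptron for $F$; both choices and the function names are illustrative assumptions rather than details confirmed by the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=axis, keepdims=True)


def bidirectional_attention(Hq, Hg):
    """Eqs. (6)-(7): each token attends over the other sequence.

    Hq: (m, d) question states; Hg: (n, d) sub-KSG states.
    """
    scores = Hq @ Hg.T                       # (m, n) dot-product scores
    Hq_att = softmax(scores, axis=1) @ Hg    # question tokens over the sub-KSG
    Hg_att = softmax(scores, axis=0).T @ Hq  # sub-KSG tokens over the question
    return Hq_att, Hg_att


def enhance(H, H_att, W, b):
    """Eqs. (8)-(9): fuse H with its attentive embedding (assumed tanh perceptron)."""
    fused = np.concatenate([H, H_att, H * H_att], axis=1)  # (len, 3d)
    return np.tanh(fused @ W + b)                          # (len, d) with W: (3d, d)
```

Each attentive embedding is a convex combination of the other sequence's states, so if every sub-KSG state were identical, each question token would attend to exactly that state.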

Then, the enhanced representations $\tilde{\mathbf{H}}^q$ and $\tilde{\mathbf{H}}^g$ are fed into the BiMPM matching layer (Wang et al., 2017) to get two sequences of matching vectors $\mathbf{M}^q$ and $\mathbf{M}^g$, where $l$ is the number of perspectives used in each multi-perspective comparison. The matching layer defines four matching strategies that compare each time-step of one sequence against all time-steps of the other sequence, in both the forward and backward directions: (1) Full-Matching: each forward or backward hidden state in one sentence is compared with the last hidden state of the other sentence; (2) Maxpooling-Matching: each forward or backward hidden state in one sentence is compared with each hidden state in the other sentence, and max pooling is applied to obtain the maximum value of each dimension; (3) Attentive-Matching: the attentive vector of each token in one sentence is computed over the other sentence, and each hidden state is compared with its corresponding attentive vector; and (4) Max-Attentive-Matching: the hidden state in the other sentence with the highest cosine similarity is selected as the attentive vector of each token in this sentence.

Finally, $[\tilde{\mathbf{H}}^q; \mathbf{M}^q]$ and $[\tilde{\mathbf{H}}^g; \mathbf{M}^g]$ are fed into a shared BiLSTM-based aggregation layer to get the final representations:

$$\mathbf{r}^q = \mathrm{MaxPool}\big(\mathrm{BiLSTM}([\tilde{\mathbf{H}}^q; \mathbf{M}^q])\big) \tag{10}$$
$$\mathbf{r}^g = \mathrm{MaxPool}\big(\mathrm{BiLSTM}([\tilde{\mathbf{H}}^g; \mathbf{M}^g])\big) \tag{11}$$

where $\mathrm{MaxPool}(\cdot)$ denotes max pooling and $\mathrm{BiLSTM}(\cdot)$ denotes the BiLSTM-based aggregation layer.
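
To make strategies (1) and (2) concrete, the sketch below implements the multi-perspective cosine function of BiMPM (Wang et al., 2017), $m_k = \cos(W_k \circ v_1, W_k \circ v_2)$ for perspective $k$, and applies it in the forward direction only. Attentive and max-attentive matching and the backward direction are omitted, and the function names are hypothetical.

```python
import numpy as np


def multi_perspective_cosine(v1, v2, W, eps=1e-8):
    """m_k = cos(W_k * v1, W_k * v2) for each perspective k.

    v1, v2: (d,) vectors; W: (l, d) perspective weights. Returns (l,).
    """
    a, b = W * v1, W * v2  # broadcast to (l, d)
    num = (a * b).sum(axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    return num / den


def full_matching(H1, H2, W):
    """Strategy (1): each state of H1 vs. the last state of H2 -> (len1, l)."""
    return np.stack([multi_perspective_cosine(h, H2[-1], W) for h in H1])


def maxpool_matching(H1, H2, W):
    """Strategy (2): each state of H1 vs. every state of H2, max over H2 -> (len1, l)."""
    return np.stack([
        np.max([multi_perspective_cosine(h1, h2, W) for h2 in H2], axis=0)
        for h1 in H1
    ])
```

Matching a sequence against itself is a quick sanity check: max-pooling matching then returns values close to 1 for every token and perspective, since each state matches itself.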

3.4. Ranking Score Function

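Let $\mathbf{h}_q^{\mathcal{G}}$ and $\mathbf{h}_g^{\mathcal{G}}$ denote the graph embeddings of the question graph and the sub-KSG produced by the subgraph matching network, and $\mathbf{r}^q$ and $\mathbf{r}^g$ the corresponding representations produced by EBiMPM. The two representations of each side are concatenated and fed into a cosine similarity ranking score function:

$$s(q, g) = \cos\big([\mathbf{h}_q^{\mathcal{G}}; \mathbf{r}^q], [\mathbf{h}_g^{\mathcal{G}}; \mathbf{r}^g]\big) \tag{12}$$

Finally, we take the Mean Square Error (MSE) as the loss function:

$$\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \big(s(q_i, g_i) - y_i\big)^2 \tag{13}$$

where $N$ is the number of samples and $y_i \in \{0, 1\}$ is the match label of the $i$-th (question, sub-KSG) pair.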
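
Eqs. (12) and (13) amount to a few lines of NumPy. This minimal sketch assumes the four representations are already computed as 1-D arrays; the function names are illustrative.

```python
import numpy as np


def ranking_score(hq_graph, rq_seq, hg_graph, rg_seq):
    """Eq. (12): cosine similarity of the concatenated question and sub-KSG vectors."""
    q = np.concatenate([hq_graph, rq_seq])
    g = np.concatenate([hg_graph, rg_seq])
    return float(q @ g / (np.linalg.norm(q) * np.linalg.norm(g)))


def mse_loss(scores, labels):
    """Eq. (13): mean squared error between ranking scores and 0/1 match labels."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    return float(np.mean((scores - labels) ** 2))
```

Because cosine similarity is bounded in [-1, 1] and the labels are 0 or 1, this objective pushes matched pairs toward a score of 1 and unmatched pairs toward 0.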

3.5. Answer Selection Model

After using the ranking model to obtain the top-K sub-KSGs, we merge them into a graph that is much smaller than the original KSG and feed it into an answer selection model. In this paper, we use one of the state-of-the-art KGQA models, GraftNet (Sun et al., 2018), as our answer selection model; GraftNet is a heterogeneous graph neural network model. To improve overall performance, GraftNet also incorporates external Wikipedia knowledge and computes a PageRank (Haveliwala, 2003) score for each entity node. However, we only use the basic GraftNet model as our answer selection model, to better isolate the effect of our proposed graph-augmented learning to rank model. GraftNet performs binary classification over nodes to select the answer:

$$P(v \in \mathcal{A}_q) = \sigma\big(\mathbf{w}^\top \mathbf{h}_v + b\big) \tag{14}$$

where $\mathcal{A}_q$ is the answer set of question $q$, $\mathbf{h}_v$ is the final representation of node $v$ learned by GraftNet, $\mathbf{w}$ and $b$ are trainable parameters, and $\sigma$ is the sigmoid function. The answer selection model is trained with binary cross-entropy loss, using the full KSG and the merged top-ranked sub-KSGs as input, respectively.
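
The merge step and Eq. (14) can be sketched as follows. The sketch assumes each sub-KSG is a list of (subject, relation, object) triples, as produced by Algorithm 1, and that merging means taking the union of triples so that shared nodes and edges appear once; the paper does not spell out the merge, GraftNet itself is not reproduced, and the function names are hypothetical.

```python
import numpy as np


def merge_top_k(sub_ksgs, scores, k):
    """Union of the triples of the k highest-scoring sub-KSGs (duplicates removed)."""
    order = sorted(range(len(sub_ksgs)), key=lambda i: scores[i], reverse=True)
    merged = set()
    for i in order[:k]:
        merged.update(sub_ksgs[i])
    return merged


def answer_probabilities(H, w, b):
    """Eq. (14): per-node probability sigma(w^T h_v + b); H has shape (|V|, d)."""
    return 1.0 / (1.0 + np.exp(-(H @ w + b)))
```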

4. Experiments

Dataset | Train | Dev | Test | Avg. # entities per KSG | Avg. # sub-KSGs per KSG | Coverage rate
WebQSP | 2848 | 250 | 1639 | 1429.8 | 1279.9 | 94.9%
CWQ | 18391 | 2299 | 2299 | 95.9 | 50 | 95.7%
Table 1. Statistics of the WebQSP and CWQ datasets.

4.1. Datasets

We conducted experiments on two multi-hop question answering datasets, WebQuestionsSP (WebQSP) (Yih et al., 2015) and ComplexWebQuestions (CWQ) (Talmor and Berant, 2018). Table 1 shows the statistics of the datasets. For WebQSP, we use the partition algorithm to construct the sub-KSGs from the data processed by He et al. (2021), which follows the retrieval method proposed by Sun et al. (2018). Because the WebQSP dataset is small, the train and dev matching datasets used in the training phase are constructed by selecting, for each example, a sub-KSG containing true answers and randomly sampling 20 sub-KSGs. In the test dataset, each example contains a natural language question and all of its partitioned sub-KSGs, and the model computes a ranking score for each (question, sub-KSG) pair. As shown in Table 1, the average number of entities in each KSG is 1429.9, and each KSG produces an average of 1279.9 sub-KSGs after partitioning. The coverage rate means that for about 94.9% of the examples, the answers can be found in their corresponding KSGs.
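
The WebQSP training-pair construction can be sketched as below. Two details are assumptions, since the text leaves them open: the single positive is the first sub-KSG labelled 1, and the 20 random samples are drawn from the sub-KSGs labelled 0. The function name is hypothetical.

```python
import random


def build_training_pairs(question, sub_ksgs, labels, num_neg=20, seed=0):
    """One positive plus up to `num_neg` randomly sampled negatives per question.

    Assumptions: the positive is the first label-1 sub-KSG, and negatives are
    sampled only from label-0 sub-KSGs.
    """
    rng = random.Random(seed)
    positives = [g for g, y in zip(sub_ksgs, labels) if y == 1]
    negatives = [g for g, y in zip(sub_ksgs, labels) if y == 0]
    if not positives:
        return []  # no sub-KSG contains an answer; skip this example
    pairs = [(question, positives[0], 1)]
    for g in rng.sample(negatives, min(num_neg, len(negatives))):
        pairs.append((question, g, 0))
    return pairs
```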

For CWQ, we use the preprocessed datasets released by Kumar et al. (2019). Each sample contains a question, the subgraph from which the question is derived, and a set of answer entities. The CWQ dataset contains 22989 matched (question, subgraph) pairs, divided into train, dev and test sets at a ratio of 8:1:1. For the train and dev sets, we produce the same number of negative examples as positive ones: for each question, we select as a negative sample a confusion-prone subgraph from the training subgraph set that is similar to the matched subgraph but contains no answer nodes, using TF-IDF to compute the textual similarity of two subgraphs. Each example in the test set used for ranking evaluation consists of the matched subgraph and 49 unmatched subgraphs that are similar to it. Therefore, the number of sub-KSGs (subgraphs) per example in CWQ is 50. We merge these 50 sub-KSGs (subgraphs) to form a pseudo KSG for each example. As shown in Table 1, the average number of entities in a pseudo KSG is 95.9 and the coverage rate of the test dataset is 95.7%.
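
The hard-negative selection for CWQ can be illustrated with a small, dependency-free TF-IDF sketch. The paper does not state its TF-IDF variant or tokenization; this sketch assumes whitespace tokenization, raw term frequency times log inverse document frequency, and cosine similarity, and it treats each subgraph as a text string plus a set of node IDs. Function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors: raw term frequency x log(N / document frequency)."""
    tokenized = [d.lower().split() for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    return [
        {t: c * math.log(n / df[t]) for t, c in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def hardest_negative(pos_idx, subgraph_texts, node_sets, answers):
    """Index of the most TF-IDF-similar subgraph containing no answer node."""
    vecs = tfidf_vectors(subgraph_texts)
    best, best_sim = None, -1.0
    for i, vec in enumerate(vecs):
        if i == pos_idx or node_sets[i] & answers:
            continue  # skip the positive itself and any subgraph with an answer
        sim = cosine(vecs[pos_idx], vec)
        if sim > best_sim:
            best, best_sim = i, sim
    return best
```

Choosing the most similar answer-free subgraph, rather than a random one, yields the confusion-prone negatives that the ranking model is expected to tell apart.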

4.2. Models and Metrics

In our experiments, we compare our proposed BiGGNN-BiGGNN-EBiMPM (G-G-E) model with the following baselines:

  • BiMPM (Wang et al., 2017): an LSTM-based model for text matching;

  • EBiMPM: BiMPM with an added attention layer and an enhanced representation layer;

  • BERT (Devlin et al., 2018): a shared BERT model that encodes the question sequence and the subgraph triples sequence;

  • BiGGNN-BiGGNN (G-G): the question graph and the sub-KSG are each encoded by a BiGGNN.

To evaluate the graph-augmented learning to rank model, we use Recall@K (R@K) and Mean Reciprocal Rank (MRR) as evaluation metrics. Recall@K is the proportion of examples for which at least one sub-KSG containing answers appears among the top-K ranked sub-KSGs. MRR is the average of the reciprocal ranks of the sub-KSGs containing answers. Furthermore, we use Hits, precision, recall and F1 to evaluate whether reducing the size of the KSG benefits the subsequent answer selection model. Hits is the proportion of examples for which GraftNet selects an answer node in the subgraph obtained by merging the top-K sub-KSGs.
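
Recall@K and MRR can be computed as below. The sketch assumes each example is represented by the 0/1 match labels of its sub-KSGs, already sorted by descending ranking score, and uses the common MRR convention of taking the rank of the first answer-bearing sub-KSG; the function names are illustrative.

```python
def recall_at_k(ranked_labels, k):
    """Share of examples with at least one answer-bearing sub-KSG in the top k.

    ranked_labels: per example, the 0/1 labels of its sub-KSGs sorted by score.
    """
    hits = [any(labels[:k]) for labels in ranked_labels]
    return sum(hits) / len(hits)


def mean_reciprocal_rank(ranked_labels):
    """MRR using the rank of the first answer-bearing sub-KSG in each example."""
    total = 0.0
    for labels in ranked_labels:
        rank = next((i + 1 for i, y in enumerate(labels) if y), None)
        total += 1.0 / rank if rank else 0.0
    return total / len(ranked_labels)
```

For example, with ranked labels `[[0, 1, 0], [1, 0, 0], [0, 0, 0]]`, Recall@1 is 1/3, Recall@2 is 2/3, and MRR is (1/2 + 1 + 0) / 3 = 0.5.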

4.3. Experimental Settings

Our proposed model is implemented with MatchZoo-py (Guo et al., 2019) and Graph4NLP (Wu et al., 2021). We use the Adam optimizer (Kingma and Ba, 2014) with an initial learning rate of 0.0005. The batch size is 64 for CWQ and 50 for WebQSP. Word embeddings are initialized with 300-dimensional pretrained GloVe (Pennington et al., 2014) embeddings (http://nlp.stanford.edu/data/glove.840B.300d.zip). The BiGGNN encoder has 2 layers. Early stopping is used during training, and the model is evaluated on the validation set after every epoch. All models use cosine similarity as the ranking score function. All experiments are run on an NVIDIA Tesla V100 GPU.

Model | WebQSP MRR | WebQSP R@1 | WebQSP R@10 | WebQSP R@100 | WebQSP R@200 | WebQSP R@300 | CWQ MRR | CWQ R@1 | CWQ R@10 | CWQ R@20
BiMPM | 0.6118 | 0.5308 | 0.7663 | 0.8816 | 0.9029 | 0.9121 | 0.6802 | 0.5698 | 0.9056 | 0.9652
EBiMPM | 0.6612 | 0.5954 | 0.7803 | 0.8798 | 0.8993 | 0.9090 | 0.7074 | 0.6089 | 0.9064 | 0.9643
BERT | 0.6817 | 0.6186 | 0.7895 | 0.8852 | 0.9048 | 0.9139 | 0.7361 | 0.6642 | 0.8838 | 0.9508
G-G | 0.6867 | 0.6320 | 0.7895 | 0.8828 | 0.9048 | 0.9176 | 0.7123 | 0.6372 | 0.8708 | 0.9395
G-G-E | 0.6967 | 0.6430 | 0.7968 | 0.8913 | 0.9133 | 0.9237 | 0.7540 | 0.6746 | 0.9234 | 0.9669
Table 2. Ranking experimental results. G-G-E achieves the best result in every column.
Dataset | Input subgraph | Hits | Precision | Recall | F1
WebQSP | top 100 | 0.6043 | 0.6041 | 0.5824 | 0.5128
WebQSP | top 200 | 0.5975 | 0.6561 | 0.5858 | 0.5355
WebQSP | top 300 | 0.6049 | 0.62 | 0.6387 | 0.5495
WebQSP | full | 0.579 | 0.5738 | 0.6245 | 0.5219
CWQ | top 10 | 0.4236 | 0.5296 | 0.4105 | 0.3274
CWQ | top 20 | 0.4000 | 0.5149 | 0.3772 | 0.2923
CWQ | full | 0.3956 | 0.5673 | 0.3389 | 0.2736
Table 3. Answer selection results on WebQSP and CWQ.

4.4. Results Analysis

Table 2 shows the ranking performance on the two datasets. Note that the upper limit of Recall@K is 100% rather than the coverage rate, because we eliminate examples for which no answer can be found. Our proposed full model G-G-E consistently outperforms the other baselines on both datasets, including the BERT model. To guarantee a high answer recall for the merged subgraph, we care more about Recall@K for large K than about Recall@1. On WebQSP, G-G-E is 0.6 to 0.9 percentage points higher than the best baseline for Recall@100, Recall@200 and Recall@300. On CWQ, the Recall@10 of G-G-E is 1.7 percentage points higher than that of the best baseline. Moreover, on WebQSP, G-G is substantially better than BiMPM, improving MRR by 0.07 and Recall@1 by 0.1, which indicates that graph structure information plays a larger role on WebQSP than on CWQ, where G-G improves MRR over BiMPM by only about 0.03.

To further validate that reducing the size of the KSG helps improve answer selection performance, we merge the top 100, 200 and 300 sub-KSGs on WebQSP and the top 10 and 20 sub-KSGs on CWQ. The results are shown in Table 3. For WebQSP, the answer selection model performs best on the top-300 merged subgraph, improving Hits by 0.026 and F1 by 0.028 over the full KSG. The top-300 merged subgraph is built from fewer than a quarter of the sub-KSGs in the original full KSG, which contains an average of 1280 sub-KSGs. The improvements also verify the effectiveness of our proposed partition algorithm. For CWQ, the answer selection model performs best on the top-10 merged subgraph, improving Hits by 2.8 and F1 by 5.4 percentage points over the full KSG. The top-10 merged subgraph is a fifth of the size of the full KSG. In general, by using our proposed partition algorithm and graph-augmented learning to rank model, we can substantially reduce the size of the KSG while maintaining a high answer recall.

Question: what artistic movement did m.0gct_ belong to ?
Mispredicted Subgraph: (m.0gct_, influence_influence_node_influenced_by, m.0160zv), (m.0160zv, visual_art_visual_artist_associated_periods_or_movements, m.0160zb)
Ground Truth Subgraph: (m.0gct_, visual_art_visual_artist_associated_periods_or_movements, m.049xrv)
Question: who did m.01ps2h8 play in lord of the rings ?
Mispredicted Subgraph: (m.01ps2h8, film_actor_film, m.0k5s9k), (m.0k5s9k, film_performance_film, m.017gl1)
Ground Truth Subgraph: (m.01ps2h8, film_actor_film, m.0k5sfk), (m.0k5sfk, film_performance_character, m.0gwlg)
Table 4. Examples of mispredicted KG subgraphs by the G-G-E model on the WebQSP dataset.

4.5. Ablation Study

We conduct an ablation study to investigate the contribution of each component of the proposed model. As shown in Table 2, we evaluate models with only the graph neural network encoder (G-G) and with only the sequence encoder (EBiMPM). The performance gain of G-G-E over both G-G and EBiMPM empirically demonstrates the effectiveness of combining the two encoders to capture both global and local interactions between the question and the knowledge subgraph.

4.6. Case Study and Error Analysis

To investigate the limitations of the graph-augmented learning to rank model, we manually checked sub-KSGs that were incorrectly ranked as containing answers; examples are shown in Table 4. The topic entity in the question and the entities in the subgraph are replaced by their Freebase IDs. The first mispredicted subgraph contains a redundant hop, “influence_influence_node_influenced_by”. This may be because our model ignores the number of hops implied by the question. In the second example, the model fails to map “play” in the question to the relation “film_performance_character”; it is confused because the mispredicted subgraph is very similar to the ground-truth one.

5. Related Work

5.1. Knowledge Graph Question Answering

With the rapid development of large-scale knowledge graphs (KG) such as DBpedia (Auer et al., 2007) and Freebase (Bollacker et al., 2008), question answering over knowledge graph has attracted widespread attention from a growing number of researchers. However, due to the large volume of the knowledge graph, using the knowledge in it to answer questions is a challenging task. In general, Knowledge Graph Question Answering has two mainstream research methods, namely semantic parsing based methods and retrieve-then-extract methods.

Semantic parsing based methods

Semantic parsing based methods convert natural language questions into knowledge-base-readable queries. Lan et al. (2021) summarise these methods in the following steps: (1) a Question Understanding module analyzes questions semantically and syntactically; common question analysis techniques include dependency parsing (Abujabal et al., 2017), AMR parsing (Kapanipathi et al., 2020) and skeleton parsing (Sun et al., 2020). (2) A Logical Parsing module converts the question embedding into an uninstantiated logic form, creating a syntactic representation of the question such as template-based queries (Bast and Haussmann, 2015) or query graphs (Hu et al., 2018). (3) A KB Grounding module aligns the logic form to the KB (Bhutani et al., 2019; Chen et al., 2019a). The logical query obtained from these steps can be executed directly against the KB to find the final answer. These approaches usually require specific rules to be designed in the second step, which limits the domains in which they can be applied.

Retrieve-then-extract methods

Retrieve-then-extract methods are also known as information retrieval based methods. These methods first retrieve from the KG coarsely to obtain a knowledge subgraph containing answer candidates and then extract answers from the retrieved subgraph. Bordes et al. (2014) first proposed a subgraph retrieval method and a subgraph embedding model that scores every candidate answer. In follow-up work, Miller et al. (2016) adopted a memory table to store KB facts encoded as key-value pairs. Sun et al. (2018) proposed a graph neural network model to perform multi-hop reasoning on heterogeneous graphs. PullNet (Sun et al., 2019) improved the graph retrieval module by iteratively expanding the question-specific subgraph. BAMnet (Chen et al., 2019b) modeled the bidirectional flow of interactions between the questions and the KB using an attentive memory network. EmbedKGQA (Saxena et al., 2020) directly matched pretrained KG entity embeddings with question embeddings, which is computationally intensive.

5.2. Learning to Rank

Modern search engines often leverage a multi-level cascade of ranking models to provide users with a ranked list of results efficiently. The cascade begins with an efficient high-recall ranker, which aims to retrieve as many correct answers as possible. The goal of this ranker is consistent with that of the coarse-grained retrieval module in information retrieval based KGQA. By contrast, the learning to rank model is often regarded as the last ranker in the cascade, which re-ranks a small set of documents retrieved by the previous rankers.

Over the past few years, learning to rank models have achieved great success in information retrieval. Traditional learning to rank models rely on hand-crafted features, which are often time-consuming to design. Recently, many ranking models based on neural networks have emerged. The Deep Structured Semantic Model (DSSM) (Huang et al., 2013), proposed in 2013, was the first neural ranking model, built on fully connected neural networks. In the same year, DeepMatch (Lu and Li, 2013) was proposed to model complex matching relations, such as those between a tweet and its responses or between a question and its relevant answers. Hu et al. (2014) proposed the convolutional neural network models ARC-I and ARC-II for matching two sentences. Since 2016, research on neural ranking models has surged, with more related work, deeper discussions and wider applications.

Wang and Jiang (2016) proposed a model based on match-LSTM and Pointer Net (Vinyals et al., 2015). ANMM (Yang et al., 2016) is an attention-based neural matching model that combines different matching signals for ranking short answer texts. Wang et al. (2017) proposed the BiMPM model under the “matching-aggregation” framework to match sentences from multiple perspectives. With the development of pretrained language models such as BERT (Devlin et al., 2018), the performance of neural ranking models has been taken to a new level. However, these neural ranking models have limitations when applied to information retrieval based KGQA, because their inputs are treated as raw text sequences and the structural information in the KG is ignored.

6. Conclusions

For information retrieval based KGQA, this paper focuses on a subgraph ranking task that aims to reduce the size of the coarsely retrieved knowledge subgraph while capturing both local and global interactions between the question and sub-KSGs. We propose a KSG partition algorithm and a graph-augmented learning to rank model that matches and then ranks the resulting sub-KSGs. We further validate that reducing the size of the knowledge subgraph benefits the subsequent answer selection in an information retrieval based KGQA pipeline. The experimental results on two benchmark datasets demonstrate the effectiveness of our model. In the future, we will further explore more effective answer selection models over the small-scale knowledge subgraphs selected by our learning to rank model.

Acknowledgements.
The work is partially supported by the National Natural Science Foundation of China (No. 61976160), the Technology Research Plan Project of the Ministry of Public Security (Grant No. 2020JSYJD01), the Thirteenth Five-Year Research Planning Project of the National Language Committee (No. YB135-149), and the Self-determined Research Funds of CCNU from the Colleges’ Basic Research and Operation of MOE (No. CCNU20ZT012, CCNU20TD001).

References