With the rise of large-scale knowledge graphs (KGs) such as DBpedia (Auer et al., 2007) and Freebase (Bollacker et al., 2008), question answering over knowledge graphs (KGQA) has recently attracted massive attention. KGQA aims to leverage the factual information in a KG to answer natural language questions. Depending on the complexity of the question, KGQA can be divided into two forms: simple and complex. Simple KGQA often requires only one hop of factual knowledge, while complex KGQA requires reasoning over a multi-hop knowledge subgraph (KSG) and selecting the correct answer among several candidates. In this paper, we focus on the latter, i.e., complex KGQA, which is more challenging.
Currently, most KGQA approaches resort to semantic parsing (Berant et al., 2013; Yih et al., 2015; Dong and Lapata, 2018) or retrieve-then-extract methods (Yao and Van Durme, 2014; Bordes et al., 2014). Semantic parsing methods usually translate a natural language question into a KG query and then use it to query the KG directly. However, they often rely on complex, specialised hand-crafted rules or schemas. In contrast, retrieve-then-extract methods are easier to understand and more interpretable: they first retrieve the KG coarsely to obtain a knowledge subgraph (KSG) containing answer candidates, and then extract the target answer from the retrieved KSG. This paper follows the retrieve-then-extract line of research.
Typically, retrieve-then-extract methods obtain a knowledge subgraph from the original KG by choosing topic entities (e.g., KG entities mentioned in the given question) and their few-hop neighbors. However, since the KG is often of large volume and this initial retrieval is coarse-grained and heuristic, the retrieved KSG may still contain thousands of nodes, most of which are irrelevant to the given question, especially when the number of topic entities or hops increases. The larger the KSG, the harder it is to find the correct answer in it. To reduce the size of the KSG, Sun et al. (2018) computed the similarity between the question and the relations around the topic entities and then used the personalized PageRank algorithm to select the most relevant relations. However, this method only considers the semantic similarity between the question and the relations while ignoring the structural information around each entity node. Saxena et al. (2020) directly compute knowledge embeddings on the whole retrieved KSG, which is computationally intensive.
Unlike previous work that concentrates on improving a model's ability to select answers on a large graph, we mainly focus on substantially reducing the size of the retrieved knowledge subgraph while ensuring a high answer recall rate. This calls for a more refined learning to rank model. To this end, we propose to partition the KSG into several sub-KSGs and use a learning to rank model to select the sub-KSGs most relevant to the given question. In principle, traditional text matching models (Yang et al., 2016; Wang et al., 2017; Talman et al., 2019; Devlin et al., 2018) could compute the similarity score between a question and a sub-KSG. However, these sequence-based models often ignore the important structural information within the question and the sub-KSG.
To address these issues, we propose a new knowledge subgraph partition algorithm based on single source shortest paths, which partitions a large-scale question-specific KSG into several sub-KSGs. Furthermore, we propose a novel graph-augmented learning to rank model (G-G-E) to select the top-ranked sub-KSGs, which combines a novel subgraph matching network based on Gated Graph Sequence Neural Networks (GGNNs) (Li et al., 2015) to capture global interactions between the question and subgraphs, and an enhanced Bilateral Multi-Perspective Matching (BiMPM) model (Wang et al., 2017) to capture local interactions between parts of the question and subgraphs. Finally, we apply a state-of-the-art (SOTA) KGQA answer selection model to the original complete KSG and to the combined top-ranked sub-KSGs separately, and demonstrate that reducing the size of the answer candidate subgraph clearly helps to select the correct answer both effectively and efficiently. We evaluate our approach with extensive experiments on two benchmark datasets; the results show that our model significantly improves subgraph ranking performance compared to existing state-of-the-art methods.
The contributions of this paper can be summarized as follows:
We propose a new knowledge subgraph partition algorithm based on single source shortest paths.
We propose a novel graph-augmented learning to rank model, which combines a novel subgraph matching network based on GGNNs with an enhanced BiMPM model.
Our graph-augmented learning to rank model outperforms a set of SOTA ranking models.
Further answer selection experiments on the original complete KSG and on the combined top-ranked sub-KSGs demonstrate that reducing the size of the answer candidate subgraph helps improve answer selection performance.
2. Knowledge Subgraph Partition
To apply the ranking model, we first need to partition the knowledge subgraph into several sub-KSGs. Intuitively, nodes and relations within a sub-KSG should be closely related to each other. As shown in Fig. 1, “m.051cc” is the topic entity of the given question, and nodes on the same path from topic entity node “m.051cc” should be partitioned into the same sub-KSG. Entity nodes in this example graph are denoted by Freebase IDs (starting with “m.”). The first sub-KSG, surrounded by the red dashed line, concerns the education information of topic entity “m.051cc” and contains the true answer node “m.0gl5_”. The second sub-KSG, surrounded by the green dashed line, concerns the namesake entity “m.076hxb3” of topic entity “m.051cc”. This second sub-KSG is a confusing one because it contains tokens like “education” and “school”, which are consistent with the context of the question. The learning to rank model is therefore expected to distinguish not only irrelevant sub-KSGs, but also confusing ones.
To group related nodes into the same sub-KSG, we propose a knowledge subgraph partition algorithm, detailed in Algorithm 1. Given a question and its answer entities, we first use the retrieval method proposed by Sun et al. (2018) to obtain a question-specific KSG, which may contain thousands of candidate answer entities and relations. Then, our algorithm partitions the retrieved KSG into several sub-KSGs, which serve as inputs to the graph-augmented learning to rank model that selects the most relevant ones.
The graph partition algorithm follows the intuition that the answer to a given question is usually found on a multi-hop path from the topic entity node. To keep the size of each sub-KSG moderate, we start the partition from the nodes whose child nodes are all leaf nodes.
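As a concrete illustration, the partition step can be sketched in Python as follows. This is a minimal, hypothetical sketch (Algorithm 1 itself is not reproduced here): a BFS from the topic entity builds a single-source shortest-path tree, and a sub-KSG is started at every node whose children are all leaves, together with its path back to the topic entity.

```python
from collections import deque, defaultdict

def partition_ksg(edges, topic):
    # Build the adjacency list of the (directed) KSG.
    adj = defaultdict(list)
    for h, t in edges:
        adj[h].append(t)

    # BFS from the topic entity yields a single-source shortest-path tree.
    parent = {topic: None}
    children = defaultdict(list)
    queue = deque([topic])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                children[u].append(v)
                queue.append(v)

    def path_to_root(v):
        # Nodes on the shortest path from v back to the topic entity.
        path = []
        while v is not None:
            path.append(v)
            v = parent[v]
        return path

    sub_ksgs = []
    for u in parent:
        kids = children[u]
        # Start a sub-KSG at each node whose children are all leaves,
        # keeping sub-KSG sizes moderate (Section 2).
        if kids and all(not children[c] for c in kids):
            sub_ksgs.append(set(kids) | set(path_to_root(u)))
    return sub_ksgs
```

On the toy example of Fig. 1, each education- or namesake-related branch would then form its own sub-KSG rooted at the topic entity.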
3. Graph-augmented Learning to Rank
Given a question q and a set of sub-KSGs {g_1, …, g_n}, we compute a ranking score s(q, g_i) representing the relevance of q and g_i for subgraph ranking. The overall model architecture is shown in Figure 2. Our model consists of a graph construction module for the input question and the input triples, a BiGGNN encoder capturing global interactions, and an Enhanced BiMPM encoder capturing local interactions.
3.1. Graph Constructions
The question graph is a directed graph constructed by the dependency parser from Stanford CoreNLP (Manning et al., 2014) (https://stanfordnlp.github.io/CoreNLP). The dependency parsing graph represents the grammatical structure of the input question: nodes are the tokens of the question, and an edge indicates a modifying relationship between two token nodes. In particular, we only use the connection information of the edges, not their labels.
A sub-KSG consists of a set of triples (e_s, r, e_o), where e_s, e_o ∈ E and r ∈ R, with E and R denoting the entity and relation sets. Each relation is regarded as an additional node: we assume a directed edge from the subject node e_s to the relation node r, and another directed edge from r to the object node e_o. In the following sections, we introduce how to calculate a relevance score between a question q and a subgraph g.
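A minimal sketch of this construction; the `#index` suffix used to keep repeated relation names distinct is an implementation choice, not from the paper:

```python
def triples_to_graph(triples):
    # Build a directed graph from (subject, relation, object) triples.
    # Each relation becomes an additional node, with edges
    # subject -> relation and relation -> object (Section 3.1).
    nodes, edges = set(), []
    for i, (s, r, o) in enumerate(triples):
        r_node = f"{r}#{i}"  # keep repeated relation names distinct
        nodes.update([s, r_node, o])
        edges.append((s, r_node))
        edges.append((r_node, o))
    return nodes, edges
```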
3.2. Subgraph Matching Networks
To better exploit the global contextual information and the structural information, we extend GGNNs from uni-directional to bi-directional embedding. Given a question graph or a sub-KSG, each node v is initialized with its word embedding h_v^0 (e.g., the average of word embeddings for multi-token nodes). To calculate the representation of each node at layer k, the BiGGNN encoder first aggregates the information of neighbouring nodes to compute aggregation vectors:

h_{N⊢(v)}^k = Σ_{u ∈ N⊢(v)} W⊢ h_u^{k-1},    h_{N⊣(v)}^k = Σ_{u ∈ N⊣(v)} W⊣ h_u^{k-1}

where N⊢(v) and N⊣(v) denote the neighbours of v with outgoing and ingoing edges, and W⊢ and W⊣ are trainable weight matrices. Then, a Gated Recurrent Unit (GRU) (Cho et al., 2014) is used to update the node representation at layer k based on the aggregation vectors and the node representation at the previous layer:

h_v^k = GRU(h_v^{k-1}, [h_{N⊢(v)}^k ; h_{N⊣(v)}^k])
After obtaining all node representations of the question graph (or sub-KSG) at the final layer K, max pooling is applied to compute the graph embedding:

h_G = maxpool({ [h_{⊢,v}^K ; h_{⊣,v}^K] : v ∈ V })

where V is the node set, [;] is the concatenation operator, and K is the maximum number of layers. By stacking K layers, the BiGGNN encoder is able to consider non-immediate neighbours. The concatenated representation of node v is h_v = [h_{⊢,v}^K ; h_{⊣,v}^K] ∈ R^{2d}, and the set of node representations is H ∈ R^{|V| × 2d}. The max pooling operation is applied over the first dimension, yielding the graph embedding h_G ∈ R^{2d}.
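The aggregation, GRU update and max-pooling read-out above can be sketched in NumPy as follows; the weight shapes and the from-scratch GRU cell are illustrative assumptions, not the Graph4NLP implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_update(h_prev, a, p):
    # Standard GRU cell: new state from previous state h_prev (n, d)
    # and aggregation vectors a (n, 2d); p holds the six weight matrices.
    z = sigmoid(a @ p["Wz"] + h_prev @ p["Uz"])      # update gate
    r = sigmoid(a @ p["Wr"] + h_prev @ p["Ur"])      # reset gate
    h_tilde = np.tanh(a @ p["Wh"] + (r * h_prev) @ p["Uh"])
    return (1 - z) * h_prev + z * h_tilde

def biggnn_layer(H, edges, W_out, W_in, gru_params):
    # One bi-directional GGNN layer: sum messages separately over
    # outgoing and ingoing edges, then update every node with a GRU.
    n, d = H.shape
    agg_out = np.zeros((n, d))
    agg_in = np.zeros((n, d))
    for s, t in edges:
        agg_out[s] += H[t] @ W_out   # t is an outgoing neighbour of s
        agg_in[t] += H[s] @ W_in     # s is an ingoing neighbour of t
    a = np.concatenate([agg_out, agg_in], axis=1)    # (n, 2d)
    return gru_update(H, a, gru_params)

def graph_embedding(H):
    # Max pooling over the node dimension yields the graph embedding.
    return H.max(axis=0)
```

Stacking this layer twice (as in our experiments) lets each node see its 2-hop neighbourhood before the read-out.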
3.3. Enhanced BiMPM
Bilateral Multi-Perspective Matching (BiMPM) is a strong text matching model owing to its capacity to capture local interactions. To better learn local interactions between the question and the sub-KSG, we add an attention layer and an enhanced representation layer on top of the original BiMPM model. Specifically, our proposed EBiMPM first uses a shared BiLSTM-based context representation layer to encode the two input sequences into embeddings P ∈ R^{l_p × d} and Q ∈ R^{l_q × d}, where l_p and l_q are the lengths of the input texts. Second, the newly added attention layer applies a bi-directional attention mechanism between P and Q. The attentive embedding of the i-th question token over Q is computed as:

p̃_i = Σ_j softmax_j(p_i · q_j) q_j

Similarly, the attentive embedding of the i-th sub-KSG token over P is:

q̃_i = Σ_j softmax_j(q_i · p_j) p_j

The attention layer outputs the attentive embeddings P̃ and Q̃. Third, the enhanced representation layer fuses P and P̃ to get the improved question representation:

P' = W([P ; P̃ ; P − P̃ ; P ⊙ P̃])

where W is a one-layer perceptron and ⊙ is the point-wise multiplication operation. Similarly, we compute the enhanced subgraph representation Q' from Q and Q̃.
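A sketch of the attention and enhancement layers, assuming dot-product alignment scores and a ReLU perceptron (where the exact score function and activation are not fixed by the text, they are assumptions):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bidirectional_attention(P, Q):
    # Shared alignment scores between question tokens P (lp, d)
    # and sub-KSG tokens Q (lq, d).
    E = P @ Q.T
    P_att = softmax(E, axis=1) @ Q    # each question token attends over Q
    Q_att = softmax(E.T, axis=1) @ P  # each sub-KSG token attends over P
    return P_att, Q_att

def enhance(X, X_att, W, b):
    # Fuse a sequence with its attentive counterpart: concatenate the two
    # with their difference and point-wise product, then project with a
    # one-layer perceptron (W has shape (4d, d); ReLU activation).
    feats = np.concatenate([X, X_att, X - X_att, X * X_att], axis=1)
    return np.maximum(feats @ W + b, 0.0)
```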
Then, P' and Q' are fed into the BiMPM matching layer (Wang et al., 2017) to get two sequences of matching vectors M_p and M_q, where each matching vector has l dimensions, l being the number of perspectives. The matching layer defines four kinds of matching strategies to compare each time-step of one sequence against all time-steps of the other sequence from both forward and backward directions: (1) Full-Matching: each forward or backward hidden state in one sentence is compared with the last hidden state in the other sentence; (2) Maxpooling-Matching: each forward or backward hidden state in one sentence is compared with each hidden state in the other sentence, and max pooling keeps the maximum value of each dimension; (3) Attentive-Matching: the attentive vector of each token in one sentence is computed over the other sentence, and each hidden state is compared with its corresponding attentive vector; and (4) Max-Attentive-Matching: the hidden state in the other sentence with the highest cosine similarity is selected as the attentive vector of each token in this sentence.
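The shared building block of these strategies, the multi-perspective cosine, and the Full-Matching strategy can be sketched as follows (a simplified, single-direction illustration):

```python
import numpy as np

def multi_perspective_cosine(v1, v2, W):
    # W (l, d) holds l trainable perspective vectors; each perspective
    # re-weights both vectors element-wise before cosine similarity,
    # giving an l-dimensional matching vector.
    a, b = W * v1, W * v2
    num = (a * b).sum(axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-8
    return num / den

def full_matching(H, h_last, W):
    # Full-Matching: every time-step of one sequence is compared with
    # the last hidden state of the other sequence.
    return np.stack([multi_perspective_cosine(h, h_last, W) for h in H])
```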
Finally, [P' ; M_p] and [Q' ; M_q] are fed into a shared BiLSTM-based aggregation layer to get the final representations:

r_p = maxpool(BiLSTM([P' ; M_p])),    r_q = maxpool(BiLSTM([Q' ; M_q]))

where maxpool denotes max pooling and BiLSTM denotes the BiLSTM-based aggregation layer.
3.4. Ranking Score Function
The representations of the question and of the sub-KSG learned by the subgraph matching network and by EBiMPM are each concatenated and fed into a cosine similarity ranking score function:

s(q, g) = cos([h_{G_q} ; r_p], [h_{G_g} ; r_q])
Finally, we take the Mean Square Error (MSE) as the loss function:

L = (1/N) Σ_{i=1}^{N} (s_i − y_i)^2

where N is the number of samples and y_i is the label.
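Both the score function and the loss are straightforward; a minimal sketch:

```python
import numpy as np

def ranking_score(q_vec, g_vec):
    # Cosine similarity between the concatenated question and sub-KSG
    # representations (Section 3.4).
    return float(q_vec @ g_vec /
                 (np.linalg.norm(q_vec) * np.linalg.norm(g_vec) + 1e-8))

def mse_loss(scores, labels):
    # Mean Square Error over N (question, sub-KSG) pairs.
    scores, labels = np.asarray(scores), np.asarray(labels)
    return float(((scores - labels) ** 2).mean())
```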
3.5. Answer Selection Model
After using the ranking model to obtain the top-K sub-KSGs, we merge them into a graph much smaller than the original KSG and feed it into an answer selection model. In this paper, we use one of the state-of-the-art KGQA models, GraftNet (Sun et al., 2018), a heterogeneous graph neural network, as our answer selection model. To improve overall performance, GraftNet can also incorporate external Wikipedia knowledge and compute a PageRank (Haveliwala, 2003) score for each entity node; however, we only use the basic GraftNet model so as to better validate the effectiveness of our proposed graph-augmented learning to rank model. GraftNet performs a binary classification to select the answers:

P(v ∈ answers) = σ(w^T h_v + b)

where h_v is the final representation of node v learned by GraftNet and σ is the sigmoid function. The answer selection model is trained with a binary cross-entropy loss, using the full KSG and the merged top-ranked sub-KSGs as input respectively.
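The read-out can be sketched as a simple linear-plus-sigmoid classifier over node representations (a stand-in for GraftNet's classifier head, not its actual implementation):

```python
import numpy as np

def select_answers(H, w, b, threshold=0.5):
    # Binary classification over final node representations H (n, d):
    # sigmoid(H w + b) gives each node's probability of being an answer;
    # nodes above the threshold are returned as predicted answers.
    probs = 1.0 / (1.0 + np.exp(-(H @ w + b)))
    return probs, probs > threshold
```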
Table 1. Dataset statistics: train/dev/test sizes, average number of entities in a KSG, average number of sub-KSGs, and answer coverage rate.
We conducted experiments on two multi-hop question answering datasets, WebQuestionsSP (WebQSP) (Yih et al., 2015) and ComplexWebQuestions (CWQ) (Talmor and Berant, 2018). Table 1 shows their statistics. For WebQSP, we use the partition algorithm to construct the sub-KSGs based on the data processed by He et al. (2021), which follows the retrieval method of Sun et al. (2018). Because the WebQSP dataset is small, the train and dev matching sets used in the training phase are constructed by selecting one sub-KSG containing true answers and randomly sampling 20 sub-KSGs for each example. For the test set, each example contains a natural language question and all partitioned sub-KSGs, and the model computes a ranking score for each (question, sub-KSG) pair. As shown in Table 1, the average number of entities in each KSG is 1429.9, and each KSG produces an average of 1279.9 sub-KSGs after partitioning. The coverage rate indicates that about 94.9% of examples in the dataset can find answers in their corresponding KSGs.
For CWQ, we use the preprocessed datasets released by Kumar et al. (2019). Each sample contains a question, the subgraph from which the question is derived, and a set of answer entities. The CWQ dataset contains 22,989 matched (question, subgraph) pairs, split into train, dev and test sets at a ratio of 8:1:1. For the train and dev sets, we produce the same number of negative examples as positive ones: for each question, we select as a negative sample a confusion-prone subgraph from the training subgraph set that is similar to the matched subgraph but contains no answer nodes, using TF-IDF to compute the similarity between the texts of two subgraphs. The test set used for ranking evaluation consists, for each example, of one matched subgraph and 49 unmatched subgraphs similar to the matched one; the average number of sub-KSGs (subgraphs) for CWQ is therefore 50. We merge these 50 sub-KSGs to form a pseudo KSG for each example. As shown in Table 1, the average number of entities in a pseudo KSG is 95.9, and the coverage rate of the test set is 95.7%.
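The confusion-prone negative sampling can be sketched with a small self-contained TF-IDF (the exact weighting scheme used in our pipeline may differ):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    # Tiny TF-IDF (raw term frequency, smoothed idf) over tokenized docs.
    df = Counter()
    for d in docs:
        df.update(set(d))
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(d).items()} for d in docs]

def cosine(u, v):
    num = sum(u[t] * v.get(t, 0.0) for t in u)
    den = (math.sqrt(sum(x * x for x in u.values())) *
           math.sqrt(sum(x * x for x in v.values())) + 1e-12)
    return num / den

def pick_negative(query_idx, docs, answer_free):
    # Pick the most similar subgraph text that contains no answer node,
    # mirroring the confusion-prone negative sampling for CWQ.
    vecs = tfidf_vectors(docs)
    candidates = [i for i in answer_free if i != query_idx]
    return max(candidates, key=lambda i: cosine(vecs[query_idx], vecs[i]))
```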
4.2. Models and Metrics
In the next experiments, our proposed BiGGNN-BiGGNN-EBiMPM (G-G-E) model is compared with the following baselines:
BiMPM (Wang et al., 2017): an LSTM-based model for text matching;
EBiMPM: adding an attention layer and an enhanced representation layer on BiMPM.
BERT (Devlin et al., 2018): a shared BERT model to encode the question sequence and the subgraph triples sequence;
BiGGNN-BiGGNN (G-G): the question graph and the sub-KSG are each encoded by a BiGGNN.
To evaluate the graph-augmented learning to rank model, we use Recall@K (R@K) and Mean Reciprocal Rank (MRR) as evaluation metrics. Recall@K is the proportion of examples for which the top-K ranked sub-KSGs include at least one sub-KSG containing answers. MRR is the average, over examples, of the reciprocal rank of the highest-ranked sub-KSG containing answers. Furthermore, we use Hits, precision, recall and F1 to evaluate whether reducing the size of the KSG benefits the subsequent answer selection model. Hits is the proportion of examples for which GraftNet selects answer nodes from the subgraph obtained by merging the top-K sub-KSGs.
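The two ranking metrics can be computed as follows, where each example is a list of 0/1 answer labels ordered by predicted score:

```python
def recall_at_k(ranked_lists, k):
    # Proportion of examples whose top-k ranked sub-KSGs contain at
    # least one answer-bearing subgraph (labels are 0/1 per sub-KSG).
    hits = sum(1 for labels in ranked_lists if any(labels[:k]))
    return hits / len(ranked_lists)

def mrr(ranked_lists):
    # Mean reciprocal rank of the first answer-bearing sub-KSG.
    total = 0.0
    for labels in ranked_lists:
        for rank, y in enumerate(labels, start=1):
            if y:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)
```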
4.3. Experimental Settings
Our proposed model is implemented with MatchZoo-py (Guo et al., 2019) and Graph4NLP (Wu et al., 2021). We use the Adam optimizer (Kingma and Ba, 2014) with an initial learning rate of 0.0005. The batch size is 64 for CWQ and 50 for WebQSP. Word embeddings are initialized with 300-dimensional pretrained GloVe (Pennington et al., 2014) embeddings (http://nlp.stanford.edu/data/glove.840B.300d.zip). The BiGGNN encoder is stacked to 2 layers. Early stopping is used during training, with the validation set evaluated every epoch. All models use cosine similarity as the ranking score function. All experiments are run on a Tesla V100 GPU.
Table 3 (excerpt). Answer selection results on the merged top-K sub-KSGs:

|WebQSP|Hits|Precision|Recall|F1|CWQ|Hits|Precision|Recall|F1|
|top 100|0.6043|0.6041|0.5824|0.5128|top 10|0.4236|0.5296|0.4105|0.3274|
|top 200|0.5975|0.6561|0.5858|0.5355|top 20|0.4000|0.5149|0.3772|0.2923|
4.4. Results Analysis
Table 2 shows the ranking performance on the two datasets. In particular, the upper limit of Recall@K is 100% rather than the coverage rate because we eliminate examples for which no answer can be found. Our proposed full model G-G-E consistently outperforms all baselines on both datasets, including the BERT model. To guarantee a high answer recall for the merged subgraph, we care more about Recall@K than Recall@1, especially when K is large. On the WebQSP dataset, G-G-E is 0.6 to 1 percentage point higher than the best baseline on Recall@100, Recall@200 and Recall@300. On the CWQ dataset, the Recall@10 of G-G-E improves by 1.7% over the best baseline. Moreover, on WebQSP, G-G is significantly better than BiMPM, improving MRR by 0.07 and Recall@1 by 0.1, which indicates that graph structure information plays a particularly important role on this dataset.
To further validate that reducing the size of the KSG helps improve answer selection, we merge the top 100, 200 and 300 sub-KSGs for WebQSP and the top 10 and 20 sub-KSGs for CWQ. The experimental results are shown in Table 3. For WebQSP, the answer selection model performs best on the top-300 merged subgraph, improving Hits by 0.026 and F1 by 0.027; the top-300 merged subgraph is about a third of the size of the original full KSG, which contains an average of 1280 sub-KSGs. These improvements also verify the effectiveness of our proposed partition algorithm. For CWQ, the answer selection model performs best on the top-10 merged subgraph, improving Hits by 2.8% and F1 by 5.4%; the top-10 merged subgraph is a fifth of the size of the full KSG. In general, by using our proposed partition algorithm and graph-augmented learning to rank model, we can substantially reduce the size of the KSG while maintaining the answer recall rate.
|Question: what artistic movement did m.0gct_ belong to ?|
|Mispredicted subgraph: (m.0gct_, influence_influence_node_influenced_by, m.0160zv), (m.0160zv, visual_art_visual_artist_associated_periods_or_movements, m.0160zb)|
|Ground truth subgraph: (m.0gct_, visual_art_visual_artist_associated_periods_or_|
|Question: who did m.01ps2h8 play in lord of the rings ?|
|Mispredicted subgraph: (m.01ps2h8, film_actor_film, m.0k5s9k), (m.0k5s9k, film_|
|Ground truth subgraph: (m.01ps2h8, film_actor_film, m.0k5sfk), (m.0k5sfk, film_|
4.5. Ablation Study
We conduct an ablation study to investigate the contribution of each component to the proposed model. As shown in Table 2, we evaluate models with only graph neural network encoder (G-G) and with only sequence encoder (EBiMPM), respectively. The performance gain of G-G-E model compared to G-G and EBiMPM can empirically demonstrate the effectiveness of combining the two encoders for capturing both global and local interactions between the question and the knowledge subgraph.
4.6. Case Study and Error Analysis
To investigate the limitations of the graph-augmented learning to rank model, we manually inspected sub-KSGs that were incorrectly predicted to contain answers (Table 4). The topic entity in the question and the entities in the subgraph are replaced by their Freebase IDs. The first mispredicted subgraph contains a redundant hop “influence_influence_node_influenced_by”; this may be because our model ignores the number of hops implied by the question. In the second example, the model fails to map “play” in the question to the relation “film_performance_character”; the mispredicted subgraph confuses the model because it is very similar to the ground truth one.
5. Related Work
5.1. Knowledge Graph Question Answering
With the rapid development of large-scale knowledge graphs (KGs) such as DBpedia (Auer et al., 2007) and Freebase (Bollacker et al., 2008), question answering over knowledge graphs has attracted widespread attention from a growing number of researchers. However, due to the large volume of a knowledge graph, using its knowledge to answer questions is challenging. In general, knowledge graph question answering has two mainstream research directions, namely semantic parsing based methods and retrieve-then-extract methods.
Semantic parsing based methods
Semantic parsing based methods convert natural language questions to knowledge base readable queries. Lan et al. (2021) summarises these methods in the following steps: (1) Using a Question Understanding module to analyze questions semantically and syntactically. Common question analysis techniques include dependency parsing (Abujabal et al., 2017), AMR parsing (Kapanipathi et al., 2020) and skeleton parsing (Sun et al., 2020). (2) Using a Logical Parsing module to convert the question embedding into an uninstantiated logic form. This module creates a syntactic representation of the question such as template based queries (Bast and Haussmann, 2015) and query graphs (Hu et al., 2018). (3) Using a KB Grounding module to align the logic form to KB (Bhutani et al., 2019; Chen et al., 2019a). The logical query obtained from the above steps can be searched directly in KB to find the final answer. These approaches usually require the design of specific rules in the second step mentioned above and limit the domains where such approaches can be applied.
Retrieve-then-extract methods are also known as information retrieval based methods. These methods first retrieve the KG coarsely to obtain a knowledge subgraph containing answer candidates and then extract answers from the retrieved subgraph. Bordes et al. (2014) first proposed a subgraph retrieval method and a subgraph embedding model that scores every candidate answer. In follow-up work, Miller et al. (2016) adopted a memory table to store KB facts encoded as key-value pairs. Sun et al. (2018) proposed a graph neural network model to perform multi-hop reasoning on heterogeneous graphs. PullNet (Sun et al., 2019) improved the graph retrieval module by iteratively expanding the question-specific subgraph. BAMnet (Chen et al., 2019b) modeled the bidirectional flow of interactions between the question and the KB using an attentive memory network. EmbedKGQA (Saxena et al., 2020) directly matches pretrained KG entity embeddings with the question embedding, which is computationally intensive.
5.2. Learning to Rank
Modern search engines often leverage a multi-level cascade structure comprising a stack of ranking models, with the aim of providing users with a ranked list of results efficiently. The cascade begins with an efficient high-recall ranker, which aims at retrieving as many correct answers as possible. The goal of this ranker is consistent with the coarse-grained retrieval module in information retrieval based KGQA. In contrast, the learning to rank model is often the last ranker of the cascade, re-ranking the small set of documents retrieved by the previous rankers.
Over the past few years, learning to rank models have achieved great success in information retrieval. Traditional learning to rank models rely on hand-crafted features, which are often time-consuming to design. Recently, many neural ranking models have emerged. The Deep Structured Semantic Model (DSSM) (Huang et al., 2013) was the first neural ranking model, using fully connected networks. In the same year, DeepMatch (Lu and Li, 2013) was proposed to model the complex matching relations between responses to a tweet, or relevant answers to a given question. Hu et al. (2014) proposed the convolutional neural network models ARC-I and ARC-II for matching two sentences. Since 2016, research on neural ranking models has surged, with more related work, deeper discussions and wider applications. Wang and Jiang (2016) proposed a model based on match-LSTM and Pointer Net (Vinyals et al., 2015). ANMM (Yang et al., 2016) is an attention-based neural matching model combining different matching signals for ranking short answer texts. Wang et al. (2017) proposed the BiMPM model under the “matching-aggregation” framework to match sentences from multiple perspectives. With the development of pretrained language models such as BERT (Devlin et al., 2018), the performance of neural ranking models has been taken to the next level. However, these neural ranking models have limitations when applied to information retrieval based KGQA, because the inputs are treated as raw text sequences and the structural information in the KG is ignored.
Focusing on information retrieval based KGQA, this paper addresses a subgraph ranking task that aims to reduce the size of the coarsely retrieved knowledge subgraph while capturing both local and global interactions between the question and sub-KSGs. We propose a KSG partition algorithm and a graph-augmented learning to rank model to partition and then rank the subgraphs. We further validate that reducing the size of the knowledge subgraph benefits the subsequent answer selection step in an information retrieval based KGQA pipeline. Experimental results on multiple datasets demonstrate the effectiveness of our model. In the future, we will explore a more effective answer selection model over the small-scale knowledge subgraph selected by our learning to rank model.
Acknowledgements. The work is partially supported by the National Natural Science Foundation of China (No. 61976160), the Technology Research Plan Project of the Ministry of Public Security (Grant No. 2020JSYJD01), the Thirteenth Five-year Research Planning Project of the National Language Committee (No. YB135-149) and the Self-determined Research Funds of CCNU from the Colleges' Basic Research and Operation of MOE (No. CCNU20ZT012, CCNU20TD001).
- Abujabal et al. (2017) Abdalghani Abujabal, Mohamed Yahya, Mirek Riedewald, and Gerhard Weikum. 2017. Automated template generation for question answering over knowledge graphs. In Proceedings of the 26th international conference on world wide web. 1191–1200.
- Auer et al. (2007) Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: A nucleus for a web of open data. In The semantic web. Springer, 722–735.
- Bao et al. (2016) Junwei Bao, Nan Duan, Zhao Yan, Ming Zhou, and Tiejun Zhao. 2016. Constraint-based question answering with knowledge graph. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. 2503–2514.
- Bast and Haussmann (2015) Hannah Bast and Elmar Haussmann. 2015. More accurate question answering on freebase. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. 1431–1440.
- Berant et al. (2013) Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on Freebase from question-answer pairs. In Proceedings of the 2013 conference on empirical methods in natural language processing. 1533–1544.
- Bhutani et al. (2019) Nikita Bhutani, Xinyi Zheng, and HV Jagadish. 2019. Learning to answer complex questions over knowledge bases with query composition. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 739–748.
- Bollacker et al. (2008) Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. ACM, 1247–1250.
- Bordes et al. (2014) Antoine Bordes, Sumit Chopra, and Jason Weston. 2014. Question answering with subgraph embeddings. arXiv preprint arXiv:1406.3676 (2014).
- Chen et al. (2019b) Yu Chen, Lingfei Wu, and Mohammed J Zaki. 2019b. Bidirectional Attentive Memory Networks for Question Answering over Knowledge Bases. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2913–2923.
- Chen et al. (2019a) Zi-Yuan Chen, Chih-Hung Chang, Yi-Pei Chen, Jijnasa Nayak, and Lun-Wei Ku. 2019a. UHop: An unrestricted-hop relation extraction framework for knowledge-based question answering. arXiv preprint arXiv:1904.01246 (2019).
- Cho et al. (2014) Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014).
- Devlin et al. (2018) Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
- Dong and Lapata (2018) Li Dong and Mirella Lapata. 2018. Coarse-to-fine decoding for neural semantic parsing. arXiv preprint arXiv:1805.04793 (2018).
- Guo et al. (2019) Jiafeng Guo, Yixing Fan, Xiang Ji, and Xueqi Cheng. 2019. MatchZoo: A Learning, Practicing, and Developing System for Neural Text Matching. In Proceedings of the 42Nd International ACM SIGIR Conference on Research and Development in Information Retrieval (Paris, France) (SIGIR’19). ACM, New York, NY, USA, 1297–1300. https://doi.org/10.1145/3331184.3331403
- Haveliwala (2003) Taher H Haveliwala. 2003. Topic-sensitive pagerank: A context-sensitive ranking algorithm for web search. IEEE transactions on knowledge and data engineering 15, 4 (2003), 784–796.
- He et al. (2021) Gaole He, Yunshi Lan, Jing Jiang, Wayne Xin Zhao, and Ji-Rong Wen. 2021. Improving Multi-hop Knowledge Base Question Answering by Learning Intermediate Supervision Signals. In WSDM.
- Hu et al. (2014) Baotian Hu, Zhengdong Lu, Hang Li, and Qingcai Chen. 2014. Convolutional neural network architectures for matching natural language sentences. In Advances in neural information processing systems. 2042–2050.
- Hu et al. (2018) Sen Hu, Lei Zou, and Xinbo Zhang. 2018. A state-transition framework to answer complex questions over knowledge base. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2098–2108.
- Huang et al. (2013) Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. 2013. Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM international conference on Information & Knowledge Management. 2333–2338.
- Kapanipathi et al. (2020) Pavan Kapanipathi, Ibrahim Abdelaziz, Srinivas Ravishankar, Salim Roukos, Alexander Gray, Ramon Astudillo, Maria Chang, Cristina Cornelio, Saswati Dana, Achille Fokoue, et al. 2020. Question Answering over Knowledge Bases by Leveraging Semantic Parsing and Neuro-Symbolic Reasoning. arXiv preprint arXiv:2012.01707 (2020).
- Kingma and Ba (2014) Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
- Kumar et al. (2019) Vishwajeet Kumar, Yuncheng Hua, Ganesh Ramakrishnan, Guilin Qi, Lianli Gao, and Yuan-Fang Li. 2019. Difficulty-controllable multi-hop question generation from knowledge graphs. In International Semantic Web Conference. Springer, 382–398.
- Lan et al. (2021) Yunshi Lan, Gaole He, Jinhao Jiang, Jing Jiang, Wayne Xin Zhao, and Ji-Rong Wen. 2021. A Survey on Complex Knowledge Base Question Answering: Methods, Challenges and Solutions. arXiv preprint arXiv:2105.11644 (2021).
- Li et al. (2015) Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. 2015. Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493 (2015).
- Lin et al. (2019) Bill Yuchen Lin, Xinyue Chen, Jamin Chen, and Xiang Ren. 2019. KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2822–2832.
- Lu and Li (2013) Zhengdong Lu and Hang Li. 2013. A Deep Architecture for Matching Short Texts. In Advances in Neural Information Processing Systems, C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger (Eds.), Vol. 26. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2013/file/8a0e1141fd37fa5b98d5bb769ba1a7cc-Paper.pdf
- Manning et al. (2014) Christopher D Manning, Mihai Surdeanu, John Bauer, Jenny Rose Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proceedings of 52nd annual meeting of the association for computational linguistics: system demonstrations. 55–60.
- Miller et al. (2016) Alexander Miller, Adam Fisch, Jesse Dodge, Amir-Hossein Karimi, Antoine Bordes, and Jason Weston. 2016. Key-Value Memory Networks for Directly Reading Documents. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 1400–1409.
- Pennington et al. (2014) Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 1532–1543.
- Saxena et al. (2020) Apoorv Saxena, Aditay Tripathi, and Partha Talukdar. 2020. Improving multi-hop question answering over knowledge graphs using knowledge base embeddings. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 4498–4507.
- Sun et al. (2019) Haitian Sun, Tania Bedrax-Weiss, and William Cohen. 2019. PullNet: Open Domain Question Answering with Iterative Retrieval on Knowledge Bases and Text. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2380–2390.
- Sun et al. (2018) Haitian Sun, Bhuwan Dhingra, Manzil Zaheer, Kathryn Mazaitis, Ruslan Salakhutdinov, and William Cohen. 2018. Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 4231–4242.
- Sun et al. (2020) Yawei Sun, Lingling Zhang, Gong Cheng, and Yuzhong Qu. 2020. SPARQA: skeleton-based semantic parsing for complex questions over knowledge bases. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 8952–8959.
- Talman et al. (2019) Aarne Talman, Anssi Yli-Jyrä, and Jörg Tiedemann. 2019. Sentence embeddings in NLI with iterative refinement encoders. Natural Language Engineering 25, 4 (2019), 467–482.
- Talmor and Berant (2018) Alon Talmor and Jonathan Berant. 2018. The web as a knowledge-base for answering complex questions. arXiv preprint arXiv:1803.06643 (2018).
- Vinyals et al. (2015) Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer networks. arXiv preprint arXiv:1506.03134 (2015).
- Wang and Jiang (2016) Shuohang Wang and Jing Jiang. 2016. Machine comprehension using match-lstm and answer pointer. arXiv preprint arXiv:1608.07905 (2016).
- Wang et al. (2017) Zhiguo Wang, Wael Hamza, and Radu Florian. 2017. Bilateral multi-perspective matching for natural language sentences. arXiv preprint arXiv:1702.03814 (2017).
- Wu et al. (2021) Lingfei Wu, Yu Chen, Kai Shen, Xiaojie Guo, Hanning Gao, Shucheng Li, Jian Pei, and Bo Long. 2021. Graph Neural Networks for Natural Language Processing: A Survey. arXiv preprint arXiv:2106.06090 (2021).
- Yang et al. (2016) Liu Yang, Qingyao Ai, Jiafeng Guo, and W. Bruce Croft. 2016. aNMM: Ranking short answer texts with attention-based neural matching model. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 287–296.
- Yao and Van Durme (2014) Xuchen Yao and Benjamin Van Durme. 2014. Information extraction over structured data: Question answering with freebase. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 956–966.
- Yasunaga et al. (2021) Michihiro Yasunaga, Hongyu Ren, Antoine Bosselut, Percy Liang, and Jure Leskovec. 2021. QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering. arXiv preprint arXiv:2104.06378 (2021).
- Yih et al. (2015) Wen-tau Yih, Ming-Wei Chang, Xiaodong He, and Jianfeng Gao. 2015. Semantic Parsing via Staged Query Graph Generation: Question Answering with Knowledge Base. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 1321–1331.