Knowledge bases (KBs) are considered an essential resource for answering factoid questions. However, accurately constructing a KB with a well-designed and complicated schema requires considerable human effort, which inevitably limits the coverage of KBs Min et al. (2013). As a result, KBs are often incomplete and insufficient to cover the full evidence required by open-domain questions.
On the other hand, the vast amount of unstructured text on the Internet can easily cover a wide range of evolving knowledge and is commonly used for open-domain question answering Chen et al. (2017); Wang et al. (2018). Therefore, to improve the coverage of KBs, it is natural to augment a KB with text data. Recently, text-based QA models alone Seo et al. (2016); Xiong et al. (2017); Yu et al. (2018) have achieved remarkable performance when dealing with a single passage that is guaranteed to include the answer. However, they still fall short when multiple documents are presented. We hypothesize that this is partially due to the lack of background knowledge for distinguishing relevant information from irrelevant information (see Figure 1 for a real example).
To better utilize textual evidence for improving QA over incomplete KBs, this paper presents a new end-to-end model, which consists of (1) a simple yet effective subgraph reader that accumulates knowledge of each KB entity from a question-related KB subgraph; and (2) a knowledge-aware text reader that selectively incorporates the learned KB knowledge about entities with a novel conditional gating mechanism. With the specifically designed gate functions, our model can dynamically determine how much KB knowledge to incorporate while encoding questions and passages, and is thus able to make the structured knowledge more compatible with the text information. Compared to the previous state-of-the-art Sun et al. (2018), our model achieves consistent improvements with a much more efficient pipeline, which requires only a single pass over the evidence resources.
2 Task Definition
The QA task we consider here requires answering questions by reading knowledge base tuples and retrieved Wikipedia documents. To build a scalable system, we follow Sun et al. (2018) and only consider a subgraph for each question. The subgraph is retrieved by running Personalized PageRank Haveliwala (2002) from the topic entities, i.e., the entities mentioned by the question, which are annotated by STAGG Yih et al. (2014). The documents are retrieved by an existing document retriever Chen et al. (2017) and further ranked by a Lucene index. The entities in documents are also annotated and linked to KB entities. For each question, the model tries to retrieve answer entities from a candidate set including all KB and document entities.
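To make the subgraph retrieval step more concrete, the following is a minimal sketch of Personalized PageRank computed by power iteration. The toy graph, the damping factor of 0.85, the iteration count, the top-3 cutoff, and the function name are all illustrative assumptions; they are not the settings used by Sun et al. (2018).

```python
def personalized_pagerank(adj, seeds, damping=0.85, iters=50):
    """Power iteration for PageRank with restarts to the seed (topic) entities.

    adj maps each node to a list of out-neighbors (every neighbor must also
    be a key of adj); seeds is a non-empty set of nodes.
    """
    nodes = list(adj)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            out = adj[n]
            if out:
                share = damping * rank[n] / len(out)
                for m in out:
                    nxt[m] += share
            else:
                # Dangling node: send its mass back to the seeds.
                for m in nodes:
                    nxt[m] += damping * rank[n] * restart[m]
        rank = nxt
    return rank


# Hypothetical toy KB graph (entity names are made up for illustration).
toy_graph = {
    "rome": ["italy", "fiumicino"],
    "italy": ["rome"],
    "fiumicino": ["rome"],
    "tokyo": ["japan"],
    "japan": ["tokyo"],
}
scores = personalized_pagerank(toy_graph, seeds={"rome"})
# Keep the highest-scoring entities as the question-specific subgraph.
subgraph = sorted(scores, key=scores.get, reverse=True)[:3]
```

Entities that cannot be reached from the topic entity receive no score, so they drop out of the retrieved subgraph.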
3.1 SubGraph Reader
This section describes the KB subgraph reader (SGReader), which employs graph-attention techniques to accumulate knowledge of each subgraph entity from its linked neighbors. The graph attention mechanism is designed to take into account two important aspects: (1) whether the neighbor relation is relevant to the question; and (2) whether the neighbor entity is a topic entity mentioned by the question. After the propagation, the SGReader outputs a vectorized representation for each entity, encoding the knowledge indicated by its linked neighbors.
Question-Relation Matching To match the question and KB relation in an isomorphic latent space, we apply a shared LSTM to encode the question and the tokenized relation. With the derived hidden states for each word, we first compute the representation of each relation with a self-attentive encoder, i.e., an attention-weighted sum of the relation's token-level hidden states in which the weights are computed with a trainable vector. Since a question needs to be matched with different relations and each relation is only described by part of the question, instead of matching the relations with a single question vector, we calculate the matching score in a more fine-grained way. Specifically, we first use the relation representation to attend over each question token and then model the matching by a dot product between the relation representation and the attended question representation.
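The two steps of question-relation matching can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the softmax normalization, the helper names, and the toy 2-d hidden states (standing in for the shared LSTM outputs) are assumptions.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * vec[k] for w, vec in zip(weights, vectors)) for k in range(dim)]


def encode_relation(rel_states, w_r):
    """Self-attentive pooling: weights come from a trainable vector w_r."""
    alpha = softmax([dot(w_r, h) for h in rel_states])
    return weighted_sum(alpha, rel_states)


def match_score(rel_vec, q_states):
    """Attend over question tokens with the relation, then take a dot product."""
    beta = softmax([dot(rel_vec, h) for h in q_states])
    return dot(rel_vec, weighted_sum(beta, q_states))


# Toy 2-d hidden states standing in for shared-LSTM outputs (assumed values).
q_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
rel_a = encode_relation([[2.0, 0.0], [1.0, 0.0]], w_r=[1.0, 0.0])
rel_b = encode_relation([[0.0, 0.1], [0.0, 0.2]], w_r=[1.0, 0.0])
score_a = match_score(rel_a, q_states)
score_b = match_score(rel_b, q_states)
```

Because each relation attends over the question tokens separately, two relations can be matched against different parts of the same question.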
Extra Attention over Topic Entity Neighbors
In addition to the question-relation similarities, we find that another binary indicator feature derived from the topic entities is very useful. For a neighbor of an arbitrary entity, this indicator marks whether the neighbor entity is a topic entity. Intuitively, if a neighbor links to a topic entity that appears in the question, then the corresponding tuple could be more relevant for question answering than the tuples of non-topic neighbors. The final attention score over each neighbor combines this indicator with the question-relation matching score.
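One plausible way to combine the two signals is sketched below. Adding the indicator to the matching score and normalizing with a softmax over the neighbors is an assumption made for illustration; the function name, entity names, and scores are hypothetical.

```python
import math


def neighbor_attention(neighbors, match_scores, topic_entities):
    """Softmax over (topic-entity indicator + question-relation score).

    neighbors is a list of (relation, entity) pairs; match_scores maps a
    relation to its question-relation matching score.
    """
    logits = [
        (1.0 if ent in topic_entities else 0.0) + match_scores[rel]
        for rel, ent in neighbors
    ]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical neighbors of one entity; names and scores are made up.
neighbors = [("contained_by", "italy"), ("capital_of", "rome"), ("time_zone", "cet")]
match_scores = {"contained_by": 0.2, "capital_of": 0.2, "time_zone": 0.2}
attn = neighbor_attention(neighbors, match_scores, topic_entities={"rome"})
```

With equal relation scores, the neighbor that is a topic entity receives the largest attention weight.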
Information Propagation from Neighbors To accumulate the knowledge from the linked tuples, we define a propagation rule for each entity that combines the entity's original representation with an attention-weighted aggregation of its neighboring tuples. The entity representations are initialized with pre-computed knowledge graph embeddings, and each neighboring tuple is passed through a trainable transformation matrix followed by an activation function. In addition, a trade-off parameter calculated by a linear gate function controls how much information in the original entity representation should be retained. The above step can be viewed as a gated version of the graph encoding techniques in NLP, e.g., Song et al. (2018); Xu et al. (2018). These general graph-encoders and graph-attention techniques may help when the questions require more hops, and we leave the investigation to future work.
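A single hop of gated propagation could look like the sketch below. The tanh activation, the sigmoid gate over the concatenated entity and message vectors, and all weights are assumptions chosen to keep the example small; the source does not specify them here.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(mat, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def propagate(entity, neighbors, attn, w_e, w_gate):
    """One hop of gated propagation for a single entity.

    neighbors holds (relation_vec, entity_vec) pairs, attn their attention
    scores, w_e the transformation matrix and w_gate the gate's weight vector.
    """
    dim = len(entity)
    message = [0.0] * dim
    for score, (rel, ent) in zip(attn, neighbors):
        # Transform the concatenated tuple [relation; neighbor entity].
        transformed = [math.tanh(x) for x in matvec(w_e, rel + ent)]
        message = [m + score * t for m, t in zip(message, transformed)]
    # Linear gate on [entity; message] decides how much of the original to keep.
    gamma = sigmoid(sum(w * x for w, x in zip(w_gate, entity + message)))
    updated = [gamma * e + (1.0 - gamma) * m for e, m in zip(entity, message)]
    return updated, gamma


# Toy 2-d example with assumed weights.
entity = [1.0, 0.0]
neighbors = [([0.5, 0.5], [0.0, 1.0]), ([1.0, 0.0], [1.0, 1.0])]
w_e = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
updated, gamma = propagate(entity, neighbors, [0.7, 0.3], w_e, [0.0, 0.0, 0.0, 0.0])
```

With all-zero gate weights the gate evaluates to 0.5, so the updated vector is an even mix of the original entity embedding and the neighbor message.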
3.2 Knowledge-Aware Text Reader
With the learned KB embeddings, our model enhances text reading with a knowledge-aware text reader (KAReader). Briefly, we use an existing reading comprehension model Chen et al. (2017) and improve it by learning more knowledge-aware representations for both the question and the documents.
Query Reformulation in Latent Space
First, we update the question representation in a way that the KB knowledge of the topic entity can be incorporated. This allows the reader to discriminate relevant information beyond text matching.
Formally, we first take the original question encoding and apply a self-attentive encoder to get a stand-alone question representation. We then collect the topic entity knowledge of the question from the representations of its topic entities learned by the SGReader, and apply a gating mechanism with a linear gate to fuse the original question representation and the KB knowledge.
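The fusion step can be illustrated with a small sketch. Averaging the topic-entity vectors and mixing the two vectors directly with a sigmoid gate are simplifying assumptions; the function name and weights are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reformulate_query(q_vec, topic_entity_vecs, w_gate):
    """Fuse the question vector with averaged topic-entity knowledge."""
    n = len(topic_entity_vecs)
    kb_vec = [sum(vec[k] for vec in topic_entity_vecs) / n for k in range(len(q_vec))]
    # Linear gate over the concatenation [question; KB knowledge].
    gamma = sigmoid(sum(w * x for w, x in zip(w_gate, q_vec + kb_vec)))
    fused = [gamma * q + (1.0 - gamma) * e for q, e in zip(q_vec, kb_vec)]
    return fused, gamma


# Toy example with assumed values: two topic entities and zero gate weights.
fused, gamma = reformulate_query(
    [1.0, 0.0], [[0.0, 1.0], [0.0, 3.0]], w_gate=[0.0, 0.0, 0.0, 0.0]
)
```

In a trained model the gate weights would be learned, so the mix between text and KB information would vary from question to question.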
Knowledge-aware Passage Enhancement
To encode the retrieved passages, we use a standard bi-LSTM, which takes several token-level features as input (we use the same set of features as in Chen et al. (2017) except for the tagging labels). With the entity linking annotations in passages, we fuse the entity knowledge with the token-level features in a similar fashion as the query reformulation process. However, instead of applying a standard gating mechanism Yang and Mitchell (2017); Mihaylov and Frank (2018), we propose a new conditional gating function that explicitly conditions on the question. This simple modification allows the reader to dynamically select the inputs according to their relevance to the question. For a passage token with its token features and its linked entity (non-entity tokens are encoded with token-level features only), the conditional gate fuses the token features with the entity embedding learned by our SGReader, and the gate value is computed from these two inputs together with the question.
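The difference from a standard gate is that the gate value depends on the question. The sketch below shows one way to realize this, assuming the gate is computed from dot products between the question vector and each input; that interaction form, the shared dimensionality of token features and entity embeddings, and the weights are assumptions for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conditional_gate(q_vec, token_feat, entity_vec, w_gate):
    """Gate whose value depends on how well each input matches the question."""
    features = [dot(q_vec, entity_vec), dot(q_vec, token_feat)]
    gamma = sigmoid(dot(w_gate, features))
    fused = [gamma * e + (1.0 - gamma) * f for e, f in zip(entity_vec, token_feat)]
    return fused, gamma


token_feat = [0.2, 0.2]
entity_vec = [1.0, 0.0]
w_gate = [2.0, -2.0]  # assumed weights: favor the input that matches the question
_, gamma_related = conditional_gate([1.0, 0.0], token_feat, entity_vec, w_gate)
_, gamma_unrelated = conditional_gate([0.0, 1.0], token_feat, entity_vec, w_gate)
```

The same token and entity receive different gate values under the two questions, which is the behavior a question-independent gate cannot provide.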
Entity Info Aggregation from Text Reading
Finally, we feed the knowledge-augmented inputs into the bi-LSTM and use the output token-level hidden states to calculate attention scores over the tokens, from which we obtain each document's representation. For a given entity, we simply aggregate the information by averaging the representations of all the documents that contain the entity.
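The aggregation can be sketched in two small functions. Using the question vector to score tokens and normalizing the scores with a softmax are assumptions; the averaging over linked documents follows the description above. All names and values are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def document_vector(q_vec, token_states):
    """Attention-pool token hidden states using the question vector."""
    logits = [dot(q_vec, h) for h in token_states]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    dim = len(token_states[0])
    return [sum(e / total * h[k] for e, h in zip(exps, token_states)) for k in range(dim)]


def entity_text_vector(entity, doc_entities, doc_vectors):
    """Average the vectors of all documents that mention the entity."""
    linked = [vec for ents, vec in zip(doc_entities, doc_vectors) if entity in ents]
    if not linked:
        return None  # entity appears in no retrieved document
    return [sum(col) / len(linked) for col in zip(*linked)]


# Toy document vectors and entity annotations (assumed values).
doc_vectors = [[1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]
doc_entities = [{"yale"}, {"yale", "harvard"}, {"harvard"}]
yale_vec = entity_text_vector("yale", doc_entities, doc_vectors)
```

Only the first two documents mention the entity, so the third document does not influence its text-based representation.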
3.3 Answer Prediction
With the entity representations learned from the KB subgraph and from text reading, we predict the probability of an entity being the answer by matching the query vector against the entity representations.
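A hedged sketch of the prediction step follows. It assumes the two entity representations are concatenated and matched to the query through a bilinear form followed by a sigmoid; the weight matrix and vectors are made-up values, not trained parameters.

```python
import math


def answer_probability(q_vec, kb_vec, text_vec, w_a):
    """Sigmoid of a bilinear match between the query and [kb_vec; text_vec]."""
    ent = kb_vec + text_vec  # list concatenation of the two representations
    projected = [sum(w * x for w, x in zip(row, ent)) for row in w_a]
    logit = sum(q * p for q, p in zip(q_vec, projected))
    return 1.0 / (1.0 + math.exp(-logit))


q_vec = [1.0, 0.0]
w_a = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]  # assumed 2x4 weights
p_match = answer_probability(q_vec, [1.0, 0.0], [1.0, 0.0], w_a)
p_mismatch = answer_probability(q_vec, [-1.0, 0.0], [-1.0, 0.0], w_a)
```

Each candidate entity is scored independently, which allows several entities to be predicted as answers to the same question.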
|Model|10% KB|30% KB|50% KB|100% KB|
|SGReader + KAReader (Ours)|33.6 / 18.9|42.6 / 27.1|52.7 / 36.1|67.2 / 57.3|
Our experiments are based on the WebQSP dataset Yih et al. (2016). To simulate the real-world scenarios, we test our models following the settings of Sun et al. (2018), where the KB is downsampled to different extents. For a fair comparison, the retrieved document set is the same as the previous work.
Baselines and Evaluation Key-Value (KV) Memory Network Miller et al. (2016) is a simple baseline that treats KB triples and documents as memory cells. Specifically, we consider its two variants, KV-KB and KV-KB+Text. The former is a KB-only model while the latter uses both KB and text. We also compare to the latest method GraftNet (GN) Sun et al. (2018), which treats documents as a special genre of nodes in KBs and utilizes graph convolution Kipf and Welling (2016) to aggregate the information. Similar to the KV-based baselines, we denote GN-KB as the KB-only version. Further, both GN-LF (late fusion) and GN-EF (early fusion) consider both KB and text. GN-LF treats the KB and the texts as two separate graphs and then ensembles the answer scores. GN-EF is the existing best single model, which treats the KB and the texts as a single heterogeneous graph and aggregates the evidence to predict a single answer score for each entity. F1 and Hits@1 are used for evaluation since multiple correct answers are possible.
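For reference, the two metrics can be computed per question as in the sketch below. How the predicted answer set is selected from the ranked list (here, simply the top two entities) is an assumption for illustration, as are the example entities.

```python
def hits_at_1(ranked, gold):
    """1.0 if the top-ranked prediction is a correct answer, else 0.0."""
    return 1.0 if ranked and ranked[0] in gold else 0.0


def f1_score(predicted, gold):
    """Set-level F1 between predicted and gold answer entities."""
    predicted, gold = set(predicted), set(gold)
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = {"ETH Zurich", "University of Zurich"}
ranked = ["ETH Zurich", "Yale", "University of Zurich"]
top1 = hits_at_1(ranked, gold)
f1 = f1_score(ranked[:2], gold)
```

Hits@1 only checks the single best prediction, while F1 rewards recovering every correct answer, which is why both are reported when a question can have several answers.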
Throughout our experiments, we use the 300-dimensional GloVe embeddings trained on the Common Crawl corpus. The hidden dimension of the LSTM and the dimension of the entity embeddings are both 100. We use the same pre-trained entity embeddings as Sun et al. (2018). For graph attention over the KB subgraph, we limit the maximum number of neighbors for each entity to 50. We clip gradients to a maximum norm of 1.0. We apply dropout with a rate of 0.2 on both word embeddings and LSTM hidden states. The maximum question length is set to 10 and the maximum document length is set to 50. For optimization, we apply label smoothing with a factor of 0.1 on the binary cross-entropy loss. During training, we use the Adam optimizer with a learning rate of 0.001.
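As an illustration of label smoothing on a binary cross-entropy loss, the sketch below uses one common formulation in which the hard target is pulled toward 0.5 by the smoothing factor. The exact variant used in the experiments is not stated, so this form and the function name are assumptions.

```python
import math


def smoothed_bce(prob, label, smoothing=0.1):
    """Binary cross-entropy against a smoothed target."""
    # With smoothing=0.1, a label of 1 becomes 0.95 and a label of 0 becomes 0.05.
    target = label * (1.0 - smoothing) + 0.5 * smoothing
    eps = 1e-12  # guards against log(0)
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


loss_calibrated = smoothed_bce(0.95, 1.0)
loss_overconfident = smoothed_bce(0.999999, 1.0)
```

Under the smoothed target, an extremely confident prediction incurs a higher loss than a prediction that matches the smoothed label, which discourages overconfident answer scores.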
4.2 Results and Analysis
We show the main results of different incomplete KB settings in Table 1. For reference, we also show the results under full KB settings (i.e., 100%, all of the required evidence is covered by KB). The row of SGReader shows the results of our model using only KB evidence. Compared to the previous KBQA methods (KV-KB and GN-KB), SGReader achieves better results in incomplete KB settings and competitive performance with the full KB. Here we do not compare with existing methods that utilize semantic parsing annotations Yih et al. (2016); Yu et al. (2017). It is worth noting that SGReader only needs one hop of graph propagation while the compared methods typically require multiple hops.
Augmenting the SGReader with our knowledge-aware reader (KAReader) results in consistent improvements in the settings with incomplete KBs. Compared to other baselines, although our model is built upon a stronger KB-QA base model, it achieves the largest absolute improvement. It is worth mentioning that our model is still a single model, yet it achieves results competitive with the existing ensemble model (GN-LF+EF). The results demonstrate the advantage of our knowledge-aware text reader.
|- w/o query reformulation|44.4|27.6|
|- w/o knowledge enhancement|45.2|27.0|
|- w/o conditional knowledge gate|44.4|27.0|
|1)||Question: Which airport to fly into Rome?|
|Groundtruth: Leonardo da Vinci-Fiumicino Airport (fb:m.01ky5r), Ciampino-G. B. Pastine International|
|SGReader: Italian Met Office Airport (fb:m.04fngkc)|
|SGReader + KAReader: Leonardo da Vinci-Fiumicino Airport (fb:m.01ky5r)|
|Missing knowledge of the incomplete KB: No airport info about Rome.|
|1)||Question: Where did George Herbert Walker Bush go to college?|
|Groundtruth: Yale (fb:m.08815)|
|SGReader: United States of America (fb:m.09c7w0)|
|SGReader + KAReader: Yale (fb:m.08815)|
|Missing knowledge of the incomplete KB: No college info about George Herbert Walker Bush.|
|2)||Question: When did Juventus win the champions league?|
|Groundtruth: 1996 UEFA Champions League Final (fb:m.02pt_57)|
|SGReader: 1996 UEFA Super Cup (fb:m.02rw0yt)|
|SGReader + KAReader: 1996 UEFA Champions League Final (fb:m.02pt_57)|
|Missing knowledge of the incomplete KB: UEFA Super Cup is not UEFA Champions League Final (fb:m.05nblxt)|
|2)||Question: What college did Albert Einstein go to?|
|Groundtruth: ETH Zurich (fb:m.01dyk8), University of Zurich (fb:m.01tpvt)|
|SGReader: Sri Krishnaswamy matriculation higher secondary school (fb:m.0127vh33)|
|SGReader + KAReader: ETH Zurich (fb:m.01dyk8)|
|Missing knowledge of the incomplete KB: the answer should be a college (fb:m.01y2hnl)|
|3)||Question: When is the last time the Denver Broncos won the Superbowl?|
|Groundtruth: Super Bowl XXXIII (fb:m.076y0)|
|SGReader: Super Bowl XXXIII (fb:m.076y0)|
|SGReader + KAReader: 1999 AFC Championship game (fb:m.0100z7bp)|
|3)||Question: What was Lebron James first team?|
|Groundtruth: Cleveland Cavaliers (fb:m.0jm7n)|
|SGReader: Cleveland Cavaliers (fb:m.0jm7n)|
|SGReader + KAReader: Toronto Raptors (fb:m.0jmcb)|
To study the effect of each KAReader component, we conduct an ablation analysis under the 30% KB setting (Table 2). We see that both query reformulation and knowledge enhancement are essential to the performance. Additionally, we find that the conditional gating mechanism proposed in §3.2 is important. When it is replaced with a standard gate function (see the row w/o conditional knowledge gate), the performance is even lower than that of the reader without knowledge enhancement, suggesting that our proposed gate function is crucial for the success of knowledge-aware text reading. A possible reason is that, without the question information, the gating mechanism might introduce some irrelevant and misleading knowledge.
In Table 3, there are two major categories of questions that can be better answered using our full model. In the first category, indicated by 1), the answer fact is missing in the KB, mainly because there are no links from the question entities to the answer entity. In these cases, the SGReader sometimes can predict an answer with a correct type, but the answers are mostly irrelevant to the question.
The second category, denoted as 2), indicates examples where the KB provides relevant information but does not cover some of the constraints on answers’ properties (e.g., answers’ entity types). In the two examples shown above, we can see that SGReader is able to give some reasonable answers but the answers do not satisfy the constraints indicated by the question.
Finally, when the KB is sufficient to answer a question, there are some cases where the KAReader introduces wrong answers into the top-ranked answer list. We list two examples at the bottom of Table 3. These newly included incorrect answers are usually relevant to the original questions but stem from noise in machine reading. These cases suggest that our concatenation-based knowledge aggregation still has some room for improvement, which we leave for future work.
We present a new QA model that operates over incomplete KBs and text documents to answer open-domain questions, and that yields consistent improvements over previous methods on the WebQSP benchmark with incomplete KBs. The results show that (1) with the graph attention technique, we can efficiently and accurately accumulate question-related knowledge for each KB entity in one pass over the KB subgraph; and (2) our designed gating mechanisms can successfully incorporate the encoded entity knowledge while processing the text documents. In future work, we will extend the proposed idea to other QA tasks with multimodal evidence, e.g., combining with symbolic approaches for visual QA Gan et al. (2017); Mao et al. (2019); Hu et al. (2019).
- Chen et al. (2017) Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. 2017. Reading wikipedia to answer open-domain questions. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, pages 1870–1879.
- Gan et al. (2017) Chuang Gan, Yandong Li, Haoxiang Li, Chen Sun, and Boqing Gong. 2017. Vqs: Linking segmentations to questions and answers for supervised attention in vqa and question-focused semantic segmentation. In ICCV, pages 1811–1820.
- Haveliwala (2002) Taher H Haveliwala. 2002. Topic-sensitive pagerank. In Proceedings of the 11th international conference on World Wide Web, pages 517–526. ACM.
- Hu et al. (2019) Ronghang Hu, Anna Rohrbach, Trevor Darrell, and Kate Saenko. 2019. Language-conditioned graph networks for relational reasoning. arXiv preprint arXiv:1905.04405.
- Kipf and Welling (2016) Thomas N. Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. CoRR, abs/1609.02907.
- Mao et al. (2019) Jiayuan Mao, Chuang Gan, Pushmeet Kohli, Joshua B Tenenbaum, and Jiajun Wu. 2019. The neuro-symbolic concept learner: Interpreting scenes, words, and sentences from natural supervision. arXiv preprint arXiv:1904.12584.
- Mihaylov and Frank (2018) Todor Mihaylov and Anette Frank. 2018. Knowledgeable reader: Enhancing cloze-style reading comprehension with external commonsense knowledge. In ACL 2018, pages 821–832.
- Miller et al. (2016) Alexander H. Miller, Adam Fisch, Jesse Dodge, Amir-Hossein Karimi, Antoine Bordes, and Jason Weston. 2016. Key-value memory networks for directly reading documents. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pages 1400–1409.
- Min et al. (2013) Bonan Min, Ralph Grishman, Li Wan, Chang Wang, and David Gondek. 2013. Distant supervision for relation extraction with an incomplete knowledge base. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 777–782.
- Seo et al. (2016) Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. 2016. Bidirectional attention flow for machine comprehension. arXiv preprint arXiv:1611.01603.
- Song et al. (2018) Linfeng Song, Yue Zhang, Zhiguo Wang, and Daniel Gildea. 2018. A graph-to-sequence model for amr-to-text generation. arXiv preprint arXiv:1805.02473.
- Sun et al. (2018) Haitian Sun, Bhuwan Dhingra, Manzil Zaheer, Kathryn Mazaitis, Ruslan Salakhutdinov, and William W. Cohen. 2018. Open domain question answering using early fusion of knowledge bases and text. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, pages 4231–4242. Association for Computational Linguistics.
- Veličković et al. (2017) Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903.
- Wang et al. (2018) Shuohang Wang, Mo Yu, Xiaoxiao Guo, Zhiguo Wang, Tim Klinger, Wei Zhang, Shiyu Chang, Gerry Tesauro, Bowen Zhou, and Jing Jiang. 2018. R^3: Reinforced ranker-reader for open-domain question answering. In Thirty-Second AAAI Conference on Artificial Intelligence.
- Xiong et al. (2017) Caiming Xiong, Victor Zhong, and Richard Socher. 2017. DCN+: mixed objective and deep residual coattention for question answering. CoRR, abs/1711.00106.
- Xu et al. (2018) Kun Xu, Lingfei Wu, Zhiguo Wang, and Vadim Sheinin. 2018. Graph2seq: Graph to sequence learning with attention-based neural networks. arXiv preprint arXiv:1804.00823.
- Yang and Mitchell (2017) Bishan Yang and Tom Mitchell. 2017. Leveraging knowledge bases in lstms for improving machine reading. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 1436–1446.
- Yih et al. (2014) Wen-tau Yih, Xiaodong He, and Christopher Meek. 2014. Semantic parsing for single-relation question answering. In ACL 2014, volume 2, pages 643–648.
- Yih et al. (2016) Wen-tau Yih, Matthew Richardson, Chris Meek, Ming-Wei Chang, and Jina Suh. 2016. The value of semantic parse labeling for knowledge base question answering. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), volume 2, pages 201–206.
- Yu et al. (2018) Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, and Quoc V. Le. 2018. Qanet: Combining local convolution with global self-attention for reading comprehension. CoRR, abs/1804.09541.
- Yu et al. (2017) Mo Yu, Wenpeng Yin, Kazi Saidul Hasan, Cícero Nogueira dos Santos, Bing Xiang, and Bowen Zhou. 2017. Improved neural relation detection for knowledge base question answering. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, pages 571–581.