
Towards an Engine for Lifelong Interactive Knowledge Learning in Human-Machine Conversations

Although chatbots have been very popular in recent years, they still have some serious weaknesses which limit the scope of their applications. One major weakness is that they cannot learn new knowledge during the conversation process, i.e., their knowledge is fixed beforehand and cannot be expanded or updated during conversation. In this paper, we propose to build a general knowledge learning engine for chatbots to enable them to continuously and interactively learn new knowledge during conversations. As time goes by, they become more and more knowledgeable and better and better at learning and conversation. We model the task as an open-world knowledge base completion problem and propose a novel technique called lifelong interactive learning and inference (LiLi) to solve it. LiLi works by imitating how humans acquire knowledge and perform inference during an interactive conversation. Our experimental results show LiLi is highly promising.





1 Introduction

Chatbots such as dialog and question-answering systems have a long history in AI and natural language processing. Early such systems were mostly built using markup languages such as AIML, handcrafted conversation generation rules, and/or information retrieval techniques Banchs and Li (2012); Ameixa et al. (2014); Lowe et al. (2015); Serban et al. (2015). Recent neural conversation models Vinyals and Le (2015); Xing et al. (2017); Li et al. (2017b) are even able to perform open-ended conversations. However, since they do not use explicit knowledge bases and do not perform inference, they often suffer from generic and dull responses Xing et al. (2017); Li et al. (2017a). More recently, Ghazvininejad et al. (2017) and Le et al. (2016) proposed to use knowledge bases (KBs) to help generate responses for knowledge-grounded conversation. However, one major weakness of all existing chat systems is that they do not explicitly or implicitly learn new knowledge in the conversation process. This seriously limits the scope of their applications. In contrast, we humans constantly learn new knowledge in our conversations. Even if some existing systems can use very large knowledge bases either harvested from a large data source such as the Web or built manually, these KBs still miss a large number of facts (knowledge) West et al. (2014). It is thus important for a chatbot to continuously learn new knowledge in the conversation process to expand its KB and to improve its conversation ability.

In recent years, researchers have studied the problem of KB completion, i.e., inferring new facts (knowledge) automatically from existing facts in a KB. KB completion (KBC) is defined as a binary classification problem: given a query triple (s, r, t), we want to predict whether the source entity s and target entity t can be linked by the relation r. However, existing approaches Lao et al. (2011, 2015); Bordes et al. (2011, 2013); Nickel et al. (2015); Mazumder and Liu (2017) solve this problem under the closed-world assumption, i.e., s, r and t are all assumed to already exist in the KB. This is a major weakness because it means that no query may contain unknown entities or relations. Due to this limitation, KBC is clearly not sufficient for knowledge learning in conversations: in a conversation, the user can say anything, which may contain entities and relations that are not already in the KB.
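The closed-world restriction above can be made concrete with a minimal sketch: a KB as a set of triples, and a check that a KBC query only mentions entities and relations already in the KB. All names here are illustrative, not part of LiLi.

```python
# A minimal sketch of the closed-world KBC setting described above.
# The KB is a set of (source, relation, target) triples; a KBC query
# asks whether a candidate triple should be added.

KB = {
    ("Obama", "BornIn", "USA"),
    ("David Cameron", "CitizenOf", "UK"),
}

def entities(kb):
    return {e for (s, _, t) in kb for e in (s, t)}

def relations(kb):
    return {r for (_, r, _) in kb}

def is_closed_world_query(query, kb):
    """Under the closed-world assumption, s, r and t must all
    already exist in the KB for the query to be well-formed."""
    s, r, t = query
    return s in entities(kb) and t in entities(kb) and r in relations(kb)

print(is_closed_world_query(("Obama", "CitizenOf", "USA"), KB))   # True
print(is_closed_world_query(("Obama", "CitizenOf", "Kenya"), KB)) # False
```

OKBC drops exactly this check: any of s, r or t may be absent from the KB.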

In this paper, we remove this assumption of KBC and allow s, r and t to all be unknown. We call the new problem open-world knowledge base completion (OKBC). OKBC generalizes KBC. Below, we show that solving OKBC naturally provides the ground for knowledge learning and inference in conversations. In essence, we formulate an abstract problem of knowledge learning and inference in conversations as a well-defined OKBC problem in the interactive setting.

From the perspective of knowledge learning in conversations, we can essentially extract two key types of information, true facts and queries, from user utterances. Queries are facts whose truth values need to be determined. (In this work we do not deal with subjective information such as beliefs and opinions, which we leave to future work.) Note that we do not study fact or relation extraction in this paper, as there is extensive work on the topic. (1) For a true fact, we incorporate it into the KB. Here we need to make sure that it is not already in the KB, which involves relation resolution and entity linking. After a fact is added to the KB, we may predict that some related facts involving some existing relations in the KB may also be true (these are not logical implications, as those can be automatically inferred). For example, if the user says "Obama was born in USA," the system may guess that (Obama, CitizenOf, USA) (meaning that Obama is a citizen of USA) could also be true based on the current KB. To verify this fact, it needs to solve a KBC problem by treating (Obama, CitizenOf, USA) as a query. This is a KBC problem because the fact (Obama, BornIn, USA) extracted from the original sentence has been added to the KB, so Obama and USA are now in the KB. If the KBC problem is solved, the system learns a new fact (Obama, CitizenOf, USA) in addition to the extracted fact (Obama, BornIn, USA). (2) For a query fact, e.g., (Obama, BornIn, USA) extracted from the user question "Was Obama born in USA?", we need to solve the OKBC problem if any of "Obama", "BornIn", or "USA" is not already in the KB.
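The two cases above (incorporate a true fact, then turn a guessed related fact into a KBC query) can be sketched as follows. The `guess_related` rule is hypothetical, standing in for the system's guessing of related facts, which this paper delegates to other sub-systems.

```python
# Illustrative sketch of the two cases above: adding an extracted true
# fact to the KB, then treating a guessed related fact as a KBC query.

KB = set()

def add_fact(kb, fact):
    """Incorporate a true fact if it is not already in the KB."""
    if fact not in kb:
        kb.add(fact)

def guess_related(fact):
    """Hypothetical rule: a BornIn fact suggests a CitizenOf query."""
    s, r, t = fact
    if r == "BornIn":
        return (s, "CitizenOf", t)
    return None

add_fact(KB, ("Obama", "BornIn", "USA"))
query = guess_related(("Obama", "BornIn", "USA"))
print(query)  # ('Obama', 'CitizenOf', 'USA') -- to be verified by solving KBC
```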

We can see that OKBC is the core of a knowledge learning engine for conversation. Thus, in this paper, we focus on solving it. We assume that other tasks such as fact/relation extraction and resolution and guessing of related facts of an extracted fact are solved by other sub-systems.

We solve the OKBC problem by mimicking how humans acquire knowledge and perform reasoning in an interactive conversation. Whenever we encounter an unknown concept or relation while answering a query, we perform inference using our existing knowledge. If our knowledge does not allow us to draw a conclusion, we typically ask questions of others to acquire the related knowledge and use it in inference. The process typically involves an inference strategy (a sequence of actions) that interleaves processing and interactive actions. A processing action can be the selection of related facts, deriving an inference chain, etc., which advances the inference process. An interactive action can be deciding what to ask, formulating a suitable question, etc., which enables interaction. The process helps grow our knowledge over time, and the gained knowledge enables us to communicate better in the future. We call this lifelong interactive learning and inference (LiLi). Lifelong learning is reflected by the facts that the newly acquired facts are retained in the KB and used in inference for future queries, and that the accumulated knowledge, including the updated KB and past inference performance, is leveraged to guide future interaction and learning. LiLi should have the following capabilities:

  1. to formulate an inference strategy for a given query that embeds processing and interactive actions.

  2. to learn interaction behaviors (deciding what to ask and when to ask the user).

  3. to leverage the acquired knowledge in the current and future inference process.

  4. to perform 1, 2 and 3 in a lifelong manner for continuous knowledge learning.

This setting is ideal for many NLP applications like dialog and question-answering systems that naturally provide the scope for human interaction and demand real-time inference.

LiLi starts with the closed-world KBC approach of path ranking (PR) Lao et al. (2011); Gardner and Mitchell (2015) (we choose PR due to its high interpretability and better performance than latent-feature methods Wang et al. (2016a); Toutanova (2015)) and extends it in a major way to open-world knowledge base completion (OKBC). For a relation r, PR works by enumerating paths (except the single-link path r itself) between entity pairs linked by r in the KB and using them as features to train a binary classifier to predict whether a query (s, r, t) should be in the KB. Here, a path between two entities is a sequence of relations linking them. In our work, we adopt the latest PR method, C-PR Mazumder and Liu (2017), and extend it to make it work in the open-world setting. C-PR enumerates paths by performing bidirectional random walks over the KB graph while leveraging the context of the source-target entity pair. We also adopt and extend the compositional vector space model Neelakantan et al. (2015); Das et al. (2016) with continual learning capability for prediction.
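The path-as-feature idea can be sketched as follows. C-PR's bidirectional, context-guided random walks are reduced here to a plain depth-bounded search for illustration; graph construction with inverse edges follows standard PR practice.

```python
# A simplified sketch of path enumeration in path ranking (PR): relation
# paths linking an entity pair serve as binary features for a classifier.
from collections import defaultdict

def build_graph(triples):
    g = defaultdict(list)
    for s, r, t in triples:
        g[s].append((r, t))
        g[t].append((r + "^-1", s))  # inverse edge, as in standard PR
    return g

def enumerate_paths(graph, src, dst, query_rel, max_len=3):
    """Return relation sequences linking src to dst, excluding the
    single-link path consisting of the query relation itself."""
    paths, stack = [], [(src, [])]
    while stack:
        node, rels = stack.pop()
        if node == dst and rels and rels != [query_rel]:
            paths.append(tuple(rels))
            continue
        if len(rels) < max_len:
            for r, nxt in graph[node]:
                stack.append((nxt, rels + [r]))
    return paths

triples = [("Obama", "BornIn", "Honolulu"),
           ("Honolulu", "CapitalOfState", "Hawaii"),
           ("Hawaii", "StateOf", "USA")]
g = build_graph(triples)
print(enumerate_paths(g, "Obama", "USA", "CitizenOf"))
# [('BornIn', 'CapitalOfState', 'StateOf')]
```

Each enumerated relation sequence becomes one binary feature of the entity pair; a classifier over these features then predicts whether (s, r, t) holds.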

User: (Obama, CitizenOf, USA)? “Is Obama a citizen of USA?” [Query]
LiLi: I do not know what "CitizenOf" means. Can you provide me an example? [Ask for Clue]
User: (David Cameron, CitizenOf, UK). “David Cameron is a citizen of UK.” [SF1]
LiLi: Got it. Can you tell me how “Obama” and “Honolulu” are related? [CLQ]
User: (Obama, BornIn, Honolulu). “Obama was born in Honolulu.” [SF2]
LiLi: Got it. Can you also tell me how "Honolulu" and "Hawaii" are related? [MLQ]
User: (Honolulu, CapitalOfState, Hawaii). “Honolulu is the state capital of Hawaii.” [SF3]
LiLi: (Obama, CitizenOf, USA) is correct. [Answer]

Figure 1: An example of interactive inference and learning. Note that LiLi only works with triples. Each triple above is assumed to be extracted from the sentence following it.

Given an OKBC query (s, r, t) (e.g., "(Obama, CitizenOf, USA)", which asks whether Obama is a citizen of USA), LiLi interacts with the user (if needed) by dynamically formulating questions (see the interaction example in Figure 1, which will be further explained in §3) and leverages the interactively acquired knowledge (supporting facts (SFs) in the figure) for continued inference. Note that the user query and responses are in triples, as we are not building a conversation system but a knowledge acquisition system. The query may come from a user or from a system (e.g., a question-answering system, or a conversation system that has extracted a candidate fact and wants to verify it before adding it to the KB). This paper does not study the case where the query fact is already in the KB, which is easy to verify. Also note that, as our work focuses on knowledge learning and inference rather than conversation modeling, we simply use template-based question generation to model LiLi's interaction with the user.

To perform inference, LiLi formulates a query-specific inference strategy and executes it. We design LiLi in a Reinforcement Learning (RL) setting that performs sub-tasks like formulating and executing the strategy, training a prediction model for inference, and retaining knowledge for future use. To the best of our knowledge, our work is the first to address the OKBC problem and to propose an interactive learning mechanism to solve it in a continuous or lifelong manner. We empirically verify the effectiveness of LiLi on two standard real-world KBs: Freebase and WordNet. Experimental results show that LiLi is highly effective in terms of its predictive performance and strategy formulation ability.

2 Related Work

We are not aware of any existing knowledge learning system that can learn new knowledge in the conversation process. This section thus discusses other related work.

Among existing KB completion approaches, Neelakantan et al. (2015) extended the vector space model for zero-shot KB inference. However, the model cannot handle unknown entities and can only work on a fixed set of unknown relations with known embeddings. Recently, Shi and Weninger (2017) proposed a method that uses an external text corpus to perform inference on unknown entities. However, the method cannot handle unknown relations. Thus, these methods are not suitable for our open-world setting. None of the existing KB inference methods perform interactive knowledge learning like LiLi.

NELL Mitchell et al. (2015) continuously updates its KB using facts extracted from the Web. Our task is very different as we do not do Web fact extraction (which is also useful). We focus on user interactions in this paper.

Our work is related to interactive language learning (ILL) Wang et al. (2016b, 2017), but these works are not about KB completion. The work in Li et al. (2016b) allows a learner to ask questions in dialogue. However, that work used RL only to learn whether to ask the user or not; the "what to ask" aspect was manually designed by modeling synthetic tasks. LiLi formulates query-specific inference strategies which embed interaction behaviors. Also, no existing dialogue systems Vinyals and Le (2015); Li et al. (2016a); Bordes and Weston (2016); Weston (2016); Zhang et al. (2017) employ lifelong learning to train prediction models by using information/knowledge retained in the past.

Our work is related to general lifelong learning in Chen and Liu (2016); Ruvolo and Eaton (2013); Chen and Liu (2014, 2013); Bou Ammar et al. (2015); Shu et al. (2017). However, each of these learns only one type of task, e.g., supervised, topic modeling or reinforcement learning (RL) tasks. None of them is suitable for our setting, which involves interleaving of RL, supervised and interactive learning. More details about lifelong learning can be found in the book Chen and Liu (2016).

3 Interactive Knowledge Learning (LiLi)

We design LiLi as a combination of two interconnected models: (1) an RL model that learns to formulate a query-specific inference strategy for performing the OKBC task, and (2) a lifelong prediction model that predicts whether a triple should be in the KB, which is invoked by an action while executing the inference strategy and is learned for each relation, as in C-PR. The framework improves its performance over time through user interaction and knowledge retention. Compared to existing KB inference methods, LiLi overcomes the following three challenges for OKBC:

1. Mapping open-world to closed-world. Being a closed-world method, C-PR cannot extract path features and learn a prediction model when any of s, r or t is unknown. LiLi solves this problem through interactive knowledge acquisition. If r is unknown, LiLi asks the user to provide a clue (an example of r). If s or t is unknown, LiLi asks the user to provide a link (relation) to connect the unknown entity with an existing entity (automatically selected) in the KB. We refer to such a query as a connecting link query (CLQ). The acquired knowledge reduces OKBC to KBC and makes the inference task feasible.

2. Sparseness of KB. A main issue of all PR methods like C-PR is the connectivity of the KB graph. If there is no path connecting s and t in the graph, the path enumeration of C-PR gets stuck and inference becomes infeasible. In such cases, LiLi uses a template relation ("@-?-@") as a missing-link marker to connect entity pairs and continues feature extraction. A path containing "@-?-@" is called an incomplete path. Thus, the extracted feature set contains both complete (no missing link) and incomplete paths. Next, LiLi selects an incomplete path from the feature set and asks the user to provide a link for path completion. We refer to such a query as a missing link query (MLQ).

3. Limitation in user knowledge. If the user is unable to respond to MLQs or CLQs, LiLi uses a guessing mechanism (discussed later) to fill the gap. This enables LiLi to continue its inference even if the user cannot answer a system question.

3.1 Components of LiLi

As lifelong learning needs to retain knowledge learned from past tasks and use it to help future learning Chen and Liu (2016), LiLi uses a Knowledge Store (KS) for knowledge retention. KS has four components: (i) Knowledge Graph (the KB): initialized with the base KB triples (see §4) and updated over time with the acquired knowledge. (ii) Relation-Entity Matrix: a sparse matrix with relations as rows and entity pairs as columns, used by the prediction model. Given a triple (s, r, t) in the KB, the entry for [r, (s, t)] is set to 1, indicating that r occurs for the pair (s, t). (iii) Task Experience Store: stores the predictive performance of LiLi on past learned tasks in terms of the Matthews correlation coefficient (MCC), which measures the quality of binary classification. Each relation is a task; if the stored MCC of one task is higher than that of another, we say C-PR has learned the former well compared to the latter. (iv) Incomplete Feature DB: stores the frequency of each incomplete path as a tuple (relation, entity pair, path) and is used in formulating MLQs; a stored count of n means LiLi has extracted that incomplete path n times involving that entity pair for that query relation.
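The four KS components above can be sketched as a small data structure. Field names are illustrative, not LiLi's actual identifiers.

```python
# A minimal sketch of the four Knowledge Store components described above.
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class KnowledgeStore:
    # (i) Knowledge Graph: the KB as a set of (s, r, t) triples
    graph: set = field(default_factory=set)
    # (ii) Relation-Entity Matrix: relation -> set of entity pairs
    rel_entity: dict = field(default_factory=lambda: defaultdict(set))
    # (iii) Task Experience Store: relation -> MCC of its past model
    task_experience: dict = field(default_factory=dict)
    # (iv) Incomplete Feature DB: (rel, entity pair, path) -> frequency
    incomplete_feats: dict = field(default_factory=lambda: defaultdict(int))

    def add_triple(self, s, r, t):
        self.graph.add((s, r, t))
        self.rel_entity[r].add((s, t))  # sparse matrix entry set to 1

ks = KnowledgeStore()
ks.add_triple("Obama", "BornIn", "Honolulu")
print(("Obama", "Honolulu") in ks.rel_entity["BornIn"])  # True
```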

Table 1: Parameters of LiLi.

- learning rate of the Q-learning agent
- discount factor of the Q-learning agent
- number of complete path features required: if [ILO] = 0 and the feature set contains at least this many complete features, the feature set is considered complete and [CPF] is set to 1
- interaction limit: the maximum number of times LiLi is allowed to ask the user per query
- maximum path length for C-PR
- number of random walks per query for C-PR
- low and high contextual similarity thresholds
- rank of the truncated SVD
- clue acquisition rate
- past task selection rate

Table 2: State bits and their meanings.

- Query entities and relation searched: whether the query source (s) and target (t) entities and the query relation (r) have been searched in the KB or not.
- SEF (Source Entity Found): whether the source entity (s) has been found in the KB or not.
- TEF (Target Entity Found): whether the target entity (t) has been found in the KB or not.
- QRF (Query Relation Found): whether the query relation (r) has been found in the KB or not.
- CLUE (Clue bit set): whether the query is a clue or not.
- ILO (Interaction Limit Over): whether the interaction limit is over for the query or not.
- PFE (Path Features Extracted): whether path feature extraction has been done or not.
- NEFS (Non-Empty Feature Set): whether the extracted feature set is non-empty or empty.
- CPF (Complete Path Found): whether the extracted path features are complete or not.
- INFI (Inference Invoked): whether the inference instruction has been invoked or not.
Table 3: Actions and their descriptions.

- Search the source (s) and target (t) entities and the query relation (r) in the KB. [processing]
- Ask the user to provide an example/clue for the query relation r. [interactive]
- Ask the user to provide a missing link for path feature completion (MLQ). [interactive]
- Ask the user to provide the connecting link for augmenting a new entity into the KB (CLQ). [interactive]
- Extract path features between the source (s) and target (t) entities using C-PR. [processing]
- Store the query data instance in the data buffer and invoke the prediction model for inference. [processing]

The RL model learns even after training whenever it encounters an unseen state (in testing) and thus gets updated over time. KS is updated continuously as a result of LiLi's execution and takes part in future learning. The prediction model uses lifelong learning (LL), where we transfer knowledge (parameter values) from the model of the most similar past task to help learn the current task. Similar tasks are identified by factorizing the Relation-Entity Matrix and computing a task similarity matrix. Besides LL, LiLi uses the Task Experience Store to identify poorly learned past tasks and acquire more clues for them to improve its skill set over time.

LiLi also uses a stack, called the Inference Stack, to hold the query and its state information for RL. LiLi always processes the stack top. Clues from the user are pushed onto the stack on top of the query during strategy execution and processed first. Thus, the prediction model for the query relation is learned before inference is performed on the query, transforming OKBC into a KBC problem. Table 1 shows the parameters of LiLi used in the following sections.

3.2 Working of LiLi

Given an OKBC query (s, r, t), we represent it as a data instance. The instance consists of the query triple, the interaction limit set for the query, an experience list storing the MDP transition history of the query in RL, the mode of the instance, denoting whether it is a training, validation, evaluation or clue instance, and the feature set, which contains both complete and incomplete path features. Given a data instance, LiLi initializes as follows: it sets the initial state (based on the query, explained later), pushes the query tuple and state onto the Inference Stack, and feeds the stack top to the RL model for strategy formulation.

Inference Strategy Formulation. We view the strategy formulation problem as learning to play an inference game, where the goal is to formulate a strategy that "makes the inference task possible". For PR methods, inference is possible iff (1) r becomes known to the KB (by acquiring clues when r is unknown) and (2) path features are extracted between s and t (which in turn requires s and t to be known to the KB). If these conditions are met at the end of an episode of the game (when strategy formulation finishes for a given query), LiLi wins; it then trains the prediction model for r and uses it for inference.

LiLi's strategy formulation is modeled as a Markov Decision Process (MDP) with finite state and action spaces. A state consists of 10 binary state variables (Table 2), each of which keeps track of the result of an action taken by LiLi and thus records the progress made in the inference process so far. The initial state has all state bits set to 0. If the data instance (query) is a clue, the [CLUE] bit is set to 1. The action space consists of 6 actions (Table 3); the search, feature-extraction and inference actions are processing actions, and the clue, MLQ and CLQ actions are interactive actions. Whenever the inference action is executed, the MDP reaches the terminal state. Given an action a in state s, if a is invalid in s (performing a in s is meaningless and does not advance reasoning, like choosing repetitive processing actions during training by random exploration) or the objective of a is unsatisfied, RL receives a negative reward (empirically set); otherwise it receives a positive reward.

Unlike existing RL-based interactive or active learning Li et al. (2016b); Woodward and Finn (2017), the user does not provide feedback to guide the learning process of LiLi. Rather, the RL model uses an internal feedback mechanism to self-learn its optimal policy. This is analogous to learning by self-realization in humans: when we try to solve a problem, we often formulate a strategy and refine it ourselves based on whether we can derive the answer without external guidance. Likewise, here the RL model gets feedback based on whether it is able to advance the reasoning process or not. We use Q-learning Watkins and Dayan (1992) with an ε-greedy strategy to learn the optimal policy for training the RL model. Note that the inference strategy is independent of the KB type and of the correctness of prediction. Thus, the RL model is trained only once from scratch (and reused thereafter for other KBs), independently of the prediction model.

Sometimes the training dataset may not be enough to learn the optimal policy for all states. Thus, encountering an unseen state during testing can leave the RL model clueless about which action to take. Given a state, whenever an invalid action is chosen, LiLi remains in that state. So, if the state remains unchanged beyond a set number of steps, it implies LiLi has encountered a fault (an unseen state). The RL model then instantly switches to the training mode and randomly explores the action space to learn the optimal action (fault-tolerant learning). While exploring, the model chooses the inference action only when it has tried all other actions, to avoid an abrupt end of the episode.
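The tabular Q-learning update with ε-greedy selection used for strategy formulation can be sketched as follows; states stand in for the 10-bit vectors of Table 2, and the α, γ, ε values and action names are illustrative, not LiLi's actual settings.

```python
# Sketch of tabular Q-learning with epsilon-greedy action selection.
import random

def epsilon_greedy(Q, state, actions, eps=0.1):
    if state not in Q or random.random() < eps:
        return random.choice(actions)          # explore
    return max(actions, key=lambda a: Q[state].get(a, 0.0))  # exploit

def q_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    Q.setdefault(s, {})
    best_next = max((Q.get(s_next, {}).get(a2, 0.0) for a2 in actions),
                    default=0.0)
    q = Q[s].get(a, 0.0)
    Q[s][a] = q + alpha * (reward + gamma * best_next - q)

Q = {}
actions = ["search", "ask_clue", "ask_mlq", "ask_clq", "extract", "infer"]
s = (0,) * 10                      # all state bits 0 (initial state)
a = epsilon_greedy(Q, s, actions)
q_update(Q, s, a, reward=-1.0, s_next=s, actions=actions)
print(Q[s][a])  # -0.1 for a fresh table
```

The fault-tolerant behavior described above corresponds to forcing the explore branch whenever the state fails to change for too many consecutive steps.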

Execution of Actions. At any given point in time, let the stack top hold the current query and state, let a be the chosen action, and consider the current versions of the KS components. If a is invalid in the current state, LiLi adds the experience tuple (state, action, reward, next state) to the experience list, updates the stack top accordingly and returns it to the RL model. If a is valid, LiLi first sets the next state and performs a sequence of operations based on a (discussed below). Unless specified otherwise, LiLi always monitors the interaction limit; if it reaches 0, LiLi sets the [ILO] bit, and whenever LiLi asks the user a query, the interaction limit is decremented by 1. Once the operations end, LiLi updates the stack top with the new state and returns it to the RL model for choosing the next action.

In the search action, LiLi searches for s, t and r in the KB and sets the appropriate bits in the state (see Table 2). If r was unknown before and has just been added to the KB, or if r is among the bottom fraction of past tasks ranked by performance, LiLi randomly decides, with the clue acquisition rate as probability, to acquire a clue for r. (LiLi selects the fraction of tasks from the Task Experience Store on which it has performed poorly, evaluated on validation data in our case, and acquires clues for them with this probability while processing test data; this helps LiLi continuously improve its skills on past poorly learned tasks.) If the instance is a clue and its triple is valid, LiLi updates the KS with the triple: the triple (and its inverse) is added to the Knowledge Graph and the Relation-Entity Matrix, and the corresponding found bits are set to 1.

In the clue action, LiLi asks the user to provide a clue (a +ve instance) for r and corrupts the source and target of the clue, one at a time, to generate -ve instances by sampling nodes from the KB. These instances help in training the prediction model for r when the inference action is executed.

In the MLQ action, LiLi selects an incomplete path from the feature set to formulate an MLQ, choosing the path that is most frequently observed for the query relation and whose entity pair has high contextual similarity Mazumder and Liu (2017). If the contextual similarity of an entity pair is high, the pair is more likely to possess a relation, and so it is a good candidate for formulating an MLQ. When the user does not respond to an MLQ (or to a CLQ in the connecting-link action), the guessing mechanism is used, which works as follows: since the contextual similarity of entity pairs is highly correlated with their class labels Mazumder and Liu (2017), LiLi divides the similarity range [-1, 1] into three segments using a low and a high similarity threshold, and replaces the missing link with a guessed label ("@-RelatedTo-@", "@-LooselyRelatedTo-@" or "@-NotRelatedTo-@", depending on which segment the similarity falls in) to make the path complete.
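The guessing mechanism can be sketched as a threshold rule over the similarity range; the threshold values here and the exact mapping of segments to labels are an assumed (but plausible) reading of the description above.

```python
# Sketch of the guessing mechanism: when the user cannot answer an MLQ
# or CLQ, the missing link is guessed from the contextual similarity of
# the entity pair, segmented by low/high thresholds (values illustrative).
def guess_link(sim, tau_low=0.2, tau_high=0.6):
    if sim > tau_high:
        return "@-RelatedTo-@"
    elif sim > tau_low:
        return "@-LooselyRelatedTo-@"
    return "@-NotRelatedTo-@"

print(guess_link(0.8))   # @-RelatedTo-@
print(guess_link(0.4))   # @-LooselyRelatedTo-@
print(guess_link(-0.5))  # @-NotRelatedTo-@
```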

In the connecting-link action, LiLi asks CLQs for connecting the unknown entities s and/or t with the KB by selecting the most contextually relevant node (with respect to the unknown entity) from the KB (if the number of nodes in the KB is very large, a candidate set is sampled for computing relevance). We adopt the contextual relevance idea of Mazumder and Liu (2017), which is computed using word embeddings Mikolov et al. (2013). Although s and t may be unknown to the KB, to avoid unnecessary complexity we assume that LiLi has access to embedding vectors for all entities (known and unknown) in our datasets; in practice, the embedding model can be updated continuously by fetching documents from the Web for unknown entities.

In the feature-extraction action, LiLi extracts path features between (s, t) and updates the Incomplete Feature DB with the incomplete features. LiLi always trains the prediction model with complete features, and once the required number of complete features is reached or the interaction limit is over, LiLi stops asking MLQs. Thus, LiLi always monitors the state to check these requirements and sets the bits to control interactions.

In the inference action, if LiLi wins the episode, it adds the instance to one of the data buffers based on its mode: training and clue instances are added to the training buffer, and validation and evaluation instances populate the validation and evaluation buffers similarly. For an evaluation instance, LiLi invokes the prediction model. (We invoke the prediction model only when all instances for the relation have been populated in the data buffers, to enable batch processing.)

Lifelong Relation Prediction. Given a relation r, LiLi trains a prediction model for r using the training and validation buffers. For an unknown r, the clue instances are stored in both buffers; LiLi populates the validation buffer by taking 10% (see §4) of the instances from the training buffer and starts the training. LiLi uses an LSTM Hochreiter and Schmidhuber (1997) to compose the vector representation of each path feature and a vector representation of r. It then computes the prediction value as the sigmoid of the mean cosine similarity of all feature vectors with the relation vector, and maximizes the log-likelihood of the training data. Once the model is trained, LiLi updates the Task Experience Store with its MCC. We also train an inverse model for r, by reversing the path features in the buffers, which helps in lifelong learning (discussed below). Unlike Neelakantan et al. (2015); Das et al. (2016), while predicting the label of a test instance, we compute a relation-specific prediction threshold for r on the validation data and infer the instance as +ve if its prediction value exceeds the threshold and -ve otherwise; the threshold is computed from the mean prediction values of all +ve and all -ve examples in the validation buffer.
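The scoring and decision rule above can be sketched as follows. The LSTM composition is abstracted away (feature vectors are given), and taking the threshold as the midpoint of the mean +ve and mean -ve validation scores is an assumed form of the aggregation.

```python
# Sketch of the relation-specific decision rule: prediction value is the
# sigmoid of the mean cosine similarity between composed path-feature
# vectors and the relation vector; the threshold comes from validation.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def predict(feature_vecs, rel_vec):
    mean_cos = sum(cosine(f, rel_vec) for f in feature_vecs) / len(feature_vecs)
    return 1.0 / (1.0 + math.exp(-mean_cos))  # sigmoid

def threshold(pos_scores, neg_scores):
    """Assumed aggregation: midpoint of mean +ve and mean -ve scores."""
    mu_pos = sum(pos_scores) / len(pos_scores)
    mu_neg = sum(neg_scores) / len(neg_scores)
    return (mu_pos + mu_neg) / 2.0

score = predict([[1.0, 0.0], [0.6, 0.8]], [1.0, 0.0])
print(score > threshold([0.9, 0.8], [0.2, 0.3]))  # True -> infer as +ve
```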

Models trained on only a few examples (e.g., clues acquired for an unknown r) with randomly initialized weights often perform poorly due to underfitting. Thus, we transfer knowledge (weights) from the most similar past task in a lifelong learning manner Chen and Liu (2016). LiLi finds the most similar past task for r as follows: it computes the truncated SVD of the Relation-Entity Matrix and then the task similarity matrix, which provides the similarity between any two relations. LiLi then chooses, as the source for weight transfer, the most similar relation among all relations (and their inverses) for which it has already learned a prediction model. If no such relation exists, or the similarity is too low, LiLi randomly initializes the weights for r and proceeds with training; otherwise, it uses the source model's weights as initial weights and fine-tunes with a low learning rate.
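The task-similarity computation can be sketched with a toy relation-entity matrix: factorize with truncated SVD, embed each relation in k dimensions, and compare relation rows by cosine similarity. The matrix and k are toy values.

```python
# Sketch of selecting the most similar past task via truncated SVD of
# the relation-entity matrix M.
import numpy as np

def task_similarity(M, k=2):
    U, S, _ = np.linalg.svd(M, full_matrices=False)
    emb = U[:, :k] * S[:k]                  # k-dim relation embeddings
    norm = np.linalg.norm(emb, axis=1, keepdims=True)
    emb = emb / np.clip(norm, 1e-12, None)
    return emb @ emb.T                      # cosine similarity between tasks

M = np.array([[1, 1, 0, 0],    # relation r1
              [1, 1, 1, 0],    # relation r2 (overlaps with r1)
              [0, 0, 0, 1]],   # relation r3 (disjoint entity pairs)
             dtype=float)
sim = task_similarity(M)
print(sim[0, 1] > sim[0, 2])  # True: r1 is closer to r2 than to r3
```

The most similar past relation for a new task is then the argmax over the corresponding row of the similarity matrix, restricted to relations with trained models.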

A Running Example. Considering the example shown in Figure 1, LiLi works as follows. First, LiLi executes the search action and detects that the source entity "Obama" and the query relation "CitizenOf" are unknown. Thus, LiLi executes the clue action to acquire a clue (SF1) for "CitizenOf" and pushes the clue (+ve example) and two generated -ve examples onto the Inference Stack. Once the clues are processed and a prediction model is trained for "CitizenOf" by formulating separate strategies for them, LiLi becomes aware of "CitizenOf". Now, as the clues have been popped from the stack, the query resurfaces at the stack top and its strategy formulation resumes. Next, LiLi asks the user to provide a connecting link for "Obama" by performing the CLQ action. With the query entities and relation now known, LiLi enumerates paths between "Obama" and "USA" by performing the feature-extraction action. Suppose an extracted path has a missing link between an entity pair; LiLi asks the user to fill the link by performing the MLQ action and then extracts the complete feature. The feature set is fed to the prediction model and inference is made as a result of the inference action. The formulated inference strategy is the executed sequence of these actions.

4 Experiments

We now evaluate LiLi in terms of its predictive performance and strategy formulation abilities.

Table 4: Dataset statistics [kwn = known, unk = unknown].

                                        Freebase (FB)        WordNet (WN)
  # Rels (original / base)              1,345 / 1,248        18 / 14
  # Entities (original / base)          13,871 / 12,306      13,595 / 12,363
  # Triples (original / base)           854,362 / 529,622    107,146 / 58,946
  # Test Rels (kwn / unk)               50 (38 / 12)          18 (14 / 4)
  Avg. # train / valid / test
    instances per rel.                  1715 / 193 / 557      994 / 109 / 326

  Entity statistics (Avg. %)            train  valid  test    train  valid  test
  only source (s) unk                   15.5   15.8   15.6    12.4   10.4   19.0
  only target (t) unk                   13.0   12.7   13.4    14.2   15.6   13.8
  both s and t unk                      2.9    3.3    2.8     3.6    3.6    6.2

Data: We use two standard datasets (see Table 4): (1) Freebase FB15k, and (2) WordNet. Using each dataset, we build a fairly large graph and use it as the original KB for evaluation. We also augment it with an inverse triple (t, r⁻¹, s) for each (s, r, t), following existing KBC methods.

Table 5: Inference strategies formulated by LiLi (ordered by frequency), with coverage values C = 0.47, 1.0, 1.0, 0.97, and 1.0.
                        Avg. +ve F1 Score                               Avg. MCC
KB  Rel type  Single  Sep     F-th    BG      w/oPTS  LiLi     Single  Sep     F-th    BG      w/oPTS  LiLi
FB  kwn       0.3796  0.5741  0.5069  0.5643  0.5547  0.5859   0.0937  0.2638  0.2382  0.2443  0.2573  0.2763
    unk       0.5477  0.5425  0.4876  0.5398  0.5421  0.5567   0.2175  0.1752  0.1802  0.1664  0.1748  0.2119
    all       0.4199  0.5665  0.5023  0.5584  0.5517  0.5789   0.1234  0.2425  0.2243  0.2256  0.2375  0.2609
WN  kwn       0.3846  0.5851  0.5817  0.5554  0.6083  0.6343   0.2494  0.3838  0.3603  0.2980  0.4159  0.4096
    unk       0.5732  0.5026  0.5861  0.5694  0.5539  0.5871   0.3348  0.2501  0.3123  0.3148  0.2667  0.3387
    all       0.4265  0.5668  0.5827  0.5586  0.5962  0.6238   0.2684  0.3541  0.3496  0.3017  0.3828  0.3939
Table 6: Comparison of predictive performance of various versions of LiLi [kwn = known, unk = unknown, all = overall].

Parameter Settings. Unless specified otherwise, the parameters of LiLi (see Table 1) are set empirically. For training the RL-model with an ε-greedy exploration strategy, we use 50,000 pre-training steps. We used the Keras deep learning library to implement and train the prediction model, with batch size 128, a maximum of 150 training epochs, dropout 0.2, 300 hidden units, embedding size 300, and learning rate 5e-3, which is reduced gradually on plateau with factor 0.5 and patience 5. The Adam optimizer and early stopping were used in training. We also shuffle the training data in each epoch and adjust class weights inversely proportional to the class frequencies in the training set.
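The inverse-frequency class weighting can be computed as below. The normalization by the number of classes is one common convention (the same as scikit-learn's "balanced" mode) and is our assumption, since the paper does not state the exact scheme:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Class weights inversely proportional to class frequencies,
    normalized so that a perfectly balanced dataset gets weight 1.0
    for every class (assumed convention)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}
```

With one +ve and two -ve examples per clue, the minority +ve class receives a proportionally larger weight, counteracting the 1:2 imbalance.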

Labeled Dataset Generation and Simulated User Creation. We create a simulated user for each KB to evaluate LiLi, since crowdsourcing-based training and evaluation is expensive and time-consuming when user interaction is needed during training. From the original KB, we create the labeled datasets, the simulated user’s knowledge base, and the base KB, which is used as LiLi’s initial KB graph.

We followed Mazumder and Liu (2017) for labeled dataset generation. For Freebase, we found 86 relations with a sufficient number of triples and randomly selected 50 from various domains. We randomly shuffle the list of 50 relations, select 25% of them as unknown relations, and treat the rest (75%) as known relations. For each known relation, we randomly shuffle its list of distinct triples, choose 1000 triples, and split them into 60% training, 10% validation, and 20% test sets. The remaining 10%, along with the leftover triples (those not included in the list of 1000), are added to the simulated user’s KB. For each unknown relation, we remove all of its triples from the KB and add them to the user’s KB; in this process, we also randomly choose 20% of its triples as test instances, which are excluded from the user’s KB. Note that the user’s KB now holds at least 10% of the chosen triples for each relation (known and unknown), so the user is always able to provide clues in both cases. For each labeled dataset, we randomly choose 10% of the entities present in the dataset triples, remove the triples involving those entities from the remaining KB, and add them to the user’s KB. At this point, the KB is reduced to the base KB, which is used as LiLi’s initial KB graph. The dataset statistics in Table 4 show that the base KB (60% of the original triples) is highly sparse compared to the original KB, which makes the inference task much harder. The WordNet dataset being small, we select all 18 relations for evaluation and create the labeled dataset, user KB, and base KB following the Freebase procedure. Although the user can provide clues 100% of the time, it often cannot respond to MLQs and CLQs due to a lack of the required triples. Thus, we further enrich the user’s KB with external KB triples: given the fair amount of entity overlap, we choose NELL for Freebase and ConceptNet for WordNet.
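The per-relation split for a known relation can be sketched as follows; the function name and seed handling are ours, and the split boundaries follow the 60/10/20/10 scheme described above:

```python
import random

def split_known_relation(triples, seed=0):
    """60/10/20 train/valid/test split over 1000 sampled triples of a
    known relation; the remaining 10%, plus any leftover triples, go to
    the simulated user's KB."""
    rng = random.Random(seed)
    rng.shuffle(triples)
    chosen, leftover = triples[:1000], triples[1000:]
    n = len(chosen)
    train = chosen[: int(0.6 * n)]
    valid = chosen[int(0.6 * n): int(0.7 * n)]
    test = chosen[int(0.7 * n): int(0.9 * n)]
    user_kb = chosen[int(0.9 * n):] + leftover     # user always has clues
    return train, valid, test, user_kb
```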

Given a relation r and an observed triple (s, r, t) in training or testing, the pair (s, t) is regarded as a +ve instance for r. Following Wang et al. (2016a), for each +ve instance (s, t) we generate two negative ones: one by randomly corrupting the source s, and the other by corrupting the target t. Note that the test triples are not present in the base KB or the user’s KB, and none of the -ve instances overlap with the +ve ones.
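The negative-example generation can be sketched as below (helper names are ours; uniform entity corruption that avoids observed pairs is the standard scheme from the KBC literature):

```python
import random

def corrupt(pos, entities, triples, rng=random.Random(0)):
    """Generate two -ve instances for a +ve triple (s, r, t): one by
    corrupting the source s, one by corrupting the target t, avoiding
    pairs that are actually observed for relation r."""
    s, r, t = pos
    observed = {(x, z) for (x, y, z) in triples if y == r}
    def sample(fix_target):
        while True:
            e = rng.choice(entities)
            pair = (e, t) if fix_target else (s, e)
            if pair not in observed:
                return pair
    return [sample(True), sample(False)]
```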

Baselines. As none of the existing KBC methods can solve the OKBC problem, we choose various versions of LiLi as baselines.
Single: A version of LiLi where we train a single prediction model for all test relations.
Sep: We do not transfer (past learned) weights for initializing a new model, i.e., we disable lifelong learning.
F-th: Here, we use a fixed prediction threshold of 0.5 instead of a relation-specific threshold.
BG: The missing or connecting links (when the user does not respond) are blindly filled with “@-RelatedTo-@”, i.e., there is no guessing mechanism.
w/o PTS: LiLi does not ask for additional clues via past task selection for skillset improvement.
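For illustration, a relation-specific threshold of the kind LiLi uses (in contrast to F-th's fixed 0.5) could be chosen by maximizing +ve F1 on validation data; this selection criterion is our assumption, as the paper does not specify how the threshold is set:

```python
def relation_threshold(scores, labels, grid=None):
    """Pick a relation-specific decision threshold by maximizing +ve F1
    on validation (score, label) pairs, instead of a fixed 0.5."""
    grid = grid or [i / 100 for i in range(1, 100)]
    def f1(th):
        tp = sum(1 for s, y in zip(scores, labels) if s >= th and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= th and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < th and y == 1)
        return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return max(grid, key=f1)
```

This directly addresses the failure mode noted later for F-th: if all prediction scores lie on one side of 0.5, a tuned threshold can still separate the classes.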

Evaluation Metrics. To evaluate the strategy formulation ability, we introduce a measure called Coverage (C), defined as the fraction of all query instances for which LiLi successfully formulated strategies that led to winning. If LiLi wins all episodes for a given dataset, C is 1.0. To evaluate the predictive performance, we use avg. MCC and avg. +ve F1 score.
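Both metrics are straightforward to compute from episode outcomes and confusion counts (a minimal sketch; the function names are ours):

```python
import math

def coverage(episodes_won, total_episodes):
    """Fraction of query instances for which a winning strategy was
    formulated; 1.0 means every episode was won."""
    return episodes_won / total_episodes

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion counts;
    0.0 by convention when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

MCC is a sensible headline metric here because the -ve instances outnumber the +ve ones two to one, and MCC is robust to such class imbalance.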

4.1 Results and Analysis

Evaluation-I: Strategy Formulation Ability. Table 5 shows the list of inference strategies formulated by LiLi under various settings of the interaction parameters that control its strategy formulation. When no interaction with the user is allowed, LiLi works like a closed-world method, and coverage drops significantly (C = 0.47). With only one interaction allowed per query, LiLi acquires knowledge well for instances where either an entity or the relation is unknown. Moreover, since one unknown entity may appear in multiple test triples, once that entity becomes known LiLi does not need to ask about it again and can perform inference on future triples, causing a significant increase in coverage (C = 0.97). With a sufficiently large interaction limit, LiLi is able to perform inference on all instances and C becomes 1.0. Under a stricter setting, LiLi issues an MLQ only once, as only one MLQ satisfies the corresponding constraint. In summary, LiLi’s RL-model can effectively formulate query-specific inference strategies based on the specified parameter values.

Evaluation-II: Predictive Performance.

Table 6 shows the comparative performance of LiLi against the baselines. To judge the overall improvements, we performed a paired t-test, taking the +ve F1 scores on each relation as paired data. Considering both KBs and all relation types, LiLi outperforms Sep significantly. Even when trained with very few clues, LiLi outperforms Sep significantly on Freebase in terms of MCC. Thus, the lifelong learning mechanism is effective in transferring helpful knowledge. The Single model performs better than Sep on unknown relations due to the sharing of knowledge (weights) across tasks. For known relations, however, its performance drops because, as new relations arrive to the system, the old weights get corrupted and catastrophic forgetting occurs. For unknown relations, which are evaluated just after training, there is no chance for catastrophic forgetting. The significant improvement of LiLi over F-th on Freebase shows that the relation-specific threshold works better than a fixed threshold of 0.5: if all prediction values for the test instances lie above (or below) 0.5, F-th predicts all instances as +ve (-ve), which degrades its performance. By utilizing the contextual similarity (highly correlated with the class labels) of entity pairs, LiLi’s guessing mechanism works significantly better than blind guessing (BG). The past task selection mechanism of LiLi also improves its performance over w/o PTS, as it acquires more clues during testing for poorly performing tasks (evaluated on the validation set). For Freebase, due to the larger number of past tasks [9 (25% of 38)], the performance difference is significant; for WordNet, the number is relatively small [3 (25% of 14)] and hence the difference is not significant.
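The paired t-test treats the two systems' per-relation +ve F1 scores as matched pairs; a minimal sketch of the t-statistic (the p-value would then come from the t-distribution with n-1 degrees of freedom):

```python
import math

def paired_t(xs, ys):
    """Paired t-statistic over per-relation scores of two systems;
    each index pairs the same relation's score under both systems."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)   # sample variance
    return mean / math.sqrt(var / n)
```

Pairing by relation controls for relation difficulty: each difference compares two systems on the identical test set, so a consistent edge yields a large t even when absolute scores vary widely across relations.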

Evaluation-III: User Interaction vs. Performance. Table 7 shows the results of LiLi as the clue acquisition rate is varied. We use Freebase for tuning due to its higher number of unknown test relations compared to WordNet. LiLi’s performance improves significantly as it acquires more clues from the user: the results at the highest clue rate significantly outperform those at the lowest. Table 8 shows the results of LiLi with and without user responses to MLQs and CLQs. Answering MLQs and CLQs is very hard for simulated users (unlike crowdsourcing), as the user’s KB often lacks the required triple. Thus, we analyze how performance is affected if the user does not respond at all. The results show a clear trend of overall performance improvement when the user responds. However, the improvement is not significant, as the simulated user’s query satisfaction rate (1% of MLQs and 10% of CLQs) is very small. Still, the analysis shows the effectiveness of LiLi’s guessing mechanism and continual learning ability, which help achieve an avg. +ve F1 of 0.57 and 0.62 on FB and WN respectively, with minimal participation of the user.

Rel Type    F(+)     F(+)     F(+)
known       0.5796   0.5820   0.5859
unknown     0.5231   0.5414   0.5567
overall     0.5660   0.5722   0.5789
Table 7: LiLi’s performance on FB by varying the clue acquisition rate (columns left to right: increasing rate).
                  No Response to         Response to
                  CLQs and MLQs          CLQs and MLQs
KB  Rel Type      F(+)     MCC           F(+)     MCC
FB  known         0.5823   0.2775        0.5859   0.2763
    unknown       0.5529   0.2049        0.5567   0.2119
    overall       0.5753   0.2601        0.5789   0.2609
WN  known         0.5990   0.3590        0.6343   0.4096
    unknown       0.5952   0.3457        0.5871   0.3387
    overall       0.5982   0.3561        0.6238   0.3939
Table 8: Performance of LiLi with and without user responses.

5 Conclusion

In this paper, we aimed to build a generic engine for continuous knowledge learning in human-machine conversations. We first showed that the problem underlying the engine can be formulated as an open-world knowledge base completion (OKBC) problem, a generalization of KBC. We then proposed a lifelong interactive learning and inference (LiLi) approach to solving it. LiLi solves the OKBC problem by first formulating a query-specific inference strategy using RL and then executing the strategy, interacting with the user in a lifelong learning manner. Experimental results showed the effectiveness of LiLi in terms of both predictive quality and strategy formulation ability. We believe that a system based on the LiLi approach can serve as a knowledge learning engine for conversations. Our future work will improve LiLi to make it more accurate.


This work was supported in part by the National Science Foundation (NSF) under grants IIS-1407927 and IIS-1650900, and by a gift from Huawei Technologies Co., Ltd.