1 Introduction
Knowledge representation and reasoning (KRR) is the process of representing domain knowledge in formal languages (e.g., SPARQL, Prolog) so that it can be used by expert systems for querying and reasoning. KRR has been applied in many fields, including financial regulations, medical diagnosis, and law. One major obstacle in KRR is the creation of large-scale, high-quality knowledge bases. This requires knowledge engineers (KEs) who not only have background knowledge in the relevant domain but also sufficient skill in knowledge representation. Unfortunately, qualified KEs are in short supply. It would therefore be useful to build a tool that allows domain experts without any background in logic to construct and query a knowledge base directly from text.
Controlled natural languages (CNLs)  were developed as a technology for achieving this goal. CNLs are based on natural languages (NLs) but have restricted syntax and interpretation rules that determine the unique meaning of each sentence. Representative CNLs include Attempto Controlled English  and PENG . Each CNL comes with a language parser that translates English sentences into an intermediate structure, the discourse representation structure (DRS) . From the DRS, the parsers further translate sentences into corresponding logical representations, e.g., Answer Set Programming (ASP)  programs. One main issue with the aforementioned CNLs is that these systems do not provide enough background knowledge to preserve the semantic equivalence of sentences that express the same meaning via different linguistic structures. For instance, the sentences Mary buys a car and Mary makes a purchase of a car are translated into different logical representations by current CNL parsers. As a result, if the user asks who is a buyer of a car, these systems will fail to find the answer.
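To make this failure mode concrete, the following sketch mimics (in simplified Python, with made-up relation names rather than actual CNL parser output) how verb-specific logical forms cause a query phrased over one verb to miss a semantically identical paraphrase:

```python
# Two paraphrases of the same fact, translated naively into verb-specific
# logical forms, as current CNL parsers would do.
fact1 = ("buy", {"agent": "Mary", "theme": "car"})       # "Mary buys a car"
fact2 = ("purchase", {"agent": "Mary", "theme": "car"})  # "Mary makes a purchase of a car"

kb = [fact1, fact2]

def who_buys(kb, theme):
    """Query phrased over the 'buy' relation only."""
    return [args["agent"] for rel, args in kb if rel == "buy" and args["theme"] == theme]

# The query succeeds only for the sentence that used the verb "buy";
# the semantically identical "purchase" fact is missed.
print(who_buys([fact1], "car"))  # ['Mary']
print(who_buys([fact2], "car"))  # [] -- the paraphrase is lost
```

KALM avoids this by mapping both sentences to the same frame (Commerce_Buy), so a single query covers all paraphrases.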
In this thesis proposal, I present KALM [7, 6], a system for knowledge authoring and question answering. KALM is superior to current CNL systems in that it has a sophisticated frame-semantic parser that standardizes the semantics of sentences expressing the same meaning via different linguistic structures. The frame-semantic parser is built on FrameNet  and BabelNet , where FrameNet is used to capture the meaning of a sentence and BabelNet is used to disambiguate the meanings of the entities extracted from the sentence. Experimental results show that KALM achieves superior accuracy in knowledge authoring and question answering compared to state-of-the-art systems.
The rest of this proposal is organized as follows: Section 2 discusses related work, Section 3 presents the KALM architecture, Section 4 presents KALM-QA, the question-answering part of KALM, Section 5 shows the evaluation results, Section 6 discusses future work beyond the thesis, and Section 7 concludes the paper.
2 Related Work
As described in Section 1, CNL systems were proposed as a technology for knowledge representation and reasoning. Related work also includes knowledge extraction tools, e.g., OpenIE , SEMAFOR , SLING , and the Stanford KBP system . These tools are designed to extract meaning-capturing semantic relations from English sentences. Their limitations are twofold: first, they lack sufficient accuracy to extract the correct semantic relations and entities, while KRR is very sensitive to incorrect data; second, they cannot map the extracted semantic relations to logical forms and are therefore incapable of KRR. Other related work includes question-answering frameworks, e.g., Memory Networks , Variational Reasoning Networks , ATHENA , and PowerAqua . The first two are end-to-end learning approaches based on machine learning models. The last two implement semantic parsers that translate natural language sentences into intermediate query languages and then query a knowledge base to obtain the answers. The results of the machine learning based approaches are not explainable, and their accuracy is not high enough to guarantee correct answers. ATHENA and PowerAqua perform question answering over a priori knowledge bases; they therefore do not support knowledge authoring, while KALM supports both knowledge authoring and question answering.
3 The KALM Architecture
Figure 1 shows the architecture of KALM, which translates a CNL sentence into its unique logical representation (ULR).
Attempto Parsing Engine. The input sentences are CNL sentences based on the ACE grammar (http://attempto.ifi.uzh.ch/site/docs/syntax_report.html). KALM starts by parsing the input sentence with the ACE Parser (https://github.com/Attempto/APE), generating a DRS  that captures the syntactic information of the sentence.
Frame Parser. KALM performs frame-based parsing on the DRS and produces a set of frames that represent the semantic relations implied by the sentence. A frame  represents a semantic relation among a set of entities, each of which plays a particular role in the frame relation. We have designed a frame ontology, called FrameOnt, which is based on the frames in FrameNet  and is encoded as Prolog facts. For instance, the Commerce_Buy frame is shown below:
fp(Commerce_Buy,
   [role(Buyer,[bn:00014332n],[]),
    role(Seller,[bn:00053479n],[]),
    role(Goods,[bn:00006126n,bn:00021045n],[]),
    role(Recipient,[bn:00066495n],[]),
    role(Money,[bn:00017803n],[currency])]).
In each role-term, the first argument is the name of the role and the second is a list of role meanings represented via BabelNet synset IDs . The third argument of a role-term is a list of constraints on that role. For instance, the sentence Mary buys a car implies the Commerce_Buy frame, where Mary is the Buyer and car is the Goods. To extract a frame instance from a given CNL sentence, KALM uses logical valence patterns (lvps), which are learned via structural learning. An example lvp is shown below:
lvp(buy,v,Commerce_Buy,
    [pattern(Buyer,verb->subject,required),
     pattern(Goods,verb->object,required),
     pattern(Recipient,verb->pp(for)->dep,optnl),
     pattern(Money,verb->pp(for)->dep,optnl),
     pattern(Seller,verb->pp(from)->dep,optnl)]).
The first three arguments of an lvp-fact identify the lexical unit, its part of speech, and the frame. The fourth argument is a set of pattern-terms, each having three parts: the name of a role, a grammatical pattern, and the required/optional flag. The grammatical pattern determines the grammatical context in which the lexical unit, a role, and a role-filler word can appear in that frame. Each grammatical pattern is captured by a parsing rule (a Prolog rule) that can be used to extract appropriate role-filler words based on the APE parses.
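The extraction process described above can be sketched as follows. This is a minimal illustration in Python: the flat "parse" dictionary stands in for the DRS produced by the ACE parser, and the helper names are assumptions, not KALM's actual Prolog rules.

```python
# Simplified encoding of the lvp-fact for "buy" shown above.
LVP = {
    "lexical_unit": ("buy", "v"),
    "frame": "Commerce_Buy",
    "patterns": [
        ("Buyer", "verb->subject", "required"),
        ("Goods", "verb->object", "required"),
        ("Seller", "verb->pp(from)->dep", "optnl"),
    ],
}

def extract_frame(lvp, parse):
    """Instantiate a frame by matching each grammatical pattern against the parse."""
    frame = {"frame": lvp["frame"], "roles": {}}
    for role, pattern, flag in lvp["patterns"]:
        filler = parse.get(pattern)          # look up the grammatical context
        if filler is not None:
            frame["roles"][role] = filler
        elif flag == "required":
            return None                      # a required role is missing: no match
    return frame

# "Mary buys a car from John" in the simplified parse representation:
parse = {"verb->subject": "Mary", "verb->object": "car", "verb->pp(from)->dep": "John"}
print(extract_frame(LVP, parse))
# {'frame': 'Commerce_Buy', 'roles': {'Buyer': 'Mary', 'Goods': 'car', 'Seller': 'John'}}
```

In KALM itself, each grammatical pattern is a Prolog parsing rule over DRS conditions rather than a dictionary lookup; the sketch only shows the control flow of required versus optional roles.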
Role-filler Disambiguation. Given an extracted frame instance, the role-filler disambiguation module disambiguates the meaning of each role-filler word with respect to the corresponding frame role, assigning it a BabelNet synset ID. A complex algorithm  was proposed to measure the semantic similarity between a candidate BabelNet synset containing the role-filler word and the frame-role synset. The algorithm also includes optimizations that improve its efficiency, e.g., priority-based search and caching. In addition to disambiguating the meanings of role-fillers, this module is used to prune extracted frame instances in which a role-filler word and the frame role are semantically incompatible.
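The idea behind compatibility-based pruning can be sketched as follows. The tiny hypernym graph and synset IDs below are invented for illustration; KALM's actual algorithm searches BabelNet's semantic network with priority-based search and caching.

```python
from functools import lru_cache

# A toy hypernym graph standing in for BabelNet's semantic network.
HYPERNYMS = {
    "bn:car": "bn:vehicle",
    "bn:vehicle": "bn:artifact",
    "bn:bank_river": "bn:slope",
}

@lru_cache(maxsize=None)                    # caching, as in KALM's optimization
def path_to_root(synset):
    """Follow hypernym links upward from a synset."""
    path = [synset]
    while path[-1] in HYPERNYMS:
        path.append(HYPERNYMS[path[-1]])
    return tuple(path)

def compatible(candidate, role_synset):
    """A candidate fits a role if the role synset is among its hypernyms."""
    return role_synset in path_to_root(candidate)

def disambiguate(candidates, role_synset):
    """Pick the first candidate semantically compatible with the frame role."""
    for c in candidates:
        if compatible(c, role_synset):
            return c
    return None                              # incompatible: prune the frame instance

# The Goods role of Commerce_Buy expects an artifact; the river-bank sense is rejected.
print(disambiguate(["bn:bank_river", "bn:car"], "bn:artifact"))  # bn:car
```

The real similarity measure is graded rather than boolean, ranking candidate synsets instead of accepting the first compatible one.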
Constructing ULR. The extracted frame instances are translated into their unique logical representation (ULR). Examples can be found in the KALM papers.
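As a rough illustration of this step, the sketch below renders a frame instance as a Prolog-style fact. The term layout is hypothetical; the actual ULR syntax is defined in the KALM papers.

```python
def frame_to_ulr(frame):
    """Render a frame instance as an illustrative Prolog-style fact string."""
    roles = ",".join(
        f"role({name},{filler})" for name, filler in sorted(frame["roles"].items())
    )
    return f"frame({frame['frame']},[{roles}])"

frame = {"frame": "Commerce_Buy", "roles": {"Buyer": "mary", "Goods": "car"}}
print(frame_to_ulr(frame))
# frame(Commerce_Buy,[role(Buyer,mary),role(Goods,car)])
```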
4 KALM-QA for Question Answering
Based on KALM, KALM-QA  was developed for question answering. KALM-QA shares KALM's components for syntactic parsing, frame-based parsing, and role-filler disambiguation. Unlike KALM, KALM-QA translates questions into unique logical representations for queries (ULRQ), which are used to query the authored knowledge base.
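The relationship between ULR facts and ULRQ queries can be sketched as follows: a question is parsed into the same frame as the corresponding assertion, but with some role left unbound, and answering amounts to matching that frame against the authored facts. The dictionary encoding is an assumption for illustration, not KALM-QA's actual representation.

```python
def answer(query, kb):
    """Return bindings for the unbound roles (None) of the query frame."""
    results = []
    for fact in kb:
        if fact["frame"] != query["frame"]:
            continue
        binding = {}
        for role, value in query["roles"].items():
            if value is None:                 # unbound role: to be answered
                binding[role] = fact["roles"].get(role)
            elif fact["roles"].get(role) != value:
                binding = None                # bound role mismatch: fact rejected
                break
        if binding is not None:
            results.append(binding)
    return results

# Authored fact from "Mary buys a car":
kb = [{"frame": "Commerce_Buy", "roles": {"Buyer": "mary", "Goods": "car"}}]
# "Who is a buyer of a car?" as a query frame with Buyer unbound:
query = {"frame": "Commerce_Buy", "roles": {"Buyer": None, "Goods": "car"}}
print(answer(query, kb))  # [{'Buyer': 'mary'}]
```

Because both the assertion and the question are normalized into the same frame, the query succeeds regardless of which paraphrase was used to author the fact.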
5 Evaluation
This section summarizes the evaluation of KALM and KALM-QA, where KALM is evaluated for knowledge authoring and KALM-QA for question answering. We have created a total of 50 logical frames for representing the meaning of English sentences, mostly derived from FrameNet but including some that FrameNet lacks (e.g., Restaurant and Human_Gender). Based on these 50 frames, we manually constructed 250 sentences adapted from FrameNet exemplar sentences and evaluated them on KALM, SEMAFOR, SLING, and the Stanford KBP system. KALM achieves an accuracy of 95.6%, much higher than the other systems.
We evaluate KALM-QA on two datasets. The first consists of manually constructed general questions based on the 50 logical frames; KALM-QA achieves an accuracy of 95% in parsing these queries. The second is the MetaQA dataset , which contains almost 29,000 test questions and over 260,000 training questions. KALM-QA achieves 100% accuracy, much higher than the state-of-the-art machine learning approach . Details of the evaluations can be found in the KALM and KALM-QA papers.
6 Future Work Beyond The Thesis
This section discusses the future work beyond the thesis: (1) enhancing KALM to author rules, and (2) supporting time reasoning.
Authoring Rules from CNL. There are two research problems concerning rules. The first is the standardization of parses of rules that express the same information via different syntactic forms or word choices. Suppose the knowledge base contains sentences like: (1) if a person buys a car then the person owns the car, (2) every person who is a purchaser of a car is an owner of the car, (3) if a car is bought by a person then the person possesses the car. All of these sentences represent rules and express exactly the same meaning. However, KALM's current syntactic parser produces different DRSs for them and is therefore unable to map them to the same logical form. The second problem involves the recognition and representation of different types of rules in logic. For instance, defeasible rules are very common in text, but such rules cannot be handled by first-order logic. We believe defeasible logic  is a good fit.
Time Reasoning. Time-related information is a crucial part of human knowledge, but semantic parsing that takes time into account is rather hard. However, we can develop a CNL that incorporates enough time-related idioms to be useful in a number of domains of discourse (e.g., tax law). Time can then be added to DRSs and incorporated into our frame-based approach, down to the level of the logical facts into which sentences are translated. Time information can be represented either via special time-aware relations among events (e.g., before, after, causality, triggering) or via a reserved argument representing time in each fluent.
7 Conclusion
This thesis proposal provides an overview of KALM, a system for knowledge authoring, and introduces KALM-QA, the question-answering part of KALM. Experimental results show that both KALM and KALM-QA achieve superior accuracy compared to state-of-the-art systems.
-  Angeli, G., Premkumar, M.J.J., Manning, C.D.: Leveraging linguistic structure for open domain information extraction. In: 53rd Annual Meeting of the Association for Computational Linguistics. pp. 344–354. ACL, Beijing, China (2015)
-  Das, D., Chen, D., Martins, A.F.T., Schneider, N., Smith, N.A.: Frame-semantic parsing. Comp. Linguistics 40(1), 9–56 (2014)
-  Fillmore, C.J., Baker, C.F.: Frame semantics for text understanding. In: WordNet and Other Lexical Resources Workshop. NAACL, NAACL, Pittsburgh (June 2001)
-  Fuchs, N.E., Kaljurand, K., Kuhn, T.: Attempto Controlled English for knowledge representation. In: Reasoning Web. pp. 104–124. Springer, Venice, Italy (2008)
-  Fuchs, N.E., Kaljurand, K., Kuhn, T.: Discourse Representation Structures for ACE 6.6. Tech. Rep. 2010.0010, Department of Informatics, University of Zurich, Switzerland (2010)
-  Gao, T., Fodor, P., Kifer, M.: High accuracy question answering via hybrid controlled natural language. In: 2018 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2018, Santiago, Chile, December 3-6, 2018. pp. 17–24. IEEE, Santiago, Chile (2018)
-  Gao, T., Fodor, P., Kifer, M.: Knowledge authoring for rule-based reasoning. In: Panetto, H., Debruyne, C., Proper, H.A., Ardagna, C.A., Roman, D., Meersman, R. (eds.) On the Move to Meaningful Internet Systems. OTM 2018 Conferences - Confederated International Conferences: CoopIS, C&TC, and ODBASE 2018, Valletta, Malta, October 22-26, 2018, Proceedings, Part II. Lecture Notes in Computer Science, vol. 11230, pp. 461–480. Springer, Valletta, Malta (2018)
-  Gebser, M., Kaufmann, B., Kaminski, R., Ostrowski, M., Schaub, T., Schneider, M.T.: Potassco: The potsdam answer set solving collection. AI Commun. 24(2), 107–124 (2011)
-  Johnson, C.R., Fillmore, C.J., Petruck, M.R., Baker, C.F., Ellsworth, M.J., Ruppenhofer, J., Wood, E.J.: FrameNet: Theory and Practice (2002)
-  Kamp, H., Reyle, U.: From discourse to logic: Introduction to modeltheoretic semantics of natural language, formal logic and discourse representation theory, vol. 42. Springer Science & Business Media (2013)
-  Kuhn, T.: A survey and classification of controlled natural languages. Comp. Linguistics 40(1), 121–170 (2014)
-  López, V., Fernández, M., Motta, E., Stieler, N.: Poweraqua: Supporting users in querying and exploring the semantic web. Semantic Web 3(3), 249–265 (2012)
-  Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J.R., Bethard, S., McClosky, D.: The Stanford CoreNLP natural language processing toolkit. In: 52nd Annual Meeting of the Assoc. for Computational Linguistics, ACL, System Demonstrations. pp. 55–60. Baltimore, MD, USA (2014)
-  Miller, A.H., Fisch, A., Dodge, J., Karimi, A., Bordes, A., Weston, J.: Key-value memory networks for directly reading documents. In: Su, J., Carreras, X., Duh, K. (eds.) Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP). pp. 1400–1409. The Association for Computational Linguistics, Austin, TX (2016)
-  Navigli, R., Ponzetto, S.P.: BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence 193, 217–250 (2012)
-  Ringgaard, M., Gupta, R., Pereira, F.C.N.: SLING: A framework for frame semantic parsing. CoRR 1710.07032, 1–9 (2017)
-  Saha, D., Floratou, A., Sankaranarayanan, K., Minhas, U.F., Mittal, A.R., Özcan, F.: ATHENA: an ontology-driven system for natural language querying over relational data stores. PVLDB 9(12), 1209–1220 (2016)
-  Schwitter, R.: Controlled natural languages for knowledge representation. In: COLING 2010, 23rd Intl. Conf. on Computational Linguistics, Posters Volume, 23-27 August 2010. pp. 1113–1121. ACL, Beijing, China (2010)
-  Wan, H., Grosof, B.N., Kifer, M., Fodor, P., Liang, S.: Logic programming with defaults and argumentation theories. In: Hill, P.M., Warren, D.S. (eds.) Logic Programming, 25th Intl. Conf., ICLP 2009, Pasadena, CA, July 14-17, 2009. Lecture Notes in Computer Science, vol. 5649, pp. 432–448. Springer (2009)
-  Zhang, Y., Dai, H., Kozareva, Z., Smola, A.J., Song, L.: Variational reasoning for question answering with knowledge graph. In: McIlraith, S.A., Weinberger, K.Q. (eds.) Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI-18). pp. 6069–6076. AAAI Press, New Orleans, LA (2018)