# Introducing MathQA – A Math-Aware Question Answering System

We present an open source math-aware Question Answering System based on Ask Platypus. Our system returns a single mathematical formula for a natural language question in English or Hindi. These formulae originate from the knowledge-base Wikidata. We translate these formulae to computable data by integrating the calculation engine sympy into our system. This way, users can enter numeric values for the variables occurring in the formula. Moreover, the system loads numeric values for constants occurring in the formula from Wikidata. In a user study, our system outperformed a commercial computational mathematical knowledge engine by 13 %. However, the performance of our system heavily depends on the size and quality of the formula data available in Wikidata. Since only a few items in Wikidata contained formulae when we started the project, we facilitated the import process by suggesting formula edits to Wikidata editors. With the simple heuristic that the first formula is significant for the article, 80 % of the suggestions were correct.

## 1. Introduction

##### Vision

The mathematical QA system is a first motivating application that exploits the mathematical knowledge seeded into Wikidata. It is a first step towards our long-term goal of building a collaborative, semi-formal, language independent math(s) encyclopedia hosted by Wikimedia at math.wikipedia.org (corneli2017math). Using the popular Wikipedia framework as frontend will help popularize the project and motivate many experts from the mathematical sciences to contribute. We envisage a future centralized, machine-readable repository for mathematical world-knowledge that can be utilized to enable cross-article queries, e.g., to automate proofs of mathematical theorems. A crucial foundation for a path towards this long-term goal is having a large amount of mathematical data in Wikidata. This paper is a starting point for the development of effective methods to automatically seed Wikidata with mathematical formulae from Wikipedia or STEM documents.

##### Research Objectives.

Motivated by the lack of mathematical knowledge in Wikidata, the following research objective was defined:

Identify and extract defining formulae from all the available mathematical articles on Wikipedia to seed them into the Wikidata knowledge-base.

To achieve this objective, the following tasks were performed: 1.) Identification of mathematical articles from the Wikipedia data dump. 2.) Manual analysis to determine the defining formula of an individual article. 3.) Seeding of the retrieved formulae into Wikidata using the Primary sources tool (PrimarySources). 4.) Evaluation of the overall correctness and accuracy of the data migration by precision, recall, and f-measure.

Subsequently, we capitalized on the formulae seeded into Wikidata to

Build a math-aware QA system, processing a mathematical natural language question to retrieve a formula from Wikidata and allow a calculation based on input values for the occurring variables provided by the user.

We performed the following subtasks: 1.) Development of a Question Parsing Module that determines a triple representation of the user’s input. 2.) Development of a Formula Retrieval Module to query Wikidata using pywikibot (Pywikibot). 3.) Development of a Calculation Module that performs a calculation based on the retrieved formula for the question and input values for the variables provided by the user. 4.) Evaluation of the overall performance and comparison to a commercial computational mathematical knowledge-engine. 5.) Development of regular expressions to maximize the number of answerable questions provided by the user in the Hindi language.

##### Section Outline.

This paper is organized as follows: Section 2 (Background) contains details about the Wikimedia sister projects Wikipedia and Wikidata and the concept of QA systems. Section 3 (Implementation) describes our approach of transferring formulae from Wikipedia to Wikidata and the structure of the QA system which uses the seed. In Section 4 (Evaluation), we describe the construction of a random sample to assess the quality of the data transfer by precision, recall and f-measure. Subsequently, we evaluate the performance of the QA system and discuss its limitations. Finally, we conclude with a summary and suggested improvements for future work.

## 2. Background

### 2.1. Wikipedia and Wikidata

Started in 2001, mainly as a text-based resource, Wikipedia is the world’s largest online encyclopedia, which allows its users to edit articles and add new information (wiki:Wikipedia). Wikipedia has collected a rapidly increasing amount of information, including numbers, coordinates, dates and other types of relationships among different domains of knowledge. Denny Vrandecic, ontologist at Google, claims that “it has become a resource of enormous value, with potential applications across all areas of science, technology and culture” (DBLP:journals/cacm/VrandecicK14).

Wikipedia is open and welcomes everyone who wants to make a positive contribution. Ward Cunningham, the inventor of Wiki, describes Wikipedia as “the simplest online database that could possibly work” (leuf2001wiki).

The following are some characteristics of Wikipedia which enable it to manage its data on a global scale:

• Open and instantaneous editing: Wikipedia allows its users to extend and edit the available information even without creating an account. All changes are instantaneously released online.

• Record of editing history: Wikipedia keeps a record of all the changes made to a page. The page history can be viewed by everyone. Each time a page is edited, the new version is released, and the old version is saved in the page history.

• Multilingual: Wikipedia exists in many languages. Every article on Wikipedia consists of a list of languages it is available in.

• Content standard: the information contributed to Wikipedia must be encyclopedic, neutral and verifiable.

• Community control: Wikipedia is always supported by a team of dedicated volunteers who take the responsibility of developing content, policies, and practices.

• Continuous evolution: Wikipedia is always in a state of continuous growth. Information is continuously being added and updated and new features are added into Wikipedia to make it more useful.

• Totally free: all Wikipedia content is free to use, anyone is free to contribute, and the content is released under a free license which means anyone may reuse it elsewhere. Wikipedia is a non-commercial project, and it has no advertisements.

Although Wikipedia comprises a huge amount of data, it does not provide direct access to specific facts because its content is still unstructured, which is unfortunate for anyone who wants to retrieve information systematically. To remedy this shortcoming, the Wikimedia Foundation launched Wikidata in October 2012. Wikidata is a free and open knowledge-base that can be read and edited by humans and machines. It acts as a common source of data which can be used by Wikimedia projects such as Wikipedia, Wikivoyage, Wikisource, and others. As Wikidata is one of the most recent projects of the Wikimedia Foundation, it is still in an early phase of development. Therefore, it encourages users and organizations to donate data so that it can grow and distribute open source, multilingual, educational content free of charge.

Wikidata’s data model essentially consists of items and statements. Each item represents an entity, such as a person, a scientific term, or a mathematical theorem, and is identified by a unique number prefixed by the letter Q. For instance, the item number (QID) for the topic Computer science is Q21198. Additionally, items may have labels, descriptions, and aliases in multiple languages. Information is added to items by creating statements, which are stored as key-value pairs: each statement consists of a property as the key and a value associated with that property.

Figure 2 illustrates the data model used in Wikidata.

In this example, London is the main item with a statement which consists of one claim and one reference for the claim. The claim itself contains a main property-value pair which represents the main fact, in this case the population and its corresponding value. Optional qualifiers can be added to the claim to append additional information related to the main property. In this example, the time at which the population was recorded and the determination method are the qualifiers, with the corresponding values June 2012 and estimation, respectively. Wikidata does not claim to provide users with true facts because many facts are disputed or uncertain. Therefore, Wikidata allows such conflicting facts to coexist so that different opinions can be expressed properly. The content of Wikidata is in the public domain under the Creative Commons CC0 license, which allows every user to use, extend and edit the stored information.

The Wikidata requirements (WDRequirements) state that the data uploaded to Wikidata must not overwhelm the community. Research in understanding how systems for collaborative knowledge creation are impacted by events like data migration is still in its early stages (DBLP:journals/ce/MoskaliukKC12), in particular for structured knowledge (DBLP:journals/bioinformatics/HorridgeTNVNM14). Most of the research is focused on Wikipedia (flock2015towards), which is understandable considering the availability of its data sets, in particular the whole edit history (DBLP:journals/expert/SchindlerV11), and the availability of tools for working with Wikipedia (DBLP:journals/ai/MilneW13).
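The item and statement structure described above can be pictured with a small data-structure sketch. This is only an illustration and not Wikidata's actual schema or API: the class and field names are hypothetical, the QID `Q84` for London is not given in the text, and the population value and reference string are placeholders. Only the qualifier values (June 2012, estimation) come from the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Statement:
    """A claim (main property-value pair) with optional qualifiers and references."""
    prop: str
    value: str
    qualifiers: Dict[str, str] = field(default_factory=dict)
    references: List[str] = field(default_factory=list)


@dataclass
class Item:
    """An entity identified by a QID, carrying a label and a list of statements."""
    qid: str
    label: str
    statements: List[Statement] = field(default_factory=list)

    def values_of(self, prop: str) -> List[str]:
        # Conflicting values may coexist, so all matching values are returned.
        return [s.value for s in self.statements if s.prop == prop]


# Placeholder data: the population figure and reference are NOT real values.
london = Item(
    qid="Q84",  # assumption, not stated in the text
    label="London",
    statements=[
        Statement(
            prop="population",
            value="<population value>",
            qualifiers={"point in time": "June 2012",
                        "determination method": "estimation"},
            references=["<reference for the claim>"],
        )
    ],
)
```

Returning a list from `values_of` mirrors the point that Wikidata lets disputed or conflicting facts coexist on one item.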

## 3. Implementation

This section describes the implementation details of the formula seeding and the QA system. Figure 3 shows the workflow of the data transfer from Wikipedia to Wikidata. The first task was to download the Wikipedia data dump and identify the articles that are related to mathematics. Subsequently, we needed to determine and extract the defining formulae from each article. The extraction process was divided into two categories: we distinguished articles related to geometry from the rest of general mathematics. After the extraction, the formulae were added to Wikidata using its Primary sources tool (PrimarySources), which allows users to approve or reject a claim and its reference.

### 3.1. Seeding Math into Wikidata

As mentioned before, Wikipedia consists of millions of documents in various languages. Since its content is barely machine-interpretable, we needed to find a way to distinguish mathematical articles from the rest. To achieve this, the first step was to reduce the number of articles. We only considered English Wikipedia as the primary dataset to profit from two advantages. Firstly, it reduced the total number of articles from over 40 million to around 5.5 million. Secondly, English Wikipedia contains the highest number of articles compared to other languages. We therefore assume that a mathematical article available in any other language is also available in English Wikipedia, whereas the converse does not hold in numerous cases. To recover the mathematical formulae, we needed to distinguish formulae from their surrounding text by identifying the `<math>` tags. Performing the math tag search, we found 32,682 pages containing math formulae.

After the discovery of mathematical articles in the English Wikipedia dataset, we were confronted with a more significant challenge: how to determine the defining formula within a mathematical article. A Wikipedia page contains all the information related to a particular topic. Mathematical pages often contain derivations along with an equation. If we extract all the equations included in math tags, we get a lot of unrelated and irrelevant formulae. To solve this problem, we found a simple yet effective solution after manually analyzing a set of random mathematical articles. We observed that in most of the mathematical Wikipedia articles, the first formula was the most important one related to that topic. Because this heuristic gave incorrect results for the Wikipedia articles related to geometry, we divided the math articles into the categories general mathematics and geometry. We then used a different approach to formula extraction for each category.
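The math tag search and the first-formula heuristic described above can be sketched in a few lines. This is a minimal illustration rather than the authors' extraction code: it assumes the article source is available as a wikitext string with `<math>...</math>` markup, and the names `MATH_TAG`, `has_math`, `first_formula` and `sample` are hypothetical.

```python
import re
from typing import Optional

# Matches <math>...</math>, also with attributes such as <math display="block">.
MATH_TAG = re.compile(r"<math\b[^>]*>(.*?)</math>", re.DOTALL | re.IGNORECASE)


def has_math(wikitext: str) -> bool:
    """Return True if the page contains at least one math tag."""
    return MATH_TAG.search(wikitext) is not None


def first_formula(wikitext: str) -> Optional[str]:
    """Return the content of the first math tag, or None if there is none."""
    match = MATH_TAG.search(wikitext)
    return match.group(1).strip() if match else None


sample = (
    "The '''Pythagorean theorem''' relates the sides of a right triangle: "
    "<math>a^2 + b^2 = c^2</math>. One proof expands <math>(a+b)^2</math>."
)
print(first_formula(sample))
```

Taking only the first match is what keeps later formulae from derivations and proofs out of the result, at the cost of missing articles whose defining formula appears later.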

#### 3.1.1. Geometry Questions.

The main reason behind separating geometry related articles from the rest is that the structure of these articles is different from that of general mathematical articles. A geometric object has various properties such as volume, area, perimeter, etc. Thus, a single topic can have more than one property and multiple defining formulae. The Wikipedia articles of such geometric shapes (cube, circle, ellipse, etc.) may contain the formulae of these properties within different subsections of the article. To solve this problem, we first identified all the pages related to geometry by using a list of 16 Wikipedia geometry categories: Elementary geometry, Theorems in geometry, Polygons, Elementary shapes, Quadrilateral, Area, Volume, Conic sections, Geometric centers, Circles, Curves, Surfaces, Cubes, Platonic solids, Polytopes and Euclidean plane geometry. We discovered 292 pages belonging to these categories, each containing multiple relevant formulae. We subsequently retrieved the first formula from each subsection of these pages. However, not all the subsections of a page provided defining formulae related to the topic. For further refinement, we applied a simple keyword-based filter using the following property names, which we considered most important for describing 2- and 3-dimensional shapes: Area, Volume, Circumference, Perimeter, Circumradius, Inradius and Median. Each of these properties has a unique defining formula that can easily be checked for correctness in the evaluation. We strictly limited ourselves to adding only one defining formula for each property to Wikidata. As a result, we got 65 formulae for the properties mentioned above, belonging to 49 Wikipedia articles related to geometry.
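The per-subsection extraction with keyword filtering described above might look roughly as follows. This is a sketch under assumptions, not the authors' code: it assumes wikitext input with `== Heading ==` section markers, matches property names as substrings of the section title, and all names (`HEADING`, `PROPERTIES`, `formulae_by_property`, `sample`) are hypothetical.

```python
import re
from typing import Dict

MATH_TAG = re.compile(r"<math\b[^>]*>(.*?)</math>", re.DOTALL | re.IGNORECASE)
# A wikitext heading such as "== Volume ==" or "=== Surface area ===".
HEADING = re.compile(r"^(={2,})\s*(.*?)\s*\1\s*$", re.MULTILINE)
# Property names taken from the keyword list in the text.
PROPERTIES = ("area", "volume", "circumference", "perimeter",
              "circumradius", "inradius", "median")


def formulae_by_property(wikitext: str) -> Dict[str, str]:
    """Map each property keyword to the first formula of the first matching subsection."""
    result: Dict[str, str] = {}
    headings = list(HEADING.finditer(wikitext))
    for i, heading in enumerate(headings):
        title = heading.group(2).lower()
        end = headings[i + 1].start() if i + 1 < len(headings) else len(wikitext)
        body = wikitext[heading.end():end]
        for prop in PROPERTIES:
            # Keep only one defining formula per property.
            if prop in title and prop not in result:
                match = MATH_TAG.search(body)
                if match:
                    result[prop] = match.group(1).strip()
    return result


sample = """A '''cube''' has edges of length <math>a</math>.
== Surface area ==
The surface area is <math>A = 6a^2</math>, as each face has area <math>a^2</math>.
== Volume ==
The volume is <math>V = a^3</math>.
== History ==
An unrelated remark with <math>x</math>.
"""
print(formulae_by_property(sample))
```

Subsections whose titles contain none of the keywords, such as the history section in the sample, contribute nothing, which is the refinement step the text describes.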

#### 3.1.2. General Formulae.

As stated previously, we discovered 32,682 pages in English Wikipedia which are related to mathematics. Out of these, 292 were filtered out as a separate geometry category. For the remaining pages, we chose a different formula retrieval approach. We extracted the first formula from each Wikipedia article, as in most cases this did in fact yield the defining formula instead of, e.g., parts of a derivation or proof. After the discovery and extraction of math formulae from Wikipedia, we handed the list to Wikimedia, who seeded the formulae to the Primary sources tool (PrimarySources), where they can now be approved or rejected by Wikidata users.

### 3.2. Building the Math QA system

Having the formulae seeded into Wikidata, we could build our Math-aware QA system. It consists of three modules written in Python that will be described in the following.

The main aim of the Question Parsing Module is to transform questions into a tree of triples - producing a simplified and well-structured tree, without losing relevant information about the original question that was provided by the user. This is done by analyzing the grammatical structure of the question, mapping it into a normal form. For our module, we used the simplified dependency tree representation output of the Stanford Parser (StanfordParser).
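To make the target representation concrete, the following sketch maps two question shapes to (subject, predicate, object) triples with an unknown object. Note that this is a deliberately simplified regular-expression stand-in, not the dependency-tree approach based on the Stanford Parser that the module uses. The pattern and the names `PATTERN` and `parse_question` are hypothetical.

```python
import re
from typing import Optional, Tuple

Triple = Tuple[str, str, str]

# "What is the <predicate> of|for [a|an|the] <subject>?"
PATTERN = re.compile(
    r"^\s*what\s+is\s+the\s+(?P<predicate>.+?)\s+(?:of|for)\s+"
    r"(?:(?:a|an|the)\s+)?(?P<subject>.+?)\s*\??\s*$",
    re.IGNORECASE,
)


def parse_question(question: str) -> Optional[Triple]:
    """Return (subject, predicate, '?') or None if the question shape is unknown."""
    match = PATTERN.match(question)
    if match is None:
        return None
    return (match.group("subject"), match.group("predicate").lower(), "?")


print(parse_question("What is the formula for Pythagorean theorem?"))
print(parse_question("What is the volume of a sphere?"))
```

A fixed pattern like this only covers the question shapes it was written for, which is one reason a grammatical analysis is the more general choice.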

Receiving the triple representation from the Question Parsing Module, the Formula Retrieval Module is responsible for extracting formulae from Wikidata using Pywikibot (Pywikibot), a Python library and collection of tools that automate work on MediaWiki sites. Typically, the triple representation (subject, predicate, object) is incomplete, with either a missing predicate or object. Once the Wikidata item for the subject is available, the module tries to retrieve the value of the predicate. There are two cases for the values of the predicates.

In the first case, if the value of the predicate is formula, Pywikibot looks for the value of the Wikidata property named defining formula (P2534) and, if available, replaces the triple object with its value. For instance, the question “What is the formula for Pythagorean theorem?” has the triple representation (Pythagorean theorem, formula, ?). The module maps the subject of the triple to the Wikidata item and returns the value of the defining formula property as object.

In the second case, if the value of the predicate is in our list of geometry properties (volume, area, radius, etc.), Pywikibot looks for the value of the predicate in the has quality property (P1552) of the subject and, if available, replaces the triple object with the defining formula (P2534) value. For instance, the question “What is the volume of a sphere?” has the triple representation (sphere, volume, ?). The module maps the subject to the Wikidata item, the predicate to its has quality property and returns the value of the defining formula property as object.
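The two retrieval cases can be sketched against an in-memory stand-in for Wikidata. This is not Pywikibot code and makes no network calls: the `ITEMS` dictionary, its nesting of the defining formula under a has quality entry, and the function name `retrieve_formula` are assumptions made for illustration. Only the property IDs P2534 and P1552 come from the text.

```python
from typing import Dict, Optional, Tuple

DEFINING_FORMULA = "P2534"
HAS_QUALITY = "P1552"
GEOMETRY_PROPERTIES = {"volume", "area", "radius"}

# Hypothetical local stand-in for Wikidata items, keyed by lower-cased label.
ITEMS: Dict[str, dict] = {
    "pythagorean theorem": {DEFINING_FORMULA: "a^2 + b^2 = c^2"},
    "sphere": {
        HAS_QUALITY: {"volume": {DEFINING_FORMULA: r"V = \frac{4}{3} \pi r^3"}},
    },
}


def retrieve_formula(triple: Tuple[str, str, str]) -> Optional[str]:
    """Fill in the missing object of a (subject, predicate, '?') triple."""
    subject, predicate, _ = triple
    item = ITEMS.get(subject.lower())
    if item is None:
        return None  # the subject could not be mapped to an item
    if predicate == "formula":
        # Case 1: read the defining formula directly from the item.
        return item.get(DEFINING_FORMULA)
    if predicate in GEOMETRY_PROPERTIES:
        # Case 2: find the predicate among the item's qualities,
        # then read the defining formula attached to that quality.
        quality = item.get(HAS_QUALITY, {}).get(predicate)
        return quality.get(DEFINING_FORMULA) if quality else None
    return None


print(retrieve_formula(("Pythagorean theorem", "formula", "?")))
print(retrieve_formula(("sphere", "volume", "?")))
```

Returning `None` for an unknown subject or a missing property reflects that the system cannot answer when the item or formula is absent from the knowledge-base.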

The Calculation Module is responsible for calculating the result of the formula, with values for the occurring variables provided by the user. If the names and values of the identifiers are available on Wikidata via the has part (P527) property, they are automatically retrieved and displayed, so that the user can understand their meaning before entering values. Once the formula is received from the Formula Retrieval Module, it is parsed from LaTeX to Sympy form using the process_sympy parser (latex2sympy). Its identifiers are then extracted for the calculation, which is done using the Python library Sympy (SymPy). In addition to the definition and geometry questions, our system also allows a formula as a question input to provide a calculation based on values for the identifiers.
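The calculation step can be illustrated with SymPy directly. This sketch assumes the LaTeX-to-SymPy conversion has already happened and starts from a SymPy-readable string for the right-hand side of the formula, so it does not use the process_sympy parser. The function names `identifiers` and `calculate` are hypothetical.

```python
from typing import Dict, List

import sympy as sp


def identifiers(rhs: str) -> List[str]:
    """Names of the free symbols the user has to provide values for."""
    return sorted(str(symbol) for symbol in sp.sympify(rhs).free_symbols)


def calculate(rhs: str, values: Dict[str, float]) -> float:
    """Substitute the user's values into the formula and evaluate it numerically."""
    expr = sp.sympify(rhs)
    missing = [name for name in identifiers(rhs) if name not in values]
    if missing:
        raise ValueError(f"missing values for: {', '.join(missing)}")
    substitutions = {sp.Symbol(name): value for name, value in values.items()}
    return float(expr.subs(substitutions))


# Volume of a sphere, V = 4/3 * pi * r^3, for radius r = 2.
sphere_volume = "4/3*pi*r**3"
print(identifiers(sphere_volume))
print(calculate(sphere_volume, {"r": 2}))
```

Constants such as `pi` are recognized by SymPy and are not reported as identifiers, so only `r` has to be entered in this example.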

Figure 4 shows the user web interface (GUI) for English and Hindi questions, as well as a direct formula question.

## 4. Evaluation

Finally, the individual success of the Wikidata seeding and our QA system was evaluated by the standard Information Retrieval measures precision, recall and the combined f-measure.

### 4.1. Evaluation of the Wikidata Seeding

The main goal of the evaluation was to determine how effectively and accurately we were able to retrieve the mathematical formulae from Wikipedia. The evaluation was carried out separately for general mathematics and geometry.

#### 4.1.1. General Mathematics.

We evaluated the success of the data transfer by precision and recall, classifying a retrieved result as relevant if and only if it was judged to be the one and only general mathematical representation of the Wikipedia article it was extracted from, and as non-relevant if it was not the defining formula or was incomplete. As the formulae were extracted from Wikipedia, which is not machine-interpretable, we needed to check manually whether a formula is relevant or non-relevant. Since it would have been very time-consuming to check all the formulae extracted from 32,682 Wikipedia pages, we chose a random sample of 100 formulae, which can be evaluated in a moderate amount of time. We manually examined the Wikipedia articles to find out whether there were defining formulae available for the given mathematical concepts and compared them to the alleged defining formulae of the Wikidata items. To calculate precision and recall, we classified a result as relevant if there was a defining formula of the mathematical concept in its Wikipedia article and as retrieved if it was the first formula, which was subsequently seeded to Primary Sources. Table 1 shows a snapshot of the evaluation of our random sample comprising 100 formulae with their contingencies, whereas Table 2 contains the evaluation results for the general formula seeding. The complete list is available at https://github.com/ag-gipp/MathQa.

We calculated the precision of the data transfer as

P = 71/(71+17) = 0.8,

concluding that 80 % of the retrieved results were relevant.

Furthermore, we calculated the recall as

R = 71/(71+10) = 0.88,

concluding that 88 % of the total relevant documents were successfully retrieved.

Finally, the combined (equally weighted) f-measure is

F1 = (2 ⋅ 0.8 ⋅ 0.88)/(0.8 + 0.88) = 0.84.

From this result, we can conclude that the seeding of general mathematical formulae from English Wikipedia articles to Wikidata yielded a combined f-measure of 84 %.
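The three measures follow directly from the contingency counts used above (71 true positives, 17 false positives, 10 false negatives). The short sketch below recomputes them; the function names are hypothetical, and the code works with unrounded intermediate values, so the precision comes out as 71/88, which is slightly above the rounded 0.8 used in the text.

```python
def precision(tp: int, fp: int) -> float:
    """Share of retrieved results that are relevant."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of relevant results that were retrieved."""
    return tp / (tp + fn)


def f_measure(p: float, r: float) -> float:
    """Equally weighted harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Counts for the general mathematics sample.
tp, fp, fn = 71, 17, 10
p = precision(tp, fp)
r = recall(tp, fn)
f1 = f_measure(p, r)
print(f"P = {p:.3f}, R = {r:.3f}, F1 = {f1:.3f}")
```

Computed this way, the recall rounds to 0.88 and the f-measure to 0.84, in line with the values reported for the general formula seeding.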

#### 4.1.2. Geometry Questions.

Eventually, we evaluated the accuracy of the formulae extracted from geometry related articles. The evaluation was carried out similar to the evaluation of general mathematics. However, due to the much smaller number of items, we did not choose a random sample but evaluated all the retrieved results, i.e., 65 formulae belonging to 49 Wikipedia articles.

Table 3 contains some of the extracted formulae with their Wikipedia title and the corresponding contingency, whereas Table 4 shows our evaluation results for the geometry formula seeding. The complete list is available at https://github.com/ag-gipp/MathQa.

Based on these values, we calculated the precision of the data transfer as

P = 52/(52+1) = 0.98,

concluding that 98 % of the retrieved results were relevant.

Furthermore, we calculated the recall as

R = 52/(52+12) = 0.81,

concluding that 81 % of the total relevant documents were successfully retrieved.

Finally, the combined (equally weighted) f-measure is

F1 = (2 ⋅ 0.98 ⋅ 0.81)/(0.98 + 0.81) = 0.87.

From this result, we can conclude that the seeding of geometry formulae from English Wikipedia articles to Wikidata yielded a combined f-measure of 87 %.

##### Issues.

Evaluating the seeding sample, we observed some illustrative issue cases (see Table 5), which are briefly discussed in the following. Some Wikipedia articles did not contain a mathematical concept in the strict sense, but instead an algorithm (# 4, 7, 48), a measurement device (# 35, 38), a mathematical method or field (# 25, 30, 33), a set (# 49) or even a scientist (# 9) or historical topic (# 19). Some retrieved formulae (# 6, 23) were only a part of the definition or statement, which also contained natural language terms. We conclude that the `<math>` tag is not a sufficient marker to find mathematical concepts within the bulk of Wikipedia articles. For future work, better filters will have to be developed that discard the articles mentioned above and possibly also other types we are currently not aware of.

### 4.2. Evaluation of the Math QA system

Our math-aware QA system can answer mathematical questions in English and the Hindi language or use a direct formula input to deliver a calculation based on input values for the occurring identifiers.

#### 4.2.1. Evaluation of the Formula Retrieval Module.

We evaluated our system on the basis of all formulae that were seeded correctly (true positive), determining whether a formula was retrieved (true or false) from Wikidata by the Formula Retrieval Module. The evaluation lists are available at https://github.com/ag-gipp/MathQa. The retrieval of general mathematical formulae yielded 34 true and 35 false results. The accuracy of the system is

Accuracy = Number of true results / Total size of the sample

= 34/(34+35) = 0.49

From the results, we can conclude that the Formula Retrieval Module retrieved general mathematical formulae with an accuracy of 49 %. The retrieval of geometry formulae yielded an accuracy of 31 %. Our system can also answer questions posed in Hindi. To our knowledge, no other tool can answer mathematical questions written in Hindi, so we could not compare our results for Hindi to any other tool.

##### Issues.

Evaluating the MathQA sample, we could observe some illustrative issue cases (see Table 6) which will be briefly stated in the following. Wikidata users renamed the has quality property area to area of plane shape, which impeded our system from retrieving the respective formula. In some cases, there were too many synonymous Wikidata items available, so that the system could not filter out the requested mathematical concept. Furthermore, when processing the request Volume of a prism, the system found prism - transparent optical element (Q165896) instead of prism - geometric shape (Q180544). Finally, if the name of the requested item contained an apostrophe ’ or hyphen - it could not be processed properly.

#### 4.2.2. Comparison to a commercial computational mathematical knowledge-engine

Currently, there is no known tool available which delivers a direct formula answer and performs a calculation using input values for the identifiers. So we could not fully compare our results to other systems. However, we studied the ability of our system to successfully answer questions compared to a selected computational knowledge engine that performs arithmetic operations and answers questions from different fields of general knowledge.

Quantitatively, we used 30 mathematical questions from the NTCIR-12 Task (DBLP:conf/ntcir/SchubotzMLG16) to evaluate the performance of the two systems. After approving or seeding 5 missing formulae manually, our system yielded more suitable answers than the commercial engine in 10/30 of the cases (see column Performance (Perf.) in Table 7). The reference engine performed better than our system in only 6/30 cases, and in 14/30 cases both systems provided answers that were estimated to be equally suitable. All in all, our system outperformed the commercial engine on the NTCIR-12 sample. Nevertheless, it should be mentioned that the reference engine is continuously being improved on mathematical topics. For example, the question “What is the formula for Logical equivalence?” yielded the message “Additional functionality for this topic is under development…”, and we suspect that there are more such cases.

Qualitatively, we observed that our system is more powerful than the reference engine when answering definition questions. As an example, our system answers the question “What is the formula for gas?” with a formula, whereas the reference engine only returns a list of gaseous compounds. Furthermore, our system can successfully answer geometry questions, whereas the reference engine provides all formulae with unit edge length and does not offer an option to enter a customized edge length. For example, our system answers the question “What is the surface area of triangular cupola?” with a formula in terms of the edge length, whereas the reference engine only displays the result for unit edge length. However, our QA system can answer only mathematical questions, whereas the reference engine can answer questions from many other fields like people and history, health and medicine, materials, dates and times, engineering, earth science, etc.

##### Limitations.

Our system depends decisively on the knowledge stored in Wikidata. If the item or the formula we are looking for is not available in Wikidata, we are unable to answer a given question. Furthermore, we are using the question parsing module developed by Platypus [10], which is limited to nouns in singular form and therefore cannot answer a question containing a plural noun. For instance, the question “What is the formula for Maxwell’s equations?” is parsed as Maxwell’s equation, such that the item cannot be retrieved. Besides, our parser does not support specific LaTeX tags (\displaystyle, \frac, \left, \right, \bigg, \mathrm, etc.) or punctuation symbols (, ; ! etc.), nor integration, summation, scalar products or more complex formulae. For Hindi questions, we are limited to the Wikidata items that include Hindi labels. Consequently, not every Wikidata item that can be queried in English can also be queried in Hindi.

## 5. Conclusion

The overall goal of this research project was to extract mathematical knowledge in the form of formulae from English Wikipedia and seed it into Wikidata. This served as a necessary preparation for building a QA system that can answer mathematical questions in English and Hindi. Additionally, the user can perform arithmetic calculations using the retrieved formula after providing input values for the detected identifiers. We have been able to provide the Wikidata community with more than 17 thousand new Wikidata statements containing formulae. Our seeding achieved a precision of 80 %, a recall of 88 % and a combined f-measure of 84 % for general mathematical formulae. For the geometry formulae, the precision was 98 %, the recall 81 % and the f-measure 87 %. The Formula Retrieval Module of our QA system achieved an accuracy of 49 % and 31 % for general and geometry formulae, respectively. To the best of our knowledge, our QA system is the only one available that answers mathematical questions in Hindi.

##### Future Work.

Wikipedia is the world’s largest online encyclopedia and consists of a massive amount of information in numerous different languages. There are many possibilities when it comes to the task of migrating data from Wikipedia to Wikidata. This research project only dealt with mathematical knowledge in Wikipedia. However, similar techniques can also be employed to migrate knowledge from other fields such as geography, computer science, politics and many more between these Wiki sister projects. We propose the following possibilities for future work: 1.) Using a database or knowledge-base other than Wikidata, or adding a module that can use another database if the requested item is not available in Wikidata. 2.) Seeding Wikidata with more mathematical formulae to enable answering more questions. 3.) Seeding Wikidata with more Hindi labels for Wikidata items to improve the performance of our Hindi language module. 4.) Developing a new LaTeX parser that can parse any LaTeX formula without restriction. 5.) Improving the Formula Retrieval Module to allow for plots and more information regarding the formula. 6.) Improving the Calculation Module such that it delivers the calculated result including units of the formula and the identifiers.

###### Acknowledgements.
We would like to thank Akiko Aizawa for her advice and for hosting us as visiting researchers in her lab at the National Institute of Informatics (NII) in Tokyo. Furthermore, we thank the Wikimedia Foundation and Wikimedia Deutschland for providing cloud computing facilities and for hosting us for a research visit. Besides many Wikimedians, Lydia Pintscher and Jonas Kress were a great help in getting started with Wikidata. This work was supported by the FITWeltweit program of the German Academic Exchange Service (DAAD) as well as the German Research Foundation (DFG).
