A Classification of Artificial Intelligence Systems for Mathematics Education

by Steven Van Vaerenbergh, et al.
Universidad de Cantabria

This chapter provides an overview of the different Artificial Intelligence (AI) systems that are being used in contemporary digital tools for Mathematics Education (ME). It is aimed at researchers in AI and Machine Learning (ML), for whom we shed some light on the specific technologies that are being used in educational applications; and at researchers in ME, for whom we clarify: i) what the possibilities of the current AI technologies are, ii) what is still out of reach and iii) what is to be expected in the near future. We start our analysis by establishing a high-level taxonomy of AI tools that are found as components in digital ME applications. Then, we describe in detail how these AI tools, and in particular ML, are being used in two key applications, specifically AI-based calculators and intelligent tutoring systems. We finish the chapter with a discussion about student modeling systems and their relationship to artificial general intelligence.




1 Introduction

Artificial intelligence (AI) has a long history, starting from observations by the early philosophers that a reasoning mind works in some ways like a machine. For AI to become a formal science, however, several advances in the mathematical formalization of fields such as logic, computation and probability theory were required Russell and Norvig (2009). Interestingly, the relationship between mathematics and AI is not unilateral, as AI, in turn, serves the field of mathematics in several ways. In particular, AI powers many computer-based tools that are used to enhance the learning and teaching of mathematics, several of which are discussed in this chapter.

The close relationship between AI and Mathematics Education (ME) dates back at least to the 1970s, and it has been discussed thoroughly in the scientific literature Schoenfeld (1985); Wenger (1987); Balacheff (1994). One could list several parallels between both fields, for instance that they are both concerned with constructing sound reasoning based on the use of logic. Indeed, in ME, developing mathematical reasoning skills is an important educational goal, while many AI systems are designed to perform reasoning tasks in an automated manner. Also, modern AI techniques involve the concepts of teaching and learning, as some systems are required to learn models and concepts, either in an autonomous manner or supervised through some form of instruction. (We will not enter into details regarding the similarities and differences in learning for AI and ME, as that discussion is slightly outside the scope of this chapter.) Nevertheless, while these parallels exist, humans and machines clearly carry out these tasks in completely different ways. After all, as noted by Schoenfeld, “AI’s perspective is severely distorted by the engineering perspective, and extrapolations to human performance can be dangerous” (Schoenfeld, 1985, p. 184).

Before continuing, we will take a closer look at what exactly AI is, including its subfield, machine learning.

1.1 Artificial intelligence and machine learning

The literature contains many different definitions of AI, though they are all closely related. Generally speaking, AI aims to create machines capable of solving problems that appear hard to the eyes of a human observer. Such problems may be related solely to thought processes and reasoning capabilities, or they may refer to exhibiting a certain behavior that strikes an observer as intelligent Russell and Norvig (2009).

Historically, several simple AI programs have been designed using a set of predefined rules, which can often be represented internally as a decision tree model. For instance, a program for natural language understanding may check whether certain words are present in a phrase, and combine the results through some set of fixed rules to determine the sentiment of a text. And in early computer vision systems, results were obtained by calculating hand-engineered features of pixels and their neighborhoods, after which these features were compared to the surrounding features using predefined rules. However, as soon as one tries to build a system with an advanced comprehension of natural language or photographic imagery, a very complex set of such internal rules is required, far exceeding what can be manually designed by a human expert. To deal with this complexity, an automated design process for the set of internal rules is required, better known as machine learning.
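To make the rule-based approach concrete, the following is a minimal sketch of a sentiment classifier built from fixed, hand-written rules. The word lists, the negation rule and the function name are illustrative assumptions, not taken from any particular system.

```python
# Hypothetical word lists; a realistic system would need far larger ones.
POSITIVE_WORDS = {"good", "great", "clear", "helpful"}
NEGATIVE_WORDS = {"bad", "confusing", "boring", "wrong"}
NEGATIONS = {"not", "never"}


def rule_based_sentiment(text: str) -> str:
    """Classify a text as 'positive', 'negative' or 'neutral' with fixed rules."""
    score = 0
    negate = False
    for word in text.lower().split():
        word = word.strip(".,!?")
        if word in NEGATIONS:
            # Rule: a negation flips a sentiment word that immediately follows it.
            negate = True
            continue
        if word in POSITIVE_WORDS:
            score += -1 if negate else 1
        elif word in NEGATIVE_WORDS:
            score += 1 if negate else -1
        negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

The sketch classifies "This was not good" as negative, but it labels "not very good" as positive, because the negation rule only looks one word ahead. Covering such cases by hand requires ever more rules, which is the complexity problem described above.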

Formally, machine learning (ML) is a subfield of AI that follows a paradigm known as learning from examples, in which a system is given practical examples of a concept or behavior to be learned, after which it develops an internal representation that allows its own output to be consistent with the set of given examples Wenger (1987). The concept of ML is perhaps best summarized by Tom Mitchell, who wrote that it is “the field that is concerned with the question of how to construct computer programs that automatically improve with experience” Mitchell (1997). This definition highlights the three properties that any ML system should hold: 1) its learning is automated, as in a computer program that does not require human intervention; 2) the performance can be measured, which allows the system to measure improvement; and 3) learning is based on receiving examples (the experience). In summary, as more examples are processed, the performance of a well-designed ML system is expected to improve.
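The three properties can be illustrated with a perceptron, one of the simplest learning algorithms. This is only a sketch: the data set (hours studied and exercises solved, labeled as pass or fail) is invented for illustration, and all function names are assumptions.

```python
def predict(weights, bias, x):
    """Return 1 if the weighted sum is positive, else 0."""
    total = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if total > 0 else 0


def accuracy(weights, bias, examples):
    """Performance measure: fraction of examples classified correctly."""
    correct = sum(predict(weights, bias, x) == y for x, y in examples)
    return correct / len(examples)


def train_perceptron(examples, epochs=20, learning_rate=1.0):
    """Automated learning: adjust the weights whenever a prediction is wrong."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in examples:
            error = y - predict(weights, bias, x)
            if error != 0:
                weights = [w + learning_rate * error * xi
                           for w, xi in zip(weights, x)]
                bias += learning_rate * error
    return weights, bias


# The experience: hypothetical (hours studied, exercises solved) -> passed (1) or not (0).
EXAMPLES = [((1, 0), 0), ((2, 1), 0), ((1, 2), 0),
            ((4, 3), 1), ((5, 2), 1), ((3, 5), 1)]

weights, bias = train_perceptron(EXAMPLES)
print(accuracy([0.0, 0.0], 0.0, EXAMPLES))  # accuracy before training
print(accuracy(weights, bias, EXAMPLES))    # accuracy after training
```

On this small, linearly separable data set the accuracy rises from 0.5 for the untrained model to 1.0 after training. No rule about passing was written by hand; the decision boundary is derived entirely from the examples.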

The concept of computer programs that learn from examples has been around as long as AI itself, notably as early as Alan Turing’s vision in the 1950s Russell and Norvig (2009). Nevertheless, it was not until the middle of the 1990s that a solid foundation for ML was established by Vladimir Vapnik in his seminal work on statistical learning theory Vapnik (1995). At the same time, the techniques of neural networks and support vector machines gained popularity as they were being applied in real-world applications, though only for tasks that are simple by today’s standards. The latest chapter in the history of ML started around 2012, when it became clear that neural networks could solve tasks that were much more demanding than previously achieved, by making them larger and feeding them many more examples Krizhevsky et al. (2012). These neural networks consist of multiple layers of neurons, leading to the name deep learning, where each layer contains thousands or even millions of parameters. They are responsible for many of the AI applications that people interact with at the time of writing, notably in image and voice recognition, and natural language understanding.

In this chapter, we aim to analyze the uses of AI, and in particular ML, in ME. The chapter is structured as follows. To fix ideas, in Section 2 we briefly discuss some contemporary AI-based tools that are being used by mathematical learners. Based on these examples, we introduce a high-level taxonomy of AI systems that serve as building blocks for tools in ME, in Section 3. We then show, in Section 4, how the proposed taxonomy allows us to provide an in-depth analysis of the different AI systems currently used in some ME tools. We finish with a discussion on future research required to build complete student modeling systems, in Section 5, and conclusions, in Section 6.

2 A glimpse of the present

The following are two examples of AI-based tools for mathematics education. These examples feature some representative AI techniques that form the basis of the taxonomy we propose later, after which we will revisit them for a more in-depth analysis.

2.1 A new breed of calculators

In 2014, a mobile phone application called Photomath (https://photomath.app/) was released that quickly became very popular among schoolchildren (and less so among teachers) Webel and Otten (2015). It allows the user to point a phone camera at any equation in a textbook and instantly obtain the solution, including the detailed steps of reasoning. If the user so wishes, they can even request an alternative sequence of steps for the solution. While the first version of the software would only work on pictures of clean, textbook equations, the technology was later upgraded to recognize handwriting as well. Several other apps have followed suit since, notably Google’s Socratic (https://socratic.org/) and Microsoft Math Solver (https://math.microsoft.com/).

The emergence of these tools, known as “camera calculators”, can be mainly attributed to advances in image recognition technology, and in particular to optical character recognition (OCR) algorithms based on deep learning. Once the OCR algorithm has translated the picture into a mathematical equation, standard equation solvers can be used to obtain a solution. The third and last stage of the application consists in explaining the solution to the user, for instance, by means of a sequence of steps. In the case of the solution of a linear equation, this stage is resolved algorithmically, requiring little AI. Nevertheless, it is often possible to find a shorter or more intuitive solution by thinking strategically (see, for instance, the example in (Webel and Otten, 2015, p. 4)). For an AI to do so, it would require capabilities of mimicking human intuition or exploring creative strategies. We will discuss these capabilities later, in Section 4.2.

Interestingly, these apps have reignited the discussion on the appropriate use of tools in mathematics education, reminiscent of the controversy on the use of pocket calculators that started several decades ago Webel and Otten (2015). Indeed, they could be seen as examples of a new generation of “smarter” calculators that limit the number of actions and calculations that the user must perform to reach a solution. Such new, smarter calculators are not limited to equation solvers only, as they can be found in other fields as well. For instance, as of version 5, the popular dynamic geometry software (DGS) GeoGebra (https://www.geogebra.org/) includes a set of automated reasoning tools that allow the rigorous mathematical verification and automatic discovery of general propositions about Euclidean geometry figures built by the user Kovacs et al. (2020). Rather than merely automating calculations, as common digital calculators do, these tools automate the reasoning itself, to a certain extent.

2.2 Blueprint of a data-driven intelligent tutoring system

Our second example concerns an interactive tool for learning and teaching mathematics. In particular, in the following, we describe a hypothetical interaction between a student and an intelligent tutoring system (ITS).

Hypatia, a student, logs onto the system through her laptop and starts reading a challenge proposed by the system. This time, the challenge consists in solving an integral equation. Hypatia is not sure how to start, and she spends a few minutes scribbling calculations on her notepad. The system, after checking her profile in the database, infers that she needs help, and offers a hint on screen. Hypatia now knows how to proceed and advances a few steps towards the solution. However, some steps later, she makes a mistake in a substitution. The system immediately notices the mistake and identifies it as a common error (a “bug”). Through the visual interface, the system tells Hypatia to check whether there were any mistakes in the last step. She reviews her calculations and quickly corrects the error. The system congratulates her on spotting the error, and she continues to solve the exercise successfully. At this point, the system shows her a summary of the solution and reminds her of the hints she was given. She can then choose to review any of the steps and their explanation, or continue to the next problem. If she chooses to continue, the system will present her with a problem that has been designed specifically to advance along her personalized learning path.

Before Hypatia started using this ITS, the system’s database already contained the interactions of many other students. By using data mining techniques, it was able to identify a number of “stereotype” student profiles. The first time Hypatia interacted with the ITS, the system’s AI analyzed her initial actions to build an initial profile for her, based on one of the stereotype profiles. As she now performs different problem solving sessions, the system adds more of her interactions to its database, which allows it to identify behavioral patterns and build a more refined student model for her. This, in turn, allows the system to personalize her learning path and to offer her more relevant feedback when she encounters difficulties.

The example described above is not purely hypothetical: it is based on real ITS that are used in practice today. We will return to this example later on.

3 A taxonomy of AI techniques for mathematics education

We now propose a taxonomy of AI techniques that are used in digital tools for ME. The taxonomy consists of four categories that span the entire range of such AI systems. While each of the categories is motivated by some aspect of the previous examples, we include a more comprehensive list of particular cases from the literature for each of them. Furthermore, we shed some light on the current technological capabilities of these AI systems.

3.1 Information extractors

We use the term information extractors to refer to AI technologies that take observations from the real world and translate them into a mathematical representation. A classic example in this category consists in parsing the text of algebraic word problems into equations Koncel-Kedziorski et al. (2015). More advanced information extractors can operate on digitized data from a sensor, such as a camera or a microphone, to which they apply an AI algorithm to extract computer-interpretable mathematical information.

Figure 1: Representation of an information extractor. The globe represents observations from the real world, and the summation sign represents mathematical information.

An example of information extractors that operate on visual data was given in Section 2.1, where the initial stage of the described camera calculator translates a picture into a mathematical equation. The AI required to perform OCR in these information extractors operates in two steps: First, it employs a convolutional neural network (CNN) to recognize individual objects in an image. In essence, a CNN is a particular type of artificial neural network that is capable of processing spatial information present in neighborhoods of pixels by applying (and learning) digital filters Krizhevsky et al. (2012). Then, the individually recognized objects are transformed into a sequence, which was traditionally performed by techniques such as Hidden Markov Models Rabiner and Juang (1986), but is now implemented as neural-network-based techniques such as Long Short-Term Memory networks Hochreiter and Schmidhuber (1997) and transformers Vaswani et al. (2017).
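The filtering operation at the core of a CNN can be sketched in a few lines. The example below applies a single, hand-chosen filter to a toy image; in an actual CNN the filter values are learned from examples, and many filters are stacked in layers. The image, the filter and the function name are illustrative assumptions.

```python
def apply_filter(image, kernel):
    """Slide `kernel` over `image` (valid positions only) and sum the products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            value = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            row.append(value)
        output.append(row)
    return output


# Toy 4x4 grayscale image: dark (0) on the left, bright (1) on the right.
IMAGE = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-chosen 2x2 filter that responds to vertical dark-to-bright edges.
VERTICAL_EDGE = [
    [-1, 1],
    [-1, 1],
]

response = apply_filter(IMAGE, VERTICAL_EDGE)
print(response)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The response is non-zero only in the middle column, where each pixel neighborhood contains the edge. This is what is meant by processing the spatial information present in neighborhoods of pixels.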

Visual information extractors are not only used to digitize algebraic equations, but can also be used to extract other types of mathematical information from the real world. For instance, in the MonuMAI project Lamas et al. (2021), CNN-based extractors are used to obtain geometrical information from pictures of monuments. And some camera calculators, such as Socratic, allow the user to take pictures of word problems, which are transformed to text, interpreted, and converted into a mathematical representation.

Finally, sensor data from a student may be used to extract information for an ITS (see Section 4.2). In particular, these systems may require information about the student’s state of mind while solving a mathematical problem. In this category we encounter AI techniques for facial expression recognition Li and Deng (2020), speech emotion recognition Fayek et al. (2017), and mood sensing through electrodermal activity Kajasilta et al. (2019).

3.2 Reasoning engines

In software engineering, a reasoning engine is a computer program that is capable of inferring logical consequences from a set of axioms found in a knowledge base, by following a set of predefined rules Furht (2008). For the current context of mathematics education, we employ a broader definition of reasoning engines that includes all software systems that are capable of automatically solving a mathematically formulated problem. A very simple such system consists of an equation solver, whose action is limited to transforming the (set of) equations into their canonical form and applying the formula or the algorithm to solve them Arnau et al. (2013). Several types of more sophisticated reasoning engines exist in the mathematical research literature, for instance automated theorem provers (ATP), whose aim is to verify and generate proofs of mathematical theorems Loveland (1978). While proof verification is a simple mechanical process that only requires checking the correctness of each individual step, proof generation is a much harder problem, as it requires searching through a combinatorial explosion of possible steps in the proof sequence.
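The simple equation solver mentioned above can be sketched as follows, assuming equations of degree at most two in a single unknown. Representing each side of the equation as a coefficient triple is a choice made for this illustration, not the representation used by any of the cited systems.

```python
import math


def to_canonical(lhs, rhs):
    """Move all terms to the left-hand side.

    Each side is a triple of coefficients (x^2, x, constant); the result
    (a, b, c) represents the canonical form a*x^2 + b*x + c = 0.
    """
    return tuple(left - right for left, right in zip(lhs, rhs))


def solve(lhs, rhs):
    """Return the sorted list of real solutions of lhs = rhs."""
    a, b, c = to_canonical(lhs, rhs)
    if a == 0:
        if b == 0:
            raise ValueError("equation has no unique solution")
        return [-c / b]  # linear case
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        return []  # no real solutions
    root = math.sqrt(discriminant)
    # A set removes the duplicate when the discriminant is zero.
    return sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})


# x^2 + 2x = 3x + 6  becomes  x^2 - x - 6 = 0
print(solve((1, 2, 0), (0, 3, 6)))  # [-2.0, 3.0]
```

Once the equation is in canonical form, solving it reduces to applying a fixed formula. No search is involved, which is what separates such a solver from the proof generation problem described above.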

Figure 2: Representation of a reasoning engine. It receives a mathematical problem as an input, and outputs the corresponding solution.

A novel contribution in the development of reasoning engines lies in the use of ML techniques, which has been fueled by the success of deep learning in pattern matching problems Krizhevsky et al. (2012). These techniques follow the standard ML paradigm that requires a set of training examples: The ML algorithm, typically a deep neural network, learns a model in order to explain as many of the training examples as possible. The learned model is completely data driven, without any hard rules or logic programmed into it.

ML algorithms could improve current ATP techniques by encoding human provers’ intuitions and predicting the best next step in a proof Gauthier and Kaliszyk (2015); Loos et al. (2017); Schon et al. (2019). Furthermore, neural networks for natural language processing are being used to train machines to solve word problems and to perform symbolic reasoning, currently yielding limited but promising results. For instance, Saxton et al. (2019) generated a data set of two million example problems from different areas of mathematics and their respective solutions. Several neural network models were trained on these data and, in general, a moderate performance was obtained, depending on the problem type. Deep learning is also being used to solve differential equations Arabshahi et al. (2018); Lample and Charton (2019), perform symbolic reasoning Lee et al. (2020), and solve word problems Wang et al. (2017, 2018). Note that these methods typically operate on text data and they perform the action of the information extractor and the reasoning engine using a single AI. Finally, in the ML community there is a growing interest in automating abstract reasoning. Research in this area currently focuses on solving visual IQ tests, such as variants of Raven’s Progressive Matrices Barrett et al. (2018); Chollet (2019), and causal inference, which deals with explaining cause-effect relations, for instance from a statistical point of view Judea Pearl and Jewell (2019).

3.3 Explainers

While reasoning engines can solve mathematical problems and generate correct proofs, they do not necessarily produce results that can be read by a human. Sometimes this is simply not needed, for instance when an ATP is used in research to verify a theorem that requires a long and complex proof, prone to human errors. But in a different context, for instance that of the mathematical learner, it becomes important to have proofs that are understandable Ganesalingam and Gowers (2017).

In the AI community, interest in explainable methods has recently surged. Part of this interest is due to legal reasons, as some administrations demand that decisions taken by an AI model on personal data be accompanied by a human-understandable explanation Meng-Leong (2019). While some early AI systems generated models that could easily be interpreted, modern AI techniques, especially deep learning systems, involve opaque decision systems. These algorithms operate in enormous parametric spaces with millions of parameters, rendering them effectively black-box methods whose decisions cannot be interpreted. To solve this issue, the research field of explainable AI is concerned with developing AI methods that produce interpretable models and interpretable decisions Adadi and Berrada (2018); Molnar (2019); Arrieta et al. (2020). We will refer to AI methods that produce understandable explanations as explainers.

Figure 3: Representation of a post-hoc explainer. It translates a machine-code solution into a sequence of logical, human-readable steps.

From a technical point of view, there exist two types of explainers. The first type are modules that can be added onto existing, opaque AI systems. They perform what is called post-hoc explainability, and may do so for instance by approximating the complex model with a simpler, interpretable one. Some different post-hoc explainability approaches are illustrated in (Arrieta et al., 2020, Fig. 4). The second type of explainable AI consists of models that are interpretable by design. Under our terminology, these correspond to reasoning engines that are restricted to only producing interpretable solutions. Of these two types, the former is currently more popular in the field of explainable AI, as it does not require replacing the entire reasoning engine, which is usually hard to design and train in the first place.

In the field of ME, explainers have been built principally for solving math equations step by step, for instance in the open source project mathsteps (https://github.com/google/mathsteps). In ATP, on the other hand, explainability is a fairly new research line. In order to apply a post-hoc explainer to an ATP, it might be necessary to construct an ATP based purely on logic, though, as Fu et al. (2019) note, “while logic methods proposed have always been the dream of mankind, their applications are limited due to the massive search space”. One case in point is found in DGS, where geometric automated theorem provers (GATP) are now being integrated Quaresma (2020). State-of-the-art GATP are based on algebraic methods, and their results cannot be translated into human-readable proofs Quaresma (2020); Kovacs and Recio (2020). For this reason, explainability is to be introduced in ATP by designing ATP techniques that are transparent by design Ganesalingam and Gowers (2017); Meng-Leong (2019). In the case of GATP, this approach is currently very limited, as discussed in Font et al. (2018).

3.4 Data-driven modeling

Up to this point, we have described several techniques and scenarios in which substantial amounts of data are generated: the extraction techniques from Section 3.1 distill real-world and sensor observations into numerical information and mathematical representations; and, in Section 2.2, Hypatia interacts with an ITS that relies on a database of student information and completed student tasks, which increases each time a student uses the system. In modern AI systems, data mining and machine learning techniques are used to analyze these data and to convert them into insights and practical models. These techniques, which we will refer to as data-driven modeling, cover a broad area and make up the final class of AI in ME.

Figure 4: Representation of a data-driven model. After receiving data from different sources, it infers a model for the data that can be used to produce predictions.

Data-driven modeling is employed for several reasons in ME. First, it may allow building models that are used to improve specific aspects of the learning process of individual students. These include AI models to predict a student’s performance Cortez and Silva (2008); Smith et al. (2015); Asif et al. (2017), to determine at what specific problem step a student learns a concept Baker et al. (2010), or even to detect that a student tries to game an ITS Baker et al. (2008) (we discuss student modeling in more detail in Section 4.2). The ML techniques that are used to construct these models are mainly regression techniques (to obtain predictors that produce numerical values) and classification algorithms (to predict categorical or qualitative variables).

Second, data-driven modeling techniques can be used on large collections of student data, in a big-data fashion. A classic application in this category consists in analyzing completed student tasks in order to build a database of common errors, or “bugs” Wenger (1987); Chrysafiadi and Virvou (2013), which is an important component of an ITS. A different application consists in modeling complete student populations, which can be useful to group students into different “stereotypes”, and is typically performed by clustering algorithms. Another application is the large-scale analysis of student profiles and completed tasks to improve the personalization of learning paths in an ITS. In this case, recommendation algorithms can be employed Chrysafiadi and Virvou (2013). Finally, while studies in this field are mostly restricted to single schools or data from a single ITS, it is easy to imagine that data-driven modeling can be applied to larger populations of students, for instance on a national level, where it could be used to make statistical assessments about the effectiveness of specific aspects of a curriculum.
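As an illustration of how clustering can group students into stereotypes, the following is a minimal k-means sketch. The student features (average score and hints requested per task), the data values and the choice of initial centroids are all invented for this example. A practical system would also normalize the features, since k-means is sensitive to their scale.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            distances = [
                sum((p - c) ** 2 for p, c in zip(point, centroid))
                for centroid in centroids
            ]
            clusters[distances.index(min(distances))].append(point)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(values) / len(cluster) for values in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical student data: (average score, hints requested per task).
STUDENTS = [
    (0.90, 0.5), (0.80, 1.0), (0.85, 0.0),
    (0.40, 4.0), (0.50, 5.0), (0.30, 4.5),
]

# Two students chosen by hand as initial centroids, to keep the run deterministic.
centroids, clusters = kmeans(STUDENTS, [STUDENTS[0], STUDENTS[3]])
```

With these data the algorithm separates the students into two groups, one with high scores and few hints and one with lower scores and many hints. Each resulting centroid can be read as a stereotype profile.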

4 The present, revisited

Armed with the taxonomy introduced in Section 3, we can now revisit the examples from Section 2 and analyze their AI and ML techniques in more detail, pointing out some capabilities that may be added in the future.

4.1 AI-based calculators

Figure 5: The workflow of a camera calculator.

The “camera calculator”, described earlier, operates as shown in Figure 5: First, the user captures a problem, for instance by taking a picture, which is translated by an information extractor into its mathematical formulation. Second, a reasoning engine solves the problem and produces a solution, in machine code. Third, an explainer translates the machine code into a human-readable reasoning sequence. In the case of simple problems, the reasoning engine and explainer could be replaced by a single module. Finally, the complete solution is presented to the user, who may request additional information on each of the steps.
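The three stages of this workflow can be sketched as three functions, here for linear equations only. This is a toy illustration under strong assumptions: the image recognition step is replaced by a stub that parses text already recognized by OCR, only equations written as "ax + b = c" with integer values are supported, and all function names are hypothetical.

```python
import re
from fractions import Fraction


def extract(recognized_text):
    """Information extractor (stub): parse 'ax + b = c' into integers (a, b, c).

    The OCR stage is assumed to have already produced `recognized_text`.
    """
    pattern = r"\s*(-?\d+)x\s*([+-])\s*(\d+)\s*=\s*(-?\d+)\s*"
    match = re.fullmatch(pattern, recognized_text)
    if match is None:
        raise ValueError("unsupported equation format")
    a = int(match.group(1))
    b = int(match.group(3)) * (1 if match.group(2) == "+" else -1)
    c = int(match.group(4))
    return a, b, c


def solve(a, b, c):
    """Reasoning engine: solve a*x + b = c exactly."""
    if a == 0:
        raise ValueError("not a linear equation in x")
    return Fraction(c - b, a)


def explain(a, b, c, x):
    """Explainer: turn the solution into human-readable steps."""
    sign = "+" if b >= 0 else "-"
    if b >= 0:
        isolate = f"Subtract {b} from both sides"
    else:
        isolate = f"Add {-b} to both sides"
    return [
        f"Start with {a}x {sign} {abs(b)} = {c}.",
        f"{isolate}: {a}x = {c - b}.",
        f"Divide both sides by {a}: x = {x}.",
    ]


def camera_calculator(recognized_text):
    """Chain the three stages: extractor, reasoning engine, explainer."""
    a, b, c = extract(recognized_text)
    x = solve(a, b, c)
    return explain(a, b, c, x)


for step in camera_calculator("2x + 3 = 11"):
    print(step)
```

For a problem this simple the explainer is purely algorithmic, and the reasoning engine and explainer could indeed be merged into one module. Keeping them separate makes it possible to replace any stage independently, for example by swapping the text stub for an image-based extractor.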

The described workflow is valid for a wide range of calculators: The extractor could operate on different types of data, such as text from word problems, or voice commands, which are transcribed to text. As for the reasoning engine, many ATP and advanced computational engines are available, including the solvers Mathematica (https://www.wolfram.com/mathematica/) and Maple (https://www.maplesoft.com/). In this context, a pioneering role in the development of AI-based calculators is played by the WolframAlpha computational engine (https://www.wolframalpha.com/), which operates on written queries and combines database look-ups with the computational power of Mathematica. It includes some explainer capabilities as well, as it provides feedback on the solution and links to related educational resources. WolframAlpha was launched in 2009, making it a forerunner of current AI-based calculators such as the ones included in personal digital assistants, notably Siri (https://www.apple.com/siri/), Cortana (https://www.microsoft.com/cortana/), Alexa (https://developer.amazon.com/alexa/) and Google Assistant (https://assistant.google.com/).

Another type of AI-based calculator is found in DGS with reasoning capabilities, which take geometric constructions as an input. While these tools contain a reasoning engine in the form of their GATP, they do not have explainer capabilities, as mentioned before, since the GATP they include produce proofs that cannot be translated to human-readable reasoning Quaresma (2020); Kovacs and Recio (2020).

Presently, it is not clear how these new tools should fit in current mathematics curricula. If they are allowed without restrictions, some opponents claim that they will keep students from learning. Others recognize that the role of these tools must be debated in the educational community. Some proponents point out that the availability of these tools produces a shift in the desired objectives of ME Kovacs and Recio (2020).

4.2 Intelligent tutoring systems

An ITS is a computer-based learning tool that makes use of AI to create adaptive educational environments that respond both to the learner’s level and needs, and to the instructional agenda Graesser et al. (2012). While ITS may share some underlying technologies with the AI-based calculators we described, they are much more complex tools, and they are fundamentally interactive. Here, we will review and discuss some relevant ITS that have been proposed in the literature.

Figure 6: Components of an ITS, in relationship to the introduced taxonomy.

Typically, an ITS involves four different components Wenger (1987); de Souza and Ferreira (2002); Shute and Psotka (1996), as represented in Figure 6: i) a domain component to encode the expert knowledge, ii) a student component to represent student knowledge and behavior, iii) a tutor component to select the best pedagogical action, and iv) an interface to interact with the student. The domain component includes, among others, expert knowledge, databases of tasks, and databases of bugs. The student component includes student models and student data, such as the detailed history of completed student tasks. After each interaction, the student actions are analyzed and the models are updated to reflect the new data. During the interaction with the student, the tutor uses a reasoning engine to track the reasoning of the student. It uses data from the student and domain components to spot bugs, offer feedback and personalize the learning path.

In some ITS, the tutor relies on expert knowledge with exact inference rules, which allow it to know a priori all possible solution paths to a problem. Examples include the Hypergraph Based Problem Solver (HBPS) ITS, which deals with word problems Arnau et al. (2013), and the QED-Tutrix ITS, used for solving high-school level geometric proof problems Leduc (2016). In general though, much of the information available in an ITS is incomplete or uncertain. Modeling the student, for instance, involves making inferences about the student’s knowledge and behavior. Hence, probabilistic and approximate techniques such as Bayesian networks or fuzzy modeling are needed. An example is found in the TIDES ITS, proposed in Danine et al. (2006), which uses a Bayesian network to model student behavior based on the bugs that the student commits.
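To make the probabilistic idea concrete, the following is a minimal sketch of the simplest possible Bayesian network, with one hidden node (whether a skill is mastered) and one observed node (whether a bug occurs). It is not the TIDES model; the function name and the probabilities 0.1 and 0.6 are assumptions chosen only for illustration.

```python
def posterior_mastery(prior, p_bug_if_mastered, p_bug_if_unmastered, bug_observed):
    """Bayes' rule for P(mastered | observation) in a two-node network."""
    if bug_observed:
        like_m, like_u = p_bug_if_mastered, p_bug_if_unmastered
    else:
        like_m, like_u = 1 - p_bug_if_mastered, 1 - p_bug_if_unmastered
    evidence = like_m * prior + like_u * (1 - prior)
    return like_m * prior / evidence


# Start undecided, then observe two answers with the bug and one without it.
belief = 0.5
trace = []
for bug in [True, True, False]:
    belief = posterior_mastery(belief, 0.1, 0.6, bug)
    trace.append(belief)
```

Each observed bug lowers the belief that the skill is mastered, and a bug-free answer raises it again. A full student model would connect many such nodes, one per skill and per bug.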

Currently, large parts of ITS that are being used in practice are designed and adapted using data-driven approaches, as described in Section 3.4. For instance, Kurvinen et al. (2020) describes the ViLLe ITS, whose commercial version uses AI techniques to improve the learning experience, based on the data of millions of student interactions.

Narrow student modeling

The literature on student modeling is vast, and currently there exist dozens of student models that are used in practical ITS Chrysafiadi and Virvou (2013); Sani et al. (2016); Abyaa et al. (2019). Nevertheless, the majority of student models focus on only one specific aspect of the student, which is why we will refer to them as specific or “narrow” models. For instance, one student model may be constructed solely to predict student performance, while a different model may represent the student’s competences in mathematics. A comprehensive student model, as envisioned decades ago, should include both a complete model of the student’s knowledge and a model of their behavior Balacheff (1993). This requires a more advanced AI, which we will discuss briefly in Section 5.

Some notes on exploration, creativity and randomness

Interaction with an ITS aims to guarantee that specific learning occurs and that a target performance is reached. Nevertheless, if the ITS is to provide a rich experience in which it can “determine the nature of the underlying meaning”, it should contain environments that allow the student to freely explore problem situations Balacheff and Kaput (1996). This is achieved through “guided discovery learning”, in which the system can shift between a tutor-like behavior for some situations and an open, exploratory environment for others.

In the AI literature, exploration is a prevailing theme. It is especially used in the subfield of ML known as “reinforcement learning”, in which an agent explores an environment to determine how to maximize some reward over time. Successful applications of this field include robotics software, where the AI has to learn how to interact with the physical world, and strategy games, where the system must figure out how to beat the game using its custom set of rules.
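The trade-off between exploring and exploiting can be illustrated with an epsilon-greedy agent on a multi-armed bandit, one of the simplest reinforcement learning settings. This is a minimal sketch: the reward probabilities, the value of epsilon and the function name are assumptions, and the systems mentioned above use far more elaborate methods.

```python
import random


def epsilon_greedy_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Estimate the value of each arm while mostly playing the best-looking one."""
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms    # number of times each arm was played
    values = [0.0] * n_arms  # running mean reward of each arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return counts, values


# Three hypothetical actions with unknown success probabilities.
counts, values = epsilon_greedy_bandit([0.2, 0.5, 0.8])
best_arm = max(range(len(values)), key=lambda a: values[a])
```

Without the random exploration step, the agent could settle on the first arm that happens to pay off and never discover the better ones.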

AI is usually not associated with creativity, or the capability to come up with creative solutions. What is more, popular belief has it that AI is suited only to provide “mechanical” solutions, while creativity is reserved for humans. Nonetheless, it is precisely the AI techniques that require exploration that show strong indications that AI is capable of creative behavior. A striking example was seen in the 2016 Go match between world champion Lee Sedol and AlphaGo, an AI-based Go program developed by Google DeepMind Silver et al. (2017). The AI was first trained on records of human Go games, and then set out to battle clones of itself in order to continue improving. Interestingly, this approach led it to discover strategies that were previously unknown to human players.

In general, exploration and creativity require a certain component of “randomness”. Several studies have been performed on this topic in the AI literature. For instance, randomness can be used to initiate the exploration of the solution space in neural networks. A comprehensive introduction to this topic can be found in Scardapane and Wang (2017).
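The role of random starting points can be shown on a toy problem. The sketch below runs gradient descent from several random initial values on a one-dimensional function with two minima, which stands in for the loss surface of a neural network. The function, the learning rate and the number of restarts are assumptions made for illustration only.

```python
import random


def f(x):
    """Toy non-convex objective with a local and a global minimum."""
    return x**4 - 3 * x**2 + x


def grad_f(x):
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, lr=0.01, steps=500):
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


def random_restarts(n_restarts=20, seed=0):
    """Run gradient descent from random starting points and keep the best result."""
    rng = random.Random(seed)
    starts = [rng.uniform(-2.0, 2.0) for _ in range(n_restarts)]
    solutions = [gradient_descent(x0) for x0 in starts]
    return min(solutions, key=f)


best_x = random_restarts()
```

A single run started on the right-hand side of the function ends in the shallower minimum. Random starting points make it very likely that at least one run reaches the deeper one.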

5 Modeling the mathematical learner: a most ambitious goal

While the advances in AI and ML over the past decade have been impressive, it is important to put them in context. In particular, the recent well-known “breakthroughs” in AI are all techniques that are very good at solving a very specific problem, such as recognizing objects in pictures or translating text into a different language. If the specific setting is changed, they may not function properly. For instance, if a model is trained on recognizing animals in pictures, it may not return a correct answer when given a drawing of an animal. This capability of transferring a learned concept from one situation to a new context is known as “generalization”, and humans are particularly good at it. It is also one of the traits expected from Artificial General Intelligence (AGI), as opposed to the described narrow AI systems. A discussion on generalization capabilities in AI can be found in Kansky et al. (2017).

In the previous sections, we have discussed several examples of student modeling, most of which are narrow modeling techniques, as they only cover one specific aspect of the student’s knowledge or behavior. In order to design a complete student model, we believe the AI needed is similar to AGI, in that it should thoroughly understand several fields and it should be able to generalize. The following is a non-exhaustive list of properties. First, it should master the understanding of physics, which is being researched in robotics AI. In a mathematics learning setting, physics is often needed to interpret word problems, and it is indispensable for describing what is happening in photographic imagery. Second, it should feature strong natural language understanding. This is currently a very active field in ML, with the best results being obtained by large neural networks. One such noteworthy system is the Generative Pre-trained Transformer 3 (GPT-3), a neural network built by OpenAI that contains 175 billion parameters and was trained on a text corpus filtered from roughly 45 TB of raw data Brown et al. (2020). It shows remarkable text-analysis capabilities and can correctly answer many text queries by producing arbitrarily long human-like text passages. Third, the AI should have reasoning capabilities that allow it to solve mathematical and other problems. As briefly touched upon throughout this chapter, this would require cognitive abilities in mathematical, logical, and abstract reasoning. Its generalization capabilities would furthermore allow it to relate knowledge from different fields. Finally, it would require knowledge from cognitive and developmental psychology to understand the student’s actions and general behavior. This aspect is perhaps the most complex to model, and ML research in this area is currently very limited.

6 Conclusions and discussion

In this chapter, we have presented an overview of contemporary AI techniques that are being used in digital ME tools. To provide a framework for this analysis, we have established a taxonomy of four different classes that cover each of these techniques: Information extractors, which convert data from the real world into a mathematical representation; Reasoning engines, which are solvers for mathematical problems; Explainers, which translate machine reasoning into human-interpretable steps; and data-driven modeling techniques, which are used to distill useful information and models from the data generated by students, for instance in ITS. We have also given a more in-depth analysis of AI-based calculator apps, which we consider to be the next generation of pocket calculators, and we have related the proposed taxonomy to the different components in a modern data-driven ITS.

We leave the reader with some ideas on AI-based tools for ME that we may see in the near future. First, progress in AI is currently dominated by ML-based techniques. The influence of these techniques is also noticeable in the experimental tools for ME that we have discussed. For instance, ML is used to automate perception in information extractors, to encode human intuition for searching in large solution spaces, and to analyze large volumes of data that are being generated in online education platforms. This trend will likely continue, with advances in ML being used to improve digital ME tools.

Second, a recurrent theme throughout this chapter is the existence of parallels between research in AI and research in ME. For one, many of the questions asked in the design of digital ME tools can be found in AI research as well. And, as such, many state-of-the-art techniques that are developed in AI can solve problems that are encountered while building ME tools, in particular in ITS. Nevertheless, it must be noted that the mentioned AI techniques were developed in fields other than ME, with goals other than ME in mind, after which they were “borrowed” to be used in ME tools. This imbalance is certainly fueled by the massive interest that exists nowadays in the AI space, and one could wonder what incentives in research and industry would be required to start a new generation of ME-first AI techniques. A related observation is that the field of AI has advanced greatly over the past decade, partly due to the habit of publishing novel algorithms as open source software. In ME, digital tools are currently difficult to access: Most of them are either private (and closed-source) initiatives, or academic prototypes that are not maintained after their research project finishes. A notable exception is found in DGS such as GeoGebra.

Finally, we can reflect on the transformation that occurred around five decades ago with the advent of pocket calculators. At that time, there existed a large experimental space in which many different ideas were tried out, after which some standard tools emerged that are still in use today. Currently, AI-based ME tools are in a similar experimental phase, and although it may take some years or decades, we do expect to see a similar appearance of a set of standard AI-based applications for ME.


  • [1] A. Abyaa, M. K. Idrissi, and S. Bennani (2019) Learner modelling: systematic review of the literature from the last 5 years. Educational Technology Research and Development 67 (5), pp. 1105–1143. Cited by: §4.2.
  • [2] A. Adadi and M. Berrada (2018) Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 6, pp. 52138–52160. Cited by: §3.3.
  • [3] F. Arabshahi, S. Singh, and A. Anandkumar (2018) Towards solving differential equations through neural programming. In ICML Workshop on Neural Abstract Machines and Program Induction (NAMPI), Cited by: §3.2.
  • [4] D. Arnau, M. Arevalillo-Herráez, L. Puig, and J. A. González-Calero (2013) Fundamentals of the design and the operation of an intelligent tutoring system for the learning of the arithmetical and algebraic way of solving word problems. Computers & Education 63, pp. 119–130. External Links: ISSN 0360-1315 Cited by: §3.2, §4.2.
  • [5] A. B. Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. García, S. Gil-López, D. Molina, R. Benjamins, et al. (2020) Explainable artificial intelligence (xai): concepts, taxonomies, opportunities and challenges toward responsible ai. Information Fusion 58, pp. 82–115. Cited by: §3.3, §3.3.
  • [6] R. Asif, A. Merceron, S. A. Ali, and N. G. Haider (2017) Analyzing undergraduate students’ performance using educational data mining. Computers & Education 113, pp. 177–194. Cited by: §3.4.
  • [7] R. S.J.d. Baker, A. B. Goldstein, and N. T. Heffernan (2010) Detecting the moment of learning. In International Conference on Intelligent Tutoring Systems, pp. 25–34. Cited by: §3.4.
  • [8] R. Baker, J. Walonoski, N. Heffernan, I. Roll, A. Corbett, and K. Koedinger (2008) Why students engage in “gaming the system” behavior in interactive learning environments. Journal of Interactive Learning Research 19 (2), pp. 185–224. Cited by: §3.4.
  • [9] N. Balacheff and J. J. Kaput (1996) Computer-based learning environments in mathematics. In International Handbook of Mathematics Education: Part 1, pp. 469–501. External Links: Document, ISBN 978-94-009-1465-0, Link Cited by: §4.2.
  • [10] N. Balacheff (1993) Artificial intelligence and mathematics education: expectations and questions. In 14th Biennal of the Australian Association of Mathematics Teachers, T. Herrington (Ed.), Perth, Australia, pp. 1–24. Cited by: §4.2.
  • [11] N. Balacheff (1994) Didactique et intelligence artificielle. Recherches en Didactique des Mathématiques 14, pp. 9–42. Cited by: §1.
  • [12] D. Barrett, F. Hill, A. Santoro, A. Morcos, and T. Lillicrap (2018) Measuring abstract reasoning in neural networks. In Proceedings of the 35th International Conference on Machine Learning, J. Dy and A. Krause (Eds.), Proceedings of Machine Learning Research, Vol. 80, pp. 511–520. External Links: Link Cited by: §3.2.
  • [13] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei (2020) Language models are few-shot learners. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (Eds.), Vol. 33, pp. 1877–1901. External Links: Link Cited by: §5.
  • [14] F. Chollet (2019) On the measure of intelligence. arXiv preprint arXiv:1911.01547. Cited by: §3.2.
  • [15] K. Chrysafiadi and M. Virvou (2013) Student modeling approaches: a literature review for the last decade. Expert Systems with Applications 40 (11), pp. 4715–4729. External Links: Document, ISSN 0957-4174, Link Cited by: §3.4, §4.2.
  • [16] P. Cortez and A. M. G. Silva (2008) Using data mining to predict secondary school student performance. In Proceedings of 5th Future Business Technology Conference, A. Brito and J. Teixeira (Eds.), pp. 5–12. Cited by: §3.4.
  • [17] A. Danine, B. Lefebvre, and A. Mayers (2006) Tides-using bayesian networks for student modeling. In Sixth IEEE International Conference on Advanced Learning Technologies (ICALT’06), pp. 1002–1007. Cited by: §4.2.
  • [18] M.A.F. de Souza and M. A. G. V. Ferreira (2002) Designing reusable rule-based architectures with design patterns. Expert Systems with Applications 23 (4), pp. 395–403. Cited by: §4.2.
  • [19] H. M. Fayek, M. Lech, and L. Cavedon (2017) Evaluating deep learning architectures for speech emotion recognition. Neural Networks 92, pp. 60–68. Cited by: §3.1.
  • [20] L. Font, P. R. Richard, and M. Gagnon (2018) Improving QED-Tutrix by automating the generation of proofs. In Proceedings 6th International Workshop on Theorem proving components for Educational software (ThEdu’17), P. Quaresma and W. Neuper (Eds.), Electronic Proceedings in Theoretical Computer Science, Vol. 267, pp. 38–58. External Links: Document Cited by: §3.3.
  • [21] H. Fu, J. Zhang, X. Zhong, M. Zha, and L. Liu (2019) Robot for mathematics college entrance examination. In Electronic Proceedings of the 24th Asian Technology Conference in Mathematics, Mathematics and Technology, LLC, Cited by: §3.3.
  • [22] B. Furht (2008) Encyclopedia of multimedia. Springer Science & Business Media. Cited by: §3.2.
  • [23] M. Ganesalingam and W. T. Gowers (2017) A fully automatic theorem prover with human-style output. Journal of Automated Reasoning 58 (2), pp. 253–291. Cited by: §3.3, §3.3.
  • [24] T. Gauthier and C. Kaliszyk (2015) Premise selection and external provers for HOL4. In Proceedings of the 2015 Conference on Certified Programs and Proofs, pp. 49–57. Cited by: §3.2.
  • [25] A. C. Graesser, M. W. Conley, and A. Olney (2012) Intelligent tutoring systems. In APA educational psychology handbook, Vol 3: Application to learning and teaching, pp. 451–473. Cited by: §4.2.
  • [26] S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural computation 9 (8), pp. 1735–1780. Cited by: §3.1.
  • [27] J. Pearl, M. Glymour, and N. P. Jewell (2019) Causal inference in statistics: a primer. John Wiley & Sons. Cited by: §3.2.
  • [28] H. Kajasilta, M. Apiola, E. Lokkila, A. Veerasamy, and M. Laakso (2019) Measuring students’ stress with mood sensors: first findings. In International Conference on Web-Based Learning, pp. 92–99. Cited by: §3.1.
  • [29] K. Kansky, T. Silver, D. A. Mély, M. Eldawy, M. Lázaro-Gredilla, X. Lou, N. Dorfman, S. Sidor, S. Phoenix, and D. George (2017) Schema networks: zero-shot transfer with a generative causal model of intuitive physics. In International Conference on Machine Learning, pp. 1809–1818. Cited by: §5.
  • [30] R. Koncel-Kedziorski, H. Hajishirzi, A. Sabharwal, O. Etzioni, and S. D. Ang (2015) Parsing algebraic word problems into equations. Transactions of the Association for Computational Linguistics 3, pp. 585–597. Cited by: §3.1.
  • [31] Z. Kovacs, T. Recio, P. R. Richard, S. Van Vaerenbergh, and M. P. Vélez (2020) Towards an ecosystem for computer-supported geometric reasoning. International Journal of Mathematical Education in Science and Technology. External Links: Document, ISSN 0020-739X Cited by: §2.1.
  • [32] Z. Kovacs and T. Recio (2020) GeoGebra reasoning tools for humans and for automatons. In Proceedings of the 25th Asian Technology Conference in Mathematics, Cited by: §3.3, §4.1, §4.1.
  • [33] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Eds.), Vol. 25. Cited by: §1.1, §3.1, §3.2.
  • [34] E. Kurvinen, E. Kaila, M. Laakso, and T. Salakoski (2020) Long term effects on technology enhanced learning: the use of weekly digital lessons in mathematics. Informatics in Education 19 (1), pp. 51–75. Cited by: §4.2.
  • [35] A. Lamas, S. Tabik, P. Cruz, R. Montes, Á. Martínez-Sevilla, T. Cruz, and F. Herrera (2021) MonuMAI: dataset, deep learning pipeline and citizen science based app for monumental heritage taxonomy and classification. Neurocomputing 420, pp. 266–280. Cited by: §3.1.
  • [36] G. Lample and F. Charton (2019) Deep learning for symbolic mathematics. In Proceedings of ICLR, Cited by: §3.2.
  • [37] N. Leduc (2016) QED-tutrix: système tutoriel intelligent pour l’accompagnement des élèves en situation de résolution de problèmes de démonstration en géométrie plane. Ph.D. Thesis, École Polytechnique de Montréal. Cited by: §4.2.
  • [38] D. Lee, C. Szegedy, M. N. Rabe, S. M. Loos, and K. Bansal (2020) Mathematical reasoning in latent space. In Proceedings of ICLR, Cited by: §3.2.
  • [39] S. Li and W. Deng (2020) Deep facial expression recognition: a survey. IEEE Transactions on Affective Computing. External Links: Document Cited by: §3.1.
  • [40] S. Loos, G. Irving, C. Szegedy, and C. Kaliszyk (2017) Deep network guided proof search. LPAR-21. 21st International Conference on Logic for Programming, Artificial Intelligence and Reasoning. Cited by: §3.2.
  • [41] D. W. Loveland (1978) Automated theorem proving: a logical basis. Elsevier. Cited by: §3.2.
  • [42] H. Meng-Leong (2019) Future-ready strategic oversight of multiple artificial superintelligence-enabled adaptive learning systems via human-centric explainable AI-empowered predictive optimizations of educational outcomes. Big Data and Cognitive Computing 3 (3), pp. 46. External Links: ISSN 2504-2289 Cited by: §3.3, §3.3.
  • [43] T. M. Mitchell (1997) Machine learning. McGraw-Hill, New York. External Links: ISBN 978-0-07-042807-2 Cited by: §1.1.
  • [44] C. Molnar (2019) Interpretable machine learning. Lulu Publishing. Cited by: §3.3.
  • [45] P. Quaresma (2020) Automated deduction and knowledge management in geometry. Mathematics in Computer Science 14 (4), pp. 673–692. Cited by: §3.3, §4.1.
  • [46] L. R. Rabiner and B. H. Juang (1986) An introduction to hidden Markov models. IEEE ASSP Magazine 3 (1), pp. 4–16. Cited by: §3.1.
  • [47] S. Russell and P. Norvig (2009) Artificial intelligence: a modern approach. 3rd edition, Prentice Hall Press, USA. External Links: ISBN 0136042597 Cited by: §1.1, §1.1, §1.
  • [48] S. M. Sani, A. B. Bichi, and S. Ayuba (2016) Artificial intelligence approaches in student modeling: half decade review (2010-2015). IJCSN-International Journal of Computer Science and Network 5 (5). Cited by: §4.2.
  • [49] D. Saxton, E. Grefenstette, F. Hill, and P. Kohli (2019) Analysing mathematical reasoning abilities of neural models. In Proceedings of ICLR, Cited by: §3.2.
  • [50] S. Scardapane and D. Wang (2017) Randomness in neural networks: an overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 7 (2), pp. e1200. Cited by: §4.2.
  • [51] A. H. Schoenfeld (1985) Artificial intelligence and mathematics education: a discussion of Rissland’s paper. In Teaching and learning mathematical problem solving: multiple research perspectives, E. Silver (Ed.), pp. 177–187. Cited by: §1.
  • [52] C. Schon, S. Siebert, and F. Stolzenburg (2019) Using conceptnet to teach common sense to an automated theorem prover. In Proceedings ARCADE 2019, Cited by: §3.2.
  • [53] V. J. Shute and J. Psotka (1996) Intelligent tutoring systems: past, present, and future. In Handbook of Research on Educational Communications and Technology, D. Jonassen (Ed.), pp. 570–600. Cited by: §4.2.
  • [54] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, et al. (2017) Mastering the game of go without human knowledge. Nature 550 (7676), pp. 354–359. Cited by: §4.2.
  • [55] A. Smith, W. Min, B. W. Mott, and J. C. Lester (2015) Diagrammatic student models: modeling student drawing performance with deep learning. In International Conference on User modeling, Adaptation, and Personalization, pp. 216–227. Cited by: §3.4.
  • [56] V. N. Vapnik (1995) The nature of statistical learning theory. Springer-Verlag, Berlin, Heidelberg. Cited by: §1.1.
  • [57] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention is all you need. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), pp. 5998–6008. Cited by: §3.1.
  • [58] L. Wang, D. Zhang, L. Gao, J. Song, L. Guo, and H. T. Shen (2018) Mathdqn: solving arithmetic word problems via deep reinforcement learning. In Thirty-Second AAAI Conference on Artificial Intelligence, Cited by: §3.2.
  • [59] Y. Wang, X. Liu, and S. Shi (2017) Deep neural solver for math word problems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 845–854. Cited by: §3.2.
  • [60] C. Webel and S. Otten (2015) Teaching in a world with PhotoMath. The Mathematics Teacher 109 (5), pp. 368–373. Cited by: §2.1, §2.1, §2.1.
  • [61] E. Wenger (1987) Artificial intelligence and tutoring systems: computational and cognitive approaches to the communication of knowledge. Morgan Kaufmann. Cited by: §1.1, §1, §3.4, §4.2.