Intelligent Tutoring Systems (ITS) [anderson1985intelligent, nye2014autotutor] attempt to mimic personalized tutoring in a computer-based environment and are a low-cost alternative to human tutors. Over the past two decades, many ITS have been successfully deployed to enhance teaching and improve students’ learning experience in a number of domains [AbuEl, Agha, Nakhal, Rekhawi, budenbender2002, goguadze2005interactivity, leelawong2008designing, melis2004, passier2006, Qwaider], not only providing feedback and assistance but also addressing individual student characteristics [Graesser] and cognitive processes [Wu]. Many ITS consider the development of a personalized curriculum and personalized feedback [Aldahdooh, Nakhal, albacete2019impact, chi2011instructional, Lin, munshi2019personalization, rus2014macro, rus2014deeptutor], with dialogue-based ITS being some of the most effective tools for learning [ahn2018adaptive, graesser2005autotutor, graesser2001intelligent, nye2014autotutor, ventura2018preliminary], as they simulate a familiar learning environment of student–tutor interaction, thus helping to improve student motivation. The main bottleneck is the ability of ITS to address the multitude of possible scenarios in such interactions, and this is where methods of automated, data-driven feedback generation are of critical importance.
Our paper has two major contributions. Firstly, we describe how state-of-the-art machine learning (ML) and natural language processing (NLP) techniques can be used to generate automated, data-driven personalized hints and explanations, Wikipedia-based explanations, and mathematical hints. Feedback generated this way takes the individual needs of students into account, does not require expert intervention or hand-crafted rules, and is easily scalable and transferable across domains. Secondly, we demonstrate that the personalized feedback leads to substantially improved student learning gains and improved subjective feedback evaluation in practice. To support our claims, we utilize our feedback models in Korbit, a large-scale dialogue-based ITS.
2 Korbit Learning Platform
Korbit is a large-scale, open-domain, mixed-interface, dialogue-based ITS, which uses ML, NLP and reinforcement learning to provide interactive, personalized learning online. Currently, the platform has thousands of students enrolled and is capable of teaching topics related to data science, machine learning, and artificial intelligence.
Students enroll based on courses or skills they would like to study. Once a student has enrolled, Korbit tutors them by alternating between short lecture videos and interactive problem-solving. During the problem-solving sessions, the student may attempt to solve an exercise, ask for help, or even skip it. If the student attempts to solve the exercise, their solution attempt is compared against the expectation (i.e. reference solution) using an NLP model. If their solution is classified as incorrect, the inner-loop system (see Fig. 1) will activate and respond with one of a dozen different pedagogical interventions, which include hints, mathematical hints, elaborations, explanations, concept tree diagrams, and multiple-choice quiz answers. The pedagogical intervention is chosen by an ensemble of machine learning models from the student’s zone of proximal development (ZPD) [cazden1979peekaboo] based on their student profile and last solution attempt.
3 Automatically Generated Personalized Feedback
In this paper, we present experiments on the Korbit learning platform with actual students. These experiments involve varying the text hints and explanations based on how they were generated and how they were adapted to each unique student.
3.0.1 Personalized Hints and Explanations
Personalized hints and explanations are generated by applying a three-step NLP algorithm to all expectations (i.e. reference solutions) in our database: (1) keywords, including nouns and noun phrases, are identified within the question (e.g. overfitting and underfitting in Table 1); (2) an appropriate sentence span that does not include keywords is identified in the reference solution using state-of-the-art dependency parsing with spaCy (https://spacy.io) (e.g., "A model is underfitting" is filtered out, while "it has a high bias" is considered as a candidate for a hint); and (3) a grammatically correct hint is generated using discourse-based modifications (e.g., "Think about the case") and the partial hint from step (2) (e.g., "when it has a high bias").
|Question||Reference solution||Generated hint|
|What is the [...] between overfitting and underfitting?||A model is underfitting when it has a high bias.||Think about the case when it has a high bias.|
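The three steps described above can be illustrated with a minimal sketch. It is not the platform's implementation: the dependency parse is replaced by a simple regular-expression split at subordinating conjunctions, the keywords are supplied by hand rather than extracted, and the names `candidate_spans` and `generate_hints` are hypothetical.

```python
import re

# Hypothetical stand-in for the dependency parse: split a reference solution
# into clauses at common subordinating conjunctions, keeping the conjunction.
CLAUSE_BOUNDARY = re.compile(
    r"\s+(?=(?:when|if|because|since|while)\b)", re.IGNORECASE
)


def candidate_spans(reference_solution, keywords):
    """Step 2: keep only clauses that mention none of the question keywords."""
    clauses = CLAUSE_BOUNDARY.split(reference_solution.strip().rstrip("."))
    lowered = [k.lower() for k in keywords]
    return [c for c in clauses if not any(k in c.lower() for k in lowered)]


def generate_hints(reference_solution, keywords, prefix="Think about the case"):
    """Step 3: prepend a discourse prefix to each candidate span."""
    spans = candidate_spans(reference_solution, keywords)
    return [f"{prefix} {span}." for span in spans]


hints = generate_hints(
    "A model is underfitting when it has a high bias.",
    keywords=["overfitting", "underfitting"],  # step 1, supplied by hand here
)
print(hints)
```

On this input the clause containing a keyword is dropped and the remaining clause becomes the hint. A real dependency parse would handle clause boundaries that a fixed conjunction list misses.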
Next, hints are ranked according to their linguistic quality as well as past student–system interactions. We employ a Random Forest classifier using two broad sets of features: (1) Linguistic quality features assess the quality of the hint from the linguistic perspective only (e.g., the length of the hint/explanation, keyword and topic overlap between the hint/explanation and the question, etc.); they are the only features used by the baseline model. (2) Performance-based features additionally take into account past student interaction with the system. Among the models that use them, the shallow personalization model includes features related to the number of attempted questions, the proportion of correct and incorrect answers, etc., and the deep personalization model additionally includes linguistic features pertaining to up to a fixed number of previous student–system interaction turns. The three types of feedback models are trained and evaluated on a collection of previously recorded student–system interactions.
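To make the two feature sets concrete, the following sketch builds a feature vector for the baseline model and for the shallow personalization model. The specific features, the whitespace tokenizer, the `StudentHistory` record, and the example question are all illustrative assumptions; the actual feature set is not listed in this paper.

```python
from dataclasses import dataclass


@dataclass
class StudentHistory:
    """Hypothetical summary of a student's past interactions."""
    attempted: int
    correct: int


def linguistic_features(hint, question):
    """Baseline features: look only at the hint and question text."""
    hint_tokens = hint.lower().rstrip(".?!").split()
    question_tokens = set(question.lower().rstrip(".?!").split())
    overlap = sum(1 for t in hint_tokens if t in question_tokens)
    ratio = overlap / len(hint_tokens) if hint_tokens else 0.0
    return [len(hint_tokens), ratio]


def shallow_features(hint, question, history):
    """Linguistic features plus aggregate performance statistics."""
    accuracy = history.correct / history.attempted if history.attempted else 0.0
    return linguistic_features(hint, question) + [history.attempted, accuracy]


hint = "Think about the case when it has a high bias."
question = "When does a model have a high bias?"  # made-up example question
baseline_vector = linguistic_features(hint, question)
shallow_vector = shallow_features(hint, question, StudentHistory(8, 6))
print(baseline_vector, shallow_vector)
```

Vectors of this kind, paired with a label for whether the hint helped, could then be passed to an off-the-shelf classifier such as scikit-learn's `RandomForestClassifier`.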
3.0.2 Wikipedia-Based Explanations
Wikipedia-based explanations provide alternative ways of helping students to understand and remember concepts. We generate such explanations using another multi-stage pipeline: first, we use a 2 GB dataset on “Machine learning” crawled from Wikipedia and extract all relevant domain keywords from the reference questions and solutions using spaCy. Next, we use the first sentence in each article as an extracted Wikipedia-based explanation and the rest of the article to generate candidate explanations. A Decision Tree classifier is trained on a dataset of positive and negative examples to evaluate the quality of a Wikipedia-based explanation using a number of linguistically motivated features. This model is then applied to identify the most appropriate Wikipedia-based explanations among the generated ones.
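The extraction stage of this pipeline can be sketched as follows. This is a simplified illustration: sentence splitting uses a naive regular expression instead of spaCy, keywords are matched to article titles by exact lowercase comparison, the article text is a made-up placeholder rather than real Wikipedia content, and the function names are hypothetical.

```python
import re

# Naive sentence boundary: end punctuation, whitespace, then a capital letter.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_article(article_text):
    """Return (first sentence, remaining sentences) of an article."""
    sentences = SENTENCE_END.split(article_text.strip())
    return sentences[0], sentences[1:]


def explanations_for(keywords, articles):
    """Map each keyword that matches an article title to its first sentence."""
    by_title = {title.lower(): text for title, text in articles.items()}
    return {
        k: split_article(by_title[k.lower()])[0]
        for k in keywords
        if k.lower() in by_title
    }


# Placeholder article text written for this example only.
articles = {
    "Overfitting": "Overfitting is a modeling error. It occurs when a model "
                   "fits noise. It hurts generalization."
}
extracted = explanations_for(["overfitting", "bias"], articles)
first, rest = split_article(articles["Overfitting"])
print(extracted, rest)
```

The sentences in `rest` correspond to the pool from which candidate explanations would be generated and then scored by the quality classifier.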
3.0.3 Mathematical Hints
Mathematical hints are provided by Korbit either in the form of suggested equations with gapped mathematical terms for the student to fill in, or in the form of a hint on what the student needs to change if they input an incorrect equation. Math equations are particularly challenging because the same written expression can admit several interpretations: for example, a given symbol could be a function or a term multiplied by another expression. To evaluate student equations, we first convert their LaTeX string into multiple parse trees, where each tree represents a possible interpretation, and then use a classifier to select the most likely parse tree and compare it to the expectation. Our generated feedback is fully automated, which differentiates Korbit from other math-oriented ITS, where feedback is generated by hand-crafted test cases [budenbender2002, hennecke1999online].
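The idea of enumerating interpretations and then selecting one can be sketched on a toy case. The sketch below works on plain strings of the form `name(argument)` rather than LaTeX, represents parse trees as tuples, and replaces the classifier with a hand-written rule based on a hypothetical list of function names; all of these are assumptions made for illustration.

```python
import re

# Hypothetical list of names that are conventionally functions.
KNOWN_FUNCTIONS = {"sin", "cos", "log", "exp", "f", "g"}

# Matches a single name immediately followed by a parenthesized argument.
JUXTAPOSITION = re.compile(r"^\s*([A-Za-z]+)\s*\((.+)\)\s*$")


def interpretations(expression):
    """Return every parse of `name(argument)` as a small tuple tree."""
    match = JUXTAPOSITION.match(expression)
    if match is None:
        return [("atom", expression.strip())]
    name, argument = match.group(1), match.group(2).strip()
    return [("apply", name, argument), ("multiply", name, argument)]


def most_likely(expression):
    """Pick one parse with a fixed rule, standing in for a trained classifier."""
    trees = interpretations(expression)
    if len(trees) == 1:
        return trees[0]
    name = trees[0][1]
    return trees[0] if name in KNOWN_FUNCTIONS else trees[1]


print(most_likely("log(x + 1)"))
print(most_likely("w(x + 1)"))
```

A learned selector could use context that a fixed rule cannot, such as the symbols appearing in the exercise or in the expectation.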
4 Experimental Results and Analysis
Our preliminary experiments with the baseline, shallow and deep personalization models, run on the historical data using k-fold cross-validation, strongly suggested that the deep personalization model selects the most appropriate personalized feedback. To support our claims, we ran experiments involving annotated student–system interactions, collected from students enrolled for free and studying the machine learning course on the Korbit platform between January and February 2020. First, when a student gave an incorrect solution, a hint or explanation was selected uniformly at random from one of the personalized feedback models. Afterwards, the student learning gain was measured as the proportion of instances where a student provided a correct solution after receiving a personalized hint or explanation. Since it is possible for the ITS to provide several pedagogical interventions for a given exercise, we separate the learning gains observed for all students from those for students who received a personalized hint or explanation before their second attempt at the exercise. Table 2 presents the results, showing that the deep personalization model leads to the highest student learning gains, followed by the shallow personalization model and the baseline model, for all attempts. The difference between the learning gains of the deep personalization model and the baseline model for the students before their second attempt is statistically significant at the 95% confidence level based on a z-test (p=0.03005). These results support the hypothesis that automatically generated personalized hints and explanations lead to substantial improvements in student learning gains.
|Model||All Attempts: Mean||All Attempts: 95% C.I.||Before Second Attempt: Mean||Before Second Attempt: 95% C.I.|
|Baseline (No Personalization)|
Table 2: Student learning gains for personalized hints and explanations with 95% confidence intervals (C.I.).
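The quantities reported in this section (a learning gain as a proportion, its 95% confidence interval, and a z-test between two models) can be computed as in the sketch below. The paper does not state which interval or which variant of the z-test was used, so the normal-approximation interval and the pooled, two-sided test are assumptions, and the counts are invented for illustration rather than taken from the experiments.

```python
import math


def learning_gain(successes, trials):
    """Proportion of feedback instances followed by a correct solution."""
    return successes / trials


def wald_interval(successes, trials, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion."""
    p = successes / trials
    half_width = z * math.sqrt(p * (1 - p) / trials)
    return p - half_width, p + half_width


def two_proportion_z_test(s1, n1, s2, n2):
    """Two-sided z-test for equality of two proportions (pooled variance)."""
    p1, p2 = s1 / n1, s2 / n2
    pooled = (s1 + s2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Invented counts: 60 of 100 successes for one model, 45 of 100 for another.
low, high = wald_interval(60, 100)
z, p_value = two_proportion_z_test(60, 100, 45, 100)
print(learning_gain(60, 100), (low, high), z, p_value)
```

A difference is significant at the 95% confidence level when the returned p-value is below 0.05.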
Experiments on the Korbit platform confirm that extracted and generated Wikipedia-based explanations lead to comparable student learning gains. Students rated either or both types of explanations as helpful [...] of the time. This shows that automatically generated Wikipedia-based explanations can be included in the set of interventions used to personalize the feedback. Moreover, two domain experts independently analyzed a set of student–system interactions with Korbit in which the student’s solution attempt contained an incorrect mathematical equation. The results showed that over [...] of the mathematical hints would be considered either “very useful” or “somewhat useful”.
In conclusion, our experiments strongly support the hypothesis that the personalized hints and explanations, as well as Wikipedia-based explanations, help to improve student learning outcomes significantly. Preliminary results also indicate that the mathematical hints are useful. Future work should investigate how and what types of Wikipedia-based explanations and mathematical hints may improve student learning outcomes, as well as their interplay with student learning profiles and knowledge gaps.