A Large-Scale, Open-Domain, Mixed-Interface Dialogue-Based ITS for STEM

by Iulian Vlad Serban, et al.

We present Korbit, a large-scale, open-domain, mixed-interface, dialogue-based intelligent tutoring system (ITS). Korbit uses machine learning, natural language processing and reinforcement learning to provide interactive, personalized learning online. Korbit has been designed to easily scale to thousands of subjects by automating, standardizing and simplifying the content creation process. Unlike other ITS, a teacher can develop new learning modules for Korbit in a matter of hours. To facilitate learning across a wide range of STEM subjects, Korbit uses a mixed interface, which includes videos, interactive dialogue-based exercises, question-answering, conceptual diagrams, mathematical exercises and gamification elements. Korbit has been built to scale to millions of students by utilizing a state-of-the-art cloud-based micro-service architecture. Korbit launched its first course, on machine learning, in 2019, and since then over 7,000 students have enrolled. Although Korbit was designed to be open-domain and highly scalable, A/B testing experiments with real-world students demonstrate that both student learning outcomes and student motivation are substantially improved compared to typical online courses.



1 Introduction

Intelligent tutoring systems (ITS) are computer programs powered by artificial intelligence (AI), which deliver real-time, personalized tutoring to students. Traditional ITS implement or imitate the behavior and pedagogy of human tutors. One type of ITS is the dialogue-based tutor, which uses natural language conversations to tutor students [13]. This process is sometimes called “Socratic tutoring”, because of its similarity to Socratic dialogue [18]. Newer ITS have started to interleave their dialogue with interactive media (e.g., interactive videos and interactive web applets), forming a so-called “mixed-interface system”. It has been shown that ITS can be twice as effective at promoting learning compared to the previous generation of computer-based instruction and that ITS may be as effective as human tutors in general [12].

However, despite the fact that ITS have been around for decades and are known to be highly effective, their deployment in education and industry has been extremely limited [14, 17]. A major reason for this is the sheer cost of development [5, 14]. As observed by Olney [14]: “Unfortunately, ITS are extremely expensive to produce, with some groups estimating that it takes 100 hours of authoring time from AI experts, pedagogical experts, and domain experts to produce 1 hour of instruction.” For example, the creators of the ITS “ITADS” noted that their system took 26 months to develop [16]. On the other hand, lower-cost educational approaches, such as massive open online courses (MOOCs), have flourished and now boast millions of learners around the world. Indeed, it is estimated that today there are over 110 million learners around the world enrolled in MOOCs [19]. However, the learning outcomes resulting from learning in MOOCs depend critically on their teaching methodology and quality of content, and remain questionable in general [2, 3, 9, 10, 11, 15]. In particular, recent research indicates that MOOCs having low levels of active learning, little feedback from instructors and peers, and few peer discussions tend to yield poor learning outcomes [10, 15]. Furthermore, it is well known that student retention in MOOCs is substantially worse than in traditional classroom learning [8]. By combining the low cost and scalability of MOOCs with the personalization and effectiveness of ITS, we hope Korbit may one day help to effectively teach and motivate millions of students around the world.

2 The Korbit ITS

Figure 1: The Korbit ITS: An example dialogue illustrating how the ITS inner-loop system selects the pedagogical intervention. The student gives an incorrect solution and afterwards receives a text hint.

Korbit is a large-scale, open-domain, mixed-interface, dialogue-based ITS, which uses machine learning, natural language processing (NLP) and reinforcement learning (RL) to provide interactive, personalized learning online. The ITS has over 7,000 students enrolled from around the world, including students from educational institution partners and professionals from industry partners. Korbit is capable of teaching topics related to data science, machine learning, and artificial intelligence. The platform is highly modular and, since it is easy to create new content, it will soon be expanded with many more topics.

Students enroll on the Korbit website by selecting either a course or a set of skills they would like to study. Students may also answer a few questions about their background knowledge. Based on these, Korbit generates a personalized curriculum for each student. Following this, Korbit tutors the student by alternating between short lecture videos and interactive problem-solving exercises. The outer-loop system decides on which lecture video or exercise to show next based on the personalized curriculum. Currently, the ordering of videos and exercises is fully determined by the initial curriculum. However, we are working on an extension to make the curriculum adapt during learning (for example, by adding new modules attacking student knowledge gaps on-the-fly).
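The outer-loop behavior described above can be illustrated with a minimal sketch. It assumes a hypothetical `Module` record and a simple rule (keep modules for selected skills, drop skills the student already knows); the paper does not specify how Korbit actually builds the curriculum, so all names and the filtering rule here are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """Hypothetical unit of content: one lecture video plus its exercises."""
    skill: str
    video: str
    exercises: tuple


def build_curriculum(catalog, selected_skills, known_skills=frozenset()):
    """Keep modules for the selected skills, skipping skills already known.

    This filtering rule is an assumption made for illustration.
    """
    return [
        m for m in catalog
        if m.skill in selected_skills and m.skill not in known_skills
    ]


def outer_loop(curriculum):
    """Yield items in a fixed order: each video followed by its exercises."""
    for module in curriculum:
        yield ("video", module.video)
        for exercise in module.exercises:
            yield ("exercise", exercise)


catalog = [
    Module("regression", "linear-regression.mp4", ("ex-r1", "ex-r2")),
    Module("probability", "bayes-rule.mp4", ("ex-p1",)),
    Module("clustering", "k-means.mp4", ("ex-c1",)),
]
curriculum = build_curriculum(
    catalog,
    selected_skills={"regression", "probability"},
    known_skills={"probability"},
)
schedule = list(outer_loop(curriculum))
```

In this sketch the ordering is fixed once the curriculum is built, mirroring the statement that the ordering is currently fully determined by the initial curriculum.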

During the exercise sessions, the inner-loop system manages the interaction. First, it shows the student a problem statement (e.g., a question). The student may then attempt to solve the exercise, ask for help, or skip the exercise. If the student attempts to solve the exercise, their solution attempt is compared against the expectation (i.e., the reference solution) using an NLP model. If their solution is classified as incorrect, then the inner-loop system will select one of a dozen different pedagogical interventions. The pedagogical interventions include textual hints, mathematical hints, elaborations, explanations, concept tree diagrams, and multiple choice quiz answers. The pedagogical intervention is chosen by an ensemble of machine learning models based on the student’s profile and the last solution attempt. Depending on the pedagogical intervention, the inner-loop system may either ask the student to retry the initial exercise or follow up on the intervention (e.g., with additional questions, confirmations, or prompts).
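A single inner-loop turn might look roughly like the sketch below. The token-overlap score is only a toy stand-in for the NLP model that compares an attempt against the expectation, and the threshold, the `Action` names, and the choice to route help requests to an intervention are all assumptions rather than details from the paper.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    ATTEMPT = auto()
    ASK_HELP = auto()
    SKIP = auto()


@dataclass
class Exercise:
    question: str
    expectation: str  # reference solution


def token_overlap(attempt: str, expectation: str) -> float:
    """Jaccard overlap of lowercase tokens (toy stand-in for the NLP model)."""
    a = set(attempt.lower().split())
    b = set(expectation.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_correct(attempt: str, exercise: Exercise, threshold: float = 0.5) -> bool:
    """Classify an attempt as correct if it overlaps enough with the expectation."""
    return token_overlap(attempt, exercise.expectation) >= threshold


def inner_loop_turn(exercise: Exercise, action: Action, attempt: str = "") -> str:
    """Return the next step for one student turn."""
    if action is Action.SKIP:
        return "next_exercise"
    if action is Action.ASK_HELP:
        # Assumption: a help request is answered with an intervention.
        return "intervention"
    if is_correct(attempt, exercise):
        return "next_exercise"
    return "intervention"


exercise = Exercise(
    question="What is overfitting?",
    expectation="the model memorizes training data and generalizes poorly",
)
```

Which intervention is then shown is a separate decision; the source attributes it to an ensemble of machine learning models, which this sketch does not attempt to reproduce.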

The Korbit ITS is closely related to the line of work on dialogue-based ITS, such as the pioneering AutoTutor and the newer IBM Watson Tutor [1, 6, 7, 13, 20]. Although Korbit is highly constrained compared to existing dialogue-based ITS, a major innovation of Korbit lies in its modular, scalable design. The inner-loop system is implemented as a finite-state machine. Each pedagogical intervention is a separate state, with its own logic, data and machine learning models. Each state operates independently of the rest of the system, has access to all database content (including all exercises and lecture videos) and can autonomously improve as new data becomes available. This ensures that the system keeps improving, that it can adapt to new content and that it can be extended with new pedagogical interventions. Furthermore, the transitions between the states of the finite-state machine are decided by a reinforcement learning model, which itself is agnostic to the underlying implementation of each state and also continues to improve as more data becomes available.
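To make the idea of a learned transition policy over intervention states concrete, here is a minimal sketch that uses an epsilon-greedy multi-armed bandit. This is a deliberately simple substitute: the paper does not describe the actual reinforcement learning model, and a real policy would presumably also condition on the student's profile and last attempt. The state names, the reward definition (1.0 if the student solves the exercise afterwards) and the class name `TransitionPolicy` are assumptions.

```python
import random
from collections import defaultdict

# Hypothetical state names, one per pedagogical intervention.
INTERVENTIONS = [
    "text_hint",
    "math_hint",
    "elaboration",
    "explanation",
    "concept_tree",
    "multiple_choice",
]


class TransitionPolicy:
    """Epsilon-greedy bandit that picks the next intervention state.

    The policy only sees state names and rewards, so it stays agnostic
    to how each state is implemented.
    """

    def __init__(self, states, epsilon=0.1, seed=0):
        self.states = list(states)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)
        self.values = defaultdict(float)  # running mean reward per state

    def select(self):
        """Explore with probability epsilon, otherwise pick the best state."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.states)
        return max(self.states, key=lambda s: self.values[s])

    def update(self, state, reward):
        """Incrementally update the mean reward observed for a state."""
        self.counts[state] += 1
        self.values[state] += (reward - self.values[state]) / self.counts[state]


policy = TransitionPolicy(INTERVENTIONS, epsilon=0.0)
policy.update("text_hint", 1.0)    # student solved the exercise after a hint
policy.update("elaboration", 0.0)  # student did not solve it afterwards
next_state = policy.select()
```

Because new states only need a name and a reward signal, adding an intervention amounts to appending to the state list, which is in the spirit of the modular design described above.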

3 System Evaluation

We have conducted multiple studies to evaluate the Korbit ITS. Some of these studies have evaluated the entire system while others have focused on particular aspects or modules of the system. Taken together, the studies demonstrate that the Korbit ITS is an effective learning tool and that it overall improves student learning outcomes and motivation compared to alternative online learning approaches.

For brevity, we discuss only one of these studies in this paper. The study we present compares the entire system (Full ITS) against an xMOOC-like system [4]. The purpose of this particular study is to evaluate 1) whether students prefer the Korbit ITS or a regular MOOC, 2) whether the Korbit ITS increases student motivation, and 3) which aspects of the Korbit ITS students find most useful and least useful. In an ideal world, Korbit ITS would be compared against a regular xMOOC teaching students through lecture videos and multiple choice quizzes in a randomized controlled trial (a randomized A/B testing experiment). However, it is not possible to compare against such a system in a randomized controlled trial, because it would create confusion and drastically offset our students' expectations.¹ Therefore, in this study, we compare the Full ITS against a reduced ITS, which appears identical to the Full ITS and utilizes the same content (video lectures and exercise questions), but defaults to multiple choice quizzes 50% of the time. Thus, students assigned to the reduced ITS effectively spend about half of their interactions in an xMOOC-like setting. We refer to this system as the xMOOC ITS.

¹ Indeed, we attempted this in an earlier study. However, during that study, as soon as students found out that they were assigned to the xMOOC system instead of the ITS system, they would complain to us, log out and create a new account to access the main ITS system.

Table 1: A/B testing results comparing the Full ITS against the xMOOC ITS: average time spent by students (in minutes), returning students (in %), students who said they will refer others (in %) and learning gain (in %), with corresponding 95% confidence intervals. Markers in the table show statistical significance at 90% and 95% confidence, respectively.

Columns: System, Time Spent, Returning Students, Will Refer Others, Learning Gain

The experiment was conducted between October 7th and December 22nd, 2019, in an A/B testing setup with n=612 participants. Students who enrolled online were randomly assigned to either the Full ITS (80%) or xMOOC ITS (20%). Students came from many different countries and were not subject to any selection or filtering process. Apart from bug fixes and minor speed improvements, the system was kept fixed during this time period to limit confounding factors. After using the system for about 45 minutes, students were shown a questionnaire to evaluate the system.
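The 80/20 random assignment can be sketched as follows. The arm labels, the function name and the fixed seed are hypothetical; the paper does not describe how assignment was implemented.

```python
import random


def assign_arm(rng: random.Random, p_full: float = 0.8) -> str:
    """Assign a newly enrolled student to an arm with probability p_full."""
    return "full_its" if rng.random() < p_full else "xmooc_its"


rng = random.Random(42)  # fixed seed only to make the sketch reproducible
assignments = [assign_arm(rng) for _ in range(612)]
share_full = assignments.count("full_its") / len(assignments)
```

With independent draws the realized split fluctuates around 80/20 rather than matching it exactly.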

Table 1 shows the experimental results. The average time spent in the Full ITS was 39.86 min compared to 22.98 min in the xMOOC ITS. As such, the Full ITS yields a staggering 73.46% increase in time spent. In addition, the percentage of returning students and the percentage of students who said they would refer others to use the system are substantially higher for the Full ITS than for the xMOOC ITS. These results were also confirmed by the feedback provided by the students in the questionnaire. Thus, we can conclude that students strongly prefer Korbit ITS over xMOOCs and that the Korbit ITS increases overall student motivation.
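The reported increase in time spent follows directly from the two averages, as this short check shows; the function name is illustrative.

```python
def relative_increase(new: float, baseline: float) -> float:
    """Percentage increase of `new` over `baseline`."""
    return (new - baseline) / baseline * 100.0


# Average minutes spent: Full ITS vs. xMOOC ITS, as reported in the text.
increase = relative_increase(39.86, 22.98)
print(f"{increase:.2f}%")  # prints "73.46%"
```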

Table 1 also shows the average student learning gain, which was observed to be 39.14%. The learning gain is measured as the proportion of instances where a student provides a correct exercise solution after having received a pedagogical intervention from the Korbit ITS. Thus, the pedagogical interventions appear to be effective.
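Under the definition above, the learning gain is a simple proportion. The sketch below assumes each logged instance is a boolean recording whether the student's solution was correct after an intervention; the data shown is made up purely to illustrate the calculation.

```python
def learning_gain(outcomes) -> float:
    """Percentage of post-intervention attempts that were correct.

    `outcomes` is an iterable of booleans, one per instance in which a
    student received a pedagogical intervention.
    """
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("at least one post-intervention instance is required")
    return 100.0 * sum(outcomes) / len(outcomes)


# Illustrative data only: 2 of 5 attempts were correct after an intervention.
example = [True, False, False, True, False]
gain = learning_gain(example)
```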

Finally, in the questionnaire, 85.31% of students reported that they found the chat equally or more fun compared to learning alone and 66.67% of students reported that the chat helped them learn better sometimes, many times or all of the time. For the Full ITS, 54.17% of students reported that they would refer others to use Korbit ITS. In addition, students reported that the Korbit ITS could be improved by more accurately identifying their solutions as being correct or incorrect and, in the case of incorrect solutions, by providing more relevant and personalized feedback.


References

  • [1] J. Ahn, M. Chang, P. Watson, R. Tejwani, S. Sundararajan, T. Abuelsaad, and S. Prabhu (2018) Adaptive Visual Dialog for Intelligent Tutoring Systems. In International Conference on Artificial Intelligence in Education, pp. 413–418.
  • [2] J. K. Cavanaugh and S. J. Jacquemin (2015) A large sample comparison of grade based student learning outcomes in online vs. face-to-face courses. Online Learning 19 (2), pp. n2.
  • [3] K. F. Colvin, J. Champaign, A. Liu, Q. Zhou, C. Fredericks, and D. E. Pritchard (2014) Learning in an introductory physics MOOC: all cohorts learn equally, including an on-campus class. The International Review of Research in Open and Distributed Learning 15 (4).
  • [4] J. Daniel (2012) Making sense of MOOCs: musings in a maze of myth, paradox and possibility. Journal of Interactive Media in Education 2012 (3).
  • [5] J. T. Folsom-Kovarik, S. Schatz, and D. Nicholson (2010) Plan ahead: Pricing ITS learner models. In Proceedings of the 19th Behavior Representation in Modeling & Simulation (BRIMS) Conference, pp. 47–54.
  • [6] A. C. Graesser, P. Chipman, B. C. Haynes, and A. Olney (2005) AutoTutor: An intelligent tutoring system with mixed-initiative dialogue. IEEE Transactions on Education 48 (4), pp. 612–618.
  • [7] A. C. Graesser, K. VanLehn, C. P. Rosé, P. W. Jordan, and D. Harter (2001) Intelligent tutoring systems with conversational dialogue. AI Magazine 22 (4), pp. 39–39.
  • [8] K. S. Hone and G. R. El Said (2016) Exploring the factors affecting MOOC retention: a survey study. Computers & Education 98, pp. 157–168.
  • [9] L. Kirtman (2009) Online versus in-class courses: an examination of differences in learning outcomes. Issues in Teacher Education 18 (2), pp. 103–116.
  • [10] K. R. Koedinger, J. Kim, J. Z. Jia, E. A. McLaughlin, and N. L. Bier (2015) Learning is not a spectator sport: doing is better than watching for learning from a MOOC. In Proceedings of the Second (2015) ACM Conference on Learning @ Scale, pp. 111–120.
  • [11] I. Koxvold (2014) MOOCs: opportunities for their use in compulsory-age education. Department for Education.
  • [12] J. A. Kulik and J. Fletcher (2016) Effectiveness of intelligent tutoring systems: a meta-analytic review. Review of Educational Research 86 (1), pp. 42–78.
  • [13] B. D. Nye, A. C. Graesser, and X. Hu (2014) AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education 24 (4), pp. 427–469.
  • [14] A. M. Olney (2018) Using novices to scale up intelligent tutoring systems. In Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC).
  • [15] D. Otto, A. Bollmann, S. Becker, and K. Sander (2018) It’s the learning, stupid! Discussing the role of learning outcomes in MOOCs. Open Learning: The Journal of Open, Distance and e-Learning 33 (3), pp. 203–220.
  • [16] S. Ramachandran, R. Jensen, J. Ludwig, E. Domeshek, and T. Haines (2018) ITADS: a real-world intelligent tutor to train troubleshooting skills. In International Conference on Artificial Intelligence in Education, pp. 463–468.
  • [17] S. Ritter, J. R. Anderson, K. R. Koedinger, and A. Corbett (2007) Cognitive Tutor: Applied research in mathematics education. Psychonomic Bulletin & Review 14 (2), pp. 249–255.
  • [18] C. P. Rosé, J. D. Moore, K. VanLehn, and D. Allbritton (2001) A comparative evaluation of Socratic versus didactic tutoring. In Proceedings of the Annual Meeting of the Cognitive Science Society, Vol. 23.
  • [19] D. Shah (2019) By The Numbers: MOOCs in 2019. Class Central MOOC Report.
  • [20] M. Ventura, M. Chang, P. Foltz, N. Mukhi, J. Yarbro, A. P. Salverda, J. Behrens, J. Ahn, T. Ma, T. I. Dhamecha, et al. (2018) Preliminary evaluations of a dialogue-based digital tutor. In International Conference on Artificial Intelligence in Education, pp. 480–483.