
Detecting Behavioral Engagement of Students in the Wild Based on Contextual and Visual Data

by Eda Okur, et al.

To investigate the detection of students' behavioral engagement (On-Task vs. Off-Task), we propose a two-phase approach in this study. In Phase 1, contextual logs (URLs) are used to assess whether the content platform is in active use. If it is, appearance information is used in Phase 2 to infer behavioral engagement. Incorporating the contextual information improved the overall F1-score from 0.77 to 0.82. Our cross-classroom and cross-platform experiments showed that the proposed generic, multi-modal behavioral engagement models are applicable to a different set of students or to different subject areas.



1 Introduction

Monitoring students’ face and upper body (appearance) as well as their interactions with the learning platform on the digital device (context) provides important cues for accurately understanding different dimensions of students’ states during learning. In this study, our goal is to detect students’ behavioral engagement [5] (i.e., On-Task vs. Off-Task states) [7, 8, 4] in 1:1 digital learning scenarios. To this end, we address two research questions: (1) What level of behavioral engagement detection performance can we achieve using a scalable multi-modal approach (i.e., camera and URL logs)? (2) How does this performance change across subjects or across content platforms (Math vs. English as a Second Language (ESL))?

2 Methodology

As noted above, appearance and context both provide important cues for understanding students’ states during learning. To detect behavioral engagement from these cues, we propose a two-phase system:

  1. Phase 1: Contextual data (URL logs) is processed to assess whether the student is actively using the content platform. If not (Off-Platform), the student’s state is predicted as Off-Task.

  2. Phase 2: If the content platform is active on the learner’s device, the appearance information is used to predict whether the student is On-Task or Off-Task.
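
The two-phase decision logic above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the platform hostnames (`PLATFORM_HOSTS`), the rule that a window counts as On-Platform when at least half of its logged URLs belong to the platform (`min_fraction`), and all function names are hypothetical. The trained appearance classifier is passed in as any callable that returns a label.

```python
from typing import Callable, FrozenSet, Sequence
from urllib.parse import urlparse

# Hypothetical hostnames of the content platforms (assumption, not from the paper).
PLATFORM_HOSTS: FrozenSet[str] = frozenset({"math.example.com", "esl.example.com"})


def is_on_platform(urls: Sequence[str],
                   hosts: FrozenSet[str] = PLATFORM_HOSTS,
                   min_fraction: float = 0.5) -> bool:
    """Phase 1 (assumed rule): the platform is in active use if at least
    `min_fraction` of the URLs logged in the window are platform URLs."""
    if not urls:
        return False  # no logged activity is treated as Off-Platform
    on_platform = sum(urlparse(u).hostname in hosts for u in urls)
    return on_platform / len(urls) >= min_fraction


def predict_engagement(urls: Sequence[str],
                       appearance_features: Sequence[float],
                       appearance_clf: Callable[[Sequence[float]], str]) -> str:
    """Two-phase prediction for one instance (window)."""
    # Phase 1: Off-Platform instances are labeled Off-Task directly.
    if not is_on_platform(urls):
        return "Off-Task"
    # Phase 2: the appearance classifier decides On-Task vs. Off-Task.
    return appearance_clf(appearance_features)


if __name__ == "__main__":
    def toy_clf(features: Sequence[float]) -> str:
        # Hypothetical stand-in for the trained appearance classifier.
        return "On-Task" if features[0] > 0.5 else "Off-Task"

    print(predict_engagement(["https://math.example.com/lesson/3"], [0.9], toy_clf))
    print(predict_engagement(["https://video.example.org/watch"], [0.9], toy_clf))
```

Routing Off-Platform windows straight to Off-Task means the appearance model is only consulted while the content platform is open.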

In Phase 2, we trained generic appearance classifiers using Random Forests [3]. The frame-wise raw video data was used to extract face location, head position and pose, 78 facial landmark localizations, 22 facial expressions, and 7 basic facial emotions. For instance-wise feature extraction, conventional time-series analysis methods were applied, such as robust statistical estimators, motion and energy measures, and frequency-domain features. More details on the appearance modality and feature extraction can be found in our previous study [1]. Instances are 8-second sliding windows with 4-second overlaps.
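
The windowing and a few instance-wise features can be sketched as below. This is an illustrative sketch under stated assumptions: the frame rate, the function names, and the specific estimators (median and interquartile range as robust statistics, mean absolute frame-to-frame change as a motion measure, mean squared value as an energy measure) are our choices. Frequency-domain features are omitted, and the paper's actual feature set is described in [1].

```python
import statistics
from typing import Dict, List, Sequence


def sliding_windows(n_frames: int, fps: float,
                    win_sec: float = 8.0, overlap_sec: float = 4.0) -> List[range]:
    """Frame-index ranges of win_sec-long windows overlapping by overlap_sec.

    With 8-s windows and 4-s overlaps, consecutive windows start 4 s apart.
    Trailing frames that do not fill a whole window are dropped.
    """
    win = int(round(win_sec * fps))
    hop = int(round((win_sec - overlap_sec) * fps))
    return [range(s, s + win) for s in range(0, n_frames - win + 1, hop)]


def window_features(signal: Sequence[float]) -> Dict[str, float]:
    """Illustrative features for one per-frame signal (e.g., head yaw) in one window."""
    q1, _, q3 = statistics.quantiles(signal, n=4)  # needs at least 2 samples
    diffs = [b - a for a, b in zip(signal, signal[1:])]
    return {
        "median": statistics.median(signal),                 # robust location
        "iqr": q3 - q1,                                      # robust spread
        "motion": sum(abs(d) for d in diffs) / len(diffs),   # mean |frame-to-frame change|
        "energy": sum(x * x for x in signal) / len(signal),  # mean squared amplitude
    }


if __name__ == "__main__":
    fps = 10.0  # hypothetical camera frame rate
    yaw = [float(i % 7) for i in range(400)]  # hypothetical per-frame head-yaw track
    windows = sliding_windows(len(yaw), fps)
    features = [window_features([yaw[i] for i in w]) for w in windows]
    print(len(windows), windows[1].start)
```

Each per-frame signal (head pose angles, landmark coordinates, expression intensities) would be summarized this way per window, and the concatenated features form one instance for the classifier.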

3 Experimental Results

We collected 170 hours of multi-modal data through authentic classroom pilots with 28 ninth-grade students from two different classrooms, over 22 sessions of 40 minutes each, using laptops with a 3D camera. Online content platforms for two subject areas were used: (1) Math (watching videos) and (2) ESL (reading articles). To obtain ground-truth labels, we employed the Human Expert Labeling Process (HELP) [2] with three expert labelers. We experimented with two test cases: (1) Cross-classroom, where trained models were tested on a different classroom’s data; and (2) Cross-platform, where data collected in one subject area were used for training and data from the other subject area for testing. The results for these two experiments are summarized in Table 1 and Table 2, respectively.
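
The two test cases can be expressed as group-wise splits in which no test group appears in training. The record fields (`classroom`, `platform`) and the helper `group_split` are hypothetical; the sketch only illustrates the held-out protocol, not the authors' code. The same-set rows of Tables 1 and 2 (e.g., Set1 tested on Set1) use a within-set protocol that is not detailed here, so they are not covered by this check.

```python
from typing import Callable, Dict, List, Set, Tuple

Record = Dict[str, str]  # hypothetical instance: {"classroom": ..., "platform": ..., "label": ...}


def _groups(records: List[Record]) -> Set[Tuple[str, str]]:
    """(classroom, platform) groups present in a list of instances."""
    return {(r["classroom"], r["platform"]) for r in records}


def group_split(records: List[Record],
                in_train: Callable[[Record], bool],
                in_test: Callable[[Record], bool]) -> Tuple[List[Record], List[Record]]:
    """Split instances into train/test and check that no (classroom, platform)
    group appears on both sides."""
    train = [r for r in records if in_train(r)]
    test = [r for r in records if in_test(r)]
    overlap = _groups(train) & _groups(test)
    if overlap:
        raise ValueError(f"groups in both train and test: {sorted(overlap)}")
    return train, test


if __name__ == "__main__":
    data = [
        {"classroom": "1", "platform": "Math", "label": "On-Task"},   # Set1
        {"classroom": "2", "platform": "Math", "label": "Off-Task"},  # Set2
        {"classroom": "1", "platform": "ESL", "label": "On-Task"},    # Set3
    ]
    # Cross-classroom: train on Set1, test on Set2.
    tr, te = group_split(data,
                         lambda r: r["classroom"] == "1" and r["platform"] == "Math",
                         lambda r: r["classroom"] == "2" and r["platform"] == "Math")
    # Cross-platform: train on Math (Set1 + Set2), test on ESL (Set3).
    tr2, te2 = group_split(data,
                           lambda r: r["platform"] == "Math",
                           lambda r: r["platform"] == "ESL")
    print(len(tr), len(te), len(tr2), len(te2))
```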

Train  Test  Class     Appr  Context + Appr
Set1   Set1  On-Task   0.82  0.82
Set1   Set1  Off-Task  0.69  0.77
Set1   Set1  Overall   0.77  0.80
Set1   Set2  On-Task   0.83  0.83
Set1   Set2  Off-Task  0.63  0.79
Set1   Set2  Overall   0.77  0.82
Table 1: F1-scores for Cross-classroom Experiments (Set1: Classroom 1, Set2: Classroom 2, Appr: Appearance).

Train               Test                Class     Appr  Context + Appr
Set1 + Set2 (Math)  Set1 + Set2 (Math)  On-Task   0.82  0.82
Set1 + Set2 (Math)  Set1 + Set2 (Math)  Off-Task  0.67  0.78
Set1 + Set2 (Math)  Set1 + Set2 (Math)  Overall   0.77  0.80
Set1 + Set2 (Math)  Set3 (ESL)          On-Task   0.79  -
Set1 + Set2 (Math)  Set3 (ESL)          Off-Task  0.59  -
Set1 + Set2 (Math)  Set3 (ESL)          Overall   0.72  -
Table 2: F1-scores for Cross-platform Experiments (Set1: Classroom 1 with Math, Set2: Classroom 2 with Math, Set3: Classroom 1 with ESL).
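
Per-class F1-scores like those in the tables can be computed as sketched below. The paper does not state how the Overall value is aggregated; since it is not the plain mean of the two per-class scores (e.g., (0.83 + 0.63) / 2 = 0.73, whereas Table 1 reports 0.77), this sketch assumes a support-weighted average. The function name and the toy labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def f1_report(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Per-class F1-scores plus a support-weighted 'Overall' F1 (assumed aggregation)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    report: Dict[str, float] = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        report[c] = 2 * tp / denom if denom else 0.0  # F1 = 2TP / (2TP + FP + FN)
    support = Counter(y_true)  # number of ground-truth instances per class
    report["Overall"] = sum(report[c] * support[c] for c in labels) / len(y_true)
    return report


if __name__ == "__main__":
    y_true = ["On-Task", "On-Task", "On-Task", "Off-Task"]
    y_pred = ["On-Task", "On-Task", "Off-Task", "Off-Task"]
    print(f1_report(y_true, y_pred))
```

With a weighted average, the majority class dominates the Overall score, which is why a large Off-Task gain moves Overall less than the per-class numbers might suggest.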

Because Set2 contains more Off-Platform samples than Set1, and these samples are already predicted as Off-Task in Phase 1, using context improves the Off-Task scores more for Set2 (0.63 to 0.79) than for Set1 (0.69 to 0.77). We consider the overall performance acceptable: for the final models, the expected accuracy by chance is 0.48, the observed accuracy is 0.77, and Cohen’s kappa is 0.55. Further details of the methodology used in this study and a discussion of the experimental results can be found in the full version of this paper [6].
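
Cohen's kappa relates the observed accuracy p_o to the accuracy expected by chance p_e as kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it either from those two rates or from label sequences; the function names and toy labels are hypothetical. Plugging in the rounded values above gives about 0.56; the small gap to the reported 0.55 is consistent with the accuracies having been rounded to two decimals.

```python
from typing import Sequence


def kappa_from_rates(p_o: float, p_e: float) -> float:
    """Cohen's kappa from observed agreement p_o and chance agreement p_e."""
    if p_e >= 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


def cohens_kappa(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Cohen's kappa between ground-truth labels and predictions."""
    n = len(y_true)
    true_list, pred_list = list(y_true), list(y_pred)
    labels = set(true_list) | set(pred_list)
    p_o = sum(t == p for t, p in zip(true_list, pred_list)) / n
    # Chance agreement: product of marginal label frequencies, summed over classes.
    p_e = sum((true_list.count(c) / n) * (pred_list.count(c) / n) for c in labels)
    return kappa_from_rates(p_o, p_e)


if __name__ == "__main__":
    # Rounded values reported for the final models.
    print(round(kappa_from_rates(0.77, 0.48), 2))  # 0.56
```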

4 Conclusion

To explore a scalable multi-modal approach for behavioral engagement detection, we proposed a two-phase system that incorporates both visual and contextual cues. Contextual information, even in the simple form of URL logs, improves the overall system performance. The promising overall F1-scores indicate the cross-subject and cross-platform applicability of our models.


  • Alyuz et al. [2016] N. Alyuz, E. Okur, E. Oktay, U. Genc, S. Aslan, S. E. Mete, B. Arnrich, and A. A. Esme. Semi-supervised model personalization for improved detection of learner’s emotional engagement. In Proceedings of the 18th ACM International Conference on Multimodal Interaction, ICMI 2016, pages 100–107, New York, NY, USA, 2016. ACM. ISBN 978-1-4503-4556-9. doi: 10.1145/2993148.2993166.
  • Aslan et al. [2017] S. Aslan, S. E. Mete, E. Okur, E. Oktay, N. Alyuz, U. E. Genc, D. Stanhill, and A. A. Esme. Human expert labeling process (HELP): Towards a reliable higher-order user state labeling process and tool to assess student engagement. Educational Technology, 57(1):53–59, 2017. ISSN 00131962.
  • Chen et al. [2004] C. Chen, A. Liaw, and L. Breiman. Using random forest to learn imbalanced data. University of California, Berkeley, 110:1–12, 2004.
  • Fancsali [2013] S. E. Fancsali. Data-driven causal modeling of "gaming the system" and off-task behavior in cognitive tutor algebra. In NIPS Workshop on Data Driven Education, 2013.
  • Fredricks et al. [2004] J. A. Fredricks, P. C. Blumenfeld, and A. H. Paris. School engagement: Potential of the concept, state of the evidence. Review of Educational Research, 74(1):59–109, 2004.
  • Okur et al. [2017] E. Okur, N. Alyuz, S. Aslan, U. Genc, C. Tanriover, and A. Arslan Esme. Behavioral engagement detection of students in the wild. In International Conference on Artificial Intelligence in Education (AIED 2017), volume 10331 of Lecture Notes in Computer Science, pages 250–261, Cham, June 2017. Springer International Publishing. ISBN 978-3-319-61425-0. doi: 10.1007/978-3-319-61425-0_21.
  • Pekrun and Linnenbrink-Garcia [2012] R. Pekrun and L. Linnenbrink-Garcia. Academic emotions and student engagement. In Handbook of research on student engagement, pages 259–282. Springer, 2012.
  • Rodrigo et al. [2013] M. M. T. Rodrigo, R. Baker, L. Rossi, et al. Student off-task behavior in computer-based learning in the Philippines: comparison to prior research in the USA. Teachers College Record, 115(10):1–27, 2013.