Monitoring students’ faces and upper bodies (appearance), as well as their interactions with the learning platform on the digital device (context), provides important cues for accurately understanding different dimensions of students’ states during learning. In this study, our goal is to detect students’ behavioral engagement (i.e., On-Task vs. Off-Task states) [7, 8, 4] in 1:1 digital learning scenarios. To this end, we address two research questions: (1) What level of behavioral engagement detection performance can we achieve with a scalable multi-modal approach (i.e., camera and URL logs)? (2) How does this performance change across subjects or content platforms (Math vs. English as a Second Language (ESL))?
To detect behavioral engagement, we propose a two-phase system:
Phase 1: Contextual data (URL logs) are processed to assess whether the student is actively using the content platform. If not (Off-Platform), the student’s state is predicted as Off-Task.
Phase 2: If the content platform is active on the learner’s device, the appearance information is used to predict whether the student is On-Task or Off-Task.
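The two-phase decision can be sketched as a simple gate in front of the appearance classifier. This is a minimal illustration only, not the system used in the study: the host names in `PLATFORM_HOSTS`, the function names, and the callable `appearance_model` are all hypothetical placeholders.

```python
from typing import Callable, Sequence
from urllib.parse import urlparse

# Hypothetical content-platform hosts; the actual platforms are not named in the text.
PLATFORM_HOSTS = {"math.example.org", "esl.example.org"}


def is_on_platform(url: str) -> bool:
    """Phase 1: the instance counts as On-Platform if the active URL's host is a known platform."""
    host = urlparse(url).hostname or ""
    return host in PLATFORM_HOSTS


def predict_engagement(
    url: str,
    features: Sequence[float],
    appearance_model: Callable[[Sequence[float]], str],
) -> str:
    """Return "On-Task" or "Off-Task" for one instance."""
    # Phase 1: Off-Platform instances are labeled Off-Task without looking at the video.
    if not is_on_platform(url):
        return "Off-Task"
    # Phase 2: the appearance-based classifier decides for On-Platform instances.
    return appearance_model(features)
```

Under these assumptions, the appearance model is only consulted when the URL log shows the content platform is active, so Off-Platform instances never depend on camera features.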
The frame-wise raw video data are used to extract face location, head position and pose, 78 facial landmark localizations, 22 facial expressions, and 7 basic facial emotions. For instance-wise feature extraction, conventional time series analysis methods were applied, such as robust statistical estimators, motion and energy measures, and frequency domain features. More details regarding the appearance modality and feature extraction can be found in our previous study. Instances are 8-second sliding windows with 4-second overlaps.
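As a rough illustration of the instance construction, the sketch below cuts a single frame-wise signal (for example, one head-pose angle) into 8-second windows with 4-second overlap and computes a few features of the kinds listed above. The frame rate `fps`, the specific feature choices, and all function names are assumptions made for this example; the study's actual feature set is described in the authors' earlier work.

```python
import cmath
import math
from statistics import median


def sliding_windows(n_frames, fps, window_sec=8.0, overlap_sec=4.0):
    """Return (start, end) frame indices of windows of window_sec with overlap_sec overlap."""
    win = int(round(window_sec * fps))
    hop = int(round((window_sec - overlap_sec) * fps))
    if hop <= 0:
        raise ValueError("overlap must be shorter than the window")
    return [(s, s + win) for s in range(0, n_frames - win + 1, hop)]


def dominant_frequency(segment, fps):
    """Frequency (Hz) of the strongest non-DC component, using a naive DFT."""
    n = len(segment)
    mean = sum(segment) / n
    centered = [x - mean for x in segment]
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * fps / n


def window_features(signal, fps):
    """Compute one feature dict per window (illustrative features only)."""
    rows = []
    for start, end in sliding_windows(len(signal), fps):
        seg = signal[start:end]
        med = median(seg)
        rows.append({
            "median": med,                                    # robust location
            "mad": median(abs(x - med) for x in seg),         # robust spread
            "motion": sum(abs(b - a) for a, b in zip(seg, seg[1:])) / (len(seg) - 1),
            "energy": sum(x * x for x in seg) / len(seg),     # mean squared value
            "dominant_hz": dominant_frequency(seg, fps),      # frequency-domain feature
        })
    return rows
```

With an assumed 30 frames per second, a 20-second recording yields four windows starting at 0, 4, 8, and 12 seconds.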
3 Experimental Results
We collected 170 hours of multi-modal data through authentic classroom pilots, from 28 ninth-grade students (two different classrooms) in 22 sessions (40 minutes each), using laptops with a 3D camera. Online content platforms for two subject areas were used: (1) Math (watching videos) and (2) ESL (reading articles). To obtain ground truth labels, we employed the Human Expert Labeling Process (HELP) with 3 expert labelers. We experimented with two test cases: (1) Cross-classroom, where trained models were tested on a different classroom’s data; (2) Cross-platform, where data collected in different subject areas were used for training and testing, respectively. The results for these two experiments are summarized in Table 1 and Table 2, respectively.
|Train|Test|Class|Appr|Context + Appr|
|---|---|---|---|---|
|Set1 + Set2|Set1 + Set2|On-Task|0.82|0.82|
|Set1 + Set2|Set3|On-Task|0.79|-|
Because Set2 contains more Off-Platform samples than Set1, and these samples are predicted as Off-Task in Phase 1, using context improves Off-Task scores more in Set2. We consider the overall performance acceptable: the accuracy expected by chance is 0.48, the observed accuracy is 0.77, and Cohen’s Kappa is 0.55 for the final models. Further details of the methodology used in this study and discussions of the experimental results can be found in the full version of this paper.
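The relation between the three reported numbers follows from the standard definition of Cohen's Kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed accuracy and p_e the accuracy expected by chance. The short check below plugs in the rounded values quoted above; the function name is ours, introduced only for illustration.

```python
def cohens_kappa(observed_accuracy: float, expected_accuracy: float) -> float:
    """Cohen's Kappa from observed agreement p_o and chance agreement p_e."""
    return (observed_accuracy - expected_accuracy) / (1.0 - expected_accuracy)


# Rounded values reported in the text: p_o = 0.77, p_e = 0.48.
kappa = cohens_kappa(0.77, 0.48)
print(round(kappa, 3))  # 0.558
```

The result from the rounded inputs is about 0.56, which agrees with the reported 0.55 up to what is presumably rounding of the published accuracies.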
To explore a scalable multi-modal approach for behavioral engagement detection, we proposed a two-phase system incorporating both visual and contextual cues. Using context information, even in the simple form of URL logs, improves the overall system performance. The promising overall F1-scores indicate the cross-subject and cross-platform applicability of our models.
- Alyuz et al.  N. Alyuz, E. Okur, E. Oktay, U. Genc, S. Aslan, S. E. Mete, B. Arnrich, and A. A. Esme. Semi-supervised model personalization for improved detection of learner’s emotional engagement. In Proceedings of the 18th ACM International Conference on Multimodal Interaction, ICMI 2016, pages 100–107, New York, NY, USA, 2016. ACM. ISBN 978-1-4503-4556-9. doi: 10.1145/2993148.2993166. URL https://doi.acm.org/10.1145/2993148.2993166.
- Aslan et al.  S. Aslan, S. E. Mete, E. Okur, E. Oktay, N. Alyuz, U. E. Genc, D. Stanhill, and A. A. Esme. Human expert labeling process (HELP): Towards a reliable higher-order user state labeling process and tool to assess student engagement. Educational Technology, 57(1):53–59, 2017. ISSN 00131962. URL https://eric.ed.gov/?id=EJ1126255.
- Chen et al.  C. Chen, A. Liaw, and L. Breiman. Using random forest to learn imbalanced data. University of California, Berkeley, 110:1–12, 2004.
- Fancsali  S. E. Fancsali. Data-driven causal modeling of "gaming the system" and off-task behavior in cognitive tutor algebra. In NIPS Workshop on Data Driven Education, 2013.
- Fredricks et al.  J. A. Fredricks, P. C. Blumenfeld, and A. H. Paris. School engagement: Potential of the concept, state of the evidence. Review of educational research, 74(1):59–109, 2004.
- Okur et al.  E. Okur, N. Alyuz, S. Aslan, U. Genc, C. Tanriover, and A. Arslan Esme. Behavioral engagement detection of students in the wild. In International Conference on Artificial Intelligence in Education (AIED 2017), volume 10331 of Lecture Notes in Computer Science, pages 250–261, Cham, June 2017. Springer International Publishing. ISBN 978-3-319-61425-0. doi: 10.1007/978-3-319-61425-0_21. URL https://doi.org/10.1007/978-3-319-61425-0_21.
- Pekrun and Linnenbrink-Garcia  R. Pekrun and L. Linnenbrink-Garcia. Academic emotions and student engagement. In Handbook of research on student engagement, pages 259–282. Springer, 2012.
- Rodrigo et al.  M. M. T. Rodrigo, R. Baker, L. Rossi, et al. Student off-task behavior in computer-based learning in the Philippines: comparison to prior research in the USA. Teachers College Record, 115(10):1–27, 2013.