Clustering Introductory Computer Science Exercises Using Topic Modeling Methods

04/21/2021
by Laura O. Moraes, et al.

Manually determining the concepts present in a group of questions is a challenging and time-consuming process. It is nevertheless an essential step when modeling a virtual learning environment, because mastery-level assessment and recommendation engines require a mapping between concepts and questions. We investigated unsupervised semantic models (known as topic modeling techniques) to assist computer science teachers in this task, and we propose a method to transform teacher-provided code solutions from Computer Science 1 courses into representative text documents that include information about the code structure. By applying non-negative matrix factorization and latent Dirichlet allocation, we extract the underlying relationships between questions and validate the results using an external dataset. We assess the interpretability of the learned concepts using data from 14 university professors, and the results confirm six semantically coherent clusters in the current dataset. Moreover, the six topics comprise the main concepts present in the test dataset, achieving 0.75 in the normalized pointwise mutual information metric. This metric correlates with human ratings, which makes the proposed method useful for providing semantics for large amounts of unannotated code.
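To make the coherence metric concrete, the following is a minimal sketch of normalized pointwise mutual information (NPMI) computed from document-level co-occurrence counts. It is an illustration under stated assumptions, not the authors' implementation: the function names `npmi_pair` and `topic_npmi`, the toy token sets, and the choice to average over all pairs of a topic's top words are hypothetical, and the reference corpus used in the paper may differ. For two words, NPMI is log(P(a, b) / (P(a) P(b))) divided by -log P(a, b), so it ranges from -1 (the words never co-occur) to 1 (the words always co-occur).

```python
import math
from itertools import combinations


def npmi_pair(word_a, word_b, docs):
    """NPMI of two words, estimated from document-level co-occurrence.

    `docs` is a list of token sets, one set per document (an assumed
    representation chosen for this sketch).
    """
    n_docs = len(docs)
    p_a = sum(word_a in d for d in docs) / n_docs
    p_b = sum(word_b in d for d in docs) / n_docs
    p_ab = sum(word_a in d and word_b in d for d in docs) / n_docs
    if p_ab == 0:
        # The words never appear together: NPMI takes its lower bound.
        return -1.0
    if p_ab == 1:
        # Both words occur in every document, so -log(p_ab) is zero.
        # Returning 0.0 here is a convention assumed for this sketch.
        return 0.0
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)


def topic_npmi(top_words, docs):
    """Average NPMI over all pairs of a topic's top words."""
    pairs = list(combinations(top_words, 2))
    return sum(npmi_pair(a, b, docs) for a, b in pairs) / len(pairs)


# Hypothetical token sets standing in for documents built from code solutions.
docs = [
    {"for", "range", "sum"},
    {"for", "range", "append"},
    {"if", "else", "print"},
    {"if", "else", "return"},
]

loop_topic_score = topic_npmi(["for", "range"], docs)   # words that always co-occur
mixed_topic_score = topic_npmi(["for", "if"], docs)     # words that never co-occur
```

In this toy corpus, a topic whose top words always appear together scores at the upper end of the range, while a topic mixing loop and conditional tokens that never co-occur scores at the lower end. Scores closer to 1 indicate more coherent topics.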


