ast2vec: Utilizing Recursive Neural Encodings of Python Programs

03/22/2021
by Benjamin Paaßen, et al.

Educational datamining involves the application of datamining techniques to student activity. However, in the context of computer programming, many datamining techniques cannot be applied because they expect vector-shaped input, whereas computer programs have the form of syntax trees. In this paper, we present ast2vec, a neural network that maps Python syntax trees to vectors and back, thereby facilitating datamining on computer programs as well as the interpretation of datamining results. Ast2vec has been trained on almost half a million programs written by novice programmers and is designed to be applied across learning tasks without re-training, meaning that users can apply it without any need for (additional) deep learning. We demonstrate the generality of ast2vec in three settings. First, we provide example analyses using ast2vec on a classroom-sized dataset, involving visualization, student motion analysis, clustering, and outlier detection, including two novel analyses, namely a progress-variance projection and a dynamical systems analysis. Second, we consider the ability of ast2vec to recover the original syntax tree from its vector representation on the training data and on two further large-scale programming datasets. Finally, we evaluate the predictive capability of a simple linear regression on top of ast2vec, obtaining results similar to those of techniques that work directly on syntax trees. We hope ast2vec can augment the educational datamining toolbelt by making analyses of computer programs easier, richer, and more efficient.
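To make the tree-versus-vector mismatch concrete, the following minimal sketch parses a small Python program into a syntax tree with the standard-library `ast` module and turns it into a fixed-length vector by counting node types. This is not ast2vec and not its API: ast2vec uses a learned recursive neural encoding, whereas the count vector here is a hand-made stand-in chosen only for illustration. The names `node_type_counts`, `to_vector`, and `VOCAB` are hypothetical.

```python
import ast
from collections import Counter

# Hypothetical, hand-picked vocabulary of syntax-tree node types.
# A learned encoder such as ast2vec would not need this list.
VOCAB = ["FunctionDef", "Return", "BinOp", "Name", "Constant"]


def node_type_counts(source: str) -> Counter:
    """Parse Python source into a syntax tree and count its node types."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def to_vector(source: str, vocabulary: list[str]) -> list[int]:
    """Map a program to a fixed-length vector of node-type counts.

    This is a crude, lossy illustration of 'tree in, vector out';
    unlike ast2vec, it cannot be decoded back into a syntax tree.
    """
    counts = node_type_counts(source)
    return [counts.get(name, 0) for name in vocabulary]


program = "def add(x, y):\n    return x + y\n"
vector = to_vector(program, VOCAB)
print(vector)
```

Once programs are represented as vectors of equal length, standard vector-based techniques (clustering, linear regression, projection for visualization) can be applied to them, which is the role ast2vec's learned vectors play in the analyses described above.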


