An Evaluation of Programming Language Models' Performance on Software Defect Detection

by Kailun Wang, et al.

This dissertation presents an evaluation of several language models on software defect datasets. A language model (LM) "can provide word representation and probability indication of word sequences as the core component of an NLP system." Language models for source code are specialized for tasks in the software engineering field. Some of these models are taken directly from NLP, while others incorporate structural information that is unique to source code. Software defects are flaws in source code that lead to unexpected behaviours and malfunctions at all levels. This study makes an original attempt to detect these defects at three levels (syntactical, algorithmic and general). We also provide a tool chain that researchers can use to reproduce the experiments. We tested the models against several datasets and analysed the results. Our original attempt to deploy BERT, the state-of-the-art multitask model, matched or outscored all other models compared.





