Using Large-Scale Anomaly Detection on Code to Improve Kotlin Compiler

04/03/2020
by Timofey Bryksin, et al.

In this work, we apply anomaly detection to source code and bytecode to facilitate the development of a programming language and its compiler. We define an anomaly as a code fragment that differs from typical code written in a particular programming language. Identifying such fragments benefits both language developers and end users, since anomalies may indicate potential issues with the compiler or with runtime performance. Moreover, anomalies could correspond to problems in language design. For this study, we choose Kotlin as the target programming language. We outline and discuss approaches to obtaining vector representations of source code and bytecode and to detecting anomalies across vectorized code snippets. The paper presents a method that aims to detect two types of anomalies: syntax tree anomalies and so-called compiler-induced anomalies, which arise only in the compiled bytecode. We describe several experiments that employ different combinations of vectorization and anomaly detection techniques, and we discuss the types of anomalies detected and their usefulness for language developers. We demonstrate that the extracted anomalies and the underlying extraction technique provide additional value for language development.
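The general idea of vectorizing code snippets and then flagging outliers can be illustrated with a minimal Python sketch. It is not the method used in the paper: the abstract does not name specific techniques, so token bigram counts stand in here for a syntax tree or bytecode representation, and cosine distance from the corpus centroid stands in for an anomaly detector. All function names and the sample snippets are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(snippet, n=2):
    """Map a snippet to sparse token n-gram counts (an assumed, crude feature set)."""
    tokens = re.findall(r"[A-Za-z_]\w*|\S", snippet)
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse count vectors."""
    dot = sum(count * b.get(key, 0) for key, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def anomaly_scores(snippets, n=2):
    """Score each snippet by its distance from the sum of all snippet vectors."""
    vectors = [vectorize(s, n) for s in snippets]
    centroid = Counter()
    for vector in vectors:
        centroid.update(vector)
    return [cosine_distance(vector, centroid) for vector in vectors]


def most_anomalous(snippets, n=2):
    """Return the index of the snippet that is farthest from the centroid."""
    scores = anomaly_scores(snippets, n)
    return max(range(len(snippets)), key=lambda i: scores[i])


# Hypothetical Kotlin fragments: three near-identical functions and one outlier.
snippets = [
    "fun add(a: Int, b: Int): Int = a + b",
    "fun sub(a: Int, b: Int): Int = a - b",
    "fun mul(a: Int, b: Int): Int = a * b",
    "val x = listOf(1, 2, 3).map { it * 2 }.filter { it > 2 }",
]
print(most_anomalous(snippets), anomaly_scores(snippets))
```

In this toy corpus the last fragment shares no bigrams with the other three, so it receives the largest score. A realistic pipeline would replace both the features and the scoring with the representations and detectors evaluated in the full text.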


