Mining Idioms in the Wild

07/13/2021
by   Aishwarya Sivaraman, et al.
0

Existing code repositories contain numerous instances of code patterns that are idiomatic ways of accomplishing a particular programming task. Sometimes, the programming language in use supports specific operators or APIs that can express the same idiomatic imperative code much more succinctly. However, those code patterns linger in repositories because the developers may be unaware of the new APIs or have not gotten around to them. Detection of idiomatic code can also point to the need for new APIs. We share our experiences in mine idiomatic patterns from the Hack repo at Facebook. We found that existing techniques either cannot identify meaningful patterns from syntax trees or require test-suite-based dynamic analysis to incorporate semantic properties to mine useful patterns. The key insight of the approach proposed in this paper – Jezero – is that semantic idioms from a large codebase can be learned from canonicalized dataflow trees. We propose a scalable, lightweight static analysis-based approach to construct such a tree that is well suited to mine semantic idioms using nonparametric Bayesian methods. Our experiments with Jezero on Hack code shows a clear advantage of adding canonicalized dataflow information to ASTs: Jezero was significantly more effective than a baseline that did not have the dataflow augmentation in being able to effectively find refactoring opportunities from unannotated legacy code.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/01/2019

Concrete Syntax with Black Box Parsers

Context: Meta programming consists for a large part of matching, analyzi...
research
09/05/2020

TreeCaps: Tree-Based Capsule Networks for Source Code Processing

Recently program learning techniques have been proposed to process sourc...
research
03/23/2021

PSIMiner: A Tool for Mining Rich Abstract Syntax Trees from Code

The application of machine learning algorithms to source code has grown ...
research
12/08/2017

Mining Fix Patterns for FindBugs Violations

In this paper, we first collect and track large-scale fixed and unfixed ...
research
12/23/2021

Towards Fully Declarative Program Analysis via Source Code Transformation

Advances in logic programming and increasing industrial uptake of Datalo...
research
02/16/2019

Getafix: Learning to fix bugs automatically

Static analyzers, including linters, can warn developers about programmi...
research
12/13/2018

Active Inductive Logic Programming for Code Search

Modern search techniques either cannot efficiently incorporate human fee...

Please sign up or login with your details

Forgot password? Click here to reset