Structural Language Models for Any-Code Generation

09/30/2019
by Uri Alon, et al.

We address the problem of Any-Code Generation (AnyGen): generating code without any restriction on vocabulary or structure. The state of the art for this problem is the sequence-to-sequence (seq2seq) approach, which treats code as a flat sequence and does not leverage any structural information. We introduce a new approach to AnyGen, structural language modeling (SLM), which leverages the strict syntax of programming languages to model a code snippet as a tree. SLM estimates the probability of the program's abstract syntax tree (AST) by decomposing it into a product of conditional probabilities over its nodes. We present a neural model that computes these conditional probabilities by considering all AST paths leading to a target node. Unlike previous structural techniques, which severely restrict the kinds of expressions that can be generated, our approach can generate arbitrary expressions in any programming language. Our model significantly outperforms both seq2seq and a variety of existing structured approaches in generating Java and C# code. We make our code, datasets, and models available online.
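
The decomposition described in the abstract can be illustrated with a toy sketch. This is not the authors' model. The depth-first node ordering, the `Node` class, the `VOCAB` list, and the `uniform_cond_prob` function are all assumptions made for illustration, and the context here is simply the list of previously visited node labels rather than the AST paths the paper's neural model considers. The sketch only shows how the probability of a tree can be written as a product of per-node conditional probabilities (summed in log space).

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """A minimal AST node (hypothetical, for illustration only)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def preorder(root: Node) -> List[Node]:
    """Return the nodes in depth-first, left-to-right order (an assumed ordering)."""
    ordered: List[Node] = []
    stack = [root]
    while stack:
        node = stack.pop()
        ordered.append(node)
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return ordered


def tree_log_prob(root: Node,
                  cond_prob: Callable[[str, List[str]], float]) -> float:
    """Sum log P(node | previously visited nodes) over all nodes.

    Exponentiating the result gives the product of the conditionals.
    """
    nodes = preorder(root)
    total = 0.0
    for t, node in enumerate(nodes):
        context = [n.label for n in nodes[:t]]
        total += math.log(cond_prob(node.label, context))
    return total


# Toy vocabulary and a stand-in conditional model that ignores its context.
# A real model would compute this probability from the partial tree.
VOCAB = ["Greater", "Name:x", "Int:1"]


def uniform_cond_prob(label: str, context: List[str]) -> float:
    return 1.0 / len(VOCAB)


# AST for the expression `x > 1`.
tree = Node("Greater", [Node("Name:x"), Node("Int:1")])
log_p = tree_log_prob(tree, uniform_cond_prob)
print(math.exp(log_p))
```

With the uniform stand-in, every node contributes a factor of one third, so the three-node tree gets probability (1/3)^3. Replacing `uniform_cond_prob` with a learned function of the context is where a structural language model would do its work.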


