On Adversarial Robustness of Synthetic Code Generation

06/22/2021
by   Mrinal Anand, et al.
0

Automatic code synthesis from natural language descriptions is a challenging task. We witness massive progress in developing code generation systems for domain-specific languages (DSLs) employing sequence-to-sequence deep learning techniques in the recent past. In this paper, we specifically experiment with AlgoLisp DSL-based generative models and showcase the existence of significant dataset bias through different classes of adversarial examples. We also experiment with two variants of Transformer-based models that outperform all existing AlgoLisp DSL-based code generation baselines. Consistent with the current state-of-the-art systems, our proposed models, too, achieve poor performance under adversarial settings. Therefore, we propose several dataset augmentation techniques to reduce bias and showcase their efficacy using robust experimentation.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/02/2023

Automated Code generation for Information Technology Tasks in YAML through Large Language Models

The recent improvement in code generation capabilities due to the use of...
research
03/22/2023

JaCoText: A Pretrained Model for Java Code-Text Generation

Pretrained transformer-based models have shown high performance in natur...
research
08/22/2022

Incorporating Domain Knowledge through Task Augmentation for Front-End JavaScript Code Generation

Code generation aims to generate a code snippet automatically from natur...
research
06/10/2023

A Comprehensive Review of State-of-The-Art Methods for Java Code Generation from Natural Language Text

Java Code Generation consists in generating automatically Java code from...
research
07/14/2020

Calling Out Bluff: Attacking the Robustness of Automatic Scoring Systems with Simple Adversarial Testing

A significant progress has been made in deep-learning based Automatic Es...
research
10/05/2019

JuICe: A Large Scale Distantly Supervised Dataset for Open Domain Context-based Code Generation

Interactive programming with interleaved code snippet cells and natural ...
research
02/21/2019

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

Lingvo is a Tensorflow framework offering a complete solution for collab...

Please sign up or login with your details

Forgot password? Click here to reset