Large Language Models for Code: Security Hardening and Adversarial Testing

02/10/2023
by Jingxuan He, et al.

Large language models (LMs) are increasingly pretrained on massive codebases and used to generate code. However, LMs lack awareness of security and are found to frequently produce unsafe code. This work studies the security of LMs along two important axes: (i) security hardening, which aims to enhance LMs' reliability in generating secure code, and (ii) adversarial testing, which seeks to evaluate LMs' security from an adversarial standpoint. We address both of these by formulating a new security task called controlled code generation. The task is parametric and takes as input a binary property to guide the LM to generate secure or unsafe code, while preserving the LM's capability of generating functionally correct code. We propose a novel learning-based approach called SVEN to solve this task. SVEN leverages property-specific continuous vectors to guide program generation towards the given property, without modifying the LM's weights. Our training procedure optimizes these continuous vectors by enforcing specialized loss terms on different regions of code, using a high-quality dataset carefully curated by us. Our extensive evaluation shows that SVEN is highly effective in achieving strong security control. For instance, a state-of-the-art CodeGen LM with 2.7B parameters generates secure code for 59.1% of the time. When we employ SVEN to perform security hardening (or adversarial testing) on this LM, the ratio is significantly boosted to 92.3% (or reduced to 36.8%). Importantly, SVEN closely matches the original LMs in functional correctness.
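The abstract compresses SVEN's mechanism into a few clauses, so a brief sketch may help make it concrete. The code below illustrates the general prefix-style approach the abstract describes: small trainable continuous vectors, one set per security property, injected as extra attention key/value states while the LM's own weights stay frozen, trained with a loss restricted to security-relevant code regions. This is a minimal hypothetical sketch, not the paper's implementation; the PropertyPrefix class, the prefix_len parameter, the GPT-2 backbone (the paper evaluates CodeGen models), and the region_mask input are all illustrative assumptions.

```python
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel

class PropertyPrefix(nn.Module):
    """Trainable continuous vectors for one property ("secure" or "vulnerable").

    Hypothetical sketch: per layer, a trainable (key, value) pair of shape
    (n_heads, prefix_len, head_dim) is prepended to the attention cache,
    steering generation without touching the LM's weights.
    """

    def __init__(self, n_layers: int, n_heads: int, head_dim: int, prefix_len: int = 8):
        super().__init__()
        self.prefixes = nn.ParameterList(
            nn.Parameter(0.02 * torch.randn(2, n_heads, prefix_len, head_dim))
            for _ in range(n_layers)
        )
        self.prefix_len = prefix_len

    def as_past_key_values(self, batch_size: int):
        # Tuple-of-(key, value) layout used by GPT-2-style models in transformers.
        return tuple(
            (p[0].unsqueeze(0).expand(batch_size, -1, -1, -1),
             p[1].unsqueeze(0).expand(batch_size, -1, -1, -1))
            for p in self.prefixes
        )

lm = GPT2LMHeadModel.from_pretrained("gpt2")
for param in lm.parameters():
    param.requires_grad_(False)  # the base LM stays frozen; only prefixes train

cfg = lm.config
secure_prefix = PropertyPrefix(cfg.n_layer, cfg.n_head, cfg.n_embd // cfg.n_head)

def loss_for_property(prefix: PropertyPrefix, input_ids: torch.Tensor,
                      region_mask: torch.Tensor) -> torch.Tensor:
    """Masked LM loss: only tokens inside the security-relevant region
    (region_mask == 1) contribute, loosely mirroring the paper's idea of
    applying specialized loss terms to different regions of code."""
    batch = input_ids.size(0)
    past = prefix.as_past_key_values(batch)
    # The attention mask must cover the prefix positions plus the real tokens.
    attn = torch.ones(batch, prefix.prefix_len + input_ids.size(1))
    out = lm(input_ids=input_ids, past_key_values=past, attention_mask=attn)
    logits = out.logits[:, :-1]          # predict token t+1 from position t
    targets = input_ids[:, 1:]
    token_loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1), reduction="none"
    ).view_as(targets)
    mask = region_mask[:, 1:].float()
    return (token_loss * mask).sum() / mask.sum().clamp(min=1)
```

In this sketch, training a "secure" prefix on secure examples (and a "vulnerable" prefix on unsafe ones), then prepending the chosen prefix at generation time, would correspond to security hardening and adversarial testing, respectively. The paper's actual training additionally constrains the prefixes so that functional correctness is preserved, which this sketch omits.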

Related research

SelfEvolve: A Code Evolution Framework via Large Language Models (06/05/2023)
Large language models (LLMs) have already revolutionized code generation...

Think Outside the Code: Brainstorming Boosts Large Language Models in Code Generation (05/18/2023)
Code generation aims to automatically generate source code from high-lev...

VeriGen: A Large Language Model for Verilog Code Generation (07/28/2023)
In this study, we explore the capability of Large Language Models (LLMs)...

VerilogEval: Evaluating Large Language Models for Verilog Code Generation (09/14/2023)
The increasing popularity of large language models (LLMs) has paved the ...

LLM-assisted Generation of Hardware Assertions (06/24/2023)
The security of computer systems typically relies on a hardware root of ...

On Secure and Usable Program Obfuscation: A Survey (10/03/2017)
Program obfuscation is a widely employed approach for software intellect...

Curb Your Self-Modifying Code (02/28/2022)
Self-modifying code has many intriguing applications in a broad range of...
