Learning Performance-Improving Code Edits

02/15/2023
by Aman Madaan, et al.

The waning of Moore's Law has shifted the focus of the tech industry towards alternative methods for continued performance gains. While optimizing compilers are a standard tool for increasing program efficiency, programmers continue to shoulder much of the responsibility for crafting and refactoring code with better performance characteristics. In this paper, we investigate the ability of large language models (LLMs) to suggest functionally correct, performance-improving code edits. We hypothesize that language models can suggest such edits in ways that would be impractical for static analysis alone. We investigate these questions by curating a large-scale dataset of Performance-Improving Edits, PIE. PIE contains trajectories of programs, where a programmer begins with an initial, slower version and iteratively makes changes to improve the program's performance. We use PIE to evaluate and improve the capacity of large language models. Specifically, we use examples from PIE to fine-tune multiple variants of CODEGEN, a billion-scale Transformer-decoder model. Additionally, we use examples from PIE to prompt OpenAI's CODEX via few-shot prompting. By leveraging PIE, we find that both CODEX and CODEGEN can generate performance-improving edits, with speedups of more than 2.5x for over 25% of the programs, for both C++ and Python, even after the C++ programs were compiled with the O3 optimization level. Crucially, we show that PIE allows CODEGEN, an open-source model 10x smaller than CODEX, to match the performance of CODEX on this challenging task. Overall, this work opens new doors for creating systems and methods that can help programmers write efficient code.
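
To make the few-shot setup concrete, the sketch below shows one way PIE-style (slow, fast) program pairs could be assembled into a prompt for a code LLM. The example pairs, the prompt format, and the build_prompt helper are illustrative assumptions, not the authors' exact prompt; the paper's actual method prompts CODEX and fine-tunes CODEGEN on curated PIE examples.

    # Minimal sketch of few-shot prompting for performance-improving edits.
    # The pair contents and prompt layout are assumptions for illustration.

    # Hypothetical PIE-style examples: (slower program, faster rewrite).
    FEW_SHOT_PAIRS = [
        (
            "total = 0\nfor i in range(len(xs)):\n    total += xs[i]",
            "total = sum(xs)",
        ),
        (
            "out = ''\nfor s in parts:\n    out += s",
            "out = ''.join(parts)",
        ),
    ]

    def build_prompt(slow_program: str) -> str:
        """Concatenate (slow, fast) demonstrations, then append the target
        slow program so the model completes the optimized version."""
        blocks = []
        for slow, fast in FEW_SHOT_PAIRS:
            blocks.append(
                f"# slower version:\n{slow}\n\n# optimized version:\n{fast}"
            )
        blocks.append(
            f"# slower version:\n{slow_program}\n\n# optimized version:\n"
        )
        return "\n\n".join(blocks)

    if __name__ == "__main__":
        target = "squares = []\nfor x in xs:\n    squares.append(x * x)"
        print(build_prompt(target))
        # The prompt would then be sent to a code LLM (the paper uses CODEX
        # via the OpenAI API, and fine-tuned CODEGEN). The completion is only
        # a candidate edit: it must still be checked for functional
        # correctness and benchmarked to confirm an actual speedup.

Note that the correctness and timing checks in the final comment are essential: the paper counts an edit as a success only if it preserves program behavior and measurably improves runtime.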


