BenchDirect: A Directed Language Model for Compiler Benchmarks

03/02/2023
by Foivos Tsimpourlas, et al.

The exponential increase in hardware-software complexity has made it impossible for compiler engineers to find the right optimization heuristics manually. Predictive models have been shown to find near-optimal heuristics with little human effort, but they are limited by a severe lack of diverse benchmarks to train on. Researchers have used generative AI to synthesize benchmarks and add them to existing datasets. However, the synthetic programs are short, exceedingly simple, and lack diversity in their features. We develop BenchPress, the first ML compiler benchmark generator that can be directed within source code feature representations. BenchPress synthesizes executable functions by infilling code conditioned on the program's left and right context. BenchPress uses active learning to introduce new benchmarks with unseen features into the dataset of Grewe et al.'s CPU vs GPU heuristic, improving its acquired performance by 50%. BenchPress targets features that have been impossible for other synthesizers to reach. In 3 feature spaces, we outperform human-written code from GitHub, CLgen, CLSmith and the SRCIROR mutator in targeting the features of Rodinia benchmarks. BenchPress steers generation with beam search over a feature-agnostic language model. We improve on this with BenchDirect, which uses a directed LM that infills programs by jointly observing the source code context and the targeted compiler features. BenchDirect achieves up to 36% better accuracy in targeting the features of Rodinia benchmarks, is 1.8x more likely to give an exact match, and speeds up execution time by up to 72% compared to BenchPress. Both our models produce code that is difficult to distinguish from human-written code. We conduct a Turing test which shows that our models' synthetic benchmarks are labelled as 'human-written' as often as human-written code from GitHub.
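To make the idea of steering generation with beam search toward target features more concrete, here is a minimal sketch. It is not the authors' implementation. It assumes that each candidate is represented only by an integer feature vector, that closeness to the target is measured by Euclidean distance, and that a hypothetical `toy_propose` function stands in for the language model that would sample a new program and extract its features.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two feature vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def toy_propose(features, rng):
    """Hypothetical stand-in for 'sample a program, extract its features'.

    It nudges one randomly chosen feature by +/-1 and never goes below zero.
    """
    i = rng.randrange(len(features))
    out = list(features)
    out[i] = max(0, out[i] + rng.choice((-1, 1)))
    return tuple(out)


def beam_search(seed, target, propose, width=4, steps=20, samples=8, rng=None):
    """Keep the `width` candidates closest to `target` at every step."""
    rng = rng or random.Random(0)
    beam = [tuple(seed)]
    for _ in range(steps):
        # The pool keeps the current beam, so the best distance never worsens.
        pool = set(beam)
        for cand in beam:
            for _ in range(samples):
                pool.add(propose(cand, rng))
        beam = sorted(pool, key=lambda f: (distance(f, target), f))[:width]
        if distance(beam[0], target) == 0:
            break  # exact feature match found
    return beam[0]


if __name__ == "__main__":
    seed, target = (0, 0, 0), (5, 2, 7)
    best = beam_search(seed, target, toy_propose)
    print("best features:", best, "distance:", distance(best, target))
```

In this sketch the proposal step knows nothing about the target, and only the selection step does. A directed model would instead condition the proposal itself on the target features.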


