Combinatorial Register Allocation and Instruction Scheduling

This paper introduces a combinatorial optimization approach to register allocation and instruction scheduling, two central compiler problems. Combinatorial optimization has the potential to solve these problems optimally and to exploit processor-specific features readily. Our approach is the first to leverage this potential in practice: it captures the complete set of program transformations used in state-of-the-art compilers, scales to medium-sized functions of up to 1000 instructions, and generates executable code. This level of practicality is reached by using constraint programming, a particularly suitable combinatorial optimization technique. Unison, the implementation of our approach, is open source, used in industry, and integrated with the LLVM toolchain. An extensive evaluation of estimated speed, code size, and scalability confirms that Unison generates better code than LLVM while scaling to medium-sized functions. The evaluation uses systematically selected benchmarks from MediaBench and SPEC CPU2006 and different processor architectures (Hexagon, ARM, MIPS). Mean estimated speedup ranges from 1% to [...], and code size reduction ranges from 0.8% to [...]. Executing the generated code on Hexagon confirms that the estimated speedup indeed results in actual speedup. Given a fixed time limit, Unison optimally solves functions of up to 647 instructions, delivers improved solutions for functions of up to 874 instructions, and achieves more than 85% of the potential speed for 90% [...]. The results in this paper show that our combinatorial approach can be used in practice to trade compilation time for code quality beyond the usual compiler optimization levels, fully exploit processor-specific features, and identify improvement opportunities in existing heuristic algorithms.




