StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems

by Johannes de Fine Licht, et al.

Spatial computing devices have been shown to significantly accelerate stencil computations, but have so far relied on unrolling the iterative dimension of a single stencil operation to increase temporal locality. This work considers the general case of mapping directed acyclic graphs of heterogeneous stencil computations to spatial computing systems, assuming large input programs without an iterative component. StencilFlow maximizes temporal locality and ensures deadlock freedom in this setting, providing end-to-end analysis and mapping from a high-level program description to distributed hardware. We evaluate the generated architectures on an FPGA testbed, demonstrating the highest single-device and multi-device performance recorded for stencil programs on FPGAs to date, then leverage the framework to study a complex stencil program from a production weather simulation application. Our work enables productively targeting distributed spatial computing systems with large stencil programs, and offers insight into architecture characteristics required for their efficient execution in practice.






