Implementation of a framework for deploying AI inference engines in FPGAs

05/30/2023
by Ryan Herbst, et al.

The LCLS-II Free Electron Laser (FEL) will generate x-ray pulses to beamline experiments at up to 1 MHz. These experiments will require new ultra-high rate (UHR) detectors that can operate at rates above 100 kHz and generate data throughputs upwards of 1 TB/s, a data velocity which requires prohibitively large investments in storage infrastructure. Machine Learning (ML) has demonstrated the potential to digest large datasets to extract relevant insights; however, current implementations show latencies that are too high for real-time data reduction objectives. SLAC has undertaken the creation of a software framework that translates ML structures for deployment on Field Programmable Gate Arrays (FPGAs) located at the edge of the data chain, close to the instrumentation. This framework leverages Xilinx's HLS framework, presenting an API modeled after the open-source Keras interface to the TensorFlow library. This SLAC Neural Network Library (SNL) framework is designed with a streaming data approach, optimizing the data flow between layers while minimizing the data buffering requirements. The goal is to ensure the highest possible frame rate while keeping the maximum latency constrained to the needs of the experiment. Our framework is designed so that the RTL implementation of the network layers supports full redeployment of weights and biases without requiring resynthesis after training. The ability to reduce the precision of the implemented networks through quantization is necessary to optimize the use of both DSP and memory resources in the FPGA. We currently have a preliminary version of the toolset and are experimenting with both general-purpose example networks and networks being designed for specific LCLS-II experiments.
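
The precision trade-off behind quantization can be illustrated with a small sketch. This is not SNL code: the function name `quantize_fixed_point` and the 8-bit width with 6 fractional bits are hypothetical choices made only for illustration. It rounds each trained weight to a signed fixed-point grid and saturates values that fall outside the representable range, which is the kind of reduced-precision representation that lets weights fit in fewer DSP and memory resources.

```python
def quantize_fixed_point(value, total_bits=8, frac_bits=6):
    """Round a float to a signed fixed-point grid, saturating on overflow.

    total_bits and frac_bits are illustrative assumptions, not values
    taken from the SNL framework.
    """
    scale = 1 << frac_bits                   # number of steps per unit
    q_min = -(1 << (total_bits - 1))         # most negative integer code
    q_max = (1 << (total_bits - 1)) - 1      # most positive integer code
    code = round(value * scale)              # nearest integer code
    code = max(q_min, min(q_max, code))      # saturate instead of wrapping
    return code / scale                      # value the hardware would see


# Hypothetical trained weights, chosen to show rounding and saturation.
weights = [0.7312, -0.0049, 1.25, -2.5]
quantized = [quantize_fixed_point(w) for w in weights]

for w, q in zip(weights, quantized):
    print(f"{w:+.4f} -> {q:+.6f} (error {abs(w - q):.6f})")
```

With these assumed widths the representable range is -2.0 to about 1.98 in steps of 1/64, so small weights collapse to zero and large ones saturate; choosing the widths per layer is what balances accuracy against FPGA resource use.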

research
03/29/2022

100 Gb/s High Throughput Serial Protocol (HTSP) for Data Acquisition Systems with Interleaved Streaming

Demands on Field-Programmable Gate Array (FPGA) data transport have been...
research
07/01/2022

Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml

Recurrent neural networks have been shown to be effective architectures ...
research
07/11/2018

FINN-L: Library Extensions and Design Trade-off Analysis for Variable Precision LSTM Networks on FPGAs

It is well known that many types of artificial neural networks, includin...
research
07/14/2018

LeFlow: Enabling Flexible FPGA High-Level Synthesis of Tensorflow Deep Neural Networks

Recent work has shown that Field-Programmable Gate Arrays (FPGAs) play a...
research
02/13/2023

OpenHLS: High-Level Synthesis for Low-Latency Deep Neural Networks for Experimental Science

In many experiment-driven scientific domains, such as high-energy physic...
research
03/11/2020

Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML

We present the implementation of binary and ternary neural networks in t...
