Optimization Techniques to Improve Inference Performance of a Forward Propagating Neural Network on an FPGA
This paper describes an optimized implementation of a previously trained forward-propagating classification neural network (NN). The implementation highlights a novel approach in which Python scripts generate the Verilog hardware implementation. Its optimizations include scaling the input data, replacing multiplication functions with selected addends, using hardware-friendly activation functions, and simplifying output selection. The inference performance of a handwriting-recognition NN operating on 28x28-pixel images is compared between a software implementation on an Intel i7 and a hardware implementation on a Xilinx FPGA.
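The four optimizations listed above can be illustrated with a small software model. The following Python sketch is one possible reading of the abstract, not the paper's actual scripts or generated Verilog: the function names, the 4-bit input scaling, the power-of-two decomposition of integer weights, and the choice of ReLU as the activation are all assumptions made for illustration.

```python
def scale_pixel(pixel, shift=4):
    """Scale an 8-bit pixel down to fewer bits (assumed: drop the 4 low bits)."""
    return pixel >> shift


def shift_terms(weight):
    """Split an integer weight into a sign and the bit positions that are set,
    so that weight == sign * sum(1 << s for s in shifts)."""
    sign = -1 if weight < 0 else 1
    magnitude = abs(weight)
    shifts = [bit for bit in range(magnitude.bit_length()) if (magnitude >> bit) & 1]
    return sign, shifts


def multiply_by_addends(x, weight):
    """Compute x * weight using only shifts and additions (no multiplier)."""
    sign, shifts = shift_terms(weight)
    total = sum(x << s for s in shifts)
    return -total if sign < 0 else total


def relu(value):
    """Hardware-friendly activation (assumed): a comparison and a mux."""
    return value if value > 0 else 0


def neuron(inputs, weights, bias):
    """Accumulate shift-add products, then apply the activation."""
    acc = bias
    for x, w in zip(inputs, weights):
        acc += multiply_by_addends(x, w)
    return relu(acc)


def select_output(scores):
    """Pick the index of the largest score with a plain comparison chain,
    skipping any softmax; ties resolve to the lowest index."""
    best = 0
    for i in range(1, len(scores)):
        if scores[i] > scores[best]:
            best = i
    return best
```

Because the weights of a trained network are constants, the shift amounts returned by `shift_terms` are known before synthesis, which is what would allow a generator script to emit a fixed set of shifted addends for each weight in place of a general multiplier.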