Towards a Multi-array Architecture for Accelerating Large-scale Matrix Multiplication on FPGAs

03/10/2018
by Junzhong Shen, et al.

Large-scale floating-point matrix multiplication is a fundamental kernel in many scientific and engineering applications. Most existing work focuses only on accelerating matrix multiplication on FPGAs by adopting a linear systolic array. This paper works towards extending that architecture by proposing a scalable and highly configurable multi-array architecture. In addition, we propose a work-stealing scheme to balance the workload partitioned among the multiple linear arrays. Furthermore, an analytical model is developed to determine the optimal design parameters. Experiments on a real-life convolutional neural network (CNN) show that we can obtain the optimal extension of the linear array architecture.
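
To illustrate the general idea of work stealing among several compute arrays, the following is a minimal software sketch in Python. It is not the paper's hardware scheme: the function name `simulate`, the round-robin tile assignment, the per-array integer `speeds`, and the "steal from the longest queue" policy are all assumptions made for illustration only.

```python
from collections import deque


def simulate(num_tiles, speeds):
    """Toy model of work stealing among len(speeds) arrays.

    num_tiles: number of independent result tiles (hypothetical unit of work).
    speeds[i]: tiles that array i can finish per time step (assumed integer).
    Returns (time_steps, tiles_done_per_array).
    """
    k = len(speeds)
    # Static initial partition: deal tiles round-robin to each array's queue.
    queues = [deque() for _ in range(k)]
    for tile in range(num_tiles):
        queues[tile % k].append(tile)

    done = [0] * k
    steps = 0
    while any(queues):
        steps += 1
        for i in range(k):
            for _ in range(speeds[i]):
                if queues[i]:
                    # Normal case: take the next tile from the array's own queue.
                    queues[i].popleft()
                else:
                    # Own queue is empty: steal from the back of the longest queue.
                    victim = max(range(k), key=lambda j: len(queues[j]))
                    if not queues[victim]:
                        break  # nothing left anywhere
                    queues[victim].pop()
                done[i] += 1
    return steps, done


if __name__ == "__main__":
    print(simulate(12, [3, 1]))
```

In this toy model, 12 tiles split evenly between a fast array (3 tiles per step) and a slow one (1 tile per step) finish in 3 steps with stealing, whereas the slow array alone would need 6 steps to drain its static share of 6 tiles.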
