Generic and Universal Parallel Matrix Summation with a Flexible Compression Goal for Xilinx FPGAs

06/21/2018

∙

Bit matrix compression is a highly relevant operation in computer arithmetic. Essentially being a multi-operand addition, it is the key operation behind fast multiplication and many higher-level operations such as multiply-accumulate, the computation of the dot product or the implementation of FIR filters. Compressor implementations have been constantly evolving for greater efficiency both in general and in the context of concrete applications or specific implementation technologies. This paper is building on this history and describes a generic implementation of a bit matrix compressor for Xilinx FPGAs, which does not require a generator tool. It contributes FPGA-oriented metrics for the evaluation of elementary parallel bit counters, a systematic analysis and partial decomposition of previously proposed counters and a fully implemented construction heuristic with a flexible compression target matching the device capabilities. The generic implementation is agnostic of the aspect ratio of the input matrix and can be used for multiplication the same way as it can be for single-column population count operations.

READ FULL TEXT

Generic and Universal Parallel Matrix Summation with a Flexible Compression Goal for Xilinx FPGAs

Sign in with Google

Consider DeepAI Pro