Efficient Multi-Cycle Folded Integer Multipliers
Fast combinational multipliers with large bit widths can occupy significant silicon area. If the application allows a multiplication to take two or more clock cycles, the area can be reduced through resource sharing (i.e., folding). This work introduces multiple architectures and parameterized Verilog circuit generators for Multi-Cycle folded Integer Multiplier (MCIM) designs, which are based on the Schoolbook and Karatsuba approaches. When an application is implemented in hardware, the average number of multiplications performed per cycle may be fractional, such as 3.5. In that case, we can use 3 single-cycle multipliers plus one smaller multiplier with a throughput (TP) of 0.5. Our MCIM designs can be customized in terms of TP, latency, and clock frequency. MCIM designs target a TP of 1/n, where n is an integer and n ≥ 2. All proposed designs were synthesized and verified for various bit widths using scripts. ASIC synthesis results show that MCIM designs with a TP of 1/2 offer area savings of 21% for bit widths of 8 to 128, with respect to synthesizing the * operator. Additionally, MCIM designs can offer up to 33% reduction.
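The folding idea for a TP of 1/n can be illustrated with a small behavioral model. The sketch below is not the authors' Verilog generator; it is a Python model of one plausible Schoolbook-style folding, assuming that one operand is split into n equal slices and that a single shared (width × width/n) multiplier is reused once per cycle. The function name `folded_multiply` and its parameters are hypothetical and chosen only for illustration.

```python
from typing import Tuple


def folded_multiply(a: int, b: int, width: int, n: int = 2) -> Tuple[int, int]:
    """Model an n-cycle folded unsigned multiplier (TP = 1/n).

    Assumption: operand b is split into n slices of width/n bits, and one
    shared (width x width/n) multiplier processes a single slice per cycle.
    Returns the product and the number of cycles used.
    """
    if n < 2:
        raise ValueError("n must be an integer >= 2")
    if width % n != 0:
        raise ValueError("width must be divisible by n in this sketch")
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("operands must fit in 'width' unsigned bits")

    chunk = width // n                # bits of b consumed per cycle
    mask = (1 << chunk) - 1
    acc = 0                           # accumulator register
    for cycle in range(n):
        b_slice = (b >> (cycle * chunk)) & mask
        partial = a * b_slice         # the one shared, smaller multiplier
        acc += partial << (cycle * chunk)
    return acc, n
```

For example, `folded_multiply(200, 100, width=8, n=2)` returns `(20000, 2)`: the low 4-bit slice of 100 contributes 200 × 4 = 800 in the first cycle, and the high slice contributes (200 × 6) << 4 = 19200 in the second. Under this assumption the shared multiplier is roughly 1/n the size of the full one, at the cost of an accumulator and control logic, which is the trade-off the abstract describes.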