A Cost-Efficient Look-Up Table Based Binary Coded Decimal Adder Design

03/18/2022
by   Zarrin Tasnim Sworna, et al.
University of Dhaka

BCD (Binary Coded Decimal), being a more accurate and human-readable representation with ease of conversion, is prevalent in computing and electronic communication. In this paper, a tree-structured parallel BCD addition algorithm is proposed with reduced time complexity. A BCD adder is more effective with a LUT (Look-Up Table)-based design, due to the numerous benefits and applications of FPGA (Field Programmable Gate Array) technology. The construction of a size-minimal and depth-minimal LUT-based BCD adder circuit is the main contribution of this paper.


I Introduction

BCD (Binary Coded Decimal) representation is advantageous due to its finite place value representation, rounding, easy scaling by a factor of 10, simple alignment and conversion to character form [1] [2]. It is widely used in embedded applications, digital communication and financial calculations [3] [4]. Hence, a faster and more efficient BCD addition method is desired. In this paper, a multi-digit addition method is proposed which omits the complex manipulation steps, reducing the area and delay of the circuit. The application of FPGAs in cryptography, NP (Non-deterministic Polynomial-time)-hard optimization problems, pattern matching, bioinformatics, floating point arithmetic and molecular dynamics is increasing rapidly [5] [6] [7]. Because FPGAs are reconfigurable, an FPGA implementation of BCD addition is of particular interest. Since the LUT is one of the main components of an FPGA, a LUT-based adder circuit is proposed.

This paper makes two main contributions. Firstly, a new tree-based parallel BCD addition algorithm is presented. Secondly, a compact and high-speed BCD adder circuit is proposed, with an improved time complexity expressed in terms of the number of digits and the number of bits in a digit.

The paper is organized as follows. In the next section, the earlier approaches and their limitations are described. In Section III, a novel BCD addition method is proposed, followed by the construction of the BCD adder circuit. In Section IV, the simulation results and performance analysis of the proposed circuit are presented. Finally, the paper is concluded in Section V.

II Literature Overview

In this section, various types of the latest existing LUT-based BCD adders are presented.

II-A Existing LUT-Based BCD Adders

A BCD adder takes BCD numbers as input and produces BCD numbers as output [8]. Since a 4-bit binary code has 16 different combinations but only ten of them are valid BCD digits, the addition of two BCD digits may produce an incorrect result that exceeds the largest BCD digit [8] [9] [10]. In such cases, the result must be corrected by adding six (0110) to guarantee that the result is a valid BCD digit. The decimal carry output generated by the correction process is added to the next higher digit of the BCD addends.
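The conventional correction described above can be illustrated with a minimal Python sketch. The function name `bcd_digit_add_plus6` is hypothetical and the code models only the arithmetic of the textbook add-six correction, not any of the cited circuits.

```python
def bcd_digit_add_plus6(a, b, carry_in=0):
    """Add two BCD digits (0-9) using the conventional add-six correction.

    Returns (carry_out, digit). Illustrative model only, not a circuit.
    """
    if not (0 <= a <= 9 and 0 <= b <= 9 and carry_in in (0, 1)):
        raise ValueError("inputs must be valid BCD digits and a 1-bit carry")
    binary_sum = a + b + carry_in        # plain binary addition, range 0..19
    if binary_sum > 9:                   # result is not a valid BCD digit
        binary_sum += 6                  # skip the six unused 4-bit codes
    carry_out = (binary_sum >> 4) & 1    # decimal carry to the next digit
    digit = binary_sum & 0xF             # corrected 4-bit BCD digit
    return carry_out, digit


if __name__ == "__main__":
    print(bcd_digit_add_plus6(7, 5))     # 7 + 5 = 12
```

The correction and the carry both depend on the full 4-bit sum, which is why straightforward implementations of this scheme end up with a serial carry path.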

The authors of [9] proposed a direct implementation of a BCD adder circuit, with two different architectures for constructing a LUT-based BCD adder. In the first architecture, a truth table was formed for each input/output combination and the corresponding circuit was derived; it consumed eleven 6-input LUTs. The second architecture used two levels of abstraction: the two least significant input bits were fed into the first level of circuits, whereas the remaining input bits, along with the output of the first level, were provided to the second level. The second approach required seven 6-input LUTs but incurred a considerable delay. The direct implementation therefore suffers from a significant LUT-delay product.

The BCD adder in [10] used the Virtex-6 platform to implement the circuit architecture proposed earlier in [11]. Gao et al. proposed a BCD adder in which the first bits of the addends are added using a full adder and the most significant three bits are added using 6-input LUTs [11]. A correction constant is added inside the 6-input LUTs whenever the sum of the most significant three bits reaches the correction threshold. Moreover, extra circuitry is required for the boundary case in which that three-bit sum is just below the threshold and the carry generated by the full adder is one. Because the circuit is serial apart from the LUT portions, it has a high time complexity and delay, which hinders fast output generation [11]. Bioul et al. proposed a BCD adder in which additions were performed in a carry-chain fashion and which therefore suffered from a significant delay [12]. They used the Virtex-4 and Virtex-5 platforms to show that the area overhead (in terms of the required number of LUTs) with respect to binary computation is not negligible: around five times in Virtex-4 and nearly four times in Virtex-5. The main reason for this difference is the more complex definition of the carry propagate and carry generate functions.

The authors of [13] proposed a BCD addition method in which six is added as a correction factor when the sum of the most significant three bits of the two input operands is greater than or equal to 8. If the final output is (111), it must be replaced with (100) as a final step to obtain the exact BCD output. Vazquez et al. presented various carry-chain BCD addition methods and their implementations on the LUT architecture [14]. Since carry-chain mechanisms are serial in architecture, the methods proposed in [14] incur a large delay, which is a major drawback.

A power- and area-efficient BCD adder circuit was proposed in [15]. The authors used the circuit architecture presented in [1] and estimated its power consumption. The delay, calculated on a Virtex-5 platform using 6-input LUTs, was 6.22 ns. The average power consumption of the circuit described in [15] was 25 mW, a significant improvement over the conventional LUT-based BCD adder. However, the method proposed in [15] required a total of 48 logic elements, which can be optimized further.

III Proposed Design of LUT-Based BCD Adder

In this section, firstly a BCD addition algorithm is proposed. Then a new LUT-based BCD adder is constructed. Essential figures and lemmas are presented to clarify the proposed ideas.

III-A Proposed Parallel BCD Addition Method

Carry propagation is the main cause of delay in a BCD adder circuit, and it gives the BCD adder a serial architecture. As delay reduction is one of the most important factors for the efficiency of the circuit, the carry propagation mechanism needs to be removed for faster BCD addition. In this paper, a highly parallel BCD addition method with a tree-structured representation is proposed, which significantly reduces the delay. The proposed BCD addition method consists of the following steps:

  • Bit-wise addition of the BCD addends produces the corresponding sum and carry bits in parallel. For the addition of the first (least significant) bit, the carry from the previous digit is added as well, and the produced sum is directly the first bit of the output.

  • If the most significant carry bit is zero, then, except for the first sum bit and the last carry bit, the other sum and carry bits are added in pairs in parallel; if the sum is greater than or equal to five, three is added to the result to obtain the correct BCD output.

  • If the most significant carry bit is one, then the final output values are set according to Equation 1 and Equation 2.

Suppose the two addends of a 1-digit BCD adder are each represented by four bits. The output of the adder is a 5-bit binary number, whose most significant bit represents the tens digit and whose remaining four bits represent the units digit of the BCD sum. The least significant bits of the two addends are added along with the carry from the previous digit addition. If it is the first digit addition, the carry is taken as zero. The produced sum bit is directly the first bit of the output. The other three pairs of bits are added simultaneously. The resulting sum and carry bits are added pairwise to provide the remaining output bits, which are corrected by the addition of three according to the following Equation 1 and Equation 2:

(1)
(2)

In Table I, the truth table takes the most significant three bits of the two addends as input and gives the final BCD output after the required correction. The bit-wise sum and carry bits are added pairwise as an intermediate step, with the carry $C_1$ from the least significant bit addition always taken as 1. Three (0011) is added to the intermediate output if it is greater than or equal to five. A similar table with $C_1$ taken as 0 is shown in Table II. The truth tables verify the functions of each output of the LUTs of the BCD adder. The algorithm of the multi-digit BCD addition method is presented in Algorithm 1.

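| Upper bits, addend 1 | Upper bits, addend 2 | Bit-wise sum | Bit-wise carry | $C_1$ | Intermediate sum | Add 3 | Output (carry, upper three sum bits) |
|---|---|---|---|---|---|---|---|
| 000 | 001 | 001 | 000 | 1 | 0010 | - | 0 0 1 0 |
| 000 | 010 | 010 | 000 | 1 | 0011 | - | 0 0 1 1 |
| 000 | 011 | 011 | 000 | 1 | 0100 | - | 0 1 0 0 |
| 000 | 100 | 100 | 000 | 1 | 0101 | Add 3 | 1 0 0 0 |
| 001 | 001 | 000 | 001 | 1 | 0011 | - | 0 0 1 1 |
| 001 | 010 | 011 | 000 | 1 | 0100 | - | 0 1 0 0 |
| 001 | 011 | 010 | 001 | 1 | 0101 | Add 3 | 1 0 0 0 |
| 001 | 100 | 101 | 000 | 1 | 0110 | Add 3 | 1 0 0 1 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 100 | 001 | 101 | 000 | 1 | 0110 | Add 3 | 1 0 0 1 |
| 100 | 010 | 110 | 000 | 1 | 0111 | Add 3 | 1 0 1 0 |
| 100 | 011 | 111 | 000 | 1 | 1000 | Add 3 | 1 0 1 1 |
| 100 | 100 | 000 | 100 | 1 | 1001 | Add 3 | 1 1 0 0 |

  • ‘-’ represents “No correction by adding 3 is required.”

TABLE I: The Truth Table of 1-Digit BCD Addition with $C_1 = 1$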
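| Upper bits, addend 1 | Upper bits, addend 2 | Bit-wise sum | Bit-wise carry | $C_1$ | Intermediate sum | Add 3 | Output (carry, upper three sum bits) |
|---|---|---|---|---|---|---|---|
| 000 | 001 | 001 | 000 | 0 | 0001 | - | 0 0 0 1 |
| 000 | 010 | 010 | 000 | 0 | 0010 | - | 0 0 1 0 |
| 000 | 011 | 011 | 000 | 0 | 0011 | - | 0 0 1 1 |
| 000 | 100 | 100 | 000 | 0 | 0100 | - | 0 1 0 0 |
| 001 | 001 | 000 | 001 | 0 | 0010 | - | 0 0 1 0 |
| 001 | 010 | 011 | 000 | 0 | 0011 | - | 0 0 1 1 |
| 001 | 011 | 010 | 001 | 0 | 0100 | - | 0 1 0 0 |
| 001 | 100 | 101 | 000 | 0 | 0101 | Add 3 | 1 0 0 0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 100 | 001 | 101 | 000 | 0 | 0101 | Add 3 | 1 0 0 0 |
| 100 | 010 | 110 | 000 | 0 | 0110 | Add 3 | 1 0 0 1 |
| 100 | 011 | 111 | 000 | 0 | 0111 | Add 3 | 1 0 1 0 |
| 100 | 100 | 000 | 100 | 0 | 1000 | Add 3 | 1 0 1 1 |

  • ‘-’ represents “No correction by adding 3 is required.”

TABLE II: The Truth Table of 1-Digit BCD Addition with $C_1 = 0$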
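The rows of the two truth tables can be regenerated with a short Python sketch, assuming the column meanings inferred from the rows themselves: bit-wise XOR for the sum bits, bit-wise AND for the carry bits, and an add-three correction when the intermediate value reaches five. The function names `upper_bits_row` and `format_row` are hypothetical.

```python
def upper_bits_row(a_hi, b_hi, c1):
    """Compute one truth-table row for the upper three bits of a BCD digit.

    a_hi, b_hi: upper three bits of each addend as integers (0..4 for valid BCD).
    c1: carry produced by the least significant bit addition (0 or 1).
    """
    sum_bits = a_hi ^ b_hi                      # bit-wise half-adder sums
    carry_bits = a_hi & b_hi                    # bit-wise half-adder carries
    # Each carry bit has twice the weight of the sum bit at the same position.
    intermediate = sum_bits + (carry_bits << 1) + c1
    needs_correction = intermediate >= 5
    output = intermediate + 3 if needs_correction else intermediate
    return sum_bits, carry_bits, intermediate, needs_correction, output


def format_row(a_hi, b_hi, c1):
    """Render a row in the same column order as the truth tables."""
    s, c, mid, fix, out = upper_bits_row(a_hi, b_hi, c1)
    return "{:03b} {:03b} {:03b} {:03b} {} {:04b} {} {:04b}".format(
        a_hi, b_hi, s, c, c1, mid, "Add 3" if fix else "-", out)


if __name__ == "__main__":
    for c1 in (1, 0):
        for a_hi in range(5):
            for b_hi in range(5):
                print(format_row(a_hi, b_hi, c1))
```

The most significant bit of the 4-bit output is the decimal carry and the remaining three bits are the upper sum bits of the digit.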
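Input: Two multi-digit BCD numbers, each digit represented by four bits, with the carry input of the first digit set to zero;
Output: The BCD sum digits and the final carry;
Start from the least significant digit;
repeat
    Add the least significant bits of the two digits and the carry input to obtain the first sum bit and the intermediate carry $C_1$;
    Add each of the remaining three bit pairs to obtain its sum bit and its carry bit, in parallel;
    if (the most significant carry bit is one) then
        Set the carry output and the remaining sum bits according to Equation 1 and Equation 2;
    else
        Add the remaining sum and carry bits pairwise;
        if (the result is greater than or equal to five) then
            Add three to the result;
        end if
    end if
    if (more digits remain) then
        Pass the carry output to the next digit as its carry input;
    else
        Output the carry as the final carry;
    end if
until all digits are processed;
Algorithm 1 Proposed Algorithm for a Multi-Digit Parallel BCD Addition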
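The steps of Algorithm 1 can be sketched in Python as follows. This is a behavioural model under stated assumptions, not the authors' implementation: digits are stored least significant first, the names `add_bcd_digit` and `add_bcd` are hypothetical, and the case of a most significant carry of one is folded into the same add-three rule, which yields the same outputs for valid BCD inputs.

```python
def add_bcd_digit(a, b, carry_in):
    """Add two BCD digits via the split: bit 0 separately, upper three bits together."""
    # Full adder on the least significant bits gives the first output bit and C1.
    s0 = (a ^ b ^ carry_in) & 1
    c1 = ((a & b) | (a & carry_in) | (b & carry_in)) & 1
    # Upper three bits of both addends plus C1 (range 0..9 for valid BCD).
    upper = (a >> 1) + (b >> 1) + c1
    if upper >= 5:          # the digit sum is 10 or more
        upper += 3          # add-three correction on the upper bits
    carry_out = upper >> 3
    digit = ((upper & 0b111) << 1) | s0
    return carry_out, digit


def add_bcd(a_digits, b_digits):
    """Add two equal-length BCD numbers given as digit lists, least significant first."""
    if len(a_digits) != len(b_digits):
        raise ValueError("operands must have the same number of digits")
    carry = 0               # the first digit has no incoming carry
    result = []
    for a, b in zip(a_digits, b_digits):
        carry, digit = add_bcd_digit(a, b, carry)
        result.append(digit)
    return result, carry


if __name__ == "__main__":
    # 958 + 367 = 1325, digits listed least significant first
    print(add_bcd([8, 5, 9], [7, 6, 3]))
```

Adding three to the upper three bits is equivalent to adding six to the whole digit, because the upper bits carry twice the weight of the least significant bit.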

Two examples of the BCD addition method using the proposed algorithm are demonstrated in Fig. 1 and Fig. 2. Each step of the examples is mapped to the corresponding algorithm step for clarity.

Fig. 1: Example Demonstration of the Proposed BCD Addition Algorithm.

Fig. 2: Example Demonstration of the Proposed BCD Addition Algorithm.

Because the proposed BCD addition method is parallel, it can be represented as a tree structure, which is shown in Fig. 3. The tree has two operational levels. Starting from the inputs, in level 1, the bit-wise addition is performed and the intermediate results are obtained. Then, in level 2, the addition and correction are performed, providing the correct BCD output. Hence, the time complexity of the proposed algorithm is logarithmic according to the operational depth of the tree. Lemma 3.1 is given to prove the time complexity of the proposed method. The time complexities of the existing and proposed BCD adders are listed in Table III.

Fig. 3: Tree Structure Representation of the Proposed BCD Addition Method.
Method Time Complexity
Existing [11]
Existing [12]
Existing [13]
Existing [14]
Proposed
  • ’:“number of bits in a digit” and ‘’: “number of digits”.

TABLE III: Comparison of the Time Complexities of the Proposed and Existing BCD Addition Methods

Lemma 3.1 The proposed BCD addition algorithm requires at least of time complexity, where is number of BCD digits and is the number of bits in a digit.
Proof The proposed BCD addition algorithm being parallel, can be represented as a tree structure where addends are the root node of the tree and child nodes are direct logic implementation circuits, addition with 3-correction logic circuits as well as the output selection circuits.

So, a directed graph can be constructed where,

and

.

It is obvious that, there exists exactly one pair of vertices of path length , which is the highest path length among any pair of vertices in the graph. So, the diameter of the graph is unique. Now, it is sufficient to prove that, the length of the diameter is where is the number of bits in a BCD digit.

Take any node and find the vertex which is furthest from it. Now, it will be shown that, the vertex found will be either or . Suppose, that the vertex found is (neither nor ). Two cases can be considered here

  1. suppose that is a node on path . Without loss of generality, let the path have no edges overlapping with the path. So, we find the distance of the paths as follows

    .

    But, from the shortest path algorithm (Dijkstra), we know that,

    .

    This contradicts the assumption that, is the unique diameter of the tree.

  2. let does not lie on the path from to . Now, either the path overlaps with the path or is disjoint. If there is overlap, consider the vertex which is the vertex closest to among the vertices which are the parts of the overlap. Without loss of generality, let the path have no edges overlapping with the path. So,

    From Dijkstra algorithm as we know,

    This once again contradicts the assumption of being the unique diameter of the tree.

If the paths do not overlap, there are vertices and on the and paths, respectively which are closest to each other. So,

But according to Dijkstra algorithm,

Hence, the assumption that is the diameter is contradicted. In each case, we have seen that there is a contradiction if is not one of or . Hence it follows that , the furthest vertex from , is either or . So, it is proved that the furthest vertex from is . Hence, while calculating the distance using DFS algorithm, we actually find the diameter of the tree in the second run of DFS. Since the diameter is unique, the cost of traversing from to is . For a -digit BCD adder, the time complexity becomes .
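The two-pass DFS argument used in the proof can be illustrated with a generic Python sketch. It finds the diameter of an arbitrary unweighted tree: the first DFS finds a vertex farthest from an arbitrary start, and the second DFS from that vertex yields the diameter. The function names and the example edge list are hypothetical and do not represent the adder's tree.

```python
from collections import defaultdict


def farthest(adj, start):
    """Iterative DFS over a tree; returns (vertex, distance) farthest from start."""
    best_node, best_dist = start, 0
    stack = [(start, None, 0)]
    while stack:
        node, parent, dist = stack.pop()
        if dist > best_dist:
            best_node, best_dist = node, dist
        for nxt in adj[node]:
            if nxt != parent:          # a tree has no cycles, so this suffices
                stack.append((nxt, node, dist + 1))
    return best_node, best_dist


def tree_diameter(edges):
    """Return the number of edges on the longest path of an undirected tree."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    if not adj:
        return 0
    start = next(iter(adj))
    end_a, _ = farthest(adj, start)        # first run: one end of a diameter
    _, diameter = farthest(adj, end_a)     # second run: length of the diameter
    return diameter


if __name__ == "__main__":
    print(tree_diameter([(0, 1), (1, 2), (1, 3), (3, 4)]))
```

Each DFS visits every edge a constant number of times, so the whole procedure is linear in the size of the tree.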

III-B Proposed Parallel BCD Adder Circuit Using LUT

A LUT-based BCD adder is designed by using the proposed BCD addition algorithm and LUT architecture. An algorithm for the construction of the proposed BCD adder circuit is presented in Algorithm 2, and the resulting circuit is depicted in Fig. 4. A full adder is used to add the least significant bits together with the carry from the previous digit addition. Three half adders are used for the individual bit-wise additions of the most significant three bits. Depending on the value of the most significant carry bit, Equation 1 and Equation 2 are realized in the proposed circuit architecture by using transistors and LUTs, where four 6-input LUTs add the outputs of the half adders and the full adder and apply the correction by adding 3 if the sum is greater than or equal to five. A switching circuit then selects the final output according to Equation 3. The proposed circuit achieves a large delay reduction compared to existing BCD adder circuits due to its parallel working mechanism.

By using the proposed 1-digit BCD adder circuit, a multi-digit BCD adder circuit can easily be created, where the carry output of one digit adder is sent to the next digit adder as its carry input. Therefore, the generalized multi-digit BCD adder computes sequentially by using the previous carry; its block diagram is shown in Fig. 5.

(3)
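Input: Two 1-digit BCD numbers;
Output: Sum and Carry;
Apply a full adder circuit where Input := the least significant bits of the addends and the carry input, and Output := the first sum bit and the intermediate carry $C_1$;
repeat
    Apply a half adder circuit where Input := the next pair of addend bits and Output := the corresponding sum and carry bits;
until all three most significant bit pairs are processed;
if (the most significant carry bit is one) then
    Set the carry output and the remaining sum bits according to Equation 1 and Equation 2;
else
    Apply four 6-input LUTs where each LUT’s Input := the sum and carry bits from the half adders and the full adder, and combined Output := the carry output and the three most significant sum bits;
end if
repeat
    Apply a switching circuit where Input := the candidate values of one sum bit and Output := the selected sum bit;
until the three most significant sum bits are selected;
Apply fourth switching circuit where Input := the candidate carry values and Output := the carry output;
Algorithm 2 Proposed Algorithm for the Construction of a 1-Digit BCD Adder Circuit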

Fig. 4: Proposed 1-Digit BCD Adder Circuit.

Fig. 5: Block Diagram of the Proposed Multi-Digit BCD Adder Circuit.
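A bit-level Python model may help in following the structure of the 1-digit circuit: one full adder, three half adders, four 6-input look-up tables and a final selection. It is only a sketch under assumptions. The LUT input ordering, the signal names and in particular the direct outputs used when the most significant carry is one are derived here from the arithmetic (both addends are then 8 or 9); they are not taken from Equations 1 to 3.

```python
def half_adder(x, y):
    return x ^ y, x & y                       # (sum, carry)


def full_adder(x, y, z):
    return x ^ y ^ z, (x & y) | (x & z) | (y & z)


def build_luts():
    """Four 64-entry tables indexed by the bits (s1, s2, s3, c2, c3, c1)."""
    luts = [[0] * 64 for _ in range(4)]
    for index in range(64):
        s1, s2, s3, c2, c3, c1 = [(index >> k) & 1 for k in range(6)]
        value = s1 + 2 * s2 + 4 * s3 + 2 * c2 + 4 * c3 + c1
        if value >= 5:
            value += 3                        # add-three correction
        for out_bit in range(4):              # unreachable entries are don't-cares
            luts[out_bit][index] = (value >> out_bit) & 1
    return luts


LUTS = build_luts()


def bcd_digit_adder(a, b, carry_in=0):
    """Model of the 1-digit adder; returns (carry_out, digit) for valid BCD inputs."""
    a_bits = [(a >> k) & 1 for k in range(4)]
    b_bits = [(b >> k) & 1 for k in range(4)]
    s0, c1 = full_adder(a_bits[0], b_bits[0], carry_in)
    s1, c2 = half_adder(a_bits[1], b_bits[1])
    s2, c3 = half_adder(a_bits[2], b_bits[2])
    s3, c4 = half_adder(a_bits[3], b_bits[3])
    index = s1 | (s2 << 1) | (s3 << 2) | (c2 << 3) | (c3 << 4) | (c1 << 5)
    lut_out = [LUTS[k][index] for k in range(4)]      # [S1, S2, S3, Cout]
    # Assumed direct path for c4 = 1: the sum is 16..19, so the output depends on c1 only.
    direct = [1 - c1, 1 - c1, c1, 1]
    chosen = direct if c4 else lut_out                # selection stage
    digit = s0 | (chosen[0] << 1) | (chosen[1] << 2) | (chosen[2] << 3)
    return chosen[3], digit


if __name__ == "__main__":
    print(bcd_digit_adder(9, 8, 1))
```

The carry input affects only the full adder and one LUT input, so the half adders of every digit can operate before any carry arrives.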

IV Simulation Results and Performance Analysis

As the BCD adder circuits being compared contain different types of logic gates and logic modules, it is better to preserve the basic modules as described in the architectures, as long as they correspond to commonly available cells in a typical standard cell library. The area and delay of the proposed BCD adder circuit are derived and expressed in terms of the area and critical path delay of the basic logic modules that can be found in a typical standard cell library for different operator sizes. These theoretical estimates are then calibrated with the basic logic modules from a CMOS 45 nm open cell library [16]. Table IV shows the area and critical path delay of the basic logic gates: inverter, 2-input AND, OR and EX-OR. Table V gives the area and critical path delay of logic modules such as the full adder, half adder and multiplexer, calculated from Table IV. Note that the area is calculated in terms of the number of transistors.

| Basic Logic Gates | Area (in transistors) | Critical Path Delay (ns) |
|---|---|---|
| Inverter (INV) | 1 | 1 |
| 2-input AND | 6 | 4.68 |
| 2-input OR | 6 | 4.5 |
| 2-input EX-OR | 8 | 4.72 |

TABLE IV: Area and Critical Path Delay of Basic Logic Gates
| Elements | Area (in transistors) | Critical Path Delay (ns) |
|---|---|---|
| 2-to-1 Multiplexer (MUX) | 20 | 10.18 |
| Half Adder (HA) | 14 | 4.72 |
| Full Adder (FA) | 34 | 13.9 |

TABLE V: Area and Critical Path Delay of Basic Logic Modules
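The half adder and full adder entries of Table V can be reproduced from the gate values of Table IV with a small Python sketch. The gate-level compositions are assumptions: a half adder built from one EX-OR and one AND gate, and a full adder built from two half adders and one OR gate, with the critical path running through EX-OR, AND and OR. The dictionary and function names are hypothetical.

```python
# (area in transistors, critical path delay in ns), taken from Table IV
GATES = {
    "INV": (1, 1.0),
    "AND2": (6, 4.68),
    "OR2": (6, 4.5),
    "XOR2": (8, 4.72),
}


def half_adder_cost():
    """Assumed structure: one EX-OR (sum) and one AND (carry) in parallel."""
    area = GATES["XOR2"][0] + GATES["AND2"][0]
    delay = max(GATES["XOR2"][1], GATES["AND2"][1])
    return area, delay


def full_adder_cost():
    """Assumed structure: two half adders plus an OR gate on the carry outputs."""
    area = 2 * GATES["XOR2"][0] + 2 * GATES["AND2"][0] + GATES["OR2"][0]
    # Longest path: first EX-OR, then the second half adder's AND, then the OR.
    delay = GATES["XOR2"][1] + GATES["AND2"][1] + GATES["OR2"][1]
    return area, round(delay, 2)


if __name__ == "__main__":
    print("HA:", half_adder_cost())
    print("FA:", full_adder_cost())
```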

The area complexity of the proposed BCD adder is derived from its basic logic modules. The proposed BCD adder requires three half adders, one full adder, four 6-input LUTs, six inverters and twenty-six transistors. Thus, the total area of the proposed BCD adder ($A_{P}$) can be determined as follows, where each term denotes the area of the corresponding module:

$$A_{P} = 3A_{HA} + A_{FA} + 4A_{LUT} + 6A_{INV} + 26A_{T} \tag{4}$$

Table VI shows the comparison among the proposed and existing BCD adders in terms of area. It is evident from Table VI that the proposed design requires 108 transistors and four 6-input LUTs, whereas the best known existing methods [10] [11] require 132 transistors and four 6-input LUTs. Thus, the proposed BCD adder gains an improvement of 18.18% in terms of area in the pre-layout simulation. Similarly, the critical path of the proposed BCD adder contains one full adder, one 6-input LUT, two inverters and two transistors. Therefore, the critical path delay of the proposed BCD adder ($D_{P}$) can be calculated as follows, where each term denotes the delay of the corresponding module:

$$D_{P} = D_{FA} + D_{LUT} + 2D_{INV} + 2D_{T} \tag{5}$$

Table VII shows the comparison among the proposed and existing BCD adders in terms of critical path delay. It is shown from Table VII that the proposed BCD adder requires 41.8 ns of delay whereas the best known existing methods [10] [11] require 69.56 ns of delay. Therefore the proposed BCD adder achieves an improvement of 39.9% in terms of critical path delay in pre-layout simulation result.
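The reported pre-layout figures can be checked with a few lines of Python. The module areas come from Tables IV and V, the module counts from the text, and the baseline values from Tables VI and VII; the function names are hypothetical, and the four LUTs are excluded because both designs use the same number of them.

```python
# Transistor counts per module (Tables IV and V); one transistor counts as 1.
MODULE_AREA = {"HA": 14, "FA": 34, "INV": 1, "T": 1}

# Module counts of the proposed 1-digit adder, excluding the four 6-input LUTs.
PROPOSED_COUNTS = {"HA": 3, "FA": 1, "INV": 6, "T": 26}


def transistor_area(counts):
    """Total area in transistors for a dictionary of module counts."""
    return sum(MODULE_AREA[name] * n for name, n in counts.items())


def improvement_percent(baseline, proposed):
    """Relative reduction of the proposed value with respect to the baseline."""
    return 100.0 * (baseline - proposed) / baseline


if __name__ == "__main__":
    area = transistor_area(PROPOSED_COUNTS)
    print("area (transistors):", area)
    print("area improvement (%):", round(improvement_percent(132, area), 2))
    print("delay improvement (%):", round(improvement_percent(69.56, 41.8), 2))
```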

| Method | Area Expression | Area* | LUT Count |
|---|---|---|---|
| Gao et al. [10] [11] | (1 + 3 + 3 + 2 + 2 + 4 ) | 132 | 4 |
| Bioul et al. [12] | (8 + 6 ) | 120 | 8 |
| Vazquez et al. [13] | (5 + 4 + 4 + 2 + 2 ) | 134 | 5 |
| Vazquez et al. [14] | (8 + 7 + 8 ) | 204 | 8 |
| Proposed | (3 HA + 1 FA + 4 LUT + 6 INV + 26 transistors) | 108 | 4 |

  • ‘*’ represents “Area has been calculated in terms of transistors.”

TABLE VI: Comparison of Area among the Existing and the Proposed 1-Digit BCD Adders for Pre-Layout Simulation
| Method | Delay Expression | Critical Path Delay (ns) |
|---|---|---|
| Gao et al. [10] [11] | (1 + 2 + 1 + 1 + 1 + 1 ) | 69.56 |
| Bioul et al. [12] | (4 + 4 ) | 140.72 |
| Vazquez et al. [13] | (1 + 4 + 2 + 1 + 1 ) | 80.74 |
| Vazquez et al. [14] | (4 + 4 + 6 ) | 168.64 |
| Proposed | (1 FA + 1 LUT + 2 INV + 2 transistors) | 41.8 |

TABLE VII: Comparison of Delay among the Existing and the Proposed 1-Digit BCD Adders for Pre-Layout Simulation

IV-A FPGA Implementation and Post-Layout Simulation Results

Fig. 6: Simulation Result of BCD Adder with Intermediate Carry C1= 1.

Fig. 7: Simulation Result of BCD Adder with Intermediate Carry C1= 0.

The proposed BCD adder was coded in VHDL and implemented on a Virtex-6 XC6VLX75T Xilinx FPGA with a -3 speed grade using ISE 13.1. The results are compared with the earlier approaches proposed in [11]-[14] using the same experimental setup for a fair comparison. The delays were extracted from the post-place-and-route static timing report, and the LUT usage was obtained from the place-and-route report. In addition, simulations of the proposed BCD adder are shown in Fig. 6 and Fig. 7 with carry 1 and 0, respectively.

The proposed BCD adder is fast due to its lower time complexity and short critical path delay, and cost-efficient due to its area and area-delay product. Comparisons of area, delay and area-delay product among the existing [11]-[14] and the proposed BCD adder circuits for various numbers of input digits are shown graphically in Fig. 8, Fig. 9 and Fig. 10, respectively. The proposed design improves on the best existing method [10] [11] by 20%, 41.32% and 53.06% in terms of area, delay and area-delay product, respectively. Note that the results shown in Fig. 8, Fig. 9 and Fig. 10 for the earlier approaches [11]-[14] were obtained by re-implementing them on the Virtex-6 platform.
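The three reported post-layout percentages are mutually consistent, which a short Python check illustrates. It assumes that the area-delay product is the plain product of area and delay and that both improvements are measured against the same baseline; the function name is hypothetical.

```python
def adp_improvement(area_gain_pct, delay_gain_pct):
    """Area-delay product improvement implied by separate area and delay gains."""
    remaining = (1 - area_gain_pct / 100.0) * (1 - delay_gain_pct / 100.0)
    return 100.0 * (1 - remaining)


if __name__ == "__main__":
    # 20% less area and 41.32% less delay leave 0.80 * 0.5868 of the original product.
    print(round(adp_improvement(20, 41.32), 2))
```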

Fig. 8: Graphical Analysis of Area of Existing and Proposed BCD Adder Circuits for Post-Layout Simulation.

Fig. 9: Graphical Analysis of Delay of Existing and Proposed BCD Adder Circuits for Post-Layout Simulation.

Fig. 10: Graphical Analysis of Area-Delay Product of Existing and Proposed BCD Adder Circuits.

V Conclusion

In twenty years, reconfigurable computing has grown from a wild, exploratory idea to a viable alternative to Application-Specific Integrated Circuits (ASICs) and fixed microprocessors in computing systems. BCD (Binary Coded Decimal) addition, being a basic arithmetic operation, is the main focus of this paper. The proposed BCD adder is highly parallel, which mitigates the significant carry propagation delay of the addition operation. The proposed BCD adder circuit is not only faster but also more area-efficient than the best known existing circuit. The pre-layout simulation shows reductions of 18.18% in area and 39.9% in critical path delay compared to the best known existing BCD adder circuit. The proposed BCD adder circuit is simulated using Xilinx Virtex-6. The correctness and efficiency of the circuit are demonstrated in the design and simulation sections using the corresponding tables, figures and lemma. The comparative analysis shows that the proposed BCD adder improves on the best known existing adder circuit by 20% in area and 41.3% in delay, along with a 53.06% improvement in area-delay product. These improvements in FPGA-based BCD addition will support advances in the computation and manipulation of decimal digits, as it is more convenient to convert from decimal to BCD than to binary. Besides, FPGA implementation will be beneficial in bit-wise manipulation, acceleration of private key encryption and decryption, heavily pipelined and parallel computation of NP-hard problems, automatic target generation and many more applications [4] [5].

Acknowledgment

Zarrin Tasnim Sworna and Mubin Ul Haque have been granted a fellowship from the Ministry of Information and Technology, People’s Republic of Bangladesh, under the program of higher studies and research with the reference no. 56.00.0000.028.33.058.15-629.

References