I Introduction
BCD (Binary Coded Decimal) representation is advantageous because of its finite place-value representation, rounding, easy scaling by a factor of 10, simple alignment and conversion to character form [1] [2]. It is widely used in embedded applications, digital communication and financial calculations [3] [4]. Hence, a faster and more efficient BCD addition method is desired. In this paper, a multi-digit addition method is proposed which omits the complex manipulation steps, reducing the area and delay of the circuit. The application of FPGAs in cryptography, NP-hard optimization problems, pattern matching, bioinformatics, floating-point arithmetic and molecular dynamics is increasing rapidly [5] [6] [7]. Because of the reconfigurable capabilities of FPGAs, FPGA implementation of BCD addition is of interest. Since the LUT is one of the main components of an FPGA, a LUT-based adder circuit is proposed. Two main contributions are addressed in this paper. Firstly, a new tree-based parallel BCD addition algorithm is presented. Secondly, a compact and high-speed BCD adder circuit is proposed with an improved time complexity, expressed in terms of the number of digits and the number of bits in a digit.
The paper is organized as follows. The next section describes the earlier approaches and their limitations. In Section III, a novel BCD addition method is proposed, followed by the construction of the BCD adder circuit. Section IV presents the simulation results and performance analysis of the proposed circuit. Finally, Section V concludes the paper.
II Literature Overview
In this section, recent LUT-based BCD adders are reviewed.
II-A Existing LUT-Based BCD Adders
A BCD adder takes BCD numbers as input and produces BCD numbers as output [8]. Since a 4-bit binary code has 16 different combinations but only ten of them are valid BCD digits, the addition of two BCD digits may produce an incorrect result that exceeds the largest BCD digit, 9 (1001) [8] [9] [10]. In such cases, the result must be corrected by adding 6 (0110) to guarantee that the result is a BCD digit. The decimal carry output generated by the correction process is added to the next higher digit of the BCD addends.
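The conventional correction can be illustrated with a short behavioural sketch. It is not taken from any of the cited designs; the function name `bcd_digit_add` and the integer encoding of digits are assumptions made for illustration only.

```python
def bcd_digit_add(a: int, b: int, carry_in: int = 0) -> tuple[int, int]:
    """Add two BCD digits (0-9) using the conventional add-6 correction.

    Returns (decimal carry out, BCD sum digit). Illustrative model only.
    """
    if not (0 <= a <= 9 and 0 <= b <= 9 and carry_in in (0, 1)):
        raise ValueError("inputs must be BCD digits and a 1-bit carry")
    total = a + b + carry_in        # plain binary sum, range 0..19
    if total > 9:                   # result is not a valid BCD digit
        total += 6                  # skip the six unused codes 1010..1111
    return total >> 4, total & 0xF  # bit 4 is the decimal carry


print(bcd_digit_add(7, 5))     # (1, 2), i.e. decimal 12
print(bcd_digit_add(9, 9, 1))  # (1, 9), i.e. decimal 19
```

Adding 6 works because it pushes any sum between 10 and 19 past the six unused 4-bit codes, so bit 4 becomes the decimal carry and the low four bits become the units digit.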
The authors of [9] proposed a direct implementation of the BCD adder circuit, with two different architectures for constructing a LUT-based BCD adder. In the first architecture, a truth table was formed for each input/output combination and the corresponding circuit was derived; it consumed eleven 6-input LUTs. The second architecture used two levels of abstraction: the least significant two input bits were fed into the first level of circuits, whereas the remaining input bits, along with the output of the first level, were provided to the second level. The second approach required seven 6-input LUTs, but at the cost of a much larger delay. The direct implementation therefore suffers from a large LUT-delay product.
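A direct implementation starts from the complete truth table of a one-digit adder. The following sketch enumerates that table behaviourally; it does not reproduce the LUT mapping of [9], and the name `build_bcd_truth_table` is hypothetical.

```python
from itertools import product


def build_bcd_truth_table() -> dict[tuple[int, int, int], tuple[int, int]]:
    """Map every valid (a, b, carry_in) to (carry_out, sum_digit)."""
    table = {}
    for a, b, carry_in in product(range(10), range(10), (0, 1)):
        # divmod by 10 gives the decimal carry and the units digit.
        table[(a, b, carry_in)] = divmod(a + b + carry_in, 10)
    return table


TABLE = build_bcd_truth_table()
print(len(TABLE))        # 200 valid input combinations
print(TABLE[(9, 9, 1)])  # (1, 9)
```

The table has nine input bits (two 4-bit digits and a carry) and five output bits, which is why it cannot fit into a single 6-input LUT and has to be split across several.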
The BCD adder in [10] was implemented on a Virtex-6 platform using the circuit architecture proposed earlier in [11]. Gao et al. proposed a BCD adder in which the first (least significant) bits of the addends are added using a full adder and the most significant three bits are added using 6-input LUTs [11]. A correction is ensured in the 6-input LUTs by adding 3 to the sum if the sum of the most significant three bits is greater than or equal to 5. Moreover, extra circuits are required when the sum of the most significant three bits is 4 and the carry generated from the full adder is one. Being serial in architecture except for the LUT portions, the circuit has a high time complexity and delay, which hinders fast output generation [11]. Bioul et al. proposed a BCD adder in which additions are performed in a carry-chain fashion and which therefore suffers from a significant delay [12]. They used Virtex-4 and Virtex-5 platforms to show that the area overhead (in terms of the required number of LUTs) with respect to binary computation is not negligible: it is around five times on Virtex-4 and nearly four times on Virtex-5. The main reason for this difference is the more complex definition of the carry-propagate and carry-generate functions.
The authors of [13] proposed a BCD addition method in which six is added as a correction factor when a sum computed from the most significant three bits of the two input operands is greater than or equal to 8. If the final output is (111), it must be replaced with (100) as a final step to obtain the exact BCD output. Vazquez et al. presented various carry-chain BCD addition methods and their implementations on the LUT architecture [14]. Because carry-chain mechanisms are serial in architecture, the methods proposed in [14] incur a large delay, which is a major drawback.
A power- and area-efficient BCD adder circuit was proposed by the authors of [15]. They used the circuit architecture presented in [1] and estimated its power consumption. The delay, calculated on a Virtex-5 platform using 6-input LUTs, was 6.22 ns. The average power consumption of the circuit described in [15] was 25 mW, a significant improvement over the conventional LUT-based BCD adder. However, the method proposed in [15] required a total of 48 logic elements, which can be optimized further.
III Proposed Design of LUT-Based BCD Adder
In this section, firstly a BCD addition algorithm is proposed. Then a new LUT-based BCD adder is constructed. Essential figures and lemmas are presented to clarify the proposed ideas.
III-A Proposed Parallel BCD Addition Method
Carry propagation is the main cause of delay in a BCD adder circuit and gives the BCD adder a serial architecture. As delay reduction is one of the most important factors for the efficiency of the circuit, the carry propagation mechanism needs to be removed for faster BCD addition. In this paper, a highly parallel BCD addition method with a tree-structured representation is proposed, which significantly reduces the delay. The proposed BCD addition method has mainly two steps, which are as follows:
- Bit-wise addition of the BCD addends produces the corresponding sum and carry bits in parallel. For the addition of the first (least significant) bit, the carry from the previous digit is added as well, and the resulting sum is directly the first bit of the output.
- If the most significant carry bit is zero, then, except for the first sum bit and the last carry bit, add the other sum and carry bits in pairs in parallel; if the sum is greater than or equal to five, add three to the result to obtain the correct BCD output.
- If the most significant carry bit is one, update the final output values according to Equations 1 and 2.
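The steps above can be sketched behaviourally as follows. This is a minimal model under stated assumptions, not the authors' circuit: the function name `parallel_bcd_digit_add` is hypothetical, and the case in which the most significant carry bit is one is handled by applying the same add-3 rule arithmetically rather than by Equations 1 and 2, which are not reproduced here.

```python
def parallel_bcd_digit_add(a: int, b: int, carry_in: int = 0) -> tuple[int, int]:
    """One-digit BCD addition: bit-wise addition, then add-3 correction.

    Returns (decimal carry out, BCD sum digit). Illustrative model only.
    """
    if not (0 <= a <= 9 and 0 <= b <= 9 and carry_in in (0, 1)):
        raise ValueError("inputs must be BCD digits and a 1-bit carry")

    # Step 1: bit-wise addition, all bit positions independent of each other.
    a0, b0 = a & 1, b & 1
    s0 = a0 ^ b0 ^ carry_in                     # full-adder sum: final first bit
    c1 = (a0 & b0) | (carry_in & (a0 ^ b0))     # full-adder carry
    a_hi, b_hi = a >> 1, b >> 1                 # most significant three bits
    half_sum = a_hi ^ b_hi                      # three half-adder sum bits
    half_carry = a_hi & b_hi                    # three half-adder carry bits

    # Step 2: combine sum and carry bits, then correct by adding three.
    upper = half_sum + (half_carry << 1) + c1   # equals a_hi + b_hi + c1
    if upper >= 5:
        upper += 3
    carry_out = upper >> 3                      # tens digit of the result
    digit = ((upper & 0b111) << 1) | s0         # units digit of the result
    return carry_out, digit


print(parallel_bcd_digit_add(7, 5))     # (1, 2), i.e. decimal 12
print(parallel_bcd_digit_add(9, 9, 1))  # (1, 9), i.e. decimal 19
```

Adding three to the upper three bits is equivalent to adding six to the whole digit, because the upper bits carry twice the weight of the first bit. The threshold of five on the upper bits corresponds to a digit sum of ten or more.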
Consider the two addends of a 1-digit BCD adder, each given in its 4-bit BCD representation. The output of the adder is a 5-bit binary number in which the most significant bit represents the tens digit and the remaining four bits represent the units digit of the BCD sum. The least significant bits of the two addends are added along with the carry from the previous digit addition; for the first digit, this carry is taken as zero. The resulting sum bit is directly the first bit of the output. The other three bit pairs are added simultaneously. The resulting sum and carry bits are then added pairwise, and the output is corrected by the addition of three according to the following Equation 1 and Equation 2:
(1) |
(2) |
Table I gives the truth table with the most significant three bits of the two addends as input and the final BCD output, after the required correction, as output. The bit-wise sum and carry bits are added pairwise as an intermediate step, with the carry from the least significant bit addition fixed at 1. Three is added to the intermediate output if it is greater than or equal to five. A similar table with this carry fixed at 0 is shown in Table II. The truth tables verify the functions of each output of the LUTs of the BCD adder. The multi-digit BCD addition method is presented in Algorithm 1.
Upper three bits of addend 1 | Upper three bits of addend 2 | Bit-wise sum | Bit-wise carry | Carry from LSB addition | Intermediate sum | Add 3 | Output bit 4 | Output bit 3 | Output bit 2 | Output bit 1 |
---|---|---|---|---|---|---|---|---|---|---|
000 | 001 | 001 | 000 | 1 | 0010 | - | 0 | 0 | 1 | 0 |
000 | 010 | 010 | 000 | 1 | 0011 | - | 0 | 0 | 1 | 1 |
000 | 011 | 011 | 000 | 1 | 0100 | - | 0 | 1 | 0 | 0 |
000 | 100 | 100 | 000 | 1 | 0101 | Add 3 | 1 | 0 | 0 | 0 |
001 | 001 | 000 | 001 | 1 | 0011 | - | 0 | 0 | 1 | 1 |
001 | 010 | 011 | 000 | 1 | 0100 | - | 0 | 1 | 0 | 0 |
001 | 011 | 010 | 001 | 1 | 0101 | Add 3 | 1 | 0 | 0 | 0 |
001 | 100 | 101 | 000 | 1 | 0110 | Add 3 | 1 | 0 | 0 | 1 |
… | … | … | … | … | … | … | … | … | … | … |
100 | 001 | 101 | 000 | 1 | 0110 | Add 3 | 1 | 0 | 0 | 1 |
100 | 010 | 110 | 000 | 1 | 0111 | Add 3 | 1 | 0 | 1 | 0 |
100 | 011 | 111 | 000 | 1 | 1000 | Add 3 | 1 | 0 | 1 | 1 |
100 | 100 | 000 | 100 | 1 | 1001 | Add 3 | 1 | 1 | 0 | 0 |
‘-’ represents “No correction by adding 3 is required.”
Upper three bits of addend 1 | Upper three bits of addend 2 | Bit-wise sum | Bit-wise carry | Carry from LSB addition | Intermediate sum | Add 3 | Output bit 4 | Output bit 3 | Output bit 2 | Output bit 1 |
---|---|---|---|---|---|---|---|---|---|---|
000 | 001 | 001 | 000 | 0 | 0001 | - | 0 | 0 | 0 | 1 |
000 | 010 | 010 | 000 | 0 | 0010 | - | 0 | 0 | 1 | 0 |
000 | 011 | 011 | 000 | 0 | 0011 | - | 0 | 0 | 1 | 1 |
000 | 100 | 100 | 000 | 0 | 0100 | - | 0 | 1 | 0 | 0 |
001 | 001 | 000 | 001 | 0 | 0010 | - | 0 | 0 | 1 | 0 |
001 | 010 | 011 | 000 | 0 | 0011 | - | 0 | 0 | 1 | 1 |
001 | 011 | 010 | 001 | 0 | 0100 | - | 0 | 1 | 0 | 0 |
001 | 100 | 101 | 000 | 0 | 0101 | Add 3 | 1 | 0 | 0 | 0 |
… | … | … | … | … | … | … | … | … | … | … |
100 | 001 | 101 | 000 | 0 | 0101 | Add 3 | 1 | 0 | 0 | 0 |
100 | 010 | 110 | 000 | 0 | 0110 | Add 3 | 1 | 0 | 0 | 1 |
100 | 011 | 111 | 000 | 0 | 0111 | Add 3 | 1 | 0 | 1 | 0 |
100 | 100 | 000 | 100 | 0 | 1000 | Add 3 | 1 | 0 | 1 | 1 |
‘-’ represents “No correction by adding 3 is required.”
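The elided rows of the two truth tables can be regenerated from the rule they encode. The sketch below is an illustration under the assumption that the columns are, in order, the upper three bits of each addend, the bit-wise sum, the bit-wise carry, the carry from the least significant bit addition, the intermediate sum, the correction flag and the four output bits; the helper name `truth_table_row` is hypothetical.

```python
def truth_table_row(a_hi: int, b_hi: int, c1: int) -> str:
    """Format one truth-table row for upper bits a_hi, b_hi (0..4) and carry c1."""
    half_sum = a_hi ^ b_hi                          # bit-wise sum
    half_carry = a_hi & b_hi                        # bit-wise carry
    intermediate = half_sum + (half_carry << 1) + c1
    needs_fix = intermediate >= 5
    corrected = intermediate + 3 if needs_fix else intermediate
    note = "Add 3" if needs_fix else "-"
    out_bits = " | ".join(format(corrected, "04b"))  # one column per output bit
    return (f"{a_hi:03b} | {b_hi:03b} | {half_sum:03b} | {half_carry:03b} | "
            f"{c1} | {intermediate:04b} | {note} | {out_bits}")


for c1 in (1, 0):              # carry fixed at 1, then carry fixed at 0
    for a_hi in range(5):      # upper three bits of a BCD digit are 000..100
        for b_hi in range(5):
            print(truth_table_row(a_hi, b_hi, c1))
```

For example, `truth_table_row(1, 3, 1)` yields the row for inputs 001 and 011 with carry 1, where the intermediate sum 0101 is corrected to 1000.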
Two examples of the BCD addition method using the proposed algorithm are demonstrated in Fig. 1 and Fig. 2. Each step of the examples is mapped to the corresponding algorithm step for clarity.
Because it is parallel, the proposed BCD addition method can be represented as a tree structure, which is shown in Fig. 3. The tree has two operational levels. Starting from the inputs, the bit-wise addition is performed in level 1 and the intermediate results are obtained. Then, in level 2, the addition and correction are performed, providing the correct BCD output. Hence, the time complexity of the proposed algorithm is logarithmic, according to the operational depth of the tree. Lemma 3.1 is given to prove the time complexity of the proposed method. The time complexities of the existing and proposed BCD adders are listed in Table III.
Method | Time Complexity |
---|---|
Existing [11] | |
Existing [12] | |
Existing [13] | |
Existing [14] | |
Proposed |
-
‘’:“number of bits in a digit” and ‘’: “number of digits”.
Lemma 3.1
The proposed BCD addition algorithm requires at least of time complexity, where is number of BCD digits and is the number of bits in a digit.
Proof
Being parallel, the proposed BCD addition algorithm can be represented as a tree structure in which the addends are the root node and the child nodes are the direct logic implementation circuits, the addition circuits with 3-correction logic, and the output selection circuits.
So, a directed graph can be constructed where,
and
.
It is obvious that, there exists exactly one pair of vertices of path length , which is the highest path length among any pair of vertices in the graph. So, the diameter of the graph is unique. Now, it is sufficient to prove that, the length of the diameter is where is the number of bits in a BCD digit.
Take any node and find the vertex which is furthest from it. Now, it will be shown that, the vertex found will be either or . Suppose, that the vertex found is (neither nor ). Two cases can be considered here
- Suppose that is a node on path . Without loss of generality, let the path have no edges overlapping with the path. So, we find the distance of the paths as follows
.
But, from the shortest path algorithm (Dijkstra), we know that,
.
This contradicts the assumption that, is the unique diameter of the tree.
- Suppose does not lie on the path from to . Now, either the path overlaps with the path or is disjoint. If there is overlap, consider the vertex which is the vertex closest to among the vertices which are the parts of the overlap. Without loss of generality, let the path have no edges overlapping with the path. So,
From Dijkstra algorithm as we know,
This once again contradicts the assumption of being the unique diameter of the tree.
If the paths do not overlap, there are vertices and on the and paths, respectively, which are closest to each other. So,
But according to Dijkstra algorithm,
Hence, the assumption that is the diameter is contradicted. In each case, we have seen that there is a contradiction if is not one of or . Hence it follows that , the furthest vertex from , is either or . So, it is proved that the furthest vertex from is . Hence, while calculating the distance using the DFS algorithm, we actually find the diameter of the tree in the second run of DFS. Since the diameter is unique, the cost of traversing from to is . For a -digit BCD adder, the time complexity becomes .
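The two-pass search that the proof relies on can be sketched as follows. This is a generic illustration rather than part of the proposed adder: it uses breadth-first search instead of DFS (both give the same distances on an unweighted tree), and the names `farthest_node`, `tree_diameter` and the example tree are hypothetical.

```python
from collections import deque


def farthest_node(adj: dict[int, list[int]], start: int) -> tuple[int, int]:
    """Return (node, distance) for a node farthest from start in a tree."""
    dist = {start: 0}
    queue = deque([start])
    last = start
    while queue:
        node = queue.popleft()
        last = node                      # BFS dequeues in non-decreasing distance
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return last, dist[last]


def tree_diameter(adj: dict[int, list[int]]) -> tuple[int, int, int]:
    """Two passes: any node -> farthest node, then that node -> farthest node."""
    start = next(iter(adj))
    end_a, _ = farthest_node(adj, start)       # first pass finds one endpoint
    end_b, length = farthest_node(adj, end_a)  # second pass finds the diameter
    return end_a, end_b, length


EXAMPLE_TREE = {0: [1, 2], 1: [0, 3, 4], 2: [0], 3: [1], 4: [1]}
print(tree_diameter(EXAMPLE_TREE)[2])  # 3
```

The first pass may start from any node; the argument above is what guarantees that the node it finds is an endpoint of a longest path, so the second pass measures the diameter.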
III-B Proposed Parallel BCD Adder Circuit Using LUT
A LUT-based BCD adder is designed using the proposed BCD addition algorithm and the LUT architecture. An algorithm for the construction of the proposed BCD adder circuit is presented in Algorithm 2, and the resulting circuit is depicted in Fig. 4. A full adder is used to add the least significant bits together with the carry from the previous digit addition. Three half adders are used for the individual bit-wise additions of the most significant three bits. Equation 1 and Equation 2 are followed in the proposed circuit architecture by using transistors and LUTs, where four 6-input LUTs add the outputs of the half adders and the full adder, with correction by adding 3 if the sum is greater than or equal to five. A switching circuit is used to follow Equation 3. Owing to its parallel working mechanism, the proposed circuit achieves a large delay reduction compared with existing BCD adder circuits.
Using the proposed 1-digit BCD adder circuit, a multi-digit BCD adder circuit can easily be created, where the carry output of one digit adder circuit is sent to the next digit adder circuit as its carry input. Therefore, the generalized multi-digit BCD adder computes sequentially by using the previous carry; its block diagram is shown in Fig. 5.
(3) |
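The digit-to-digit carry chaining can be modelled in a few lines. The sketch below is behavioural only: the one-digit adder is a stand-in rather than the proposed circuit, and the function names and the least-significant-digit-first list layout are assumptions.

```python
def one_digit_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Stand-in for a 1-digit BCD adder: returns (carry out, sum digit)."""
    total = a + b + carry_in
    return (1, total - 10) if total >= 10 else (0, total)


def bcd_add(a_digits: list[int], b_digits: list[int]) -> list[int]:
    """Add two equal-length BCD numbers given least significant digit first."""
    if len(a_digits) != len(b_digits):
        raise ValueError("operands must have the same number of digits")
    carry = 0                     # the first digit has no incoming carry
    result = []
    for a, b in zip(a_digits, b_digits):
        # The carry out of each digit becomes the carry in of the next one.
        carry, digit = one_digit_adder(a, b, carry)
        result.append(digit)
    result.append(carry)          # final decimal carry is the top digit
    return result


print(bcd_add([9, 7, 4], [6, 8, 5]))  # [5, 6, 0, 1], i.e. 479 + 586 = 1065
```

Only the single carry bit crosses each digit boundary; the work inside each digit is independent of the other digits.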
IV Simulation Results and Performance Analysis
As the BCD adder circuits being compared contain different types of logic gates and logic modules, it is better to preserve the basic modules as described in the architectures, as long as they correspond to commonly available cells in a typical standard cell library. The area and delay of the proposed BCD adder circuit are derived and expressed in terms of the area and critical path delay of the basic logic modules that can be found in a typical standard cell library for different operator sizes. These theoretical estimates are then calibrated using the basic logic modules from the CMOS 45 nm open cell library [16]. Table IV shows the area and critical path delay of the basic logic gates: the inverter and the 2-input AND, OR and EX-OR gates. Table V gives the area and critical path delay of logic modules such as the full adder, half adder and multiplexer, calculated using Table IV. Note that the area is calculated in terms of the number of transistors.
Basic Logic Gates | Area (in transistors) | Critical Path Delay (ns) |
---|---|---|
Inverter (INV) | 1 | 1 |
2-input AND | 6 | 4.68 |
2-input OR | 6 | 4.5 |
2-input EX-OR | 8 | 4.72 |
Elements | Area (in transistors) | Critical Path Delay (ns) |
---|---|---|
2-to-1 Multiplexer (MUX) | 20 | 10.18 |
Half Adder (HA) | 14 | 4.72 |
Full Adder (FA) | 34 | 13.9 |
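Several module values are consistent with textbook gate-level constructions built from the basic gate values. The sketch below checks this; the constructions themselves (a half adder as one EX-OR plus one AND, a full adder as two half adders plus an OR, a multiplexer select path through an inverter, an AND and an OR) are assumptions, since the text does not state how the module values were derived.

```python
# Basic gate values: gate -> (area in transistors, critical path delay in ns)
GATES = {"INV": (1, 1.0), "AND": (6, 4.68), "OR": (6, 4.5), "XOR": (8, 4.72)}


def area(*gates: str) -> int:
    """Total transistor count of the listed gates."""
    return sum(GATES[g][0] for g in gates)


def path_delay(*gates: str) -> float:
    """Delay of a path that passes through the listed gates in series."""
    return round(sum(GATES[g][1] for g in gates), 2)


# Assumed half adder: sum = a XOR b, carry = a AND b (two parallel gates).
ha_area = area("XOR", "AND")
ha_delay = max(path_delay("XOR"), path_delay("AND"))

# Assumed full adder: two half adders plus an OR gate for the carry.
fa_area = 2 * ha_area + area("OR")
fa_delay = path_delay("XOR", "AND", "OR")   # longest path ends at the carry out

# Assumed multiplexer select path: inverter, then AND, then OR.
mux_delay = path_delay("INV", "AND", "OR")

print(ha_area, ha_delay)   # 14 4.72
print(fa_area, fa_delay)   # 34 13.9
print(mux_delay)           # 10.18
```

The multiplexer area is not derived here, because the assumed construction does not reproduce the listed value of 20 transistors.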
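The area complexity of the proposed BCD adder is derived from its basic logic modules. The proposed BCD adder requires three half adders, one full adder, four 6-input LUTs, six inverters and twenty-six transistors. Thus, the total area of the proposed BCD adder ($A_{\mathrm{BCD}}$) can be determined as follows, where $A_{\mathrm{HA}}$, $A_{\mathrm{FA}}$, $A_{\mathrm{INV}}$ and $A_{\mathrm{LUT}}$ denote the areas of a half adder, a full adder, an inverter and a 6-input LUT, respectively:
$$A_{\mathrm{BCD}} = 3A_{\mathrm{HA}} + A_{\mathrm{FA}} + 6A_{\mathrm{INV}} + 26 + 4A_{\mathrm{LUT}} \tag{4}$$
With the values from Table IV and Table V, the transistor count is $3(14) + 34 + 6(1) + 26 = 108$, in addition to the four 6-input LUTs.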
Table VI compares the proposed and existing BCD adders in terms of area. It is evident from Table VI that the proposed design requires 108 transistors and four 6-input LUTs, whereas the best known existing methods [10] [11] require 132 transistors and four 6-input LUTs. Thus, the proposed BCD adder gains an improvement of 18.18% in terms of area in the pre-layout simulation result. Similarly, the critical path of the proposed BCD adder contains one full adder, one 6-input LUT, two inverters and two transistors. Therefore, the critical path delay of the proposed BCD adder ($T_{\mathrm{BCD}}$) can be calculated as follows, where $T_{\mathrm{FA}}$, $T_{\mathrm{LUT}}$, $T_{\mathrm{INV}}$ and $T_{\mathrm{TR}}$ denote the delays of a full adder, a 6-input LUT, an inverter and a transistor, respectively:
$$T_{\mathrm{BCD}} = T_{\mathrm{FA}} + T_{\mathrm{LUT}} + 2T_{\mathrm{INV}} + 2T_{\mathrm{TR}} \tag{5}$$
Table VII compares the proposed and existing BCD adders in terms of critical path delay. It shows that the proposed BCD adder requires 41.8 ns of delay, whereas the best known existing methods [10] [11] require 69.56 ns. Therefore, the proposed BCD adder achieves an improvement of 39.9% in terms of critical path delay in the pre-layout simulation result.
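The reported pre-layout figures can be checked with a few lines of arithmetic. The sketch uses only numbers quoted in the text and tables; the helper name `improvement_percent` is hypothetical.

```python
def improvement_percent(reference: float, proposed: float) -> float:
    """Relative reduction of proposed with respect to reference, in percent."""
    return round(100 * (reference - proposed) / reference, 2)


# Three half adders (14 each), one full adder (34), six inverters (1 each)
# and twenty-six additional transistors; the four LUTs are counted separately.
proposed_area = 3 * 14 + 34 + 6 * 1 + 26

print(proposed_area)                            # 108
print(improvement_percent(132, proposed_area))  # 18.18
print(improvement_percent(69.56, 41.8))         # 39.91
```

The delay figure of 39.91 corresponds to the 39.9% quoted in the text.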
Method | Area Expression | Area* | LUT Count |
---|---|---|---|
Gao et al. [10] [11] | | 132 | 4 |
Bioul et al. [12] | (8 + 6 ) | 120 | 8 |
Vazquez et al. [13] | | 134 | 5 |
Vazquez et al. [14] | | 204 | 8 |
Proposed | | 108 | 4 |
‘*’ represents “Area has been calculated in terms of transistors.”
Method | Delay Expression | Critical Path Delay (ns) |
---|---|---|
Gao et al. [10] [11] | | 69.56 |
Bioul et al. [12] | (4 + 4 ) | 140.72 |
Vazquez et al. [13] | | 80.74 |
Vazquez et al. [14] | | 168.64 |
Proposed | | 41.8 |
IV-A FPGA Implementation and Post-Layout Simulation Results
The proposed BCD adder was coded in VHDL and implemented on a Xilinx Virtex-6 XC6VLX75T FPGA with a -3 speed grade using ISE 13.1. The results are compared with the earlier approaches proposed in [11]-[14] using the same experimental setup for a fair comparison. The delays were extracted from the post-place-and-route static timing report, and the LUT usage was obtained from the place-and-route report. In addition, simulations of the proposed BCD adder with carry 1 and carry 0 are shown in Fig. 6 and Fig. 7, respectively.
The proposed BCD adder is high-speed because of its lower time complexity and critical path delay, and cost-efficient because of its area and area-delay product efficiency. Comparisons of area, delay and area-delay product between the existing [11]-[14] and the proposed BCD adder circuits for various numbers of input digits are shown graphically in Fig. 8, Fig. 9 and Fig. 10, respectively, with improvements of 20%, 41.32% and 53.06% in area, delay and area-delay product, respectively, compared with the best existing method [10] [11]. Note that the results shown in Fig. 8, Fig. 9 and Fig. 10 for the earlier approaches [11]-[14] were obtained by re-implementing them on the Virtex-6 platform.
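The area-delay product figure follows from the separate area and delay improvements, since the two remaining fractions multiply. A small sketch of that check is given below; the helper name `combined_improvement` is hypothetical.

```python
def combined_improvement(area_gain_pct: float, delay_gain_pct: float) -> float:
    """Improvement of the area-delay product from the two separate gains."""
    remaining = (1 - area_gain_pct / 100) * (1 - delay_gain_pct / 100)
    return round(100 * (1 - remaining), 2)


print(combined_improvement(20, 41.32))  # 53.06
```

With 80% of the area and 58.68% of the delay remaining, about 46.94% of the area-delay product remains, which matches the reported 53.06% improvement.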
V Conclusion
In twenty years, reconfigurable computing has grown from a wild, exploratory idea to a viable alternative to Application-Specific Integrated Circuits (ASICs) and fixed microprocessors in our computing systems. BCD (Binary Coded Decimal) addition, being a basic arithmetic operation, is the main focus of this paper. The proposed BCD adder is highly parallel, which mitigates the significant carry propagation delay of the addition operation. The proposed BCD adder circuit is not only faster but also more area-efficient than the best known existing circuit. The pre-layout simulation shows reductions of 18.18% in area and 39.9% in critical path delay compared with the best known existing BCD adder circuit. The proposed BCD adder circuit is implemented and simulated on a Xilinx Virtex-6 FPGA. The correctness and efficiency of the circuit are demonstrated in the design and simulation sections using the corresponding tables, figures and lemma. The comparative analysis shows that the proposed BCD adder improves area by 20% and delay by 41.3% compared with the best known existing adder circuit, along with a 53.06% improvement in area-delay product. These improvements in FPGA-based BCD addition will consequently influence the advancement of computation and manipulation of decimal digits, as it is more convenient to convert from decimal to BCD than to binary. Besides, FPGA implementation will be beneficial in bit-wise manipulation, private-key encryption and decryption acceleration, heavily pipelined and parallel computation of NP-hard problems, automatic target generation and many more applications [4] [5].
Acknowledgment
Zarrin Tasnim Sworna and Mubin Ul Haque have been granted a fellowship from the Ministry of Information and Technology, People’s Republic of Bangladesh, under the program of higher studies and research with reference no. 56.00.0000.028.33.058.15-629.
References
- [1] Al-Khaleel, Osama, Mohammad Al-Khaleel, Zakaria Al-Qudah, Christos A. Papachristou, Khaldoon Mhaidat, and Francis G. Wolff. “Fast binary/decimal adder/subtractor with a novel correction-free BCD addition.” In Electronics, Circuits and Systems (ICECS), 2011 18th IEEE International Conference on, pp. 455-459. IEEE, 2011.
- [2] Sundaresan, C., C. V. S. Chaitanya, P. R. Venkateswaran, Somashekara Bhat, and J. Mohan Kumar. “High speed BCD adder.” In Proceedings of the 2011 2nd International Congress on Computer Applications and Computational Science, pp. 113-118. Springer, Berlin Heidelberg, 2012. DOI: 10.1007/978-3-642-28308-6_15.
- [3] Z. T. Sworna, M. U. Haque and H. M. H. Babu. “A LUT-based matrix multiplication using neural networks.” 2016 IEEE International Symposium on Circuits and Systems (ISCAS), Montreal, Canada, 2016, pp. 1982-1985. DOI: 10.1109/ISCAS.2016.7538964.
- [4] Z. T. Sworna, M. U. Haque, N. Tara, H. M. H. Babu and A. K. Biswas. “Low-power and area efficient binary coded decimal adder design using a look up table-based field programmable gate array.” IET (The Institution of Engineering and Technology) Circuits, Devices & Systems, 2015, volume: 10, issue: 3, pp. 1-10. DOI: 10.1049/iet-cds.2015.0213.
- [5] Pocek, Kenneth, Russell Tessier, and André DeHon. “Birth and adolescence of reconfigurable computing: A survey of the first 20 years of field-programmable custom computing machines.” In Field-Programmable Custom Computing Machines (FCCM), 2013 IEEE 21st Annual International Symposium on, pp. 1-17. IEEE, Seattle, WA, USA, 2013. DOI: 10.1109/FPGA.2013.6882273.
- [6] Han, Liu, and Seok-Bum Ko. “High-speed parallel decimal multiplication with redundant internal encodings.” IEEE Transactions on Computers 62, no. 5 (2013): 956-968. DOI: 10.1109/TC.2012.35.
- [7] Ning, Yonghai, Zongqiang Guo, Sen Shen, and Bo Peng. “Design of data acquisition and storage system based on the FPGA.” Procedia Engineering 29 (2012): 2927-2931, Elsevier. DOI: 10.1016/j.proeng.2012.01.416
- [8] G. Sutter, E. Todorovich, G. Bioul, M. Vazquez, and J.-P. Deschamps. 2009. “FPGA Implementations of BCD Multipliers”. In International Conference on Reconfigurable Computing and FPGAs, Quintana Roo, Mexico, 2009. ReConFig ’09, pp: 36–41. DOI: 10.1109/ReConFig.2009.28.
- [9] O.D. Al-Khaleel, N.H. Tulic, and K.M. Mhaidat. “FPGA implementation of binary coded decimal digit adders and multipliers.” In 8th International Symposium on Mechatronics and its Applications (ISMA), Sharjah, U.A.E., 2012. DOI: 10.1109/ISMA.2012.6215199.
- [10] Gao, Shuli, Dhamin Al-Khalili, J. M. Langlois, and Noureddine Chabini. “Efficient Realization of BCD Multipliers Using FPGAs.” International Journal of Reconfigurable Computing 2017 (2017). DOI: 10.1155/2017/2410408.
- [11] Shuli Gao, D. Al-Khalili, and N. Chabini. 2012. “An improved BCD adder using 6-LUT FPGAs.” In 10th International Conference on New Circuits and Systems (NEWCAS), 2012 IEEE, pp: 13–16. DOI: 10.1109/NEWCAS.2012.6328944.
- [12] G. Bioul, M. Vazquez, J. P. Deschamps, and G. Sutter. 2010. “High-speed FPGA 10’s Complement Adders-subtractors.” Int. J. Reconfig. Comput. 2010, Article 4 (Jan. 2010), DOI: 10.1155/2010/219764.
- [13] Alvaro Vazquez and Florent De Dinechin. 2010. “Multi-operand Decimal Adder Trees for FPGAs.” Research Report RR-7420. 20 pages. Available at hal.inria.fr/inria-00526327.
- [14] M. Vazquez, G. Sutter, G. Bioul, and J.P. Deschamps. 2009. “Decimal Adders/Subtractors in FPGA: Efficient 6-input LUT Implementations.” In Reconfigurable Computing and FPGAs, 2009. ReConFig ’09. International Conference on. 42–47. DOI: 10.1109/ReConFig.2009.29.
- [15] Mishra, Shambhavi, and Gaurav Verma. “Low power and area efficient implementation of BCD Adder on FPGA.” In Signal Processing and Communication (ICSC), 2013 International Conference on, pp. 461-465. IEEE, Noida, India, 2013. DOI: 10.1109/ICSPCom.2013.6719834.
- [16] “CMOS 45 nm Open Cell Library”. Available at http://www.si2.org/openeda.si2.org/projects/nangatelib Last access date: 14 March, 2017.