I Introduction
Many papers have been published on the design and performance of ternary circuits at different levels: inverters, basic gates, arithmetic circuits, flip-flops, SRAMs, etc. Some of these papers will be quoted in this text. However, these papers generally do not compare the ternary circuits with the corresponding binary ones. Having presented such a comparison in 1988 [1], we want to do it again more than 30 years later to determine whether ternary circuits can compete with binary ones. As ternary wires carry more information, N bits are approximately equivalent to M trits according to the relation M = N/IR, where IR = log(3)/log(2) = 1.585 is the information ratio per wire. Table I presents the correspondence for some values of N. Due to rounding, N/M ranges from 1.5 to 1.6 in Table I.
We restrict the comparison to arithmetic circuits, which are typical implementations of combinational logic. We compare ternary and binary circuits having approximately the same computing capability. For adders or multipliers, it means comparing N-bit circuits with M-trit circuits according to Table I. Adders and multipliers are examined. It turns out that their structure depends on two basic circuits: the 1-bit or 1-trit full adder (FA), and the 1-bit or 1-trit multiplier.

The adders use 1-bit/1-trit full adders.

The multipliers use 1-bit/1-trit multipliers and 1-bit/1-trit full adders and half adders.
The rest of the paper is organized as follows:

Section 2 compares the number of 1-bit full adders needed to implement N-bit adders with the number of 1-trit full adders needed to implement M-trit adders.

Section 3 compares the number of 1-bit multipliers and 1-bit full adders needed to implement an N*N bit multiplier with the number of 1-trit multipliers and 1-trit full adders needed to implement an M*M trit multiplier.

Section 4 defines the methodology to compare the complexity of binary and ternary circuits.

Section 5 compares the hardware complexity of the 1-bit full adder and the 1-trit full adder and the overall complexity of N-bit and M-trit adders.

Section 6 compares the hardware complexity of the 1-bit multiplier and the 1-trit multiplier and the overall complexity of N*N bit multipliers and M*M trit multipliers.

A conclusion summarizes the results of the comparison.
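Table I: Correspondence between numbers of bits and numbers of trits

| Number of bits | 8 | 16 | 32 | 64 |
| --- | --- | --- | --- | --- |
| Number of trits | 5 | 11 | 21 | 41 |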
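The relation M = N/IR from the introduction can be evaluated directly. The short sketch below only computes the real-valued quotient N/IR; the function name `equivalent_trits` is ours, and no rounding rule is assumed.

```python
import math

# Information ratio per wire: one trit carries log(3)/log(2) bits.
IR = math.log(3) / math.log(2)

def equivalent_trits(n_bits: int) -> float:
    """Real-valued number of trits carrying as much information as n_bits bits."""
    return n_bits / IR

for n in (8, 16, 32, 64):
    print(f"N = {n:2d} bits -> N/IR = {equivalent_trits(n):.2f} trits")
```

The integer number of trits actually used in a design then depends on how this quotient is rounded.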
II Adders
II-A Carry-Propagate Adders (CPAs)
II-A1 The carry-propagate approach
The most straightforward implementation of N-bit or M-trit adders is the carry-propagate scheme, also called the “ripple-carry adder”. It is presented in Figure 1 for a 4-digit adder. For the binary version, Ai, Bi and Ci are binary values. For the ternary version, Ai and Bi are ternary values, while the Ci are binary values. In the literature, ternary adders are sometimes presented with ternary carries. We consider those circuits as ternary compressors that could be used in Wallace trees for multiplication. For ternary additions, the carries are always binary.
II-A2 8-bit CPA versus 5-trit CPA
An 8-bit CPA uses eight 1-bit full adders while a 5-trit CPA uses five 1-trit full adders. The comparison is straightforward: a 5-trit adder is more efficient iff the complexity of the 1-trit full adder is no more than 8/5 times that of the 1-bit full adder. The result is the same for any M-trit adder compared with an N-bit adder for which $M \approx N/IR$. Obviously, there are other techniques to implement fast adders. Considering all the possible versions is beyond the scope of this paper. We only consider two other schemes whose purpose is to speed up the carry propagation.
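As an illustration of why the carries of a ternary ripple-carry adder are always binary, here is a minimal behavioural sketch. It models digit values only, not the circuit; the function names and the little-endian digit lists are our assumptions.

```python
def trit_full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """1-trit full adder: a, b in {0, 1, 2}, cin in {0, 1}."""
    total = a + b + cin            # at most 2 + 2 + 1 = 5
    return total % 3, total // 3   # (sum trit, carry out), carry is 0 or 1

def ripple_carry_add(a_digits, b_digits, cin=0):
    """Add two little-endian trit lists of equal length, one full adder per digit."""
    sums, carry = [], cin
    for a, b in zip(a_digits, b_digits):
        s, carry = trit_full_adder(a, b, carry)
        sums.append(s)
    return sums, carry

# Worst case for a 5-trit adder: 22222 + 22222 + 1
print(ripple_carry_add([2] * 5, [2] * 5, 1))
```

Since the largest digit sum is 5, the carry out never exceeds 1, whatever the number of digits.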
II-B Carry-Lookahead Adders (CLAs)
Figure 2 presents a 4-bit carry-lookahead adder. This adder has the same number of full adders as the CPA. The binary equations of the carry computation part are well known: $C_{i+1} = G_i + P_i C_i$, with $G_i = A_i B_i$ and $P_i = A_i + B_i$ or $P_i = A_i \oplus B_i$.
For the ternary implementation, Gi and Pi are binary functions of ternary inputs. The computation of C1, C2, C3 and C4 uses binary circuits.

For stage i, a carry is generated when Ai + Bi ≥ 3, i.e. when (Ai = 2 and Bi = 1) or (Ai = 1 and Bi = 2) or (Ai = 2 and Bi = 2).

For stage i, a carry is propagated when (Ai = 0 and Bi = 2) or (Ai = 1 and Bi = 1) or (Ai = 2 and Bi = 0).
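The ternary generate and propagate conditions can be sketched as binary functions of ternary inputs. The sketch below assumes that a carry is generated whenever Ai + Bi ≥ 3 (which includes Ai = Bi = 2) and uses the usual recurrence C(i+1) = Gi + Pi.Ci, which a lookahead block unrolls into flat equations; all names are ours.

```python
def generate(a: int, b: int) -> bool:
    """Binary G_i: the stage produces a carry whatever the incoming carry."""
    return a + b >= 3

def propagate(a: int, b: int) -> bool:
    """Binary P_i: the stage forwards the incoming carry (A_i + B_i = 2)."""
    return a + b == 2

def lookahead_carries(a_digits, b_digits, c0=0):
    """Return [C0, C1, ..., Cn] for little-endian trit lists."""
    carries = [c0]
    for a, b in zip(a_digits, b_digits):
        nxt = generate(a, b) or (propagate(a, b) and carries[-1] == 1)
        carries.append(int(nxt))
    return carries

print(lookahead_carries([2, 0, 1], [1, 2, 1], 0))
```

Only the Gi and Pi functions see ternary values; everything downstream of them is ordinary binary logic.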
Just as for CPAs, an 8-bit CLA uses eight 1-bit full adders while a 5-trit CLA uses five 1-trit full adders. As the complexity of the binary carry-lookahead computation increases with the carry index, an 8-bit CLA is generally implemented as a cascade of two 4-bit CLAs. Its carry computation part is thus twice that of a 4-bit CLA. The carry-lookahead part of a 5-trit CLA can be implemented as a single 5-trit block, with Gi and Pi (0 ≤ i < 5) and 5 equations computing C1 to C5. Comparing the two approaches means:

Comparing eight 1-bit full adders with five 1-trit full adders

Comparing two blocks of 4-bit lookahead computation with one block of 5-trit lookahead computation
The comparison is exactly the same for 16-bit, 32-bit or 64-bit adders and the corresponding ternary adders.
II-C Carry-Skip Adders (CSAs)
Figure 3 presents a 4-bit binary carry-skip adder. Like the CPA, it has four 1-bit full adders. The carry “skip” scheme uses the Pi propagate functions of the CLA, an AND gate to compute P0.P1.P2.P3 and a two-input multiplexer. For the “skip” part, the only difference between the binary and the ternary versions is the computation of the propagate functions, which have been defined for the CLA.
Just as for CPAs, an 8-bit CSA uses eight 1-bit full adders while a 5-trit CSA uses five 1-trit full adders. As for the CLA approach, the 8-bit “skip” part is decomposed into two 4-bit “skip” blocks while the 5-trit CSA has only one 5-trit “skip” block. Comparing the two approaches means:

Comparing eight 1-bit full adders with five 1-trit full adders

Comparing two blocks of 4-bit “skip” computation with one block of 5-trit “skip” computation. A binary block computation uses four Pi functions, one 4-input AND gate and one multiplexer. A ternary block computation uses five Pi functions (binary functions of ternary inputs), one 5-input binary AND gate and one binary multiplexer.
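The behaviour of one ternary “skip” block can be sketched as follows. This is a value-level model under our own naming, not the circuit of Figure 3: the AND of the Pi functions drives a multiplexer that selects either the incoming carry or the carry rippled through the block.

```python
def propagate(a: int, b: int) -> bool:
    """Binary P_i of ternary inputs: the stage forwards its incoming carry."""
    return a + b == 2

def skip_block_carry_out(a_digits, b_digits, cin: int) -> int:
    """Carry out of one skip block (little-endian trit lists)."""
    # AND gate over all propagate functions of the block
    if all(propagate(a, b) for a, b in zip(a_digits, b_digits)):
        return cin                      # multiplexer selects the skip path
    carry = cin
    for a, b in zip(a_digits, b_digits):
        carry = (a + b + carry) // 3    # multiplexer selects the rippled carry
    return carry

print(skip_block_carry_out([0, 1, 2, 1, 0], [2, 1, 0, 1, 2], 1))
```

Both multiplexer inputs carry the same value when every stage propagates, which is why the skip path only affects the delay, not the result.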
II-D Adder comparisons
CPAs, CLAs and CSAs have the same N/M ratio of 1-bit adders to 1-trit adders. This ratio is close to IR = 1.585. For CLAs and CSAs, the circuits that speed up the carry computation slightly modify the overall comparison. More details are given in Section 5.
III Multipliers
III-A Comparing an 8*8 binary multiplier with a 5*5 ternary one
III-A1 8*8 binary multiplier
Typical binary multipliers can be decomposed into two parts: the first one generates the partial products and the second one reduces the partial products to two sums that are added by a final fast adder. Figure 4 shows the process for an 8*8 bit multiplier.

The first part generates 8 partial products of 8 bits, i.e. 64 bits that are the products $A_i B_j$ for $0 \le i \le 7$ and $0 \le j \le 7$. The binary product is implemented by an AND gate.

The reduction of the partial products can use different schemes. The typical one is the Wallace tree, in which 3 lines of partial products are reduced to 2 lines by using 3-input 2-output full adders (and half adders). For 8*8 bit multipliers, there are several steps of parallel additions of lines of partial products: 8 to 6, 6 to 4, 4 to 3, and finally 3 to 2. At that point, a fast adder is used to add the two final lines. The Dadda reduction tree is another scheme [2]. Other reduction operators can be used, such as 4-2 compressors, 7-3 compressors, etc. With the 3-2 reduction scheme, there are 35 FAs and 18 HAs. Considering the final addition, the total is 44 FAs and 19 HAs.
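The sequence of line counts in the binary 3-2 reduction can be reproduced with a few lines of code. The sketch assumes that, at each step, every complete group of 3 lines is reduced to 2 lines and the remaining lines are passed through unchanged; the function name is ours.

```python
def wallace_line_counts(lines: int) -> list[int]:
    """Number of partial-product lines after each 3-2 reduction step."""
    counts = [lines]
    while lines > 2:
        # each full group of 3 lines becomes 2; leftover lines pass through
        lines = 2 * (lines // 3) + lines % 3
        counts.append(lines)
    return counts

print(wallace_line_counts(8))   # [8, 6, 4, 3, 2]
```

This only counts lines; the numbers of full adders and half adders depend on the column heights at each step and are not derived here.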
III-A2 5*5 ternary multiplier
Ternary multipliers use 3 different values (0, 1 and 2). A ternary wire carries more information than a binary one. The information ratio is IR = log(3)/log(2) = 1.585. An M*M ternary multiplier is equivalent to an N*N binary multiplier with M = N/1.585. For N = 8, M = 5.05. A 5*5 trit multiplier thus has slightly less computing capability than an 8*8 bit multiplier. Figure 5 shows the process for a 5*5 trit multiplier.

The ternary product generates one trit (product) and one binary carry according to Table II. The first part generates 5 partial products of 5 trits and 5 partial products of 5 bits for the products $A_i B_j$ with $0 \le i \le 4$ and $0 \le j \le 4$. 10 lines of partial products must be reduced.

A Wallace tree can be used to reduce the 10 partial products down to two final lines to be summed by a fast ternary adder. As shown in Figure 5, the reduction process alternates between a set of 2 ternary lines and 1 binary line (denoted 2 and 1) and a set of 1 ternary line and 2 binary lines. The reduction tree is thus based on the usual ternary full adders (denoted TFA) and ternary half adders (denoted THA). There is one specific case when two successive lines are binary lines. In that case, THAs can be used for a 2 to 1 reduction that only provides one ternary line. The different steps are thus 10 to 7, 7 to 5, 5 to 3 and 3 to 2 for a total of 29 TFAs and 12 THAs. Considering the final addition, the total is 34 TFAs and 12 THAs.
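Table II: Truth table of the 1-trit multiplier (product Pi and binary carry Ci)

| Ai | Bi | Pi | Ci |
| --- | --- | --- | --- |
| 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 0 |
| 0 | 2 | 0 | 0 |
| 1 | 0 | 0 | 0 |
| 1 | 1 | 1 | 0 |
| 1 | 2 | 2 | 0 |
| 2 | 0 | 0 | 0 |
| 2 | 1 | 2 | 0 |
| 2 | 2 | 1 | 1 |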
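Table II is simply the base-3 decomposition of the product Ai*Bi. A minimal sketch (function name ours) that regenerates it:

```python
def trit_multiply(a: int, b: int) -> tuple[int, int]:
    """1-trit multiplier: returns (product trit P, binary carry C)."""
    prod = a * b                 # at most 2 * 2 = 4
    return prod % 3, prod // 3

print("Ai Bi Pi Ci")
for a in range(3):
    for b in range(3):
        p, c = trit_multiply(a, b)
        print(f"{a:2d} {b:2d} {p:2d} {c:2d}")
```

The carry is 1 only for Ai = Bi = 2, which is why the carry lines of the ternary partial products are binary.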
III-A3 Comparing N*N bit multipliers with M*M trit multipliers for N = 8, 12, 16
The comparison includes:

The number of 1-bit multipliers and 1-trit multipliers

The number of binary full and half adders (BFA and BHA) and ternary full and half adders (TFA and THA) for the reduction trees

The final fast adder, which can use techniques such as carry-lookahead adders, carry-skip adders, etc. To keep the comparison simple, we consider the carry-propagate adder (CPA), which simply cascades BFAs or TFAs. While more sophisticated techniques reduce the propagation delay, they are still based on BFAs or TFAs and do not change the number of BFAs or TFAs to be used.

Finally, the FA complexity is equal or close to two times the HA complexity. An equivalent number of FAs can be derived assuming that 1 HA = 0.5 FA.
Table III summarizes the comparison between the 8*8 bit multiplier and the 5*5 trit multiplier.
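Table III: 8*8 bit multiplier versus 5*5 trit multiplier

| | Binary | Ternary | Binary/Ternary |
| --- | --- | --- | --- |
| Ai*Bi | 64 | 25 | 2.56 |
| Reduction FA | 35 | 29 | |
| Reduction HA | 18 | 12 | |
| Final Add FA | 9 | 5 | |
| Final Add HA | 1 | 0 | |
| Total FA | 44 | 34 | |
| Total HA | 19 | 12 | |
| Total equivalent FA | 53.5 | 40 | 1.34 |

Table IV: 12*12 bit multiplier versus 8*8 trit multiplier

| | Binary | Ternary | Binary/Ternary |
| --- | --- | --- | --- |
| Ai*Bi | 144 | 64 | 2.25 |
| Reduction FA | 102 | 102 | |
| Reduction HA | 34 | 18 | |
| Final Add FA | 18 | 10 | |
| Final Add HA | 0 | 0 | |
| Total FA | 120 | 112 | |
| Total HA | 34 | 18 | |
| Total equivalent FA | 137 | 121 | 1.13 |

Table V: 16*16 bit multiplier versus 10*10 trit multiplier

| | Binary | Ternary | Binary/Ternary |
| --- | --- | --- | --- |
| Ai*Bi | 256 | 100 | 2.56 |
| Reduction FA | 200 | 153 | |
| Reduction HA | 54 | 38 | |
| Final Add FA | 24 | 13 | |
| Final Add HA | 1 | 1 | |
| Total FA | 224 | 166 | |
| Total HA | 55 | 39 | |
| Total equivalent FA | 251.5 | 185.5 | 1.36 |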
Using the same methodology, Table IV compares a 12*12 bit multiplier and an 8*8 trit one. Table V compares a 16*16 bit multiplier and a 10*10 trit one. As 12/1.585 = 7.57, the 8*8 ternary multiplier has more computing capability than the 12*12 binary one. As 16/1.585 = 10.1, the 10*10 ternary multiplier has slightly less computing capability than the 16*16 binary one.
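The "Total equivalent FA" rows of Tables III to V follow from the rule 1 HA = 0.5 FA. The sketch below recomputes them from the Total FA and Total HA rows; the dictionary layout and names are ours.

```python
def equivalent_fa(full_adders: int, half_adders: int) -> float:
    """Equivalent number of full adders, assuming 1 HA = 0.5 FA."""
    return full_adders + 0.5 * half_adders

# (Total FA, Total HA) for the binary and ternary multipliers of Tables III to V
CASES = {
    "8*8 bit vs 5*5 trit": ((44, 19), (34, 12)),
    "12*12 bit vs 8*8 trit": ((120, 34), (112, 18)),
    "16*16 bit vs 10*10 trit": ((224, 55), (166, 39)),
}

for name, (binary, ternary) in CASES.items():
    b, t = equivalent_fa(*binary), equivalent_fa(*ternary)
    print(f"{name}: binary {b}, ternary {t}, ratio {b / t:.2f}")
```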
The comparison includes two parts:

The number of 1-bit and 1-trit multipliers, which are $N^2$ for N*N bit multipliers and $M^2$ for M*M trit multipliers. Obviously, $N^2/M^2 = IR^2 = 2.51$. Rounding M to an integer value explains the values 2.56 (Table III and Table V) and 2.25 (Table IV). The complexity of the 1-trit multiplier should not be more than $IR^2 = 2.51$ times that of the 1-bit multiplier for the ternary approach to have an advantage.

The ternary Wallace tree operates on a smaller number of digits (N/IR), but on twice as many partial product lines because the 1-trit multiplier generates a product term and a carry. As a result, there are only slightly more equivalent 1-bit adders than 1-trit adders. The ratio ranges from 1.13 to 1.36 for the cases that we considered. For the ternary approach to have an advantage, the complexity of the ternary full adder should not exceed that of the binary full adder by more than this ratio, which is less than IR.
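The ratio of elementary multiplier counts can be checked numerically. The sketch compares the ideal value IR squared with the ratios obtained for the integer (N, M) pairs considered in this section; the function name is ours.

```python
import math

IR = math.log(3) / math.log(2)   # information ratio per wire

def cell_count_ratio(n_bits: int, m_trits: int) -> float:
    """Ratio of 1-bit multipliers (N*N) to 1-trit multipliers (M*M)."""
    return (n_bits * n_bits) / (m_trits * m_trits)

print(f"IR^2 = {IR ** 2:.2f}")
for n, m in ((8, 5), (12, 8), (16, 10)):
    print(f"N = {n:2d}, M = {m:2d}: N^2/M^2 = {cell_count_ratio(n, m):.2f}")
```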
IV Complexity comparisons
Comparing ternary and binary circuits is not easy. The main reason is that a huge number of binary circuits have been designed, fabricated and used since the first days of integrated circuits, while very few ternary circuits have been fabricated and used. In recent years, while FinFET technologies have been implemented at the 14 nm, 10 nm and even 7 nm technological nodes, only proposals of ternary or multivalued circuits can be found, generally based on simulations. To be able to make significant comparisons, we must define a common technology for both types of circuits and define some complexity measures.
IV-A A technology: CNTFET
A carbon nanotube field-effect transistor (CNTFET) is a field-effect transistor that uses a single carbon nanotube or an array of carbon nanotubes as the channel material instead of bulk silicon as in the traditional MOSFET. The MOSFET-like CNTFETs, which have p and n types, look the most promising. This technology has advantages and drawbacks:

CNTFETs have variable threshold voltages (which vary as an inverse function of the nanotube diameter). Among the advantages, high electron mobility, high current density and high transconductance can be cited.

Lifetime issues, reliability issues, difficulties in mass production and production costs are cited as disadvantages.
We use this technology for several reasons:

It is one of the few proposed ones to overcome the limitations of the FinFET technologies after the end of Moore’s law.

Its variable threshold voltages make it easier to implement the different thresholds that are needed for ternary and multivalued circuits.

The MOSFET-like CNTFETs have the same circuit styles as the CMOS technologies, which means that the comparison results are not limited to that technology.

A large number of CNTFET ternary or m-valued circuits have been proposed in recent years. They facilitate the comparison with the corresponding binary circuits. We will use these proposals for the comparisons.
IV-B Complexity figures
Hardware complexity is difficult to define as many parameters can be considered:

Number of transistors

Number of interconnections

Chip area

Power dissipation

Propagation delays

Etc.
Obviously, the most significant information is the speed, chip area and power dissipation of fabricated chips in a given technology. However, comparing ternary and binary circuits according to chip area and power dissipation is nearly impossible as there are very few or no integrated ternary circuits available for comparison. Comparisons must be done with a simple criterion that is available from the circuit schematic. We use the number of transistors. Although the transistor count is only an estimate, it gives significant insights. In fact, when using the same technology to implement the same operator, it is very doubtful that:

More transistors lead to less interconnects as these transistors are interconnected

More transistors lead to less chip area.

More transistors lead to less power dissipation
Finding counterexamples looks very challenging! When the difference in transistor counts is limited to a few percent, no conclusion can be drawn. However, if the transistor count of ternary circuits is 2 times, 3 times or more that of the equivalent binary circuits while the information ratio is only IR = log(3)/log(2) = 1.585, it can only mean that the ternary circuits have more interconnects, more chip area and more power dissipation than the corresponding binary ones.
V Complexity of ternary and binary adders
V-A Complexity of 1-bit and 1-trit adders
For 1-bit adders, various designs have been proposed using binary CMOS circuitry, which can be considered to implement CNTFET binary 1-bit adders [3]. The transistor counts range from 28T for the conventional CMOS design down to 8T for a scheme using 3T Xor gates. Typical implementations with transmission gates use 14T or 16T. Not all circuits are equivalent: while the conventional CMOS design has maximal noise margins, circuits that use transmission gates or directly connect inputs to the drain or source of transistors can have reduced noise margins. There is no similar comparison of the different ternary 1-trit adders. Our reference is the CNTFET ternary half adder proposed in 2017 [4], which we complete to implement a ternary full adder. Like any multivalued circuit, the ternary half adder uses the general scheme presented in Figure 6. The decoder (a, b) and encoder (c) circuits are presented in Figure 7. The binary sum part and the carry generation part are presented in Figure 8 and Figure 9, respectively. The transistor count for the ternary half adder is 66T.
While the half adder computes Sum10, Sum20 and Cm0 with an implicit input carry equal to 0, similar circuitry can be used to compute Sum11, Sum21 and Cm1 when the input carry is 1. Two multiplexers controlled by the input carry are used to compute the final Sum1, Sum2 and Cm that drive the final encoder and carry circuit. We do not show all the details of the computation. The final count for the full adder is 124T. Table VI summarizes the number of transistors for the ternary adder and different binary adders. The ternary adder has from 4.4 to 15.5 times more transistors than the different binary adders.
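Table VI: Transistor counts of the ternary full adder (3FA) and of different binary full adders

| | 3FA | Nand2 FA | Xor2 FA | 8T FA |
| --- | --- | --- | --- | --- |
| Transistor count | 124 | 28 | 14 to 16 | 8 |
| Ratio 3/2 | | 4.4 | 7.7 to 8.8 | 15.5 |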
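The ratios of Table VI can be recomputed from the transistor counts and compared with the information ratio IR. The sketch below uses only the counts quoted in the text; the variable names are ours.

```python
import math

IR = math.log(3) / math.log(2)   # break-even complexity ratio for adders

TERNARY_FA = 124                 # transistor count of the ternary full adder
BINARY_FA = {                    # (min, max) transistor counts of binary full adders
    "Nand2 FA": (28, 28),
    "Xor2 FA": (14, 16),
    "8T FA": (8, 8),
}

for name, (low, high) in BINARY_FA.items():
    r_min, r_max = TERNARY_FA / high, TERNARY_FA / low
    print(f"{name}: ratio {r_min:.2f} to {r_max:.2f}, above IR: {r_min > IR}")
```

Every ratio is well above IR = 1.585, the value that the ternary full adder would need to stay under to break even.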
Two papers should also be mentioned, which allow a comparison between the binary and the ternary implementations of a carbon nanotube full adder [5][6]. Both use the threshold logic approach with a linear combination of capacitive inputs. The advantage is the reduction of the number of devices needed to combine the inputs. The drawback is a drastic reduction of the noise margins when coherent noises are simultaneously present on the different inputs. If NM is the noise margin for one input, the noise margin for N inputs is NM/N. The binary FA is presented in Figure 10. Considering that the capacitors are implemented with transistors as in Figure 11, it has 11T. The ternary FA is presented in Figure 11: it has 27T. The ratio is 27/11 = 2.45, which is greater than the information ratio IR = 1.585.
V-B Complexity of carry circuitry
V-B1 Ripple-carry adders
As previously mentioned, an M-trit adder is more efficient than an N-bit adder with M = N/IR iff the complexity of the 1-trit adder is no more than 1.585 times the complexity of the 1-bit adder. As shown in Table VI, this is never the case when comparing the best implementations of binary and ternary adders.
V-B2 Carry-Lookahead Adders
We now compare the carry circuitry for 5-trit and 8-bit CLAs. The equations have been given in Section 2.2. The binary Gi and Pi functions are implemented by Nand + inverter and Nor + inverter, respectively. Each function uses 6T. Binary C1 is implemented as $C_1 = \overline{\overline{G_0} \cdot \overline{P_0 C_0}}$, i.e. one inverter and two 2-input Nand gates for a total of 10T. C2, C3 and C4 are implemented by two levels of Nand gates. The transistor count for a 4-bit carry computation is given in Table VII. The transistor count for a 5-trit carry computation is given in Table VIII.
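Table VII: Transistor counts for the binary carry-lookahead computation

| Function | Gi | Pi | C1 | C2 | C3 | C4 | 4-bit | 8-bit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Transistor count | 24 | 24 | 10 | 18 | 28 | 40 | 144 | 288 |

Table VIII: Transistor counts for the ternary carry-lookahead computation

| Function | Gi | Pi | C1 | C2 | C3 | C4 | C5 | 5-trit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Transistor count | 80 | 90 | 10 | 18 | 28 | 40 | 54 | 310 |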
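The implementation of C1 with one inverter and two 2-input Nand gates can be checked exhaustively. The sketch below assumes C1 = G0 + P0.C0 and per-gate counts of 2T for an inverter and 4T for a 2-input Nand; these per-gate counts are our assumption, chosen to be consistent with the 6T (Nand + inverter) and 10T figures quoted in the text.

```python
T_INVERTER = 2   # assumed transistor count of an inverter
T_NAND2 = 4      # assumed transistor count of a 2-input Nand gate

def nand(x: int, y: int) -> int:
    return 1 - (x & y)

def inverter(x: int) -> int:
    return 1 - x

def c1_two_level_nand(g0: int, p0: int, c0: int) -> int:
    """C1 built from one inverter and two 2-input Nand gates."""
    return nand(inverter(g0), nand(p0, c0))

def c1_reference(g0: int, p0: int, c0: int) -> int:
    """Reference carry equation C1 = G0 + P0.C0."""
    return g0 | (p0 & c0)

C1_TRANSISTORS = T_INVERTER + 2 * T_NAND2
print("C1 transistor count:", C1_TRANSISTORS)
```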
It turns out that the carry computations are more costly for a 5trit CLA than for an 8bit CLA. The difference comes from the cost of computing Gi and Pi ternary functions versus the binary ones.
V-B3 Carry-Skip Adders
For an 8-bit CSA, the binary carry computation is composed of two 4-bit skip computations. Each 4-bit computation uses the P0 to P3 functions, a 4-input AND gate and a multiplexer. For a 5-trit CSA, the carry computation uses the P0 to P4 functions, a 5-input AND gate and a multiplexer. The transistor counts are given in Table IX. Again, the ternary approach is more costly due to the cost of the Pi computation.
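Table IX: Transistor counts for the carry-skip computation

| | Pi | Nand + inverter | Mux | 4-bit CS | 8-bit / 5-trit CS |
| --- | --- | --- | --- | --- | --- |
| Binary | 24 | 10 | 14 | 48 | 96 |
| Ternary | 90 | 12 | 14 | | 116 |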
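The totals of Table IX are plain sums of the per-function counts. A short sketch, with names of our choosing, that recomputes them:

```python
def skip_block_transistors(pi_total: int, and_gate: int, mux: int) -> int:
    """Transistor count of one skip block: Pi functions + AND gate + multiplexer."""
    return pi_total + and_gate + mux

binary_4bit = skip_block_transistors(24, 10, 14)     # four Pi, 4-input AND, mux
binary_8bit = 2 * binary_4bit                        # two 4-bit skip blocks
ternary_5trit = skip_block_transistors(90, 12, 14)   # five Pi, 5-input AND, mux

print(binary_4bit, binary_8bit, ternary_5trit)
```

The AND gate and the multiplexer cost almost the same in both cases; the gap comes from the Pi functions.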
V-B4 Conclusion for adders
The reduced number of 1-trit adders cannot compensate for their larger transistor count compared with 1-bit adders. Similarly, the carry computations are more costly for ternary CLAs and CSAs. For CPAs, CLAs and CSAs, the M-trit adders cannot compete with the N-bit adders with M = N/1.585.
VI Complexity of binary and ternary multipliers
Ternary and binary multipliers are decomposed into two parts:

1-trit and 1-bit multipliers

1-trit and 1-bit full adders and half adders that are used in the reduction tree.
VI-A Complexity of 1-bit and 1-trit multipliers
The 1-bit multiplier is implemented with an AND gate, which means 6T (Nand + inverter). The 1-trit multiplier is far more complicated as it generates a product and a carry according to Table II.
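Using the same approach as for the ternary adder, and writing $A^k$ (resp. $B^k$) for the decoded binary signal that is 1 when Ai (resp. Bi) equals k, the equations that follow from Table II are:
$Sum_1 = A^1 B^1 + A^2 B^2$ (1)
$Sum_2 = A^1 B^2 + A^2 B^1$ (2)
$C_{out} = A^2 B^2$ (3)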
The number of transistors for the 1-trit multiplier is 4 (decoder) + 12 (Sum2) + 12 (Sum1) + 6 (product encoder) + 4 (cout encoder) = 38T, i.e. 38/6 = 6.3 times the transistor count of the 1-bit multiplier. While a 5*5 trit multiplier uses 25 1-trit multipliers (950T), an 8*8 bit multiplier uses 64 AND gates for a total of 384T. The ternary/binary ratio is 2.47. There are fewer 1-trit multipliers than 1-bit ones. However, this cannot compensate for their larger complexity.
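The totals for the partial-product generation can be recomputed from the per-cell counts given above. The breakdown of the 1-trit multiplier is taken from the text; the names are ours.

```python
T_AND = 6                                  # 1-bit multiplier: Nand + inverter
TRIT_MULTIPLIER_PARTS = {                  # 1-trit multiplier breakdown
    "decoder": 4,
    "Sum2": 12,
    "Sum1": 12,
    "product encoder": 6,
    "cout encoder": 4,
}
T_TRIT_MUL = sum(TRIT_MULTIPLIER_PARTS.values())

binary_total = 8 * 8 * T_AND               # 64 AND gates for the 8*8 bit multiplier
ternary_total = 5 * 5 * T_TRIT_MUL         # 25 cells for the 5*5 trit multiplier

print(T_TRIT_MUL, binary_total, ternary_total)
print(f"ternary/binary = {ternary_total / binary_total:.2f}")
```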
VI-B Complexity of the reduction tree
We consider the reduction trees for 5-trit multipliers and 8-bit multipliers. The ternary Wallace tree (Figure 5) has 34 TFAs and 14 THAs and the binary Wallace tree (Figure 4) has 35 BFAs and 18 BHAs. For a quick comparison, we can assume that the HA transistor count is half the FA transistor count. Then there are approximately 41 TFAs and 44 BFAs. Obviously, this small difference cannot compensate for the advantage of binary FAs over ternary FAs that was shown in Section 2.4. The issue for ternary multipliers is the carry generated by the 1-trit multiplier, which doubles the number of partial products to be reduced by the Wallace tree.
VI-C Overall complexity
Both the set of elementary multipliers and the Wallace tree need more transistors in the ternary approach than in the binary one. M-trit multipliers cannot compete with N-bit ones with M = N/1.585.
VII Concluding remarks
We have compared ternary and binary adders and multipliers processing the same amount of information. First, we compared the number of elementary cells such as 1-bit/1-trit full adders and 1-bit/1-trit multipliers. This comparison remains valid for any implementation of these cells. Then we considered the hardware complexity using the transistor count for typical implementations of these cells in CNTFET technology. It turns out that neither ternary adders nor ternary multipliers can compete with the binary ones. Although M-trit adders or multipliers have fewer input and output connections than the corresponding N-bit adders or multipliers, the larger number of transistors means that the ternary arithmetic operators have more connections when the internal ones are considered. More transistors mean more connections, more chip area, more propagation delays and more power dissipation for the ternary operators versus the binary ones when using the same technology.
References
[1] D. Etiemble, M. Israel, “Comparison of Binary and Multivalued ICs according to VLSI Criteria”, Computer, Vol. 21, Issue 4, April 1988, pp. 28-42.
[2] W. J. Townsend, E. E. Swartzlander Jr. and J. A. Abraham, “A comparison of Dadda and Wallace multiplier delays”, Proc. SPIE 5205, Advanced Signal Processing Algorithms, Architectures, and Implementations XIII, (24 December 2003); https://doi.org/10.1117/12.507012
[3] R. Anitha, “Comparative study on transistor based full adder designs”, World Scientific News, WSN 53(3) (2016) 404-416, EISSN 2392-2192.
[4] S.K. Sahoo, G. Akhilesh, R. Sahoo and M. Mugkilar, “High Performance Ternary Adder using CNTFET”, IEEE Transactions on Nanotechnology, Vol. 16, No. 3, January 2017.
[5] K. Navi, A. Momeni, F. Sharifi and P. Keshavarzian, “Two novel high speed carbon nanotube Full-Adder cells”, IEICE Electronics Express, Vol. 6, No. 19, pp. 1395-1401, 2009.
[6] R. Faghih Mirzaee and K. Navi, “Optimized Adder Cells for Ternary Ripple-Carry Addition”, IEICE Trans. Inf. & Syst., Vol. E97-D, No. 9, September 2014.