I Introduction
Galois field (GF) arithmetic is used to implement critical arithmetic components in communication and securityrelated hardware. It has been extensively applied in many digital signal processing and security applications, such as Elliptic Curve Cryptography (ECC), Advanced Encryption Standard (AES), and others. Multiplication is one of the most heavily used Galois field computations and is a complexity operation. Specifically, in cryptography systems, the size of Galois field circuits can be very large. Therefore, developing a general formal analysis technique of Galois field arithmetic HW/SW implementations becomes critical. Contemporary formal techniques, such as Binary Decision Diagrams (BDDs), Boolean Satisfiability (SAT), Satisfiability Modulo Theories (SMT), etc., are not efficient to either the verification or reverse engineering of Galois field arithmetic. The limitations of these techniques when applied to Galois field arithmetic have been addressed [1].
The elements in field GF() can be represented using polynomial rings. The field of size is constructed using irreducible polynomial , which includes terms with degree [] and coefficients in GF(2). For example, = is an irreducible polynomial in GF(). The multiplication in the field is performed modulo . Theoretically, there is a large number of irreducible polynomials available for constructing the field arithmetic operations in GF(). However, the choice of irreducible polynomial has great impact on the actual implementation of the resulting GF circuits and the performance of field arithmetic operations. The irreducible polynomials differ in the number of bitlevel XOR operations. It is believed that, in general, the irreducible polynomial with minimum number of elements gives the best performance [2]. However, a later work [3] demonstrates that the best irreducible polynomial from circuit performance point of view varies in different scenarios, and depends on a computer architecture in which it is used, such as ARM vs. IntelPentium. In other words, 1) for GF() multiplication, each irreducible polynomial results in a unique implementation; and 2) for a fixed field size, there exist many irreducible polynomials that could be used for constructing the field in different applications. This provides the main motivation for this work.
Computer algebra techniques with polynomial representations is believed to offer best solution for analyzing arithmetic circuits [1][4][5][6]. These work address the verification problems of Galois field arithmetic and integer arithmetic implementations, including abstractions [4][5][6]. The verification problem is typically formulated as proving that the implementation satisfies the specification. This task is accomplished by performing a series of divisions of the specification polynomial by the implementation polynomials , representing components that implement the circuit. The techniques based on Gröbner Basis demonstrate that this approach can efficiently transform the verification problem to membership testing of the specification polynomial in the ideals [1][5]. A different approach to arithmetic verification of synthesized gatelevel circuits has been proposed, using algebraic rewriting technique, which transforms polynomial of primary outputs to polynomial of primary inputs [6]. The technique proposed in [1] has been specifically applied to large GF() arithmetic circuits. However, the knowledge of irreducible polynomial is essential to verify the implementations.
Symbolic computer algebra methods have also been used to reverse engineer the wordlevel operations for GF circuits and integer arithmetic circuits to speed up the verification performance [7][8][9]. In the work of [7], the authors proposed a original spectral method based on analyzing the internal polynomial expressions during the rewriting procedure. SayedAhmed et al. [8] introduced a reverse engineering technique in Algebraic Combinational Equivalence Checking (ACEC) process using Gröbner Basis by converting the function into canonical polynomials. However, both techniques are applicable to integer arithmetic only. In [9], an abstraction technique is introduced by analyzing the polynomial representation over . However, similarly to [1], it is limited to the implementation with a known irreducible polynomial. In this work, we present a method that is able to reverse engineer the design by extracting the irreducible polynomial of the GF() multiplier, regardless of the GF() algorithm used (e.g. and ). This procedure automatically checks the equivalence between the implementation with a golden implementation constructed using the extracted irreducible polynomial .
Ii Background
Different variants of canonical, graphbased representations have been proposed for arithmetic circuit verification, including Binary Decision Diagrams (BDDs) [10]
, Binary Moment Diagrams (BMDs)
[11], Taylor Expansion Diagrams (TED) [12], and other hybrid diagrams. While the canonical diagrams have been used extensively in logic synthesis, highlevel synthesis and verification, their application to verify large arithmetic circuits remains limited by the prohibitively high memory requirement of complex arithmetic circuits [6][1]. Alternatively, arithmetic verification problems can be modeled and solved using Boolean satisfiability (SAT) or satisfiability modulo theories (SMT). However, it has been demonstrated that these techniques cannot efficiently solve the verification problem of large arithmetic circuits [1] [13]. Popular in industry are Theorem Provers, userdriven deductive systems for proving that an implementation satisfies the specification, using mathematical reasoning. However, Theorem Provers require manual guidance and indepth domain knowledge, which makes it difficult to be applied automatically.Iia Computer Algebra Approaches
The most advanced techniques that have potential to solve the arithmetic verification problems are those based on symbolic Computer Algebra. These methods model the arithmetic circuit specification and its hardware implementation as polynomials [1][4][6][9][14]. The verification goal is to prove that implementation satisfies the specification by performing a series of divisions of the specification polynomial by the implementation polynomials , representing components that implement the circuit. The polynomials are called the bases, or generators, of the ideal . Given a set of generators of , a set of all simultaneous solutions to a system of equations =0; …,=0 is called a variety . Verification problem is then formulated as testing if the specification vanishes on . In some cases, the test can be simplified to testing if , which is known in computer algebra as ideal membership testing [1].
There are two basic techniques to reduce polynomial modulo . A standard procedure to test if is to divide polynomial by the elements of : {}, one by one. The goal is to cancel, at each iteration, the leading term of using one of the leading terms of . If the remainder of the division is 0, then vanishes on , proving that the implementation satisfies the specification. However, if , such a conclusion cannot be made: may not be sufficient to reduce to 0, and yet the circuit may be correct. To check if is reducible to zero, a canonical set of generators, , called Gröbner basis is needed. This technique has been successfully applied to Galois field arithmetic [1] and integer arithmetic circuits [5].
Verification work of Galois field arithmetic has been presented in [1] [9]. These works provide significant improvement compared to other techniques, since their formulations rely on certain simplifying properties in Galois field during polynomial reduction. Specifically, the problem reduces to the ideal membership testing over a larger ideal that includes in . In this paper, we provide comparison between this technique and our approach.
IiB Function Extraction
Function extraction is an arithmetic verification method originally proposed in [6] for integer arithmetic circuits, in . It extracts a unique bitlevel polynomial function implemented by the circuit directly from its gatelevel implementation. Extraction is done by backward rewriting, i.e., transforming the polynomial representing encoding of the primary outputs (called the output signature) into a polynomial at the primary inputs (the input signature). This technique has been successfully applied to large integer arithmetic circuits, such as 512bit integer multipliers. However, it cannot be directly applied to large multipliers because of exponential size of the intermediate number of polynomial terms before cancellations during rewriting. Fortunately, arithmetic circuits offer an inherent parallelism which can be exploited in backward rewriting.
In the rest of the paper, we first show how to apply such parallel rewriting in circuits while avoiding memory explosion experienced in integer arithmetic circuits. Using this approach, we extract the function of each output element in and the function is represented in algebraic expression where all variables are Boolean. Finally, we propose a method to reverse engineer the GF designs by extracting the irreducible polynomial by analyzing these expressions.
IiC Galois Field Multiplication
Galois field (GF) is an algebraic system with a finite number of elements and two main arithmetic operations, addition and multiplication; other operations can be derived from those two [15]. Galois field with elements is denoted as . The most widelyused finite fields are Prime Fields and Extension Fields, and particularly binary extension fields. Prime field, denoted , is a finite field consisting of finite number of integers {}, where is a prime number, with additions and multiplication performed modulo p. Binary extension field, denoted (or ), is a finite field with elements. Unlike in prime fields, however, the operations in extension fields are not computed modulo . Instead, in one possible representation (called polynomial basis), each element of is a polynomial ring with terms with the coefficients in . Addition of field elements is the usual addition of polynomials, with coefficient arithmetic performed modulo 2. Multiplication of field elements is performed modulo irreducible polynomial of degree and coefficients in . The irreducible polynomial is analog to the prime number in prime fields . Extension fields are used in many cryptography applications, such as AES and ECC. In this work, we focus on the verification problem of multipliers.
Two different GF multiplication structures constructed using different irreducible polynomials and , are shown in Figure 1. The integer multiplication takes two bit operands as input and generates a bit word, where the values computed at lower significant bits are carried through the carry chain all the way to the most significant bit (MSB). In contrast, there is no carry propagation in GF() implementations. To represent the result in , the result of the integer multiplication have to be reduced in to only four output bits. The result of such a reduction is shown in Figure 1. In GF(), the input and output operands are represented using polynomials , and , where =, =, =.
The functions of ( [0, 6]) are represented using polynomials in , namely: =, =+, up to =^{1}^{1}1For polynomials in , ”+” is computed as modulo 2.. The outputs ( []) are computed modulo the irreducible polynomial . Using =++1, we obtain : =+, =++, =+++++, and =++++. In digital circuits, partial products are implemented using and gates, and addition modulo 2 is done using xor gates. Note that, unlike in integer multiplication, in circuits there is no carry out to the next bit. For this reason, as we can see in Figure 1, the function of each output bit can be computed independently of other bits.
IiD Irreducible Polynomials
For constructing the field , the irreducible polynomial can be either a trinomial, ++1, or a pentanomial ++++1 [16]. In [16], it is stated that the pentanomial is chosen as irreducible polynomial only if an irreducible trinomial doesn’t exist. In order to obtain efficient GF multiplication algorithm, it is required that  . However, the work of [3] demonstrates that the trinomials are not always better than pentanomials. It means that for a given field size, there could be various irreducible polynomials used in different implementations.
An example of constructing multiplication using two different irreducible polynomials is shown in Figure 1. We can see that each polynomial corresponds to a unique multiplication. The performance difference can be evaluated by counting the XOR operations in each multiplication. Since the number of AND and XOR operations for generating partial products (variables in Fig. 1) is always the same, the difference is only caused by the reduction of the corresponding polynomials modulo . The number of XOR operations in reduction process can be counted as the number of terms in each column minus one. For example, the number of XORs using is 3+1+2+3=9; and using , the number of XORs is 1+2+2+1=6.
As will be shown in the next section, given the structure of the multiplication, such as shown in Figure 1, one can immediatelly identify the irreducible polynomial . This can be done by extracting the terms corresponding to the entry (here ) in the table and generating the irreducible polynomial beyond . We know that must contain , and the remaining terms are obtained from the nonzero terms corresponding to the entry . For the irreducible polynomial , the terms and are obtained by noticing the placement of in columns and . Similarly, for , the terms and are obtained by noticing that is placed in columns and . The reason for it and the details of this procedure will be explained in the next section.
Iii Approach
Iiia Computer Algebraic model
In this approach, the circuit is modeled as a network of logic elements, including: basic logic gates (AND, OR, XOR, INV), and complex standard cell gates (AOI, OAI, etc.) obtained by synthesis and technology mapping. The following algebraic equations are used to describe basic logic gates in [1]:
(1)  
IiiB Outline of the Approach
Similarly to the work of [6], the computed function of the circuits is specified by two polynomials, referred to as output signature and input signature. The output signature of a multiplier is defined as , and . The input signature of a multiplier is = , with coefficients (product terms) , and addition operation performed modulo 2. As discussed in Section II and shown in Figure 1, given an irreducible polynomial , the input signature can be computed easily in . The goal of verification is first to transform the output signature, , using polynomial representation of the internal logic elements, into and then check if = . The following theorem is adopted from [6], where it was initially applied to integer arithmetic circuits in .
Theorem 1 (Correctness): Given a combinational arithmetic circuit, composed of logic gates, described by polynomial expressions (Eq. 1), the input signature computed by backward rewriting is unique and correctly represents the function implemented by the circuit in .
Proof: The proof relies on the fact that each transformation step (rewriting iteration) is correct. That is, each internal signal is represented by an algebraic expression, which always evaluates to a correct value in . This is guaranteed by the correctness of the algebraic model in Eq. (1), which can be proved by inspection. The correctness of the computed signature can be proved by induction on , the step of transforming polynomial into . Assuming that =, and each , it is easy to show that remains in , where each variable in represents output of some logic gate. During the rewriting process, this variable is substituted by a corresponding polynomial in . Hence, the resulting polynomial correctly represents the function . Proof of the uniqueness of the computed signature follows the same reasoning.
The rewriting process is described in Algorithm 1. During the rewriting, the polynomial is simplified by applying mod 2 reduction to all its terms. This is unlike in case, where some terms (with opposite signs) would cancel each other.
The rewriting algorithm takes the gatelevel netlist of a circuit as input and first converts each logic gate into equations using Eq. (1). The rewriting process starts with , proceeds in a topological order of the netlist, and ends when all the variables in are all primary inputs. Each iteration includes two steps: Step 1) (lines 46 of the Algorithm) substitute the variable of the gate output using the expression in the inputs of the gate (Eq.1), and name the new expression ; Step 2) (line 4 and lines 810) simplify the new expression by removing all the monomials and constants that evaluate to 0 in . The algorithm outputs the function of the design in after iterations, where is the number of gates in the netlist.
In addition to verifying the design by comparing the computed polynomial with , the expressions of will be used to extract the irreducible polynomial and perform the verification.
An important observation is that the cancellations of polynomial terms take place only within the expression associated with the same degree of polynomial ring ( is a polynomial ring). In other words, the cancellations resulting the polynomial reduction happen in a logic cone of every output bit independently of other bits, regardless of logic sharing between the cones.
Theorem 2 (Parallelizability): Given a multiplier with = = + + … + ; and = + + … + , where is an algebraic expression in obtained during rewriting. Then, the polynomial reduction is possible only within a single expression , for =1, 2, …, m.
Proof: Consider a polynomial +, where and are simplified in . That is, ), and ). After simplifying each of the two polynomials, there are no common monomials between and . This is because for any element, , for any pairs of and .
=+  
= 

=x 


G0: +    G0:    
G1: +    G1: (+1+)x    
G2: +    G2: (++)x+x    
G3: +    G3: (1+++)x+x  2x  
G4: +    G4: (+1++)x    
G5: +1+    G5: (++1+)x+x  2x  
G6: 1+++1  2  G6: x(++)+2x  2x  
=+, =++ 
Example 1 (Figure 2): We illustrate our method using a postsynthesized 2bit multiplier in , shown in Figure 2. The irreducible polynomial of this design is = . The goal is to extract algebraic expressions of and by rewriting polynomials from the primary outputs to primary inputs, which is done in parallel ( and are rewritten in two threads). The first two transformations rewrite G0 and G1. After this, is rewritten to +, and is rewritten to x+x+x. In the rewriting process, we can see that the polynomial reduction happen when there are monomials that are not in GF(2). For example, during the iteration of rewriting , monomial 2x is eliminated. Also, we can see that the reductions happen only within the logic cone of each output bit, as proved in Theorem 2.
In the following, the outfiled products are the products , such that m. Since these products are associated with bits , they are reduced by .
Theorem 3: Given a multiplication in GF(), let the first outfield product set be . Then, the irreducible polynomial includes , and iff all products in set exist in the algebraic expression of the output bits, where .
Proof: Based on the definition of field arithmetic, the polynomial basis representation of is . To reduce into elements in the range [0, ] (with output bits), the field reductions are performed modulo irreducible polynomial with highest degree of . Based on the definition of irreducible polynomial, is either a trinomial or a pentanomial with degree of . Let be +. Then,
Hence, if exists in , it also exists in .
Example 2 (Figure 2): We illustrate the method of reverse engineering the irreducible polynomial using the 2bit multiplier in , shown in Figure 2. The algorithm is shown in Algorithm 2. Using the rewriting technique (Algorithm 1) based on Theorem 1 and 2, we can extract the algebraic expressions of =, and x = (++)x, hence =++ (lines 3  5). In this example, =2, hence ={}. We can see that both expressions of and include , which means that and are included in the irreducible polynomial of this design (lines 6  7). Based on Theorem 4, we know that is always included (line 2). Hence, irreducible polynomial of this design is = (=2) (line 10).
Iv Results
The technique described in this paper was implemented in C++. It reverse engineers the irreducible polynomials of GF() multiplications by analyzing the algebraic expressions of each element. The program was tested on a number of combinational gatelevel multipliers with different irreducible polynomials including Montgomery multipliers and Mastrovito multipliers. The multiplier generators are taken from [1]. It shows that our technique can successfully reverse engineer the irreducible polynomials of various designs, regardless of the GF() algorithm. The experiments were conducted on a PC with Intel(R) Xeon CPU E52420 v2 2.20 GHz x12 with 32 GB memory.
We first evaluate our approach using Montgomery and Mastrovito multipliers that are implemented using NISTrecommended irreducible polynomials [16]. The experimental results of Mastrovito multipliers with bitwidth varying from 64 to 571 bits is shown in Table I and results of Montgomery multipliers with bitwidth varying from 64 to 283 bits is shown in Table II. Note that we use the flattened version Montgomery multipliers, i.e. we have no knowledge of the block boundaries. The bitwidth of the GF() multiplier is shown in the first column. The irreducible polynomials used for constructing those multipliers are shown in the second column. The number of equations that represent the implementation is in the third column; it is also the number of iterations of extracting the polynomial expression of each output bit.
Our program takes the netlist/equations of the GF() implementations, and the number of threads as inputs. Hence, the users can adjust the parallel effort depending on the hardware resource. In this work, all results are performed in 16 threads. The results in Table I and Table II show that the proposed technique can extract the irreducible polynomial of large multipliers, regardless of the GF algorithm.
bitwidth  Irreducible polynomial P(x)  # eqns  Extraction in 16 threads  

Runtime(s)  Mem  
64  ++++1  21,814  9.2  37 MB 
96  ++++1  51,412  13.4  86 MB 
163  ++++1  153,245  158.9  253 MB 
233  ++1  167,803  244.9  1.5 GB 
283  ++++1  399,688  704.5  4.5 GB 
409  ++1  508,507  1324.7  8.3 GB 
571  ++++1  1628,170  4089.9  27.1 GB 
bitwidth  Irreducible polynomial P(x)  # eqns  Extraction in 16 threads  
Runtime(s)  Mem  
64  ++++1  16.898  42.2  30 MB 
96  ++++1  37,634  228.2  119 MB 
163  ++++1  107,582  1614.8  2.6 GB 
233  ++1  219,022  461.1  4.8 GB 
283  ++++1  322,622  21520.0  7.8 GB 
409  ++1  672,396   
We also apply our technique in the bitoptimized multipliers (Table III). The multipliers are optimized and mapped using synthesis tool ABC [17]. Comparing Table III with Tables I and II, we can see that it takes much less runtime and memory to extract the irreducible polynomials of the bitoptimized multipliers rather than the nonoptimized multipliers. This is because the GF multipliers are implemented without carry chain. As long as the logic cone of each output bit can be reduced, the complexity of extracting the polynomial expressions becomes easier.
Irreducible polynomial  Mastrovitosyn  Montgomerysyn  

Runtime(s)  Mem  Runtime(s)  Mem  
64  ++++1  12.8  25 MB  5.2  20 MB 
163  ++++1  67.6  508 MB  221.4  610 MB 
233  ++1  149.6  1.2 GB  154.4  2.9 GB 
409  ++1  821.6  6.5 GB  855.4  10.3 GB 
One observations is that in Table II, extracting of GF() multiplier requires four times runtime of extracting of GF() multiplier. The reason is that the complexity of the GF multiplication using different irreducible polynomials can be very different. The results shown in Table IV compare the performance of extracting the irreducible polynomials of GF() Mastrovito multipliers for different . Those multipliers are implemented with the polynomials shown in Table IV, which are optimal irreducible polynomials for different computer architectures [3]. We can see that the runtime varies from 233 seconds to 546 seconds, and memory usage varies from 4.8 GB to 11.7 GB. This is because, for different , the total number of XOR operations can be very different, e.g. as for the GF() multiplications discussed in Section IID.
Optimal P(x) in GF()  Runtime(s)  Mem  

IntelPentium  ++++1  546.7  11.7 GB 
ARM  ++1  233.7  5.1 GB 
MSP430  ++++1  511.2  10.9 GB 
NISTrecommended  ++1  244.9  4.8 GB 
The complexity of extracting irreducible polynomial is evaluated using the runtime of extracting polynomial expression of each output bit, and finding (Algorithm 2). The analysis results shown in Figure 4 are based on the GF() multipliers used in Table IV. The xaxis represents the output bit position, and the yaxis shows the runtime of extracting polynomial expression and finding .
V Conclusion
This paper presents a computer algebra based technique that extracts the irreducible polynomial used in the implementation of a multiplier with a given GF(). The method is based on analyzing the unique polynomial expressions of the output bits in Galois field. The experimental results show that our technique is able to extract the irreducible polynomial up to 571bit GF multipliers, regardless of the implementation. We analyze the runtime complexity using various irreducible polynomials.
Acknowledgment: The authors would like to thank Prof. Kalla, University of Utah, for his valuable discussion and the benchmarks; and Dr. Arnaud Tisserand, University Rennes 1 ENSSAT, for his valuable discussion. This work is funded by NSF grants, CCF1319496 and CCF1617708.
References
 [1] J. Lv, P. Kalla, and F. Enescu, “Efficient Grobner Basis Reductions for Formal Verification of Galois Field Arithmatic Circuits,” IEEE Trans. on CAD, vol. 32, no. 9, pp. 1409–1420, September 2013.
 [2] M. Ciet, J.J. Quisquater, and F. Sica, “A short note on irreducible trinomials in binary fields,” in 23rd Symposium on Information Theory in the BENELUX, 2002.
 [3] M. Scott, “Optimal irreducible polynomials for gf (2m) arithmetic.” IACR Cryptology ePrint Archive, vol. 2007, p. 192, 2007.
 [4] E. Pavlenko, M. Wedler, D. Stoffel, W. Kunz, A. Dreyer, F. Seelisch, and G. Greuel, “Stable: A new qfbv smt solver for hard verification problems combining boolean reasoning with computer algebra,” in DATE, 2011, pp. 155–160.
 [5] A. SayedAhmed, D. Große, U. Kühne, M. Soeken, and R. Drechsler, “Formal verification of integer multipliers by combining grobner basis with logic reduction,” in DATE’16, 2016, pp. 1–6.
 [6] M. Ciesielski, C. Yu, W. Brown, D. Liu, and A. Rossi, “Verification of Gatelevel Arithmetic Circuits by Function Extraction,” in 52nd DAC. ACM, 2015, pp. 52–57.
 [7] C. Yu and M. J. Ciesielski, “Automatic wordlevel abstraction of datapath,” in IEEE International Symposium on Circuits and Systems, ISCAS 2016, Montréal, QC, Canada, May 2225, 2016, 2016, pp. 1718–1721.
 [8] A. SayedAhmed, D. Große, M. Soeken, and R. Drechsler, “Equivalence checking using grobner bases,” FMCAD’2016, 2016.
 [9] T. Pruss, P. Kalla, and F. Enescu, “Equivalence Verification of Large Galois Field Arithmetic Circuits using WordLevel Abstraction via Gröbner Bases,” in DAC’14, 2014, pp. 1–6.
 [10] R. E. Bryant, “Graphbased algorithms for boolean function manipulation,” IEEE Trans. on Computers, vol. 100, no. 8, pp. 677–691, 1986.
 [11] R. E. Bryant and Y.A. Chen, “Verification of Arithmetic Functions with Binary Moment Diagrams,” in DAC’95.
 [12] M. Ciesielski, P. Kalla, and S. Askar, “Taylor Expansion Diagrams: A Canonical Representation for Verification of Data Flow Designs,” IEEE Trans. on Computers, vol. 55, no. 9, pp. 1188–1201, Sept. 2006.
 [13] C. Yu, W. Brown, D. Liu, A. Rossi, and M. J. Ciesielski, “Formal verification of arithmetic circuits using function extraction,” IEEE Trans. on CAD of Integrated Circuits and Systems, vol. 35, no. 12, pp. 2131–2142, 2016.
 [14] O. Wienand, M. Wedler, D. Stoffel, W. Kunz, and G.M. Greuel, “An Algebraic Approach for Proving Data Correctness in Arithmetic Data Paths,” CAV, pp. 473–486, July 2008.
 [15] C. Paar and J. Pelzl, Understanding cryptography: a textbook for students and practitioners. Springer Science & Business Media, 2009.
 [16] NIST, “Recommended elliptic curves for federal government use,” 1999.
 [17] A. Mishchenko et al., “Abc: A system for sequential synthesis and verification,” URL http://www. eecs. berkeley. edu/~ alanmi/abc, 2007.