UNSAIL: Thwarting Oracle-Less Machine Learning Attacks on Logic Locking

Logic locking aims to protect the intellectual property (IP) of integrated circuit (IC) designs throughout the globalized supply chain. The SAIL attack, based on tailored machine learning (ML) models, circumvents combinational logic locking with high accuracy and is amongst the most potent attacks, as it does not require a functional IC acting as an oracle. In this work, we propose UNSAIL, a logic locking technique that inserts key-gate structures with the specific aim of confusing ML models like those used in SAIL. More specifically, UNSAIL serves to prevent attacks seeking to resolve the structural transformations of synthesis-induced obfuscation, which is an essential step for logic locking. Our approach is generic; it can protect any local structure of key-gates against such ML-based attacks in an oracle-less setting. We develop a reference implementation for the SAIL attack and launch it on both traditionally locked and UNSAIL-locked designs. In SAIL, a change-prediction model is used to determine which key-gate structures to restore using a reconstruction model. Our study on benchmarks ranging from the ISCAS-85 and ITC-99 suites to the OpenRISC Reference Platform System-on-Chip (ORPSoC) confirms that UNSAIL degrades the accuracy of the change-prediction model and the reconstruction model by an average of 20.13 and 17 percentage points (pp), respectively. When the aforementioned models are combined, which is the most powerful scenario for SAIL, UNSAIL reduces the attack accuracy of SAIL by an average of 11pp. We further demonstrate that UNSAIL thwarts other oracle-less attacks, i.e., SWEEP and the redundancy attack, indicating the generic nature and strength of our approach. Detailed layout-level evaluations illustrate that UNSAIL incurs minimal area and power overheads of 0.26% and 0.61%, respectively, on the million-gate ORPSoC design.


I Introduction

The substantial and continuously increasing manufacturing costs have led most of the semiconductor industry to adopt a fabless business model. Leading semiconductor design houses such as Apple® and Qualcomm® outsource their fabrication to off-shore foundries, which are potentially untrustworthy. Attackers present in the integrated circuit (IC) supply chain can compromise the security of the underlying hardware during fabrication, testing, assembly, and packaging. Such attackers can launch several hardware-focused attacks, including (but not limited to) reverse engineering, illegal overproduction, intellectual property (IP) piracy, and the implantation of malicious circuits known as hardware Trojans [rostami2014primer]. Several design-for-security techniques seek to prevent IP piracy during the untrusted manufacturing stage, such as state-space obfuscation [chakraborty2009harpoon], split manufacturing [patnaik2018concerted, patnaik2018raise], hardware metering [alkabani2007active], and logic locking [epic_journal, chakraborty2019keynote]. The prime focus of this paper is to address the shortcomings of traditional logic locking techniques in offering protection during the untrusted manufacturing stage. The important technical terms used in the paper are defined in Table I to ease readability.

Term | Description
Key-gate | A key-gate is a newly added gate interposed into the design, driven by an original wire from the netlist and a newly introduced key-input k. Only when the correct key-bit value for k is assigned does the key-gate maintain/restore the functionality of the design; otherwise, the design remains locked, i.e., non-functional.
Locked netlist | A netlist where selected nets are locked using key-gates driven by key-inputs (connected to a tamper-proof memory).
Oracle-less attack | An attacker with access only to the GDSII representation of a locked design performs reverse engineering to obtain the locked netlist. The attacker must then circumvent the logic locking scheme and all its key-gates through structural analysis alone. In contrast, the majority of attacks are oracle-guided, i.e., the attacker holds a working chip, which is essential for functional verification.
Subgraph | Locality around a key-gate; different sizes of subgraphs serve to capture different fan-in and fan-out cones of key-gates.
Pre-/Post-subgraph | Pre-/Post-synthesis key-gate subgraph. These are essential to describe the obfuscation of key-gates induced by re-synthesis.
TABLE I: Definition of Common Terms
Fig. 1: Illustration of logic locking using X(N)OR logic gate. (a) Original design. (b) Locked design using an XOR key-gate (light gray) controlled by key-input k1; the correct key is 0. (c) Re-synthesized locked design. The key-gate is transformed to an XNOR key-gate (dark gray), along with localized transformations of other gates, but the correct key remains 0.

I-A Logic Locking

In logic locking, additional key-gates are inserted in the original design to obfuscate its underlying functionality. These key-gates are controlled by key-inputs driven by an on-chip tamper-proof memory. The locked design functions properly only after the correct key is programmed. An example of logic locking is illustrated in Fig. 1. The original design is shown in Fig. 1(a), where a suitable place for key-gate insertion is marked. An XOR key-gate is inserted, which is driven by an original signal from the netlist and the newly introduced key-input k1, as depicted in Fig. 1(b). The correct key-bit for an XOR key-gate must be 0 to maintain the original functionality of the design. However, as one can observe, it would be trivial for an attacker to identify the correct key owing to the one-to-one mapping between the type of key-gate and the corresponding key value.
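The following minimal Python sketch makes this leakage concrete. It models a key-gate as a function of the original wire w and the key-input k; the function names are ours, chosen for illustration. The sketch searches for the key value under which the gate passes w through unchanged. That is exactly the value an attacker can read off from the gate type.

```python
from typing import Callable

KeyGate = Callable[[int, int], int]  # (original wire w, key-input k) -> gate output

def xor_key_gate(w: int, k: int) -> int:
    return w ^ k

def xnor_key_gate(w: int, k: int) -> int:
    return 1 - (w ^ k)

def correct_key(gate: KeyGate) -> int:
    """Return the key-bit for which the key-gate passes w through unchanged."""
    for k in (0, 1):
        if all(gate(w, k) == w for w in (0, 1)):
            return k
    raise ValueError("no key value makes this gate transparent")

# Without re-synthesis, the gate type alone reveals the key-bit.
NAIVE_KEY_BY_TYPE = {"XOR": 0, "XNOR": 1}
```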

To decorrelate this kind of inference/information leakage, logic locking schemes apply obfuscation through iterative rounds of synthesis.¹ We refer to such an obfuscation procedure as “re-synthesis” throughout the paper. The locked and re-synthesized design is shown in Fig. 1(c). Still, without any further efforts, the extent and effectiveness of such obfuscation depend solely on the synthesis tool, whose objectives and metrics are not security-oriented. These and other risks incurred by design tools during the implementation of secure schemes have been acknowledged, e.g., see [yang2019stripped, knechtel2020towards].

¹ Synthesis is a design stage which “compiles” an algorithmic/behavioral description into an optimized hardware implementation consisting of logic gates.
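The sketch below shows why re-synthesis breaks this mapping. It uses one hypothetical transformation in the spirit of Fig. 1(c): the XOR key-gate sits on the output of an inverter, and synthesis absorbs the inverter into the key-gate, which becomes an XNOR. The two netlists are equivalent and the correct key stays 0, yet the type-based guess now says 1. The concrete gates are assumptions for illustration, not the exact circuit of Fig. 1.

```python
from itertools import product

def original(a: int) -> int:
    return 1 - a                      # original logic: an inverter driving the locked wire

def locked_pre_synthesis(a: int, k: int) -> int:
    return (1 - a) ^ k                # XOR key-gate on the inverter output

def locked_post_synthesis(a: int, k: int) -> int:
    return 1 - (a ^ k)                # inverter absorbed: the key-gate is now an XNOR

# Re-synthesis preserves functionality for every input and key value.
EQUIVALENT = all(locked_pre_synthesis(a, k) == locked_post_synthesis(a, k)
                 for a, k in product((0, 1), repeat=2))

# The key that restores the original function is still 0 ...
CORRECT_KEY = next(k for k in (0, 1)
                   if all(locked_post_synthesis(a, k) == original(a) for a in (0, 1)))

# ... but inferring the key from the post-synthesis gate type (XNOR -> 1) now fails.
NAIVE_GUESS = {"XOR": 0, "XNOR": 1}["XNOR"]
```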

Fig. 2: Integration of logic locking in the IC supply chain. An attacker in the fab can launch an oracle-less attack on a locked netlist, obtaining the original design. Such a threat is considered more dangerous than that of an oracle-guided attack, which requires a working chip, as it can be launched by adversaries at an earlier stage in the supply chain. The flow of an oracle-less attack is highlighted in the red dotted rectangle, whereas the flow of an oracle-guided attack is illustrated in the orange dotted rectangle. In this work, we focus on oracle-less attacks only.

Fig. 2 illustrates the integration of logic locking in the IC supply chain. Logic locking is commonly implemented on a synthesized design where the key is the designer’s secret. Post-testing, the correct key is loaded into the chips either by the design house or another trustworthy entity.

I-B Threat Models and Assumptions

Most of the existing research on logic locking assumes an oracle-guided threat model (orange dotted box in Fig. 2). In such a scenario, an attacker has access to (i) a locked netlist and (ii) a functional IC holding the correct key (in a tamper- and access-proof memory), useful for functional verification of any attack inference. An attacker can obtain the locked netlist by reverse-engineering the layout of a chip, and another chip can be used as an oracle for functional verification. However, these chips must be obtained from the open market; they become available only sometime after fabrication.

Another threat arises from fab-based adversaries who have access to all the structural information of an IC during manufacturing but do not possess an activated, working IC required for functional verification. That is, if adversaries can devise attacks using only the locked netlist, such oracle-less attacks could compromise the security offered by logic locking very early in the supply chain, which represents a more potent threat model. The red dotted box in Fig. 2 indicates the flow of an oracle-less attack.

I-C Scope of This Work

This work is motivated by the recent emergence of oracle-less, machine learning (ML)-based attacks on logic locking. Our objective is to develop an effective technique to derail the learning of ML-based frameworks, leading to low accuracy of such otherwise powerful attacks. We propose UNSAIL, a defense mechanism that can be integrated with any traditional logic locking technique to protect it against learning-based oracle-less attacks, mainly the SAIL [chakraborty2018sail] and SWEEP [alaql2019sweep] attacks.

Again, the main focus of this work on UNSAIL is to thwart oracle-less attacks and protect the design during the untrusted manufacturing stage. Nevertheless, UNSAIL can be readily integrated with a SAT-attack resilient locking technique to achieve a two-layer defense that protects against both oracle-guided and oracle-less attacks.

The primary contributions of this work are as follows:

  1. We implement a framework for the UNSAIL defense which can be easily integrated with any combinational logic locking technique and any design-tool suite (e.g., Synopsys Design Compiler).

  2. We develop a reference framework of SAIL and implement the algorithms as outlined in [chakraborty2018sail].

  3. We perform a thorough and detailed analysis of our proposed UNSAIL technique. To that end, we have studied the effect of different types of key-gates, different key-sizes, and other key-gate insertion algorithms. We also study the effect of randomizing the selection of key-gates when securing against different oracle-less attacks.

Through our elaborate experimental study, the effectiveness of UNSAIL for protecting combinational logic locking techniques against the oracle-less ML-based SAIL [chakraborty2018sail] and SWEEP [alaql2019sweep] attacks is showcased. To the best of our knowledge, no other defense mechanisms have been proposed to mitigate these potent attacks. Additionally, our experiments also demonstrate that UNSAIL can thwart non-ML-based oracle-less attacks such as the redundancy attack [li2019piercing].

More specifically, five sets of comprehensive experiments are performed to validate the effectiveness of UNSAIL: (i) evaluating the change-prediction model of SAIL; (ii) evaluating the reconstruction model of SAIL; (iii) evaluating the full SAIL attack, where both models are used in conjunction; (iv) evaluating the SWEEP attack; and (v) evaluating the redundancy attack. Throughout our experiments using benchmarks from the ISCAS-85 and ITC-99 suites and the OpenRISC Reference Platform System-on-Chip (ORPSoC), the proposed defense is shown to reduce the accuracy of SAIL's change-prediction model, reconstruction model, and full attack by an average of 20.13, 17, and 11 percentage points (pp), respectively. In addition, the accuracy of the SWEEP attack drops by an average of 15pp, showcasing that our defense is resilient against another powerful oracle-less, learning-based attack. On average, the redundancy attack recovers 38% of UNSAIL's key-bits, demonstrating that UNSAIL also protects against this non-ML-based oracle-less attack. Layout-level evaluations show that our defense incurs minimal area and power overheads of 0.26% and 0.61%, respectively, on the million-gate ORPSoC design.

The remainder of this paper is organized as follows. The landscape of attack and defense strategies for logic locking is reviewed in Sec. II. Details about the relevant prior art of oracle-less ML-based attacks on logic locking are provided in Sec. III. Section IV presents the concept of our proposed UNSAIL scheme, whereas the implementation details of both the SAIL attack and the UNSAIL defense are given in Sec. V. The experimental setup is described in Sec. VI, and our detailed experimental study is presented in Sec. VII. Section VIII provides a discussion, and we conclude in Sec. IX.

II Logic Locking Techniques and Related Attacks

II-A Brief Overview of Logic Locking

Traditional Logic Locking. Early research in logic locking focused on finding suitable places for the insertion of key-gates. Researchers proposed several key-gate insertion algorithms such as random logic locking (RLL) [epic_journal], fault analysis-based logic locking (FLL) [JV-Tcomp-2013], and strong/secure logic locking (SLL) [yasin_TCAD_2016]. Researchers also investigated the use of different logic gates for obfuscation, e.g., X(N)OR gates [epic_journal, JV-Tcomp-2013], AND/OR gates [dupuis2014novel], multiplexers (MUXes) [JV-Tcomp-2013], etc. One advantage of such schemes is the high output corruption induced upon the application of incorrect keys.

SAT-Attack Resilient Logic Locking. An adversary with access to a functional IC can launch several attacks on the aforementioned logic locking techniques. The most potent oracle-guided attack is the SAT-based attack, which compromised all existing logic locking schemes at that time [Subramanyan_host_2015]. In response to this powerful attack, researchers began developing SAT-attack resilient solutions such as Anti-SAT [CHES2016YANGXIE], stripped functionality logic locking (SFLL) [yasin_CCS_2017], and SFLL-fault [sengupta2018atpg]. With each newly introduced SAT-attack resilient scheme, tailored oracle-guided attacks emerged, such as Double-DIP [shen2017double], AppSAT [shamsi2017appsat], etc.

II-B Oracle-Less Attacks on Logic Locking

The initial research was primarily focused on protecting logic locking from the SAT-based and other derivative attacks, which require an oracle. Recently, various oracle-less attacks have been proposed that rely only on structural properties of the locked design. They can be classified as follows.

Fig. 3: High-level explanation for SAIL [chakraborty2018sail]. SAIL seeks to learn the structural changes induced by re-synthesis. Such a trained SAIL model can then revert the changes and recover the pre-synthesis locked design, where the correct key-bit values can be obtained by analyzing the types of key-gates.

Oracle-Less Attacks on SAT-Attack Resilient Logic Locking. Such attacks focus on structural properties of SAT-attack resilient techniques and attempt to circumvent their security promise by identifying and removing the added protection logic, thereby isolating the original circuit cone. Examples include the SFLL-hd–unlocked attack [yang2019stripped], the functional reverse engineering-based attack [alrahis2019functional], and the functional analysis attacks [sirone2020functional]. One of the drawbacks of SAT-attack resilient locking techniques [CHES2016YANGXIE, yasin_CCS_2017, sengupta2018atpg] is that they thwart the SAT-based attack [Subramanyan_host_2015] by inducing a low output corruption for incorrect key-assignments. As a result, SAT-attack resilient locking techniques are usually integrated with a high-corruptibility locking technique such as RLL or FLL [yasin_CCS_2017], to achieve security against removal attacks. Such an integration is commonly referred to as two-layer locking. In order to recover the original design, both layers must be broken.

Oracle-Less Attacks on Traditional Logic Locking. This category includes the de-synthesis attack [massad2017logic], the redundancy attack [li2019piercing], and the ML-based SAIL [chakraborty2018sail] and SWEEP [alaql2019sweep] attacks, all of which target traditional logic locking. For example, with the redundancy attack by Li et al. [li2019piercing], the authors observed that incorrect assignment(s) of the key-bit value(s) result in more redundancies in the netlist, enabling them to prune out incorrect key-assignments.

Most relevant for this work, SAIL [chakraborty2018sail] leverages ML to learn localized structural changes induced by re-synthesis for obfuscation of logic locking (see Fig. 3). SWEEP [alaql2019sweep] utilizes a feature weighting algorithm to learn and perform a mapping between design features and the correct key. Both these attacks are discussed in more detail next.

III Prior Work on Oracle-Less ML-Based Attacks on Traditional Logic Locking

Fig. 4: The SAIL attack flow [chakraborty2018sail].

III-A SAIL [chakraborty2018sail]

In SAIL [chakraborty2018sail], the authors have shown that ML models can learn and revert the structural changes induced by re-synthesis, for traditional logic locking techniques (e.g., RLL, FLL, and SLL). Often there is a direct correspondence between the type of key-gate and the key-bit value, as illustrated earlier in Fig. 1. Hence, logic locking techniques seek to apply structural changes for all the key-gates using iterative re-synthesis, thereby obfuscating the correspondence between the type of key-gate and the key-bit value.

Utilizing the locked design $C'(\mathbf{X}, \mathbf{K})$, SAIL seeks to recover the original netlist $C(\mathbf{X})$, where $\mathbf{X}$ is the vector of regular inputs and $\mathbf{K}$ is the vector of additional key-inputs. The notion of SAIL is to learn the structural changes introduced on key-gates by deterministic design tools, logic synthesis in particular. The authors of [chakraborty2018sail] have shown that SAIL succeeds in (i) retrieving the locked design before synthesis, and (ii) obtaining the key values by analyzing the type of retrieved key-gates. A conceptual example is illustrated in Fig. 3.

The flow of SAIL is shown in Fig. 4. In addition to the locked netlist C’, the attacker also requires knowledge of the underlying logic locking technique and the synthesis setup. To generate training data, another round of locking is first applied on top of the locked design, providing C’’. Local structures around the newly inserted key-gates, which are identified by their connection to the newly added key-inputs, are extracted as pre-synthesis subgraphs, or pre-subgraphs for short, denoted as set S. Next, the design C’’ is synthesized and, similarly, the structures around the key-gates connected to the newly added key-inputs are extracted as post-synthesis subgraphs, or post-subgraphs for short, denoted as set S’. This procedure is repeated multiple times to generate an extensive data set of pre-subgraphs and post-subgraphs of various sizes.² This data set is then used to train two ML models: a change-prediction model and a reconstruction model. Given a key-gate and some surrounding structure from the locked design under attack, the change-prediction model shall predict whether the related subgraph went through a structural change due to re-synthesis. If a change is predicted, the reconstruction model shall revert it, providing the original, un-obfuscated key-gate. More details on the two ML models are discussed next.

² Some examples of subgraphs can be found in Fig. 5 and Fig. 6.
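The pre- and post-subgraphs collected in one round can be paired into the two training sets as follows: [S’, change indicator] for the change-prediction model and [S’, S] for the reconstruction model. This minimal sketch assumes that each subgraph is encoded as a tuple of gate types and that S and S’ are aligned by key-input. The gate types and the function name are illustrative, not taken from [chakraborty2018sail].

```python
from typing import List, Sequence, Tuple

Subgraph = Tuple[str, ...]  # gate types of a subgraph, in a fixed traversal order

def build_training_sets(pre: Sequence[Subgraph], post: Sequence[Subgraph]):
    """Pair pre-subgraphs S with post-subgraphs S' extracted for the same key-inputs."""
    if len(pre) != len(post):
        raise ValueError("expected exactly one pre- and one post-subgraph per key-input")
    # Change-prediction data: post-subgraph plus a Boolean change indicator.
    change_set: List[Tuple[Subgraph, bool]] = [
        (s_post, s_pre != s_post) for s_pre, s_post in zip(pre, post)]
    # Reconstruction data: post-subgraph (input) and pre-subgraph (target).
    recon_set: List[Tuple[Subgraph, Subgraph]] = [
        (s_post, s_pre) for s_pre, s_post in zip(pre, post)]
    return change_set, recon_set

# Two key-gates of one (toy) locking round: the first is transformed, the second is not.
pre = [("NOT", "XOR", "AND"), ("AND", "XOR", "OR")]
post = [("XNOR", "AND"), ("AND", "XOR", "OR")]
change_set, recon_set = build_training_sets(pre, post)
```

In SAIL, this pairing would be repeated over many locking-and-synthesis rounds to accumulate the extensive data set described above.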

Change-Prediction Model.

This model is built using a Random Forest (RF), an ensemble-based model used for classification or regression problems. In general, an RF is composed of multiple decision trees trained independently, and the model outputs the class voted for by the most individual trees. For SAIL, the sets S and S’ are compared, providing a Boolean change indicator. The set [S’, change indicator] is then provided to the RF classifier, in which each decision tree is trained using a separately bootstrapped sample of the data set [efron1992bootstrap]. Moreover, as is common practice for RF [breiman2001random], a random subset of the available attributes is chosen at each node of each tree to determine the split. However, further details, such as the number of decision trees employed, have not been provided in [chakraborty2018sail].
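The two RF ingredients named above can be sketched in plain Python: a bootstrapped training sample per tree and a random attribute subset per split. To stay short, each tree here is a depth-1 decision tree (a stump) over binary, e.g., one-hot encoded, features. This is a simplification for illustration, not SAIL's classifier, whose tree count and depth are not reported.

```python
import random
from collections import Counter
from typing import Callable, Iterable, List, Sequence, Tuple

Sample = Tuple[Tuple[int, ...], bool]     # (binary feature vector, change indicator)
Model = Callable[[Tuple[int, ...]], bool]

def majority(labels: Iterable[bool]) -> bool:
    return Counter(labels).most_common(1)[0][0]

def train_stump(data: List[Sample], features: Sequence[int]) -> Model:
    """Depth-1 tree: split on the candidate feature that classifies `data` best."""
    fallback = majority(y for _, y in data)
    best = None
    for f in features:
        left = [y for x, y in data if x[f] == 0]
        right = [y for x, y in data if x[f] == 1]
        pred_l = majority(left) if left else fallback
        pred_r = majority(right) if right else fallback
        hits = sum(y == pred_l for y in left) + sum(y == pred_r for y in right)
        if best is None or hits > best[0]:
            best = (hits, f, pred_l, pred_r)
    _, f, pred_l, pred_r = best
    return lambda x: pred_r if x[f] == 1 else pred_l

def train_forest(data: List[Sample], n_trees: int = 25,
                 n_feats: int = 0, seed: int = 0) -> Model:
    rng = random.Random(seed)
    dim = len(data[0][0])
    n_feats = n_feats or max(1, int(dim ** 0.5))   # common default: sqrt(#attributes)
    trees = []
    for _ in range(n_trees):
        boot = [rng.choice(data) for _ in data]    # bootstrap sample (with replacement)
        feats = rng.sample(range(dim), n_feats)    # random attribute subset for the split
        trees.append(train_stump(boot, feats))
    return lambda x: majority(t(x) for t in trees)  # majority vote over all trees
```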

Reconstruction Model.

This model consists of a multi-layer, multi-channel neural network. The model is trained using the set [S’, S]; however, the details of the network structure, training algorithm, or weight initialization have not been provided in [chakraborty2018sail]. Training data generation and training must be repeated for each new design and each new logic locking scheme, while the form of the training data remains the same for both the change-prediction and the reconstruction model.

III-B SWEEP [alaql2019sweep]

SWEEP [alaql2019sweep] is a constant-propagation attack developed to circumvent MUX-based logic locking [JV-Tcomp-2013]. In MUX-based logic locking, the key-gates are commonly 2:1 MUXes, where the “true input” is connected with the intended signal of the original design, while the “false input” is connected to another signal of the original design [JV-Tcomp-2013]. As the true input can be easily made either to be the first or the second input pin of the MUX, the key-bit provided at the select pin of the MUX can be accordingly either 0 or 1—there is no inherent information leakage as there is no fixed correspondence of correct key-bit values and MUX key-gates.

Still, SWEEP succeeded in learning the synthesis-induced structural changes as follows. The attack is aware of (i) the obfuscation algorithm leveraged for selecting signals for the true and false inputs, and (ii) the locked design. In a training stage, the attack uses the obfuscation algorithm to generate some locked designs with known correct key-bit values. Next, an iterative procedure is followed: each key-input is visited twice, once tied to the correct and once to the incorrect key value as a constant, and the locked design is synthesized for both cases. A set of features is extracted from the synthesis reports obtained for both cases (correct versus incorrect key-bit assignment). A feature weighting algorithm is utilized to learn the correlation between the extracted features and the correct key-bit values. Once training is completed, the same constant-propagation technique used in the training stage is utilized to extract the features from the design under attack. Finally, the correct key-bit values are identified using the weighted function generated from the training data.
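The sketch below illustrates the general idea of learning from synthesis-report features with known key-bits. Each key-input is described by the per-feature difference between the reports obtained with the key-bit tied to 1 and tied to 0. The weighting rule is a simple stand-in chosen for illustration: the average difference, signed by the known key-bit. It is not the feature weighting algorithm of [alaql2019sweep]. The feature names and training values are hypothetical toy data.

```python
from typing import Dict, List

Features = Dict[str, float]  # metrics parsed from a synthesis report (hypothetical names)

def feature_delta(tied_to_1: Features, tied_to_0: Features) -> Features:
    """Per-feature difference between the two constant-propagated syntheses."""
    return {name: tied_to_1[name] - tied_to_0[name] for name in tied_to_1}

def learn_weights(deltas: List[Features], keys: List[int]) -> Features:
    """Average each delta feature, signed by the known key-bit (+1 for 1, -1 for 0)."""
    weights = {name: 0.0 for name in deltas[0]}
    for delta, key in zip(deltas, keys):
        sign = 1.0 if key == 1 else -1.0
        for name, value in delta.items():
            weights[name] += sign * value / len(deltas)
    return weights

def predict_key(delta: Features, weights: Features) -> int:
    score = sum(weights[name] * delta[name] for name in weights)
    return 1 if score > 0 else 0

# Toy training data (assumed): tying a key-input to its correct value shrinks the design.
train = [feature_delta({"area": 95.0, "power": 9.0}, {"area": 100.0, "power": 10.0}),
         feature_delta({"area": 104.0, "power": 10.5}, {"area": 100.0, "power": 10.0})]
weights = learn_weights(train, keys=[1, 0])
```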

IV Proposed Defense

Fig. 5: High-level concept of UNSAIL. Left: training SAIL models after incorporating UNSAIL. The same subgraph has two distinct labels, fundamentally undermining training efforts for SAIL. Right: accordingly, SAIL cannot determine whether the subgraph went through change or not, which is an essential step for the attack’s efficacy.

In this work, we propose UNSAIL, a defense scheme that aims to confuse oracle-less ML-based attacks on logic locking, particularly but not exclusively the ML models used in SAIL.

In general, the quality of training data determines the accuracy and performance of any ML system. Thus, a specific goal for UNSAIL is to inject “bad data” into the learning process. As shown in Fig. 5, our key idea is to replicate data that evokes different attack responses. Specifically, we introduce identical subgraphs in the final locked netlist that could readily be classified as either “Changed” or “Unchanged,” thereby confusing both the change-prediction and the reconstruction model of SAIL. In other words, the “bad data” injected by UNSAIL propagates through SAIL, feeding into its ML models and inducing flawed inferences.
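A short sketch quantifies why identical subgraphs with conflicting labels are so damaging. Any deterministic classifier must give identical inputs the same prediction. On a data set containing such duplicates, its accuracy is therefore bounded by the share of majority labels per distinct input. The function below computes this bound; the subgraph labels g1 and g2 are placeholders.

```python
from collections import Counter, defaultdict
from fractions import Fraction
from typing import Hashable, Iterable, Tuple

def best_achievable_accuracy(samples: Iterable[Tuple[Hashable, bool]]) -> Fraction:
    """Upper bound on the accuracy of any deterministic classifier on `samples`.

    Identical inputs must receive identical predictions, so the best a classifier
    can do for each distinct subgraph is to predict its most frequent label.
    """
    labels_per_input = defaultdict(Counter)
    total = 0
    for subgraph, changed in samples:
        labels_per_input[subgraph][changed] += 1
        total += 1
    hits = sum(c.most_common(1)[0][1] for c in labels_per_input.values())
    return Fraction(hits, total)

# Before UNSAIL: each subgraph carries a single label ("Changed" = True).
before = [("g1", True)] * 10 + [("g2", False)] * 10
# After UNSAIL: identical copies of g1 are injected with the opposite label.
after = before + [("g1", False)] * 10
```

When the injected copies balance the labels of every subgraph, the bound drops to 1/2. For a binary change indicator, that is no better than guessing.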

Fig. 6: Example of UNSAIL integrated with X(N)OR logic locking scheme. (a) Original design. (b) Locked design using one XOR key-gate; the correct key value is k1=0. The gates highlighted in light gray represent a pre-subgraph of size sub=3. (c) Post-synthesized design. Based on the observed subgraph, UNSAIL searches for a NAND-OR structure suitable to insert an additional, corresponding key-gate, namely an XNOR. (d) Final locked design with two identical subgraphs, one generated by synthesis and one added by UNSAIL. The correct keys are k1=0, k2=1.

Next, we outline the working principle of UNSAIL. A motivational example is shown in Fig. 6. Note that more implementation details for UNSAIL are also given in Sec. V-B.

First, a design is locked traditionally, using any combinational logic locking technique of choice, with only a subset of all the desired key-gates being inserted at this stage (say, K/2). This locked design is then passed through synthesis, which transforms some of the key-gates and surrounding structures (i.e., the subgraphs). For the remaining key-gates (say, the other K/2), UNSAIL then carefully “injects” identical subgraphs into that post-synthesized design. Essentially, there are two parts to this. First, we tackle the subgraphs that remained unchanged during synthesis. For those, we revisit the synthesis stage and add transformable UNSAIL key-gate structures matching the specific sets of gates that remained unchanged so far. In other words, we want synthesis to work on UNSAIL structures, which will then undergo changes. Second, we achieve a similar effect for the structures that already went through change during the earlier synthesis step by adding the same structures to the post-synthesized design. The newly added structures are not generated by the synthesis tool and did not undergo any structural change. Since the resulting UNSAIL-locked design now contains genuinely transformed subgraphs and “injected” ones in sufficiently large numbers, SAIL will fail to identify structural changes with confidence.

V Methodology

V-A Implementation of SAIL

The concept of SAIL was presented in [chakraborty2018sail]; we have also reviewed it in Sec. III. However, the precise setup details were not provided in [chakraborty2018sail]. Thus, in this work, we implement SAIL to the best of our understanding. As it forms an essential baseline for our work, we discuss our SAIL implementation in some detail.

We encode the subgraphs as vectors that describe the order of gates in the subgraph structures. We consider the following sizes of subgraphs, or sub-sizes for short: sub=3, sub=5, and sub=6. A subgraph is extracted and encoded as an individual vector for each key-input in the locked design for all sub-sizes. For example, if the design is locked using K=64 and sub-sizes of sub=3, sub=5, and sub=6 are considered, then 64 subgraphs/vectors are extracted from the locked design for each sub-size, or 192 subgraphs/vectors in total. We note that in SAIL, the authors considered sub-sizes from sub=3 up to sub=10, and they observed that (a) the accuracy of the SAIL classifier increased with the sub-size but also (b) the average accuracy saturated for sub-size of sub=5 to sub=6 [chakraborty2018sail].
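The exact traversal that orders gates within a subgraph is not specified, so the sketch below makes an assumption. It collects the sub gates closest to a key-gate by breadth-first search over both fan-in and fan-out, and encodes them as a vector of gate types. The dictionary-based netlist format is hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical netlist format: gate name -> (gate type, fan-in names).
# Names absent from the dictionary (e.g., "a", "k1") are primary or key inputs.
Netlist = Dict[str, Tuple[str, List[str]]]

def extract_subgraph(netlist: Netlist, key_gate: str, sub: int) -> List[str]:
    """Encode the `sub` gates closest to `key_gate` as a vector of gate types."""
    fanout: Dict[str, List[str]] = {g: [] for g in netlist}
    for gate, (_, fanins) in netlist.items():
        for src in fanins:
            if src in netlist:
                fanout[src].append(gate)
    seen, vector = {key_gate}, []
    queue = deque([key_gate])
    while queue and len(vector) < sub:           # breadth-first over fan-in and fan-out
        gate = queue.popleft()
        vector.append(netlist[gate][0])
        neighbours = [s for s in netlist[gate][1] if s in netlist] + fanout[gate]
        for nxt in sorted(neighbours):           # sorted for a deterministic order
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return vector

netlist: Netlist = {
    "g1": ("NAND", ["a", "b"]),
    "g2": ("XOR", ["g1", "k1"]),   # key-gate driven by key-input k1
    "g3": ("OR", ["g2", "c"]),
    "g4": ("AND", ["g3", "d"]),
}
```

Calling such a function once per key-gate and per sub-size yields the 64 × 3 = 192 vectors of the K=64 example above.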

After extracting all vectors, we apply a one-hot encoding on them, and we feed the encoded vectors to a change-prediction classifier model that we built as described in [chakraborty2018sail] and reviewed in Sec. III. Concerning the reconstruction model, the authors in [chakraborty2018sail] described it merely as a multi-input multi-channel network; however, no details were given regarding the type of the network used nor regarding its dimensions. In order to reproduce the results of SAIL, we had to implement and test several network types for the reconstruction model, such as feedforward neural networks and recurrent neural networks. The model that showed the best results was a sequence-to-sequence (Seq2Seq) encoder-decoder model with attention.³

³ In general, such models consist of an encoder processing the input and a decoder producing the output, while the attention mechanism allows the decoder to focus on the relevant parts of the encoded input when generating the translation. Such models are commonly used in automated translation; for our work, the goal is to translate post-subgraphs into pre-subgraphs.
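Here is a minimal sketch of the one-hot encoding step. It assumes that each subgraph is a sequence of gate-type strings and that vectors are zero-padded to a fixed number of positions per sub-size; the padding scheme and function names are our assumptions.

```python
from typing import Dict, List, Sequence

def build_vocabulary(subgraphs: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Map every gate type seen in the data set to a column index."""
    types = sorted({t for s in subgraphs for t in s})
    return {t: i for i, t in enumerate(types)}

def one_hot(subgraph: Sequence[str], vocab: Dict[str, int], length: int) -> List[int]:
    """Concatenate one one-hot block per position; missing positions stay all-zero."""
    vec = [0] * (length * len(vocab))
    for pos, gate_type in enumerate(subgraph[:length]):
        vec[pos * len(vocab) + vocab[gate_type]] = 1
    return vec

vocab = build_vocabulary([["XOR", "NAND", "OR"], ["XNOR", "AND"]])
vec = one_hot(["XNOR", "AND"], vocab, length=3)
```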

We implemented the encoder using an embedding layer and two long short-term memory (LSTM) layers. The embedding converts the textual encoded vectors (representing subgraphs) into vectors of real numbers. The decoder model is implemented using two LSTMs with attention.

The reconstruction model is an ensemble model; hence, we have trained three such models for the different sub-sizes considered (sub=3, sub=5, and sub=6). The models were combined using a cumulative voting scheme. More implementation and setup details are provided in Sec. VI.
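The precise cumulative voting rule is not described. One plausible reading, sketched below as an assumption, sums the scores that the three sub-size models assign to each candidate output and picks the highest total. Unlike a plain majority vote, this lets one confident model outweigh two uncertain ones. The candidate labels are placeholders.

```python
from collections import defaultdict
from typing import Dict, Hashable, List

def cumulative_vote(predictions: List[Dict[Hashable, float]]) -> Hashable:
    """Sum the scores each model assigns to each candidate; return the top candidate."""
    totals: Dict[Hashable, float] = defaultdict(float)
    for model_scores in predictions:
        for candidate, score in model_scores.items():
            totals[candidate] += score
    return max(totals, key=totals.get)

# Scores from three hypothetical models (sub=3, sub=5, sub=6).
preds = [{"A": 0.55, "B": 0.45}, {"A": 0.55, "B": 0.45}, {"A": 0.0, "B": 1.0}]
```

Here a plain majority vote would return "A", whereas the cumulative totals (1.1 versus 1.9) favor "B".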

V-B Implementation of UNSAIL

Fig. 7 illustrates the UNSAIL flow, which can be integrated with any combinational logic locking technique. To protect the original design C using a key-size of K, C is first locked using a subset of the K key-bits, following the logic locking technique of the designer’s choice. Next, the locked design C’ is passed through synthesis to obtain an obfuscated design. The local structures around all key-gates (i.e., subgraphs) are obtained from both C’ and the obfuscated design and compared to detect any changes induced by re-synthesis. The subgraphs that went through synthesis-induced changes are selected and stored in a dictionary data structure. Note that the subgraphs that did not change during synthesis are also stored for later use, in a separate set U.
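The comparison step can be sketched as follows, assuming subgraphs are aligned by key-input and encoded as tuples of gate types. Counting how often each (pre, post) transformation occurs is our own addition, which could help decide how many copies to inject. The function name is hypothetical.

```python
from collections import Counter
from typing import Sequence, Set, Tuple

Subgraph = Tuple[str, ...]

def split_by_change(pre: Sequence[Subgraph], post: Sequence[Subgraph]):
    """Return (dictionary of changed (pre, post) pairs with counts, set U of unchanged)."""
    changed = Counter()                 # dictionary: (pre-subgraph, post-subgraph) -> count
    unchanged: Set[Subgraph] = set()    # set U
    for s_pre, s_post in zip(pre, post):
        if s_pre != s_post:
            changed[(s_pre, s_post)] += 1
        else:
            unchanged.add(s_pre)
    return changed, unchanged

pre = [("NOT", "XOR"), ("AND", "XOR"), ("NOT", "XOR")]
post = [("XNOR",), ("AND", "XOR"), ("XNOR",)]
dictionary, U = split_by_change(pre, post)
```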

Fig. 7: Integration of UNSAIL with combinational logic locking.

Next, the dictionary is used to guide the insertion of the remaining key-gates. The search is carried out until all the remaining key-bits are assigned to key-gates. Note that if some particular entry cannot be found in the locked netlist, we can easily “fill up” the remaining key-bits by leveraging more instances of other entries found previously. Furthermore, some of the UNSAIL key-gate structures are chosen such that synthesis transforms them into the specific sets of gates stored in U, further confusing the learning of the ML models.
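The dictionary-guided search with its fill-up rule can be sketched as a round-robin assignment. The sketch assumes that, for every candidate site in the netlist, we already know which local structure a key-gate inserted there would create; computing that is design-specific and omitted. Structure labels and site names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

def plan_insertions(targets: List[str], sites: Dict[str, str],
                    n_remaining: int) -> List[Tuple[str, str]]:
    """Assign remaining key-bits to sites that would replicate dictionary structures.

    targets: structures taken from the dictionary (hypothetical string labels)
    sites:   candidate site -> structure a key-gate inserted there would create
    """
    pool: Dict[str, deque] = defaultdict(deque)
    for site, structure in sorted(sites.items()):
        pool[structure].append(site)
    # Keep only dictionary entries that can actually be found in the netlist.
    active = [t for t in dict.fromkeys(targets) if pool[t]]
    plan: List[Tuple[str, str]] = []
    while len(plan) < n_remaining and active:
        for target in list(active):              # round-robin over found entries
            if len(plan) == n_remaining:
                break
            plan.append((pool[target].popleft(), target))
            if not pool[target]:
                active.remove(target)            # exhausted: others "fill up" the rest
    return plan

sites = {"s1": "XNOR-AND", "s2": "NAND-OR", "s3": "NAND-OR", "s4": "OTHER", "s5": "NAND-OR"}
plan = plan_insertions(["XNOR-AND", "NAND-OR", "MISSING"], sites, n_remaining=4)
```

The entry "MISSING" has no matching site. Once "XNOR-AND" is used up, the remaining key-bits are filled with further "NAND-OR" instances.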

Fig. 8: Visualizing the pre-subgraph and post-subgraph data using t-distributed stochastic neighbor embedding [maaten2008visualizing]. (Top) Aggressive synthesis optimization is performed on the RLL benchmarks and, thus, in most cases, the “Changed” and “Unchanged” post-synthesis subgraphs can be clearly differentiated, which is the main requirement for SAIL. (Bottom) Once UNSAIL is incorporated, these classes largely overlap. The more substantial the overlap, the harder the classification problem becomes in general, i.e., for any classifier model, including the one used for SAIL.

In this work, the subgraph extraction, the UNSAIL key-gate insertion, and RLL are implemented using Perl scripts that operate directly on Verilog netlists.

V-C Scope and Effect of UNSAIL

We emphasize that SAIL [chakraborty2018sail] was evaluated only on X(N)OR-based logic locking. In our work, in contrast, we thoroughly investigate the impact of various types of key-gates (Table II).

The motivation for considering different structures is as follows. X(N)OR key-gates, which are commonly used in many logic locking schemes, can be replaced by MUXes, which are more resilient against most attacks than simple gates (as also indicated in Sec. III-B). For such a replacement, the locked net and its negated signal are connected to the MUX inputs in one of the two possible orders, and the key-bit is connected to the MUX select line (see also Fig. 12(a)). However, given that the negated signal might be easy to identify from the netlist structure, such an otherwise resilient MUX could still be tackled by SAIL and similar attacks. Thus, we advocate varying and mixing different types of key-gates, and we study the effect of such compound locking (CL). When locking the designs with the CL variations (Table II), the type of each key-gate is chosen randomly from the set of key-gate types of that variation. This random selection serves to break the deterministic nature of the mapping problem targeted by the ML models of SAIL.
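To illustrate why the negated signal is a giveaway, the following Python sketch (a behavioral model, not a netlist) checks exhaustively that a MUX key-gate fed with a net x and its negation computes exactly x XOR k or x XNOR k, depending on the input order. An attacker who spots the inverter can therefore reduce the MUX back to an X(N)OR-like structure. The function names are illustrative only.

```python
from itertools import product


def mux(sel: int, in0: int, in1: int) -> int:
    """2:1 multiplexer: selects in1 when sel is 1, otherwise in0."""
    return in1 if sel else in0


def mux_key_gate(x: int, key: int, order: int) -> int:
    """MUX key-gate on net x; order 0 wires (x, NOT x), order 1 wires (NOT x, x)."""
    not_x = 1 - x
    return mux(key, x, not_x) if order == 0 else mux(key, not_x, x)


# Exhaustively compare against X(N)OR key-gates.
for x, key in product((0, 1), repeat=2):
    assert mux_key_gate(x, key, order=0) == x ^ key         # behaves as XOR
    assert mux_key_gate(x, key, order=1) == 1 - (x ^ key)   # behaves as XNOR
print("MUX key-gates with (x, NOT x) inputs reduce to X(N)OR key-gates")
```

The correct key-bit is 0 for the first input order and 1 for the second, which is exactly the ambiguity an X(N)OR key-gate also offers.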

Locking Variation | Types of Key-Gates
X(N)OR | X(N)OR key-gates
CL_v1 | Multiplexer key-gates constructed using AND, OR gates & multiplexer key-gates constructed using NAND gates
CL_v2 | Multiplexer key-gates constructed using NOR gates & CL_v1
CL_v3 | X(N)OR key-gates & CL_v2
CL_v4 | AND/OR key-gates & CL_v3

CL is short for compound locking, where a mix of various key-gates is used.

TABLE II: Variations of Key-Gates Used in This Work
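The random selection of key-gate types per CL variation can be sketched as below. The type labels are hypothetical shorthand for the rows of Table II, and both the uniform choice and the explicitly seeded `random.Random` generator are assumptions made for illustration and reproducibility.

```python
import random
from typing import Dict, List, Optional

# Cumulative key-gate type sets, mirroring the "& CL_vX" structure of Table II.
# The labels are hypothetical shorthand, not identifiers from the original tool flow.
KEY_GATE_TYPES: Dict[str, List[str]] = {"X(N)OR": ["XOR_XNOR"]}
KEY_GATE_TYPES["CL_v1"] = ["MUX_AND_OR", "MUX_NAND"]
KEY_GATE_TYPES["CL_v2"] = ["MUX_NOR"] + KEY_GATE_TYPES["CL_v1"]
KEY_GATE_TYPES["CL_v3"] = ["XOR_XNOR"] + KEY_GATE_TYPES["CL_v2"]
KEY_GATE_TYPES["CL_v4"] = ["AND_OR"] + KEY_GATE_TYPES["CL_v3"]


def assign_key_gate_types(variation: str, num_key_bits: int,
                          seed: Optional[int] = None) -> List[str]:
    """Draw one key-gate type uniformly at random for every key-bit."""
    rng = random.Random(seed)
    choices = KEY_GATE_TYPES[variation]
    return [rng.choice(choices) for _ in range(num_key_bits)]


print(assign_key_gate_types("CL_v4", 8, seed=1))
```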

Next, we conduct an exploratory experiment on two cases for the ITC-99 benchmark b17_C to understand the impact of UNSAIL on the final structure of locked designs. For case a), we lock the benchmark with K=512, using only RLL, but considering all the different key-gate structures listed in Table II. For each structure considered, 20 instances of RLL designs are generated. For case b), we lock the benchmark using both RLL and UNSAIL; each technique is employed to realize 256 key-bits, resulting in K=512. As in a), we consider all the different structures, and 20 locked instances are generated for each of the structures.

For both cases, the pre-subgraphs and post-subgraphs are first extracted. Post-subgraphs that changed due to synthesis are labeled “Changed,” and those left unaltered are labeled “Unchanged.” The data is then projected non-linearly to 2D using t-distributed stochastic neighbor embedding (t-SNE) [maaten2008visualizing]. Fig. 8 (top) shows the data for traditionally employed RLL: for most key-gate structures, one cluster is dominant, namely that of the “Changed” post-subgraphs. Applying UNSAIL, however, can render such classification significantly more difficult: the clusters largely overlap, as shown in Fig. 8 (bottom). In short, by enhancing logic locking through UNSAIL, we can expect a large overlap between classes, essentially ensuring “bad data” and thereby hindering proper training of SAIL (or, for that matter, of any ML-based attack on structural properties of locked netlists).

To further quantify the difficulty of separating/learning the classes, we study the classification accuracy in detail. Although we investigate various classification models (discussed in more detail in Sec. VII-A), insights drawn from any particular model depend on the classifier choice and its parameters. Thus, our first goal is to support our claim that UNSAIL induces a difficult classification problem irrespective of the classifier type.

Meta-analysis of supervised ML models is a related research area that aims to correlate the inherent complexity of a dataset with the performance of the classifiers [kalousis2004data]. Several metrics, including the maximum Fisher’s discriminant ratio F1 [ho2001data], have been proposed to characterize the classification complexity inherent to datasets [Complexity_measures, lorena2019complex]. F1 has been shown to be effective for quantifying the difficulty of separating the data into the corresponding classes and, hence, for portraying the complexity of the respective classification problem [Complexity_measures, lorena2019complex, example_f1]. In general, Fisher’s discriminant ratio f measures how strongly two classes differ along a specific feature and is defined as follows:

$$f = \frac{(\mu_1 - \mu_2)^2}{\sigma_1^2 + \sigma_2^2}$$

where $\mu_x$ represents the mean of the feature values for class $x$ and $\sigma_x$ represents the standard deviation of the feature values for class $x$. The range of the ratio is $[0, \infty)$. A small ratio indicates a substantial overlap between the classes. The larger the ratio, the easier it is to separate the two classes using that feature/attribute. Hence, to measure the overlap between two classes in general, f is calculated for all of the considered features, and the maximum ratio F1 is then selected to judge the separability of the classes.
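A minimal Python sketch of f and F1 as defined above follows. It assumes population variances (`statistics.pvariance`) for $\sigma_x^2$ and a rows-as-samples, columns-as-features layout; neither choice is specified in the text, and the toy data is invented purely for illustration.

```python
from statistics import mean, pvariance
from typing import Sequence


def fisher_ratio(class_a: Sequence[float], class_b: Sequence[float]) -> float:
    """Fisher's discriminant ratio f = (mu_a - mu_b)^2 / (sigma_a^2 + sigma_b^2) for one feature."""
    numerator = (mean(class_a) - mean(class_b)) ** 2
    denominator = pvariance(class_a) + pvariance(class_b)
    if denominator == 0:
        # Both classes are constant: perfectly separable if the means differ.
        return float("inf") if numerator > 0 else 0.0
    return numerator / denominator


def max_fisher_ratio(changed: Sequence[Sequence[float]],
                     unchanged: Sequence[Sequence[float]]) -> float:
    """F1: maximum of f over all features; rows are samples, columns are features."""
    num_features = len(changed[0])
    return max(
        fisher_ratio([row[j] for row in changed], [row[j] for row in unchanged])
        for j in range(num_features)
    )


# Toy data: feature 0 does not separate the classes, feature 1 does.
changed = [[1, 10], [3, 11], [2, 9]]
unchanged = [[2, 1], [1, 2], [3, 0]]
print(max_fisher_ratio(changed, unchanged))  # roughly 60.75, driven by feature 1
```

Taking the maximum means a single well-separating feature suffices for a high F1, so a low F1 indicates that no individual feature separates the classes well.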

We expect F1 to be lower for UNSAIL than for RLL. We quantify the F1 ratio for both UNSAIL and RLL on selected ITC-99 benchmarks for K=512 and sub=3 (Table III). The results support our claim: on average, UNSAIL achieves a 55.72% reduction in the F1 ratio, which translates into more complex classification problems in general. The effect of UNSAIL on the change-prediction classification should be more prominent in those cases where a significant reduction is observed, i.e., for particular flavors of compound locking. This is verified in Sec. VII-A, where the results show that compound-based UNSAIL locking affects the classification stage more than X(N)OR-based UNSAIL locking.
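The text does not state exactly how the 55.72% figure is aggregated. Assuming it is the mean, over the five key-gate variations, of the relative reduction of the column averages in Table III, the short calculation below reproduces it to within rounding (about 55.7%).

```python
from statistics import mean

# F1 values from Table III, rows b14_C, b15_C, b20_C, b21_C, b22_C, b17_C: (RLL, UNSAIL).
TABLE_III = {
    "X(N)OR": ([2.15, 1.30, 1.60, 1.56, 1.76, 0.97], [1.03, 0.85, 0.90, 2.00, 0.98, 0.83]),
    "CL_v1": ([2.13, 2.06, 1.98, 2.00, 1.94, 2.04], [0.57, 0.61, 0.60, 0.65, 0.62, 0.69]),
    "CL_v2": ([3.75, 3.73, 4.75, 4.75, 3.01, 3.95], [0.66, 0.75, 0.82, 0.63, 0.86, 0.45]),
    "CL_v3": ([6.37, 5.47, 2.64, 3.09, 5.61, 9.60], [0.71, 0.80, 0.84, 2.00, 0.62, 0.87]),
    "CL_v4": ([1.41, 1.04, 1.22, 2.00, 1.21, 2.00], [0.90, 0.65, 0.98, 2.00, 1.00, 2.00]),
}


def relative_reduction(rll, unsail):
    """Relative drop of the column-average F1 when moving from RLL to UNSAIL."""
    return 1.0 - mean(unsail) / mean(rll)


reductions = {name: relative_reduction(*cols) for name, cols in TABLE_III.items()}
average_reduction_pct = 100.0 * mean(reductions.values())
for name, r in reductions.items():
    print(f"{name}: {100 * r:.1f}%")
print(f"average: {average_reduction_pct:.2f}%")
```

The per-variation values also show why compound locking benefits most: the relative reductions for CL_v1 to CL_v3 are far larger than those for X(N)OR and CL_v4.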

Key-gates X(N)OR CL_v1 CL_v2 CL_v3 CL_v4
Insertion RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL
b14_C 2.15 1.03 2.13 0.57 3.75 0.66 6.37 0.71 1.41 0.90
b15_C 1.30 0.85 2.06 0.61 3.73 0.75 5.47 0.80 1.04 0.65
b20_C 1.60 0.90 1.98 0.60 4.75 0.82 2.64 0.84 1.22 0.98
b21_C 1.56 2.00 2.00 0.65 4.75 0.63 3.09 2.00 2.00 2.00
b22_C 1.76 0.98 1.94 0.62 3.01 0.86 5.61 0.62 1.21 1.00
b17_C 0.97 0.83 2.04 0.69 3.95 0.45 9.60 0.87 2.00 2.00
Average 1.56 1.10 2.03 0.62 3.99 0.70 5.46 0.97 1.48 1.25
TABLE III: Maximum Fisher’s Discriminant for K=512 and sub=3 on Selected ITC-99 Benchmarks

VI Experimental Setup

Here, we provide the details regarding the experimental setup followed in our work. See also Fig. 9 for an overview.

Fig. 9: The various components employed in our study. Gray colored boxes represent scripts developed/implemented in-house.

VI-A Test Cases

We study the effectiveness of UNSAIL on eight combinational benchmarks from the ISCAS-85 suite and six combinational benchmarks from the ITC-99 suite. Similar to the exploratory experiment in Sec. V-C, we consider two cases: a) RLL, and b) UNSAIL based on RLL. For both cases, each benchmark is locked using all the different key-gate structures in Table II. For case a), ISCAS-85 benchmarks are locked with K=64 and K=128, while ITC-99 benchmarks are locked with K=256 and K=512. Each benchmark is locked 20 times for each structure and key-size, resulting in a total of 2,800 RLL instances. For each structure, key-size, and benchmark, one of these 20 locked instances is considered as the circuit under attack and excluded from the training set, resulting in a total of 140 attacked RLL instances. For case b), the same benchmarks are locked with the same parameters, this time following the UNSAIL procedure.
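The instance counts above follow from the experimental grid, as the sketch below shows. How the held-out instance is picked among the 20 is not specified, so the random choice in `leave_one_out` is an assumption for illustration.

```python
import random
from itertools import product

ISCAS85 = ["c880", "c1355", "c1908", "c2670", "c3540", "c5315", "c6288", "c7552"]
ITC99 = ["b14_C", "b15_C", "b17_C", "b20_C", "b21_C", "b22_C"]
STRUCTURES = ["X(N)OR", "CL_v1", "CL_v2", "CL_v3", "CL_v4"]
INSTANCES_PER_CONFIG = 20

# (benchmark, structure, key-size) configurations for case a).
configs = list(product(ISCAS85, STRUCTURES, (64, 128)))
configs += list(product(ITC99, STRUCTURES, (256, 512)))

total_instances = len(configs) * INSTANCES_PER_CONFIG
attacked_instances = len(configs)  # one circuit under attack per configuration


def leave_one_out(instance_ids, rng):
    """Hold out one instance as the circuit under attack; train on the rest."""
    attacked = rng.choice(instance_ids)
    training = [i for i in instance_ids if i != attacked]
    return training, attacked


train, attacked = leave_one_out(list(range(INSTANCES_PER_CONFIG)), random.Random(0))
print(total_instances, attacked_instances, len(train))  # 2800 140 19
```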

Besides leveraging RLL, we also integrate UNSAIL with FLL [JV-Tcomp-2013] and SLL [yasin_TCAD_2016]. Locked instances of selected ISCAS-85 benchmarks are generated using the open-source tool provided in [code_pramod]. Since the aforementioned logic locking techniques are essentially X(N)OR-based, we train the model using RLL-based logic locking and launch the attack on the FLL-based and SLL-based locked instances with K=128. Independently, we also investigate UNSAIL when locking the GPS module in the million-gate ORPSoC design [CEP_github].

VI-B Setup for Security Evaluation

The SAIL change-prediction model is implemented as a random forest (RF) model with 50 decision trees and also as a support vector machine (SVM) using a radial basis function (Gaussian) kernel. The hyper-parameters of the SVM classifier were re-evaluated for each trained model; thus, kernel parameters such as the scale vary depending on the circuit under attack and its corresponding training set. The SAIL reconstruction model is implemented as an ensemble of Seq2Seq encoder-decoder models with attention, as described in Sec. V. We use an embedding dimension of 256 for both the encoder and the decoder. All LSTM layers consist of 200 hidden units. To mitigate over-fitting during training, a random-dropout probability of 0.05 is set. Each Seq2Seq model is trained for 60 epochs with a mini-batch size of 20. In each epoch, the subgraphs are shuffled, and then mini-batches are constructed. The network parameters are updated using the Adam optimizer after each mini-batch, with a learning rate of 0.002, a gradient decay factor of 0.9, and a squared gradient decay factor of 0.999. Both SAIL models are implemented in MATLAB.
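The optimizer settings above correspond to the standard Adam update: in MATLAB’s terminology, the gradient decay factor and the squared gradient decay factor are the exponential decay rates β1 and β2 of the first- and second-moment estimates. The scalar Python sketch below applies that update to a toy quadratic loss. The epsilon of 1e-8 is an assumption, since the text does not state it, and the toy loss has nothing to do with the Seq2Seq models themselves.

```python
import math
from typing import Callable


def adam_minimize(grad: Callable[[float], float], w0: float, steps: int,
                  lr: float = 0.002, beta1: float = 0.9, beta2: float = 0.999,
                  eps: float = 1e-8) -> float:
    """Run `steps` Adam updates on a scalar parameter and return the final value."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g          # first-moment (gradient) estimate
        v = beta2 * v + (1 - beta2) * g * g      # second-moment (squared gradient) estimate
        m_hat = m / (1 - beta1 ** t)             # bias corrections
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


def toy_grad(w: float) -> float:
    """Gradient of the toy loss (w - 3)^2."""
    return 2.0 * (w - 3.0)


print(adam_minimize(toy_grad, w0=0.0, steps=1))      # first step has magnitude close to lr
print(adam_minimize(toy_grad, w0=0.0, steps=20000))  # ends close to the minimum at 3
```

Because of the bias correction, the very first step has a magnitude of almost exactly the learning rate, regardless of the gradient’s scale.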

We also launch SWEEP on RLL and UNSAIL-locked instances using the open-source attack tool [alaql2019sweep], with the default margin value of 0. The higher the margin, the lower the chance that the attack makes a wild guess, and the lower the reported accuracy. We also evaluate the resilience of UNSAIL against another oracle-less attack on logic locking, the redundancy attack [li2019piercing].
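The effect of the margin can be illustrated with a simplified, hypothetical decision rule. This is our assumption, not code from the SWEEP tool: a key-bit is guessed as 1 or 0 only if its score clears the margin, and is otherwise left undecided. If undecided bits count as not correctly extracted, consistent with the accuracy metric described in Sec. VII-D, raising the margin can never increase the reported accuracy.

```python
from typing import List, Optional, Sequence


def decide_key_bits(scores: Sequence[float], margin: float = 0.0) -> List[Optional[int]]:
    """Hypothetical SWEEP-style rule: 1 above +margin, 0 below -margin, else undecided (None)."""
    guesses: List[Optional[int]] = []
    for score in scores:
        if score > margin:
            guesses.append(1)
        elif score < -margin:
            guesses.append(0)
        else:
            guesses.append(None)
    return guesses


def key_accuracy(guesses: Sequence[Optional[int]], correct_key: Sequence[int]) -> float:
    """Percentage of correctly extracted key-bits over the whole key; undecided bits count as wrong."""
    hits = sum(g == k for g, k in zip(guesses, correct_key))
    return 100.0 * hits / len(correct_key)


scores = [0.30, -0.20, 0.05, -0.01]
key = [1, 0, 0, 0]
print(key_accuracy(decide_key_bits(scores, margin=0.0), key))  # 75.0
print(key_accuracy(decide_key_bits(scores, margin=0.1), key))  # 50.0
```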

The output corruption induced by UNSAIL is measured by the Hamming distance (HD) between the outputs of the original design and those of the locked design when random incorrect keys are applied. The output error rate (OER) is calculated as well. Ideal values for HD and OER are 50% and 100%, respectively. The simulations for HD and OER are performed using Mentor Graphics ModelSim.
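Both metrics can be sketched in Python as below, with plain functions standing in for the ModelSim simulations of the original and locked netlists (an assumption for illustration). HD is taken as the percentage of differing output bits, and OER as the percentage of input patterns with at least one erroneous output. The two-output toy circuit is hypothetical and chosen only so that every incorrect key visibly corrupts the outputs.

```python
import random
from typing import Callable, Sequence, Tuple

Bits = Tuple[int, ...]


def hd_and_oer(original: Callable[[Bits], Bits], locked: Callable[[Bits, Bits], Bits],
               num_inputs: int, correct_key: Sequence[int],
               num_patterns: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Estimate HD (% of differing output bits) and OER (% of patterns with any error)
    under random input patterns and random incorrect keys."""
    rng = random.Random(seed)
    correct = tuple(correct_key)
    diff_bits = total_bits = erroneous_patterns = 0
    for _ in range(num_patterns):
        x = tuple(rng.randint(0, 1) for _ in range(num_inputs))
        while True:  # draw a random key, rejecting the correct one
            key = tuple(rng.randint(0, 1) for _ in range(len(correct)))
            if key != correct:
                break
        y_ref, y_locked = original(x), locked(x, key)
        d = sum(a != b for a, b in zip(y_ref, y_locked))
        diff_bits += d
        total_bits += len(y_ref)
        erroneous_patterns += int(d > 0)
    return 100.0 * diff_bits / total_bits, 100.0 * erroneous_patterns / num_patterns


# Toy example: two outputs, each guarded by an XOR key-gate; the correct key is (0, 0).
def toy_original(x: Bits) -> Bits:
    return (x[0] & x[1], x[0] ^ x[1])


def toy_locked(x: Bits, k: Bits) -> Bits:
    return ((x[0] & x[1]) ^ k[0], (x[0] ^ x[1]) ^ k[1])


hd, oer = hd_and_oer(toy_original, toy_locked, num_inputs=2, correct_key=(0, 0))
print(f"HD = {hd:.1f}%, OER = {oer:.1f}%")
```

In this toy circuit every incorrect key flips at least one output, so OER is 100%, while HD lands above the ideal 50% because one of the three wrong keys flips both outputs.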

VI-C Setup for Synthesis, Testing, and Layout Evaluation

Synthesis is performed using Synopsys Design Compiler for the slow process corner, with particular focus on area minimization and iso-performance timing closure. Synopsys TetraMAX was used to generate a minimal set of test patterns for the locked benchmarks; test coverage and fault coverage are also calculated using the same tool. For layout-level assessment, we employ the public Nangate 45nm Open Cell Library with ten metal layers and use Cadence Innovus. Layout overheads are calculated at 0.95 V and 125 °C for the slow process corner, with an input switching activity of 0.20.

K=64 K=128
Key-gates X(N)OR CL_v1 CL_v2 CL_v3 CL_v4 X(N)OR CL_v1 CL_v2 CL_v3 CL_v4
Insertion RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL
c880 0.72 0.58 1.00 0.64 1.00 0.59 0.97 0.56 0.92 0.62 0.70 0.57 1.00 0.58 1.00 0.63 0.93 0.60 0.95 0.62
c1355 0.92 0.65 1.00 0.74 1.00 0.74 0.97 0.72 0.98 0.72 0.94 0.75 1.00 0.71 1.00 0.71 0.98 0.68 0.98 0.78
c1908 0.83 0.75 1.00 0.69 1.00 0.70 0.97 0.62 0.97 0.74 0.91 0.67 1.00 0.69 1.00 0.62 0.96 0.64 0.95 0.70
c2670 0.68 0.59 1.00 0.59 1.00 0.56 0.94 0.59 0.95 0.62 0.71 0.60 1.00 0.54 1.00 0.60 0.91 0.58 0.91 0.59
c3540 0.80 0.56 1.00 0.67 1.00 0.68 0.89 0.58 0.95 0.61 0.78 0.56 1.00 0.55 1.00 0.57 0.93 0.59 0.93 0.68
c5315 0.85 0.62 1.00 0.61 1.00 0.62 0.97 0.56 0.89 0.66 0.78 0.58 1.00 0.59 1.00 0.59 0.89 0.56 0.95 0.61
c6288 0.88 0.64 1.00 0.73 0.98 0.64 0.98 0.71 0.97 0.70 0.87 0.65 0.99 0.66 1.00 0.66 0.98 0.67 0.95 0.72
c7552 0.88 0.74 1.00 0.69 1.00 0.69 0.98 0.70 0.95 0.66 0.88 0.58 1.00 0.65 1.00 0.66 0.94 0.61 0.96 0.67
Average 0.82 0.64 1.00 0.67 0.99 0.65 0.96 0.63 0.95 0.67 0.82 0.62 0.99 0.62 1.00 0.63 0.94 0.62 0.95 0.67
K=256 K=512
b14_C 0.84 0.63 1.00 0.65 1.00 0.65 0.93 0.62 0.95 0.66 0.83 0.65 1.00 0.60 1.00 0.61 0.96 0.61 0.95 0.63
b15_C 0.77 0.59 1.00 0.59 1.00 0.58 0.94 0.56 0.95 0.59 0.75 0.60 1.00 0.55 1.00 0.57 0.95 0.55 0.95 0.59
b20_C 0.86 0.65 1.00 0.62 0.99 0.64 0.96 0.66 0.95 0.67 0.84 0.64 0.99 0.59 0.99 0.62 0.94 0.60 0.95 0.64
b21_C 0.85 0.65 0.99 0.67 0.99 0.64 0.99 0.60 0.99 0.67 0.86 0.63 1.00 0.62 1.00 0.63 0.94 0.60 0.96 0.64
b22_C 0.85 0.64 0.99 0.65 0.99 0.64 0.98 0.64 0.95 0.69 0.85 0.63 0.99 0.64 0.99 0.64 0.96 0.61 0.96 0.65
b17_C 0.72 0.61 1.00 0.59 1.00 0.57 0.94 0.57 0.96 0.60 0.75 0.58 1.00 0.56 1.00 0.57 0.92 0.57 0.92 0.59
Average 0.82 0.63 0.99 0.63 0.99 0.62 0.96 0.61 0.96 0.65 0.81 0.62 0.99 0.59 0.99 0.61 0.95 0.59 0.95 0.62
TABLE IV: Accuracy for SAIL Random Forest (RF) Classifier for Selected ISCAS-85 and ITC-99 Benchmarks Using sub=3
Fig. 10: Process for assessing the security of UNSAIL against the SAIL attack.

VII Experimental Investigation

In this section, we first perform a detailed security analysis of our proposed UNSAIL scheme, starting with the SAIL attack [chakraborty2018sail]. Fig. 10 summarizes the evaluation process used for UNSAIL against the SAIL attack. We consider the role of (i) key-size, (ii) key-gate type, (iii) initial key-gate insertion algorithm, and (iv) subgraph size. When detailing the attack results, we first report the classification accuracy of the change-prediction model using RF and SVM classifier models (Sec. VII-A). Next, we report the key-gate recovery accuracy of the reconstruction model, implemented as a Seq2Seq ensemble model (Sec. VII-B). Finally, we report the overall key-gate recovery accuracy when combining the change-prediction and reconstruction models (Sec. VII-C).

Our evaluation is expanded beyond the SAIL attack; the resilience of UNSAIL is further demonstrated against the SWEEP attack and the redundancy attack in Sec. VII-D and Sec. VII-E, respectively. Furthermore, the results of the output corruption evaluation for UNSAIL are discussed in Sec. VII-F, while the effect of UNSAIL on structural testing is investigated in Sec. VII-G. The overheads of our defense are presented in Sec. VII-H. Finally, the experimental results on the DARPA OpenCores benchmark are discussed in Sec. VII-I.

VII-A Change-Prediction Model Accuracy on UNSAIL Vs. Traditional Logic Locking

Initially, we investigate the classification accuracy to evaluate the performance of the change-prediction model. The accuracy is defined as the number of correct predictions divided by the total number of predictions (i.e., the key-size K). We test the model for different key-sizes, different key-gate structures, several sub-sizes, and two classification algorithms. Recall that the goal of UNSAIL is to insert key-gates such that the complexity of the classification problem increases and the classification accuracy drops. Indeed, for all considered cases, the model performs with much lower accuracy on UNSAIL-locked instances.

Varying the Key-Gate Type. Pre- and post-subgraphs of size sub=3 are extracted from all the locked instances. The change-prediction model is initially implemented as an RF model and trained separately for each benchmark, using the data extracted from 19 of the 20 instances for each locking variation. The classifier is then tested on locked ISCAS-85 and ITC-99 benchmarks with K=64 and K=256, respectively; results from both experiments are shown in Table IV. Comparing the classification accuracy of SAIL on RLL X(N)OR vs. RLL CL shows that SAIL performs better in the latter case. For example, the average accuracy on RLL X(N)OR is 82%, whereas on RLL CL_v1 it is 99.5%. Further investigation reveals that, for the evaluated CL schemes, most of the extracted post-subgraphs are affected by re-synthesis. This leads to an imbalanced data set in which most of the subgraphs belong to one class, namely “Changed,” rendering classification less difficult.
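To see why such imbalance makes the classification easier, consider a trivial baseline that always predicts the majority class: its accuracy equals the share of that class. The label counts below are hypothetical and serve only to illustrate the effect.

```python
from collections import Counter
from typing import Sequence


def majority_baseline_accuracy(labels: Sequence[str]) -> float:
    """Accuracy of a trivial classifier that always predicts the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical label distribution for K=128 key-gates, almost all "Changed".
labels = ["Changed"] * 127 + ["Unchanged"]
print(majority_baseline_accuracy(labels))  # 0.9921875
```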

Next, we study the effect of UNSAIL. Our defense reduces the classification accuracy in all the considered test cases. On average, for K=64, UNSAIL reduces the classification accuracy when using X(N)OR, CL_v2, and CL_v4 structures by 18pp, 34pp, and 28pp, respectively. Hence, UNSAIL has an even more significant effect on reducing the classification accuracy when combined with CL techniques than with X(N)OR locking.

Varying the Key-Size. Next, we repeat the previous experiment for K=128 and K=512 to study the effect of increasing the key-size on the classification accuracy. Results from both experiments are shown in Table IV. On average, the attack (i.e., the classification accuracy of SAIL) performs slightly better for smaller key-sizes: the average accuracy on RLL ISCAS-85 benchmarks is 94.4% and 94% for K=64 and K=128, respectively, and the average accuracy on RLL ITC-99 benchmarks is 94.4% and 93.8% for K=256 and K=512, respectively.

UNSAIL reduces the average classification accuracy when using X(N)OR, CL_v2, and CL_v4 structures by 20pp, 37pp, and 28pp, respectively, for ISCAS-85 instances with K=128. For ITC-99 instances with K=512, UNSAIL reduces the average classification accuracy by 19pp, 38pp, and 33pp, respectively, for the same structures. UNSAIL has a larger impact on instances locked with a larger key-size: for ISCAS-85, the average reduction in classification accuracy is 29.2pp for K=64 and 30.8pp for K=128. These findings illustrate that UNSAIL is effective for a wide range of benchmarks and key-sizes.
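The 29.2pp and 30.8pp averages can be recomputed from the ISCAS-85 averages in Table IV (RF classifier, sub=3) by taking the mean of the per-structure drops in percentage points, as sketched below.

```python
from statistics import mean

# Average accuracies (RLL, UNSAIL) for ISCAS-85 from Table IV, sub=3, RF classifier.
TABLE_IV_AVG = {
    64: {"X(N)OR": (0.82, 0.64), "CL_v1": (1.00, 0.67), "CL_v2": (0.99, 0.65),
         "CL_v3": (0.96, 0.63), "CL_v4": (0.95, 0.67)},
    128: {"X(N)OR": (0.82, 0.62), "CL_v1": (0.99, 0.62), "CL_v2": (1.00, 0.63),
          "CL_v3": (0.94, 0.62), "CL_v4": (0.95, 0.67)},
}


def pp_reduction(rll: float, unsail: float) -> float:
    """Accuracy drop in percentage points, rounded to remove floating-point noise."""
    return round(100 * (rll - unsail), 1)


def mean_reduction(key_size: int) -> float:
    """Mean drop over all five key-gate structures for one key-size."""
    return mean(pp_reduction(r, u) for r, u in TABLE_IV_AVG[key_size].values())


for k in TABLE_IV_AVG:
    print(f"K={k}: {mean_reduction(k):.1f}pp")  # K=64: 29.2pp, K=128: 30.8pp
```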

Varying the Sub-size. In this set of experiments, we examine the effect of varying the sub-size on the classification accuracy for UNSAIL. Toward this end, we train and test using sub=5 and sub=6; see Table V for the results.

We observe that the classification accuracy for SAIL increases with an increase of sub-size (which is in agreement with the findings reported in [chakraborty2018sail]). For example, the average classification accuracy for RLL X(N)OR increases from 82% for sub=3 to 93% for sub=6. Increasing the sub-size leads to an imbalanced data set as most subgraphs are affected by re-synthesis, which results in higher classification accuracy. This is intuitive as a large subgraph has a higher probability of being affected by re-synthesis than a smaller subgraph.

Even with such an increase in classification accuracy for larger sub-size, UNSAIL remains successful in reducing accuracy. Comparing UNSAIL vs. RLL, for CL_v1 with sub=3, sub=5, and sub=6, the average classification accuracy is reduced by 37pp, 24pp, and 21pp, respectively, for ISCAS-85 benchmarks locked with K=128. We note that the classifier trained with sub=3 is affected most by UNSAIL; this is expected as UNSAIL structures of size sub=3 were added.

Varying the Classifier Model. To further investigate the efficacy of UNSAIL against other classification algorithms, the SAIL change-prediction model was implemented as an SVM, trained, and tested using sub=3 and different key-sizes; the results are presented in Table VI. These experiments agree with our earlier findings, namely that the SAIL classifier achieves slightly better accuracy on locked instances with smaller key-sizes. That is, the average classification accuracy on ISCAS-85 benchmarks locked using RLL with K=64 is 94.4% and drops marginally to 93.8% with K=128.

The results on UNSAIL-locked instances support our claim that UNSAIL induces a more complex classification problem than RLL, irrespective of the classifier. The average classification accuracy for CL_v1 was reduced by 37pp and 36pp using the RF model and the SVM model, respectively, for ISCAS-85 benchmarks locked with K=128 and sub=3.

K=128
Sub-size sub=5 sub=6
Key-gates X(N)OR CL_v1 CL_v2 CL_v3 CL_v4 X(N)OR CL_v1 CL_v2 CL_v3 CL_v4
Insertion RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL
c880 0.86 0.70 1.00 0.65 1.00 0.63 0.98 0.67 0.98 0.70 0.88 0.72 1.00 0.70 1.00 0.71 0.98 0.76 0.98 0.71
c1355 0.98 0.84 1.00 0.83 1.00 0.84 1.00 0.72 1.00 0.83 0.98 0.89 1.00 0.90 1.00 0.88 1.00 0.85 0.99 0.85
c1908 0.97 0.79 1.00 0.77 1.00 0.74 0.98 0.69 0.98 0.69 0.97 0.76 1.00 0.83 1.00 0.80 0.98 0.84 0.98 0.83
c2670 0.86 0.66 1.00 0.74 1.00 0.83 0.96 0.74 0.97 0.80 0.89 0.67 1.00 0.76 1.00 0.84 0.96 0.80 0.97 0.80
c3540 0.89 0.66 1.00 0.79 1.00 0.80 0.97 0.72 0.98 0.78 0.90 0.65 1.00 0.80 1.00 0.80 0.98 0.80 0.98 0.80
c5315 0.89 0.66 1.00 0.79 1.00 0.66 0.94 0.71 0.97 0.77 0.88 0.70 1.00 0.82 1.00 0.70 0.94 0.78 0.97 0.77
c6288 0.96 0.70 1.00 0.77 1.00 0.77 0.98 0.75 0.98 0.81 0.98 0.76 1.00 0.80 1.00 0.85 0.99 0.81 0.99 0.83
c7552 0.95 0.73 1.00 0.66 1.00 0.77 0.98 0.68 0.98 0.69 0.95 0.70 1.00 0.71 1.00 0.82 0.98 0.77 0.98 0.75
Average 0.92 0.72 0.99 0.75 0.99 0.76 0.97 0.71 0.98 0.76 0.93 0.73 1.00 0.79 1.00 0.80 0.98 0.80 0.98 0.79
K=512
Sub-size sub=5 sub=6
b14_C 0.92 0.74 1.00 0.73 1.00 0.76 0.98 0.68 0.98 0.73 0.93 0.73 1.00 0.77 1.00 0.77 0.98 0.71 0.99 0.75
b15_C 0.87 0.70 1.00 0.68 1.00 0.71 0.98 0.72 0.98 0.73 0.91 0.71 1.00 0.74 1.00 0.76 0.98 0.75 0.98 0.77
b20_C 0.92 0.72 0.99 0.71 0.99 0.75 0.98 0.72 0.98 0.72 0.92 0.75 0.99 0.79 0.99 0.76 0.98 0.76 0.98 0.77
b21_C 0.94 0.71 0.99 0.72 0.99 0.76 0.97 0.72 0.98 0.72 0.96 0.71 0.99 0.77 0.99 0.81 0.98 0.76 0.98 0.74
b22_C 0.96 0.74 1.00 0.72 1.00 0.74 0.96 0.72 0.99 0.72 0.96 0.75 1.00 0.75 1.00 0.77 0.97 0.76 0.99 0.78
b17_C 0.83 0.67 1.00 0.72 1.00 0.68 0.95 0.72 0.98 0.74 0.86 0.67 1.00 0.74 1.00 0.72 0.95 0.75 0.98 0.76
Average 0.91 0.71 0.99 0.71 0.99 0.73 0.97 0.71 0.98 0.73 0.92 0.72 0.99 0.76 0.99 0.76 0.97 0.75 0.98 0.76
TABLE V: Accuracy for SAIL Random Forest (RF) Classifier for Selected ISCAS-85 and ITC-99 Benchmarks Using sub=5 and sub=6
K=64 K=128
Key-gates X(N)OR CL_v1 CL_v2 CL_v3 CL_v4 X(N)OR CL_v1 CL_v2 CL_v3 CL_v4
Insertion RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL
c880 0.71 0.58 1.00 0.64 1.00 0.62 0.97 0.56 0.94 0.61 0.71 0.56 1.00 0.58 1.00 0.62 0.93 0.62 0.96 0.62
c1355 0.92 0.67 1.00 0.72 1.00 0.75 0.97 0.60 0.98 0.74 0.94 0.73 1.00 0.78 1.00 0.77 0.98 0.70 0.98 0.76
c1908 0.82 0.74 1.00 0.73 1.00 0.70 0.97 0.64 0.97 0.74 0.94 0.69 1.00 0.68 1.00 0.65 0.96 0.65 0.95 0.74
c2670 0.69 0.56 1.00 0.64 1.00 0.59 0.94 0.59 0.95 0.64 0.67 0.59 1.00 0.54 1.00 0.58 0.91 0.57 0.91 0.61
c3540 0.80 0.56 1.00 0.63 1.00 0.64 0.89 0.59 0.95 0.59 0.78 0.54 1.00 0.58 1.00 0.57 0.93 0.58 0.93 0.68
c5315 0.83 0.62 1.00 0.61 1.00 0.62 0.97 0.55 0.89 0.66 0.80 0.58 1.00 0.59 1.00 0.60 0.89 0.57 0.95 0.62
c6288 0.88 0.61 1.00 0.71 0.98 0.64 0.98 0.72 0.97 0.66 0.87 0.67 0.99 0.66 0.99 0.66 0.98 0.65 0.95 0.72
c7552 0.88 0.70 1.00 0.67 1.00 0.69 0.98 0.69 0.95 0.66 0.88 0.63 1.00 0.63 1.00 0.66 0.94 0.61 0.96 0.68
Average 0.82 0.63 1.00 0.67 0.99 0.66 0.96 0.62 0.95 0.66 0.82 0.62 0.99 0.63 0.99 0.64 0.94 0.62 0.95 0.68
K=256 K=512
b14_C 0.84 0.64 1.00 0.65 1.00 0.66 0.93 0.63 0.95 0.64 0.83 0.64 1.00 0.59 1.00 0.61 0.96 0.60 0.95 0.63
b15_C 0.77 0.59 1.00 0.58 1.00 0.58 0.94 0.54 0.95 0.59 0.75 0.59 1.00 0.55 1.00 0.57 0.95 0.55 0.95 0.59
b20_C 0.86 0.64 0.99 0.63 0.99 0.64 0.99 0.66 0.99 0.68 0.84 0.64 0.99 0.59 0.99 0.62 0.94 0.61 0.95 0.64
b21_C 0.85 0.66 0.99 0.67 0.99 0.63 0.98 0.60 0.95 0.66 0.86 0.63 1.00 0.62 1.00 0.63 0.94 0.61 0.96 0.65
b22_C 0.88 0.62 1.00 0.64 0.99 0.63 0.96 0.65 0.95 0.69 0.85 0.62 0.99 0.64 0.99 0.64 0.96 0.61 0.96 0.65
b17_C 0.69 0.60 1.00 0.59 1.00 0.57 0.94 0.57 0.96 0.60 0.75 0.58 1.00 0.57 1.00 0.58 0.92 0.56 0.96 0.59
Average 0.82 0.62 0.99 0.63 0.99 0.62 0.96 0.61 0.96 0.64 0.81 0.62 0.99 0.59 0.99 0.61 0.95 0.59 0.95 0.62
TABLE VI: Accuracy for SAIL Support Vector Machine (SVM) Classifier for Selected ISCAS-85 and ITC-99 Benchmarks Using sub=3

Integrating UNSAIL with FLL and SLL. We locked ISCAS-85 benchmarks using SLL and FLL with K=128 using the binaries provided in [code_pramod]. Each benchmark was locked once using SLL and once using FLL. Reusing the 20 X(N)OR-based RLL instances for each of those benchmarks, the change-prediction model (implemented as RF) was trained for sub=6 and then launched on the SLL and FLL instances. The average accuracy of the model on SLL and FLL benchmarks is 97% and 93%, respectively. Comparing the performance of the change-prediction model on RLL (Table V) vs. on FLL, the results are largely consistent. In the case of SLL, we observe an increase in the accuracy of the change-prediction model compared to RLL, namely by an average of 4pp.

To evaluate the performance of UNSAIL in protecting the FLL and SLL instances, we next integrate UNSAIL with SLL and FLL. As before, 64 key-gates are employed during initial locking, after which the remaining 64 key-gates are inserted by UNSAIL, yielding K=128. We train the model using RLL-based UNSAIL instances and launch it on the SLL-based and FLL-based UNSAIL instances; this is fair since SLL and FLL also use X(N)OR key-gate structures. The results are reported in Fig. 11. The average accuracy on the SLL-based UNSAIL instances is 69%, a reduction of 28pp; the average accuracy on the FLL-based UNSAIL instances is 75%, a reduction of 18pp. These findings support our claim that UNSAIL is capable of protecting any combinational logic locking technique against SAIL.

Fig. 11: SAIL change-prediction model accuracy on UNSAIL vs. SLL and FLL when K=128 and sub=6 on selected ISCAS-85 benchmarks.

Summary. The change-prediction model of SAIL was thoroughly tested on RLL and UNSAIL-locked instances of selected ISCAS-85 and ITC-99 benchmarks. We considered different types of key-gates, key-sizes, sub-sizes, and classification models. The classifier was additionally studied on SLL/FLL vs. SLL/FLL-based UNSAIL. The results show that the classification accuracy of the change-prediction model increases with the sub-size, as also reported in [chakraborty2018sail]. We also observed that the model achieves better accuracy on benchmarks locked with smaller key-sizes. Moreover, the model achieves a higher accuracy on (1) CL-based RLL vs. X(N)OR-based RLL and (2) SLL instances compared to RLL and FLL.

Analysis of the results for UNSAIL-locked instances shows that our defense decreases the classification accuracy of the change-prediction model in all tested cases, achieving the best performance when sub=3 is used. Our defense technique does not depend on a specific classification model or setup; it increases the complexity of classifying the key-gates and related subgraphs in all tested scenarios. Moreover, our defense does not require a specific locking technique and can protect any traditionally locked design.

VII-B Reconstruction Model Accuracy on UNSAIL Vs. Traditional Logic Locking

Here, we study the reconstruction model for RLL and UNSAIL-locked instances. The accuracy of recovering the pre-synthesis key-gates is shown in Table VII for ISCAS-85 and ITC-99 benchmarks with K=128 and K=512, respectively. Two important observations can be drawn from the results, as discussed next.

Varying the Key-Gate Type. Recall that in this work, we also study the effect of different types of key-gates (Table II). We believe that by varying the locking structures, the reconstruction model must learn a larger variety of synthesis-induced changes, which tends to reduce its accuracy. Moreover, by randomizing the selection of key-gate types, we can expect to limit the “deterministic footprint” of the obfuscation introduced by the synthesis tools.

We observe that the average recovery accuracy reduces once we introduce more variations to the key-gate structures. The average accuracy on RLL ISCAS-85 instances with K=128 for X(N)OR, CL_v1, CL_v2, CL_v3, and CL_v4 key-gate structures is 64%, 51%, 35%, 28%, and 40%, respectively (Table VII). For the large ITC-99 benchmarks with K=512, the average key-gate recovery accuracy is 72%, 46%, 48%, 32%, and 43%, respectively.

Effect of UNSAIL Structures. The accuracy of the reconstruction model is further reduced by the UNSAIL structures, as shown in Table VII. The average key-gate recovery accuracy for UNSAIL-locked ISCAS-85 instances (K=128) for the X(N)OR, CL_v1, CL_v2, CL_v3, and CL_v4 techniques is 50%, 24%, 15%, 17%, and 23%, respectively, an average reduction of 17.8pp compared to RLL. For the larger ITC-99 benchmarks with K=512, the average reduction is 18.6pp. A consistent reduction in accuracy induced by UNSAIL is also observed when comparing UNSAIL to SLL and FLL, as shown in Table VIII.

Summary. More variations in key-gate structures hinder the reconstruction model in general, and the UNSAIL structures strengthen this effect further, for any logic locking technique.

K=128
Key-gates X(N)OR CL_v1 CL_v2 CL_v3 CL_v4
Insertion RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL
c880 0.72 0.46 0.47 0.26 0.31 0.16 0.26 0.15 0.38 0.17
c1355 0.60 0.62 0.56 0.26 0.40 0.11 0.33 0.24 0.36 0.28
c1908 0.64 0.56 0.52 0.24 0.34 0.16 0.29 0.18 0.45 0.27
c2670 0.77 0.48 0.57 0.26 0.32 0.18 0.24 0.14 0.41 0.23
c3540 0.62 0.46 0.47 0.23 0.30 0.17 0.30 0.21 0.34 0.26
c5315 0.65 0.52 0.45 0.22 0.30 0.14 0.30 0.10 0.43 0.16
c6288 0.57 0.45 0.58 0.21 0.42 0.13 0.25 0.15 0.41 0.33
c7552 0.55 0.43 0.47 0.23 0.37 0.15 0.24 0.16 0.40 0.17
Average 0.64 0.50 0.51 0.24 0.35 0.15 0.28 0.17 0.40 0.23
K=512
b14_C 0.71 0.50 0.45 0.24 0.43 0.30 0.31 0.22 0.41 0.27
b15_C 0.77 0.57 0.48 0.23 0.49 0.23 0.32 0.19 0.44 0.27
b20_C 0.71 0.56 0.47 0.23 0.49 0.26 0.32 0.18 0.45 0.25
b21_C 0.71 0.52 0.46 0.25 0.48 0.26 0.29 0.18 0.45 0.27
b22_C 0.68 0.50 0.46 0.30 0.50 0.27 0.34 0.17 0.42 0.24
b17_C 0.74 0.54 0.46 0.22 0.47 0.26 0.34 0.2 0.38 0.28
Average 0.72 0.53 0.46 0.25 0.48 0.26 0.32 0.19 0.43 0.26
TABLE VII: Key-Gate Detection Accuracy Using SAIL Reconstruction Model on UNSAIL Vs. RLL
Insertion SLL SLL-based UNSAIL FLL FLL-based UNSAIL
c880 0.75 0.35 0.44 0.41
c1355 0.46 0.46 0.37 0.42
c1908 0.44 0.44 0.47 0.32
c2670 0.61 0.33 0.34 0.43
c3540 0.58 0.42 0.41 0.49
c5315 0.38 0.24 0.51 0.4
c7552 0.44 0.44 0.48 0.37
Average 0.52 0.38 0.43 0.41
TABLE VIII: Key-Gate Detection Accuracy Using SAIL Reconstruction Model on UNSAIL Vs. SLL and FLL for K=128

VII-C Change-Prediction-Boosted Reconstruction Model Accuracy on UNSAIL Vs. Traditional Logic Locking

Finally, to evaluate the effectiveness of UNSAIL, we launch the full SAIL attack, in which the change-prediction model is combined with the reconstruction model: the former first detects the subgraphs that went through changes due to synthesis, and the latter then reverts those changes. We report the accuracy of the full SAIL attack in Table IX. We use the RF classifier with sub=6, since this configuration provided the highest classification accuracy, as shown in Sec. VII-A.
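The way the two models are chained can be sketched as follows. This is a minimal sketch at the level of detail given in the text: the two callables are hypothetical stand-ins for the trained RF classifier and the Seq2Seq ensemble, and subgraphs are encoded as strings purely for illustration.

```python
from typing import Callable, Dict, List, Sequence


def full_sail_attack(post_subgraphs: Sequence[str],
                     predict_changed: Callable[[str], bool],
                     reconstruct: Callable[[str], str]) -> List[str]:
    """Gate the reconstruction model with the change-prediction model.

    Subgraphs predicted as "Changed" are passed to the reconstruction model;
    subgraphs predicted as "Unchanged" are kept as they appear in the netlist.
    """
    return [reconstruct(sg) if predict_changed(sg) else sg for sg in post_subgraphs]


def recovery_accuracy(recovered: Sequence[str], ground_truth: Sequence[str]) -> float:
    """Fraction of key-gate structures whose pre-synthesis form was recovered exactly."""
    return sum(r == g for r, g in zip(recovered, ground_truth)) / len(ground_truth)


# Toy stand-ins for the two trained models (hypothetical encodings).
LEARNED_REVERSALS: Dict[str, str] = {"XNOR(INV(a),k)": "XOR(a,k)"}
post = ["XNOR(INV(a),k)", "XNOR(b,k)"]
truth = ["XOR(a,k)", "XNOR(b,k)"]
recovered = full_sail_attack(post, lambda sg: sg in LEARNED_REVERSALS,
                             lambda sg: LEARNED_REVERSALS.get(sg, sg))
print(recovered, recovery_accuracy(recovered, truth))  # ['XOR(a,k)', 'XNOR(b,k)'] 1.0
```

The sketch also makes the later observation on CL-based RLL intuitive: if nearly every subgraph is “Changed,” the gate passes almost everything to the reconstruction model, so combining the models adds little over using the reconstruction model alone.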

Analyzing the results obtained on RLL-based locking, we first note that the effect of varying the key-gates observed in Sec. VII-B is still present. More specifically, we observe a reduction in the attack accuracy from an average of 68% (RLL-based X(N)OR) down to 27% (RLL-based CL_v3) on ISCAS-85 benchmarks with K=128. Similarly, a reduction in accuracy is also observed for the larger ITC-99 benchmarks with K=512. Focusing on X(N)OR locking, the reconstruction accuracy is boosted when both models of SAIL are combined. That is, compared with the results in Table VII for RLL, the average accuracy on ISCAS-85 benchmarks increases from 64% to 68%. Nevertheless, UNSAIL reduces the accuracy by 11pp, dropping it to 57%.

This latter finding is further supported when the full SAIL attack is launched on SLL and FLL instances (Table X). Compared to the results in Table VIII, the average accuracy increases by 3pp for both SLL and FLL. On average, UNSAIL reduces the accuracy by 9pp for SLL and by 2pp for FLL on ISCAS-85 benchmarks with K=128. Although the reduction in accuracy is marginal for FLL-based UNSAIL, the final accuracy of 44% is still below that of random guessing (50%).

Analyzing the results of CL structures for RLL, we note that merging the two attack models did not boost the key-gate recovery accuracy for RLL in the first place. This is because most of the key-gate structures added by RLL go through changes due to synthesis; recall that mainly one class of subgraphs can be observed in Fig. 8 for RLL-based CL. Hence, using a classifier model to boost the reconstruction model does not provide a significant benefit. For example, the average key-gate recovery accuracy for CL_v4 was 40% using the reconstruction model on its own, and it even (slightly) decreased to 39% for the combined, boosted attack setup. In contrast, UNSAIL ensures that both classes are present during training; thus, for CL-based UNSAIL, the key-gate detection accuracy is observed to increase when the two models are combined. Although the classifier accuracy was reduced by an average of 19.5pp when using CL-based UNSAIL, the classifier was still able to improve the overall accuracy of the attack. Even so, the average key-gate recovery accuracy for CL-based UNSAIL instances is 53%, which is only marginally better than random guessing, rendering the full SAIL attack futile.

K=128
Key-gates X(N)OR CL_v1 CL_v2 CL_v3 CL_v4
Insertion RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL
c880 0.77 0.47 0.52 0.59 0.31 0.60 0.26 0.41 0.38 0.49
c1355 0.66 0.63 0.54 0.48 0.38 0.40 0.30 0.49 0.36 0.50
c1908 0.64 0.65 0.51 0.54 0.51 0.54 0.29 0.50 0.41 0.50
c2670 0.81 0.63 0.55 0.59 0.32 0.55 0.20 0.53 0.41 0.52
c3540 0.66 0.59 0.47 0.58 0.30 0.59 0.30 0.58 0.34 0.57
c5315 0.69 0.57 0.49 0.59 0.30 0.54 0.30 0.43 0.43 0.52
c6288 0.57 0.53 0.54 0.57 0.33 0.53 0.25 0.52 0.38 0.55
c7552 0.62 0.45 0.30 0.60 0.28 0.59 0.29 0.50 0.40 0.55
Average 0.68 0.57 0.49 0.57 0.34 0.54 0.27 0.50 0.39 0.53
K=512
b14_C 0.71 0.59 0.46 0.56 0.43 0.60 0.30 0.47 0.42 0.48
b15_C 0.77 0.66 0.49 0.58 0.49 0.61 0.32 0.49 0.46 0.52
b20_C 0.71 0.64 0.47 0.56 0.50 0.59 0.35 0.44 0.45 0.49
b21_C 0.67 0.59 0.46 0.55 0.48 0.56 0.29 0.46 0.45 0.51
b22_C 0.68 0.58 0.46 0.59 0.50 0.57 0.34 0.50 0.42 0.50
b17_C 0.74 0.63 0.48 0.49 0.49 0.58 0.32 0.48 0.38 0.48
Average 0.71 0.62 0.47 0.56 0.48 0.59 0.32 0.47 0.43 0.50
TABLE IX: Key-Gate Detection Accuracy Using SAIL Change-Prediction-Boosted Reconstruction Model on UNSAIL Vs. RLL using sub=6 for the Change-Prediction Model
Insertion SLL SLL-based UNSAIL FLL FLL-based UNSAIL
c880 0.73 0.49 0.47 0.51
c1355 0.51 0.51 0.48 0.50
c1908 0.45 0.52 0.46 0.45
c2670 0.63 0.33 0.36 0.39
c3540 0.67 0.57 0.45 0.57
c5315 0.37 0.38 0.49 0.39
c7552 0.52 0.45 0.54 0.28
Average 0.55 0.46 0.46 0.44
TABLE X: Key-Gate Detection Accuracy Using SAIL Change-Prediction-Boosted Reconstruction Model on UNSAIL Vs. SLL and FLL when K=128 and Using sub=6 for the Change-Prediction Model

VII-D SWEEP Attack [alaql2019sweep] on UNSAIL Vs. RLL

Here, we launch the SWEEP attack on locked ISCAS-85 and ITC-99 benchmarks. The attack model is trained using the key-gate type variations shown in Table II. We use the accuracy metric suggested in [alaql2019sweep], i.e., the percentage of correctly extracted key-bits out of the entire key-size. The attack was launched on RLL and UNSAIL-locked instances; results are documented in Table XI.
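A minimal sketch of this metric follows, assuming that key-bits the attack leaves undecided (represented here as None) count as not extracted; the function name is hypothetical. Because the denominator is the entire key-size, such undecided bits lower the score, which may be one reason why some values in Table XI fall well below 50%.

```python
from typing import Optional, Sequence


def key_bit_accuracy(predicted: Sequence[Optional[int]], correct: Sequence[int]) -> float:
    """Correctly extracted key-bits divided by the entire key-size.

    predicted[i] is 0, 1, or None (the attack made no decision for bit i).
    """
    if len(predicted) != len(correct):
        raise ValueError("predicted and correct keys must have the same length")
    hits = sum(1 for p, c in zip(predicted, correct) if p is not None and p == c)
    return hits / len(correct)


acc = key_bit_accuracy([1, 0, None, 1], [1, 1, 0, 1])  # 2 of 4 bits correct -> 0.5
```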

First, SWEEP does not cope well with X(N)OR locking, as indicated in [alaql2019sweep]; we verify this experimentally through the low average accuracy of 33% observed on the RLL X(N)OR benchmarks. Second, although SWEEP was explicitly developed to attack MUX-based locking, its accuracy on our MUX-based CL techniques for RLL is relatively low, with an average of 41.63%. For the related CL approach, we replace an X(N)OR key-gate by a MUX, with one input driven by the true wire/signal as is and the other input driven by the false signal, which is simply the true signal inverted (Fig. 12(a)). Hence, depending on the key-bit assignment (to the MUX select line), the synthesis tool run by SWEEP replaces the MUX key-gate by either a buffer or an inverter. Accordingly, only few structural changes are induced that could be extracted for training the SWEEP model. This is in contrast to other techniques broken by SWEEP; e.g., we observe that SWEEP handles FLL (Fig. 12(b)) significantly better, with an average accuracy of 76.3%.⁴

⁴ For FLL, a specific algorithm underlies the selection of the true and false wires connected to the MUX key-gates [JV-Tcomp-2013]. Depending on the key-bit, different fan-in cones are fed to the MUX inputs (Fig. 12(b)), resulting in various synthesis-induced changes for those fan-in cones, which enable SWEEP to learn the correlation between the extracted features and the correct key-bit. Even if the selection of wires is randomized, the wrong key-bit assignments still result in larger fan-in cones/logic structures on average when compared to the correct key-bit assignment, as indicated in [alaql2019sweep].
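To see why the MUX-based CL key-gate leaves few cues for SWEEP, the sketch below enumerates a 2:1 MUX whose false input is the inverted true signal (Fig. 12(a)). It assumes, for illustration only, that the true wire drives the 0-input, so the correct key-bit is 0; the helper names are hypothetical. Under this assumption the MUX is functionally identical to an XOR key-gate, so fixing the key-bit leaves only a buffer or an inverter after synthesis.

```python
from itertools import product


def mux2(sel: int, in0: int, in1: int) -> int:
    """2:1 multiplexer: forwards in0 when sel == 0, in1 otherwise."""
    return in1 if sel else in0


def mux_key_gate(key: int, x: int) -> int:
    """MUX key-gate whose false input is the negation of the true wire x.

    Assumption for illustration: x drives in0, NOT x drives in1.
    """
    return mux2(key, x, 1 - x)


# For every key/input combination, the MUX collapses to x XOR key ...
for key, x in product((0, 1), repeat=2):
    assert mux_key_gate(key, x) == x ^ key

# ... so fixing the key yields a buffer (key = 0) or an inverter (key = 1).
buffer_like = [mux_key_gate(0, x) for x in (0, 1)]    # [0, 1]
inverter_like = [mux_key_gate(1, x) for x in (0, 1)]  # [1, 0]
```

If the true wire were placed on the 1-input instead, the same enumeration would give x XNOR key; either way, no extra fan-in cone is introduced, unlike the FLL case in Fig. 12(b).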

Third, we launched SWEEP on both RLL and UNSAIL-locked instances to study the effect of adding UNSAIL key-gate structures on the performance of SWEEP. On average, UNSAIL degrades the accuracy of the attack by 15pp, which demonstrates that UNSAIL hardens locking against ML-based attacks beyond SAIL.
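The 15pp figure can be cross-checked from the "Average" rows of Table XI. The sketch below assumes one plausible aggregation, namely the mean of the per-configuration reductions over the ten key-gate/benchmark-suite combinations; the exact procedure behind the reported number may differ (e.g., averaging per benchmark), so this is only a consistency check.

```python
# Per-configuration averages transcribed from the "Average" rows of Table XI:
# X(N)OR, CL_v1..CL_v4 for ISCAS-85 (K=128), then the same for ITC-99 (K=512).
rll = [0.27, 0.32, 0.32, 0.33, 0.44, 0.39, 0.48, 0.45, 0.46, 0.53]
unsail = [0.17, 0.19, 0.20, 0.21, 0.25, 0.33, 0.27, 0.27, 0.26, 0.38]

# Reduction in percentage points (pp) for each configuration.
reductions_pp = [100 * (r - u) for r, u in zip(rll, unsail)]
mean_reduction_pp = sum(reductions_pp) / len(reductions_pp)
print(f"mean reduction: {mean_reduction_pp:.1f}pp")  # mean reduction: 14.6pp
```

The result of about 14.6pp rounds to the reported 15pp.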

K=128
Key-gates X(N)OR CL_v1 CL_v2 CL_v3 CL_v4
Insertion RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL
c880 0.35 0.18 0.24 0.06 0.41 0.26 0.36 0.28 0.34 0.31
c1355 0.04 0.06 0.20 0.14 0.16 0.07 0.18 0.09 0.36 0.21
c1908 0.13 0.14 0.16 0.12 0.20 0.12 0.20 0.15 0.39 0.17
c2670 0.41 0.23 0.38 0.24 0.29 0.17 0.34 0.17 0.44 0.23
c3540 0.33 0.28 0.37 0.25 0.38 0.29 0.31 0.27 0.52 0.30
c5315 0.41 0.19 0.48 0.27 0.50 0.27 0.48 0.22 0.62 0.27
c6288 0.25 0.20 0.30 0.27 0.27 0.20 0.36 0.22 0.42 0.27
c7552 0.27 0.12 0.44 0.17 0.37 0.20 0.41 0.23 0.43 0.29
Average 0.27 0.17 0.32 0.19 0.32 0.20 0.33 0.21 0.44 0.25
K=512
b14_C 0.39 0.28 0.50 0.30 0.46 0.27 0.47 0.24 0.53 0.35
b15_C 0.41 0.33 0.45 0.29 0.44 0.25 0.46 0.23 0.52 0.39
b20_C 0.37 0.38 0.50 0.30 0.44 0.28 0.44 0.32 0.50 0.40
b21_C 0.38 0.33 0.44 0.25 0.44 0.25 0.47 0.25 0.53 0.37
b22_C 0.38 0.29 0.48 0.27 0.47 0.30 0.44 0.22 0.53 0.40
b17_C 0.43 0.34 0.49 0.24 0.47 0.25 0.47 0.29 0.57 0.38
Average 0.39 0.33 0.48 0.27 0.45 0.27 0.46 0.26 0.53 0.38
TABLE XI: Accuracy of SWEEP Attack [alaql2019sweep] on RLL Vs. UNSAIL
Fig. 12: Example of logic locking using MUXes. The true path is denoted by green, while the false path is denoted by red. (a) MUX key-gate inserted by UNSAIL. The false wire is the negation of the true wire. (b) MUX key-gate inserted by traditional logic locking. The false wire is taken from the design.

VII-E Redundancy Attack [li2019piercing] on UNSAIL Vs. RLL

We launched the redundancy attack on RLL and UNSAIL-locked ISCAS-85 and ITC-99 benchmarks for K=128 and report the percentages of deciphered key-bits in Table XII. In our analysis, we also study the effect of different key-gates. The average attack accuracy on X(N)OR-based RLL-locked benchmarks is 42%. The attack is more successful on CL_v4-based RLL, with an average accuracy of 52%. Note that CL_v4 contains AND/OR key-gates, for which incorrect key-bits result in stuck-at faults; such key-gates are more vulnerable to the redundancy attack.
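The stuck-at behavior of the AND/OR key-gates in CL_v4 can be illustrated with a minimal sketch. The gate functions below are hypothetical helpers; the sketch only shows why a wrong key-bit ties a wire to a constant (making the downstream logic a candidate for redundancy), and it does not implement the redundancy attack of [li2019piercing] itself.

```python
from typing import Callable


def and_key_gate(x: int, k: int) -> int:
    """AND key-gate: transparent for k = 1, forces the wire to 0 for k = 0."""
    return x & k


def or_key_gate(x: int, k: int) -> int:
    """OR key-gate: transparent for k = 0, forces the wire to 1 for k = 1."""
    return x | k


def is_stuck(gate: Callable[[int, int], int], k: int) -> bool:
    """True if, with key-bit k fixed, the gate output no longer depends on x."""
    return len({gate(x, k) for x in (0, 1)}) == 1


# Correct key-bits keep the wire alive; wrong ones tie it to a constant.
assert not is_stuck(and_key_gate, 1) and is_stuck(and_key_gate, 0)  # stuck-at-0
assert not is_stuck(or_key_gate, 0) and is_stuck(or_key_gate, 1)    # stuck-at-1
```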

The redundancy attack’s accuracy is reported to be as high as 79.59% on X(N)OR-based RLL-locked ISCAS-85 benchmarks in [li2019piercing], in contrast to the average accuracy of 25% observed in our study for the same benchmarks. This degradation in accuracy stems from the re-synthesis step followed in our work. Note that industry-grade synthesis tools invoke redundancy checking and removal as an integral step of logic optimization [jiang2009logic, li2019piercing]. Thus, when the locked RLL benchmarks are re-synthesized, the synthesis tool removes redundancies in the netlist that incorrect key-bits could have generated. Consequently, we observe a low attack accuracy on both RLL and UNSAIL schemes.⁵ When comparing the resilience of baseline RLL with UNSAIL, we note that UNSAIL further reduces the accuracy of the attack by an average of 3.34pp across all locking variations.

⁵ In fact, we have launched the redundancy attack on X(N)OR-based RLL-locked benchmarks without re-synthesis (in BENCH format) and observed accuracy values similar to those reported in [li2019piercing].

K=128
Key-gates X(N)OR CL_v1 CL_v2 CL_v3 CL_v4
Insertion RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL RLL UNSAIL
c880 0.10 0.06 0.08 0.10 0.12 0.12 0.11 0.12 0.34 0.25
c1355 0.02 0.02 0.90 0.01 0.11 0.01 0.45 0.04 0.55 0.17
c1908 0.08 0.08 0.01 0.11 0.05 0.06 0.05 0.07 0.29 0.14
c2670 0.29 0.33 0.16 0.35 0.27 0.27 0.19 0.33 0.40 0.39
c3540 0.52 0.54 0.42 0.56 0.43 0.36 0.38 0.41 0.80 0.47
c5315 0.43 0.39 0.32 0.43 0.48 0.40 0.45 0.39 0.55 0.52
c6288 0.31 0.26 0.20 0.23 0.31 0.26 0.27 0.26 0.52 0.31
c7552 0.30 0.32 0.27 0.22 0.26 0.22 0.27 0.26 0.31 0.35
Average 0.26 0.25 0.29 0.25 0.25 0.21 0.27 0.23 0.47 0.32
K=128
b14 0.50 0.42 0.44 0.47 0.45 0.40 0.48 0.44 0.49 0.55
b15 0.63 0.59 0.60 0.57 0.52 0.61 0.50 0.55 0.64 0.58
b20 0.54 0.32 0.42 0.44 0.33 0.53 0.39 0.39 0.64 0.54
b21 0.52 0.42 0.42 0.37 0.43 0.39 0.41 0.48 0.51 0.52
b22 0.63 0.38 0.46 0.43 0.43 0.47 0.48 0.43 0.53 0.49
b17 0.73 0.75 0.53 0.59 0.54 0.62 0.60 0.65 0.60 0.65
Average 0.59 0.48 0.48 0.48 0.45 0.50 0.48 0.49 0.57 0.56
TABLE XII: Accuracy of Redundancy attack [li2019piercing] on UNSAIL Vs. RLL

VII-F Hamming Distance (HD) and Output Error Rate (OER) Analysis on UNSAIL Vs. RLL

Next, we calculate the HD and OER between the original benchmark outputs and the outputs of the RLL and UNSAIL-locked instances by applying random keys. This is done to quantify the level of functional obfuscation for the logic locking techniques in general, independent of an actual attack. For each benchmark, 100 random keys are chosen, and the locked instances’ outputs are compared with the golden outputs by applying 10,000 random input patterns. The results are documented in Table XIII.
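The HD/OER procedure above can be sketched on a toy circuit. Everything here is hypothetical: a 3-input, 2-output "original" function and a "locked" version with two XOR key-gates whose correct key is (0, 0). HD is computed as the fraction of output bits that differ from the golden outputs, and OER as the fraction of input patterns with at least one erroneous output bit; we assume these standard definitions match those used for Table XIII.

```python
import random
from typing import Tuple

Bits = Tuple[int, ...]


def original(x: Bits) -> Bits:
    """Hypothetical 3-input, 2-output golden function."""
    a, b, c = x
    return (a & b, b | c)


def locked(x: Bits, key: Bits) -> Bits:
    """Same function with two XOR key-gates; the correct key is (0, 0)."""
    a, b, c = x
    k0, k1 = key
    t = a ^ k0          # key-gate on input a
    u = (b | c) ^ k1    # key-gate on the second output
    return (t & b, u)


def hd_and_oer(n_keys: int, n_patterns: int, seed: int = 0) -> Tuple[float, float]:
    """Apply random keys and random input patterns; return (HD, OER)."""
    rng = random.Random(seed)
    wrong_bits = total_bits = 0
    wrong_patterns = total_patterns = 0
    for _ in range(n_keys):
        key = (rng.randint(0, 1), rng.randint(0, 1))
        for _ in range(n_patterns):
            x = (rng.randint(0, 1), rng.randint(0, 1), rng.randint(0, 1))
            golden, observed = original(x), locked(x, key)
            diff = sum(g != o for g, o in zip(golden, observed))
            wrong_bits += diff
            total_bits += len(golden)
            wrong_patterns += int(diff > 0)
            total_patterns += 1
    return wrong_bits / total_bits, wrong_patterns / total_patterns


hd, oer = hd_and_oer(n_keys=100, n_patterns=1_000)
```

Since a pattern counts toward OER as soon as any output bit is wrong, OER is always at least as large as HD, which is consistent with the near-100% OER alongside much lower HD values reported here. With only a 2-bit key, some random keys will hit the correct key, which is practically impossible for K=64 to K=512.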

We observe that HD increases with the key-size for both ISCAS-85 and ITC-99 benchmarks. This is intuitive: more key-inputs allow potentially false key-bit assignments to propagate more widely (albeit subject to the netlist structure) and, in turn, cause more output corruption.

Comparing UNSAIL vs. RLL, RLL achieves a higher HD, by an average of 3.25pp. This is because UNSAIL inserts key-gates at specific locations that degrade the performance of the ML models used by SAIL but are not necessarily effective at propagating faults to the outputs when incorrect key-bits are applied. We also calculate the OER for UNSAIL-locked benchmarks; ideally, OER should be 100%. The average OER obtained for UNSAIL on ISCAS-85 benchmarks (K=64 and K=128) is 99.95%, whereas the OER for ITC-99 benchmarks with K=256 and K=512 is 100% in all test cases.

RLL UNSAIL
Key-gates X(N)OR CL_v1 CL_v2 CL_v3 CL_v4 X(N)OR CL_v1 CL_v2 CL_v3 CL_v4
Key-size 64 128 64 128 64 128 64 128 64 128 64 128 64 128 64 128 64 128 64 128
c880 0.28 0.34 0.29 0.38 0.26 0.37 0.25 0.37 0.29 0.29 0.18 0.39 0.28 0.40 0.24 0.36 0.30 0.34 0.29 0.37
c1355 0.33 0.46 0.22 0.46 0.32 0.43 0.33 0.43 0.29 0.44 0.24 0.38 0.16 0.32 0.19 0.36 0.25 0.39 0.28 0.34
c1908 0.33 0.43 0.32 0.43 0.30 0.44 0.29 0.40 0.30 0.33 0.23 0.35 0.28 0.35 0.27 0.39 0.33 0.32 0.23 0.35
c2670 0.09 0.11 0.09 0.12 0.11 0.14 0.06 0.13 0.11 0.12 0.10 0.11 0.09 0.11 0.09 0.14 0.08 0.12 0.08 0.13
c3540 0.35 0.44 0.30 0.36 0.30 0.40 0.34 0.39 0.31 0.40 0.27 0.43 0.31 0.34 0.27 0.33 0.29 0.35 0.29 0.37
c5315 0.17 0.21 0.15 0.22 0.15 0.20 0.14 0.22 0.15 0.19 0.11 0.18 0.11 0.15 0.14 0.17 0.12 0.16 0.10 0.15
c6288 0.35 0.42 0.36 0.41 0.35 0.45 0.37 0.45 0.33 0.43 0.36 0.37 0.33 0.39 0.36 0.35 0.32 0.40 0.33 0.36
c7552 0.14 0.21 0.16 0.19 0.15 0.18 0.13 0.16 0.13 0.19 0.10 0.13 0.12 0.16 0.10 0.17 0.17 0.16 0.14 0.15
Average 0.25 0.33 0.24 0.32 0.24 0.33 0.24 0.32 0.24 0.30 0.20 0.29 0.21 0.28 0.21 0.28 0.23 0.28 0.22 0.28
Key-size 256 512 256 512 256 512 256 512 256 512 256 512 256 512 256 512 256 512 256 512
b14_C 0.17 0.25 0.16 0.29 0.16 0.30 0.18 0.30 0.16 0.26 0.17 0.22 0.17 0.20 0.12 0.21 0.13 0.20 0.15 0.27
b15_C 0.15 0.23 0.14 0.24 0.14 0.22 0.13 0.24 0.12 0.18 0.09 0.18 0.12 0.15 0.11 0.19 0.12 0.15 0.08 0.18
b20_C 0.10 0.20 0.13 0.18 0.12 0.18 0.09 0.17 0.10 0.15 0.10 0.15 0.07 0.17 0.08 0.15 0.07 0.12 0.06 0.15
b21_C 0.12 0.17 0.12 0.20 0.10 0.19 0.12 0.17 0.10 0.16 0.08 0.15 0.07 0.16 0.11 0.14 0.08 0.15 0.07 0.14
b22_C 0.07 0.16 0.07 0.16 0.08 0.13 0.08 0.16 0.06 0.12 0.07 0.15 0.09 0.10 0.07 0.10 0.06 0.11 0.03 0.12
b17_C 0.05 0.10 0.05 0.09 0.05 0.08 0.05 0.10 0.05 0.09 0.04 0.09 0.03 0.08 0.02 0.07 0.03 0.07 0.03 0.06
Average 0.11 0.19 0.11 0.19 0.11 0.18 0.11 0.19 0.10 0.16 0.09 0.16 0.09 0.14 0.08 0.14 0.08 0.13 0.07 0.15
TABLE XIII: HD Results for RLL and UNSAIL Schemes on Selected ISCAS-85 and ITC-99 Benchmarks Upon Applying 100 Random Keys and 10,000 Random Input Patterns for Each Key Assignment

VII-G UNSAIL Test and Fault Coverage

Here, we study the impact of UNSAIL on the testability of the overall design. As recommended in [yasin_DATE_2016], we report the fault coverage and test coverage for locked benchmarks without key constraints. For context, we also obtain the coverage values for the original designs. Fault coverage is the percentage of detected faults out of all faults in the design, whereas test coverage is the percentage of detected faults out of the detectable faults in the design [bushnell2004essentials].
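The two metrics differ only in their denominators. A minimal sketch with hypothetical fault counts (not data from our experiments) shows how a design can reach 100% test coverage while its fault coverage stays slightly below 100%, which is the pattern reported next. Commercial ATPG tools classify faults into finer categories; the two-way detectable/undetectable split here is a simplification.

```python
def compute_fault_coverage(detected: int, total: int) -> float:
    """Percentage of detected faults out of all faults in the design."""
    return 100.0 * detected / total


def compute_test_coverage(detected: int, total: int, undetectable: int) -> float:
    """Percentage of detected faults out of the detectable faults."""
    return 100.0 * detected / (total - undetectable)


# Hypothetical fault counts: every detectable fault is found, but 29 of
# 10,000 faults are undetectable (e.g., due to redundant logic).
fc = compute_fault_coverage(detected=9_971, total=10_000)                  # about 99.71
tc = compute_test_coverage(detected=9_971, total=10_000, undetectable=29)  # 100.0
```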

The test coverage for the original ISCAS-85 and ITC-99 benchmarks is 100% in all cases, and the average fault coverage is 99.98% and 99.92%, respectively. For UNSAIL-locked benchmarks, the test coverage remains at 100% for all benchmarks, and the average fault coverage is 99.71% and 99.96% for ISCAS-85 and ITC-99, respectively. These experiments illustrate that UNSAIL has a negligible impact on the testability of the underlying designs.

VII-H Implementation Overheads

Obfuscation Time. To obfuscate a design using UNSAIL, the design must first be locked using a traditional logic locking algorithm (e.g., RLL, FLL, etc.) and then synthesized using a synthesis tool of choice. Next, the pre- and post-synthesis subgraphs must be extracted and compared, and the additional UNSAIL key-gate structures must be inserted. In our experiments, the extraction of subgraphs took less than a second for ISCAS-85 benchmarks for both K=64 and K=128. For ITC-99 benchmarks with K=512, the extraction took between 70s and 138s. For locking the GPS module in the ORPSoC design, the extraction took 6.5h on average. Insertion of UNSAIL key-gates took less than a second for ISCAS-85 benchmarks, around 105s for the ITC-99 benchmarks, and 6h on average for the GPS module, which demonstrates that our approach scales to large designs.
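The extract-and-compare step above can be sketched as follows, assuming each key-gate's locality has already been serialized into a canonical string (e.g., the gate types within a fixed radius). The representation, the function name, and the example strings are hypothetical; the sketch only identifies which key-gate structures synthesis altered and records the (pre, post) pairs, and it does not reproduce how UNSAIL then selects and inserts its additional key-gate structures.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: key-gate ID -> canonical string describing
# the local subgraph around that key-gate.
Subgraphs = Dict[str, str]


def changed_structures(pre: Subgraphs, post: Subgraphs) -> List[Tuple[str, str, str]]:
    """Return (key_gate, pre_subgraph, post_subgraph) for every key-gate
    whose locality differs between the pre- and post-synthesis netlists."""
    changes = []
    for gate, before in sorted(pre.items()):
        after = post.get(gate)
        if after is not None and after != before:
            changes.append((gate, before, after))
    return changes


pre = {"k0": "XOR(a,k0)->AND", "k1": "XNOR(b,k1)->OR"}
post = {"k0": "XNOR(a_n,k0)->AND", "k1": "XNOR(b,k1)->OR"}
changes = changed_structures(pre, post)
# [('k0', 'XOR(a,k0)->AND', 'XNOR(a_n,k0)->AND')]
```

A dictionary lookup per key-gate keeps this comparison linear in the number of key-gates; the subgraph extraction itself is the expensive part for large designs.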

Layout Cost Incurred by UNSAIL. We report the area and power overheads for RLL and UNSAIL-locked instances of ITC-99 benchmarks for the largest key-size of K=512. Area and power overheads for all benchmarks are obtained from iso-performance layout implementations under a 5 ns timing constraint (i.e., 200 MHz). Area overheads for RLL and UNSAIL-locked instances using X(N)OR key-gates are shown in Fig. 13(a) and Fig. 13(b), respectively; the related power overheads are illustrated in Fig. 14(a) and Fig. 14(b).

We note that UNSAIL increases the area overheads (by 0.37% to 5.79%) and power overheads (by 4.88% to 14.17%) compared to the baseline RLL, i.e., at iso-performance. There are two reasons for this, playing out at the synthesis and physical-layout level. First, the synthesis tool is unconstrained after the insertion of UNSAIL key-gate structures. Thus, while more secure than RLL, the UNSAIL netlists are less optimized with regard to area and power. Second, since our physical-layout flow is optimized for timing closure, algorithms internally invoked by Cadence Innovus, such as buffer insertion and/or gate upsizing, as well as further re-routing of nets, increase both area and power. However, the cost of UNSAIL is amortized for large, million-gate designs, as illustrated next.

Fig. 13: Area overheads for selected ITC-99 benchmarks locked with K=512 at iso-performance of 200 MHz. (a) RLL with X(N)OR key-gates. (b) UNSAIL integrated with RLL using X(N)OR key-gates. Each box consists of 20 trials; the boxes span from the 5th to the 95th percentile, the whiskers indicate the minimum and maximum values, the red bars indicate the median, and the red dots represent outliers.

Fig. 14: Power overheads for selected ITC-99 benchmarks locked with K=512 at iso-performance of 200 MHz. (a) RLL with X(N)OR key-gates. (b) UNSAIL integrated with RLL using X(N)OR key-gates. The details regarding boxes are the same as in Fig. 13.

VII-I Results on DARPA OpenCores Benchmark [CEP_github]

For the DARPA CEP benchmark [CEP_github], also known as ORPSoC, we lock the sensitive GPS module using X(N)OR locking. The SAIL RF classifier was trained and tested on both RLL and UNSAIL with K=512. For sub=3, the classification accuracy is 77% for RLL, whereas it is 55% for UNSAIL, i.e., only slightly better than random guessing. For sub=6, UNSAIL still lowers the classification accuracy, namely from 91% to 79%. The reconstruction model was tested as well; its key-gate recovery accuracy is 73% for RLL and 53% for UNSAIL. Once both models are combined and the full SAIL attack is launched, the key-gate detection accuracy drops from 73% for RLL to 66% for UNSAIL. The area and power overheads for UNSAIL using K=512 are 0.26% and 0.61%, respectively, for iso-performance at 100 MHz.

In summary, we demonstrated that UNSAIL is both effective and cost-efficient when protecting large designs against SAIL.

VIII Discussion

VIII-A Impact of Re-Synthesizing UNSAIL-locked Designs

Essentially, logic locking with UNSAIL adds several key-gate structures to confuse ML-based attacks. Ideally, a defender would like those structures to remain untouched. One might argue that an attacker could re-synthesize the UNSAIL-locked designs to remove the subterfuge added by our defense. However, doing so only increases the complexity of SAIL, as explained next.

The goal of SAIL is to recover the locked netlist before re-synthesis; let us call it netlist A. This locked netlist is re-synthesized for obfuscation, yielding netlist B. Next, UNSAIL additionally locks the re-synthesized netlist, yielding netlist C. If an attacker re-synthesizes the final locked design once more, he/she ends up with netlist D. In such a case, an even more powerful attack must be developed to revert all the changes, i.e., to go back from netlist D to netlist A.

VIII-B Impact of Layout Optimization on UNSAIL

We also compared the post-layout netlists to the pre-layout (i.e., post-synthesis) netlists to investigate whether the UNSAIL key-gate structures are carried over or resolved by layout-level optimization techniques. On average, 10% of all key-gate structures are affected, i.e., they undergo some optimization. Still, we argue that such a transformation does not affect the overall resilience offered by UNSAIL. This is because an enhanced, yet-to-be-demonstrated two-step SAIL attack, capable of working on post-layout designs, would have to infer the additional changes incurred by layout-level optimizations. Even assuming such a powerful attack exists, the attacker would still be left only with the post-synthesis netlist, which remains protected by UNSAIL.

VIII-C Extended Threat Model

As discussed, various attacks have challenged logic locking by leveraging an oracle [chakraborty2019keynote]; this has resulted in broad efforts to protect against such oracle-guided attacks. However, recent ML-based, oracle-less attacks are considered more powerful, as they have been shown to undermine the security promises of logic locking already during the early stages of the IC supply chain, without requiring an oracle.

In this work, we have proposed and demonstrated UNSAIL to protect logic locking against such potent oracle-less attacks. Recall that UNSAIL is compatible with any traditional locking scheme of choice. Given that traditional locking techniques are often combined with locking solutions resilient against SAT-based attacks, we argue that integrating such a resilient scheme with UNSAIL could protect the design from both oracle-guided and oracle-less attacks at once. Related efforts are, however, left for future work.

Note that pairing UNSAIL with a SAT-resilient technique would not compromise the resilience offered by UNSAIL against SAIL, because any SAT-resilient technique is independent and separate from the UNSAIL structures. Furthermore, SAT-resilient techniques differ significantly from traditional locking in terms of logic and structural properties. Thus, in its current form, SAIL cannot readily be applied to those resilient techniques, and it remains to be seen whether SAIL could be tailored to such techniques at all.

IX Conclusion

In this work, we first implemented a reference platform for the SAIL attack to thoroughly investigate the security of logic locking against such an oracle-less, machine learning (ML)-based attack. For the first time, our study considers various key-sizes, key-gate structures, and key-gate insertion heuristics. Among other findings, compound logic locking, i.e., where various key-gate structures are randomly selected and used at once, tends to be more resilient. Second, we presented a defense mechanism called UNSAIL, which specifically targets the training stage of such an ML-based attack. The presented defense can be integrated with any combinational logic locking scheme, and we have considered various traditional logic locking schemes toward that end.

UNSAIL serves to confuse the SAIL models by introducing additional structural transformations that these models cannot distinguish from regular ones (i.e., those introduced by synthesis tools for the sake of obfuscation, as is common practice with logic locking). We initially motivated UNSAIL using Fisher’s discriminant ratio, which indicated more complex classification problems for SAIL in particular and for classification-based attacks in general. Besides SAIL, we show that our defense can hinder another potent oracle-less, ML-based attack, called SWEEP. For both SAIL and SWEEP, we have performed a thorough evaluation with different attack models and configurations. The results show that UNSAIL degrades the accuracy of all stages/models of SAIL, reducing the overall attack accuracy by 11 percentage points (pp); UNSAIL also degrades the performance of the SWEEP attack by an average of 15pp, all while inducing only marginal layout overheads. We have further demonstrated that UNSAIL is capable of thwarting non-ML-based oracle-less attacks, specifically the redundancy attack, which recovers the key-bits of UNSAIL-locked designs only with a low accuracy of 38% on average. Finally, UNSAIL-locked designs can be activated post-testing, ensuring high fault coverage and test quality while additionally offering protection from an untrusted test facility.

References