Are Adversarial Perturbations a Showstopper for ML-Based CAD? A Case Study on CNN-Based Lithographic Hotspot Detection

06/25/2019 ∙ by Kang Liu, et al. ∙ NYU ∙ The Chinese University of Hong Kong

There is substantial interest in the use of machine learning (ML) based techniques throughout the electronic computer-aided design (CAD) flow, particularly those based on deep learning. However, while deep learning methods have surpassed state-of-the-art performance in several applications, they have exhibited intrinsic susceptibility to adversarial perturbations --- small but deliberate alterations to the input of a neural network, precipitating incorrect predictions. In this paper, we seek to investigate whether adversarial perturbations pose risks to ML-based CAD tools, and if so, how these risks can be mitigated. To this end, we use a motivating case study of lithographic hotspot detection, for which convolutional neural networks (CNN) have shown great promise. In this context, we show the first adversarial perturbation attacks on state-of-the-art CNN-based hotspot detectors; specifically, we show that small (on average 0.5% of the layout area), functionality-preserving and design-constraint-satisfying changes to a layout can nonetheless trick a CNN-based hotspot detector into predicting the modified layout as hotspot free (with up to 99.7% success). We propose an adversarial retraining strategy to improve the robustness of CNN-based hotspot detection and show that this strategy significantly improves robustness (by a factor of ~3) against adversarial attacks without compromising classification accuracy.


1. Introduction

Electronic system design flows pose several optimization and verification challenges as the scale and complexity of designs increase, placing higher pressure on designers to deliver timely results. There is substantial interest in using machine learning (ML) techniques to solve hard electronic computer-aided design (CAD) problems ranging from logic synthesis to physical design and design for manufacturability (DFM) (Moore, 2018). A promised outcome of deep learning enhanced design flows is a faster and more scalable development cycle, enabled by improvements in time-consuming steps such as design space exploration (Greathouse and Loh, 2018), logic optimization (Yu et al., 2018) and lithographic analysis (Yang et al., 2018b).

Nonetheless, while deep learning methods have surpassed state-of-the-art performance on a wide range of applications, they have been shown to be brittle against adversarial perturbations (Goodfellow et al., 2015). Adversarial perturbations are small, imperceptible but targeted modifications to the input of a deep neural network that result in incorrect behavior. For example, Fig. 1 shows an image of a horse from the CIFAR-10 dataset (Krizhevsky, 2009); each of the subsequent four images is an adversarially perturbed version of the first, classified as airplane, automobile, bird and cat, respectively. As noted earlier, the perturbations are so small that they are imperceptible.

Figure 1. A "clean" image of a horse (leftmost) and adversarial images with corresponding prediction labels. The adversarial perturbations are so minute as to appear imperceptible.

Adversarial perturbations have been demonstrated in practically every application in which deep networks are used (Biggio and Roli, 2018), and have raised fundamental questions about the ability of deep neural networks to generalize. This leads to a natural question: what are the implications of adversarial perturbations on the security, soundness, and robustness of deep learning techniques in CAD-related problems? While the CAD domain presents a challenge for adversaries, given the domain-specific knowledge required to perform stealthy (and meaningful) attacks, it is crucial to investigate whether adversarial perturbations pose a showstopping threat to this innovation.

As a motivating example, we study the challenging CAD problem of lithographic layout hotspot detection. In physical design of an integrated circuit (IC), layout patterns are etched into silicon using optical lithography. Due to lithographic process variations, specific patterns are susceptible to manufacturing errors; these hotspots need to be detected and fixed early in the IC design flow to avoid yield loss. The conventional approach to hotspot detection is physics-based optical lithography simulations. While accurate, they are time consuming and computationally expensive for the full IC. Noting that one can pose hotspot detection as image classification, recent work has proposed adoption of convolutional neural networks (CNN) for this problem, achieving state-of-the-art results (Yang et al., 2018b). Once hotspots are detected, resolution enhancement techniques (RETs) such as optical proximity correction (OPC) and the insertion of sub-resolution assist features (SRAFs) can enhance IC layouts. Changes are verified using further lithography simulations, and iterated upon as required.

Now consider the following scenario where a designer is considering the purchase of a 3rd-party macro for their IC design. The designer wants to check the quality of the macro and has the IC layout images for verification. Using a CNN-based hotspot detector, the designer can quickly ascertain if the IC layout is printable as-is, and gauge the potential effort needed to correct any design flaws. To pass off a sub-par design as high quality, the 3rd-party vendor selectively modifies the layout to force the detector to misclassify hotspot regions as non-hotspot. In other words, the attacker hides hotspots by exploiting properties of the CNN — identifying and taking advantage of the susceptibility of CNNs to adversarial perturbations. However, malicious insertion is non-trivial. Unlike image perturbations that involve adding imperceptible noise (Biggio and Roli, 2018), the attacker must add semantically meaningful and realistic IC layout features to the design that pass design rule check (DRC), such as respecting spacing constraints. Successful attacks can have a significant impact: this sabotage can propagate undetected manufacturability issues, causing downstream reductions in IC yield and wasted designer effort.

With lithographic hotspot detection as our motivating case study, we investigate, for the first time, a targeted attack on deep learning based CAD tools, to demonstrate the feasibility, challenges, and potential security implications for the CAD community. Our empirical findings highlight the need to study the underlying mechanics and to give wider consideration of the security and robustness implications of integrating ML-based tools into the IC design flow. Our contributions are threefold:

  • The first exploration of the impact of adversarial perturbations on deep neural network based CAD tools using IC lithographic hotspot detection as a case study.

  • Comprehensive evaluation of two attack scenarios on CNN-based hotspot detectors: (1) white-box attacks, wherein the attacker has access to the model parameters of the detector and (2) black-box attacks, wherein the attacker has access only to the outputs of the detector.

  • Exploration of adversarial retraining as a defense against adversarial perturbation attacks, yielding an equally accurate but robustified CNN for hotspot detection.

The rest of the paper is organized as follows. We explore the motivations for adopting deep learning in hotspot detection and outline the goals and capabilities of a potential attacker (Section 2). This is followed by technical preliminaries on the principles of CNNs and the notion of adversarial perturbations (Section 3). Our study centers around two CNN-based hotspot detectors, and we detail the architecture and training of these detectors (Section 4). Following this, we describe the attack methodologies, detailing how adversarial IC layouts can be generated to effectively hide the presence of hotspots (Section 5). The first attack is a white-box attack, where the internals of the detector are available to the attacker. We then consider a more conservative attack, where the attacker can only query a black-box model. We verify, via lithography simulations, that the vast majority of adversarially perturbed IC layouts are still hotspots but are not flagged as such by the hotspot detectors (Section 6). Given the high success rate of our attack experiments, we propose robust retraining, a promising defense against adversarial attacks, and present encouraging results (Section 7). Our findings pose interesting questions that we then discuss (Section 8). To contextualize our work, we present related literature in deep learning for CAD and adversarial attacks (Section 9). Ultimately, we conclude from this study that one should be aware of the limitations of using deep learning in CAD, and also be encouraged to investigate and adopt proactive countermeasures (Section 10).

2. Motivation

2.1. Deep Learning for Hotspot Detection

2.1.1. Lithographic Hotspot Detection

In advanced technology nodes the layout feature sizes are much smaller than the light wavelengths used in the optical lithography systems. As a result, complex interactions between light patterns in lithography have made printed patterns sensitive to process variations. This has increased challenges in IC back-end design and sign-off flows. Lithography induces defects due to phenomena such as diffraction, resulting in lithographic hotspots (Yang et al., 2018b, 2019; Reddy et al., 2018).

Consider Fig. 2(a), which shows an IC layout containing two vias colored green. If this layout were printed as is, the resulting printed output would be unsatisfactory. Only a small region of the desired vias is printed — shown in purple in Fig. 2(b). Thus, resolution enhancement techniques (RETs) such as sub-resolution assist features (SRAF) (Geng et al., 2019) and optical proximity correction (OPC) (Yang et al., 2018a) have been proposed to ease IC layout manufacturability; they aim to compensate for distortion during lithography. Fig. 2(c) shows the effect of SRAF insertion. The printed pattern more accurately reflects the required pattern (Fig. 2(d)). However, even when equipped with rigorous RETs, the layout can have hotspots due to unpredictable lithography process variations. Therefore, it is vital to spot potential hotspots before manufacturing and correct them either by using RET or by re-design.

2.1.2. Deep Learning Based Hotspot Detection

In light of the prohibitive run-time of lithographic simulation, recent work has sought to speed-up hotspot detection using pattern matching

(Yu et al., 2012) and machine learning (Matsunawa et al., 2015; Yang et al., 2017). Pattern matching methods find similar or identical hotspot-causing patterns in a new design from a library of known hotspots. These techniques are both fast and accurate if the patterns are similar to those in the library, but cannot find previously unseen hotspot patterns. In contrast, machine learning solutions seek to capture the underlying physics of lithographic simulation (i.e., the relationships between IC layout features and their manufacturability) and, as such, generalize to unseen patterns (or at least that has been the hope). Recent advancements on CNN based hotspot detection (Yang et al., 2018b, 2019) have shown that both shallow and deep CNNs are more accurate compared to legacy machine learning based and pattern matching based techniques.

(a) Layout with vias only
(b) Lithography simulation of layout with vias only
(c) Layout with vias and SRAFs
(d) Lithography simulation of layout with vias and SRAFs
Figure 2. Illustration of lithography simulation results of layouts with vias only, and both vias and SRAFs.

2.2. Threat Model

2.2.1. Setting

To motivate our work in examining the security and robustness of deep learning in CAD, we explore the scenario of a designer considering the purchase of a macro from a 3rd party IP vendor. The 3rd party IP vendor distributes hard macros in GDSII format (Company, 1987), where the circuit is laid out and allegedly enhanced for lithography using RETs. As part of the validation process, the designer does a "sanity check" on the macro to establish its quality by using a CNN-based hotspot detector (which may be a commercial tool in a local or cloud setting).

2.2.2. Attack Goals

The vendor aims to sell low-quality hard macros, either to make a profit from their design short-cuts or to sabotage the designer (by forcing them to waste time and resources in rectifying poor designs). To achieve this aim, the attacker's goal is, therefore, to fool the target CNN-based hotspot detector into classifying hotspots as non-hotspot. This should be achieved while making the smallest possible changes to the layouts. In this work, we investigate SRAF insertion as an RET; consequently, the attacker aims to insert as few malicious SRAFs as possible.

2.2.3. Attacker Capabilities

In the context of deep learning, the attacker's capabilities can be defined based on the amount of information they possess about the network under attack. This includes information about the network's architecture and hyper-parameters, its weights and biases, the training algorithm and training data, etc. For this case study, we consider two scenarios: (1) an attacker with white-box access, where they have full knowledge of the CNN, including its network architecture, weights and biases; and (2) an attacker with only black-box access, where they are able to query the detector, receiving both the output classification and the accompanying prediction confidence. Both models have been studied in prior work (Goodfellow et al., 2015; Papernot et al., 2017; Liu et al., 2017; Biggio and Roli, 2018).

3. Deep Learning Preliminaries

To appreciate the potential of deep learning for CAD problems such as hotspot detection, we present relevant technical preliminaries for CNNs and adversarial perturbations.

3.1. CNN Basics

A CNN features an input layer, a number of hidden layers, and an output layer. The CNN takes in some input (e.g., an image) and propagates the data through a series of linear and non-linear operations (such as convolutions and the activation of "neurons"). After the input has been transformed by each of the hidden layers, the final output produces a classification prediction for the input. The CNN is "trained" by configuring the parameters of the filters in each layer (the weights).

We can express this formally as follows. A neural network is a function $F$ that takes an input $x$ and produces an output $y$, such that $F(x) = y$. For an $m$-class classifier, the output $y$ is an array $y = [y_1, y_2, \ldots, y_m]$, where $y_i$ is the prediction probability of class $i$, $i \in \{1, \ldots, m\}$. The network output is subject to the constraints $y_i \in [0, 1]$ and $\sum_{i=1}^{m} y_i = 1$. The label of input $x$ is the output class with the highest prediction probability, i.e., $\mathrm{label}(x) = \arg\max_i y_i$. A deep neural network classifier has multiple layers of neurons, the last being a softmax layer. Hence, the neural network can be expressed as:

$F(x) = \mathrm{softmax}(F_n(F_{n-1}(\cdots F_1(x) \cdots)))$   (1)

where

$F_i(z) = \sigma_i(W_i \cdot z + b_i)$   (2)

Here $\sigma_i$ is the activation function of layer $i$, $W_i$ is the layer's weight matrix and $b_i$ is its bias. Common choices of activation function $\sigma$ include the logistic function, tanh and ReLU (Nair and Hinton, 2010). In an image classification neural network, the input $x$ is either a grey-scale image with one channel or an RGB image with three channels, where each channel of a pixel takes integer values in $[0, 255]$.
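
To make the notation concrete, the following NumPy sketch (our own illustration, not code from the original work) implements the layered composition of Equation 2 followed by the softmax of Equation 1; the layer sizes and weights are arbitrary placeholders.

import numpy as np

def softmax(z):
    # Numerically stable softmax: subtract the max before exponentiating.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, layers):
    # F(x) = softmax(F_n(...F_1(x))), where each layer is a (W, b, activation) tuple.
    for W, b, act in layers:
        x = act(W @ x + b)
    return softmax(x)

# Toy 2-layer, 3-class classifier with random placeholder weights.
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(16, 8)), np.zeros(16), relu),
          (rng.normal(size=(3, 16)), np.zeros(3), lambda z: z)]
x = rng.normal(size=8)
y = forward(x, layers)
print(y.sum(), int(np.argmax(y)))  # probabilities sum to 1; argmax is the predicted label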

3.2. Adversarial Perturbations

The existence of adversarial inputs for classification using neural networks was first described by Szegedy et al. (Szegedy et al., 2014). They observed a phenomenon whereby a neural network would change its output prediction based on imperceptible perturbations of the input. In these cases, while the network is "fooled", a human would not be. This property can be exploited by an adversary: inputs can be crafted to fool a target network and cause misclassification.

Formally, let $y^*$ denote the true label of a clean input $x$, and let $\mathrm{label}(x)$ denote the prediction label of $x$ given by the neural network. The adversary aims to generate an adversarial input $x'$ close to $x$ that misleads the network into outputting a target label $y_t$, where $y_t \neq y^*$. The difference between $x$ and $x'$ is measured by a distance metric $D(x, x')$ and constrained by a constant $\epsilon$, such that $D(x, x') \leq \epsilon$. Normally $\epsilon$ is so small that the perturbation is imperceptible to human eyes and should not, by itself, justify changing the prediction label from $y^*$ to $y_t$.

In non-targeted attacks, adversaries search for any adversarial input whose output label differs from $y^*$. In targeted attacks, the target label $y_t$ is pre-defined by the adversary and can be quite distinct from $y^*$. There are several schemes for crafting adversarial perturbations. Our work is inspired by the following methods that have been explored in a general adversarial perturbation context.

3.2.1. Basic fast gradient sign (FGS) method

Goodfellow et al. (Goodfellow et al., 2015) proposed the FGS method for adversarial input generation. For non-targeted attacks, starting with a clean input $x$, the adversary moves each pixel in the direction of the sign of the gradient of the true-label loss with respect to $x$, increasing that loss. The goal is to mislead the network into outputting any label other than the true label. The non-targeted FGS attack can be described mathematically as follows:

$x' = \mathrm{clip}\big(x + \epsilon \cdot \mathrm{sign}(\nabla_x J(x, y^*))\big)$   (3)

Here $\epsilon$ is a small constraint scalar, $J$ is the loss function and $\mathrm{clip}(\cdot)$ ensures pixel values fall in the desired range.

On the other hand, in a targeted attack, the adversary seeks to fool the network into misclassifying $x$ as a specific target label $y_t$. This is achieved by altering pixels in the opposite direction of the gradient of the target-label loss with respect to $x$, decreasing that loss. The attack is described by Equation 4:

$x' = \mathrm{clip}\big(x - \epsilon \cdot \mathrm{sign}(\nabla_x J(x, y_t))\big)$   (4)

These two attacks emphasize computational efficiency and speed at the expense of introducing relatively large perturbations. Sophisticated techniques that seek to find the smallest possible perturbation, albeit at greater computational expense, have subsequently been proposed (Kurakin et al., 2016) — one such attack is described next.
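
For concreteness, the following TensorFlow sketch (our own illustration; `model` is assumed to be a Keras classifier that outputs softmax probabilities) implements the one-step non-targeted and targeted updates of Equations 3 and 4.

import tensorflow as tf

def fgs(model, x, label, epsilon=0.03, targeted=False):
    # One-step fast gradient sign perturbation (Eqs. 3 and 4).
    # x:     batch of images with pixel values in [0, 1]
    # label: true label (non-targeted) or target label (targeted)
    x = tf.convert_to_tensor(x)
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = loss_fn(label, model(x))
    grad = tape.gradient(loss, x)
    if targeted:
        # Decrease the target-label loss (Eq. 4).
        x_adv = x - epsilon * tf.sign(grad)
    else:
        # Increase the true-label loss (Eq. 3).
        x_adv = x + epsilon * tf.sign(grad)
    return tf.clip_by_value(x_adv, 0.0, 1.0)  # keep pixels in the valid range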

3.2.2. Iterative fast gradient sign (IFGS) methods

IFGS methods operate over multiple iterations, adding relatively small perturbations in each (Kurakin et al., 2016). As such, IFGS methods can generate adversarial inputs with smaller distortion when compared to basic FGS. Equation 5 and Equation 6 describe the updates performed by the non-targeted and targeted versions of IFGS in each iteration. As an example, the adversarial perturbations in Fig. 1 were generated by the IFGS method.

$x'_{k+1} = \mathrm{clip}\big(x'_k + \alpha \cdot \mathrm{sign}(\nabla_x J(x'_k, y^*))\big)$   (5)
$x'_{k+1} = \mathrm{clip}\big(x'_k - \alpha \cdot \mathrm{sign}(\nabla_x J(x'_k, y_t))\big)$   (6)

Here $x'_0 = x$, $\alpha < \epsilon$ is the per-iteration step size, and $\mathrm{clip}(\cdot)$ also keeps the accumulated distortion within the overall budget $\epsilon$.
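
Continuing the previous sketch, an iterative variant of the update (Equations 5 and 6) simply repeats small FGS steps and projects back into the allowed distortion budget; `fgs` is the helper defined above and the step sizes are illustrative assumptions.

def ifgs(model, x, label, epsilon=0.03, alpha=0.005, steps=10, targeted=False):
    # Iterative FGS: take small steps of size alpha and keep the total
    # distortion within the epsilon-ball around the clean input x.
    x = tf.convert_to_tensor(x)
    x_adv = tf.identity(x)
    for _ in range(steps):
        x_adv = fgs(model, x_adv, label, epsilon=alpha, targeted=targeted)
        x_adv = tf.clip_by_value(x_adv, x - epsilon, x + epsilon)
        x_adv = tf.clip_by_value(x_adv, 0.0, 1.0)
    return x_adv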

3.2.3. Semantically meaningful perturbations

Another body of work has focused on semantically meaningful perturbations. For instance, specially crafted stickers affixed to traffic signs can mislead traffic sign classifiers (Eykholt et al., 2018). These perturbations are not imperceptible; quite the opposite, they are easily spotted, but they are designed to seem innocuous. For instance, a human is unlikely to think that a small sticker on a traffic sign indicates an adversarial attack. Our work crafts such perceptible but semantically meaningful perturbations. However, the notion of what is semantically meaningful is informed by the underlying domain of lithography.

4. Case Study: IC Lithographic Hotspot Detection

In this work we use two different CNN-based hotspot detectors to explore our proposed attacks. They are trained using the same dataset and act as targets for adversarial perturbations. This section describes details of our dataset, the network architectures, and the training process. Our case study draws heavily from prior work (Yang et al., 2018b).

4.1. Layout Dataset

Existing datasets for lithographic hotspot detection, for instance, the widely used ICCAD '12 contest dataset (Torres, 2012), do not provide much of the information required to verify the success of adversarial attacks. For instance, the ICCAD '12 data specifies neither design rules nor lithography simulation parameters. Therefore, for this case study, we prepared our own layout dataset comprising 10403 layout clips stored in the GDSII format. We targeted the detection of lithographic hotspots for via layers using SRAF-based RET. To create the large number of layout samples, we generated the via patterns in the following manner:

  1. Within each clip region (2000 nm × 2000 nm) we place lower-layer metal gratings with a fixed wire critical dimension (CD) and pitch;

  2. We add an upper metal layer with preset CD and spacing constraints;

  3. The cross regions between two metal layers become candidates for via placement — we place vias stochastically with a given probability;

  4. Finally, vias that violate design rules are removed. In this dataset, we use vias sized 70 nm × 70 nm and enforce a minimum via spacing of 70 nm. (A sketch of this placement procedure follows the list.)
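
As a rough illustration of steps 3 and 4 (our own sketch; the pitch values, placement probability and center-to-center spacing handling are assumptions, and the real flow also enforces the full design-rule deck), vias can be sampled at grating cross points and pruned for spacing violations:

import numpy as np

def place_vias(cross_points, via_size=70, min_spacing=70, prob=0.3, seed=0):
    # cross_points: list of (x, y) candidate via centers in nm.
    # Keep a candidate with probability `prob`, then drop it if it violates
    # the minimum spacing with an already placed via.
    rng = np.random.default_rng(seed)
    selected = []
    min_center_dist = via_size + min_spacing  # 70 nm via + 70 nm edge-to-edge space
    for (x, y) in cross_points:
        if rng.random() > prob:
            continue
        if all((x - sx) ** 2 + (y - sy) ** 2 >= min_center_dist ** 2
               for sx, sy in selected):
            selected.append((x, y))
    return selected

# Candidate cross points on a toy grating (pitch values are placeholders).
cands = [(i * 140, j * 200) for i in range(14) for j in range(10)]
print(len(place_vias(cands)))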

(a) Example Layout
(b) Edge Placement Error (EPE)
Figure 3. (a) Example of a layout with vias (in green), SRAFs (in white), and the forbidden areas (striped region). (b) Illustration of edge placement error (EPE). Each target via pattern has four measure points (one at the center of each edge). The EPE is the perpendicular displacement from the measure point to the corresponding printed image (contour).

Once the "raw" layouts are produced, we perform optical proximity correction (OPC) and lithography simulation using Mentor Calibre (Graphics, 2019) to insert SRAFs; we set the allowable SRAF region to a 100 nm – 500 nm city-block distance. An example of a layout clip is shown in Fig. 3(a). Next, we determine the ground truth hotspot/non-hotspot labels for the layouts. In this work, we use the edge placement error (EPE) as our metric for determining the quality of the printed patterns. Each via pattern in a layout is associated with four measure points, with one point at the center of each edge. The EPE is defined as the perpendicular displacement from the measure point to the corresponding printed contour, as illustrated in Fig. 3(b). A layout is identified as a hotspot layout if any measure point has an EPE greater than 2 nm, as in typical industrial settings.
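
The labeling rule itself is simple; the sketch below (our own illustration) assumes the per-measure-point EPE values have already been produced by lithography simulation.

def label_clip(epe_values_nm, threshold_nm=2.0):
    # epe_values_nm: EPE at the four measure points of every via in the clip,
    # i.e., the perpendicular displacement from measure point to printed contour.
    # A clip is a hotspot if any measure point exceeds the threshold.
    return "hotspot" if any(e > threshold_nm for e in epe_values_nm) else "non-hotspot"

print(label_clip([0.4, 1.1, 0.9, 2.3]))  # -> 'hotspot'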

As shown in Fig. 3(a), the layouts we produced have three key features: vias (the desired pattern to be printed), SRAFs (used to improve printability), and forbidden regions (where SRAFs should not be placed). Each via is surrounded by a square forbidden region whose edges are 100 nm from the via's edges. The GDSII files contain three layers of interest: (1) a via layer, (2) an SRAF layer, and (3) a forbidden region layer.

4.2. Design of CNN-based hotspot detectors

Using the layout dataset we trained two different CNN-based hotspot detectors to represent networks of different complexity, adopting procedures described in prior work (Yang et al., 2018b). The parameters are shown in Table 1 and Table 2.

  • Network A is a smaller 9-layer network that is faster to train and to use for prediction. We observed that further increasing the network depth/complexity did not increase accuracy; i.e., Network A is "right-sized" for accuracy.

  • Network B is a larger 15-layer network that is slower to train, but is potentially less susceptible to attack as its more complex architecture learns more sophisticated features for hotspot detection. Prior work on adversarial robustness suggests that deeper, more complex networks are more resilient to attack (Madry et al., 2017).

4.2.1. Data Preprocessing

The dimension of the GDSII layouts is 2000 nm × 2000 nm, which can be represented as 2000 pixel × 2000 pixel binary-valued images, where all the layers are flattened. Layout polygons are represented with a pixel intensity of 255 and the background is represented with a pixel intensity of 0. For training and inference we scale the layout image by a factor of 1/255 so that all the pixel intensities are either 1 or 0.

Training a CNN on large images requires significant computational resources and time. Therefore, as proposed in (Yang et al., 2018b), we compute a discrete cosine transform (DCT) on each image to extract its features as input for the networks. The DCT is shown in Equation 7:

$D(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} I(x, y) \cos\!\left[\frac{\pi}{M}\left(x + \frac{1}{2}\right)u\right] \cos\!\left[\frac{\pi}{N}\left(y + \frac{1}{2}\right)v\right]$   (7)

Here $x$ and $y$ are the horizontal and vertical coordinates of the image pixels $I(x, y)$, and $M$ and $N$ are the width and height of the image. $u$ and $v$ represent the horizontal and vertical coordinates of the DCT coefficients $D(u, v)$. To reduce the image dimensions, we perform the DCT on non-overlapping 100 pixel × 100 pixel sub-blocks of each layout image (with a 100 pixel stride), and then keep a selection of DCT coefficients. For Network A, we keep the coefficients of the 32 lowest frequencies, producing inputs of size (20, 20, 32). For Network B, we keep the coefficients of the 36 lowest frequencies (i.e., more information for the larger network), producing inputs of size (20, 20, 36). This speeds up training with little loss of information and without affecting network performance.
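
The sketch below (our own illustration) shows this preprocessing for Network A: the 2000 × 2000 binary layout image is split into 100 × 100 tiles, each tile is transformed with the unnormalized 2-D DCT of Equation 7, and the 32 lowest-frequency coefficients are kept. The zig-zag ordering used to define "lowest frequencies" is an assumption on our part.

import numpy as np

def dct_matrix(n):
    # Unnormalized DCT-II basis: C[u, x] = cos(pi/n * (x + 0.5) * u).
    u = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    return np.cos(np.pi / n * (x + 0.5) * u)

def block_dct_features(image, block=100, n_coeffs=32):
    # Split into non-overlapping block x block tiles, DCT each tile (Eq. 7),
    # and keep the n_coeffs lowest-frequency coefficients in zig-zag order.
    h, w = image.shape
    C = dct_matrix(block)
    order = sorted(((u, v) for u in range(block) for v in range(block)),
                   key=lambda t: (t[0] + t[1], t[0]))[:n_coeffs]
    feats = np.zeros((h // block, w // block, n_coeffs))
    for i in range(h // block):
        for j in range(w // block):
            tile = image[i*block:(i+1)*block, j*block:(j+1)*block]
            coeffs = C @ tile @ C.T          # separable 2-D DCT
            feats[i, j] = [coeffs[u, v] for (u, v) in order]
    return feats

# Example: a 2000 x 2000 binary layout image gives a (20, 20, 32) input tensor.
layout = np.zeros((2000, 2000)); layout[900:970, 900:970] = 1.0
print(block_dct_features(layout).shape)  # (20, 20, 32)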

Layer Kernel Size Stride Output Size
input - - (20, 20, 32)
conv1_1 3 1 (20, 20, 16)
conv1_2 3 1 (20, 20, 16)
maxpooling1 2 2 (10, 10, 16)
conv2_1 3 1 (10, 10, 32)
conv2_2 3 1 (10, 10, 32)
maxpooling2 2 2 (5, 5, 32)
fc1 - - 250
fc2 - - 2
Table 1. Architecture of Network A.
Layer Kernel Size Stride Output Size
input - - (20, 20, 36)
conv1_1 3 1 (20, 20, 16)
conv1_2 3 1 (20, 20, 16)
conv1_3 3 1 (20, 20, 16)
maxpooling1 2 2 (10, 10, 16)
conv2_1 3 1 (10, 10, 32)
conv2_2 3 1 (10, 10, 32)
conv2_3 3 1 (10, 10, 32)
maxpooling2 2 2 (5, 5, 32)
conv3_1 3 1 (5, 5, 64)
conv3_2 3 1 (5, 5, 64)
conv3_3 3 1 (5, 5, 64)
maxpooling3 2 2 (3, 3, 64)
fc1 - - 500
fc2 - - 2
Table 2. Architecture of Network B.
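
A Keras sketch of Network A following Table 1 (our own illustration; the activations and the use of "same" padding are assumptions, since Table 1 only fixes kernel sizes, strides and output shapes):

from tensorflow import keras
from tensorflow.keras import layers

def build_network_a(input_shape=(20, 20, 32)):
    # Two conv blocks with max pooling, then two fully connected layers and a
    # softmax over {non-hotspot, hotspot}.
    return keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(16, 3, padding="same", activation="relu"),  # conv1_1
        layers.Conv2D(16, 3, padding="same", activation="relu"),  # conv1_2
        layers.MaxPooling2D(2),                                    # maxpooling1
        layers.Conv2D(32, 3, padding="same", activation="relu"),  # conv2_1
        layers.Conv2D(32, 3, padding="same", activation="relu"),  # conv2_2
        layers.MaxPooling2D(2),                                    # maxpooling2
        layers.Flatten(),
        layers.Dense(250, activation="relu"),                      # fc1
        layers.Dense(2, activation="softmax"),                     # fc2
    ])

model_a = build_network_a()
model_a.summary()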

4.2.2. Training

We train both networks using the same layout dataset. We randomly split the 10403 layout images into 8000 training images and 2403 test images, where the training data consists of 2774 hotspot and 5226 non-hotspot images, and the test data has 841 hotspot and 1562 non-hotspot images. To compensate for data imbalance, we incorporate class weights to weigh the loss function during training, which tells the model to "pay more attention" to samples from an under-represented class (He and Garcia, 2008). This is done for both networks to achieve a balanced hotspot and non-hotspot detection accuracy. We implement network training with the Keras library (Chollet et al., 2015), and use the ADAM optimizer (Kingma and Ba, 2014) for loss minimization. The confusion matrix is shown in Table 3. Our training of the baseline networks follows the same methodology as in prior work (Yang et al., 2018b). Although Network A and Network B have the same overall accuracy (defined as the average of non-hotspot classification accuracy and hotspot classification accuracy) and hotspot prediction accuracy, we seek to explore the robustness of both networks, which have different depths/complexity.

                         Prediction (Network A)      Prediction (Network B)
                         non-hotspot    hotspot      non-hotspot    hotspot
Condition  non-hotspot   0.72           0.28         0.72           0.28
           hotspot       0.29           0.71         0.28           0.72
Table 3. Confusion matrix of networks A and B.
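
A minimal training sketch in Keras (our own illustration, reusing `build_network_a` from the previous sketch): random arrays stand in for the DCT feature tensors, and the batch size, epoch count and exact class-weighting scheme are assumptions.

import numpy as np
from tensorflow import keras

# Placeholder arrays: x_train is the (N, 20, 20, 32) DCT feature tensor and
# y_train holds integer labels (0 = non-hotspot, 1 = hotspot).
x_train = np.random.rand(8000, 20, 20, 32).astype("float32")
y_train = np.random.randint(0, 2, size=8000)

# Weight the loss inversely to class frequency to compensate for imbalance.
n_nonhotspot = np.sum(y_train == 0)
n_hotspot = np.sum(y_train == 1)
class_weight = {0: len(y_train) / (2 * n_nonhotspot),
                1: len(y_train) / (2 * n_hotspot)}

model = build_network_a()  # from the previous sketch
model.compile(optimizer=keras.optimizers.Adam(),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=64, epochs=20,
          class_weight=class_weight, validation_split=0.1)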

5. Proposed Attack Methodologies

5.1. Overview

We propose attack methodologies for modifying layouts with hotspots such that they fool the CNN-based hotspot detector into misclassifying these layouts as non-hotspot. We experiment with two attack types: a white-box attack, where the attacker has full access to the internal details (weights, architecture, etc.) of the hotspot detector, and a black-box attack, where the attacker can only query the detector to receive the output prediction and associated confidence. During the attack, the attacker aims to fool the target detector by modifying layouts in a semantically meaningful way. This means that the attacker cannot alter the IC layout by moving via locations, as this may change design functionality; in this attack we only add SRAFs to the layout. Further, the modifications must be small and innocuous, for instance, by only using shapes that already exist in the layout dataset. Finally, the perturbations should not introduce DRC violations. Based on these considerations, our perturbations must satisfy the following constraints:

  1. Insertion Constraint: Maliciously-inserted SRAFs can only be added to the SRAF layer.

  2. Shape Constraint: Maliciously-inserted SRAFs should be rectangles, with a fixed width of 40 nm. The height can be selected within 40 nm – 90 nm, at a resolution of 1 nm. The SRAF can be placed either horizontally or vertically.

  3. Spacing Constraint: The Euclidean distance between any two SRAFs should be at least 40 nm.

  4. Forbidden Zone Constraint: Maliciously-inserted SRAFs cannot overlap with the forbidden region in a layout.

For simplicity, our attack evaluation involves adding 40 nm wide SRAFs with the following height options: 40, 50, 60, 70, 80 or 90 nm, all placed horizontally.
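
A sketch of the constraint check an attacker (or a verifier) could run for each candidate insertion, assuming 1 pixel = 1 nm binary images for the SRAF layer and the forbidden-region layer (our own illustration); for simplicity, the 40 nm Euclidean spacing rule is approximated conservatively by a 40-pixel bounding box.

import numpy as np

def is_valid_insertion(sraf_layer, forbidden, x, y, w_px, h_px, spacing_px=40):
    # sraf_layer: binary image of the SRAF layer (1 = existing SRAF pixel)
    # forbidden:  binary image of the forbidden-region layer (1 = no SRAF allowed)
    # (x, y):     top-left corner of the candidate rectangle
    h_img, w_img = sraf_layer.shape
    if x < 0 or y < 0 or x + w_px > w_img or y + h_px > h_img:
        return False
    # Forbidden-zone constraint: the SRAF must not overlap the striped regions.
    if forbidden[y:y + h_px, x:x + w_px].any():
        return False
    # Spacing constraint: no existing SRAF pixel within spacing_px of the rectangle
    # (a conservative bounding-box check standing in for the Euclidean distance).
    y0, y1 = max(0, y - spacing_px), min(h_img, y + h_px + spacing_px)
    x0, x1 = max(0, x - spacing_px), min(w_img, x + w_px + spacing_px)
    if sraf_layer[y0:y1, x0:x1].any():
        return False
    return True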

5.2. White-box Attack

In the white-box attack, the attacker knows the internal details of the target CNN based hotspot detector, and exploits this as part of the attack. We propose a gradient-guided approach to generate adversarial layouts, inspired by the fast gradient sign approach (Goodfellow et al., 2015) (explained in Section 3). Since the baseline hotspot detection networks (in Subsection 4.2) take DCT coefficients as inputs, a naïve attack would need to modify these coefficients and then perform inverse-DCT to produce adversarially perturbed layouts. There are at least three reasons why this naïve approach is infeasible:

  1. There is not enough data to reconstruct any layout without information loss, since the input DCT coefficients used for inference are only the low frequency components.

  2. There is no guarantee that modifications of DCT coefficients, when reflected back to layout images, satisfy the attack constraints above.

  3. It is challenging to modify DCT coefficients so that they result in an exact 0-to-1 change in layout image pixels, as the images are binary-valued.

Figure 4. Illustration of end-to-end hotspot detection with DCT implemented as a convolutional layer.

5.2.1. DCT as a convolution layer in the network

Our solution to this problem is to implement the DCT computation as a convolution layer of the neural network so that the combined network works in an end-to-end fashion. The network takes in layout images as inputs; this allows us to perturb image pixels while also incorporating the attack constraints. This idea comes from Equation 7, where we observed that the summation and element-wise product can be realized directly as a convolution layer of the CNN (without adding bias). The weights of the DCT filter for calculating the $(u, v)$-th DCT coefficient are obtained as shown in Equation 8:

$W_{u,v}(i, j) = \cos\!\left[\frac{\pi}{M}\left(i + \frac{1}{2}\right)u\right] \cos\!\left[\frac{\pi}{N}\left(j + \frac{1}{2}\right)v\right]$   (8)

Here $u$ and $v$ are the horizontal and vertical coordinates of the $(u, v)$-th DCT coefficient, $i$ and $j$ are the horizontal and vertical coordinates of each weight of the filter, and $M$ and $N$ are the width and height of the filter. Since the DCT computation operates on 100 × 100 sub-blocks of each image, the DCT convolution layer has filters of size (100, 100) with a stride of 100. We illustrate this end-to-end network that combines the DCT computation and hotspot detection in Fig. 4.
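
The following Keras sketch (our own illustration) builds such a fixed, non-trainable convolution layer from the cosine bases of Equation 8 and stacks it in front of the hotspot CNN; the zig-zag frequency ordering and the reuse of `build_network_a` from the earlier sketch are assumptions.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def dct_conv_layer(block=100, n_coeffs=32):
    # Non-trainable Conv2D whose filters are the cosine bases of Eq. 8, so a
    # (2000, 2000, 1) layout image maps to a (20, 20, n_coeffs) DCT tensor.
    order = sorted(((u, v) for u in range(block) for v in range(block)),
                   key=lambda t: (t[0] + t[1], t[0]))[:n_coeffs]  # zig-zag (assumed)
    xs = np.arange(block)
    kernels = np.zeros((block, block, 1, n_coeffs), dtype="float32")
    for k, (u, v) in enumerate(order):
        kernels[:, :, 0, k] = np.outer(np.cos(np.pi / block * (xs + 0.5) * u),
                                       np.cos(np.pi / block * (xs + 0.5) * v))
    conv = layers.Conv2D(n_coeffs, kernel_size=block, strides=block,
                         padding="valid", use_bias=False, trainable=False)
    conv.build((None, 2000, 2000, 1))
    conv.set_weights([kernels])
    return conv

# End-to-end model: layout pixels in, hotspot prediction out.
end_to_end = tf.keras.Sequential([tf.keras.Input(shape=(2000, 2000, 1)),
                                  dct_conv_layer(),
                                  build_network_a()])  # build_network_a: earlier sketch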

5.2.2. Attack Process

With this new end-to-end network, the attacker can explore the gradients of the network with respect to the image and guide placement of SRAFs to positions that have the highest impact on the network output prediction (shifting it from hotspot to non-hotspot). We define the loss of the attack as the distance between the prediction probability of the perturbed hotspot layout and that of an ideal non-hotspot layout (i.e., a layout with a perfect prediction probability of [1, 0]). This is represented as Equation 9, where $p_h(x')$ is the probability that layout $x'$ is classified as hotspot:

$\mathcal{L}(x') = D\big(F(x'), [1, 0]\big), \quad F(x') = [\,1 - p_h(x'),\ p_h(x')\,]$   (9)

The attacker aims to keep minimizing this loss as they choose and add perturbations (i.e., SRAFs) iteratively, until at some point the perturbed layout image is predicted as non-hotspot. As an attack parameter, the attacker can choose a maximum number of SRAFs to insert. The detailed algorithm of the white-box attack is shown in Algorithm 1.

To perform the attack, we first calculate the gradient of the loss function with respect to each pixel (line 4 in Algorithm 1). This gradient represents the amount of "influence" that a given pixel has on the final network prediction. However, since we are modifying blocks of image pixels instead of a single pixel (when we insert SRAFs), we sum the gradients over a potential perturbation block at each potential insertion location (lines 5-10 in Algorithm 1). We illustrate this concept as step 1 in Fig. 5. This sum represents the "influence" that a perturbation has on the final network prediction when it is inserted at that location. Specifically, we are changing blocks of pixel values in the positive direction from 0 to 1, so the image block that has the largest negative sum of gradients will have the most influence in minimizing the loss.

However, these gradient sums only reflect the influence for a small change in the input. As we are shifting pixel values with a relatively large step (i.e., from 0 to 1), there is no guarantee that the largest negative sum of gradients will still have the most significant influence. Therefore, instead of picking the single SRAF insertion that has the largest negative sum of gradients, we query the CNN for the top candidate perturbation blocks with the largest negative sums of gradients (lines 11-14 in Algorithm 1); the number of candidates considered is the attacker-specified check parameter. Of these candidates, we pick the one that has the largest influence (i.e., that pushes the network furthest toward predicting the hotspot as non-hotspot) (line 15 in Algorithm 1).

Our strategy for finding candidate SRAF insertion locations (step 2 in Fig. 5) is as follows. We define the location of an SRAF by the coordinate of its top-left corner point. We slide the SRAF over the center region of the layout (we leave a 200 nm boundary on each side of the layout image) with a horizontal and vertical stride of 40 nm. This forms all the possible locations for potential perturbation addition. However, if any part of a location and its surrounding 40 nm has an existing pixel value of 1 (i.e., it is already occupied by an SRAF), or overlaps with any forbidden region, this location is marked as invalid for SRAFs. We set the gradient sum of this location/shape pair to +∞ (line 10 in Algorithm 1) so that it is never selected as a candidate. In this way, we ensure that inserted SRAFs satisfy the attack constraints.

Figure 5. Illustration of the white-box attack process with one SRAF insertion.
1: Input: original hotspot image x, white-box network function F (with the DCT convolutional layer at the bottom), loss function L, image pixel indexing function idx(q, p) (the pixels covered by perturbation shape p placed at position q), perturbation pattern set P, surrounding spacing s, maximum number of perturbation additions N_max, check parameter c.
2: n ← 0
3: while n < N_max do
4:     compute the image gradient G ← ∇_x L(x) using the white-box network F
5:     for each shape p in P do
6:         for each candidate position q in the sliding window do
7:             if the pixel values of x at idx(q, p) and its surrounding area s are all 0 then
8:                 S[p, q] ← sum of G over idx(q, p)    ▷ Sum gradients of the area.
9:             else
10:                S[p, q] ← +∞    ▷ Invalid location/shape pair.
11:    for j = 1 to c do
12:        get the shape p_j and position q_j of the j-th smallest element of S
13:        x_j ← perturbed image: x with the pixels at idx(q_j, p_j) set to 1
14:        compute the loss L_j ← L(x_j)
15:    x ← the x_j with the smallest L_j, and set n ← n + 1    ▷ Insert perturbation.
16:
17:    if F(x) predicts hotspot then
18:        continue
19:    else if F(x) predicts non-hotspot then
20:        Return: x    ▷ Adversarial non-hotspot layout generated.
21: Return: failed to generate a non-hotspot layout within the attacker-specified bound.
Algorithm 1 White-box Attack

If the constraints are all satisfied, we consider the location to be valid. If the sum of the gradients at this location is among the largest negative sums, we compute the loss for the layout image with the hypothetically inserted SRAF shape at this location (step 3 of Fig. 5). Since the attacker has the flexibility to add six different shapes of SRAF (heights of 40, 50, 60, 70, 80 or 90 nm), they iterate the gradient summation over all possible locations for each of these perturbation shapes. In each iteration of adding one SRAF, one of the six shapes is added to the current layout such that it yields the lowest prediction probability for hotspot.

The algorithm stops either when the network predicts the perturbed layout as non-hotspot (hotspot prediction probability below 0.5), or when the number of inserted SRAFs has reached the maximum allowance and no adversarial non-hotspot layout has been generated (lines 16-20 in Algorithm 1).
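
To make the per-iteration logic concrete, here is a sketch (our own illustration, not the authors' code) of one white-box iteration. Because the loss in Equation 9 is monotone in the hotspot probability, the sketch differentiates the hotspot probability directly; `model` is assumed to be the end-to-end network of Fig. 4 with outputs ordered [non-hotspot, hotspot], and `candidates` are insertion rectangles that already satisfy the constraints of Section 5.1.

import numpy as np
import tensorflow as tf

def whitebox_step(model, image, candidates, check_c=180):
    # One iteration of the gradient-guided white-box insertion (cf. Algorithm 1).
    # image:      (2000, 2000, 1) float32 binary layout image
    # candidates: list of (y, x, h, w) valid SRAF rectangles
    x_in = tf.convert_to_tensor(image[None], dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x_in)
        p_hotspot = model(x_in)[0, 1]
    grad = tape.gradient(p_hotspot, x_in)[0, :, :, 0].numpy()

    # Influence of each candidate: sum of gradients over the pixels it would flip to 1.
    scores = np.array([grad[y:y + h, x:x + w].sum() for (y, x, h, w) in candidates])
    top = np.argsort(scores)[:check_c]  # most negative sums first

    best_img, best_p = None, np.inf
    for idx in top:
        y, x, h, w = candidates[idx]
        trial = image.copy()
        trial[y:y + h, x:x + w, 0] = 1.0   # hypothetically insert this SRAF
        p = float(model(trial[None])[0, 1])
        if p < best_p:
            best_img, best_p = trial, p
    return best_img, best_p  # perturbed layout and its hotspot probability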

5.3. Black-Box Attack

This attack explores the case where an attacker has less knowledge of the target network. With a black-box access to the network, the attacker can query the hotspot detector with computed DCT coefficients of a layout to obtain the output prediction probability. We illustrate the attack in Fig. 6. Details of the black-box algorithm are shown in Algorithm 2.

At a high level, the black-box attack iteratively queries the detector with different SRAF shape and insertion location combinations. The attacker first adds a single SRAF. The attacker exhaustively examines all the possible valid locations for each valid SRAF shape (step 1 in Fig. 6), and queries the network with the DCT coefficients of each candidate modified layout (step 2 in Fig. 6, lines 4-10 in Algorithm 2). The location and perturbation with the minimum loss is selected, using the same loss function as in Equation 9 (step 3 in Fig. 6, line 11 in Algorithm 2). Further SRAFs are added in the same way. Like the white-box attack, the algorithm terminates either by returning a successful adversarial non-hotspot layout, or by failing to produce an adversarial layout within the specified maximum number of inserted SRAFs.

Figure 6. Illustration of the black-box attack process with one SRAF insertion.
1: Input: original hotspot image x, DCT computation function dct, black-box network function F, loss function L, image pixel indexing function idx(q, p), perturbation pattern set P, surrounding spacing s, maximum number of perturbation additions N_max.
2: n ← 0
3: while n < N_max do
4:     for each shape p in P do
5:         for each candidate position q in the sliding window do
6:             if the pixel values of x at idx(q, p) and its surrounding area s are all 0 then
7:                 x_{p,q} ← perturbed image: x with the pixels at idx(q, p) set to 1
8:                 compute the loss L_{p,q} from the prediction F(dct(x_{p,q}))
9:             else
10:                L_{p,q} ← +∞    ▷ Invalid location/shape pair.
11:    x ← the x_{p,q} with the smallest L_{p,q}, and set n ← n + 1    ▷ Insert perturbation.
12:
13:    if F(dct(x)) predicts hotspot then
14:        continue
15:    else if F(dct(x)) predicts non-hotspot then
16:        Return: x    ▷ Adversarial non-hotspot layout generated.
17: Return: failed to generate a non-hotspot layout within the attacker-specified bound.
Algorithm 2 Black-box Attack
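
A compact sketch of the black-box loop (our own illustration): `query_fn` is assumed to wrap the DCT computation and the detector query and to return the hotspot probability, and `candidate_fn` re-enumerates the valid insertion rectangles after each insertion.

import numpy as np

def blackbox_step(query_fn, image, candidates):
    # Exhaustively try every valid shape/position pair and keep the one with
    # the lowest hotspot probability (cf. Algorithm 2, lines 4-11).
    best_img, best_p = None, np.inf
    for (y, x, h, w) in candidates:
        trial = image.copy()
        trial[y:y + h, x:x + w, 0] = 1.0   # insert the candidate SRAF
        p = query_fn(trial)                # detector computes DCT features internally
        if p < best_p:
            best_img, best_p = trial, p
    return best_img, best_p

def blackbox_attack(query_fn, image, candidate_fn, max_srafs=20):
    # Add SRAFs one at a time until the detector flips to non-hotspot
    # or the insertion budget is exhausted.
    for _ in range(max_srafs):
        image, p = blackbox_step(query_fn, image, candidate_fn(image))
        if p < 0.5:                        # predicted non-hotspot
            return image, True
    return image, False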

6. Empirical Evaluation of Attack Success

6.1. Experimental Setup

To investigate the implications of our proposals, as well as study the relationship between the attack efficacy and the target networks, we performed white-box and black-box attacks on both CNN-based hotspot detection networks. We run all experiments on a desktop computer with Intel CPU i9-7920X (12 cores, 2.90 GHz) and single Nvidia GeForce GTX 1080 Ti GPU.

For the white-box attack, we conduct adversarial non-hotspot layout generation on 500 correctly classified hotspot layouts from the validation set (i.e., layouts that were not used for training). As the black-box attack takes longer to perform, we generate adversarial non-hotspot layouts for 150 hotspot layouts. The attack success rate is the percentage of hotspot layouts that were perturbed such that they are misclassified as non-hotspot by the CNN. Across all the experiments in this section, we consider a layout to be hotspot if the network prediction probability for hotspot is at least 0.5. The average attack time is the end-to-end time (including querying the hotspot detector). We limit the maximum number of adversarial SRAF additions to 20, and set the check parameter in the white-box attack to 180 (this corresponds to ~11% of the total number of possible SRAF insertion candidates, on average). We illustrate a selection of attack outputs and their corresponding verification results in Fig. 8 and Fig. 9. We present a summary of the results in Table 4.

                                   White-box                Black-box
Network                            A           B            A           B
Attack success rate                99.7%       85.5%        99.7%       93.3%
Average attack time per layout     8.6 s       45.1 s       350.5 s     677.3 s
Average number of SRAFs added      5.3         8.3          4.1         7.3
Average area of SRAFs added        0.3%        0.5%         0.3%        0.5%
Table 4. Summary of attack results for the white-box and black-box algorithms. For both attacks, the maximum number of SRAF insertions allowed is 20. For the white-box attack, the check parameter is 180. Both are attacker-specified parameters, as explained in Algorithms 1 and 2.

6.2. Attack Results

The most successful attack was on Network A, where we achieved a 99.7% attack success (498 hotspot layouts were made to be classified as non-hotspot by the CNN). The white-box attack success rate drops between our attack of Network A and B (a decrease of 14.2%). One explanation is that the complex Network B has learned more about the characteristics of hotspots, and therefore is more challenging to fool; this is consistent with prior findings of neural network robustness (Madry et al., 2017).

These trends can also be observed in the average time taken to generate an adversarial layout in the white-box attack. The white-box attack on Network B required on average ~6× more time than on Network A. The extra time needed to produce a successful perturbed layout for Network B is partially due to the increased feedforward computation over more layers (higher query-time overhead during the attack). Similarly, the average number of SRAFs inserted is greater for the more complex Network B than for the simpler Network A.

Fig. 7(a) and Fig. 7(c) show the percentage of layouts that were successfully perturbed by a given number of SRAF insertions for the white-box attack. In all cases, the minimum number of SRAFs that needed to be added to cause misclassification was 1, and an example of this is shown in Fig. 8. Of the layouts that were successfully perturbed to appear as non-hotspot in each attack, ~13% required only one inserted SRAF to fool Network A, and ~10% for Network B. Furthermore, 50% of the perturbed layouts could fool Network A with 4 or fewer inserted SRAFs. For Network B, 50% of the perturbed layouts had 7 or fewer inserted SRAFs.

Looking to the black-box attack results, we notice the same general trends as in the white-box attack: the simpler Network A is attacked most successfully, while the more complex Network B exhibits greater resilience. Fig. 7(b) and Fig. 7(d) show the percentage of layouts that were successfully perturbed by a given number of SRAF insertions for the black-box attack. As with the white-box attack, the black-box attack yielded adversarial layouts with as few as one inserted SRAF (an example is shown in Fig. 8). Of the layouts that were successfully perturbed to appear as non-hotspot, ~18% required one inserted SRAF to fool Network A and ~11% required one SRAF to fool Network B. In the black-box attack on Network A, 50% of the adversarial layouts required 3 or fewer added SRAFs. For Network B, 6 or fewer SRAFs were required in 50% of the adversarial layouts.

(a) White-box Attack on Network A
(b) Black-box Attack on Network A
(c) White-box Attack on Network B
(d) Black-box Attack on Network B
Figure 7. Histograms of the percentage of adversarial layouts requiring a given number of SRAF insertions: white-box attack on Network A (a) and Network B (c); black-box attack on Network A (b) and Network B (d).

6.3. Comparison and Observations

There are a number of commonalities and differences between the results of the white-box and black-box attacks. Both attacks produced adversarial layouts with only one added SRAF for a number of layouts. Attacks on the simpler Network A had a higher attack success rate in both the white-box and black-box cases. There is also a notable difference in the average number of adversarial SRAFs added; the white-box attack requires 1-2 more SRAFs on average. An example of the different SRAF insertions produced by the white-box and black-box attacks can be seen in Fig. 9.

Of particular interest to an attacker is the feasibility of the attack in terms of computational overhead. One measure of this is the time taken to generate an adversarial layout. Our results show that the more successful black-box attack is up to 10× slower than the white-box attack. The trade-off for such high attack success is increased time and computation requirements. This can be explained by the number of times the attacker needs to query the CNN-based detector. In the white-box attack, the attacker only needs to query the network a limited number of times in each iteration: the first query is incurred when using the network to compute the loss function gradients for each pixel, and the subsequent queries (one per candidate, up to the check parameter) obtain the prediction probabilities for candidate adversarial layouts. When the check parameter is set to the maximum number of possible shape/position combinations, the white-box attack is equivalent to the black-box attack.

As an attacker, one could tune the check parameter to balance the attack success rate against the computational resources required. Thanks to the gradient information used to guide the placement of adversarial SRAFs in the white-box algorithm, the check parameter need not be large to achieve a reasonable, or even comparable, attack success rate relative to the black-box attack, while taking advantage of up to a 10× reduction in attack time. This can also be useful for defenders, as we discuss in Section 7.

(a) Original Layout
(b) Adversarial Layout (white-box)
(c) Adversarial Layout (black-box)
(d) Lithography Simulation of 8(a)
(e) Lithography Simulation of 8(b)
(f) Lithography Simulation of 8(c)
Figure 8. White-box and black-box attack outputs – Example 1. These examples feature a single inserted SRAF. In the layout images (a-c), the vias are colored green, the original SRAFs are white, and the adversarial SRAFs are red. In the lithography simulation outputs, the vias are shown in purple and hotspots are marked with a cross and labelled with ”HOTSPOT!”.
(a) Original Layout
(b) Adversarial Layout (white-box)
(c) Adversarial Layout (black-box)
(d) Lithography Simulation of 9(a)
(e) Lithography Simulation of 9(b)
(f) Lithography Simulation of 9(c)
Figure 9. White-box and black-box attack outputs – Example 2. These examples feature multiple inserted SRAFs. In the layout images (a-c), the vias are colored green, the original SRAFs are white, and the adversarial SRAFs are red. In the lithography simulation outputs, the vias are shown in purple and hotspots are marked with a cross and labelled with ”HOTSPOT!”.

6.4. Do Adversarial Perturbations Fix Hotspots?

Given the success of our adversarial perturbations, a natural question to ask is whether, by inserting perturbations, we are actually fixing hotspots instead of misleading the designer. To answer this question, we performed lithography simulation of the adversarial layouts to confirm the hypothesis that inserting only a few SRAFs does not drastically improve/fix hotspots but instead causes misclassification in most cases. We used the same experimental settings as those for ascertaining the ground truth labels of the original dataset (described in Section 4). Examples of original and adversarial layouts, as well as their simulation outputs, are shown in Fig. 8 and Fig. 9. The simulations revealed that in the majority of cases, our adversarial layouts still produced layout defects. In the white-box attack on Network A, 84.4% of the adversarial layouts that the network classified as non-hotspot were verified as hotspot, while in the same attack on Network B, 77.4% of the adversarial layouts that were classified as non-hotspot were verified as hotspot. In the black-box attack, the verification rate was similar: 86.7% and 77.9% of the layouts that fooled Network A and B (respectively) were verified as hotspot.

When we examined the lithography simulation outputs we found instances where the number of hotspots in a layout increased, decreased, and stayed the same. An example where the inserted SRAFs added hotspots is shown in Fig. 9(e), examples where the inserted SRAFs led to fewer hotspots are shown in Fig. 8(e) and (f), and an example where the inserted SRAFs did not change the number of hotspots is shown in Fig. 9(f).

7. Towards a More Robust Network

7.1. Iterative Adversarial Retraining

So far we have shown that the white-box and black-box attacks are effective. Since this implies a feasible threat to deep learning based CAD, there is a need to investigate and propose countermeasures. As such, we propose a strategy to increase the robustness of CNN-based hotspot detectors. The main aim is to reduce the attack success rate without compromising hotspot detection accuracy. The approach can be integrated into the initial training process for the CNN-based network and is a type of adversarial retraining, as proposed in (Tramèr et al., 2018).

First, let us assume that the defender knows the risks of adversarial perturbations on hotspot detectors. Intuitively they can make the trained network robust by including adversarial layouts into the training dataset but with true hotspot labels, and then retrain their detector using the usual methods (Tramèr et al., 2018). To diversify the training data set, the defender can adopt the attacker’s methodology to proactively generate their own adversarial layouts and include them after verifying the true labels using lithography simulation. For robustness, the defender can repeat the adversarial retraining to suppress the success rate of adversarial attacks on the robust retrained network.

In practice, as we showed in Section 6, the black-box attack achieves the highest possible attack success rate. Hence, it would make sense to robustify the network using adversarial layouts produced by this attack. However, given that it can be roughly 10× slower than the white-box attack, this is less feasible under time and computational resource constraints. Therefore, while the white-box attack may have a lower success rate on some occasions, it is more efficient at generating adversarial layouts. We adopt the white-box attack as part of the defender's strategy; this still provides ample training data and good robustification results. Adversarial retraining is shown in Algorithm 3 and the flow is shown in Fig. 10.

1: Input: training data D, training hotspot data D_h, network function F, adversarial non-hotspot layout generation function attack (the white-box attack), lithography simulation process litho, network retraining process retrain, maximum number of retraining rounds R.
2: for r = 1 to R do
3:     D_adv ← attack(F, D_h)    ▷ Generate adversarial non-hotspot layouts for the training hotspots.
4:     D_ver ← litho(D_adv)    ▷ Keep the layouts verified as hotspot through lithography simulation.
5:     D ← D ∪ D_ver, labelled as hotspot
6:     F ← retrain(F, D)    ▷ Retrain the network with the robust training data D.
7: Return: robustified network F
Algorithm 3 Adversarial Retraining
Figure 10. Overview of the adversarial retraining process.
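
A high-level sketch of the retraining loop of Algorithm 3 (our own illustration): `attack_fn`, `litho_sim_fn` and `retrain_fn` are placeholders for the white-box attack, the lithography-simulation verification and the Keras retraining described above, and labels use 1 for hotspot.

def predicts_hotspot(model, img):
    # Assumption: output index 1 is the hotspot probability, threshold 0.5.
    return float(model(img[None])[0, 1]) >= 0.5

def adversarial_retraining(model, images, labels, attack_fn, litho_sim_fn,
                           retrain_fn, rounds=2):
    # attack_fn(model, image) -> adversarial image or None (e.g., the white-box attack)
    # litho_sim_fn(image)     -> True if lithography simulation confirms a hotspot
    # retrain_fn(model, images, labels) -> retrained model
    images, labels = list(images), list(labels)
    for _ in range(rounds):
        new_images = []
        for img, lbl in zip(images, labels):
            # Attack only hotspot layouts (label 1) the current model classifies correctly.
            if lbl == 1 and predicts_hotspot(model, img):
                adv = attack_fn(model, img)
                if adv is not None and litho_sim_fn(adv):
                    new_images.append(adv)  # verified hotspot; keep its true label
        images += new_images
        labels += [1] * len(new_images)
        model = retrain_fn(model, images, labels)
    return model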

7.2. Evaluation

To demonstrate robustification, we perform iterative adversarial retraining on Network A, and perform white-box attacks to determine the attack success rate. We start with the baseline Network A (Section 4). We conduct the white-box attack using all hotspot layouts from the training set that are correctly classified by the CNN (step 1 of Fig. 10). Of these 2070 hotspot layouts, the white-box attack successfully produces 1725 adversarial layouts (these were verified using lithography simulation, step 2 of Fig. 10). These are labelled as hotspot and added into the training dataset for the 1st round of retraining (step 3 of Fig. 10). We call the 1st-round retrained network Network A'. For the second round of retraining, we take the hotspot layouts from the expanded training dataset that are correctly classified by Network A', and perform the white-box attack. A' classifies 3910 hotspot layouts correctly, and from these, the white-box attack produces 2141 lithography-simulation-verified hotspot layouts. These layouts are added into the training dataset and we perform a second round of retraining to produce the next generation of robustified network, Network A''. One can repeat this until an attack success rate threshold is met, or after a pre-determined number of rounds.

To evaluate the efficacy of adversarial retraining, we perform white-box attacks using correctly classified hotspot layouts from the validation dataset for Networks A, A' and A'', after each round of training. When we use the white-box attacks to produce new training data and evaluate the retrained networks, we set the maximum number of SRAF insertions allowed to 20, and the check parameter to 180 (as in Section 6). The results are shown in Table 5. Throughout the retraining process, the networks' overall accuracy (the average of hotspot and non-hotspot classification accuracy) and hotspot detection accuracy are maintained. Even though Network A has a simpler architecture, after two rounds of retraining, its resilience surpasses that of Network B (see Table 4).

Network Initial net (A) 1st round retrain (A’) 2nd round retrain (A”)
Network overall accuracy 0.73 0.73 0.73
Hotspot detection accuracy 0.72 0.72 0.72
Attack success rate 99.7% 73.6% 37.2%
Average attack time per layout 8.6 s 18.2 s 22.9 s
Average number of SRAFs added 5.3 7.3 7.2
Average area of SRAFs added 0.3% 0.4% 0.4%
Table 5. White-box attack results with iterative adversarial retraining on Network A. The maximum number of SRAF insertions allowed is 20, and the check parameter is 180.

8. Discussion

Over the course of this study, our findings raised several questions that warrant discussion and future study.

What drives differences between white-box vs. black-box attack success rate?

Black-box attacks had a higher success rate compared to white-box attacks, at the cost of longer attack time. We posit that this is largely due to the greedy, exhaustive nature of the black-box attack, where the target network is repeatedly queried. This query-based approach guarantees that the attacker achieves the highest attack success rate (given a fixed horizontal and vertical sliding stride) in searching for the best shape/position combination. Conversely, while the white-box attack is a gradient-guided approach, it considers only a limited number of candidates, and it is possible that the best valid solutions are missed, depending on the size of the check parameter.

What factors affect differences in robustness between shallower and deeper networks?

An interesting finding is that there was a difference in attack success rate against Networks A and B, where the more complex Network B displayed greater robustness even though the networks' baseline hotspot detection accuracies were the same. Network B's greater attack resilience is also supported by the lithography simulation-based verification, where fewer adversarial layouts were verified as still being hotspots. Prior work has found that networks with greater capacity are more robust (Madry et al., 2017); whether Network B has learned a "better" approximation of the underlying physics warrants more study.

Do adversarial attacks generalize to other datasets?

Our study focuses on SRAF insertion for improving the printability of via layouts, which represents only one scenario in lithographic hotspot detection. Unfortunately, as noted previously, we were limited in our ability to comprehensively evaluate our attack on the ICCAD'12 contest benchmarks due to the unavailability of a DRC deck and lithography simulation settings for these benchmarks. Nevertheless, in the appendix, we show results for a limited evaluation of adversarial perturbation attacks on the ICCAD'12 dataset using a small set of inferred design rules. Note that although we could not verify via lithographic simulation that the adversarially perturbed layouts remain hotspots, the perturbations are relatively small and are therefore unlikely to have actually fixed hotspots (as we observed in the SRAF case study). Interestingly, the baseline CNN for the ICCAD'12 benchmarks has higher accuracy, suggesting that higher accuracy by itself does not necessarily imply greater adversarial robustness.

What does the network learn?

Security aside, perhaps the more basic question raised by our work is this: what does a CNN learn? The high success rate of the attacks indicates that the CNN-based hotspot detectors do not truly and fully "learn" the physics relevant to the hotspot problem. One might argue that this is to be expected; after all, a CNN is only approximating the underlying physics. Nonetheless, the fact that in several cases only one or two additional SRAFs throw off the CNN is worrisome since, at least intuitively, such small modifications should not drastically fix hotspots. Indeed, in the broader deep learning community, there is on-going work on the interpretability of neural networks that seeks to better understand what concepts networks actually learn, and we would encourage this to be considered in the ML-CAD context as well (Montavon et al., 2018).

Are there wider security implications for ML-based CAD flows?

The success of both attacks on lithographic hotspot detection raises important questions about the wider implications for ML in CAD. CAD flows involve many complex steps using tools sourced from different vendors; this provides a wide attack surface (Basu et al., 2019). With ML added to the mix, the risk compounds: prior work has raised concerns about outsourcing deep learning (Tianyu Gu et al., 2019), and given the many CAD domains to which deep learning can contribute (we provide an overview in Section 9), security considerations are paramount. A key insight we provide in this study is the existence of semantically meaningful perturbations in the lithographic context. We posit that similar meaningful perturbations exist in other CAD domains, and further work should seek to discover them.

9. Related Work

The CAD industry faces challenges from increasing design complexity, compounded by growing time-to-market pressure. ML techniques have been explored to accelerate steps throughout the VLSI design flow (Kahng, 2018).

In optical lithography, a variety of techniques have been proposed to analyze layout printability and to enhance designs. Pattern matching (such as (Yu et al., 2012)) and ML (such as (Matsunawa et al., 2015)) have been studied, offering a range of successes in accuracy and in the ability to generalize to previously unseen layouts. Recently, there has been an uptick in the study of deep learning approaches, where different facets have been investigated. For example, in (Yang et al., 2018b), Yang et al. train a CNN to detect hotspots from a layout image. They provide a detailed comparison of the effectiveness of different ML techniques at identifying hotspots, concluding that CNN-based approaches offer superior accuracy. To reduce the computational overhead of processing large layout data, (Jiang et al., 2019) proposes to binarize and down-sample the input data, yielding an 8× speed-up over prior deep learning solutions.

Other studies have exposed challenges in adopting CNNs, such as the abundance (or lack thereof) of labelled data for training. Chen et al. detect hotspots using a CNN (Chen et al., 2019b), but propose a semi-supervised approach to handle the scarcity of labelled data. Using a two-stream architecture, labelled data is used to build a preliminary model, which then provisionally labels other samples together with a measure of confidence in each provisional label. Provisionally labelled samples with high confidence are used to train the model in subsequent training cycles, the premise being that more data yields models with more knowledge (a generic sketch follows this paragraph). Synthetic variants of labelled data are proposed in (Reddy et al., 2018) to increase the size of the training dataset. The adversarial retraining procedure proposed in our work can similarly be viewed as a data augmentation strategy; however, unlike prior work, our data augmentation is targeted towards producing adversarially robust networks.
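The confidence-based pseudo-labelling idea in (Chen et al., 2019b) can be summarized generically as follows. This sketch is ours, not the authors' two-stream, self-paced multi-task implementation; it assumes a compiled Keras-style classifier with softmax outputs and one-hot labels, and the confidence threshold and round counts are illustrative.

```python
import numpy as np

def self_train(model, x_labeled, y_labeled, x_unlabeled,
               confidence=0.95, rounds=3, epochs=10):
    """Generic confidence-based pseudo-labelling loop: train on labelled data,
    provisionally label the unlabelled samples, and fold in only the
    high-confidence ones for the next training round."""
    x_train, y_train = x_labeled, y_labeled
    for _ in range(rounds):
        model.fit(x_train, y_train, epochs=epochs, verbose=0)
        probs = model.predict(x_unlabeled)                # (N, 2) softmax outputs
        conf = probs.max(axis=1)
        keep = conf >= confidence                         # trust only confident labels
        pseudo_y = np.eye(probs.shape[1])[probs.argmax(axis=1)]
        x_train = np.concatenate([x_labeled, x_unlabeled[keep]])
        y_train = np.concatenate([y_labeled, pseudo_y[keep]])
    return model
```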

Recently, state-of-the-art applications of CNNs have moved beyond design analysis towards design enhancement, i.e., modifying designs to reach a certain goal. SRAF insertion has been framed as a type of image-domain transformation, where generative adversarial networks (GANs) are trained to take in layouts and ”predict” where SRAFs should be inserted (Alawieh et al., 2019). Other mask optimizations (such as OPC) have been cast similarly (Yu et al., 2019; Yang et al., 2018a). While these works focus on accuracy and scalability, our work examines an orthogonal, yet crucial, dimension: robustness. In physical design, trained ML models are a faster alternative to simulation, allowing designers to quickly evaluate the validity of a design. Lin et al. perform resist modelling and demonstrate transfer learning across different technologies (Lin et al., 2018). Cao et al. (Cao et al., 2019) use parameters related to design, pin-mapping, and layout to predict achievable and actual inductance at pre- and post-layout stages.

Checking for design rule violations (DRVs) is another important part of the design flow where deep learning has been applied. In (Tabrizi et al., 2018), Tabrizi et al. perform routability checks after netlist placement but before global routing; routing shorts are predicted using the trained model, allowing designers to avoid potentially unroutable layouts. Similarly, Xie et al. (Xie et al., 2018) use a CNN to predict the number of DRVs and to identify DRV hotspots, even in the presence of design macros. DRV prediction ascertains the routability of a layout so that problems can be corrected earlier.

In the early stages of design, deep learning has been used for logic optimization (Winston Haaswijk et al., 2018), design space exploration (Greathouse and Loh, 2018), synthesis flow exploration (Yu et al., 2018), and high-level area estimation (Elena Zennaro et al., 2018). Such techniques reduce designer workload by culling the design variants that need to be carried forward in the design flow. Yu et al. (Yu et al., 2018) propose training a CNN to gauge the effectiveness of different combinations of synthesis transformations (termed flows) for a given register transfer level (RTL) design. To avoid exhaustively running all combinations, a model is trained on a subset of possible flows; the trained model then predicts the quality of many other candidate flows, outputting a collection of ”angel-flows” that are likely to yield good results when synthesizing the RTL design (a generic sketch of this sample-then-predict strategy follows this paragraph). In a related approach, Haaswijk et al. (Winston Haaswijk et al., 2018) train a CNN to discover new transformation algorithms, focusing on graph optimization rather than the quality of results from subsequent technology mapping. Orthogonal to these approaches, Greathouse et al. (Greathouse and Loh, 2018) use a neural network to predict how the performance of a software kernel will scale with the number of parallel compute units. The adversarial robustness of these approaches remains an open question.
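The sample-then-predict strategy underlying (Yu et al., 2018) can be illustrated with the generic sketch below. It is a simplification rather than their CNN-based flow encoding: measure_qor is an assumed callable that runs real synthesis on a flow, the feature encoding is deliberately crude, a random-forest surrogate stands in for the trained CNN, and higher quality-of-results scores are assumed to be better.

```python
import itertools
import random
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def find_angel_flows(transformations, flow_length, measure_qor,
                     n_train=100, top_k=10, seed=0):
    """Sample-then-predict flow exploration sketch: run a small random subset of
    candidate flows through real synthesis to obtain quality-of-results (QoR),
    train a surrogate model on those, and use it to rank the remaining flows."""
    rng = random.Random(seed)
    flows = list(itertools.permutations(range(len(transformations)), flow_length))
    rng.shuffle(flows)
    train, rest = flows[:n_train], flows[n_train:]
    X_train = np.array(train, dtype=float)                 # crude flow encoding
    y_train = np.array([measure_qor(f) for f in train])    # expensive: real synthesis runs
    surrogate = RandomForestRegressor(n_estimators=100, random_state=seed)
    surrogate.fit(X_train, y_train)
    scores = surrogate.predict(np.array(rest, dtype=float))
    ranked = sorted(zip(rest, scores), key=lambda fs: fs[1], reverse=True)
    return [flow for flow, _ in ranked[:top_k]]            # predicted "angel-flows"
```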

Adversarial attacks on CNNs are being actively studied (Biggio and Roli, 2018); prior to our paper, however, the question of adversarial robustness had not been raised in the electronic CAD domain. Our work contributes to this growing body of literature in a hitherto unanalyzed context. Our attacks modify layouts to cause misclassification by a well-trained network; they belong to the class of inference-time or evasion attacks, which have been examined in detail (in a general context) in works such as (Goodfellow et al., 2015; Kurakin et al., 2016; Szegedy et al., 2014; Moosavi-Dezfooli et al., 2017; Sharif et al., 2016; Eykholt et al., 2018).

Adversarial attacks in the literature can be classified into two categories based on the contextual meaning (or imperceptibility) of the added perturbation. One class of adversarial perturbations is meaningless, akin to subtle noise crafted to fool the neural network. The other class has contextual meaning while still remaining subtle in the semantics of the real-world context; examples include using a pair of glasses to mislead a face recognition system (Sharif et al., 2016) and a post-it note on a traffic sign to fool a traffic sign detector (Eykholt et al., 2018). Our attack on hotspot detection falls into the second category, as our added SRAFs are semantically meaningful (SRAFs are real-world artifacts) and difficult to perceive as malign.

A variety of defenses against adversarial inputs have been proposed (Dhillon et al., 2018; Tramèr et al., 2018; Madry et al., 2017; Nicolas Papernot et al., 2016). Some focus on detecting adversarial inputs by identifying feature disparities between valid and adversarial examples (Meng and Chen, 2017; Xu et al., 2018; Metzen et al., 2017). Some use transformation techniques to rectify adversarial inputs into ”normal” ones (Meng and Chen, 2017; Samangouei et al., 2018; Guo et al., 2018). Others resort to retraining to counter adversarial attacks (Goodfellow et al., 2015; Tramèr et al., 2018). Research into defenses that offer strong guarantees of robustness is ongoing (Madry et al., 2017). We use adversarial retraining to make CNN-based hotspot detectors robust without sacrificing detection accuracy or adding computational overhead at inference.
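Conceptually, our adversarial retraining defense is an iterative data augmentation loop: attack the current model, keep the correct (hotspot) labels on the still-hotspot adversarial layouts, and retrain. The sketch below conveys the idea under simplifying assumptions (one-hot labels with class 1 denoting hotspot, and a craft_adversarial callable wrapping an attack such as our white-box attack); it is not a drop-in reproduction of the procedure behind Table 5.

```python
import numpy as np

def adversarial_retrain(model, x_train, y_train, craft_adversarial,
                        rounds=2, epochs=10):
    """Iterative adversarial retraining sketch: each round, craft adversarial
    variants of the training hotspots against the current model, add them back
    with their true hotspot labels, and retrain on the augmented set."""
    hotspots = x_train[y_train.argmax(axis=1) == 1]      # assumes one-hot labels, class 1 = hotspot
    x_aug, y_aug = x_train, y_train
    for _ in range(rounds):
        x_adv = craft_adversarial(model, hotspots)        # attack the current model
        y_adv = np.tile([0.0, 1.0], (len(x_adv), 1))      # adversarial layouts keep the hotspot label
        x_aug = np.concatenate([x_aug, x_adv])
        y_aug = np.concatenate([y_aug, y_adv])
        model.fit(x_aug, y_aug, epochs=epochs, verbose=0)
    return model
```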

In contrast to inference-time attacks, another class of attacks on deep learning is that of training-time or backdoor attacks (Tianyu Gu et al., 2019; Liu et al., 2018b), in which the training dataset is in some way compromised (or poisoned). The study of these attacks includes examining the risks posed when the integrity of the training data is compromised (Tianyu Gu et al., 2019) and the risks of re-using potentially compromised networks. Recent work that aims to improve resilience to backdoors includes (Liu et al., 2018a; Bolun Wang et al., 2019; Chen et al., 2019a). Understanding the implications of these attacks for ML-based CAD merits investigation.

10. Conclusion

In this paper, we revealed a vulnerability of CNN-based hotspot detection in electronic CAD. We showed that CNN-based hotspot detectors are easily fooled by specially crafted SRAF insertions that mislead the network into predicting a hotspot layout as non-hotspot. We proposed and examined white-box and black-box attacks on well-trained hotspot detection CNNs, achieving up to a 99.7% attack success rate. The deeper, more complex CNN we attacked exhibited better natural robustness than the less complex CNN. To robustify the vulnerable hotspot detectors, we proposed adversarial retraining, showing that after only two rounds, the white-box attack success rate could be decreased to 37.2%. Our findings point to semantically meaningful adversarial perturbations as a viable concern for ML-based CAD. This study leads us to urge caution and to advocate further study of the wider security implications of deep learning in this field. As an immediate recommendation for CNN-based hotspot detection, we suggest adversarial retraining as an add-on procedure after initial network training: it introduces no extra computational overhead at inference and does not compromise accuracy, while adding robustness against adversarial attacks. Ultimately, we find that adversarial perturbations are not necessarily a showstopper for ML-based CAD, assuming, of course, that an appropriately proactive stance is adopted. Our future work will therefore examine other attack types and other CAD problems, including training-time attacks, as well as corresponding robustification techniques.

References

  • Alawieh et al. (2019) Mohamed Baker Alawieh, Yibo Lin, Zaiwei Zhang, Meng Li, Qixing Huang, and David Z. Pan. 2019. GAN-SRAF: Sub-Resolution Assist Feature Generation Using Conditional Generative Adversarial Networks. In Design Automation Conference - DAC ’19. ACM, Las Vegas, NV, USA, 1–6. https://doi.org/10.1145/3316781.3317832
  • Basu et al. (2019) Kanad Basu, Samah Mohamed Saeed, Christian Pilato, Mohammed Ashraf, Mohammed Thari Nabeel, Krishnendu Chakrabarty, and Ramesh Karri. 2019. CAD-Base: An Attack Vector into the Electronics Supply Chain. ACM Trans. Des. Autom. Electron. Syst. 24, 4 (April 2019), 38:1–38:30. https://doi.org/10.1145/3315574
  • Biggio and Roli (2018) Battista Biggio and Fabio Roli. 2018. Wild patterns: Ten years after the rise of adversarial machine learning. Pattern Recognition 84 (Dec. 2018), 317–331. https://doi.org/10.1016/j.patcog.2018.07.023
  • Bolun Wang et al. (2019) Bolun Wang, Yuanshun Yao, Shawn Shan, Huiying Li, Bimal Viswanath, Haitao Zheng, and Ben Y. Zhao. 2019. Stealthy Porn: Understanding Real-World Adversarial Images for Illicit Online Promotion. In IEEE Symposium on Security and Privacy - SP ’19, Vol. 1. IEEE Computer Society, Los Alamitos, CA, USA, 547–561. https://doi.org/10.1109/SP.2019.00032
  • Cao et al. (2019) Yi Cao, Andrew B. Kahng, Joseph Li, Abinash Roy, Vaishnav Srinivas, and Bangqi Xu. 2019. Learning-based prediction of package power delivery network quality. In Asia and South Pacific Design Automation Conference - ASP-DAC ’19. ACM, Tokyo, Japan, 160–166. https://doi.org/10.1145/3287624.3287689
  • Chen et al. (2019a) Bryant Chen, Wilka Carvalho, Nathalie Baracaldo, Benjamin Edwards, Taesung Lee, Heiko Ludwig, Ian Molloy, and Biplav Srivastava. 2019a. Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering. In Proceedings of the AAAI Workshop on Artificial Intelligence Safety - SafeAI 2019. AAAI, 8.
  • Chen et al. (2019b) Ying Chen, Yibo Lin, Tianyang Gai, Yajuan Su, Yayi Wei, and David Z. Pan. 2019b. Semi-supervised hotspot detection with self-paced multi-task learning. In Asia and South Pacific Design Automation Conference - ASP-DAC ’19. ACM, Tokyo, Japan, 420–425. https://doi.org/10.1145/3287624.3287685
  • Chollet et al. (2015) François Chollet et al. 2015. Keras. https://keras.io.
  • Company (1987) Calma Company. 1987. GDSII Stream Format Manual, Release 6.0. http://bitsavers.informatik.uni-stuttgart.de/pdf/calma/GDS_II_Stream_Format_Manual_6.0_Feb87.pdf
  • Dhillon et al. (2018) Guneet S. Dhillon, Kamyar Azizzadenesheli, Jeremy D. Bernstein, Jean Kossaifi, Aran Khanna, Zachary C. Lipton, and Animashree Anandkumar. 2018. Stochastic activation pruning for robust adversarial defense. In International Conference on Learning Representations - ICLR ’18. OpenReview.net, 1–6. https://openreview.net/forum?id=H1uR4GZRZ
  • Elena Zennaro et al. (2018) Elena Zennaro, Lorenzo Servadei, Keerthikumara Devarajegowda, and Wolfgang Ecker. 2018. A Machine Learning Approach for Area Prediction of Hardware Designs from Abstract Specifications. In 2018 21st Euromicro Conference on Digital System Design - DSD ’18. IEEE Computer Society, 413–420. https://doi.org/10.1109/DSD.2018.00076
  • Eykholt et al. (2018) Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. 2018. Robust Physical-World Attacks on Deep Learning Visual Classification. In The IEEE Conference on Computer Vision and Pattern Recognition - CVPR ’18. IEEE Computer Society, Salt Lake City, Utah, 1625–1634. https://doi.org/10.1109/CVPR.2018.00175
  • Geng et al. (2019) Hao Geng, Haoyu Yang, Yuzhe Ma, Joydeep Mitra, and Bei Yu. 2019. SRAF Insertion via Supervised Dictionary Learning. In Asia and South Pacific Design Automation Conference - ASP-DAC ’19. ACM, New York, NY, USA, 406–411. https://doi.org/10.1145/3287624.3287684
  • Goodfellow et al. (2015) Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and Harnessing Adversarial Examples. In International Conference on Learning Representations. http://arxiv.org/abs/1412.6572
  • Graphics (2019) Mentor Graphics. 2019. Calibre LFD. https://www.mentor.com/products/ic_nanometer_design/design-for-manufacturing/calibre-lfd/
  • Greathouse and Loh (2018) Joseph L. Greathouse and Gabriel H. Loh. 2018. Machine learning for performance and power modeling of heterogeneous systems. In International Conference on Computer-Aided Design - ICCAD ’18. ACM, San Diego, California, 1–6. https://doi.org/10.1145/3240765.3243484
  • Guo et al. (2018) Chuan Guo, Mayank Rana, Moustapha Cissé, and Laurens van der Maaten. 2018. Countering Adversarial Images using Input Transformations. In 6th International Conference on Learning Representations - ICLR ’18. OpenReview.net, Vancouver, BC, Canada. https://openreview.net/forum?id=SyJ7ClWCb
  • He and Garcia (2008) Haibo He and Edwardo A Garcia. 2008. Learning from imbalanced data. IEEE Transactions on Knowledge & Data Engineering 9 (2008), 1263–1284. https://doi.org/10.1109/TKDE.2008.239
  • Jiang et al. (2019) Yiyang Jiang, Fan Yang, Hengliang Zhu, Bei Yu, Dian Zhou, and Xuan Zeng. 2019. Efficient Layout Hotspot Detection via Binarized Residual Neural Network. In Design Automation Conference 2019 - DAC ’19. ACM, Las Vegas, NV, USA, 1–6. https://doi.org/10.1145/3316781.3317811
  • Kahng (2018) Andrew B. Kahng. 2018. Machine Learning Applications in Physical Design: Recent Results and Directions. In International Symposium on Physical Design - ISPD ’18. ACM, Monterey, California, USA, 68–73. https://doi.org/10.1145/3177540.3177554
  • Kingma and Ba (2014) Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. arXiv:1412.6980 [cs] (Dec. 2014). http://arxiv.org/abs/1412.6980 arXiv: 1412.6980.
  • Krizhevsky (2009) Alex Krizhevsky. 2009. Learning multiple layers of features from tiny images. Technical Report. Citeseer.
  • Kurakin et al. (2016) Alexey Kurakin, Ian Goodfellow, and Samy Bengio. 2016. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533 (2016).
  • Lin et al. (2018) Yibo Lin, Yuki Watanabe, Taiki Kimura, Tetsuaki Matsunawa, Shigeki Nojima, Meng Li, and David Z. Pan. 2018. Data Efficient Lithography Modeling with Residual Neural Networks and Transfer Learning. In International Symposium on Physical Design - ISPD ’18. ACM, Monterey, California, USA, 82–89. https://doi.org/10.1145/3177540.3178242
  • Liu et al. (2018a) Kang Liu, Brendan Dolan-Gavitt, and Siddharth Garg. 2018a. Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks. In Research in Attacks, Intrusions, and Defenses (Lecture Notes in Computer Science), Michael Bailey, Thorsten Holz, Manolis Stamatogiannakis, and Sotiris Ioannidis (Eds.). Springer International Publishing, 273–294. https://doi.org/10.1007/978-3-030-00470-5_13
  • Liu et al. (2017) Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. 2017. Delving into Transferable Adversarial Examples and Black-box Attacks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net. https://openreview.net/forum?id=Sys6GJqxl
  • Liu et al. (2018b) Yingqi Liu, Shiqing Ma, Yousra Aafer, Wen-Chuan Lee, Juan Zhai, Weihang Wang, and Xiangyu Zhang. 2018b. Trojaning Attack on Neural Networks. In Network and Distributed System Security Symposium - NDSS’18. The Internet Society. http://wp.internetsociety.org/ndss/wp-content/uploads/sites/25/2018/02/ndss2018_03A-5_Liu_paper.pdf
  • Madry et al. (2017) Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2017. Towards Deep Learning Models Resistant to Adversarial Attacks. arXiv:1706.06083 [cs, stat] (June 2017). http://arxiv.org/abs/1706.06083 arXiv: 1706.06083.
  • Matsunawa et al. (2015) Tetsuaki Matsunawa, Jhih-Rong Gao, Bei Yu, and David Z Pan. 2015. A new lithography hotspot detection framework based on AdaBoost classifier and simplified feature extraction. In Design-Process-Technology Co-optimization for Manufacturability IX, Vol. 9427. International Society for Optics and Photonics, 94270S. https://doi.org/10.1117/12.2085790
  • Meng and Chen (2017) Dongyu Meng and Hao Chen. 2017. Magnet: a two-pronged defense against adversarial examples. In ACM SIGSAC Conference on Computer and Communications Security. ACM, Dallas, TX, 135–147. https://doi.org/10.1145/3133956.3134057
  • Metzen et al. (2017) Jan Hendrik Metzen, Tim Genewein, Volker Fischer, and Bastian Bischoff. 2017. On Detecting Adversarial Perturbations. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net. https://openreview.net/forum?id=SJzCSf9xg
  • Montavon et al. (2018) Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. 2018. Methods for interpreting and understanding deep neural networks. Digital Signal Processing 73 (2018), 1–15. https://doi.org/10.1016/j.dsp.2017.10.011
  • Moore (2018) Samuel K. Moore. 2018. DARPA Picks Its First Set of Winners in Electronics Resurgence Initiative. https://spectrum.ieee.org/tech-talk/semiconductors/design/darpa-picks-its-first-set-of-winners-in-electronics-resurgence-initiative
  • Moosavi-Dezfooli et al. (2017) Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. 2017. Universal Adversarial Perturbations. In 2017 IEEE Conference on Computer Vision and Pattern Recognition - CVPR ’17. IEEE Computer Society, 86–94. https://doi.org/10.1109/CVPR.2017.17
  • Nair and Hinton (2010) Vinod Nair and Geoffrey E. Hinton. 2010. Rectified Linear Units Improve Restricted Boltzmann Machines. In International Conference on International Conference on Machine Learning - ICML ’10. Omnipress, USA, 807–814. http://dl.acm.org/citation.cfm?id=3104322.3104425
  • Nicolas Papernot et al. (2016) Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. 2016. Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks. In 2016 IEEE Symposium on Security and Privacy - SP ’16. 582–597. https://doi.org/10.1109/SP.2016.41
  • Papernot et al. (2017) Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami. 2017. Practical black-box attacks against machine learning. In ACM Asia Conference on Computer and Communications Security - ASIACCS ’17. ACM, Abu Dhabi, United Arab Emirates, 506–519. https://doi.org/10.1145/3052973.3053009
  • Reddy et al. (2018) Gaurav Rajavendra Reddy, Constantinos Xanthopoulos, and Yiorgos Makris. 2018. Enhanced hotspot detection through synthetic pattern generation and design of experiments. In IEEE 36th VLSI Test Symposium - VTS ’18. IEEE, San Francisco, CA, 1–6. https://doi.org/10.1109/VTS.2018.8368646
  • Samangouei et al. (2018) Pouya Samangouei, Maya Kabkab, and Rama Chellappa. 2018. Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net. https://openreview.net/forum?id=BkJ3ibb0-
  • Sharif et al. (2016) Mahmood Sharif, Sruti Bhagavatula, Lujo Bauer, and Michael K. Reiter. 2016. Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition. In ACM SIGSAC Conference on Computer and Communications Security. ACM, Vienna, Austria, 1528–1540. https://doi.org/10.1145/2976749.2978392
  • Szegedy et al. (2014) Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. 2014. Intriguing properties of neural networks. In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings. http://arxiv.org/abs/1312.6199
  • Tabrizi et al. (2018) Aysa Fakheri Tabrizi, Nima Karimpour Darav, Shuchang Xu, Logan Rakai, Ismail Bustany, Andrew Kennings, and Laleh Behjat. 2018. A Machine Learning Framework to Identify Detailed Routing Short Violations from a Placed Netlist. In Design Automation Conference - DAC ’18. ACM, New York, NY, USA, 48:1–48:6. https://doi.org/10.1145/3195970.3195975
  • Tianyu Gu et al. (2019) Tianyu Gu, Kang Liu, Brendan Dolan-Gavitt, and Siddharth Garg. 2019. BadNets: Evaluating Backdooring Attacks on Deep Neural Networks. IEEE Access 7 (2019), 47230–47244. https://doi.org/10.1109/ACCESS.2019.2909068
  • Torres (2012) J. Andres Torres. 2012. ICCAD-2012 CAD contest in fuzzy pattern matching for physical verification and benchmark suite. In 2012 IEEE/ACM International Conference on Computer-Aided Design - ICCAD ’12. 349–350.
  • Tramèr et al. (2018) Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian J. Goodfellow, Dan Boneh, and Patrick D. McDaniel. 2018. Ensemble Adversarial Training: Attacks and Defenses. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net. https://openreview.net/forum?id=rkZvSe-RZ
  • Winston Haaswijk et al. (2018) Winston Haaswijk, Edo Collins, Benoit Seguin, Mathias Soeken, Frédéric Kaplan, Sabine Süsstrunk, and Giovanni De Micheli. 2018. Deep Learning for Logic Optimization Algorithms. In 2018 IEEE International Symposium on Circuits and Systems - ISCAS ’18. 1–4. https://doi.org/10.1109/ISCAS.2018.8351885
  • Xie et al. (2018) Zhiyao Xie, Yu-Hung Huang, Guan-Qi Fang, Haoxing Ren, Shao-Yun Fang, Yiran Chen, and Nvidia Corporation. 2018. RouteNet: Routability Prediction for Mixed-size Designs Using Convolutional Neural Network. In International Conference on Computer-Aided Design - ICCAD ’18. ACM, New York, NY, USA, 80:1–80:8. https://doi.org/10.1145/3240765.3240843
  • Xu et al. (2018) Weilin Xu, David Evans, and Yanjun Qi. 2018. Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks. In Network and Distributed System Security Symposium - NDSS ’18. The Internet Society, San Diego, California. http://wp.internetsociety.org/ndss/wp-content/uploads/sites/25/2018/02/ndss2018_03A-4_Xu_paper.pdf
  • Yang et al. (2018a) Haoyu Yang, Shuhe Li, Yuzhe Ma, Bei Yu, and Evangeline F. Y. Young. 2018a. GAN-OPC: Mask Optimization with Lithography-guided Generative Adversarial Nets. In Design Automation Conference - DAC ’18. IEEE, 1–6. https://doi.org/10.1109/DAC.2018.8465816
  • Yang et al. (2019) Haoyu Yang, Piyush Pathak, Frank Gennari, Ya-Chieh Lai, and Bei Yu. 2019. Hotspot detection using squish-net. In Design-Process-Technology Co-optimization for Manufacturability XIII, Vol. 10962. International Society for Optics and Photonics, 109620S. https://doi.org/10.1117/12.2515172
  • Yang et al. (2018b) Haoyu Yang, Jing Su, Yi Zou, Yuzhe Ma, Bei Yu, and Evangeline F. Y. Young. 2018b. Layout Hotspot Detection with Feature Tensor Generation and Deep Biased Learning. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2018), 1–1. https://doi.org/10.1109/TCAD.2018.2837078
  • Yang et al. (2017) Haoyu Yang, Jing Su, Yi Zou, Bei Yu, and Evangeline F. Y. Young. 2017. Layout Hotspot Detection with Feature Tensor Generation and Deep Biased Learning. In Design Automation Conference 2017 - DAC ’17. ACM, Austin, TX, USA, 1–6. https://doi.org/10.1145/3061639.3062270
  • Yu et al. (2019) Bo-Yi Yu, Yong Zhong, Shao-Yun Fang, and Hung-Fei Kuo. 2019. Deep learning-based framework for comprehensive mask optimization. In Asia and South Pacific Design Automation Conference - ASP-DAC ’19. ACM, Tokyo, Japan, 311–316. https://doi.org/10.1145/3287624.3288749
  • Yu et al. (2018) Cunxi Yu, Houping Xiao, and Giovanni De Micheli. 2018. Developing synthesis flows without human knowledge. In Design Automation Conference - DAC ’18. ACM, San Francisco, California, 1–6. https://doi.org/10.1145/3195970.3196026
  • Yu et al. (2012) Yen-Ting Yu, Ya-Chung Chan, Subarna Sinha, Iris Hui-Ru Jiang, and Charles Chiang. 2012. Accurate process-hotspot detection using critical design rule extraction. In Design Automation Conference - DAC ’12. ACM, 1167–1172. https://doi.org/10.1145/2228360.2228576

Appendix: Exploration using ICCAD ’12 Dataset

To further explore the wider security implications of ML in CAD, we investigated adversarial perturbations in another hotspot detection task. In this experiment, we attacked a different CNN-based hotspot detector, trained using a dataset from the ICCAD 2012 contest on pattern matching for physical verification (Torres, 2012). The attack goal is similar to that in Section 3: the attacker wants to modify given hotspot layouts such that they are misclassified as non-hotspot by the detector. Note, however, that we could not perform lithography simulation-based verification, as the required physical models were not available as part of the ICCAD competition dataset.

Dataset and Hotspot Detector Design

We use layout 4 from the ICCAD dataset, containing 4547 training and 32067 test layout images. Of the training samples, 4452 are non-hotspot and the remaining 95 are hotspot; of the test samples, 31890 are non-hotspot and 177 are hotspot. Each layout image has dimensions of 1200 pixels × 1200 pixels, with binary-valued pixel intensities representing the pattern to be printed; each pixel corresponds to a fixed physical area of the layout. Before training and inference, we preprocess the layout images using the same DCT filters described in Subsection 4.2 to obtain the DCT coefficients used as input to the hotspot detector; the resulting input dimension is (12, 12, 32). We use a similar CNN architecture and training procedure as for Network A (Section 5), but with this modified input dimension; the network parameters are shown in Table 6 (a Keras sketch consistent with the table follows it). The trained network has 98.5% non-hotspot classification accuracy and 92.7% hotspot classification accuracy on the test data.
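As an illustration of the preprocessing step, the sketch below extracts a block-DCT feature tensor of shape (12, 12, 32) from a 1200 × 1200 binary layout. The exact filters of Subsection 4.2 are not reproduced here; the 100 × 100 tile size, the diagonal coefficient ordering, and the use of scipy's dctn are assumptions made for illustration.

```python
import numpy as np
from scipy.fft import dctn

def layout_to_dct_tensor(layout, block=100, n_coeffs=32):
    """Sketch of block-DCT feature extraction: split the binary layout image
    into block x block tiles, take the 2-D DCT of each tile, and keep the first
    n_coeffs coefficients (here in a simple diagonal order) as the channel axis."""
    h, w = layout.shape                       # e.g. (1200, 1200)
    gh, gw = h // block, w // block           # grid of tiles, e.g. 12 x 12
    # Low-frequency-first (diagonal) ordering over the block x block coefficient grid.
    order = sorted(((i, j) for i in range(block) for j in range(block)),
                   key=lambda ij: (ij[0] + ij[1], ij[0]))[:n_coeffs]
    tensor = np.zeros((gh, gw, n_coeffs), dtype=np.float32)
    for r in range(gh):
        for c in range(gw):
            tile = layout[r*block:(r+1)*block, c*block:(c+1)*block]
            coeffs = dctn(tile, norm='ortho')  # 2-D DCT of this tile
            tensor[r, c] = [coeffs[i, j] for (i, j) in order]
    return tensor
```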

Layer Kernel Size Stride Output Size
input - - (12, 12, 32)
conv1_1 3 1 (12, 12, 16)
conv1_2 3 1 (12, 12, 16)
maxpooling1 2 2 (6, 6, 16)
conv2_1 3 1 (6, 6, 32)
conv2_2 3 1 (6, 6, 32)
maxpooling2 2 2 (3, 3, 32)
fc1 - - 250
fc2 - - 2
Table 6. Network Architecture.
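For concreteness, the following Keras sketch instantiates a network consistent with Table 6. The 'same' padding, ReLU activations, softmax output, and the Adam/categorical cross-entropy training setup are assumptions not specified by the table.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_hotspot_cnn(input_shape=(12, 12, 32)):
    """CNN sketch matching the layer/kernel/output sizes listed in Table 6."""
    return keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(16, 3, padding='same', activation='relu'),  # conv1_1
        layers.Conv2D(16, 3, padding='same', activation='relu'),  # conv1_2
        layers.MaxPooling2D(2),                                    # maxpooling1
        layers.Conv2D(32, 3, padding='same', activation='relu'),  # conv2_1
        layers.Conv2D(32, 3, padding='same', activation='relu'),  # conv2_2
        layers.MaxPooling2D(2),                                    # maxpooling2
        layers.Flatten(),
        layers.Dense(250, activation='relu'),                      # fc1
        layers.Dense(2, activation='softmax'),                     # fc2
    ])

model = build_hotspot_cnn()
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
```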
Black-box attack

We conducted a black-box attack on the test hotspot layouts to examine the efficacy of our proposed attack scheme. However, instead of inserting SRAFs, we add isolated printing patterns to the layouts. The attack constraints in this experiment are: (1) shape constraint: adversarial insertions can only be chosen from a restricted set of four basic shapes that already exist in the layout dataset, as illustrated in Fig. 11; (2) spacing constraint: inserted patterns should be at least 45 nm away from any surrounding patterns; (3) alignment constraint: inserted patterns need to be aligned with existing shapes; (4) insertion region: inserted patterns must not overlap with a 100 nm wide border at the edges of the layout image. The black-box algorithm is reused from Section 5.
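The shape constraint is enforced by restricting candidate insertions to the four shapes in Fig. 11; the remaining constraints can be checked per candidate position, as in the sketch below. The nm_per_pixel scaling and the grid-based treatment of the alignment rule are simplifying assumptions, not the exact checks we implemented.

```python
import numpy as np

def is_valid_insertion(layout, shape, y, x, nm_per_pixel=1.0,
                       min_spacing_nm=45, border_nm=100, grid_nm=None):
    """Sketch of the insertion-validity checks used in this experiment:
    a keep-out border, spacing from existing patterns, and (optionally)
    alignment to a coarse grid derived from existing shapes."""
    h, w = layout.shape
    sh, sw = shape.shape
    border = int(round(border_nm / nm_per_pixel))
    # (4) insertion region: stay clear of the image border.
    if y < border or x < border or y + sh > h - border or x + sw > w - border:
        return False
    # (2) spacing: no existing pattern within min_spacing_nm of the new shape.
    m = int(round(min_spacing_nm / nm_per_pixel))
    y0, y1 = max(0, y - m), min(h, y + sh + m)
    x0, x1 = max(0, x - m), min(w, x + sw + m)
    if layout[y0:y1, x0:x1].any():
        return False
    # (3) alignment: simplified here as snapping to a coarse placement grid.
    if grid_nm is not None:
        g = int(round(grid_nm / nm_per_pixel))
        if y % g or x % g:
            return False
    return True
```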

Attack Results

We performed the black-box attack using the 164 hotspot layouts from the test set that the detector correctly classified as hotspot. Of these, 127 were successfully perturbed to fool the network. The results are summarized in Table 7. We illustrate some of the perturbed layouts, each with one to four adversarially inserted patterns, in Fig. 12.

Attack success rate 77.4%
Average attack time per layout 63.5 s
Average number of patterns added 4.5
Average area of patterns added 2.1%
Table 7. Summary of black-box attack results
Figure 11. Restricted set of patterns for black-box attack (shape dimensions shown). The (a) square, (b) rectangle, (c) zig-zag, and (d) Tee shapes are drawn from the existing layout dataset.

Figure 12. Top row: original hotspot layouts. Bottom row: corresponding adversarial non-hotspot layouts.
Remarks

Based on these additional experiments, it appears that detectors trained on this dataset are also susceptible to adversarial perturbation attacks, even though the baseline accuracy of the CNN-based hotspot detector on this dataset is high. Access to the lithography simulation settings would further allow us to verify that the modified layouts remain hotspots.