
Keeping Deep Lithography Simulators Updated: Global-Local Shape-Based Novelty Detection and Active Learning

Learning-based pre-simulation (i.e., layout-to-fabrication) models have been proposed to predict the fabrication-induced shape deformation from an IC layout to its fabricated circuit. Such models are usually driven by pairwise learning, involving a training set of layout patterns and their reference shape images after fabrication. However, it is expensive and time-consuming to collect the reference shape images of all layout clips for model training and updating. To address the problem, we propose a deep learning-based layout novelty detection scheme to identify novel (unseen) layout patterns, which cannot be well predicted by a pre-trained pre-simulation model. We devise a global-local novelty scoring mechanism to assess the potential novelty of a layout by exploiting two subnetworks: an autoencoder and a pretrained pre-simulation model. The former characterizes the global structural dissimilarity between a given layout and training samples, whereas the latter extracts a latent code representing the fabrication-induced local deformation. By integrating the global dissimilarity with the local deformation boosted by a self-attention mechanism, our model can accurately detect novelties without the ground-truth circuit shapes of test samples. Based on the detected novelties, we further propose two active-learning strategies to sample a reduced amount of representative layouts most worthy to be fabricated for acquiring their ground-truth circuit shapes. Experimental results demonstrate i) our method's effectiveness in layout novelty detection, and ii) our active-learning strategies' ability in selecting representative novel layouts for keeping a learning-based pre-simulation model updated.



I Introduction

After integrated circuit (IC) design and layout, a multi-step sequence of photolithographic and chemical processing steps is needed to fabricate an IC wafer. Because of the large exposure variations in the lithography procedure and the chemical reactions in the etching procedure, these two procedures together result in nonlinear shape deformation of a designed IC pattern, which is usually too complicated to model analytically. This fact has driven the development of deep learning-based pre-simulation models, such as GAN-OPC [41], LithoGAN [42], and LithoNet-OPCNet [30], to handle issues like i) lithography simulation for predicting the shapes of the fabricated circuit from a given IC layout along with IC fabrication parameters, and ii) mask optimization for predicting the best mask to compensate for the fabrication-induced shape deformations. However, deep learning-based models and their training/updating processes usually rely on a tremendous amount of training data. The selection of training data consequently becomes a considerable issue, because whether a training dataset or a fine-tuning dataset (with novel patterns) is informative largely affects the generalization ability of a learning-based model.

Fig. 1: Relationship among novelty detection, the pretrained simulation model, newly-designed layouts, and IC fabrication. Given a pretrained pre-simulation model, e.g., LithoNet, driven by layout-SEM image pairs, the proposed layout novelty detection method aims to identify potential novel layouts from a pool of newly-designed layout patterns so that these potential novel layouts can be further fabricated to derive their SEM images for fine-tuning the pretrained model. Because only the selected layouts need to be fabricated, this framework saves the budget and time needed to collect sufficiently informative training samples. The proposed novelty detection method can thus act as an oracle for active learning.

However, how to collect an appropriate training dataset and sample a fine-tuning dataset, also known as a development set, from the IC layout design and fabrication processes is practically a very complicated issue. This complexity is due to the following two aspects. First, given a deep learning model pretrained on an initial training dataset, the model may still need to be fine-tuned on another development set so that it can generalize to samples that are unseen in the initial training dataset. In order to determine a proper development dataset, one has to assess a layout's degree of novelty by checking whether this layout's SEM (Scanning Electron Microscope) image shape can be accurately predicted by the pretrained pre-simulation model, even in the absence of this layout's ground-truth SEM image. Second, it is unaffordable to exhaustively collect the aerial images of each layout under different fabrication parameter settings as training ground-truths because of the high costs. For example, LithoNet [30] learns the layout-to-SEM contour correspondence and the effects of fabrication parameters from a collection of layout-SEM image pairs. If LithoNet needs to learn how layout patterns deform under different fabrication parameter settings, one should fabricate IC circuits to obtain a comprehensive training set covering all combinations of to-be-learned conditions. The fabrication processes required to collect such training data are too time-consuming and costly. Consequently, these two aspects create the demand for novelty detection and active learning in both the maintenance of a learning-based pre-simulation model and the selection of a development set, as illustrated in Fig. 1.

To address this issue, we propose an SEM-free (i.e., ground-truth-free) scheme to detect novel layout patterns, whose SEM images are worth collecting via the costly IC fabrication process, for informatively updating a pretrained DNN-based lithography simulator. That is, given i) a deep-learning-based pre-simulation model, e.g., LithoNet [30], and ii) a pool of newly-designed IC layout clips, the proposed method aims to identify layout patterns that are novel and informative for increasing the generalization ability of the pretrained model. This scenario leads to two considerations. First, the method should be able to identify novel (unseen) layout designs whose fabrication-induced shape deformation cannot be well predicted by the model pretrained on an initial training set of layout-SEM image pairs. The difficulty lying behind the first consideration is that, during deployment, all inputs are layout patterns, and therefore our method needs to detect layout novelty by learning the relationship between layouts and their predicted layout-to-SEM deformation maps. Second, due to the high costs of IC fabrication and taking SEM images, the method should be able to select a reduced set of the most informative layouts to update the pretrained model for the sake of budget efficiency. As a result, only the selected set of layouts will be fabricated to acquire their layout-SEM pairs for fine-tuning the pretrained model. This consideration also hints at the requirement of a sampling process for active learning. Because the solution to the second consideration, i.e., the capability of selecting data samples that can optimally represent the entire training data domain, highly depends on that of the first, we take both considerations into account to propose novel methods for novelty detection and active learning. This work has the following major contributions.

Our novelty detection method is the first to learn global-local features for identifying which layouts are novel and worthy of further fabrication, in contrast to existing approaches that detect novelties based solely on global features [24, 26]. Specifically, we devise two subnetworks to derive two novelty scores, one measuring global structure dissimilarity and the other capturing local deformation, that are complementary to each other. In this way, our method can efficiently collect informative layout-SEM image pairs, which are necessary for fine-tuning learning-based layout-to-SEM prediction or mask-optimization models like LithoNet/OPCNet to keep them updated with newly designed data.
During deployment, our method can detect novel layout patterns in the absence of the ground-truth SEM images of target layouts’ fabricated circuits. Therefore, our model not only meets the practical field requirements for layout pre-inspection, but also functions as an active learning oracle.
We further propose two effective graph sampling-based active-learning strategies, namely one-time sampling and incremental sampling, to sample a much reduced set of representative layouts, which are most worthy of further fabrication for acquiring their reference SEM images, in an on-a-budget environment.

The remainder of this paper is organized as follows. We review related literature in Sec.  II. The proposed layout novelty detection method is detailed in Sec. III. Sec. IV presents our proposed active learning strategies. Sec. V demonstrates and discusses our experimental results. Finally, we draw our conclusion in Sec. VI.

II Related Work

II-A Learning-Based Lithography Pre-simulation Models

Several learning-based lithography pre-simulation models have been proposed for topics such as lithography simulation and mask optimization. In order to save computational resources, Yang et al. proposed the GAN-OPC method [41] to facilitate the mask optimization process. GAN-OPC aims at creating quasi-optimal masks for given target circuit patterns by learning target-mask mappings. GAN-OPC can generate high-quality masks and thus ensure good printability while requiring fewer normal OPC steps. In addition, Ye et al. devised LithoGAN for lithography simulation [42]. LithoGAN is a GAN-based end-to-end lithography modeling framework that maps input mask patterns directly to the output resist patterns, making it capable of predicting resist patterns accurately while achieving a significant speedup compared with conventional lithography simulation methods. Recently, our proposed LithoNet-OPCNet framework [30] successfully addresses the lithography simulation and mask optimization problems simultaneously in an end-to-end learning manner. Specifically, LithoNet, trained on a comprehensive set of layout-SEM image pairs, can accurately predict the fabrication-induced shape distortion for an input layout pattern. OPCNet [30], trained with the guidance provided by a pretrained LithoNet, aims to predict the optical-proximity-corrected (OPC) photomask pattern of an input layout.

II-B Novelty Detection

Novelty detection is the procedure used to identify whether a data sample is hitherto unknown. It is typically modeled as a one-class classification problem, in which a novelty detector is trained on single-class training samples that are all supposed to be seen, normal ones. As a result, during deployment the detector can determine whether an input testing sample is dissimilar to the seen training samples in terms of a given distance metric [16, 34] or a loss function [11]. Though novelty detection is closely related to anomaly/outlier detection, their scenarios are significantly different. Specifically, anomaly/outlier detection methods usually learn to find abnormal samples in a given reference dataset, whereas the reference data used to train a novelty detection model are assumed to be unpolluted and involve only normal, regular samples. Note that a novelty detection method, like other anomaly/outlier detection methods, usually maps its input data to a novelty score so that an appropriate threshold can be defined accordingly to tell novel samples (outliers) and regular ones (inliers) apart [19].

Novelty detection methods have found applications in video surveillance [4, 17], medical imaging [27], abnormal event detection for attributed networks [9, 38], etc. In general, common approaches based on probabilistic models, such as one-class SVMs [16] and kernel density estimation [34], can achieve good performance on low-dimensional features. However, these methods may not apply well to high-dimensional data, e.g., images in computer vision tasks. Hence, two sorts of CNN (convolutional neural network) based methods have been proposed to address this problem. One sort learns to generate a reconstructed image and then evaluates an abnormality score according to the difference between the input and the reconstructed images [2, 12], and the other learns to embed a latent structural feature of the input and then derives an abnormality score based on the extracted structural feature [1].

For example, Sabokrou et al. [25] proposed to train an auto-encoder along with a discriminator, conceptually a classifier, in an adversarial manner based on the reconstruction error, and then to determine whether an input is novel by the discriminator. Similarly, Perera et al. proposed OCGAN [22] to solve the one-class novelty detection problem by learning the latent representations of within-class examples via a denoising auto-encoder network. Moreover, DSGAN was proposed in [32] for synthesizing novel samples surrounding real training data such that the decision border between the regular data and the novelties can be determined effectively by typical models. Besides, Pidhorskyi et al. [23] devised an architecture consisting of an auto-encoder and a discriminator for anomaly detection. Their model is trained on top of a double min-max-game framework that iteratively optimizes the distribution of latent codes extracted by the auto-encoder and the fidelity of the reconstructed images.

Furthermore, classification-based novelty/anomaly detection models can generally be boosted via a self-supervised mechanism [10, 3, 33]. The applicability of these methods is, however, limited by the assumption that their pre-processing strategies, e.g., rotation, random cropping, and geometric transformations, do not alter the ground-truth labels (i.e., class information) of the training dataset. As for IC fabrication, a rotated or geometrically-transformed layout pattern will result in a different printed image because the processing results of a stepper/scanner in the x- and y-directions are asymmetric, and hence such a self-supervised mechanism is usually not applicable.

II-C Active Learning

Active learning refers to cases in which a learning algorithm can assess the necessity of labeling an unlabeled sample by interactively querying an oracle, usually a pre-trained model or a user-specified metric function, about the importance of unlabeled samples [29]. A fundamental concept is uncertainty-based selection, through which an oracle recommends high-uncertainty (unlabeled) data for labeling and disregards high-confidence ones [13, 35]. Such methods are, however, sensitive to outliers. Recently, several active learning algorithms were devised for CAD/VLSI applications, such as the methods in [15, 46]. However, all these active learning techniques need to collaborate with a reliable oracle. Hence, in this paper we aim to develop an oracle that can assess the novelty of an unseen layout in an SEM-free environment by learning the knowledge contained in pairwise training samples.

III Global-Local Shape-Based Novelty Detection

Fig. 2: Framework of the proposed layout novelty detection method. Upper part: The SA-LithoNet, architecturally the encoder part of pretrained LithoNet followed by a self-attention module. Lower part: The autoencoder. The SA-LithoNet can embed an input layout into a latent code characterizing the local layout-to-SEM deformations, whereas the autoencoder is used to measure the global dissimilarity via the reconstruction error. These two parts can jointly derive a global-local (Glocal) score for layout novelty detection.

III-A Overview

Due to limited labeling resources, one often adopts a sampling strategy to select a small set of the most informative unlabeled novel samples for a further labeling routine and then uses the newly labeled samples to update the learned model in an active learning manner. As reported in [45], while regular samples, e.g., layout clips, with characteristics similar to those of the source training data can usually be predicted fairly well by a model pre-trained on the same source training set, data with unseen patterns, e.g., novel layout clips, are potentially able to improve a pretrained model and are thus worth a fabrication to acquire their ground-truth SEM images. Hence, under the premise of saving the costs of fabricating excessive training samples and acquiring their SEM images, our goal here is to identify, from a pool of newly-designed IC layout clips, the most informative unseen layout clips that are worth a fabrication for acquiring their SEM images to effectively update a pretrained model.

To tackle this active learning problem for an IC fabrication pre-simulation model like LithoNet [30], we aim to design a layout novelty detection scheme that can work in the absence of ground-truth SEM images during the inference stage. It can distinguish novel layout clips, whose SEM images cannot be accurately predicted by a pre-simulation model (e.g., LithoNet [30]), from regular layouts whose SEM images can be well predicted. To this end, we first elaborate in Sec. III-B our supervised scheme to label novel layout patterns objectively by annotating novel regions on layout clips with the aid of ground-truth SEM images. We then describe our unsupervised layout novelty detection scheme, namely the Glocal novelty score, in Secs. III-C to III-F.

Fig. 2 shows the architecture of our proposed Glocal (global-local) method that consists of two primary components, i.e., an SA-LithoNet and an autoencoder. We assume that a novel layout results from an innovation in global planning, a change in local planning, or both. Our method exploits i) LithoNet, a pre-simulation model of fabrication-induced local shape deformation [30], for capturing local shape features with the aid of a self-attention (SA) module, and ii) an autoencoder for characterizing global shape properties. The SA-LithoNet is architecturally the encoder part of a pretrained LithoNet followed by a self-attention module. This design employs SA-LithoNet to extract a feature representing local layout-to-SEM deformations within attended regions, identified by the self-attention module supervised by the novelty labels. Based on the assumption that the local-shape feature of a novel sample deviates from the distribution of regular samples, we employ the SA-LithoNet feature for local novelty scoring via multi-class SVM (MC-SVM) classification. Besides, we use the reconstruction error of the autoencoder, representing the global shape dissimilarity, as the global novelty score. As a result, we combine the local and global novelty scores to obtain the Glocal novelty score.

III-B Model Inconsistency-Guided Novelty Annotation

Because manually annotating novel patterns in a rich collection of layout clips is nontrivial, even for an experienced engineer, we first devise a supervised mechanism for annotating potential novelties on layout patterns to train and evaluate our novelty detection model. This mechanism aims to find a novel layout pattern based on the inconsistency between the pattern's ground-truth SEM image and the corresponding layout-to-SEM prediction yielded by LithoNet [30]. Hence, we name this mechanism Model Inconsistency-Guided Novelty Annotation (MIGNA).

MIGNA aims to identify those local regions where the shape contours of the layout-to-SEM predictions [30] significantly deviate from their counterparts in the corresponding ground-truth SEM images. Such deviations imply that the pretrained layout-to-SEM prediction model may not have learned from enough similar training layout patterns yet. Common sorts of shape deviations include, for example, unexpected abnormal patterns such as enclosures, neckings, and bridges. One possible cause of these deviations is unexpected diffraction during the lithography process, usually induced by the layout arrangement around the abnormal patterns, which makes the same layout pattern result in different SEM patterns with neighborhood-dependent shape variations. Consequently, when a learning-based pre-simulation model like LithoNet is trained on a dataset containing insufficient similar patterns, its shape predictions tend to deviate from the corresponding ground-truths. Such deviations should be considered anomalies, i.e., layout novelties, due to insufficient training patterns.

To identify unexpected shape deformations due to inaccurate predictions, we set a threshold of three standard deviations from the mean L1-norm of the pixel-wise differences between the layout-to-SEM predictions and their ground-truth SEM images. Three standard deviations from the mean is a common cut-off in practice for identifying outliers in a Gaussian-like distribution (statistically, about 99.7% of the data drawn from a Gaussian distribution fall within $\mu \pm 3\sigma$, and thus the remaining data are usually regarded as outliers). Consequently, our MIGNA method involves the following steps.
Step-1: Measure the pixel-wise deformation map based on the L1-distance between the ground-truth SEM image and the layout-to-SEM prediction for the same layout clip, where LithoNet [30] is adopted as the layout-to-SEM predictor.
Step-2: Partition the deformation map into non-overlapping patches, and discard those patches reaching image borders (60 border patches are omitted in our implementation).
Step-3: Annotate a patch as “anomaly” if its local L1-distance exceeds the mean L1-distance of the whole training dataset by three standard deviations or more.
Step-4: Label a layout as “novelty” if it contains at least a predetermined number of abnormal patches.

In this way, we can annotate the layout novelties systematically in a supervised manner.
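To make the MIGNA routine concrete, the following minimal NumPy sketch walks through Steps 1 to 4 for a single layout clip; the patch size, the border-handling rule, and the abnormal-patch count threshold are illustrative placeholders rather than the paper's actual settings.

```python
import numpy as np

def migna_annotate(pred_sem, gt_sem, mean_l1, std_l1, patch=32, min_abnormal=3):
    """Label one layout as novel following the MIGNA steps (Sec. III-B).

    pred_sem, gt_sem : (H, W) arrays, the LithoNet prediction and ground-truth SEM.
    mean_l1, std_l1  : mean / std of the patch-wise L1 distances over the training set.
    patch            : patch size (placeholder value).
    min_abnormal     : Step-4 threshold on the number of abnormal patches (placeholder).
    """
    # Step-1: pixel-wise deformation map (L1 distance).
    deform = np.abs(pred_sem.astype(np.float32) - gt_sem.astype(np.float32))

    # Step-2: partition into non-overlapping patches; keep interior patches only.
    H, W = deform.shape
    rows, cols = H // patch, W // patch
    abnormal = 0
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            block = deform[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
            # Step-3: flag the patch if its L1 distance exceeds mean + 3 * std.
            if block.mean() > mean_l1 + 3.0 * std_l1:
                abnormal += 1

    # Step-4: the layout is labeled a novelty if it has enough abnormal patches.
    return abnormal >= min_abnormal
```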

III-C Global-Local (Glocal) Novelty Score

Inspired by residue-based and classification-based novelty detection models, as illustrated in Fig. 2, our method consists of two subnetworks: i) an autoencoder, trained on a collection of layout images, and ii) an attention-guided layout-to-SEM prediction model, SA-LithoNet, comprising the encoder part of LithoNet [30] and a self-attention (SA) module. While the autoencoder characterizes the global shape appearance of a given layout, the SA-LithoNet extracts a latent feature code representing local shape deformations. Then, we evaluate the global-local (Glocal) novelty score based on i) a local anomaly score $s_{\mathrm{local}}$ obtained by using SA-LithoNet (elaborated in Sec. III-D and Sec. III-E), and ii) a global novelty score $s_{\mathrm{global}}$ derived by using the autoencoder (elaborated in Sec. III-F). The local anomaly score $s_{\mathrm{local}}$ is derived by the proposed MC-SVM (Multi-class SVM) algorithm that estimates the distance from the training dataset to the input in the latent feature space. Meanwhile, the global novelty score $s_{\mathrm{global}}$ is evaluated based on a conventional residue-based novelty detection scheme.

The Glocal novelty score of an input layout $\mathbf{x}$ is defined as

$$s_{\mathrm{Glocal}}(\mathbf{x}) = \mathcal{N}\big(s_{\mathrm{local}}(\mathbf{x})\big) + \mathcal{N}\big(s_{\mathrm{global}}(\mathbf{x})\big), \qquad (1)$$

where $\mathcal{N}(\cdot)$ denotes the following normalization process:

$$\mathcal{N}(s) = \frac{s - \mu_s}{\sigma_s}, \qquad (2)$$

where $\mu_s$ and $\sigma_s$ denote the mean and standard deviation of the score $s$ over the training set, respectively. Note that (1) follows the designs in [1, 23], in which a final novelty score is obtained by summing up two normalized independent novelty scores.
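A small sketch of how the two scores are fused per (1) and (2) is given below; the statistics and score values are illustrative placeholders, and the symbol names follow the reconstruction above.

```python
import numpy as np

def normalize(scores, mu, sigma):
    """Eq. (2): z-score normalization using training-set statistics."""
    return (np.asarray(scores) - mu) / sigma

def glocal_score(local, global_, stats):
    """Eq. (1): sum of the two normalized, independent novelty scores."""
    return (normalize(local, *stats["local"]) +
            normalize(global_, *stats["global"]))

# Usage sketch: the (mu, sigma) pairs would be estimated on the training set;
# the numbers below are illustrative only.
stats = {"local": (0.8, 0.2), "global": (0.05, 0.02)}
print(glocal_score([1.3], [0.11], stats))
```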

III-D Attention-Guided Layout-to-SEM Prediction Model

Fig. 3: Block diagram of self-attention module for learning the dependencies between the novelty label and LithoNet features in our proposed attention-guided novelty detection model.

Self-attention (SA) mechanisms [36] like Vision Transformer [7] and Non-local Neural Networks [37] have recently demonstrated their high efficacy in finding spatial long-range dependencies among image patches so that all dependent contextual features can be taken into account together to optimize a specific vision task. In order to extract a latent code carrying wider-range representative features for characterizing the fabrication-induced circuit shape deformation, we propose SA-LithoNet by appending an SA module to the encoder of LithoNet [30], as illustrated in Fig. 3.

By evaluating the dependencies between patches within the latent feature tensor embedded by the encoder of LithoNet, the SA module reorganizes the latent feature and then takes into account a wider range of layout shape details according to the patch dependencies, as will be described later in (4). Besides, to reduce the number of parameters while still achieving a good performance, we adopt the design of SAGAN [43] and replace the fully connected layer with $1 \times 1$ convolutions, based on which the query ($\mathbf{Q}$), key ($\mathbf{K}$), and value ($\mathbf{V}$) maps are derived from dimension-reduced features. As illustrated in Fig. 3, the query, key, and value maps of the self-attention module can be expressed as

$$\mathbf{Q} = \mathbf{W}_Q \ast \mathbf{F}, \quad \mathbf{K} = \mathbf{W}_K \ast \mathbf{F}, \quad \mathbf{V} = \mathbf{W}_V \ast \mathbf{F}, \qquad (3)$$

where $\mathbf{F} \in \mathbb{R}^{C \times H \times W}$ denotes the latent feature extracted from input layout pattern $\mathbf{x}$ by LithoNet [30], $H$ and $W$ are respectively the height and width of the feature, $C$ is the feature channel-depth, and $\mathbf{W}_Q$, $\mathbf{W}_K$, $\mathbf{W}_V$, and $\mathbf{W}_O$ are $1 \times 1$ convolution kernels, the first three of which perform feature channel-depth reduction.

As shown in Fig. 3, the attention map derived after softmax is

$$\alpha_{j,i} = \frac{\exp\big(\mathbf{q}_j^{\top}\mathbf{k}_i\big)}{\sum_{i}\exp\big(\mathbf{q}_j^{\top}\mathbf{k}_i\big)}, \qquad (4)$$

where $\mathbf{q}_j$ and $\mathbf{k}_i$ are sub-tensors of $\mathbf{Q}$ and $\mathbf{K}$, and $\alpha_{j,i}$ represents the normalized attention (dependency) in the $j$-th query tensor contributed by the $i$-th key tensor. Therefore, the output self-attention feature map $\mathbf{O}$ is a tensor whose $j$-th sub-tensor $\mathbf{o}_j$ is obtained by

$$\mathbf{o}_j = \mathbf{W}_O \ast \Big(\sum_{i} \alpha_{j,i}\,\mathbf{v}_i\Big), \qquad (5)$$

where $\mathbf{v}_i$ is the $i$-th sub-tensor within the value map $\mathbf{V}$, and $\mathbf{W}_O$ denotes the $1 \times 1$ convolution kernel that restores the channel-depth.

As a result, the final feature tensor enhanced by this SA module is

$$\mathbf{F}_{\mathrm{SA}} = \gamma\,\mathbf{O} + \mathbf{F}, \qquad (6)$$

where $\gamma$ is a learnable parameter, initialized as $0$.

The SA module can learn the spatial dependency within the input feature tensor and is then used to derive a tensor more representative than its input for the novelty detection task. Based on the assumption that the local-shape feature of a novel sample deviates from the distribution of regular samples, we employ the SA-LithoNet feature in (6) to evaluate the local novelty score with the proposed Multi-Class SVM (MC-SVM) method described below.
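For illustration, a compact PyTorch sketch of a SAGAN-style self-attention module in the spirit of (3)-(6) is shown below; the channel-reduction ratio and the class name are assumptions, and the block is a sketch rather than the exact SA-LithoNet implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """SAGAN-style self-attention appended to an encoder feature map.

    1x1 convolutions reduce the channel depth for the query/key/value maps,
    as in SAGAN [43]; the reduction ratio of 8 is an assumed value.
    """
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.q = nn.Conv2d(channels, channels // reduction, 1)
        self.k = nn.Conv2d(channels, channels // reduction, 1)
        self.v = nn.Conv2d(channels, channels // reduction, 1)
        self.o = nn.Conv2d(channels // reduction, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))    # Eq. (6): gamma starts at 0

    def forward(self, feat):                         # feat: (B, C, H, W)
        B, C, H, W = feat.shape
        q = self.q(feat).flatten(2).transpose(1, 2)  # (B, HW, C')
        k = self.k(feat).flatten(2)                  # (B, C', HW)
        v = self.v(feat).flatten(2).transpose(1, 2)  # (B, HW, C')
        attn = F.softmax(torch.bmm(q, k), dim=-1)    # Eq. (4): (B, HW, HW)
        out = torch.bmm(attn, v).transpose(1, 2).reshape(B, -1, H, W)
        out = self.o(out)                            # Eq. (5)
        return self.gamma * out + feat               # Eq. (6)
```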

III-E Local Shape Deformation-Based Novelty Score

Generally, MC-SVM first performs $K$-means clustering to group the training data into $K$ feature clusters, and then applies one-class SVMs (OC-SVMs) [16, 14] on the $K$ feature clusters individually to map regular-sample features into $K$ independent hyperspheres. Given a layout sample $\mathbf{x}$, we apply MC-SVM to evaluate its novelty score based on the distance between the sample feature $\mathbf{F}_{\mathrm{SA}}(\mathbf{x})$ and each hypersphere center $\mathbf{a}_k$, where $\mathbf{F}_{\mathrm{SA}}(\cdot)$ denotes the attention-guided feature embedding formulated in (6). If the minimal sample-to-center distance exceeds a threshold, then the unseen sample is classified as a novelty.

First, in the $K$-means clustering step of our MC-SVM-based novelty detection, given a training dataset and its set of latent features $\{\mathbf{z}_i\}$ with $\mathbf{z}_i = \mathbf{F}_{\mathrm{SA}}(\mathbf{x}_i)$, we iteratively group all $\mathbf{z}_i$ into $K$ clusters in the feature space and find the cluster centers by solving the following optimization problem:

$$\min_{\mathcal{C}} \sum_{k=1}^{K} \sum_{\mathbf{z}_i \in \mathcal{C}_k} \|\mathbf{z}_i - \mathbf{c}_k\|^2, \qquad (7)$$

where $\mathcal{C} = \{\mathcal{C}_1, \ldots, \mathcal{C}_K\}$ with $\mathcal{C}_k$ denoting the $k$-th cluster, and $\mathbf{c}_k$ is the cluster center of $\mathcal{C}_k$.

Then, the second step of MC-SVM is to map the $K$ clusters into $K$ individual hyperspheres. In this way, the novelty of a test layout pattern can be verified by checking if its mapped feature is far away from all hyperspheres. This hypersphere mapping is similar to the SVDD [16] and OC-SVM [14] algorithms. Specifically, for each cluster $\mathcal{C}_k$, all latent features $\mathbf{z}_i \in \mathcal{C}_k$ are mapped to a hypersphere centered at $\mathbf{a}_k$ by solving the following problem:

$$\min_{R_k,\,\mathbf{a}_k,\,\xi_i}\; R_k^2 + \frac{1}{\nu N_k}\sum_{i}\xi_i \qquad (8)$$
$$\text{subject to}\quad \big\|\Phi(\mathbf{z}_i) - \mathbf{a}_k\big\|^2 \le R_k^2 + \xi_i, \quad \xi_i \ge 0, \quad \forall\, \mathbf{z}_i \in \mathcal{C}_k,$$

where $N_k$ denotes the number of samples in $\mathcal{C}_k$, $\xi_i$ is a slack variable used as a penalty to control the soft boundary and the hypersphere volume with an outlier tolerance value $\nu$, $\Phi(\cdot)$ denotes the kernel function for the mapping, and $R_k$ is the radius of the $k$-th hypersphere. Numerical methods for solving this optimization problem can be found in [16, 5].

As a result, we can define the local novelty score of a newly-designed layout $\mathbf{x}$ as the minimal distance from its mapped latent code to the nearest hypersphere center:

$$s_{\mathrm{local}}(\mathbf{x}) = \min_{k} \big\|\Phi\big(\mathbf{F}_{\mathrm{SA}}(\mathbf{x})\big) - \mathbf{a}_k\big\|. \qquad (9)$$

This local novelty score is evaluated based on the SA-LithoNet latent code. Because LithoNet is a layout-to-SEM pre-simulation model that learns to represent local circuit shape deformations due to a fabrication process, a large $s_{\mathrm{local}}(\mathbf{x})$ implies that a layout sample's SA-LithoNet latent code tends to be out-of-distribution, and that the pattern may not be predicted well by the current SA-LithoNet model. $s_{\mathrm{local}}$ can thus well serve the purpose of local layout novelty scoring.
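The sketch below outlines the MC-SVM local scorer with scikit-learn, where OneClassSVM stands in for the SVDD hypersphere mapping of (8); the number of clusters, the kernel, and $\nu$ are assumed values, so the returned score is a signed-margin surrogate for the distance in (9) rather than the exact quantity.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import OneClassSVM

class MCSVM:
    """Sketch of the MC-SVM local novelty scorer (Sec. III-E).

    Training features are clustered with k-means (Eq. (7)), and a one-class
    SVM is fitted per cluster; scikit-learn's OneClassSVM replaces the SVDD
    formulation of Eq. (8) in this sketch.  K, nu, and gamma are assumptions.
    """
    def __init__(self, n_clusters=5, nu=0.1):
        self.kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
        self.svms = [OneClassSVM(kernel="rbf", nu=nu, gamma="scale")
                     for _ in range(n_clusters)]

    def fit(self, Z):                      # Z: (N, D) SA-LithoNet latent codes
        labels = self.kmeans.fit_predict(Z)
        for k, svm in enumerate(self.svms):
            svm.fit(Z[labels == k])
        return self

    def score(self, Z):
        # Larger score = farther from every regular-sample region = more novel.
        margins = np.stack([svm.decision_function(Z) for svm in self.svms], axis=1)
        return -margins.max(axis=1)
```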

III-F Autoencoder-based Global Novelty Score

Since the SA-LithoNet latent code is mainly for representing fabrication-induced local shape deformations, to better capture novel layout patterns, we propose to add another complementary global feature, extracted by an autoencoder, to characterize layout patterns' global shape structures.

Typically, supervised by the MSE (mean-squared-error) reconstruction loss, an autoencoder learns to embed its input into a lower-dimensional latent code, based on which the autoencoder can reconstruct an image close to its input. Therefore, with the aid of the MSE loss, an autoencoder can capture the global structural characteristics of an image well. The reconstruction error between a newly-designed layout and its reconstructed version yielded by an autoencoder trained on a training dataset can thus be used to define a novelty score indicating the degree of global structural dissimilarity between the input layout and the training dataset. As a result, this global novelty score is defined as

$$s_{\mathrm{global}}(\mathbf{x}) = \|\mathbf{x} - \hat{\mathbf{x}}\|_2^2, \qquad (10)$$

where $\hat{\mathbf{x}}$ is the reconstructed version of $\mathbf{x}$ yielded by the autoencoder.
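A PyTorch sketch of the autoencoder-based global scorer is given below; it follows the Conv-BN-ReLU / Upsample-Conv layer pattern of Table I (Sec. V-A), but the channel widths, strides, kernel sizes, and the single input channel are assumptions.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):          # Conv-BN-ReLU encoder block (Table I pattern)
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

def up_block(cin, cout, act):       # Upsample + Conv-BN-activation decoder block
    return nn.Sequential(nn.Upsample(scale_factor=2),
                         nn.Conv2d(cin, cout, 3, padding=1),
                         nn.BatchNorm2d(cout), act)

class LayoutAE(nn.Module):
    """Autoencoder sketch with assumed channel widths (32-64-128-256)."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(conv_block(1, 32), conv_block(32, 64),
                                 conv_block(64, 128), conv_block(128, 256))
        self.dec = nn.Sequential(up_block(256, 128, nn.LeakyReLU(0.2)),
                                 up_block(128, 64, nn.LeakyReLU(0.2)),
                                 up_block(64, 32, nn.LeakyReLU(0.2)),
                                 up_block(32, 1, nn.Sigmoid()))

    def forward(self, x):
        return self.dec(self.enc(x))

def global_score(model, layout):
    """Eq. (10): per-sample reconstruction error as the global novelty score."""
    with torch.no_grad():
        recon = model(layout)
    return torch.mean((layout - recon) ** 2, dim=(1, 2, 3))
```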

IV Graph Sampling for Active Learning

After identifying novelties in a given pool of newly-designed layouts, we can fabricate the novel layout patterns on wafers and then collect their layout-SEM pairs to update the layout-to-SEM model (e.g., LithoNet). However, since both fabricating ICs and taking SEM images are costly, given a limited cost budget, we usually can only sample a small set of the most representative patterns from the detected novelties for further fabrication. To this end, we propose two sampling strategies: one-time sampling and incremental sampling. Each strategy starts from building an initial undirected $k$-NN graph, composed of the novel layout designs as the graph nodes, by employing the latent codes embedded by the pretrained autoencoder. Then, based on the node degrees of the initial graph, we further construct a dense graph $\mathcal{G}_d$ and a sparse graph $\mathcal{G}_s$. Finally, we rank the priority of each node via a random-walk method, whose node visiting probability is determined based on the latent code extracted by SA-LithoNet, to select the most representative nodes accordingly.

IV-A One-time Sampling

The one-time sampling (OTS) algorithm aims to select the most representative layout clips from a given set of novel layout clips in only one sampling iteration. It primarily consists of two phases: i) data graph construction and ii) sampling by ranking. Its pseudo code is shown in Algorithm 1.

Step-1: Data graph construction
This step first estimates the data manifold, in which the layout patterns lie, by building an initial $k$-NN graph based on the latent codes extracted by the autoencoder. The resulting $k$-NN graph is a directed graph, where each node has a fixed out-degree of $k$ but a variable in-degree, and a directed edge from node $v_i$ to node $v_j$ represents that $v_j$ is a $k$-nearest neighbor of $v_i$ in terms of the feature distance between $v_i$ and $v_j$. As a result, in order to obtain an undirected graph $\mathcal{G}$ specifying the distribution of layout patterns, the adjacency matrix $\mathbf{A}$ of the data graph $\mathcal{G}$ is obtained by symmetrizing $\mathbf{A}_0$, the adjacency matrix of the initial $k$-NN graph.

On top of $\mathcal{G}$, which characterizes the data manifold of the novel layout clips, we further separate all nodes (layouts) in $\mathcal{G}$ into two groups based on each node's degree (i.e., the total number of edges connecting the node to the others) and construct one dense graph $\mathcal{G}_d$ and one sparse graph $\mathcal{G}_s$ accordingly. We set a threshold value $\tau_d$ for node separation based on $\mu_d$ and $\sigma_d$, denoting respectively the mean and standard deviation of the degrees of all nodes in $\mathcal{G}$. Therefore, the nodes with a degree larger than $\tau_d$ are those lying in a region of $\mathcal{G}$ densely populated with similar layouts, and these nodes are used to constitute the dense graph $\mathcal{G}_d$. On the contrary, those nodes in $\mathcal{G}$ with a degree smaller than $\tau_d$ are used to constitute the sparse graph $\mathcal{G}_s$, where each node represents a layout clip far away from other designs in the feature space. Note that both $\mathcal{G}_d$ and $\mathcal{G}_s$ are undirected graphs derived from $\mathcal{G}$.
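A short sketch of this graph-construction step is shown below; the symmetrization by an element-wise maximum and the use of the mean degree as the separation threshold are assumptions standing in for the paper's exact definitions.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def build_graphs(Z, k=5):
    """Step-1 sketch: build the undirected k-NN data graph from autoencoder
    codes Z (N, D), then split nodes into dense/sparse subgraphs by degree."""
    A0 = kneighbors_graph(Z, n_neighbors=k, mode="connectivity").toarray()
    A = np.maximum(A0, A0.T)               # assumed symmetrization (logical OR)
    degree = A.sum(axis=1)
    thr = degree.mean()                    # assumed node-separation threshold
    dense_nodes = np.where(degree > thr)[0]
    sparse_nodes = np.where(degree <= thr)[0]
    A_dense = A[np.ix_(dense_nodes, dense_nodes)]
    A_sparse = A[np.ix_(sparse_nodes, sparse_nodes)]
    return (A_dense, dense_nodes), (A_sparse, sparse_nodes)
```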

Step-2: Sampling priority ranking
Because $\mathcal{G}_d$ and $\mathcal{G}_s$ contain layout clips belonging to two different kinds of distributions, the ways to rank the sampling priorities of nodes in each graph ought to be different. Therefore, we devise i) two different schemes for determining starting seeds, and ii) two different weight functions for assessing the random-walk probability, for $\mathcal{G}_d$ and $\mathcal{G}_s$ respectively, to trigger our random-walk-based graph exploration algorithm. Then, after exploring a given graph thoroughly, the sampling priorities of the nodes in the graph are ranked by their total numbers of visits.

The starting seeds for $\mathcal{G}_s$ and $\mathcal{G}_d$ are determined by using the closeness centrality and the eigen-centrality, respectively. This design comes from two reasons. First, because $\mathcal{G}_s$ consists of nodes (layouts) which are far from each other in the feature space, a node with a large closeness centrality, i.e., a small mean distance from itself to other nodes, should be representative. Second, since the nodes with higher eigen-centrality (aka eigenvector-centrality) values in a graph make higher impacts on other nodes, as they are connected to nodes with higher eigen-centrality values [20], they should be sampled with higher priorities. The eigen-centrality of the nodes on $\mathcal{G}_d$ is defined by

$$\mathbf{x}_e = \frac{1}{\lambda_{\max}}\,\mathbf{A}_d\,\mathbf{x}_e, \qquad (11)$$

where $\mathbf{x}_e$ is the eigenvector recording the eigen-centrality, and $\lambda_{\max}$ is the largest eigenvalue of $\mathbf{A}_d$, the adjacency matrix of $\mathcal{G}_d$. Also, the closeness of node $v_i$ in $\mathcal{G}_s$ is evaluated by

$$C_{\mathrm{cl}}(v_i) = \frac{N - 1}{\sum_{v_j \in \mathcal{N}(v_i)} d_g(v_i, v_j)}, \qquad (12)$$

where $d_g(v_i, v_j)$ is the geodesic distance, i.e., the length of the shortest path on the graph, between nodes $v_i$ and $v_j$, $\mathcal{N}(v_i)$ denotes the neighborhood of $v_i$, and $N$ denotes the number of nodes in the graph.
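For illustration, the seed-selection step can be sketched with networkx as follows; the library's built-in centrality routines stand in for (11) and (12), and the number of seeds is a placeholder.

```python
import networkx as nx

def pick_seeds(A_dense, A_sparse, n_seeds=5):
    """Seed selection sketch: eigen-centrality on the dense graph and
    closeness centrality on the sparse graph; n_seeds is a placeholder."""
    Gd = nx.from_numpy_array(A_dense)
    Gs = nx.from_numpy_array(A_sparse)
    eig = nx.eigenvector_centrality_numpy(Gd)   # stands in for Eq. (11)
    clo = nx.closeness_centrality(Gs)           # stands in for Eq. (12)
    dense_seeds = sorted(eig, key=eig.get, reverse=True)[:n_seeds]
    sparse_seeds = sorted(clo, key=clo.get, reverse=True)[:n_seeds]
    return dense_seeds, sparse_seeds
```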

Next, the graph exploration algorithms for the dense graph $\mathcal{G}_d$ and the sparse graph $\mathcal{G}_s$ are designed based on breadth-first search (BFS) and depth-first search (DFS), respectively [20]. This design is based on the properties that i) BFS can avoid visiting a node twice in one exploration, and ii) DFS can explore a graph as far as possible along a branch before backtracking. Therefore, given a collection of starting nodes, we accomplish the graph exploration by assessing each node's random-walk probability, designed for the BFS or DFS purpose.

The random-walk probability of visiting a node $v_j$ from its adjacent node $v_i$ is defined as

$$p(v_j \mid v_i) = \frac{w(v_i, v_j)}{\sum_{v_k \in \mathcal{N}(v_i)} w(v_i, v_k)}, \qquad (13)$$

where $w(v_i, v_j)$ is the visiting weight from $v_i$ to $v_j$; the visiting weight $w_d(v_i, v_j)$ for the dense graph is obtained in (14), and the weight $w_s(v_i, v_j)$ for the sparse graph is defined in (15):

(14)

and

(15)

where $\mathcal{N}_1(v_i) \cap \mathcal{N}_1(v_j)$ denotes the intersection of the one-ring neighborhoods of $v_i$ and $v_j$ (the one-ring neighborhood of a node is the set of all nodes connected with it by an edge [18]), $\deg(v_j)$ is the degree of $v_j$, and a min-max scaling function maps an input value into $[0, 1]$. Moreover, $\Delta_{\mathrm{sim}}(v_i, v_j)$ is the similarity score defined as the difference between i) the cosine similarity between nodes $v_i$ and $v_j$ and ii) the expected cosine similarity between any two nodes in the graph, that is,

$$\Delta_{\mathrm{sim}}(v_i, v_j) = \mathrm{sim}(v_i, v_j) - \frac{1}{\binom{N}{2}} \sum_{v_m \ne v_n} \mathrm{sim}(v_m, v_n), \qquad (16)$$

where $\binom{N}{2}$ denotes the number of 2-combinations of the $N$ nodes in the graph, and $\mathrm{sim}(v_i, v_j)$ is the cosine similarity between the latent features extracted by SA-LithoNet as follows:

$$\mathrm{sim}(v_i, v_j) = \frac{\langle \mathbf{f}_i, \mathbf{f}_j \rangle}{\|\mathbf{f}_i\|\,\|\mathbf{f}_j\|}. \qquad (17)$$

Concisely, $w_s$ encourages visiting an adjacent node $v_j$ with a feature distinct from $v_i$'s for performing DFS on the sparse graph, whereas $w_d$ gives a larger weight to a $v_j$ with a feature similar to $v_i$'s for performing BFS.
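The random-walk exploration of (13) can be sketched as below; because the full weight formulas (14) and (15) are not reproduced above, the visiting weights here use only the cosine-similarity term of (16)-(17) and are therefore a simplification, not the paper's exact weights.

```python
import numpy as np

def cos_sim(f_i, f_j):
    """Eq. (17): cosine similarity between SA-LithoNet latent features."""
    return float(np.dot(f_i, f_j) /
                 (np.linalg.norm(f_i) * np.linalg.norm(f_j) + 1e-12))

def random_walk(A, feats, seed, steps=20, sparse=True, rng=None):
    """One random walk for the OTS exploration (Eq. (13)).

    A     : (N, N) undirected adjacency matrix of the dense or sparse graph.
    feats : (N, D) SA-LithoNet latent features of the nodes.
    sparse: True for DFS-like behaviour (prefer dissimilar neighbours),
            False for BFS-like behaviour (prefer similar neighbours).
    """
    rng = rng or np.random.default_rng(0)
    visits = np.zeros(A.shape[0], dtype=int)
    cur = seed
    for _ in range(steps):
        nbrs = np.flatnonzero(A[cur])
        if nbrs.size == 0:
            break
        sims = np.array([cos_sim(feats[cur], feats[j]) for j in nbrs])
        w = (1.0 - sims) if sparse else (1.0 + sims)   # simplified weights
        w = np.clip(w, 1e-6, None)
        cur = rng.choice(nbrs, p=w / w.sum())           # Eq. (13)
        visits[cur] += 1
    return visits
```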

1: Graph $\mathcal{G} = (\mathbf{A}, \mathcal{V})$, where $\mathbf{A}$ and $\mathcal{V}$ are respectively the adjacency matrix and node set of $\mathcal{G}$; required number of seed samples $n$; number of epochs $T$.
2: Sampling set $\mathcal{S} \leftarrow \emptyset$;
3: Evaluate an initial seed score for each node in $\mathcal{V}$ based on (11) or (12);
4: Take the nodes with the Top-$n$ largest initial scores to form the starting-seed set $\mathcal{V}_0$;
5: for all the $i$-th node $v_i \in \mathcal{V}_0$ do
6:     for epoch $= 1, \ldots, T$ do
7:         $v \leftarrow v_i$;
8:         for each random-walk step do
9:              Update all visiting weights $w(v, \cdot)$ via (14) or (15);
10:             Visit an adjacent node $v_j$ randomly based on (13);
11:             $v \leftarrow v_j$;
12:         end for
13:     end for
14:     For each node in $\mathcal{G}$, total the number of visits;
15:     Select the nodes with the Top-$n$ visits to form $\mathcal{S}$;
16: end for
17: return $\mathcal{S}$;
Algorithm 1 One-Time Sampling

IV-B Incremental Sampling

Unlike the one-time sampling strategy, we further devise an incremental sampling method to split the total resource budget of fabricating unseen layout patterns and taking the corresponding SEM images into a few smaller fine-tuning datasets. In this way, we iteratively update a pretrained pre-simulation model to extend its generalization ability. To this end, the fine-tuning dataset selected in the $t$-th iteration should be able to best update the pre-simulation model fine-tuned on the $(t-1)$-th fine-tuning dataset.

The proposed incremental sampling method, taking the aforementioned one-time sampling method as its backbone, is an iterative routine with a stop-criterion function measuring the difference between the knowledge learned in two successive iterations under a resource budget. As depicted in Algorithm 2, the main idea of our incremental sampling method is to re-rank the sampling priorities of the unselected samples in the unseen-pattern pool after each sampling iteration with the aid of a meticulously-designed node attribute, the Informativeness-Score.

Informativeness-Score: Assuming each sample carries a certain amount of information, say, an information volume [6], the Informativeness-Score (I-Score) aims to assess the information volume carried by a selected sample in the feature space. Since the information volume covered by a frequently-visited node may usually be shared by its neighboring nodes, to avoid acquiring redundant information, the sampling priority of a frequently-visited node should be lower, and vice versa. Therefore, we first take the selected unseen samples as the starting nodes, then evaluate the tendency of individual unselected nodes being visited by random walk, and finally evaluate the I-Score based on the tendency values. Algorithm 3 shows the pseudo-code for evaluating the I-Score. Note that i) the I-Score is a vector whose $i$-th entry denotes the I-Score of the $i$-th node, and ii) in Algorithm 3, $\mathbf{s}^{(t)}$ denotes the vector whose entries record the cumulative I-Scores of individual nodes in the $t$-th iteration.

Budget: We exploit a variable $B$, denoting the budget, to bound the maximal total visiting distance. This parameter is used to model the maximal information volume a starting node possesses, so $B$ is set in accordance with the $k$ used to construct our $k$-NN graph. This design enables Algorithm 3 to visit at least $k$ nodes while evaluating the I-Score.

Step-Cost: We evaluate the cost per move from a node $v_i$ to its neighbor $v_j$ based on the distance between their autoencoder features and the ratio of the graph densities of the two nodes as follows:

(18)

where

(19)

and

(20)

where $\rho(\cdot)$ denotes the graph density [8] measuring how close on average a node approaches its neighbors in the feature space spanned by the autoencoder latent codes, and the density-ratio term prevents the same dense region from being selected redundantly by subtracting the weight of the destination node by the weight of the starting node of the current step.

Tendency Weight: The tendency weight used to derive the random-walk probability of the $t$-th sampling iteration is given by

(21)

and

(22)

where

(23)

Note that these two equations are similar to (14) and (15) but use a different factor to balance the influence brought by the local neighborhood on the visited node.

Stop Criterion: The stop criterion aims to check whether the selected nodes represent the data graph well. This criterion implies that i) all samples on the graph can be equally visited by random walk, and ii) an additional batch of sampling cannot increase the normalized cumulative I-Score of each node. Hence, the stop criterion is defined as follows:

(24)

where $r_t$ is the ratio of the number of selected samples after the $t$-th iteration to the number of total samples, and $\bar{\mathbf{s}}^{(t)}$ is the normalized cumulative I-Score.

1: Input graph $\mathcal{G}$; number of epochs $T$; number of to-be-selected samples per batch $n_b$;
2: Sampling set $\mathcal{S}$;
3: $\mathcal{S} \leftarrow \emptyset$; $t \leftarrow 0$;
4: $\mathcal{S}$ = OneTimeSampling($\mathcal{G}$, $n_b$, $T$);
5: Update the stop criterion via (24);
6: while the stop criterion (24) is not satisfied do
7:     Update the I-Score: $\mathbf{s}^{(t)}$ = Informativeness-Score($\mathcal{G}$, $\mathcal{S}$, $T$);
8:     for each starting node $v_i \in \mathcal{S}$ do
9:         $v \leftarrow v_i$;
10:        for epoch $= 1, \ldots, T$ do
11:            Update the tendency weights via (21) or (22);
12:            Visit an adjacent node $v_j$ randomly based on (13);
13:            $v \leftarrow v_j$;
14:        end for
15:        For each node in $\mathcal{G}$, total the number of visits;
16:    end for
17:    $\mathcal{S}_t \leftarrow$ the nodes with the Top-$n_b$ visits in $\mathcal{G} \setminus \mathcal{S}$;
18:    $\mathcal{S} \leftarrow \mathcal{S} \cup \mathcal{S}_t$;
19:    $t \leftarrow t + 1$;
20:    Update the stop criterion via (24);
21: end while
22: return $\mathcal{S}$;
Algorithm 2 Incremental Sampling
1: The input graph $\mathcal{G}$; sampling set $\mathcal{S}$; number of epochs $T$;
2: A global constant $k$ used for constructing the $k$-NN graph;
3: I-Score vector $\mathbf{s}^{(t)}$ of the nodes in $\mathcal{G}$ after the $t$-th iteration.
4: for all nodes $v_i$ in $\mathcal{S}$ do
5:     $v \leftarrow v_i$;
6:     $c \leftarrow 0$;
7:     while $c < B$ do
8:         for epoch $= 1, \ldots, T$ do
9:              Update the tendency weights via (21) or (22);
10:             Visit an adjacent node $v_j$ randomly based on (13);
11:             Evaluate the step cost via (18);
12:             $c \leftarrow c\,+$ the step cost;
13:             $v \leftarrow v_j$;
14:         end for
15:     end while
16: end for
17: Total the number of visits, i.e., the I-Score, to each node in $\mathcal{G} \setminus \mathcal{S}$;
18: return $\mathbf{s}^{(t)}$
Algorithm 3 Informativeness-Score

V Experimental Results

V-A Dataset and Network Configuration

Two datasets are used in our experiments. Both datasets comprise pair-wise image samples, each consisting of a layout pattern and a corresponding binarized SEM image. Dataset-1 is used as the seen data, i.e., the training set, in which some of the SEM images have "enclosure" patterns and the others have "bridge" patterns. Meanwhile, Dataset-2, the blind testing set, also contains image pairs involving "enclosure" patterns and "bridge" patterns. Some examples of enclosure and bridge patterns are illustrated in Fig. 4. With this setting, we assume the enclosure patterns in Dataset-1 to be regular ones, while the bridge patterns tend to be novelties. Consequently, a successful novelty detection scheme should rate the bridge patterns in Dataset-2 with higher Glocal novelty scores.

(a) Expectation (b) Enclosure (c) Bridge
Fig. 4: Illustration of the "enclosure" and "bridge" defect patterns in our datasets. (a) An expected defect-free SEM reference contour. (b) Enclosure pattern: enclosure means that the metal line fails to enclose the via due to contour shrinking after fabrication. (c) Bridge pattern: bridge means that an unexpected connection occurs between two metal lines.
Encoder: Input → Conv-BN-ReLU → Conv-BN-ReLU → Conv-BN-ReLU → Conv-BN-ReLU
Decoder: Upsample → Conv-BN-LReLU → Upsample → Conv-BN-LReLU → Upsample → Conv-BN-LReLU → Upsample → Conv-BN-Sigmoid
TABLE I: Architecture of the autoencoder used in our method

Both the auto-encoder and SA-LithoNet described in Fig. 2 are pretrained on Dataset-1. For SA-LithoNet, we adopt the same LithoNet architecture and train it with the same settings used in [30]. Table I shows the architecture of our auto-encoder, which is trained with the mean-squared-error (MSE) loss.

We conduct two experiment sets to verify the effectiveness of our method. The first set validates whether our novelty detection scheme can accurately identify novel layout patterns, and the second evaluates the effectiveness of our sampling methods in selecting representative novel patterns for updating a pretrained pre-simulation model like LithoNet.

V-B Layout Novelty Detection

In order to show the effectiveness of our layout novelty detection algorithm, we first verify the stability and robustness of our supervised MIGNA method, and then use the MIGNA results as the golden references to evaluate the accuracy of our global-local (glocal) layout novelty scoring.

We use the AUC (Area Under the Curve) score of the ROC (Receiver Operating Characteristic) curve as the objective evaluation metric: the higher the AUC score, the more accurate the predictions. Table II compares the AUC scores of the detection results obtained with different novelty detection methods, based on the MIGNA-annotated references (see Sec. III-B) listed in the left three columns. Here, $N_{\mathrm{th}}$ denotes the threshold number of anomaly patches for assessing layout novelty, and the number of layouts classified as novelties in Dataset-2 decreases as $N_{\mathrm{th}}$ increases. The proposed SA-Glocal novelty scoring outperforms the SA-LithoNet-based local scoring and the autoencoder-based global scoring for all settings.

                 MIGNA annotations  |  AUC scores
$N_{\mathrm{th}}$  # normal  # novel  |  SA-Litho (Local)  Autoencoder (Global)  Ours (Glocal)
3   299  701  |  0.825  0.684  0.862
4   383  617  |  0.805  0.737  0.861
5   449  551  |  0.744  0.749  0.846
6   559  441  |  0.683  0.683  0.756
7   655  345  |  0.624  0.609  0.676
TABLE II: Comparison of the AUC scores of different novelty scoring methods under different settings of $N_{\mathrm{th}}$ for assessing a novel layout, where the best results are indicated in bold
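For reference, the AUC scores in Tables II and III can be computed as in the following sketch, where the MIGNA annotations serve as the binary references and the Glocal scores as the ranking statistic; the arrays shown are illustrative placeholders, not the paper's data.

```python
from sklearn.metrics import roc_auc_score

# Illustrative placeholders: 1 = novelty, 0 = regular, per the MIGNA labels,
# and the corresponding SA-Glocal scores of the same test layouts.
migna_labels = [1, 0, 1, 1, 0, 0]
glocal_scores = [2.3, -0.4, 1.7, 0.9, 0.1, -1.2]
print("AUC:", roc_auc_score(migna_labels, glocal_scores))
```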

Fig. 5 shows the ROC performances on Dataset-2 with different novelty detection methods, including our methods (autoencoder-based, LithoNet-based, and SA-Glocal) and three state-of-the-art novelty detection approaches: LSA [1], GEOM [10], and GOAD [3]. In Fig. 5, the MIGNA-annotated labels are used as pseudo ground-truths to calculate the true positive rates (TPRs) and false positive rates (FPRs). The ROC curves demonstrate that the LithoNet-based local novelty scoring and the autoencoder-based global novelty scoring are complementary to each other: the former detects many more novelties at low FPRs, while the latter can detect almost all novelties at a higher FPR. Consequently, they can be combined to boost the performance of novelty detection, as shown by the SA-Glocal curve. Table III lists the AUC scores of the six schemes, showing that the proposed SA-Glocal novelty scoring clearly beats all the others, achieving a significantly higher AUC score of 0.932.

To validate the impacts of various novelty detection schemes on the performance of model update, we randomly select 50, 100, 150, 200, and 250 samples out of the novel patterns detected by each method, together with the original training set, to form finetune sets of different sizes, and then use them to update the LithoNet model. Fig. 6 compares the inference performances of the different fine-tuned LithoNet models, where each point on a curve corresponds to a LithoNet updated by a finetune set containing randomly-selected novel samples detected by one specific novelty detection scheme. The horizontal axis indicates the number of novel samples randomly picked into the finetune dataset. Note that we here adopt the same similarity metrics used in [30], including C2C-distance (contour-to-contour distance) [31], IOU (intersection over union), SSIM (structural similarity index measure) [39], and pixel-error-rate, to evaluate the performances of the LithoNet models updated on the various fine-tune sets. The results demonstrate that, for a pretrained LithoNet model, the novel samples detected by our SA-Glocal are significantly more informative than those detected by LSA, GEOM, and GOAD, making SA-Glocal outperform the competing methods in terms of all quality metrics for model update. Moreover, Fig. 6 also hints that SA-Glocal scoring is capable of being an active learning oracle because even a very limited amount of randomly-selected novel patterns identified by SA-Glocal can best fine-tune a pretrained LithoNet.

Table IV shows the ablation study of our novelty detection method. Here, LithoNet (OC-SVM) local scoring shows the baseline performance, i.e., an AUC value of 0.720, obtained by feeding the LithoNet latent codes of test layouts into the conventional one-class SVM outlier detector [16]. The MC-SVM-based LithoNet scoring presented in Sec. III-E improves the AUC score to 0.744. Moreover, by combining the autoencoder global feature with the LithoNet local feature (i.e., the Glocal method), the AUC score significantly increases to 0.846. This demonstrates the effectiveness and the robustness of our Glocal design. Finally, the last three rows in Table IV evidence that SA-LithoNet can further boost the representability of the latent feature, particularly making the proposed SA-Glocal method achieve the best performance: 0.932 AUC score.


Fig. 5: ROC curves on Dataset-2 with different novelty detection schemes, including the proposed SA-Glocal score, the SA-LithoNet-based local novelty score, and the autoencoder-based global novelty score, and three representative ones: LSA [1], GEOM [10], and GOAD [3].
Method AUC Score
LSA [1] 0.690
GEOM [10] 0.752
GOAD [3] 0.768
Autoencoder (Global) 0.675
LithoNet (Local) 0.744
SA-Glocal (AE + SA-LithoNet) 0.932
TABLE III: AUC scores of different layout novelty detection methods on Dataset-2, where the best and second-best results are respectively highlighted in bold and underline
Method AUC Score
Autoencoder 0.675
LithoNet (OC-SVM) 0.720
LithoNet (MC-SVM) 0.744
Glocal (AE + LithoNet) 0.846
SA-LithoNet (OC-SVM) 0.857
SA-LithoNet (MC-SVM) 0.864
SA-Glocal (AE + SA-LithoNet) 0.932
TABLE IV: Ablation study: the AUC scores of the proposed method and its variants on Dataset-2
Fig. 6: Inference performance of LithoNet fine-tuned on different finetune sets of different sizes (50, 100, 150, 200, 250, and 300 samples) randomly selected from the novel samples detected by a specific novelty detection scheme.

V-C Performance Evaluation on Active-Learning Schemes

The experiments reported herein are conducted by the following steps. First, for Dataset-2, we label those samples whose SA-Glocal score exceeds a preset threshold as novelties, and the others as regular patterns. Second, we partition Dataset-2 into two subsets: i) a finetune pool consisting of a subset of randomly picked regular image pairs and a subset of randomly picked novel pairs, and ii) a blind testing set comprising regular pairs and novel pairs. Third, we update the pretrained LithoNet individually on the finetune sets, selected from the finetune pool by different active learning strategies together with the original training set, and then evaluate the model performances on the blind testing set.

Fig. 7: Inference performances of LithoNet fine-tuned on different finetune sets sampled by various active approaches from the detected novelties. Each plot shows the curves of one specific performance metric, including C2C-distance, SSIM, Error-rate, and IOU values.
Fig. 8: Breakdowns of the inference performances of LithoNet models updated on different finetune sets, all containing about 100 samples. Each bar shows the breakdown of a specific range of C2C-distances obtained on the testing samples. The LithoNet models fine-tuned on the sets formed by our incremental sampling scheme or our one-time sampling scheme have the best generalization ability, since most testing samples result in the smallest C2C-distances through these two LithoNet models.

Fig. 7 compares the performance of various active learning strategies, where each point on a curve corresponds to a different subset of the finetune pool. We compare the proposed one-time sampling (OTS) and incremental sampling (INS) methods with random sampling and existing active/graph sampling methods, including K-center greedy (Kcenter) [28], RCMS with uncertainty sampling (RCMS) [40], Margin AL [44], informative cluster diversity (ICD) [21], and graph density [8]. The horizontal axis in Fig. 7 indicates the number of novel samples selected into the finetune set. Moreover, Fig. 8 shows the breakdowns of the C2C-distance ranges obtained on the testing dataset with different LithoNet models, each fine-tuned on an about-100-sample fine-tune set selected by a different sampling method. (The sampling amount of the proposed incremental sampling method (INS) cannot be assigned in advance and is determined at run-time; the sample count closest to 100 produced by INS is 105.) The comparison shows that the two LithoNet models, respectively fine-tuned on the two fine-tune sets selected by our proposed one-time sampling and incremental sampling schemes, are improved the most. Specifically, the C2C-distances of most testing samples are less than 0.45 pixel. In contrast, the two LithoNets fine-tuned on the sample sets selected by RCMS [40] and Graph Density (GD) [8] achieve the performances closest to those of the OTS- and INS-fine-tuned models; however, they yield far fewer test samples within the smallest C2C-distance range than our methods do, resulting in the performance differences illustrated in Fig. 7.

We can draw the following observations from Fig. 7 and Fig. 8. First, the novelties detected by the Glocal novelty scoring are beneficial for updating a pretrained LithoNet since i) the C2C-distance and error-rate decrease, and ii) the SSIM and IOU scores increase with the number of selected samples. This observation is reasonable because the benefit of adding new data points to a training dataset diminishes if these new data points are very similar to existing samples in the training dataset, as revealed in [6]. Second, while the proposed one-time sampling (OTS) method outperforms the competing methods in all aspects, the proposed incremental sampling (INS) method has the best curves and can reach the performance plateau when sampling only 105/460 of the data. This means that our novelty detection together with graph sampling can effectively accomplish the goal of active learning from novel layout clips.

VI Conclusions

In this paper, we proposed a deep learning-based layout novelty detection method that can work in the absence of ground-truth SEM images. The proposed method architecturally consists of two subnetworks, a pretrained autoencoder and a pretrained layout-to-SEM simulator. The former subnetwork learns to capture global shape structures of training (layout) samples so that it can be used to derive the autoencoder-based global novelty score. Besides, the latter subnetwork aims to extract a latent code representing the fabrication-induced local shape deformation of a given layout so that the extracted latent code can be used to evaluate an attention-guided local novelty score. These two novelty scores together form the proposed Glocal layout novelty measure. We have also proposed two graph sampling-based active-learning strategies, one-time sampling and incremental sampling, to select a much reduced set of representative layouts most worthy of further fabrication for acquiring the ground-truth SEM images, in an on-a-budget environment. Our experimental results demonstrate that the proposed method can detect novel layout patterns effectively, and the identified layout novelties can be used to improve the generalization capability of a learning-based layout-to-SEM pre-simulation model.

References

  • [1] D. Abati, A. Porrello, S. Calderara, and R. Cucchiara (2019) Latent space autoregression for novelty detection. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 481–490. Cited by: §II-B, §III-C, Fig. 5, §V-B, TABLE III.
  • [2] J. An and S. Cho (2015) Variational autoencoder based anomaly detection using reconstruction probability. Special Lecture on IE 2 (1), pp. 1–18. Cited by: §II-B.
  • [3] L. Bergman and Y. Hoshen (2020) Classification-based anomaly detection for general data. arXiv preprint arXiv:2005.02359. Cited by: §II-B, Fig. 5, §V-B, TABLE III.
  • [4] S. Calderara, U. Heinemann, A. Prati, R. Cucchiara, and N. Tishby (2011) Detecting anomalies in people’s trajectories using spectral graph analysis. Comput. Vis. Image Understand. 115 (8), pp. 1099–1111. Cited by: §II-B.
  • [5] W. Chang, C. Lee, and C. Lin (2013) A revisit to support vector data description. Dept. Comput. Sci., Nat. Taiwan Univ., Taipei, Taiwan, Tech. Rep. Cited by: §III-E.
  • [6] Y. Cui, M. Jia, T. Lin, Y. Song, and S. Belongie (2019) Class-balanced loss based on effective number of samples. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 9268–9277. Cited by: §IV-B, §V-C.
  • [7] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al. (2021) An image is worth 16x16 words: transformers for image recognition at scale. In Proc. Int. Conf. Learn. Rep., Cited by: §III-D.
  • [8] S. Ebert, M. Fritz, and B. Schiele (2012) Ralf: a reinforced active learning formulation for object class recognition. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 3626–3633. Cited by: §IV-B, §V-C.
  • [9] H. Fan, F. Zhang, and Z. Li (2020) AnomalyDAE: dual autoencoder for anomaly detection on attributed networks. In Proc. IEEE Int. Conf. Acoustics Speech Signal Process., pp. 5685–5689. Cited by: §II-B.
  • [10] I. Golan and R. El-Yaniv (2018) Deep anomaly detection using geometric transformations. arXiv preprint arXiv:1805.10917. Cited by: §II-B, Fig. 5, §V-B, TABLE III.
  • [11] N. Japkowicz, C. Myers, M. Gluck, et al. (1995) A novelty detection approach to classification. In Proc. Int. Joint Conf. Artif. Intell., Vol. 1, pp. 518–523. Cited by: §II-B.
  • [12] M. Kliger and S. Fleishman (2018) Novelty detection with gan. arXiv preprint arXiv:1802.10560. Cited by: §II-B.
  • [13] D. D. Lewis (1995) A sequential algorithm for training text classifiers: corrigendum and additional data. In ACM SIGIR Forum, Vol. 29, pp. 13–19. Cited by: §II-C.
  • [14] K. Li, H. Huang, S. Tian, and W. Xu Improving one-class svm for anomaly detection. In Proc. Int. Conf. Mach. Learn. Cybern., Vol. 5, pp. 3077–3081. Cited by: §III-E, §III-E.
  • [15] Y. Lin, M. Li, Y. Watanabe, T. Kimura, T. Matsunawa, S. Nojima, and D. Z. Pan (2018)

    Data efficient lithography modeling with transfer learning and active data selection

    .
    IEEE Trans. Comput.-Aided Design Integr. Circuits Syst. 38 (10), pp. 1900–1913. Cited by: §II-C.
  • [16] B. Liu, Y. Xiao, L. Cao, Z. Hao, and F. Deng (2013) Svdd-based outlier detection on uncertain data. Knowledge and Inf. Syst. 34 (3), pp. 597–618. Cited by: §II-B, §II-B, §III-E, §III-E, §V-B.
  • [17] M. Mathieu (2015) Masked autoencoder for distribution estimation. Cited by: §II-B.
  • [18] M. Meyer, M. Desbrun, P. Schröder, and A. Barr (2003) Discrete differential-geometry operators for triangulated 2-manifolds. In Visualization and mathematics III, H.-C. Hege and K. Polthier (Eds.), pp. 35–57. Cited by: footnote 2.
  • [19] D. Miljković (2010) Review of novelty detection methods. In Proc. Int. Convention MIPRO, pp. 593–598. Cited by: §II-B.
  • [20] M. Newman (2010) Networks: an introduction. Oxford University Press. Cited by: §IV-A, §IV-A.
  • [21] S. Paul, J. Bappy, and A. Roy-Chowdhury (2016)

    Efficient selection of informative and diverse training samples with applications in scene classification

    .
    In Proc. IEEE Inf. Conf. Image Process., pp. 494–498. Cited by: §V-C.
  • [22] P. Perera, R. Nallapati, and B. Xiang (2019) Ocgan: one-class novelty detection using gans with constrained latent representations. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 2898–2906. Cited by: §II-B.
  • [23] S. Pidhorskyi, R. Almohsen, D. A. Adjeroh, and G. Doretto (2018) Generative probabilistic novelty detection with adversarial autoencoders. arXiv preprint arXiv:1807.02588. Cited by: §II-B, §III-C.
  • [24] M. Pimentel, D. Clifton, L. Clifton, and L. Tarassenko (2014) A review of novelty detection. Signal Process. 99, pp. 215–249. Cited by: §I.
  • [25] M. Sabokrou, M. Khalooei, M. Fathy, and E. Adeli (2018) Adversarially learned one-class classifier for novelty detection. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 3379–3388. Cited by: §II-B.
  • [26] M. Salehi, H. Mirzaei, D. Hendrycks, Y. Li, M. Rohban, and M. Sabokrou (2021) A unified survey on anomaly, novelty, open-set, and out-of-distribution detection: solutions and future challenges. arXiv preprint arXiv:2110.14051. Cited by: §I.
  • [27] T. Schlegl, P. Seeböck, S. M. Waldstein, U. Schmidt-Erfurth, and G. Langs (2017)

    Unsupervised anomaly detection with generative adversarial networks to guide marker discovery

    .
    In Proc. Int. Conf. Inf. Process. Med. Imag., pp. 146–157. Cited by: §II-B.
  • [28] O. Sener and S. Savarese (2018) Active learning for convolutional neural networks: a core-set approach. In Proc. Int. Conf. Learn. Rep., Cited by: §V-C.
  • [29] B. Settles (2009) Active learning literature survey. Cited by: §II-C.
  • [30] H. Shao, C. Peng, J. Wu, C. Lin, S. Fang, P. Tsai, and Y. Liu (2021-05) From ic layout to die photo: A CNN-based data-driven approach. IEEE Trans. Comput.-Aided Design Integr. Circuits Syst. 40 (5), pp. 957–970. Cited by: §I, §I, §I, §II-A, §III-A, §III-A, §III-B, §III-B, §III-B, §III-C, §III-D, §III-D, §V-A, §V-B.
  • [31] H. Shao Contour-to-contour distance. Note: https://www.mathworks.com/matlabcentral/fileexchange/75551-contour-to-contour-distance Cited by: §V-B.
  • [32] Y. L. Sung, S. Hsieh, S. Pei, and C. Lu (2019) Difference-seeking generative adversarial network–unseen sample generation. In Proc. Int. Conf. Learn. Rep., Cited by: §II-B.
  • [33] J. Tack, S. Mo, J. Jeong, and J. Shin (2020) Csi: novelty detection via contrastive learning on distributionally shifted instances. arXiv preprint arXiv:2007.08176. Cited by: §II-B.
  • [34] G. R. Terrell and D. W. Scott (1992) Variable kernel density estimation. Annals of Statistics, pp. 1236–1265. Cited by: §II-B, §II-B.
  • [35] S. Tong and D. Koller (2001) Support vector machine active learning with applications to text classification. J. Mach. Learn. Res. 2 (Nov), pp. 45–66. Cited by: §II-C.
  • [36] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention is all you need. In Adv. Neural Inf. Process. Syst., pp. 5998–6008. Cited by: §III-D.
  • [37] X. Wang, R. Girshick, A. Gupta, and K. He (2018) Non-local neural networks. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 7794–7803. Cited by: §III-D.
  • [38] X. Wang, B. Jin, Y. Du, P. Cui, and Y. Yang (2020) One-class graph neural networks for anomaly detection in attributed networks. arXiv preprint arXiv:2002.09594. Cited by: §II-B.
  • [39] Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, et al. (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13 (4), pp. 600–612. Cited by: §V-B.
  • [40] Z. Xu, K. Yu, V. Tresp, X. Xu, and J. Wang (2003) Representative sampling for text classification using support vector machines. In Proc. European Conf. Inf. Retr., pp. 393–407. Cited by: §V-C.
  • [41] H. Yang, S. Li, Z. Deng, Y. Ma, B. Yu, and E. F. Young (2019) GAN-OPC: mask optimization with lithography-guided generative adversarial nets. IEEE Trans. Comput.-Aided Design Integr. Circuits Syst. 39 (10), pp. 2822–2834. Cited by: §I, §II-A.
  • [42] W. Ye, M. B. Alawieh, Y. Lin, and D. Z. Pan (2019) LithoGAN: end-to-end lithography modeling with generative adversarial networks. In Proc. ACM/IEEE Design Autom. Conf., pp. 1–6. Cited by: §I, §II-A.
  • [43] H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena (2019) Self-attention generative adversarial networks. In Proc. Int. Conf. Mach. Learn., pp. 7354–7363. Cited by: §III-D.
  • [44] J. Zhou and S. Sun (2014) Improved margin sampling for active learning. In Proc. Chinese Conf. Pattern Recognit., pp. 120–129. Cited by: §V-C.
  • [45] Z. Zhou, J. Shin, L. Zhang, S. Gurudu, M. Gotway, and J. Liang (2017) Fine-tuning convolutional neural networks for biomedical image analysis: actively and incrementally. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 7340–7351. Cited by: §III-A.
  • [46] C. Zhuo, K. Agarwal, D. Blaauw, and D. Sylvester (2010) Active learning framework for post-silicon variation extraction and test cost reduction. In Proc. IEEE/ACM Int. Conf. Comput.-Aided Design, pp. 508–515. Cited by: §II-C.