I Introduction
After an integrated circuit (IC) is designed and laid out, fabricating the IC wafer requires a multi-step sequence of photolithographic and chemical processing steps. Because of the great exposure variations in the lithography procedure and the chemical reactions in the etching procedure during fabrication, the lithography and etching procedures together result in nonlinear shape deformation of a designed IC pattern, which is usually too complicated to model analytically. This fact has spurred the development of deep learning-based pre-simulation models, such as GAN-OPC [41], LithoGAN [42], and LithoNet-OPCNet [30], to handle issues like i) lithography simulation for predicting the shapes of fabricated circuits based on a given IC layout along with IC fabrication parameters, and ii) mask optimization for predicting the best mask to compensate for fabrication-induced shape deformations. However, deep learning-based models and their training/updating processes usually rely on tremendous amounts of training data. The selection of training data consequently becomes a considerable issue because whether a training dataset or a fine-tuning dataset (with novel patterns) is informative largely affects the generalization ability of a learning-based model.

However, how to collect an appropriate training dataset and sample a fine-tuning dataset, also known as a development set, from the IC layout design and fabrication processes is practically a very complicated issue. This complexity stems from two aspects. First, given a deep learning model pretrained on an initial training dataset, the model may still need to be fine-tuned on another development set so that it can generalize to samples unseen in the initial training dataset. In order to determine a proper development dataset, one has to assess a layout's degree of novelty by checking whether this layout's SEM (Scanning Electron Microscope) image shape can be accurately predicted by the pretrained pre-simulation model, even in the absence of this layout's ground-truth SEM image. Second, it is unaffordable to exhaustively collect the aerial images of each layout under different fabrication parameter settings as training ground-truths because of high costs. For example, LithoNet [30] learns the layout-to-SEM contour correspondence and the effects of fabrication parameters from a collection of layout-SEM image pairs. If LithoNet needs to learn how layout patterns deform under different fabrication parameter settings, one has to fabricate IC circuits to obtain a comprehensive training set covering all combinations of to-be-learned conditions. Such fabrication processes required to collect training data are too time-consuming and costly. Consequently, these two aspects create the demand for novelty detection and active learning in both the maintenance of a learning-based pre-simulation model and the selection of a development set, as illustrated in Fig. 1.
To address these issues, we propose an SEM-free (i.e., ground-truth-free) scheme to detect novel layout patterns, whose SEM images are worth collecting via the costly IC fabrication process, for informatively updating a pretrained DNN-based lithography simulator. That is, given i) a deep-learning-based pre-simulation model, e.g., LithoNet [30], and ii) a pool of newly-designed IC layout clips, the proposed method aims to identify layout patterns that are novel and informative enough to increase the generalization ability of the pretrained model. This scenario leads to two considerations. First, the method should be able to identify novel (unseen) layout designs whose fabrication-induced shape deformation cannot be well predicted by the model pretrained on an initial training set of layout-SEM image pairs. The difficulty lying behind the first consideration is that, during deployment, all inputs are layout patterns, and therefore our method needs to detect layout novelty by learning the relationship between layouts and their predicted layout-to-SEM deformation maps. Second, due to the high costs of IC fabrication and of taking SEM images, the method should be able to select a reduced set of the most informative layouts to update the pretrained model for the sake of budget efficiency. As a result, only the selected set of layouts will be fabricated to acquire their layout-SEM pairs for fine-tuning the pretrained model. This consideration also hints at the requirement of a sampling process for active learning. Because the solution to the second consideration, i.e., the capability of selecting data samples that can optimally represent the entire training data domain, highly depends on that to the first, we take both considerations into account to propose novel methods for novelty detection and active learning. This work makes the following major contributions.
Our novelty detection method is the first to learn global-local features for identifying which layouts are novel and worthy of further fabrication, in contrast to existing approaches that detect novelties based on solely global features [24, 26]. Specifically, we devise two subnetworks to derive two novelty scores—one for measuring global structure dissimilarity and the other for capturing local deformation—complementary to each other. In this way, our method can efficiently collect informative layout–SEM image pairs, which are necessary for fine-tuning learning-based layout-to-SEM prediction or mask-optimization models like LithoNet/OPCNet to keep them updated with newly designed data.
During deployment, our method can detect novel layout patterns in the absence of the ground-truth SEM images of target layouts’ fabricated circuits. Therefore, our model not only meets the practical field requirements for layout pre-inspection, but also functions as an active learning oracle.
We further propose two effective graph sampling-based active-learning strategies, namely one-time sampling and incremental sampling, to sample a much reduced set of representative layouts, which are most worthy of further fabrication for acquiring their reference SEM images, in an on-a-budget environment.
The remainder of this paper is organized as follows. We review related literature in Sec. II. The proposed layout novelty detection method is detailed in Sec. III. Sec. IV presents our proposed active learning strategies. Sec. V demonstrates and discusses our experimental results. Finally, we draw our conclusion in Sec. VI.
II Related Work
II-A Learning-Based Lithography Pre-simulation Models
Several learning-based lithography pre-simulation models have been proposed for topics such as lithography simulation and mask optimization. In order to save computational resources, Yang et al. proposed the GAN-OPC method [41] to facilitate the mask optimization process. GAN-OPC aims at creating quasi-optimal masks for given target circuit patterns by learning target-mask mappings. GAN-OPC can generate high-quality masks and thus ensure good printability while reducing the number of conventional OPC steps required. In addition, Ye et al. devised LithoGAN for lithography simulation [42]. LithoGAN is a GAN-based end-to-end lithography modeling framework that maps input mask patterns directly to output resist patterns, making it capable of predicting resist patterns accurately while achieving significant speedup compared with conventional lithography simulation methods. Recently, our LithoNet-OPCNet framework [30] successfully addresses the lithography simulation and mask optimization problems simultaneously in an end-to-end learning manner. Specifically, LithoNet, trained on a comprehensive set of layout–SEM image pairs, can accurately predict the fabrication-induced shape distortion for an input layout pattern. OPCNet [30], trained with the guidance provided by a pretrained LithoNet, aims to predict the optical-proximity-corrected (OPC) photomask pattern of an input layout.
II-B Novelty Detection
Novelty detection is the procedure used to identify whether a data sample is hitherto unknown. It is typically modeled as a one-class classification problem, in which a novelty detector is trained on single-class training samples that are all supposed to be seen, normal ones. As a result, during deployment the detector can determine whether an input testing sample is dissimilar to the seen training samples in terms of a given distance metric [16, 34] or a loss function [11]. Though novelty detection is closely related to anomaly/outlier detection, their scenarios differ significantly. Specifically, anomaly/outlier detection methods usually learn to find abnormal samples in a given reference dataset, whereas the reference data used to train a novelty detection model are assumed to be unpolluted and to involve only normal, regular samples. Note that a novelty detection method, like other anomaly/outlier detection methods, usually maps its input data to a novelty score so that an appropriate threshold can be defined accordingly to tell novel samples (outliers) and regular ones (inliers) apart [19].

Novelty detection methods have found applications in video surveillance [4, 17], medical imaging [27], abnormal event detection for attributed networks [9, 38], etc. In general, common approaches based on probabilistic models, such as one-class SVMs [16, 34], achieve good performance on low-dimensional features. However, these methods may not apply well to high-dimensional data, e.g., images in computer vision tasks. Hence, two sorts of CNN (convolutional neural network)-based methods have been proposed to address this problem. One sort learns to generate a reconstructed image and then evaluates an abnormality score according to the difference between the input and the reconstructed images [2, 12]; the other learns to embed a latent structural feature of the input and then derives an abnormality score based on the extracted structural feature [1].

For example, Sabokrou et al. proposed to train an auto-encoder along with a discriminator, conceptually a classifier, in an adversarial manner based on the reconstruction error, and then to determine whether an input is novel by the discriminator [25]. Similarly, Perera et al. proposed OCGAN [22] to solve the one-class novelty detection problem by learning the latent representations of within-class examples via a denoising auto-encoder network. Moreover, DSGAN was proposed in [32] to synthesize novel samples surrounding real training data such that the decision border between regular data and novelties can be determined effectively by typical models. Besides, Pidhorskyi et al. devised an architecture consisting of an auto-encoder and a discriminator for anomaly detection [23]. Their model is trained on top of a double min-max-game framework that iteratively optimizes the distribution of latent codes extracted by the auto-encoder and the fidelity of the reconstructed images.

Furthermore, classification-based novelty/anomaly detection models can generally be boosted via a self-supervised mechanism [10, 3, 33]. The applications of these methods are, however, limited by the assumption that their pre-processing strategies, e.g., rotation, random cropping, and geometric transformations, do not alter the ground-truth labels (i.e., class information) of the training dataset. As for IC fabrication, a rotated or geometrically transformed layout pattern will result in a different printed image because the processing results of a stepper/scanner in the x- and y-directions are asymmetric, and hence the self-supervised mechanism is usually not applicable.
II-C Active Learning
Active learning refers to cases in which a learning algorithm can assess the necessity of labeling an unlabeled sample by interactively querying an oracle—usually a pre-trained model or a user-specified metric function—about unlabeled samples' importance [29]. A fundamental concept is uncertainty-based selection, through which an oracle recommends (unlabeled) data of high uncertainty for labeling and disregards high-confidence ones [13, 35]. Such methods are, however, sensitive to outliers. Recently, several active learning algorithms were devised for CAD/VLSI applications, such as the methods in [15, 46]. However, all these active learning techniques need to collaborate with a reliable oracle. Hence, we aim in this paper to develop an oracle that can assess the novelty of an unseen layout in an SEM-free environment by learning the knowledge contained in pairwise training samples.
III Global-Local Shape-Based Novelty Detection

III-A Overview
Due to limited labeling resources, one often adopts a sampling strategy to select a small set of the most informative unlabeled novel samples for the labeling routine, and then uses the newly labeled samples to update the learned model in an active learning manner. As reported in [45], while regular samples, e.g., layout clips, with characteristics similar to those of the source training data can usually be predicted fairly well by a model pretrained on the same source training set, data with unseen patterns, e.g., novel layout clips, are potentially able to improve a pretrained model and are thus worth fabricating to acquire their ground-truth SEM images. Hence, under the premise of saving the costs of fabricating excessive training samples and acquiring their SEM images, our goal here is to identify, from a pool of newly-designed IC layout clips, the most informative unseen layout clips that are worth fabricating so that their SEM images can be acquired to effectively update a pretrained model.
To tackle this active learning problem for an IC fabrication pre-simulation model like LithoNet [30], we aim to design a layout novelty detection scheme that can work in the absence of ground-truth SEM images during the inference stage. It can distinguish novel layout clips, whose SEM images cannot be accurately predicted by a pre-simulation model (e.g., LithoNet [30]), from regular layouts whose SEM images can be well predicted. To this end, we elaborate first in Sec. III-B our supervised scheme to label novel layout patterns objectively by annotating novel regions on layout clips with the aid of ground-truth SEM images. We then describe our unsupervised layout novelty detection scheme, namely Glocal novelty score, in Sec. III-C–III-F.
Fig. 2 shows the architecture of our proposed Glocal (global-local) method, which consists of two primary components, i.e., an SA-LithoNet and an autoencoder. We assume that a novel layout results from the innovation of global planning, the change of local planning, or both. Our method exploits i) LithoNet, a pre-simulation model of fabrication-induced local shape deformation [30], for capturing local shape features with the aid of a self-attention (SA) module, and ii) an autoencoder for characterizing global shape properties. SA-LithoNet is architecturally the encoder part of a pretrained LithoNet followed by a self-attention module. This design employs SA-LithoNet to extract a feature representing local layout-to-SEM deformations within attended regions, identified by the self-attention module supervised by the novelty labels. Based on the assumption that the local-shape feature of a novel sample should deviate from the distribution of regular samples, we employ the SA-LithoNet feature for local novelty scoring via multi-class SVM (MC-SVM) classification. Besides, we use the reconstruction error of the autoencoder, representing the global shape dissimilarity, as the global novelty score. As a result, we combine the local and global novelty scores to obtain the Glocal novelty score.
III-B Model Inconsistency-Guided Novelty Annotation
Because manually annotating novel patterns in a rich collection of layout clips is nontrivial, even for an experienced engineer, we first devise a supervised mechanism for annotating potential novelties on layout patterns to train and evaluate our novelty detection model. This mechanism aims to find a novel layout pattern based on the inconsistency between the pattern's ground-truth SEM image and the corresponding layout-to-SEM prediction yielded by LithoNet [30]. Hence, we name this mechanism Model Inconsistency-Guided Novelty Annotation (MIGNA).
MIGNA aims to identify those local regions where the shape contours of the layout-to-SEM predictions [30] significantly deviate from their counterparts in the corresponding ground-truth SEM images. Such deviations imply that the pretrained layout-to-SEM prediction model may not yet have learned from enough similar training layout patterns. Common sorts of shape deviations include, for example, unexpected abnormal patterns such as enclosures, neckings, and bridges. One possible cause of these deviations is unexpected diffraction during the lithography process, usually induced by the layout arrangements around abnormal patterns, which makes the same layout pattern result in different SEM patterns with neighborhood-dependent shape variations. Consequently, when a learning-based pre-simulation model like LithoNet is trained on a dataset containing insufficient similar patterns, its shape predictions tend to deviate from the corresponding ground-truths. Such deviations should be considered anomalies, i.e., layout novelties, due to insufficient training patterns.
To identify unexpected shape deformations due to inaccurate predictions, we set a threshold of three standard deviations from the mean L1-norm of the pixel-wise differences between the layout-to-SEM predictions and their ground-truth SEM images. Three standard deviations from the mean is a common cut-off in practice for identifying outliers in a Gaussian-like distribution (statistically, about 99.7% of data fall within three standard deviations of the mean, and thus the remaining data are usually regarded as outliers). Consequently, our MIGNA method involves the following steps.
Step-1: Measure the pixel-wise deformation map based on the L1-distance between the ground-truth SEM image and the layout-to-SEM prediction for the same layout clip, where LithoNet [30] is adopted as the layout-to-SEM predictor.
Step-2: Partition the deformation map into non-overlapping patches, and discard those patches reaching image borders (60 border patches are omitted in our implementation).
Step-3: Annotate a patch as “anomaly” if its local L1-distance exceeds the mean L1-distance of the whole training dataset by three standard deviations or more.
Step-4: Label a layout as “novelty” if it contains at least a predetermined number of abnormal patches.
In this way, we can annotate the layout novelties systematically in a supervised manner.
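To make these steps concrete, the following Python sketch shows one plausible MIGNA implementation; the patch size, the border-skipping rule, and the abnormal-patch count used here are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def migna_annotate(sem_gt, sem_pred, mu, sigma, patch=32, min_abnormal=3):
    """MIGNA sketch: flag a layout as novel when enough patches of the
    |GT - prediction| deformation map exceed mu + 3*sigma, where mu and
    sigma are the mean/std of patch-wise L1 distances over the training set."""
    diff = np.abs(sem_gt.astype(np.float32) - sem_pred.astype(np.float32))
    h, w = diff.shape
    abnormal = 0
    # Step-2: non-overlapping patches; skipping the outermost bands is a
    # simplified stand-in for discarding border patches.
    for i in range(1, h // patch - 1):
        for j in range(1, w // patch - 1):
            l1 = diff[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch].mean()
            if l1 > mu + 3.0 * sigma:   # Step-3: 3-sigma anomaly cut-off
                abnormal += 1
    return abnormal >= min_abnormal      # Step-4: enough abnormal patches
```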
III-C Global-Local (Glocal) Novelty Score
Inspired by residue-based and classification-based novelty detection models, as illustrated in Fig. 2, our method consists of two subnetworks: i) an autoencoder, trained on a collection of layout images, and ii) an attention-guided layout-to-SEM prediction model, SA-LithoNet, comprising the encoder part of LithoNet [30] and a self-attention (SA) module. While the autoencoder characterizes the global shape appearance of a given layout, SA-LithoNet extracts a latent feature code representing local shape deformations. Then, we evaluate the global-local (Glocal) novelty score based on i) a local anomaly score $s_{\mathrm{local}}$ obtained by using SA-LithoNet (elaborated in Sec. III-D and Sec. III-E), and ii) a global novelty score $s_{\mathrm{global}}$ derived by using the autoencoder (elaborated in Sec. III-F). The local anomaly score $s_{\mathrm{local}}$ is derived by the proposed MC-SVM (multi-class SVM) algorithm that estimates the distance from the training dataset to the input in the latent feature space. Meanwhile, the global novelty score $s_{\mathrm{global}}$ is evaluated based on a conventional residue-based novelty detection scheme. The Glocal novelty score of an input layout $\mathbf{x}$ is defined as

$$s_{\mathrm{Glocal}}(\mathbf{x}) = \mathcal{N}\big(s_{\mathrm{local}}(\mathbf{x})\big) + \mathcal{N}\big(s_{\mathrm{global}}(\mathbf{x})\big), \qquad (1)$$

where $\mathcal{N}(\cdot)$ denotes the following normalization process:

$$\mathcal{N}(s) = \frac{s - \mu}{\sigma}, \qquad (2)$$

where $\mu$ and $\sigma$ denote the mean and standard deviation of the corresponding score, respectively. Note that (1) follows the designs in [1, 23], in which a final novelty score is obtained by summing up two normalized independent novelty scores.
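A minimal numpy sketch of (1)-(2) follows; we assume here that the normalization statistics are computed over the pool of scores being evaluated, which the text does not pin down.

```python
import numpy as np

def glocal_scores(s_local, s_global):
    """Eq. (1)-(2): z-normalize each novelty score and sum them.
    s_local / s_global are 1-D arrays of per-layout scores."""
    z = lambda s: (np.asarray(s, float) - np.mean(s)) / np.std(s)
    return z(s_local) + z(s_global)
```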
III-D Attention-Guided Layout-to-SEM Prediction Model

Self-attention (SA) mechanisms [36] like Vision Transformer [7] and Non-local Neural Networks [37] have recently demonstrated their high efficacy in finding spatial long-range dependencies among image patches so that all dependent contextual features can be taken into account together to optimize a specific vision task. In order to extract a latent code carrying wider-range representative features for characterizing the fabrication-induced circuit shape deformation, we propose SA-LithoNet by appending an SA module to the encoder of LithoNet [30], as illustrated in Fig. 3. By evaluating the dependencies between patches within the latent feature tensor embedded by the encoder of LithoNet, the SA module reorganizes the latent feature and then takes into account a wider range of layout shape details according to the patch dependencies, as will be described later in (4). Besides, to reduce the number of parameters while still achieving a good performance, we adopt the design of SAGAN [43] and replace the fully connected layer with $1\times 1$ convolutions, based on which the query ($\mathbf{Q}$), key ($\mathbf{K}$), and value ($\mathbf{V}$) maps are derived from dimension-reduced features. As illustrated in Fig. 3, the self-attention module can be expressed as

$$\mathbf{Q} = W_Q \ast \mathbf{F}, \quad \mathbf{K} = W_K \ast \mathbf{F}, \quad \mathbf{V} = W_V \ast \mathbf{F}, \qquad (3)$$

where $\mathbf{F} \in \mathbb{R}^{H \times W \times C}$ denotes the latent feature extracted from input layout pattern $\mathbf{x}$ by LithoNet [30], $H$ and $W$ are respectively the height and width of the feature, $C$ is the feature channel-depth, and $W_Q$, $W_K$, $W_V$, and $W_O$ are $1\times 1$ convolution kernels for feature channel-depth reduction. As shown in Fig. 3, the attention map derived after softmax is

$$a_{j,i} = \frac{\exp(\mathbf{k}_i^{\top}\mathbf{q}_j)}{\sum_{i=1}^{N}\exp(\mathbf{k}_i^{\top}\mathbf{q}_j)}, \qquad (4)$$

where $\mathbf{q}_j$, $\mathbf{k}_i$, and $\mathbf{v}_i$ are sub-tensors of $\mathbf{Q}$, $\mathbf{K}$, and $\mathbf{V}$, $N = H \times W$, and $a_{j,i}$ represents the normalized attention (dependency) in the $j$-th query tensor contributed by the $i$-th key tensor. Therefore, the output self-attention feature map $\mathbf{O}$ is a tensor whose $j$-th sub-tensor $\mathbf{o}_j$ is obtained by

$$\mathbf{o}_j = W_O \ast \Big( \sum_{i=1}^{N} a_{j,i}\, \mathbf{v}_i \Big), \qquad (5)$$

where $\mathbf{v}_i$ is the $i$-th sub-tensor within the value map $\mathbf{V}$, and $W_O$ denotes the output convolution kernel.
As a result, the final feature tensor enhanced by this SA module is

$$\mathbf{F}_{SA} = \gamma\,\mathbf{O} + \mathbf{F}, \qquad (6)$$

where $\gamma$ is a learnable parameter, initialized as $0$.
The SA module can learn the spatial dependency within the input feature tensor and is then used to derive a tensor more representative than its input for the novelty detection task. Based on the assumption that the local-shape feature of a novel sample should deviate from the distribution of regular samples, we employ the SA-LithoNet feature in (6) to evaluate the local novelty score based on the proposed multi-class SVM (MC-SVM) method described below.
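For concreteness, the sketch below implements an SAGAN-style self-attention block matching (3)-(6) in PyTorch; the channel-reduction factor of 8 follows SAGAN [43] rather than the paper, and all shapes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """SAGAN-style self-attention appended to an encoder feature map."""
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels // 8, 1)  # W_Q, 1x1 conv
        self.k = nn.Conv2d(channels, channels // 8, 1)  # W_K
        self.v = nn.Conv2d(channels, channels, 1)       # W_V
        self.o = nn.Conv2d(channels, channels, 1)       # W_O
        self.gamma = nn.Parameter(torch.zeros(1))       # gamma in (6), init 0

    def forward(self, feat):                            # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        q = self.q(feat).flatten(2).transpose(1, 2)     # (B, HW, C/8)
        k = self.k(feat).flatten(2)                     # (B, C/8, HW)
        v = self.v(feat).flatten(2)                     # (B, C, HW)
        attn = F.softmax(torch.bmm(q, k), dim=-1)       # eq. (4): a_{j,i}
        out = torch.bmm(v, attn.transpose(1, 2))        # eq. (5): sum_i a_{j,i} v_i
        out = self.o(out.view(b, c, h, w))
        return self.gamma * out + feat                  # eq. (6): residual blend
```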
III-E Local Shape Deformation-Based Novelty Score
Generally, MC-SVM first performs $K$-means clustering to group the training data into $K$ feature clusters, and then applies one-class SVMs (OC-SVMs) [16, 14] to the feature clusters individually to map regular-sample features into $K$ independent hyperspheres. Given a layout sample $\mathbf{x}$, we apply MC-SVM to evaluate its novelty score based on the distance between the sample feature $\varphi(\mathbf{x})$ and each hypersphere center $\mathbf{c}_k$, where $\varphi(\cdot)$ denotes the attention-guided feature embedding formulated in (6). If the minimal unseen-to-center distance exceeds a threshold, the unseen sample is classified as a novelty.
First, in the $K$-means clustering step of our MC-SVM-based novelty detection, given a training dataset $\mathcal{X}$ and the set of latent features $\mathcal{Z} = \{\mathbf{z}_i = \varphi(\mathbf{x}_i) \mid \mathbf{x}_i \in \mathcal{X}\}$, we iteratively group all $\mathbf{z}_i$ into $K$ clusters in the feature space and find the cluster centers by solving the following optimization problem:

$$\min_{\{C_k\}} \sum_{k=1}^{K} \sum_{\mathbf{z}_i \in C_k} \|\mathbf{z}_i - \mathbf{c}_k\|^2, \qquad (7)$$

where $C_k$, with $k = 1, \dots, K$, denotes the $k$-th cluster, and $\mathbf{c}_k$ is the cluster center of $C_k$.
Then, the second step of MC-SVM is to map the $K$ clusters into individual hyperspheres. In this way, the novelty of a test layout pattern can be verified by checking whether its mapped feature is far away from all hyperspheres. This hypersphere mapping is similar to the SVDD [16] and OC-SVM [14] algorithms. Specifically, for $k = 1, \dots, K$, all latent features $\mathbf{z}_i \in C_k$ are mapped to a hypersphere centered at $\mathbf{c}_k$ by solving the following problem:

$$\min_{R_k,\, \mathbf{c}_k,\, \xi_i}\; R_k^2 + \frac{1}{\nu n_k} \sum_{i} \xi_i \qquad (8)$$

subject to $\|\Phi(\mathbf{z}_i) - \mathbf{c}_k\|^2 \le R_k^2 + \xi_i$ and $\xi_i \ge 0$ for all $\mathbf{z}_i \in C_k$,

where $n_k$ denotes the number of samples in $C_k$, $\xi_i$ is a slack variable used as a penalty to control the soft boundary and the hypersphere volume with an outlier tolerance value $\nu$, $\Phi(\cdot)$ denotes the kernel function for mapping, and $R_k$ is the radius of the $k$-th hypersphere. Numerical methods for solving this optimization problem can be found in [16, 5].
As a result, we can define the local novelty score of a newly-designed layout $\mathbf{x}$ as the minimal distance from its mapped latent code to the nearest hypersphere center:

$$s_{\mathrm{local}}(\mathbf{x}) = \min_{k} \big\| \Phi\big(\varphi(\mathbf{x})\big) - \mathbf{c}_k \big\|. \qquad (9)$$

This local novelty score is evaluated based on the SA-LithoNet latent code. Because LithoNet is a layout-to-SEM pre-simulation model that learns to represent local circuit shape deformations due to a fabrication process, a large $s_{\mathrm{local}}$ implies that a layout sample's SA-LithoNet latent code tends to be out-of-distribution, and that the pattern may not be predicted well by the current SA-LithoNet model. $s_{\mathrm{local}}$ can thus well serve the purpose of local layout novelty scoring.
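A hedged scikit-learn sketch of the MC-SVM scoring pipeline is given below; it substitutes sklearn's OneClassSVM for the SVDD formulation in (8), and the cluster count and outlier tolerance nu are illustrative values.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import OneClassSVM

def fit_mc_svm(train_codes, n_clusters=8, nu=0.1):
    """K-means over SA-LithoNet latent codes, then one OC-SVM per cluster."""
    train_codes = np.asarray(train_codes)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(train_codes)
    svms = [OneClassSVM(kernel="rbf", nu=nu).fit(train_codes[km.labels_ == k])
            for k in range(n_clusters)]
    return km, svms

def local_novelty_score(code, svms):
    """In the spirit of (9): a code lying outside every learned boundary gets
    a large score. decision_function > 0 means inside a boundary, so we
    negate the best value over all hyperspheres."""
    code = np.asarray(code).reshape(1, -1)
    return -max(svm.decision_function(code)[0] for svm in svms)
```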
III-F Autoencoder-Based Global Novelty Score
Since the SA-LithoNet latent code is mainly for representing fabrication-induced local shape deformations, to better capture novel layout patterns, we propose to add another complementary global feature, extracted by an autoencoder, to characterize layout patterns' global shape structures.
Typically, supervised by the MSE (mean-squared-error) reconstruction loss, an autoencoder learns to embed its input into a lower-dimensional latent code, based on which the autoencoder can reconstruct an image close to its input. Therefore, with the aid of the MSE loss, an autoencoder can capture the global structural characteristics of an image well. The reconstruction error between a newly-designed layout and its reconstructed version yielded by an autoencoder trained on a training dataset can thus be used to define a novelty score indicating the degree of global structural dissimilarity between the input layout and the training dataset. As a result, this global novelty score is defined as
$$s_{\mathrm{global}}(\mathbf{x}) = \|\mathbf{x} - \hat{\mathbf{x}}\|^2, \qquad (10)$$

where $\hat{\mathbf{x}}$ is the reconstructed version of $\mathbf{x}$ yielded by the autoencoder.
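For completeness, (10) reduces to a one-liner; `autoencoder` is assumed to be any callable returning the reconstruction of its input.

```python
import numpy as np

def global_novelty_score(x, autoencoder):
    """Eq. (10): MSE reconstruction error of layout x under the autoencoder."""
    x = np.asarray(x, dtype=np.float32)
    x_hat = np.asarray(autoencoder(x), dtype=np.float32)
    return float(np.mean((x - x_hat) ** 2))
```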
IV Graph Sampling for Active Learning
After identifying novelties in a given pool of newly-designed layouts, we can fabricate the novel layout patterns on wafers and then collect their layout-SEM pairs to update the layout-to-SEM model (e.g., LithoNet). However, since both fabricating ICs and taking SEM images are costly, given a limited cost budget, we usually can only sample a small set of most representative patterns from the detected novelties for further fabrication. To this end, we propose two sampling strategies: one-time sampling and incremental sampling. Each strategy starts by building an initial undirected $k$-NN graph, composed of the novel layout designs as the graph nodes, by employing the latent codes embedded by a pretrained autoencoder. Then, based on the node degrees of the initial graph, we further construct a dense graph $G_d$ and a sparse graph $G_s$. Finally, we rank the priority of each node via a random-walk method, whose node visiting probability is determined based on the latent code extracted by SA-LithoNet, to select the most representative nodes accordingly.

IV-A One-time Sampling
The one-time sampling (OTS) algorithm aims to select the most representative layout clips from a given set of novel layout clips in only one sampling iteration. It primarily consists of two phases: i) data graph construction and ii) sampling by ranking. Its pseudo code is shown in Algorithm 1.
Step-1: Data graph construction
This step first estimates the data manifold, in which the layout patterns lie, by building an initial $k$-NN graph based on the latent codes extracted by the autoencoder.
The resulting $k$-NN graph is a directed graph, where each node has a fixed out-degree of $k$ but a variable in-degree, and a directed edge from node $v_i$ to node $v_j$ represents that $v_j$ is a $k$-nearest neighbor of $v_i$ in terms of the distance between their latent features.
As a result, in order to obtain an undirected graph $G$ specifying the distribution of layout patterns, the adjacency matrix of the data graph $G$ is obtained by symmetrizing the adjacency matrix of the initial $k$-NN graph.
On top of $G$, which characterizes the data manifold of novel layout clips, we further separate all nodes (layouts) in $G$ into two groups based on each node's degree (i.e., the total number of edges connecting a node to the others) and construct one dense graph $G_d$ and one sparse graph $G_s$ accordingly. We set the threshold value for node separation based on the mean and standard deviation of the degrees of all nodes in $G$. Therefore, the nodes with a degree larger than the threshold are those lying densely in $G$ among similar layouts, and these nodes constitute the dense graph $G_d$. On the contrary, the nodes in $G$ with a degree smaller than the threshold constitute the sparse graph $G_s$, where each node represents a layout clip far away from other designs in the feature space. Note that both $G_d$ and $G_s$ are undirected graphs derived from $G$; a graph construction and degree-based split are sketched below.
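The sketch below illustrates Step-1 with networkx; the mean-degree threshold here stands in for the paper's mean/std-based separation rule, so it is an assumption.

```python
import numpy as np
import networkx as nx
from sklearn.neighbors import NearestNeighbors

def build_graphs(codes, k=10):
    """Symmetrized k-NN graph over autoencoder codes, then a degree-based
    split into dense/sparse subgraphs."""
    codes = np.asarray(codes)
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(codes)
    _, idx = nbrs.kneighbors(codes)            # idx[:, 0] is the node itself
    g = nx.Graph()                             # undirected: symmetrizes A0
    g.add_nodes_from(range(len(codes)))
    for i, row in enumerate(idx):
        g.add_edges_from((i, int(j)) for j in row[1:])
    thr = np.mean([d for _, d in g.degree()])  # assumed threshold (mean degree)
    dense = g.subgraph([v for v in g if g.degree(v) > thr]).copy()
    sparse = g.subgraph([v for v in g if g.degree(v) <= thr]).copy()
    return g, dense, sparse
```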
Step-2: Sampling priority ranking
Because $G_d$ and $G_s$ contain layout clips belonging to two different kinds of distributions, the ways to rank the sampling priorities of nodes in each graph ought to be different.
Therefore, we devise i) two different schemes for determining starting seeds, and ii) two different weight functions for assessing the random-walk probability, for $G_d$ and $G_s$ respectively, to trigger our random-walk-based graph exploration algorithm. Then, after exploring a given graph thoroughly, the sampling priorities of nodes in the graph are ranked by their total numbers of visits.
The starting seeds for $G_s$ and $G_d$ are determined by using the closeness centrality and the eigen-centrality, respectively. This design comes from two reasons. First, because $G_s$ consists of nodes (layouts) that are far from each other in the feature space, a node with a large closeness centrality, i.e., a small mean distance from itself to other nodes, should be representative. Second, since the nodes with higher eigen-centrality (aka eigenvector-centrality) values in a graph make higher impacts on other nodes, as they are connected to nodes with higher eigen-centrality values [20], they should be sampled with higher priorities. The eigen-centrality of nodes on $G_d$ is defined by

$$\mathbf{x} = \frac{1}{\lambda_{\max}} A_d\, \mathbf{x}, \qquad (11)$$

where $\mathbf{x}$ is the eigenvector recording the eigen-centrality, and $\lambda_{\max}$ is the largest eigenvalue of $A_d$, the adjacency matrix of $G_d$. Also, the closeness of node $v_i$ in $G_s$ is evaluated by

$$C(v_i) = \frac{n}{\sum_{v_j \in \mathcal{N}(v_i)} d_g(v_i, v_j)}, \qquad (12)$$

where $d_g(v_i, v_j)$ is the geodesic distance, i.e., the length of the shortest path on the graph, between nodes $v_i$ and $v_j$, $\mathcal{N}(v_i)$ denotes the neighborhood of $v_i$, and $n$ denotes the number of nodes in the graph.
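In practice, both centralities are available off the shelf; the sketch below picks seeds accordingly (the number of seeds is illustrative).

```python
import networkx as nx

def pick_seeds(dense, sparse, n_seeds=5):
    """Seeds per (11)-(12): eigen-centrality on the dense graph, closeness
    centrality on the sparse graph. Assumes the dense graph is connected
    enough for the power iteration to converge."""
    eig = nx.eigenvector_centrality(dense, max_iter=1000)  # eq. (11)
    clo = nx.closeness_centrality(sparse)                  # eq. (12)
    top = lambda scores: sorted(scores, key=scores.get, reverse=True)[:n_seeds]
    return top(eig), top(clo)
```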
Next, the graph exploration algorithms for the dense graph $G_d$ and the sparse graph $G_s$ are designed based on breadth-first search (BFS) and depth-first search (DFS), respectively [20]. This design is based on the properties that i) BFS can avoid visiting a node twice in one exploration, and ii) DFS can explore a graph as far as possible along a branch before backtracking. Therefore, given a collection of starting nodes, we accomplish the graph exploration by assessing each node's random-walk probability, designed for the BFS or DFS purpose.
The random-walk probability of visiting a node $v_j$ from its adjacent node $v_i$ is defined as

$$p(v_j \mid v_i) = \frac{w(v_i, v_j)}{\sum_{v_k \in \mathcal{N}_1(v_i)} w(v_i, v_k)}, \qquad (13)$$

where $w(v_i, v_j)$ is the visiting weight from $v_i$ to $v_j$; the visiting weight $w_d$ for the dense graph is given in (14), and the weight $w_s$ for the sparse graph is defined in (15):
$$w_d(v_i, v_j) = \mathcal{M}\!\left(\frac{|\mathcal{N}_1(v_i) \cap \mathcal{N}_1(v_j)|}{\deg(v_j)}\right) + \mathcal{M}\big(S(v_i, v_j)\big), \qquad (14)$$

and

$$w_s(v_i, v_j) = \mathcal{M}\!\left(\frac{|\mathcal{N}_1(v_i) \cap \mathcal{N}_1(v_j)|}{\deg(v_j)}\right) - \mathcal{M}\big(S(v_i, v_j)\big), \qquad (15)$$

where $\mathcal{N}_1(v_i) \cap \mathcal{N}_1(v_j)$ denotes the intersection of the one-ring-neighborhoods of $v_i$ and $v_j$ (the one-ring-neighborhood of a node is the set of all nodes connected with it by an edge [18]), $\deg(v_j)$ is the degree of $v_j$, and $\mathcal{M}(\cdot)$ is a min-max scaling function which maps an input value into $[0, 1]$. Moreover, $S(v_i, v_j)$ is the similarity score defined as the difference between i) the cosine similarity between nodes $v_i$ and $v_j$ and ii) the expected cosine similarity between any two nodes in the graph, that is,

$$S(v_i, v_j) = \cos(v_i, v_j) - \frac{1}{\binom{n}{2}} \sum_{\{v_p, v_q\}} \cos(v_p, v_q), \qquad (16)$$

where $\binom{n}{2}$ denotes the number of 2-combinations of the $n$ nodes, and $\cos(\cdot, \cdot)$ is the cosine similarity between the latent features extracted by SA-LithoNet as follows:

$$\cos(v_i, v_j) = \frac{\mathbf{z}_i^{\top} \mathbf{z}_j}{\|\mathbf{z}_i\|\,\|\mathbf{z}_j\|}. \qquad (17)$$

Concisely, $w_s$ encourages visiting an adjacent node $v_j$ with a feature distinct from $v_i$'s for performing DFS on a sparse graph, whereas $w_d$ gives a larger weight to a $v_j$ with a feature similar to $v_i$'s for performing BFS.
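The following sketch turns (13)-(17) into code. Since the closed forms of (14)-(15) are reconstructed above, the structural term (shared one-ring neighbors normalized by degree) should be read as an assumption; only the similarity-sign flip between the dense and sparse cases is dictated by the text.

```python
import numpy as np

def cosine(zi, zj):                                        # eq. (17)
    return float(zi @ zj / (np.linalg.norm(zi) * np.linalg.norm(zj)))

def visit_probs(g, codes, i, expected_sim, minmax, dense=True):
    """Eq. (13)-(15): score each neighbor j of node i; similar features
    raise the weight on the dense graph (BFS-like) and lower it on the
    sparse graph (DFS-like). `minmax` maps raw values into [0, 1] and
    `expected_sim` is the mean pairwise cosine similarity, as in (16)."""
    weights = {}
    for j in g.neighbors(i):
        shared = len(set(g.neighbors(i)) & set(g.neighbors(j)))
        sim = cosine(codes[i], codes[j]) - expected_sim    # eq. (16)
        structural = minmax(shared / max(g.degree(j), 1))
        w = structural + minmax(sim) if dense else structural - minmax(sim)
        weights[j] = max(w, 1e-6)          # keep probabilities positive
    if not weights:
        return {}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}      # eq. (13)
```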
IV-B Incremental Sampling
Unlike the one-time sampling strategy, we further devise an incremental sampling method to split the total resource budget of fabricating unseen layout patterns and taking the corresponding SEM images into a few smaller fine-tuning datasets. In this way, we iteratively update a pretrained pre-simulation model to extend its generalization ability. To this end, the fine-tuning dataset selected in the $t$-th iteration should be able to best update the pre-simulation model fine-tuned on the $(t{-}1)$-th fine-tuning dataset.
The proposed incremental sampling method, taking the aforementioned one-time sampling method as its backbone, is an iterative routine with a stop-criterion function measuring the difference between the knowledge learned from two successive iterations under a resource budget. As depicted in Algorithm 2, the main idea of our incremental sampling method is to re-rank the sampling priorities of unselected samples in the unseen-pattern pool after each sampling iteration with the aid of a meticulously-designed node attribute Informativeness-score.
Informativeness-Score: Assuming each sample carries a certain amount of information, say, information volume [6], the Informativeness-Score (I-Score) aims to assess the information volume carried by a selected sample in the feature space. Since the information volume covered by a frequently-visited node is usually shared by its neighboring nodes, to avoid acquiring redundant information, the sampling priority of a frequently-visited node should be lower, and vice versa. Therefore, we first take the selected unseen samples as the starting nodes, then evaluate the tendency of individual unselected nodes to be visited by random walk, and finally evaluate the I-Score based on the tendency values. Algorithm 3 shows the pseudo-code for evaluating the I-Score. Note that i) the I-Score is recorded as a vector whose $j$-th entry denotes the I-Score of the $j$-th node, and ii) in Algorithm 3, a separate vector records the cumulative I-Score of individual nodes in the $t$-th iteration.

Budget: We exploit a variable $B$, denoting the budget, to bound the maximal total visiting distance. This parameter is used to model the maximal information volume a starting node possesses, and is set according to the value $k$ used to construct our $k$-NN graph. This design enables Algorithm 3 to visit at least $k$ nodes while evaluating the I-Score.
Step–Cost: We evaluate the cost per move from a node $v_i$ to its neighbor $v_j$ based on the distance between their autoencoder features and the ratio of graph densities between the two nodes as follows:

$$c(v_i \rightarrow v_j) = d_{AE}(v_i, v_j) \cdot \rho(v_i, v_j), \qquad (18)$$

where

$$d_{AE}(v_i, v_j) = \|\mathbf{e}_i - \mathbf{e}_j\|, \qquad (19)$$

and

$$\rho(v_i, v_j) = \mathcal{M}\big(D(v_j)\big) - \mathcal{M}\big(D(v_i)\big), \qquad (20)$$

where $\mathbf{e}_i$ is the autoencoder feature of $v_i$, $D(\cdot)$ denotes the graph density [8] measuring how close on average a node approaches its neighbors in the feature space spanned by the autoencoder codes, and $\rho$ prevents the same dense region from being selected redundantly by subtracting the weight of the starting node of the current step from the weight of the destination node.
Tendency Weight: The tendency weight used to derive the random-walk probability in the $t$-th sampling iteration is given by

$$w_d^{(t)}(v_i, v_j) = \mathcal{M}\big(\beta^{(t)}(v_j)\big) + \mathcal{M}\big(S(v_i, v_j)\big), \qquad (21)$$

and

$$w_s^{(t)}(v_i, v_j) = \mathcal{M}\big(\beta^{(t)}(v_j)\big) - \mathcal{M}\big(S(v_i, v_j)\big), \qquad (22)$$

where

$$\beta^{(t)}(v_j) = \frac{|\mathcal{N}_1(v_i) \cap \mathcal{N}_1(v_j)|}{\deg(v_j)} \Big(1 - \tilde{\iota}^{(t-1)}(v_j)\Big), \qquad (23)$$

and $\tilde{\iota}^{(t-1)}(v_j)$ is the normalized cumulative I-Score of $v_j$ after the $(t{-}1)$-th iteration. Note that these two equations are similar to (14) and (15) but use a different factor to balance the influence on the $j$-th node brought by its local neighborhood.
Stop Criterion: The stop criterion aims to check whether the selected nodes already represent the data graph well. This criterion implies that i) all samples on the graph can be equally visited by random walk, and ii) an additional batch of sampling cannot increase the normalized cumulative I-Score of each node. Hence, the stop criterion is defined as

$$\min_{j}\, \tilde{\iota}^{(t)}(v_j) \ge r^{(t)}, \qquad (24)$$

where $r^{(t)}$ is the ratio of the number of selected samples after the $t$-th iteration to the number of total samples, and $\tilde{\iota}^{(t)}(v_j)$ is the normalized cumulative I-Score.
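Algorithm 3 itself is not reproduced here, but the deliberately simplified sketch below conveys the budgeted random-walk idea behind the I-Score. The uniform neighbor choice stands in for the tendency weights (21)-(22), and every name and constant in it is illustrative.

```python
import random

def iscore_pass(g, seeds, step_cost, budget):
    """One I-Score evaluation pass: walk from each selected seed until its
    budget is spent, counting visit tendencies; frequently visited nodes
    get a lower I-Score so later iterations favor under-covered regions.
    `step_cost` is a callable implementing (18)."""
    visits = {v: 0.0 for v in g}
    for s in seeds:
        node, spent = s, 0.0
        while spent < budget:
            nbrs = list(g.neighbors(node))
            if not nbrs:
                break
            nxt = random.choice(nbrs)      # uniform stand-in for (21)-(22)
            spent += step_cost(node, nxt)
            visits[nxt] += 1.0
            node = nxt
    total = sum(visits.values()) or 1.0
    # lower I-Score for frequently visited nodes (redundant information)
    return {v: 1.0 - c / total for v, c in visits.items()}
```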
V Experimental Results
V-A Dataset and Network Configuration
Two datasets are used in our experiments. Both datasets comprise pair-wise image samples, each consisting of a layout pattern and a corresponding binarized SEM image. Dataset-1 is used as the seen data, i.e., the training set, whose SEM images contain either "enclosure" patterns or "bridge" patterns. Meanwhile, Dataset-2, the blind testing set, also contains image pairs involving enclosure patterns and bridge patterns. Some examples of enclosure and bridge patterns are illustrated in Fig. 4. With this setting, we assume the enclosure patterns in Dataset-1 to be regular ones, whereas the bridge patterns tend to be novelties. Consequently, a successful novelty detection scheme should rate the bridge patterns in Dataset-2 with higher Glocal novelty scores.
Fig. 4: (a) Expectation; (b) Enclosure; (c) Bridge.
TABLE I: Auto-encoder architecture.

| Layer | Filter | Output Size |
|---|---|---|
| **Encoder** | | |
| Input | – | – |
| Conv-BN-ReLU | – | – |
| Conv-BN-ReLU | – | – |
| Conv-BN-ReLU | – | – |
| Conv-BN-ReLU | – | – |
| **Decoder** | | |
| Upsample | – | – |
| Conv-BN-LReLU | – | – |
| Upsample | – | – |
| Conv-BN-LReLU | – | – |
| Upsample | – | – |
| Conv-BN-LReLU | – | – |
| Upsample | – | – |
| Conv-BN-Sigmoid | – | – |
Both the auto-encoder and SA-LithoNet described in Fig. 2 are pretrained on Dataset-1. For SA-LithoNet, we adopt the same LithoNet architecture and train it with the same settings used in [30]. Table I shows the architecture of our auto-encoder, which is trained via the mean-squared-error (MSE) loss.
We conduct two experiment sets to verify the effectiveness of our method. The first set validates whether our novelty detection scheme can accurately identify novel layout patterns, and the second evaluates the effectiveness of our sampling methods in selecting representative novel patterns for updating a pretrained pre-simulation model like LithoNet.
V-B Layout Novelty Detection
In order to show the effectiveness of our layout novelty detection algorithm, we first verify the stability and robustness of our supervised MIGNA method, and then use the MIGNA results as the golden references to evaluate the accuracy of our global-local (glocal) layout novelty scoring.
We use the AUC (Area Under the Curve) score of the ROC (Receiver Operating Characteristic) curve as the objective evaluation metric: the higher the AUC score, the more accurate the predictions. Table II compares the AUC scores of the detection results of different novelty detection methods based on the MIGNA-annotated references (see Sec. III-B) listed in the left three columns. Here, the first column lists the threshold on the number of anomaly patches used for assessing layout novelty, and the number of layouts classified as novelties in Dataset-2 decreases as this threshold grows. The proposed SA-Glocal novelty scoring outperforms the SA-LithoNet-based local scoring and the autoencoder-based global scoring for all settings.

TABLE II: AUC scores under different MIGNA annotation thresholds.

| Threshold | # normal | # novel | SA-Litho (Local) | Autoencoder (Global) | Ours (Glocal) |
|---|---|---|---|---|---|
| 3 | 299 | 701 | 0.825 | 0.684 | 0.862 |
| 4 | 383 | 617 | 0.805 | 0.737 | 0.861 |
| 5 | 449 | 551 | 0.744 | 0.749 | 0.846 |
| 6 | 559 | 441 | 0.683 | 0.683 | 0.756 |
| 7 | 655 | 345 | 0.624 | 0.609 | 0.676 |
Fig. 5 shows the ROC performances on Dataset-2 with different novelty detection methods, including our methods (autoencoder-based, LithoNet-based, and SA-Glocal) and three state-of-the-art novelty detection approaches: LSA [1], GEOM [10], and GOAD [3]. In Fig. 5, the MIGNA-annotated labels are used as pseudo ground-truths to calculate the true positive rates (TPRs) and false positive rates (FPRs). The ROC curves demonstrate that the LithoNet-based local novelty scoring and the autoencoder-based global novelty scoring are complementary to each other: the former detects many more novelties at low FPRs, while the latter can detect almost all novelties at a higher FPR. Consequently, they can be combined to boost the performance of novelty detection, as shown by the SA-Glocal curve. Table III lists the AUC scores of the six schemes, showing that the proposed SA-Glocal novelty scoring well beats all the others, achieving a significantly higher AUC score of 0.932.
To validate the impacts of various novelty detection schemes on the performance of model update, we randomly select 50, 100, 150, 200, and 250 samples out of the novel patterns detected by each method, together with the original training set, to form finetune sets of different sizes, and then use them to update the LithoNet model. Fig. 6 compares the inference performances of the fine-tuned LithoNet models, where each point on a curve corresponds to a LithoNet updated on a finetune set containing randomly-selected novel samples detected by one specific novelty detection scheme. The horizontal axis indicates the number of novel samples randomly picked into the finetune dataset. Note that we adopt the same similarity metrics used in [30], including the C2C-distance (contour-to-contour distance) [31], IOU (intersection over union), SSIM (structural similarity index measure) [39], and pixel-error-rate, to evaluate the performances of the LithoNet models updated on the various fine-tune sets. The results demonstrate that, for a pretrained LithoNet model, the novel samples detected by our SA-Glocal method are significantly more informative than those detected by LSA, GEOM, and GOAD, making SA-Glocal outperform the competing methods in terms of all quality metrics for model update. Moreover, Fig. 6 also hints that SA-Glocal scoring is capable of serving as an active learning oracle because even a very limited number of randomly-selected novel patterns identified by SA-Glocal can best fine-tune a pretrained LithoNet.
Table IV shows the ablation study of our novelty detection method. Here, LithoNet (OC-SVM) local scoring shows the baseline performance, i.e., an AUC value of 0.720, obtained by feeding the LithoNet latent codes of test layouts into the conventional one-class SVM outlier detector [16]. The MC-SVM-based LithoNet scoring presented in Sec. III-E improves the AUC score to 0.744. Moreover, by combining the autoencoder global feature with the LithoNet local feature (i.e., the Glocal method), the AUC score significantly increases to 0.846. This demonstrates the effectiveness and the robustness of our Glocal design. Finally, the last three rows in Table IV evidence that SA-LithoNet can further boost the representability of the latent feature, particularly making the proposed SA-Glocal method achieve the best performance: 0.932 AUC score.
TABLE III: AUC-score comparison with state-of-the-art novelty detection schemes.

| Method | AUC Score |
|---|---|
| LSA [1] | 0.690 |
| GEOM [10] | 0.752 |
| GOAD [3] | 0.768 |
| Autoencoder (Global) | 0.675 |
| LithoNet (Local) | 0.744 |
| SA-Glocal (AE + SA-LithoNet) | 0.932 |
TABLE IV: Ablation study of the proposed novelty detection method.

| Method | AUC Score |
|---|---|
| Autoencoder | 0.675 |
| LithoNet (OC-SVM) | 0.720 |
| LithoNet (MC-SVM) | 0.744 |
| Glocal (AE + LithoNet) | 0.846 |
| SA-LithoNet (OC-SVM) | 0.857 |
| SA-LithoNet (MC-SVM) | 0.864 |
| SA-Glocal (AE + SA-LithoNet) | 0.932 |
V-C Performance Evaluation on Active-Learning Schemes
The experiments reported herein are conducted via the following steps. First, for Dataset-2, we label those samples whose SA-Glocal scores exceed a threshold as novelties, and the rest as regular patterns. Second, we partition Dataset-2 into two subsets: i) a finetune set consisting of a subset of randomly picked regular image pairs and a subset of randomly picked novel pairs, and ii) a blind testing set comprising the remaining regular and novel pairs. Third, we update the pretrained LithoNet individually on the finetune sets, together with the original training set, selected by different active learning strategies, and then evaluate the model performance on the blind testing set.
Fig. 7 compares the performance of various active learning strategies, where each point on a curve corresponds to a different sampled subset of the finetune pool. We compare the proposed one-time sampling (OTS) and incremental sampling (INS) methods with random sampling and existing active/graph sampling methods, including K-center greedy (Kcenter) [28], RCMS with uncertainty sampling (RCMS) [40], Margin AL [44], informative cluster diversity (ICD) [21], and graph density [8]. The horizontal axis in Fig. 7 indicates the number of novel samples selected into the finetune set. Moreover, Fig. 8 shows the breakdowns of C2C-distance ranges obtained on the testing dataset with different LithoNet models, each fine-tuned on a roughly 100-sample fine-tune set selected by a different sampling method (the sampling amount of the proposed incremental sampling method cannot be assigned in advance and is determined at run-time; the INS sample count closest to 100 is 105). The comparison shows that the two LithoNet models, respectively fine-tuned on the two fine-tune sets selected by our proposed one-time sampling and incremental sampling schemes, are improved the most. Specifically, the C2C-distances of the vast majority of testing samples are less than 0.45 pixel. In contrast, the two LithoNets fine-tuned on sample sets selected by RCMS [40] and Graph Density (GD) [8] achieve the performances closest to those of the OTS- and INS-fine-tuned models. However, they result in much fewer test samples within the smallest C2C-distance range than our methods do, resulting in the performance differences illustrated in Fig. 7.
We can draw the following observations from Fig. 7 and Fig. 8. First, the novelties detected by the Glocal novelty scoring are beneficial for updating a pretrained LithoNet since i) the C2C-distance and error-rate decrease, and ii) the SSIM and IOU scores increase with the number of selected samples. This observation is reasonable because the benefit of adding new data points to a training dataset diminishes if these new data points are significantly similar to existing samples in the training dataset, as revealed in [6]. Second, while the proposed one-time sampling (OTS) method outperforms the competing methods in all aspects, the proposed incremental sampling (INS) method yields the best curves and can reach the performance plateau when sampling only 105/460 of the data. This means that our novelty detection together with graph sampling can effectively accomplish the goal of active learning from novel layout clips.
VI Conclusions
In this paper, we proposed a deep learning-based layout novelty detection method that can work in the absence of ground-truth SEM images. The proposed method architecturally consists of two subnetworks, a pretrained autoencoder and a pretrained layout-to-SEM simulator. The former subnetwork learns to capture global shape structures of training (layout) samples so that it can be used to derive the autoencoder-based global novelty score. Besides, the latter subnetwork aims to extract a latent code representing the fabrication-induced local shape deformation of a given layout so that the extracted latent code can be used to evaluate an attention-guided local novelty score. These two novelty scores together form the proposed Glocal layout novelty measure. We have also proposed two graph sampling-based active-learning strategies, one-time sampling and incremental sampling, to select a much reduced set of representative layouts most worthy of further fabrication for acquiring the ground-truth SEM images, in an on-a-budget environment. Our experimental results demonstrate that the proposed method can detect novel layout patterns effectively, and the identified layout novelties can be used to improve the generalization capability of a learning-based layout-to-SEM pre-simulation model.
References
- [1] (2019) Latent space autoregression for novelty detection. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 481–490.
- [2] (2015) Variational autoencoder based anomaly detection using reconstruction probability. Special Lecture on IE 2 (1), pp. 1–18.
- [3] (2020) Classification-based anomaly detection for general data. arXiv preprint arXiv:2005.02359.
- [4] (2011) Detecting anomalies in people's trajectories using spectral graph analysis. Comput. Vis. Image Understand. 115 (8), pp. 1099–1111.
- [5] (2013) A revisit to support vector data description. Dept. Comput. Sci., Nat. Taiwan Univ., Taipei, Taiwan, Tech. Rep.
- [6] (2019) Class-balanced loss based on effective number of samples. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 9268–9277.
- [7] (2021) An image is worth 16x16 words: transformers for image recognition at scale. In Proc. Int. Conf. Learn. Rep.
- [8] (2012) RALF: a reinforced active learning formulation for object class recognition. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 3626–3633.
- [9] (2020) AnomalyDAE: dual autoencoder for anomaly detection on attributed networks. In Proc. IEEE Int. Conf. Acoustics Speech Signal Process., pp. 5685–5689.
- [10] (2018) Deep anomaly detection using geometric transformations. arXiv preprint arXiv:1805.10917.
- [11] (1995) A novelty detection approach to classification. In Proc. Int. Joint Conf. Artif. Intell., Vol. 1, pp. 518–523.
- [12] (2018) Novelty detection with GAN. arXiv preprint arXiv:1802.10560.
- [13] (1995) A sequential algorithm for training text classifiers: corrigendum and additional data. In ACM SIGIR Forum, Vol. 29, pp. 13–19.
- [14] Improving one-class SVM for anomaly detection. In Proc. Int. Conf. Mach. Learn. Cybern., Vol. 5, pp. 3077–3081.
- [15] (2018) Data efficient lithography modeling with transfer learning and active data selection. IEEE Trans. Comput.-Aided Design Integr. Circuits Syst. 38 (10), pp. 1900–1913.
- [16] (2013) SVDD-based outlier detection on uncertain data. Knowledge and Inf. Syst. 34 (3), pp. 597–618.
- [17] (2015) Masked autoencoder for distribution estimation.
- [18] (2003) Discrete differential-geometry operators for triangulated 2-manifolds. In Visualization and Mathematics III, H.-C. Hege and K. Polthier (Eds.), pp. 35–57.
- [19] (2010) Review of novelty detection methods. In Proc. Int. Convention MIPRO, pp. 593–598.
- [20] (2010) Networks: An Introduction. Oxford University Press.
- [21] (2016) Efficient selection of informative and diverse training samples with applications in scene classification. In Proc. IEEE Int. Conf. Image Process., pp. 494–498.
- [22] (2019) OCGAN: one-class novelty detection using GANs with constrained latent representations. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 2898–2906.
- [23] (2018) Generative probabilistic novelty detection with adversarial autoencoders. arXiv preprint arXiv:1807.02588.
- [24] (2014) A review of novelty detection. Signal Process. 99, pp. 215–249.
- [25] (2018) Adversarially learned one-class classifier for novelty detection. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 3379–3388.
- [26] (2021) A unified survey on anomaly, novelty, open-set, and out-of-distribution detection: solutions and future challenges. arXiv preprint arXiv:2110.14051.
- [27] (2017) Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In Proc. Int. Conf. Inf. Process. Med. Imag., pp. 146–157.
- [28] (2018) Active learning for convolutional neural networks: a core-set approach. In Proc. Int. Conf. Learn. Rep.
- [29] (2009) Active learning literature survey.
- [30] (2021) From IC layout to die photo: a CNN-based data-driven approach. IEEE Trans. Comput.-Aided Design Integr. Circuits Syst. 40 (5), pp. 957–970.
- [31] Contour-to-contour distance. https://www.mathworks.com/matlabcentral/fileexchange/75551-contour-to-contour-distance
- [32] (2019) Difference-seeking generative adversarial network–unseen sample generation. In Proc. Int. Conf. Learn. Rep.
- [33] (2020) CSI: novelty detection via contrastive learning on distributionally shifted instances. arXiv preprint arXiv:2007.08176.
- [34] (1992) Variable kernel density estimation. Annals of Statistics, pp. 1236–1265.
- [35] (2001) Support vector machine active learning with applications to text classification. J. Mach. Learn. Res. 2 (Nov), pp. 45–66.
- [36] (2017) Attention is all you need. In Adv. Neural Inf. Process. Syst., pp. 5998–6008.
- [37] (2018) Non-local neural networks. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 7794–7803.
- [38] (2020) One-class graph neural networks for anomaly detection in attributed networks. arXiv preprint arXiv:2002.09594.
- [39] (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13 (4), pp. 600–612.
- [40] (2003) Representative sampling for text classification using support vector machines. In Proc. European Conf. Inf. Retr., pp. 393–407.
- [41] (2019) GAN-OPC: mask optimization with lithography-guided generative adversarial nets. IEEE Trans. Comput.-Aided Design Integr. Circuits Syst. 39 (10), pp. 2822–2834.
- [42] (2019) LithoGAN: end-to-end lithography modeling with generative adversarial networks. In Proc. ACM/IEEE Design Autom. Conf., pp. 1–6.
- [43] (2019) Self-attention generative adversarial networks. In Proc. Int. Conf. Mach. Learn., pp. 7354–7363.
- [44] (2014) Improved margin sampling for active learning. In Proc. Chinese Conf. Pattern Recognit., pp. 120–129.
- [45] (2017) Fine-tuning convolutional neural networks for biomedical image analysis: actively and incrementally. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 7340–7351.
- [46] (2010) Active learning framework for post-silicon variation extraction and test cost reduction. In Proc. IEEE/ACM Int. Conf. Comput.-Aided Design, pp. 508–515.