From IC Layout to Die Photo: A CNN-Based Data-Driven Approach

02/11/2020 ∙ by Hao-Chiang Shao, et al. ∙ National Taiwan University of Science and Technology ∙ National Tsing Hua University

Since IC fabrication is costly and time-consuming, it is highly desirable to develop virtual metrology tools that can predict the properties of a wafer from the fabrication configurations without physically measuring a fabricated IC. We propose a deep learning-based, data-driven framework consisting of two convolutional neural networks: i) LithoNet, which predicts the shape deformations a circuit undergoes during IC fabrication, and ii) OPCNet, which suggests IC layout corrections to compensate for such deformations. By learning the shape correspondence between layout design patterns and the SEM images of the resulting product wafers, LithoNet can mimic the fabrication procedure and, given an IC layout pattern, predict its fabricated circuit shape for virtual metrology. Furthermore, LithoNet can take the wafer fabrication parameters as a latent vector to model the parametric product variations observable in SEM images. In addition, traditional lithography simulation methods used to suggest corrections to a lithographic photomask are computationally expensive. Our proposed OPCNet mimics the optical proximity correction (OPC) procedure and efficiently generates a corrected photomask by collaborating with LithoNet to examine whether the shape of a fabricated IC circuit best matches its original layout design. As a result, the proposed LithoNet-OPCNet framework can not only predict the shape of a fabricated IC from its layout pattern but also suggest a layout correction according to the consistency between the predicted shape and the given layout. Experimental results on several benchmark layout patterns demonstrate the effectiveness of the proposed method.







I Introduction

After IC circuit design and layout, it typically takes two to three months to fabricate a 12-inch IC wafer through a multi-step sequence of photolithographic and chemical processing steps. Among these steps, a lithography process transfers an IC layout pattern from a photomask to a photosensitive chemical photoresist on the substrate, followed by an etching process that chemically removes from the wafer surface the parts of a polysilicon or metal layer left uncovered by the etching mask. Because it is hard to control the exposure conditions and the chemical reactions involved in all fabrication steps, the two processes together lead to nonlinear shape distortions of a designed IC pattern, which are usually too complicated to model. This fact gives rise to the so-called mask optimization problem: computing an optimized photomask (or etching mask) that makes the shape of the fabricated IC wafer best match its source layout design.

The inevitable shape deformations of a fabricated IC due to imperfect lithography and etching processes often cause IC defects (e.g., thin or broken wires) if the IC circuit layout is not appropriately designed, especially on the first few metal layers. Nevertheless, in most cases such layout-induced defects cannot be identified until the scanning electron microscope (SEM) images of the metal layers are captured and analyzed after wafer fabrication, making circuit verification very costly and time-consuming.

It is therefore desirable to develop pre-simulation tools, including i) a lithography simulation method for predicting the shapes of fabricated metal lines based on a given IC layout along with IC fabrication parameters, and ii) a mask optimization strategy for predicting the best mask to compensate for the shape distortions caused by the lithography and etching processes.

Fig. 1: Relationship among OPC simulation, circuit verification on an SEM image, and our method. Conceptually, the OPC step, highlighted by the red dashed lines, is used to suggest modifications of a layout mask so that the fabricated IC could have nearly the same shape as the original layout pattern. The proposed LithoNet and its applications are highlighted by purple contours.

As for lithography simulation, there are two categories of conventional approaches: physics-level rigorous simulation and compact model-based simulation [34, 36]. Rigorous simulation methods simulate the physical effects of materials to accurately predict a fabricated wafer image and are thus very time-consuming [29, 17]. By contrast, compact model-based simulation methods only loosely follow the physical phenomena, trading accuracy for computational speed by exploiting complicated, parameter-dependent, nonlinear functions. Different from traditional methods, we aim to develop a convolutional neural network (CNN) based approach that learns a parametric model of the physical and chemical phenomena of a fabrication process directly from a training dataset containing pairs of IC layouts and their corresponding SEM images. Based on the learned network model, we can predict a fabricated wafer image more accurately and efficiently than conventional methods.

Moreover, fab engineers usually optimize a mask pattern by iteratively modifying the layout design based on its lithography simulations. However, rule-based lithography simulations resort to linear combinations of optical computations derived from several similar yet not identical historical fab models. This may make conventional mask optimization methods unreliable for new layout patterns. The simulation reliability relies largely on a rich amount of historical fabrication data in the database, which is, however, very costly to acquire because ground-truth fab models must be gathered by fabricating a layout pattern under all possible configurations.

The relationship among the IC fabrication process, lithography simulator, and mask optimizer is illustrated in Fig. 1, where the “OPC” block stands for optical proximity correction, a standard approach to photomask correction that compensates for the shape distortions due to diffraction or process effects and guarantees the printability of a layout pattern, especially at the corners of the process window [20, 8]. As illustrated by the red dashed rectangles in Fig. 1, the mask used in the fabrication process is a modified version of the source layout design, aiming to compensate for possible “shrinkages” in line shapes during fabrication and thereby mitigate the deviation of the fabricated IC circuitry from its layout design pattern. However, traditional OPC simulation has two primary drawbacks. First, it runs simulations based on rules and patterns already known; thus, an OPC correction may be unreliable for an unseen layout design. Second, not only is a single OPC simulation computationally expensive, but the whole OPC procedure is also a time-consuming trial-and-error routine that is iterated until no irregularity can be found in the OPC estimation result. Due to its high complexity, OPC simulation is usually performed on a limited number of regions of interest (ROIs) rather than on the whole layout design to reduce computation. Take the ICWB software (IC WorkBench) developed by Synopsys [28] for example: ICWB takes, on average, about 34 seconds to run a simulation on a layout patch with an Intel Xeon E5-2670 CPU and 128 GB RAM. Running an OPC simulation once over an entire layout design would thus take around 4 days, and such computational cost makes a complete OPC simulation procedure impractical. It is therefore highly desirable to develop an efficient photomask optimization scheme.
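For a rough sense of scale, the reported per-patch runtime implies the following back-of-envelope estimate; the patch count used here is an assumed round number for illustration, not a figure from the paper:

```python
# Back-of-envelope cost of a full-layout OPC simulation.
# SECONDS_PER_PATCH comes from the reported ICWB runtime; the
# per-layout patch count is a hypothetical assumption.
SECONDS_PER_PATCH = 34
PATCHES_PER_LAYOUT = 10_000   # assumed decomposition of a full design

total_seconds = SECONDS_PER_PATCH * PATCHES_PER_LAYOUT
total_days = total_seconds / 86_400   # seconds per day
print(f"{total_days:.1f} days")       # ≈ 3.9 days, consistent with "around 4 days"
```

Under this assumed patch count, a single full-layout pass already approaches four days, which is why the iterative trial-and-error OPC loop becomes impractical at scale.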

Recent progress in image-to-image translation techniques makes them suitable for tackling the above-mentioned lithography simulation (i.e., Layout-to-SEM) and photomask optimization (i.e., SEM-to-Layout) problems. However, these two issues are more complicated than general image-to-image translation problems. Take Layout-to-SEM prediction for example. First of all, the domain of IC layout images and the domain of SEM images are heterogeneous. An IC layout is a purely man-made blueprint containing only lines and rectangles, and hence it is noise-free and artifact-free. By contrast, an SEM image is formed from the intensity of the signal detected while raster-scanning the IC surface with a focused electron beam. In addition to the continuous shape distortions introduced by the lithography and etching processes, the SEM imaging process itself also suffers from several kinds of interference (e.g., scan-line noise and shading). This places SEM images in a domain significantly different from that of layout images. Hence, this issue is essentially a cross-domain image matching and translation problem. Second, in order to predict the corresponding SEM image from an IC layout, our solution must be capable of characterizing the shape correspondence between these two domains of images. This raises an unsupervised cross-domain image matching problem, which is usually not addressed by general image-to-image translation techniques and thus requires a more sophisticated solution, as noted in [1, 39]. Third, for the mask optimization problem, it is very costly to collect a comprehensive set of ground-truth OPC-corrected photomasks, making the supervised training of a photomask optimization network infeasible.

Fig. 2: Proposed framework composed of two data-driven CNN networks: LithoNet and OPCNet. LithoNet is a data-driven simulator of the lithography and etching processes in IC fabrication. OPCNet is a mask optimization network using prediction results of LithoNet as supervision for the purpose of optical proximity correction.

To address the above problems, as illustrated in Fig. 2, we propose a fully data-driven framework that comprises two CNN-based modules, namely, LithoNet and OPCNet, functionally complementary to each other. In short, LithoNet is a cross-domain simulator of the lithography and etching processes in IC fabrication, and OPCNet is a self-supervised mask optimization CNN using the prediction results of LithoNet as supervision for the purpose of OPC.

This paper has four primary contributions:

  • To the best of our knowledge, we are the first to formulate the Layout-to-SEM deformation prediction problem as a cross-domain image translation and correspondence problem, and we propose a two-step CNN-based framework to address it.

  • The proposed LithoNet-OPCNet system is computationally much more efficient than typical optics-based contour simulation schemes, while achieving comparable prediction accuracy. Therefore, our method could enable IC fabrication plants to run a full, large-scale screening of new IC layout designs. Note that standard OPC approaches rely on sophisticated design rules and design patterns already in the database, and thus can only examine a limited number of areas at a time.

  • The proposed LithoNet is parameterized by fabrication settings. Hence, it can also predict results under different fabrication conditions so as to assist fabrication plants in finding the most suitable working intervals of the parameters, thereby benefiting yield-rate improvement.

  • The proposed OPCNet overcomes the lack of ground-truth mask patterns. With the aid of a novel training objective called the I/O-consistency loss, OPCNet can well simulate the mask optimization process in collaboration with LithoNet.

The remainder of this paper is organized as follows. We review related literature in Section II. The proposed LithoNet and OPCNet are detailed in Sections III and IV, respectively. Section V demonstrates and discusses our experimental results. Finally, we draw our conclusions in Section VI.

II Related Work

II-A Virtual Metrology

In IC fabrication, virtual metrology (VM) refers to methods for predicting wafer properties based on fabrication parameters and equipment sensor data, without performing physical measurements on a product wafer produced by a whole costly fabrication process [10]. Since VM techniques can significantly reduce the cost of IC fabrication, various kinds of VM methods have been proposed to address fabrication quality prediction issues. For example, regression-based VM methods were developed for predicting the average silicon nitride cap layer thickness, as surveyed in [23]. Specifically, Susto et al. exploited the knowledge collected in the process steps to improve the accuracy of VM prediction via a multi-step strategy [27]. In addition, the demand for VM methods has also triggered the development of theoretical techniques. The method proposed in [22], for instance, focused on the OPC mask design problem and modeled it as an inverse problem of optical microlithography. Optical lithography is a process used for transferring binary circuit patterns onto silicon wafers, and related discussions of lithography techniques can be found in [21]. Recently, there have been attempts to integrate machine learning methods with IC implementation and VM [34, 36, 13, 4, 35, 38]. For example, Yang et al. proposed a generative adversarial network (GAN) based [7] inverse method to estimate the optimal mask used in the fabrication process from an OPC simulation result [35]. However, Yang et al.'s design concentrates only on the OPC-to-Layout problem, which operates in the opposite direction of our Layout-to-SEM prediction. Therefore, to the best of our knowledge, no existing technique focuses simultaneously on both the Layout-to-SEM (lithography simulation) and SEM-to-Layout (mask optimization) image translation problems. We deem that a hybrid of image-to-image translation and feature mapping techniques could compose a naive solution to these two prediction problems.

II-B Lithography Simulation

Recently, lithography simulation methods have been developed based on machine learning techniques. For instance, Watanabe et al. proposed a fast and accurate lithography simulation by determining an appropriate model function via a CNN [34], and Ye et al. developed a GAN-based end-to-end lithography modeling framework, named LithoGAN, to directly map the input mask pattern to the output resist pattern [36]. Specifically, LithoGAN models the shape of the resist pattern with a conditional GAN (cGAN) and predicts the center location of the resist pattern via a CNN model. Like LithoGAN, our LithoNet also adopts a dual learning framework.

Fig. 3: Block diagram of the proposed two-step framework for cross-domain image-to-image matching and translation. The upper step adopts CycleGAN to transfer the training SEM images to another reference domain as ground-truth binary labels. The lower LithoNet step estimates the deformation maps between the input layout patterns and their corresponding ground-truth binary labels.

As will be detailed in Section III, we formulate the Layout-to-SEM prediction as a cross-domain image-to-image translation problem in the LithoNet design. Recent image-to-image translation methods can be divided into two groups: one requires pairwise training images, e.g., [11, 32], and the other supports training on unpaired data, e.g., [15]. The method proposed in [15] was based on GANs [7] and VAEs [14], but it was designed for unsupervised image-to-image translation tasks and can be considered a conditional image generation model. Furthermore, the Pix2pix model [11] consists of a U-net-architectured generator and a “PatchGAN” discriminator. Pix2pix uses the “PatchGAN” discriminator to model high frequencies by classifying whether each patch in an image is real or fake. It can therefore be adopted in various applications, such as converting a cartoon map into a satellite image or a sketch into a natural image, and has become a benchmark in this field. The Pix2pix method was further enhanced in [32] by taking advantage of a coarse-to-fine generator, a multi-scale discriminator, and a robust adversarial learning objective so as to generate high-resolution photo-realistic images. However, none of the above methods addresses the shape correspondence or the deformation field between two different domains of images, and neither do other representative image-to-image translation methods, such as CycleGAN [40], DualGAN [37], Coupled GANs [16], and [15, 3, 9].

However, existing image-to-image translation methods are usually inappropriate for the Layout-to-SEM translation problem in the IC-fabrication VM setting. Because characterizing the deviations of the metal lines in a product IC from their source layouts is a critical concern in the IC industry, traditional image-to-image translation methods, which lack a mechanism for precisely estimating the deformation field or shape correspondence between the layout and SEM images, are not applicable to this problem. To serve this purpose, the proposed LithoNet performs cross-domain image-to-image translation by learning the shape correspondence between paired training images, and outputs a predicted deformation map for further VM applications.

II-C Mask Optimization

There also exist machine learning-based mask optimization approaches. For example, the GAN-OPC method proposed in [35] takes source layout patterns and their OPC simulation results as training inputs and, for an input layout design, predicts a corrected photomask that minimizes the deviation of the (simulated) fabricated circuit shape from its original design. In order to facilitate the training process and guarantee convergence, GAN-OPC involves a pre-training procedure that jointly trains the neural network and the inverse lithography technique (ILT) [6]. After GAN-OPC converges, the obtained quasi-optimal photomask is further used as a good-enough initialization for further ILT operations. In addition, Yu et al.'s method can simultaneously perform sub-resolution assist feature (SRAF) insertion [5] and edge-based OPC with a DNN framework [38]. However, both methods require a collection of photomask images, such as those suggested by OPC or gathered from historical fabrication data, as a ground-truth training dataset. Because it is expensive and time-consuming to collect qualified mask images, the cardinality of the training dataset forms a performance bottleneck for these two methods. To eliminate such a bottleneck, powered by LithoNet, we propose the OPCNet model for mask optimization. Because OPCNet and LithoNet are inverse functions of each other, OPCNet can be trained directly on the SEM-styled images predicted by LithoNet, without the need for expensive photomask patterns, as will be elaborated later.

III LithoNet: A CNN-Based Simulator of Lithography

Fig. 4: Two heterogeneous domains of images. (a) Layout designs. (b) SEM images.

As illustrated in Fig. 3, the proposed LithoNet consists of a CycleGAN-based [40] domain transfer network and a deformation prediction network. LithoNet is designed to learn how an IC wafer fabrication process deforms the shape contours of a layout pattern. It can thus simulate the fabrication process to predict the resulting shape deformation for further virtual metrology applications, based on i) a given layout and ii) a set of fabrication parameters. One major difficulty in learning the deformation model between a layout pattern and the corresponding SEM image of its fabricated circuitry lies in the fact that they come from heterogeneous domains. Specifically, an SEM image is a high-resolution, 8-bit, grayscale image with a deep DOF (depth of field), whereas a layout is no more than a man-made binary pattern containing only rectangular regional objects. As a result, the goal of LithoNet is to predict the contour shapes by learning the pixel-wise shape correspondence between each paired layout and SEM image. Nevertheless, due to the poor contrast and scanning-pattern noise in SEM images, it is usually difficult to extract edge contours correctly from them, and a one-pixel drift corresponds to a nanometer-scale displacement on real IC products. Therefore, it is beneficial to transfer SEM images to an intermediate domain free of the above-mentioned contrast and noise problems.

To this end, we propose a two-step framework. In the first step, we use CycleGAN [40] to transfer a gray-scale SEM image to an intermediate domain, where images have SEM-styled shape contours and a layout-styled clean background. Then, in the second step, given a source layout along with fabrication parameters, LithoNet predicts the shape deformation due to the fabrication process. In sum, Step I learns to remove the difference between an SEM image and its man-made binary shape so that Step II can learn the shape correspondence between the SEM image and its original layout. In the following subsections, we introduce our design in detail.

III-A Step I: Image Domain Transfer

Because SEM and layout images belong to heterogeneous domains (styles), as demonstrated in Fig. 4, we adopt an image domain transfer technique to align their domains. By removing the interference introduced by the SEM imaging process, such as brightness/contrast bias and scan-line noise in the background, via CycleGAN [40], the processed SEM image can be regarded as lying in the same domain as the layout. That is, the processed SEM image retains its curvilinear shape boundaries yet is binarized as if it were a layout.

To this end, we train CycleGAN using i) a set of product ICs' SEM images and ii) their associated segmentation masks. The second set of images can be derived by applying advanced thresholding [26, 19], interactive segmentation [18, 31], or pseudo-background subtraction [2] to the source SEM images. Note that, in order to guarantee the performance of domain transfer, segmentation masks with incorrect segmentation results are discarded under user supervision. Finally, we utilize the well-trained CycleGAN to transfer source SEM images into the layout style, and these processed SEM images are further taken as reference ground-truths to train LithoNet in Step II.

Employing CycleGAN for domain transfer has two advantages. First, CycleGAN is an unpaired image-to-image translation method, and hence it can learn, from a large collection of segmentation results produced by different methods, the majority decision of many image segmentation algorithms for SEM images, including the analysis software provided by the SEM vendor. Second, because it utilizes a U-net generator to translate images, CycleGAN is essentially a U-net-based segmentation method [24] supervised by its built-in discriminator through an adversarial loss, thereby yielding a more reliable segmentation result than U-net itself, a state-of-the-art segmentation benchmark. Additionally, we can simply discard the rare unreliable CycleGAN segmentation results by quick human inspection to prevent LithoNet from learning incorrect contour correspondences.

III-B Step II: Shape Deformation Prediction

To learn the shape correspondence and the deformation field between SEM and layout images, LithoNet is trained on image pairs, each containing a layout and a ground-truth segmentation mask (i.e., a processed SEM image) generated in Step I, as described in Section III-A.

As shown in Fig. 3, LithoNet consists of a generator and a warping module. The generator is a U-net-like [24] network that outputs a 2D dense correspondence map depicting the deformation field between the paired training images. Then, using the sampling strategy of the spatial transformer network (STN) [12], the warping module synthesizes a warped version of the given input layout, based on the deformation map, to simulate a wafer-fabricated circuitry. STN is a differentiable module designed to enable neural networks to actively and spatially transform feature maps so that network models can learn invariance to translation, scale, rotation, and warping. Consequently, we adopt the sampling strategy of STN in our LithoNet.

In contrast with common image generation networks like [11, 30], the advantages of LithoNet are twofold. First, LithoNet can generate and visualize a predicted deformation field, so what has been learned by the network, i.e., the shape correspondence between input training image pairs, can be verified straightforwardly. Second, based on the visualized deformation field, it becomes easier to identify possible impacts (e.g., defects), whether global or local, of the layout and the fabrication configuration parameters on the physical appearance of an IC's metal layer. Concisely, the deformation field generated by LithoNet helps clarify both global and local shape correspondences between a layout and the SEM image of its product IC.

III-C Training Loss Functions

The training loss function of LithoNet is primarily defined in the following form:

$$\mathcal{L} = \mathcal{L}_{\mathrm{rec}} + \lambda_{1}\mathcal{L}_{\mathrm{tv}} + \lambda_{2}\mathcal{L}_{\mathrm{sm}} + \lambda_{3}\mathcal{L}_{\mathrm{reg}} + \lambda_{4}\mathcal{L}_{\mathrm{par}}, \tag{1}$$

where $\mathcal{L}_{\mathrm{rec}}$ denotes the reconstruction loss that measures the dissimilarity between the training ground-truth $\hat{y}$ and the synthetic SEM-styled image $\tilde{y}$. Meanwhile, $\mathcal{L}_{\mathrm{tv}}$ measures the variability difference between a paired training image pair, and $\mathcal{L}_{\mathrm{sm}}$ guarantees the smoothness of the deformation map $\mathbf{D}$. Finally, $\mathcal{L}_{\mathrm{reg}}$ is used to penalize large displacements on the deformation map, and $\mathcal{L}_{\mathrm{par}}$ is the regression loss of the fabrication parameters.

A) Reconstruction Loss:

The reconstruction loss term $\mathcal{L}_{\mathrm{rec}}$ is defined as the $\ell_1$ loss between the training ground-truth $\hat{y}$ and the synthetic SEM-styled image $\tilde{y}$ as follows:

$$\mathcal{L}_{\mathrm{rec}} = \frac{1}{N}\sum_{\mathbf{p}}\big\|\hat{y}(\mathbf{p}) - \tilde{y}(\mathbf{p})\big\|_{1}, \tag{2}$$

where $N$ denotes the number of pixels. We derive $\tilde{y}$ by the following steps: i) sampling densely the pixel positions $\mathbf{p}$ on the to-be-generated $\tilde{y}$; ii) locating their correspondences on the input layout $x$ according to the deformation map $\mathbf{D}$, which records the mapping of pixels on $\tilde{y}$ onto their counterparts on $x$; iii) using backward interpolation to estimate the sampled pixel values on $\tilde{y}$, i.e., $x(\mathbf{p}+\mathbf{D}(\mathbf{p}))$ at non-integer positions; and finally, iv) generating the estimated $\tilde{y}$ via bilinear interpolation over the four integer neighbors, obtained with the ceiling $\lceil\cdot\rceil$ and floor $\lfloor\cdot\rfloor$ functions, of each non-integer sampling position, so as to calculate $\mathcal{L}_{\mathrm{rec}}$.
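The backward-warping procedure above can be sketched in NumPy as follows. This is a minimal illustration of STN-style bilinear sampling plus the $\ell_1$ reconstruction loss, not the actual LithoNet implementation; the array shapes and function names are our own:

```python
import numpy as np

def warp_backward(x, flow):
    """Backward-warp image x by a deformation map flow of shape (H, W, 2):
    each output pixel p samples x at the non-integer position p + flow(p),
    resolved by bilinear interpolation over its four floor/ceil neighbors."""
    H, W = x.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    sy = np.clip(ys + flow[..., 0], 0, H - 1)   # source rows (float)
    sx = np.clip(xs + flow[..., 1], 0, W - 1)   # source cols (float)
    y0, x0 = np.floor(sy).astype(int), np.floor(sx).astype(int)
    y1, x1 = np.clip(y0 + 1, 0, H - 1), np.clip(x0 + 1, 0, W - 1)
    wy, wx = sy - y0, sx - x0                   # bilinear weights
    return ((1 - wy) * (1 - wx) * x[y0, x0] + (1 - wy) * wx * x[y0, x1]
            + wy * (1 - wx) * x[y1, x0] + wy * wx * x[y1, x1])

def reconstruction_loss(y_gt, y_syn):
    """Mean absolute (L1) difference between ground truth and synthesis."""
    return np.mean(np.abs(y_gt - y_syn))

layout = np.zeros((8, 8)); layout[2:6, 2:6] = 1.0   # toy rectangular wire
flow = np.zeros((8, 8, 2))                          # identity deformation
warped = warp_backward(layout, flow)
assert reconstruction_loss(layout, warped) == 0.0   # identity warp reproduces x
```

With a zero deformation map the warp is the identity, so the reconstruction loss vanishes; a learned, nonzero map would shift and bend the rectangle's contours as the fabrication process does.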

B) Total Variation Loss:

The total variation loss $\mathcal{L}_{\mathrm{tv}}$ is defined as the total variation [25] of the signed difference between the synthesized image $\tilde{y}$ and the ground-truth image $\hat{y}$, that is,

$$\mathcal{L}_{\mathrm{tv}} = \sum_{\mathbf{p}}\big\|\nabla\big(\tilde{y}(\mathbf{p}) - \hat{y}(\mathbf{p})\big)\big\|_{1}. \tag{3}$$

This term is designed to align the shape contours of $\tilde{y}$ with those of $\hat{y}$. Without this term, the loss function might be dominated by the reconstruction loss described in (2), and consequently LithoNet would generate a bizarre synthetic image that has a high overlap ratio with the ground-truth image but unnaturally jiggling contours. In other words, $\mathcal{L}_{\mathrm{tv}}$ aims to retain the shape similarity.
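A minimal sketch of such a total variation penalty on the signed difference image, assuming the common anisotropic ($\ell_1$-gradient) form:

```python
import numpy as np

def tv_loss(diff):
    """Anisotropic total variation of a signed difference image:
    the sum of absolute vertical and horizontal finite differences."""
    dy = np.abs(np.diff(diff, axis=0)).sum()
    dx = np.abs(np.diff(diff, axis=1)).sum()
    return dy + dx

y_hat = np.zeros((6, 6)); y_hat[2:4, 2:4] = 1.0   # toy ground-truth shape
assert tv_loss(y_hat - y_hat) == 0.0              # identical contours: no penalty
jiggly = y_hat.copy(); jiggly[2, 4] = 1.0         # one ragged extra pixel
assert tv_loss(jiggly - y_hat) > 0.0              # jiggling contour is penalized
```

A single stray contour pixel leaves the overlap ratio nearly unchanged yet incurs a positive TV penalty, which is exactly the failure mode this term guards against.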

C) Smoothness Loss:

The smoothness loss $\mathcal{L}_{\mathrm{sm}}$ is a penalty term defined as the $\ell_1$-norm of the weighted gradient of the deformation map:

$$\mathcal{L}_{\mathrm{sm}} = \big\|\mathbf{M} \odot \nabla\mathbf{D}\big\|_{1}, \tag{4}$$

where $\odot$ denotes the Hadamard product, and $\mathbf{M}$ is an edge-aware weighting matrix defined as

$$\mathbf{M} = \exp\big(-\big(|\nabla x| + |\nabla\hat{y}|\big)\big). \tag{5}$$

Note that contour edges on the input layout $x$ and the ground-truth layout-styled SEM image $\hat{y}$ result in discontinuities in the deformation map $\mathbf{D}$. Because such discontinuities would contribute an unnecessary smoothness penalty, they should be suppressed appropriately according to the gradient information of both the layout and SEM images.
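The edge-aware smoothness penalty can be sketched as below. The exponential weighting is an assumed instantiation standing in for the paper's edge-aware matrix, and all names are our own:

```python
import numpy as np

def grad_mag(img):
    """First-order absolute gradient magnitudes, padded to input size."""
    gy = np.abs(np.diff(img, axis=0, append=img[-1:, :]))
    gx = np.abs(np.diff(img, axis=1, append=img[:, -1:]))
    return gy + gx

def smoothness_loss(flow, layout, sem_gt):
    """L1 norm of the deformation-map gradient, down-weighted (via an
    assumed exponential weighting) wherever layout or SEM contours lie,
    so that legitimate edge discontinuities are penalized less."""
    weight = np.exp(-(grad_mag(layout) + grad_mag(sem_gt)))  # edge-aware M
    g = grad_mag(flow[..., 0]) + grad_mag(flow[..., 1])
    return np.sum(weight * g)

# Toy check: a flow discontinuity aligned with a contour costs less
# than the same discontinuity in a flat region.
layout = np.zeros((6, 6)); layout[:, :3] = 1.0     # vertical contour edge
sem = layout.copy()
flow_on = np.zeros((6, 6, 2)); flow_on[:, :3, 0] = 1.0   # jump on the edge
flow_off = np.zeros((6, 6, 2)); flow_off[:, :5, 0] = 1.0 # jump off the edge
assert smoothness_loss(flow_on, layout, sem) < smoothness_loss(flow_off, layout, sem)
```

The toy check shows the intended behavior: identical flow jumps are penalized far less when they coincide with image contours, which is what the weighting matrix is for.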

D) Regularization Loss:

The regularization loss $\mathcal{L}_{\mathrm{reg}}$ is defined as the $\ell_2$-norm of the deformation map $\mathbf{D}$:

$$\mathcal{L}_{\mathrm{reg}} = \big\|\mathbf{D}\big\|_{2}. \tag{6}$$

This term reflects the fact that the deformation caused by wafer fabrication tends to be small, as will be discussed in Section V-C2.

E) Regression Loss for Fabrication Parameters:

Because the configuration parameters of a fabrication process are continuous variables that influence the physical appearance of a wafer layer, we formulate the relationship between the fabrication parameters and the appearance of the wafer layer as a regression problem. The regression loss is defined as

$$\mathcal{L}_{\mathrm{par}} = \big\|P(\hat{y}) - \hat{v}\big\|_{2}^{2} + \big\|P(\tilde{y}) - v\big\|_{2}^{2}, \tag{7}$$

where $\hat{y}$ is the reference IC shape segmented from the ground-truth SEM image used for training; $\hat{v}$ is the fabrication parameter vector corresponding to $\hat{y}$; $x$ and $v$ respectively denote the input layout and the input fabrication parameter vector for prediction; and $\tilde{y}$ is the predicted deformed IC shape. Therefore, this loss term aims to train i) a generator able to predict a synthesized SEM-styled image $\tilde{y}$ based on the given $x$ and $v$, and ii) a discriminator $P$ able to estimate the fabrication parameter vector associated with $\hat{y}$.
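At its core, the regression term compares an estimated fabrication parameter vector against its reference. A minimal squared-$\ell_2$ sketch (the function name and exact form are illustrative assumptions, not the paper's definition):

```python
import numpy as np

def parameter_regression_loss(v_pred, v_true):
    """Squared-L2 penalty between an estimated fabrication parameter
    vector and its reference vector (assumed form of the regression term)."""
    v_pred = np.asarray(v_pred, dtype=float)
    v_true = np.asarray(v_true, dtype=float)
    return float(np.sum((v_pred - v_true) ** 2))

# Toy normalized fabrication settings: zero loss only on exact recovery.
assert parameter_regression_loss([0.5, 0.5], [0.5, 0.5]) == 0.0
assert parameter_regression_loss([0.8, 0.5], [0.5, 0.5]) > 0.0
```

In training, such a penalty is applied to the parameter vectors recovered by the estimator from both the ground-truth and the synthesized images.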

IV OPCNet: A CNN-Based Photomask Corrector Based on LithoNet

As described in Section II-C, the major challenge in developing a learning-based mask optimizer is collecting a comprehensive amount of ground-truth mask data corresponding to various layout patterns, e.g., well OPC-corrected photomasks leading to the desired shapes of fabricated circuitry. This is, however, very costly and time-consuming. To overcome this difficulty, as shown in Fig. 2, we utilize a pre-trained LithoNet as an auxiliary module to train our photomask optimizer, OPCNet. Given an IC layout pattern, OPCNet aims to predict an OPC-corrected mask pattern such that, after being deformed by the lithography and etching processes simulated by LithoNet, the predicted deformed shape is as close as possible to the original layout pattern. Therefore, OPCNet can be regarded as the inverse model of LithoNet. As a result, for a desired layout pattern, we can use its LithoNet-predicted output as the input of OPCNet, and the desired layout itself as the corresponding output of OPCNet. Given a collection of such input-output pairs, we can train OPCNet without collecting “ground-truth” OPC-corrected photomask patterns.

Specifically, given a layout design pattern $x$, OPCNet aims to generate a photomask $\hat{M}$ whose lithography and etching simulation result $\tilde{y}$, predicted by LithoNet, best matches $x$. This design makes our OPCNet “ground-truth-free” during the training stage, provided that LithoNet has already been well trained. In addition, with the input-output consistency loss, which measures the dissimilarity between a layout design pattern and its lithography simulation result, OPCNet becomes a self-supervised learning method. The whole pipeline of our mask optimization method is illustrated in Fig. 2. Note that i) the pre-trained LithoNet is fixed while training OPCNet, and ii) OPCNet is intrinsically a generator translating a layout pattern into its optimal photomask based on the wafer fabrication model learned by LithoNet.
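The self-supervised idea, i.e., optimizing a corrector through a frozen forward simulator using only an I/O-consistency objective, can be demonstrated on a deliberately tiny stand-in model. The attenuation "litho" function and the single learnable gain below are toy assumptions, not the paper's networks:

```python
import numpy as np

# Toy analogue of OPCNet training: a frozen "litho" model (elementwise
# attenuation) stands in for the pre-trained LithoNet, and one learnable
# gain w stands in for OPCNet. We minimize the I/O-consistency loss
# ||x - litho(w * x)||^2 by gradient descent; no ground-truth mask is used.
rng = np.random.default_rng(0)
attenuation = 0.7                       # frozen "fabrication" shrinkage
x = rng.uniform(0.0, 1.0, size=64)      # toy layout intensities

def litho(mask):
    """Frozen forward simulator: uniform shrinkage of the mask."""
    return attenuation * mask

w = 1.0                                 # learnable "mask correction" gain
lr = 0.5
for _ in range(200):
    y = litho(w * x)                                   # simulated shape
    grad = np.mean(2.0 * (y - x) * attenuation * x)    # d loss / d w
    w -= lr * grad

# The corrector learns to pre-compensate the shrinkage: w -> 1/attenuation.
assert abs(w - 1.0 / attenuation) < 1e-3
```

Even in this one-parameter toy, the corrector converges to the inverse of the frozen forward model purely from the consistency between its input and the simulator's output, which is the mechanism OPCNet relies on at full scale.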

IV-A Training Loss Functions for OPCNet

The overall training loss of OPCNet is defined as

$$\mathcal{L}_{\mathrm{OPC}} = \mathcal{L}_{\mathrm{io}} + \gamma_{1}\mathcal{L}_{\mathrm{tv}} + \gamma_{2}\mathcal{L}_{\mathrm{msm}}, \tag{8}$$

where $\mathcal{L}_{\mathrm{io}}$ denotes the input-output consistency loss measuring the dissimilarity between the input layout $x$ and LithoNet's output $\tilde{y}$, $\mathcal{L}_{\mathrm{tv}}$ represents the total variation loss on the difference between $x$ and $\tilde{y}$, and $\mathcal{L}_{\mathrm{msm}}$ denotes the mask smoothness loss ensuring the smoothness of the obtained photomask pattern $\hat{M}$.

A) Input-Output Consistency Loss:

The input-output consistency loss aims to guide the learning of OPCNet so that the shape $\tilde{y}$ predicted by LithoNet best matches the desired input layout $x$, provided that the source layout is OPC-corrected by the learned OPCNet. The loss term is defined as follows:

$$\mathcal{L}_{\mathrm{io}} = \frac{1}{N}\sum_{\mathbf{p}}\big\|x(\mathbf{p}) - \tilde{y}(\mathbf{p})\big\|_{1}, \tag{9}$$

where $N$ denotes the number of pixels.

B) Total Variation Loss:

Similar to (3), the total variation loss is defined as the total variation of the signed difference between the input layout $x$ and the prediction $\tilde{y}$ of LithoNet:

$$\mathcal{L}_{\mathrm{tv}} = \sum_{\mathbf{p}}\big\|\nabla\big(x(\mathbf{p}) - \tilde{y}(\mathbf{p})\big)\big\|_{1}, \tag{10}$$

which is again an empirical term used to avoid unnatural patterns on the predicted shapes. $\mathcal{L}_{\mathrm{tv}}$ prevents the objective from being dominated by the I/O-consistency loss $\mathcal{L}_{\mathrm{io}}$. Without this term, OPCNet may produce an unnatural correction.

C) Mask Smoothness Loss:

The mask smoothness loss is defined as the $\ell_1$-norm of the gradient of the mask prediction, that is,

$$\mathcal{L}_{\mathrm{msm}} = \big\|\nabla\hat{M}\big\|_{1}. \tag{11}$$

This term penalizes discontinuities on the corrected photomask to guarantee the smoothness of the shape contours of $\hat{M}$. Note that $\mathcal{L}_{\mathrm{msm}}$ does not incorporate an edge-aware weighting matrix, since there are no ground-truth masks defining true contour edges in the training dataset.
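A minimal sketch of this unweighted mask-gradient penalty, assuming the anisotropic form used for the earlier TV terms:

```python
import numpy as np

def mask_smoothness_loss(mask):
    """Unweighted L1 norm of the photomask gradient. Unlike the
    edge-aware smoothness term of LithoNet, no weighting matrix is
    applied, since no ground-truth mask contours are available."""
    gy = np.abs(np.diff(mask, axis=0)).sum()
    gx = np.abs(np.diff(mask, axis=1)).sum()
    return gy + gx

smooth = np.ones((5, 5))                          # flat mask region
noisy = np.ones((5, 5)); noisy[::2, ::2] = 0.0    # checkered artifacts
assert mask_smoothness_loss(smooth) == 0.0
assert mask_smoothness_loss(noisy) > mask_smoothness_loss(smooth)
```

Fragmented, checkered mask artifacts accumulate a large gradient penalty, steering OPCNet toward masks with smooth, manufacturable contours.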

V Experimental Results

Fig. 5: Comparison between the segmentation masks obtained by CycleGAN [40] trained on UMC dataset #1 and traditional Otsu thresholding.
Fig. 6: Comparison of the input layout patterns, predicted deformation maps, the predictions of fabricated IC shapes based on the deformation maps, and the ground-truths of fabricated IC shapes extracted from their associated SEM images.
Fig. 7: Prediction results by LithoNet trained on UMC dataset #1 without the smoothness loss term.

V-a Dataset and Settings

Images demonstrated in this paper are selected from two datasets provided by United Microelectronics Corporation (UMC). Both UMC datasets consist of image pairs, each containing one layout image patch and the SEM image patch of its product wafer. UMC dataset #1 contains SEM images taken from wafers fabricated with the same fabrication parameters, whereas UMC dataset #2 contains SEM images taken from wafers fabricated with seven different normalized parameter settings. In total, UMC dataset #1 contains (i) a 942-pair training subset and (ii) a 100-pair blind testing subset, whereas UMC dataset #2 contains (i) a 7,399-pair training subset (1,057 layouts, each paired with 7 different settings) and (ii) another subset for blind testing. All images in the blind testing set are collected from historical fabrication data; compared with those in the training sets, the blind test images are of much larger dimension and contain unseen design patterns. We trained CycleGAN for style transfer in Step I on UMC dataset #1, and LithoNet on UMC datasets #1 and #2. As for OPCNet, it was trained on paired data, each pair containing (i) a layout image from the first dataset and (ii) its fabricated IC shape predicted by feeding that layout into a pre-trained LithoNet. As a result, OPCNet can be trained in an unsupervised manner. In our experiments, all image patches are downscaled to reduce the computational complexity. The five loss terms described in (1) are weighted empirically.
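The unsupervised pairing described above can be sketched as follows, where `lithonet` is a placeholder callable standing in for the pre-trained network:

```python
import numpy as np

def build_opcnet_training_pairs(layouts, lithonet):
    """Build OPCNet training pairs without ground-truth OPC masks: each
    layout is paired with the fabricated-shape prediction obtained by
    running it through a pre-trained (and frozen) LithoNet."""
    return [(layout, lithonet(layout)) for layout in layouts]
```

Because the "targets" are themselves LithoNet predictions, no physically measured OPC-corrected masks are ever needed during OPCNet training.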

V-B Performance Metrics

The performance of our model is evaluated objectively in terms of widely-used similarity metrics, including Intersection over Union (IOU), SSIM [33], and the per-pixel error rate. We will demonstrate in detail that our model outperforms other image-to-image translation methods and the standard OPC approach.
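Two of these metrics have simple closed forms on binary masks; a reference implementation (SSIM is omitted, as it is considerably more involved):

```python
import numpy as np

def iou(pred, gt):
    """Intersection over Union of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0  # two empty masks agree perfectly

def pixel_error_rate(pred, gt):
    """Fraction of pixels where the two binary masks disagree."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    return float(np.mean(pred != gt))
```

Both metrics are applied here to segmented (binarized) circuit shapes, so a prediction and its ground truth are compared pixel by pixel.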

V-C LithoNet

V-C1 Image domain transfer

In Fig. 5, we compare our image domain transfer results with images derived by the traditional Otsu's method [19]. Obviously, the source SEM images contain typical complications of the SEM imaging process, such as brightness/contrast bias probably due to gain shift, and scanning-pattern noise. It is thus difficult for common methods to threshold an SEM image appropriately. By exploiting a well-trained translator, e.g., CycleGAN [40], an SEM image can be transferred into a layout-styled format with its contour shapes kept unchanged.

V-C2 Prediction Results

Fig. 6 illustrates the deformation maps predicted from the input layouts, the predictions of fabricated IC shapes based on the deformation maps, and the corresponding ground-truths of fabricated IC shapes extracted from their associated SEM images. The deformation maps show that LithoNet successfully learns to widen lines within open areas and to condense lines otherwise. Because such information is key to metrology applications, such as the layout scoring and OPC simulation described in Fig. 1, this experiment also demonstrates that LithoNet can bridge computer vision techniques with the fields of semiconductor manufacturing and computer-aided design.

V-C3 Ablation Study of Loss Terms

Here we examine and discuss the effectiveness of the individual loss terms in (1). First, we made numerical comparisons among different loss settings in Table I and Table II, each of which corresponds to a different dataset. The results shown in Table I were derived by LithoNet trained on UMC dataset #1, whereas Table II shows the performance of LithoNet trained on a small subset of UMC dataset #1 containing 480 training patches (obtained from 16 image samples through data augmentation). From Tables I and II, we observe that the total-variation loss contributes significantly to the performance improvement. Moreover, the smoothness loss is beneficial to the objective performance when only a very limited amount of training samples is provided, as shown in Table II. On the contrary, as listed in Table I, it contributes less to the objective performance when a comprehensive enough training dataset is given. We demonstrate the SEM-styled images predicted by the model trained on the small dataset without the smoothness loss in Fig. 7, where unexpected artifacts are highlighted in red rectangles. This experiment set shows the necessity of the smoothness loss, especially in cases of a small training set.

The visual effect brought by the total-variation loss is demonstrated in Fig. 8, where the "Baseline" column shows images derived without the total-variation loss and the "Full" column shows predictions synthesized with the full loss. This experiment set shows how the total-variation loss improves the visual quality of synthesized SEM-styled images. Take the regions highlighted by red rectangles in Fig. 8 for example. Without this loss, LithoNet tends to produce straight-line edges and sharp corners, although there are no such patterns in the training images produced by a real IC fabrication process, as shown in the "Ground truth" column. By adding the total-variation loss to the total loss function, such artifacts can be largely mitigated, thereby more faithfully predicting the shapes of segmented SEM images.

Fig. 8: Subjective visual quality comparison of LithoNet with and without the total-variation loss, where the "Baseline" column shows images derived without it and the "Full" column shows predictions synthesized with the full loss.
Fig. 9: Subjective visual quality comparison between Pix2pix and the proposed LithoNet both trained on UMC dataset #1.
Fig. 10: Subjective visual quality comparison between Pix2pix and LithoNet, both trained on UMC dataset #1, for some unseen layout patterns of a different observation scale.

V-C4 Comparison with Pix2pix

As LithoNet is a kind of image-to-image translation scheme, we compare it with Pix2pix [11], a representative GAN-based image-to-image translation method. This experiment set was designed for two purposes: one is to verify whether LithoNet is able to learn the special shape correspondence between layout and SEM images, and the other is to check whether LithoNet is more advantageous than Pix2pix in this regard.

As shown in Table I, Pix2pix achieves slightly higher objective metric values than LithoNet. This, however, stems from the fact that these objective metrics mainly reflect the effect of the reconstruction loss term alone. Compared to Pix2pix, our total loss function described in (1) contains several additional loss terms, including the total-variation, smoothness, and regularization losses, which actually lead to better visual quality, as explained later.

Fig. 11: Predictions by LithoNet trained on UMC dataset #2 driven by different configuration parameter values for wafer fabrication. We focus on one configuration parameter that is inversely proportional to the degree of etching: the larger the parameter value, the lower the degree of etching, and the wider the metal lines. The parameter values used in the training dataset are colored black, whereas the values not used in training are colored red.
Fig. 12: Illustrations of interrelationship between the shapes of metal lines and their local neighborhood.
Fig. 13: Illustrations of border effects. At image borders, the shape deformations due to the lithography and etching processes behave differently from those in non-border regions.

As illustrated in Fig. 9, Pix2pix produces artifacts like blurred and jiggled contour edges, whereas LithoNet is able to generate clear and smooth ones. Since both Pix2pix and LithoNet utilize a pixel-wise norm to guarantee global shape similarity, this phenomenon is probably due to their different control strategies over local shapes. Specifically, LithoNet makes use of the total-variation loss, smoothness loss, and regularization loss to control local deformations, whereas Pix2pix relies on its discriminator architecture, the so-called PatchGAN design that penalizes structure at the scale of patches, to handle local deformations. Consequently, because PatchGAN does not put any penalty on blurred and jiggled edges and learns only to classify whether each generated patch looks realistic, such artifacts are reasonable trade-offs of Pix2pix's PatchGAN design.

Loss Avg IOU Avg SSIM Avg Error
Pix2pix 0.8868 0.8784 0.0361
-- 0.8846 0.8730 0.0371
- 0.8789 0.8658 0.0392
- 0.8849 0.8720 0.0368
(LithoNet) 0.8820 0.8701 0.0380
TABLE I: Ablation study of different loss settings on UMC dataset #1
Loss Avg IOU Avg SSIM Avg Error
-- 0.8419 0.8109 0.1556
- 0.8462 0.8155 0.1502
0.8506 0.8223 0.1445
0.8514 0.8208 0.1440
TABLE II: Ablation study of different loss settings on a small subset of UMC dataset #1
Method Avg IOU Avg SSIM Avg Error
Pix2pix 0.6587 0.6396 0.1358
LithoNet 0.7107 0.6906 0.1170
TABLE III: Comparison between LithoNet and Pix2pix, both trained on UMC dataset #1, for unseen layout patterns of a different scale

Fig. 10 compares the prediction results of LithoNet and Pix2pix fed with test images containing layout patterns significantly distinct from those in the training image set. Moreover, the source dimension of these test images is much larger than that of the training data. Therefore, through this experiment we can appraise the reliability and robustness of LithoNet and Pix2pix in mimicking an IC fabrication process when the input layout is a brand-new, unseen pattern of a different scale. We can observe from Fig. 10 that, for unseen layout patterns of a different scale, LithoNet significantly outperforms Pix2pix in terms of the clarity and integrity of shape boundaries, although the predictions of LithoNet still cannot perfectly match the ground truth for lack of suitable training samples. Finally, Table III lists the numerical comparison between LithoNet and Pix2pix for this case.

(a) (b)
Fig. 14: Prediction results of LithoNet: (a) Comparison between a layout and the prediction based on the layout, and (b) conceptual illustration of “Necking” and “Rounding” where the necking effects are highlighted by red boxes and arrows and the rounding effects are indicated by blue arrows.

V-C5 Fabrication parameters

Fig. 11 compares the predictions by LithoNet trained on UMC dataset #2 driven by different configuration parameter values for wafer fabrication. We focus on one normalized configuration parameter that is inversely proportional to the degree of etching: the larger the parameter value, the lower the degree of etching. The parameter values used in the training dataset are colored black, whereas the values not used in training are colored red. This experiment shows that the proposed LithoNet, thanks to the regression loss term described in (7), does learn the relationship between the line width and the fabrication parameter used to control the degree of etching. Concisely speaking, the larger the parameter, the wider the metal line. Hence, our LithoNet model is able to mimic the fabrication process and generate parameter-dependent prediction results. This is an important aspect of the LithoNet design and makes LithoNet suitable for semiconductor manufacturing simulations.
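One common way to feed such a scalar fabrication parameter into a convolutional network is to broadcast it as an extra constant input channel; whether LithoNet injects its latent parameter this way or through another conditioning mechanism is an assumption of this sketch:

```python
import numpy as np

def condition_on_parameter(layout, param):
    """Attach a normalized fabrication parameter to a layout patch by
    broadcasting it as a constant extra channel, yielding a 2 x H x W
    input tensor. This channel-concatenation scheme is illustrative,
    not necessarily LithoNet's actual injection mechanism."""
    layout = np.asarray(layout, dtype=np.float64)  # H x W layout patch
    h, w = layout.shape
    param_plane = np.full((h, w), float(param))    # constant parameter plane
    return np.stack([layout, param_plane], axis=0)
```

At inference time, sweeping `param` over values inside and outside the training range is exactly the experiment of Fig. 11: the network should widen or narrow the predicted metal lines accordingly.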

Fig. 15: Illustrations of masks predicted by the mask generator and their lithography simulation outputs.
Fig. 16: Mask prediction results and their lithography simulation outputs.

V-C6 Model generality

We examine here LithoNet's range of applicability. The image pair in the top row of Fig. 12 shows that, in an open area, the general fabrication process typically produces a metal line wider than its layout design, as highlighted by the red rectangle. The predicted image shown in the bottom row of Fig. 12 shows that LithoNet learns the shape correspondence between paired training images, so it predicts a wider line in an open area and a narrower one between two neighboring lines. In addition, the highlighted regions in Fig. 13 demonstrate that at image borders the predictions by LithoNet differ from the ground truths. This is because the shape deformations due to the lithography and etching processes behave differently at image borders than in non-border regions, yet LithoNet treats both uniformly. For example, LithoNet assumes a line reaching a patch border should extend into the adjacent patch rather than shrink from the border. Such border effects can be handled by collecting enough training data at image borders along with an additional label signifying whether a region is a border one. Consequently, LithoNet can be expected to forecast fabrication results as long as a large enough amount of training data is given.

Finally, we design another experiment to show that LithoNet can learn the "necking" and "rounding" effects that usually occur in IC fabrication, as highlighted by the red rectangles in Fig. 14(a) and indicated by the red and blue arrows in Fig. 14(b). Necking is a high-risk pattern caused by either a tip-to-line or a line-end too close to another line in the layout design. As illustrated in Fig. 14(b), such situations may result in a line narrower than designed after fabrication. Hence, this experiment set demonstrates again that a well-trained LithoNet is capable of mimicking the semiconductor lithography and etching procedures.


V-D OPCNet

V-D1 Impacts of Loss Functions

As described in Section IV, given a layout design pattern, OPCNet aims to generate a mask whose lithography simulation result, as predicted by LithoNet, best matches the input layout. OPCNet is controlled jointly by the I/O-consistency loss, the total-variation loss, and the mask smoothness loss. The former two loss terms measure the dissimilarity between the input layout and its simulation, whereas the third focuses on the smoothness of the predicted mask. We here examine how the total-variation and smoothness losses contribute to the mask prediction task.

Demonstrated in Fig. 15 are three columns of images, each corresponding to one loss setting. Comparing the mask predicted without the total-variation loss against the mask predicted with it, we find that this loss guarantees the quality of the shape contours in the lithography simulation. Whether in LithoNet or in OPCNet, the total-variation loss accounts for the difference between predicted contours and their ground truths and focuses on pixels around the contours. This term helps guarantee the similarity between the input layout and the lithography simulation and also avoids unexpected artifacts at contours. Finally, comparing the mask predicted without the smoothness loss against the mask predicted with it, we find that the smoothness loss can globally suppress unexpected artifacts in the predicted mask image. The mask prediction derived with the full loss described in (8) can thus be artifact-free and smooth.

V-D2 Mask Prediction Results

Finally, demonstrated in Fig. 16 are the masks predicted by OPCNet. Given a well-trained and accurate lithography simulator, LithoNet, Fig. 16 shows that our mask optimizer OPCNet can successfully perform the mask optimization task in a self-supervised manner without needing to collect ground-truth OPC-corrected masks. With OPCNet, a layout pattern can be adequately corrected so that, after an IC fabrication process, the resulting circuit shape best matches the source layout pattern.
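The closed-loop evaluation implied here can be sketched as follows, with `opcnet` and `lithonet` passed in as placeholder callables and the 0.9 acceptance threshold chosen purely for illustration:

```python
import numpy as np

def opc_inference_check(layout, opcnet, lithonet, thresh=0.9):
    """Closed-loop sanity check: correct the layout with OPCNet, simulate
    the corrected mask with LithoNet, and verify the simulated circuit
    shape matches the original layout via IOU."""
    mask = opcnet(layout)        # corrected photomask
    simulated = lithonet(mask)   # predicted fabricated shape
    pred = np.asarray(simulated) > 0.5
    gt = np.asarray(layout) > 0.5
    union = np.logical_or(pred, gt).sum()
    score = np.logical_and(pred, gt).sum() / union if union else 1.0
    return score, score >= thresh
```

This is the same consistency criterion used during training, now applied at inference time to flag layouts whose correction is still insufficient.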

VI Conclusions

In this paper we proposed a data-driven framework involving two convolutional neural networks: LithoNet and OPCNet. First, given a layout for virtual metrology, LithoNet mimics the lithography and etching processes of IC fabrication to predict the shape of the fabricated circuitry of a given layout design. By learning the shape correspondence between paired training images, i.e., IC layout designs and the SEM images of their fabricated counterparts, LithoNet can predict the shape deformation field of the layout and then generate a lithography simulation result. Second, with the pre-trained LithoNet, OPCNet can learn a mask optimization model without ground-truth OPC-corrected masks, based on the proposed input-output consistency loss. Experimental results demonstrate that, on the lithography simulation task, our method is more appropriate than existing image-to-image translation schemes and outperforms standard compact model-based simulations. On the mask optimization problem, OPCNet can correctly predict a mask whose lithography simulation image is close to the expected layout. One ongoing extension of this work is to establish a scoring system, based on the deformation map or SEM-styled image derived by our method, so that a virtual metrology system for IC circuit layout quality assessment can be developed.


  • [1] K. Aberman, J. Liao, M. Shi, D. Lischinski, B. Chen, and D. Cohen-Or (2018) Neural best-buddies: sparse cross-domain correspondence. ACM Trans. Graphics 37 (4), pp. 69. Cited by: §I.
  • [2] O. Barnich and M. Van Droogenbroeck (2010) ViBe: a universal background subtraction algorithm for video sequences. IEEE Trans. Image Proc. 20 (6), pp. 1709–1724. Cited by: §III-A.
  • [3] K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, and D. Krishnan (2017) Unsupervised pixel-level domain adaptation with generative adversarial networks. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 3722–3731. Cited by: §II-B.
  • [4] L. Cao, J. Zhang, D. N. Power, and E. S. Parent (2018-November 8) Prediction of process-sensitive geometries with machine learning. Google Patents. Note: US Patent App. 15/588,984 Cited by: §II-A.
  • [5] A. H. Gabor, J. A. Bruce, W. Chu, R. A. Ferguson, C. A. Fonseca, R. L. Gordon, K. R. Jantzen, M. Khare, M. A. Lavin, W. Lee, et al. (2002) Subresolution assist feature implementation for high-performance logic gate-level lithography. In Optical Microlithography XV, Vol. 4691, pp. 418–426. Cited by: §II-C.
  • [6] J. Gao, X. Xu, B. Yu, and D. Pan (2014) MOSAIC: mask optimizing solution with process window aware inverse correction. In Proc. ACM/EDAC/IEEE Design Autom. Conf., pp. 52:1–52:6. Cited by: §II-C.
  • [7] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Proc. Adv. Neural Inf. Process. Syst., pp. 2672–2680. Cited by: §II-A, §II-B.
  • [8] T. Hsu (2001-February 27) Optical proximity correction (opc) method for improving lithography process window. Google Patents. Note: US Patent 6,194,104 Cited by: §I.
  • [9] X. Huang, M. Liu, S. Belongie, and J. Kautz (2018) Multimodal unsupervised image-to-image translation. In Proc. European Conf. Comput. Vis., pp. 172–189. Cited by: §II-B.
  • [10] M. Hung, T. Lin, F. Cheng, and R. Lin (2007) A novel virtual metrology scheme for predicting cvd thickness in semiconductor manufacturing. IEEE/ASME Trans. Mechatronics 12 (3), pp. 308–316. Cited by: §II-A.
  • [11] P. Isola, J. Zhu, T. Zhou, and A. A. Efros (2017) Image-to-image translation with conditional adversarial networks. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 1125–1134. Cited by: §II-B, §III-B, §V-C4.
  • [12] M. Jaderberg, K. Simonyan, A. Zisserman, et al. (2015) Spatial transformer networks. In Proc. Adv. Neural Inf. Process. Syst., pp. 2017–2025. Cited by: §III-B.
  • [13] A. B. Kahng (2018) Reducing time and effort in ic implementation: a roadmap of challenges and solutions. In Proc. ACM/ESDA/IEEE Design Autom. Conf., pp. 1–6. Cited by: §II-A.
  • [14] D. P. Kingma and M. Welling (2013) Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114. Cited by: §II-B.
  • [15] M. Liu, T. Breuel, and J. Kautz (2017) Unsupervised image-to-image translation networks. In Proc. Adv. Neural Inf. Process. Syst., pp. 700–708. Cited by: §II-B.
  • [16] M. Liu and O. Tuzel (2016) Coupled generative adversarial networks. In Proc. Adv. Neural Inf. Process. Syst., pp. 469–477. Cited by: §II-B.
  • [17] K. D. Lucas, H. Tanabe, and A. J. Strojwas (1996) Efficient and rigorous three-dimensional model for optical lithography simulation. J. Optical Society America: A 13 (11), pp. 2187–2199. Cited by: §I.
  • [18] K. Maninis, S. Caelles, J. Pont-Tuset, and L. Van Gool (2018) Deep extreme cut: from extreme points to object segmentation. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Cited by: §III-A.
  • [19] N. Otsu (1979) A threshold selection method from gray-level histograms. IEEE Trans. Syst., Man, Cybern. 9 (1), pp. 62–66. Cited by: §III-A, §V-C1.
  • [20] O. Otto, J. Garofalo, K. K. Low, C. Yuan, R. Henderson, C. Pierrat, R. Kostelak, S. Vaidya, and P. K. Vasudev (1994) Automated optical proximity correction: a rules-based approach. In Optical/Laser Microlithography VII, Vol. 2197, pp. 278–294. Cited by: §I.
  • [21] D. Z. Pan, B. Yu, and J. Gao (2013) Design for manufacturing with emerging nanolithography. IEEE Trans. Comput.-Aided Design Integr. Circuits Syst. 32 (10), pp. 1453–1472. Cited by: §II-A.
  • [22] A. Poonawala and P. Milanfar (2007) Mask design for optical microlithography—an inverse imaging problem. IEEE Trans. Image Process. 16 (3), pp. 774–788. Cited by: §II-A.
  • [23] H. Purwins, B. Barak, A. Nagi, R. Engel, U. Höckele, A. Kyek, S. Cherla, B. Lenz, G. Pfeifer, and K. Weinzierl (2014) Regression methods for virtual metrology of layer thickness in chemical vapor deposition. IEEE/ASME Trans. Mechatronics 19 (1), pp. 1–8. Cited by: §II-A.
  • [24] O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: convolutional networks for biomedical image segmentation. In Proc. Medical Image Computing Computer-Assisted Intervention (MICCAI), pp. 234–241. Cited by: §III-A, §III-B.
  • [25] L. Rudin, S. Osher, and E. Fatemi (1992) Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena 60 (1-4), pp. 259–268. Cited by: §III-C.
  • [26] P. K. Saha and J. K. Udupa (2001) Optimum image thresholding via class uncertainty and region homogeneity. IEEE Trans. Pattern Anal. Mach. Intell. 23 (7), pp. 689–706. Cited by: §III-A.
  • [27] G. A. Susto, S. Pampuri, A. Schirru, A. Beghi, and G. De Nicolao (2015) Multi-step virtual metrology for semiconductor manufacturing: a multilevel and regularization methods-based approach. Computers & Operations Research 53, pp. 328–337. Cited by: §II-A.
  • [28] Synopsys, Inc. Cited by: §I.
  • [29] A. Taflove and S. C. Hagness (2005) Computational electrodynamics: the finite-difference time-domain method. Artech house. Cited by: §I.
  • [30] C. Wang, H. Zheng, Z. Yu, Z. Zheng, Z. Gu, and B. Zheng (2018) Discriminative region proposal adversarial networks for high-quality image-to-image translation. In Proc. European Conf. Comput. Vis., pp. 770–785. Cited by: §III-B.
  • [31] G. Wang, M. A. Zuluaga, W. Li, R. Pratt, P. A. Patel, M. Aertsen, T. Doel, A. L. David, J. Deprest, S. Ourselin, et al. (2018) DeepIGeoS: a deep interactive geodesic framework for medical image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 41 (7), pp. 1559–1572. Cited by: §III-A.
  • [32] T. Wang, M. Liu, J. Zhu, A. Tao, J. Kautz, and B. Catanzaro (2018) High-resolution image synthesis and semantic manipulation with conditional gans. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 8798–8807. Cited by: §II-B.
  • [33] Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, et al. (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13 (4), pp. 600–612. Cited by: §V-B.
  • [34] Y. Watanabe, T. Kimura, T. Matsunawa, and S. Nojima (2017) Accurate lithography simulation model based on convolutional neural networks. In Optical Microlithography XXX, Vol. 10147. External Links: Document, Link Cited by: §I, §II-A, §II-B.
  • [35] H. Yang, S. Li, Y. Ma, B. Yu, and E. F. Young (2018) GAN-opc: mask optimization with lithography-guided generative adversarial nets. In Proc. ACM/ESDA/IEEE Design Autom. Conf., pp. 1–6. Cited by: §II-A, §II-C.
  • [36] W. Ye, M. B. Alawieh, Y. Lin, and D. Z. Pan (2019) LithoGAN: end-to-end lithography modeling with generative adversarial networks. In ACM/IEEE Design Autom. Conf., pp. 107:1–107:6. Cited by: §I, §II-A, §II-B.
  • [37] Z. Yi, H. Zhang, P. Tan, and M. Gong (2017) Dualgan: unsupervised dual learning for image-to-image translation. In Proc. IEEE Int. Conf. Comput. Vis., pp. 2849–2857. Cited by: §II-B.
  • [38] B. Yu, Y. Zhong, S. Fang, and H. Kuo (2019) Deep learning-based framework for comprehensive mask optimization. In Proc. Asia and South Pacific Design Autom. Conf., pp. 311–316. Cited by: §II-A, §II-C.
  • [39] T. Zhou, P. Krahenbuhl, M. Aubry, Q. Huang, and A. A. Efros (2016) Learning dense correspondence via 3d-guided cycle consistency. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 117–126. Cited by: §I.
  • [40] J. Zhu, T. Park, P. Isola, and A. A. Efros (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proc. IEEE Int. Conf. Comput. Vis., pp. 2223–2232. Cited by: §II-B, §III-A, §III, §III, Fig. 5, §V-C1.