## 1 Introduction

Metasurfaces, which constructed of 2-D artificial patterns of various materials in the subwavelength scale, have received enormous attention due to its ability of unprecedented control over the intrinsic properties of light, including the amplitude^{cheng2015structural, proust2016all}, phase^{khorasaninejad2016metalenses, khorasaninejad2017metalenses}, polarization^{yu2012broadband, guo2016broadband} and the orbital angular momentum^{devlin2017arbitrary}. The most critical feature of metasurfaces is that the spatially varying patterns or material compositions provide high freedom in designing spatial inhomogeneity over an optically thin interface. A number of planar optics such as filters^{cheng2015structural, proust2016all}, lenses^{khorasaninejad2016metalenses, khorasaninejad2017metalenses}, polarizers^{yu2012broadband, guo2016broadband} and absorbers^{liu2017experimental, azad2016metasurface} have been enabled by multifarious reflective or transmissive metasurfaces, featuring high optical performance as well as compact structures. Two central problems arise in the designing process of the metasurfaces. The first is to obtain an accurate prediction of the optical spectrum given a structure, named "Forward Simulation". Traditional methods settle this problem by solving Maxwell’s equations approximately using advanced calculations, including rigorous coupled wave analysis (RCWA), finite-difference time-domain method (FDTD), finite-element modeling (FEM) method and so on. The second is to find an optimal structure based on actual demands, where a nanostructure is generated when given a desired optical response as input, named "Inverse Design". Inverse design of photonic structures were conventionally demonstrated using adjoint sensitivity analysis^{piggott2015inverse, piggott2017fabrication, frandsen2016inverse, cao2003adjoint, phan2019high, molesky2018inverse}. However, effective methods are still time-consuming and lack generality in most cases, people have to find a near-optimal solution through a traversal in a limited database, which contains a finite parameter space and corresponding spectrums generated by forward simulation.

Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in computer vision, natural language processing, speech recognition ,and other applications^{lecun2015deep}. Deep learning has also been successfully applied to conventional science and engineering fields outside of computer science, such as condensed matter^{carrasquilla2017machine}, particle physics^{baldi2014searching}, chemical syntheses^{segler2018planning}, microscopy^{chen2016deep} and proteomics^{zhou2017pdeep, gessulat2019prosit}. The strong fitting ability of deep neural networks has also caused quite a stir in the optical community, neural networks (NN) can be used to simulate the optical response of a component (Forward Simulation) as well as design a topology given a desired optical response (Inverse Design).

### Related Work

In recent years, remarkable achievements with deep learning technologies have been made in the inverse design of optical devices ^{liu2018training, ma2018deep, malkiel2018plasmonic, peurifoy2018nanophotonic, asano2018optimization, tahersima2018deep, liu2018generative, an2019novel, jiang2019global, jiang2019free}, as well as several optical implementations of NN^{tait2017neuromorphic, mehrabian2018pcnna, lin2018all}.

Many applications of deep learning use feedforward neural network architectures. To go from one layer to the next, a set of units compute a weighted sum of their inputs from the previous layer and pass the result through a non-linear function^{lecun2015deep}

. The fully connected layer, convolutional layer and transpose convolutional layer are the most basic components of feedforward neural network. Fully connected network (FC) is composed of several fully connected layers, which is the simplest neural network structure and capable of handling one-dimensional vectors. Convolutional neural network (CNN) mainly consists of convolutional layers, which is often used for feature extraction tasks of multidimensional data. The transpose convolution can be considered as an upsampling process as opposed to the convolution process, and the transpose convolution layer is often embedded in many networks related to the generation pattern, like generative adversarial networks (GAN)

^{goodfellow2014generative}and fully convolutional networks (FCN)

^{long2015fully}.

Optimization problems in the field of optics are often very complex, but the problem can be modeled in a more regular and simple way by using several one-dimensional parameters, whether the problem itself is one-dimensional, two-dimensional or three-dimensional. D. Liu *et al.* used FC to learn non-unique electromagnetic scattering of alternating dielectric thin films with the varying combination of thickness and materials^{liu2018training}. They proposed a tandem architecture combining forward simulation and inverse design to overcome the issue of data inconsistency and slow training process, which has become the dominant architecture for solving similar problems. J. Peurifoy *et al.*

adopted a FC having 4 hidden layers, 100 neurons of each to approximate light scattering of multilayer shell nanoparticles of

and^{peurifoy2018nanophotonic}. I. Malkiel

*et al.*expounded the relationship between the spectral complexity and design feasibly, then provided an FC with dozen huge layers and multiple input entrance. Their method can be applied to direct on-demand engineering of plasmonic structures and metasurfaces

^{malkiel2018plasmonic}. Tahersima

*et al.*built a robust deeper network on the base of FC with a intensity shortcut proposed in deep residual networks (ResNet)

^{he2016deep}to inverse design integrated photonic devices, beam splitters for example, whose design space is considerably large

^{tahersima2018deep}. Recently, the approach brought by S. An

*et al.*overcomes three key challenges that have limited previous neural-network-based design schemes: input/output vector dimensional mismatch, accurate EM-wave phase prediction, as well as adaptation to 3-D dielectric structures

^{an2019novel}.

There is still some work focused on multidimensional representations to meet better requirements. T. Asano *et al.* provided a four-layer neural network including a convolutional layer for prediction of the quality factor in two dimensional photonic crystals^{asano2018optimization}. Z. Liu *et al.* proposed a GAN and a simulation neural network that efficiently discovers and optimizes unit cell patterns of metasurfaces in response to user-defined, on-demand spectra at the input^{liu2018generative}. W. Ma *et al.* reported a multi-task model, i.e. broke the task down into a primary task and an auxiliary task, comprising two bidirectional neural networks assembled by a partial stacking strategy, to automatically design and optimize three-dimensional chiral metamaterials ^{ma2018deep}. J. Jiang *et al.* showed that GAN can train from images of periodic, topology-optimized metagratings to produce high-efficiency, topologically complex devices operating over a broad range of deflection angles and wavelengths^{jiang2019free}

. Their another work transformed a GAN into a global optimizer by replacing the traditional discriminator with adjoint-based optimization algorithm and used gradient estimation method for back propagation, then combined both as a physics-driven, data-free neural network

^{jiang2019global}.

### Our Work

As shown in Figure (a)a, we primarily focus on the filtering function of resonant nanostructures which have been intensively studied, to be specific, the design of color filters based on 2-D periodic grating structures in the visible band. The parameters of devices such as various materials, shapes and layered permutations^{liu2018training}

determine the degree of freedom. Under the same restriction conditions, a higher degree of freedom granted to the unit cell enhances the probability of generating a qualified spectrum. Enlightened by previous research

^{liu2018generative, malkiel2018plasmonic, an2019novel}, we improved the degree of freedom a lot because the shape of device is described as an arbitrary binary picture representing the pattern of material on a substrate rather than a regular shape. However, both the requirements on network performance and the computational cost grow at the same time, illustrated in Figure (b)b. Our target is to implement the inverse design with the highest degree of freedom and the least computational cost.

On the grounds of the wide range of wavelength involved in the previous research^{malkiel2018plasmonic, an2019novel, liu2018training, liu2018generative}, more suitable spectra for the input constraints can be found to a certain extent. However, we focus on the visible band with a comparatively much narrower range of wavelength here, hence the speed of generating simulation data slows down so that the results of previous studies are prone to miss the sub-optimal solution due to the limited groping ability of the network. Aiming at this difficult problem, we proposed an encoding method called contrast vector to collect the information beneficial for training and two dedicated networks working in series. Moreover, we also proposed a semi-random method to improve performance, which will be discussed later. Qualitative analysis of traditional and our proposed methods can be found in Figure (c)c.

To conclude, our simulator can take place of traditional algorithms such as RCWA with more efficiency. In the same give design space, our generator can produce a generated spectrum for a real or artificial desired spectrum.

## 2 Methods

Our purpose is to acquire an optimized structure satisfying the specified response in the visible band, which inspires us to expand the search space by increasing the degree of freedom. Generally, a generative model named generator is realized elaborately. Additionally, we adopt a novel contrast vector as well as another neural network named simulator to educate the generator. The generator distills the input spectrum information as guidance to generate a structure satisfying the expected electromagnetic response; the simulator extracts information from the input structure and then gets regression estimation of the spectrum. They two can respectively solve forward simulation and inverse design problem efficiently with small error in the design space for artificial spectral input.

Polycrystalline silicon is chosen as the material of the 2D pattern with a fixed thickness of 500 nm, on a substrate made of silicon dioxide, because of its high refractive index and relatively low loss. Considering prior knowledge and actual manufacture requirements, the period ranges from 200nm to 400nm, and the shape is described by a 64 64 pixelated binary image. Besides, 29 points are used to uniformly quantify a single transmittance spectrum where the wavelength ranges from 400nm to 680nm. These limitations do not affect the universality of our method, which will be discussed later. Apparently, each pair of TE and TM responses at 400-680nm, represented by and with 29 points respectively, can de spliced into a spectrum . So the problem can be abstracted as following: given a specified , how to use algorithms to generate a structure described by a binary image and a period , whose response is that equals to . In other words, it equals to the that makes the similarity between and maximum.

To avoid missing the sub-optimal solution due to the limited wavelength range, we use a special encoding method to extract the spectrum information which benefits network training as well. We define the contrast of a certain range in a spectrum as the ratio of the maximum transmittance within this range to its counterpart outside. After that, a contrast vector of a particular spectrum can be obtained by sequentially concatenating several contrasts together. Because we are apt to keep all the values in one range small, we pay special attention to the maximum which represents the change in the spectrum. For a given defined above, let be the value of the th contrast, contrast vector is obtained according to the following algorithm 1. Then we have , and calculate identically. Finally, splice , in the same way that spliced , to get .

As shown in Figure (a)a, contrast vectors accentuate the peaks and valleys, while ignoring the effect of tiny vibrations in the spectrum. In essence, this weakens the strong correlation between the expected spectrum and the network input in an intelligible way, such that one definite network input can correspond to a great quantity of spectra, thus helping the network understand the meaning of artificial input.

Referring to the network structure of deep convolutional GAN (DCGAN)^{radford2015unsupervised}, the conditional inputs of conditional GAN (cGAN)^{mirza2014conditional} and the shortcut of residual network (ResNet)^{he2016deep}

, we design a network using Pytorch

^{paszke2017automatic}that can turn noise input (random seed for generation) into a qualified unit cell (one 2D binary image with one parameter) according to different guide conditions (target spectrum or target contrast vector). In order to simplify the training process, we use the structural similarity index (SSIM)

^{wang2004image}to evaluate the similarity between two images instead of a discriminator. As illustrated in Figure (b)b, we transform the routine GAN training into a supervised and non-alternating one. Detailed network configurations and training methods are presented in the Supporting Information A.1.

The simulator is trained with the data produced by the RCWA^{RETICOLO}, in which we accomplish augmentation for better performance. The data contain the corresponding spectra, shapes and periods. The inputs of the simulator are the period as well as the shape , and the output is the predicted spectrum

. The data augmentation method is that we rotate the shapes by 90, 180 and 270 degrees respectively, then exchange TE and TM responses for 90 and 270 degrees respectively. This allows the dataset to be expanded to improve the generalization of the model. We train the simulator to replace the traditional RCWA, because RCWA is relatively slow and breaks the gradient backpropagation. The loss function is the mean square error (MSE) between predicted spectrum

and real spectrum as bellow, where .(1) |

The generator produces the patterns based on the desired spectrum. The input of the generator is a contrast vector transformed from the desired spectrum and a random noise . The outputs are the generated shape and the period . It is worth noting that must be refined into a binary image in the process of predicting because our RCWA algorithm only accepts 0 or 1 to specify a material, which means air or polycrystalline silicon. The shape and period are the inputs of the simulator, which can predict the spectrum and guide the generator correctly. Data augmentation is not applied in the training process of the generator, because it makes the generator converge worse and the training period can be faster.^{liu2018training}. Spectrum loss is the MSE between the real spectrum from training data and the simulated spectrum from simulator; shape loss is the SSIM between the real shape and the generated shape ; period loss is the MSE between the real period and the generated period . The generator loss comprises these three parts, as shown below. and

are the two hyperparameters describing the relative importance of the three, which can be fine-tuned accordingly.

(2) | ||||

High-freedom inverse design is hard to achieve with merely a 2-D array or picture denoting the device since the information is limited. Therefore, several other essential parameters of the metasurface are considered as well in our method. In order to assist the simulator in feature extraction while underlining the importance of periods, the shape along with the period are supplied to the simulator at the same time. However, if we feed the shape as a 2-D array and the period as a real number, the simulator architecture will be asymmetric, leading to an impractical convolutional operation. Thus, the period is duplicated and expanded to a 2-D array in the same size as the shape, then concatenated with the shape to constitute multiple channels for the convenience of simultaneous convolution later. On the other hand, to help the generator extract higher-dimensional features, the noise vector, as well as the contrast vector, are expanded and concatenated. Inside the generator, the shape is produced by transposed convolution layers first, after which the period is generated by fully connected layers. We do not design a generator to get the shape and period with a single network module at the same time, because obtaining these two properties of the metasurface are the two different tasks that cannot use the same network structure with the same weights.

Our network provides a high degree of freedom to the structure to be optimized. It is more efficient than the traditional traversal method when we search the global design space. Because the initial network can map the random noise into the full design space, the probability of finding the optimal solution is enhanced. For our metasurface filter, the 1-D parameter considered is the period of the unit cell, but it can generally involve any combination of design parameters in the design problems including structure thickness, refractive index, or polarization light, etc. Moreover, as long as the database is sufficient, our network structure is suitable for all metasurface design problems that can be represented as 2-D binary graphs with parameters such as period and thickness.

## 3 Results and Discussion

To measure the performance of the simulator, we feed it with a randomly generated polygon. The real spectrum and the simulated spectrum are plotted in Figure (a)a. Statistically, the MSE between the real spectrum and the generated spectrum is 4% to 5% when we feed 5,200 paired shapes and periods beyond the training set. As for the generator, we feed a contrast vector calculated from the real spectrum into it. The generated and ground truth shape, as well as the period, are shown in Figure (b)b. Likewise, the MSE between the desired spectrum and the simulated spectrum of the generated structure is approximately 5% when the validation set contains 1,300 other real spectra consisting of 58 points each.

Our models perform well on the validation set. However, it is possible that the desired spectrum is different from any one in our data. To illustrate the universality and superiority of our design, we also carry out several comparative experiments. We still use an upside-down Gaussian function (mean=600, variance=40, amplitude=0.9) as the desired spectrum

and search for the most similar one by traversing in both our training set and validation set based on the MSE criterion. The detected spectrum with the smallest MSE and the matched device are shown in Figure (c)c. To testify the effectiveness of the contrast vector, we train another generator without using contrast vectors while fixing all other training hyperparameters. We feed in the same desired spectrum and the generated device and spectrum are shown in Figure (d)d. Using our well-designed generator mentioned in methods and feeding in the contrast vector converted from , the generated device and the spectrum are obtained, plotted in Figure (e)e. Considering that Polycrystalline Silicon is more lossy for blue light compared with red light, we adjust slightly, applying minor jitter to every single value while maintaining the overall tendency of the previous vector. For every randomized , the smallest value of contrast is 0.01, while the other ones are selected from 0.4, 0.5 or 0.6 with equal probability. Figure (f)f is the best one of the results obtained by feeding four different revised contrast vectors to the generator. From Figure (c)c to Figure (f)f, we can see that the obtained spectrum is getting closer to . It is worth noting that does not exist in our possible design space, according to Figure 6. That is why Figure (d)d-(f)f are not so satisfying as Figure (b)b. Nevertheless, our methods still produce an acceptable result. More results from the generator for artificial desired spectra can be found in Figure 4.We have found in the above experiments that if the desired spectrum is not a spectrum that real exists, i.e. the input set artificially according to the subjective demand, it is possible that the generated shapes vary greatly from the ones in our training set. For example, in Figure (e)e, the generated shape has not only one part in the middle but also another isolated one in the upper left corner. It shows that our neural network understands the mapping between devices and spectra during the training process so that it can produce a shape that is totally different from those in the training set.

As mentioned above, we transform the generated shape into a binary array in the process of testing but not implement it in our training process, since the mandatory binarization of an image makes gradients too steep which is not helpful for training. In order to solve this problem, previous work trained a simulator with a noise similar to the generated patterns to circumvent the binarization and smoothing during the predicting process

^{liu2018generative}. It is less efficient but can ensure that the input image of the simulator is binary and not so complex for manufacture. We purposely experiment to figure out how this post-process affects our results. We feed a non-binary generated image and a binarized version of it to the simulator respectively. Compared with the result given by RCWA, in Figure 6, spectra of these two shapes are both very similar to the ground truth. Besides, after training the generator, the mean Manhattan distance of pixels in all generated shapes is approximately 0.01. In other words, the intensity of each pixel ranges from 0 to 0.01 or 0.99 to 1 in an average sense. Thus, binarization can be ignored in the training process.

Since SSIM is utilized the training process, an unsupervised learning task is changed into a supervised one labelled with shape

. We also investigate whether SSIM should be applied as a part of the loss function. As shown in Figure 7, SSIM influences both the appearance and the contrast (equals to the degree of binarization here) of the shape generated by the generator, so its existence expedites the network convergence towards a specific direction. Considering that one spectrum can correspond to multiple structures when the generator is in different epochs, it is likely to generate different shapes for the same spectrum. If SSIM is not used, the same loss from distinct shapes will be given to the network in such case, and the network will be confused and difficult to converge.

In the future, we intend to utilize semi-supervised methods to decrease the demands of data. The architecture of neural networks can also be ameliorated and simplified because we observe that the gradients disappear under certain circumstances. Other possible extensions are worth further explorations, such as applying our models in other bands, using more shortcuts and try other descriptive ways other than contrast vectors.

## 4 Conclusions

In summary, for "Forward Simulation" problem, when the required precision is not high, our simulator can substitute the traditional simulation method with a great improvement in efficiency. For the generator in the same given design space, such as the arbitrary shapes of the metasurface and the periods within a specific range, our neural network model can generate a roughly optimal solution for the desired spectrum, even if the desired spectrum does not exist in the design space. Another advantage of our model is that the degree of freedom of the generated device is relatively high, for it is capable of synchronously optimizing the shape and the period of the metasurface. Compared with the traditional traversal method and other artificial intelligence methods, the diversity of our generated devices significantly increases under the premise of ensuring speed and accuracy. In addition to the visible band, our model can also be applied to other bands theoretically. Last but not least, as shown in Figure 3b, our generator can find different structures satisfying almost the same spectrum. Since our generator tends to generate regular patterns, it can be used to improve a relatively complex structure to meet the actual processing requirements better.

The methodology we have developed is readily to be used into migration application for the pursuit of desired complex values of the reflection and transmission coefficients, which is essential to the metasurface design where the amplitude, the phase and the incident angle of the light waves matter. In the future, our models can be ameliorated and simplified to adapt to problems where multiple one-dimensional parameters work together rather than considering a two-dimensional image with single one-dimensional parameter merely. Further subsequent improvement on network performance can also start from incorporation with other deep learning methods, such as using recurrent neural network of natural language processing to extract sequential information from spectrum, imitating the attention mechanism of computer vision to make image generation more explicable. To reduce the burden of generating training data brought by the demand on more parameters, the application of reinforcement learning and unsupervised learning in this field is also vital to mitigate the data’s dependency on human prior knowledge. We envision deep learning being widely applied to the optimization of metasurfaces, and even to the entire field of optics, so that scientists and engineers will be greatly relieved from the tedious process of trial and error methods and will focus more on truly creative thinking, just leaving repetitive tasks to machines.

## Appendix A Supporting Information

### a.1 Details about neural network

We generated approximately 6500 pieces of data by RCWA^{RETICOLO}. 80 percent of them are training set and 20 percent are validation set. Each piece of data includes a shape, a period and the corresponding spectrum. The spectrum consists of 58 points, two halves of which respresent TE and TM reponse sampled with equally spaced intervals between 400nm and 680nm. The shape is a 6464 binary image. The period is a integer between 200 and 400nm.

The simulator consists of convolution layers extracting information from images and fully connected layers converting images into vectors. For the simulator, the data is augmented before being fed. Adaptive moment estimation (Adam)

^{kingma2014adam}is used to update the gradient. Learning rate becomes smaller from 0.02 as the epoch becomes larger. We use a Telsa K80 GPU to train simulator with 500 epochs for approximately half an hour.

The generator consists of the deconvolution layers to generate images from sequences and the fully connected layers to obtain features from images. The shape is generated first after deconvolution layers, then the period is produced after fully connected layers. Besides, we use a shortcut to make the training process more stable and fast. The input noise is sampled from a uniform distribution between 0 and 1. Adam is used to update the gradient, and the learning rate becomes smaller from 0.02 as the epoch becomes larger. We use a Telsa K80 to train the generator with 1000 epochs for approximately an hour.

The detailed hyperparameters we used are listed in Table 1, and loss curves of simulator and generator are shown in Figure 8.

Simulator | Generator | |
---|---|---|

Epoch | 500 | 1000 |

Batch size | 1024 | 256 |

Optimizer | Adam (=0.5, =0.999) | Adam (=0.5, =0.999) |

Learning rate | Initial=0.02, Step=100, =0.5 | Initial=0.02, Step=200, =0.5 |

Initial strategy | Mean=0, Std=0.02 | Mean=0, Std=0.02 |

Loss parameter | None | =0.05, =0 |

Color | Yellow | Cyan | Lavender | |
---|---|---|---|---|

Function | Convolution | Transpose Convolution | Full Connection | |

Color | Blue | Orange | Purple | Red |

Function | Normal Vector | Leaky ReLu (p=0.2) |
Tanh | Sigmoid |

### a.2 Algorithm for generating random structures

### a.3 Contrast vectors

If we use the spectra with 58 points instead of the contrast vectors, the loss curve of the generator suggests that the spectrum loss is smaller, approximately 4% error rate. Nevertheless, if we feed an artificial spectrum to it, the output results deteriorate a lot. As we discussed in the paper, the suboptimal solution cannot be found because the network aims at minimizing MSE ultimately. In practice, the network pays more attention to the contrasts instead of inconsequential details of a spectrum, thus leading to a better result. Besides, we are indifferent to the detailed features of the desired spectrum. So we need a more intuitive method to describe it. It makes senses that the use of contrast vectors is essentially a trade-off for universality at the expense of network prediction accuracy.

Comments

There are no comments yet.