Inverse design of nanophotonic structures, i.e. obtaining a geometry for a desired photonic function, has been a challenge for decades. Due to the highly nonlinear nature of this optimization problem, evolutionary or topology optimization algorithms require hundreds to several thousands of iterations for a single design task. Recently, modern machine learning algorithms have been applied to the inverse problem in nanophotonics and have demonstrated great promise.
The interaction of light with nano-scale material embedded in a dielectric can be characterized by various properties of the outgoing light [somRef4]. Figure 1a illustrates such an optical response, in which white light (containing all the colors) interacts with a metallic subwavelength geometry. This interaction results in partial transmission due to absorption and scattering. The partial transmission entails that these nano-scale geometries cannot be observed by a conventional microscope. This property is also known as the diffraction limit, which stipulates that optical information smaller than roughly half the illumination wavelength is not retrievable.
Predicting the optical response of a nanostructure geometry requires solving the full set of Maxwell's equations. This problem, denoted the ’direct problem’ in Figure 1a, is considered the more feasible one, and can be solved via simulations. The more challenging direction is the ’inverse problem’ of inferring the nanoscale geometry from measured or desired spectra.
The major contributions published so far on designing nanostructures with machine learning techniques can be divided into three categories, according to the structures they design. The first, and most fundamental, category comprises models that can design nanostructures of the same shape and material they were trained on, but with different properties, such as sizes, angles, host material and so on. Works such as [peurifoy2018nanophotonic, sajedian2019finding, liu2018training, ma2018deep, malkiel2018plasmonic, malkiel2018deep] fall within this category: the general structure is maintained (a particle of eight alternating shells or m alternating thin-film layers) and the ML algorithm provides optimized parameters of the structure. The second category comprises models that can generalize and design geometries whose shapes differ from the shapes used during training but still belong to the same family, i.e. the model can generalize to shapes that are similar but not identical to those it was trained on. For example, in this work, we show that our model can infer an “L” shaped nanostructure, given matched spectra, although it was trained on different shapes, such as “H”, “h”, “n”, etc. A related attempt was recently presented in [liu2018generative], where the authors tested the generalization ability of a model by training it on a set of digits and holding out one digit as a test set. In their work, however, the model designed a shape from the set of shapes it was trained on, and did not seem to generalize as expected. The third category comprises models that can design any geometry, of any shape, achieving the ultimate generalization capability.
The generalization ability of such models should be verified via a proper holdout test set, comprising structures sampled from a distribution completely different from the one the model was trained on. To this end, studies that claim to provide a model able to design nanostructures for any spectra should take extra care in constructing a test set that verifies the generalization level of the model at hand.
The above three categories, illustrated in Figure 1b, are ordered by the complexity of the underlying physical problem. The most desirable capability is of course that of the last category, which can design any geometry of any shape.
To verify such a property, one may need to generate a synthetic dataset with great diversity that spans the entire distribution of supported geometries. However, such a dataset is not yet available in the community, and may take hundreds of simulation hours to generate. In this work, we utilize the dataset from [malkiel2018deep, malkiel2018plasmonic], and use it to verify that our suggested model can generalize at least to the level of category two described above.
Given the model at hand, and assuming that a large volume of data is available for learning, the first immediate step towards such a generic capability is to design a model that has enough degrees of freedom to allow the design of any geometry.
Previous work [peurifoy2018nanophotonic, sajedian2019finding, liu2018training, malkiel2018deep, malkiel2018plasmonic, malkiel2017deep] introduced models that can be classified under the first category above, i.e. models able to infer geometries of the same or similar shapes they were trained on, with variable sizes, angles and epsilon host materials. However, in order to design any geometry, one should allow a larger degree of freedom. Specifically, in [malkiel2018deep, malkiel2018plasmonic] the model architecture was designed to retrieve coding vectors that encode the geometry shape within the H shape family. To partly circumvent the inherent limitation of this encoding, further degrees of freedom were obtained by asking the model to predict the presence of each edge, the length of the edge and the angle between the inner edge and the top right edge.
As we look to expand these capabilities to the second or even third, more desirable, categories, perhaps the most direct way to allow a model to design any geometry is to adopt an architecture that supports generating any shape.
In this work, we adapt a few key properties from the pix2pix architecture. Our model receives spectra as input and retrieves a 2D image whose pixels form the designed nanostructure geometry. Hence, we name our method spectra2pix.
The spectra2pix model aims to expand the capabilities of previous work to the second or even third, more desirable, categories. The model focuses on solving the inverse problem of inferring a nanostructure geometry from given spectra and material properties. Unlike the previous bi-directional model [malkiel2018deep, malkiel2018plasmonic], the spectra2pix architecture supports the generation of any geometry, by training the model to regress the raw pixel values of the 2D images of the geometries at hand. Training optimizes the spectra2pix model to minimize a pixelwise loss term between the generated image and the ground truth image. In this work, we publish a new version of the dataset introduced in [malkiel2017deep, malkiel2018deep, malkiel2018plasmonic], incorporating the 2D images of the geometries. The dataset and the code can be found at https://github.com/ItzikMalkiel/spectra2pix.
Our contribution is three-fold: (1) We introduce spectra2pix, a model that conceptually can design any 2D nanostructure geometry. (2) We are the first to report a successful generalization ability of such a model, exhibiting the design of geometries sampled from a distribution fairly different from the one the model was trained on; this corresponds to category two described above. (3) We transform the dataset from [malkiel2017deep, malkiel2018deep, malkiel2018plasmonic] to a 2D image representation, and publish the new version to the community.
In the method section, we present the model architecture and training setting. Next, in the results section, we show qualitative results of our spectra2pix model, showcasing its generalization ability. The results shown here hold great potential for the more general goal, which will, however, require a more extensive, broader and more generic dataset. This can be addressed in further research where, possibly, the training dataset could be crowdsourced from the growing ML nanophotonics community.
2 Related Work
Malkiel et al. [malkiel2017deep, malkiel2018deep, malkiel2018plasmonic] introduced the first neural network based model for the design of nanostructures. In this work, the authors proposed a bi-directional neural network architecture, which is able to solve both the inverse problem of designing a nanostructure and the direct problem of inferring the optical characteristics of the designed geometry. The advantages of the bi-directional model are twofold. First, a bi-directional model is able to streamline the design process, by retrieving an immediate prediction for the optical properties of the designed nanostructure. That way, the designer can compare the desired spectra with the recovered spectra, which can also be used to gauge the confidence level of the model for the specific design. Second, a bi-directional model allows co-adaptation between the networks of both directions, leading to better robustness and higher stability of the predictions. The introduced model was trained on synthetic data centered around different variants of the H shape, and was also applied to spectra measured in the lab from nano-fabricated samples. This model architecture is inherently limited to the H family. This is to date the only experimental demonstration, with fabricated structures, of the geometry prediction capability of a deep learning network.
In the same studies above [malkiel2017deep, malkiel2018deep, malkiel2018plasmonic], the authors showcase the ability of the model to infer geometries of the same or similar shapes it was trained on, which have variable sizes, angles and epsilon host materials. This experiment corresponds to category one, as described above. However, in order to design any geometry of any shape, one should allow a larger degree of freedom. Specifically, the bi-directional architecture was designed to retrieve coding vectors that encode the geometry shape of the “H” family.
In [liu2018generative], the authors propose a generative adversarial network (GAN) for generating 2D nanostructure images from spectra. The authors created a synthetic dataset of geometries associated with multiple families, such as squares, circles, sectors, crosses and shapes from the MNIST dataset (which comprises handwritten digits). Then, the authors showcased the ability of the model to design random test samples from each one of the families above, using a model that was trained with samples from all families. This evaluation corresponds to category one presented above, as the model's task is to infer geometries from the same templates it already saw during training (this time, only with different attributes such as sizes, angles etc.).
In the second evaluation described in [liu2018generative], the authors tested a higher level of generalization, which corresponds to the second category described above. In this evaluation, the authors used a holdout test set comprising a complete sub-family of geometries. Specifically, the authors kept all the samples that correspond to the digit “5” from the MNIST family in a holdout test set, and trained their model on the rest of the dataset. As reported in [liu2018generative], the topologies of the predicted geometry and the ground truth geometry differ considerably (the predicted geometry is a variation of the digit ‘3’), but the overall spectra of the predicted geometry possess features somewhat similar to those of the input spectra, with some discrepancies in a few specific locations. In addition, the authors also argue that without GAN training, their model collapses and generates images of random pixels. When optimizing an inverse function of a single network, one can often obtain a solution that satisfies the inversion criteria but does not constitute a valid input, as has been shown in the case of adversarial examples [goodfellow2014explaining]. This is why, similarly to the mapping between MNIST and SVHN digits presented in [taigman2016unsupervised], a GAN loss is needed. The one-digit-left-out experiments are also very much in line with those presented for the digit mapping when one of the digits is removed, and are, therefore, more indicative of the generalization power of deep networks than of the specific physics problem. An alternative to GANs for improving generalization may be to rely on activations from multiple layers of the direct network, as is done in the perceptual loss [johnson2016perceptual].
Compared to [liu2018generative], in this work we show that our spectra2pix model converges without GAN training and, more importantly, we demonstrate a successful generalization ability of the model to design a completely unseen sub-family of geometries, taken from a distribution fairly different from the one the model was trained on. This generalization capability is associated with category two described above.
In [ma2018deep], the authors introduce a model incorporating two bi-directional networks, along with a synthetic dataset composed of vectorized representations of geometries associated with materials, reflection spectra and circular dichroism spectra. The dual bi-directional model comprises two networks, a primary network and an auxiliary network. The primary network predicts, back and forth, the geometry encoding vector and its associated reflection spectra (Fig. 4). The auxiliary network predicts, back and forth, the geometry encoding vector and its associated circular dichroism (CD) spectra. Both networks are trained separately using the dataset above. The authors show that a model that combines both the auxiliary network and the primary network yields more accurate predictions.
In [sajedian2019finding], the authors suggest a neural network that solves the direct problem of inferring spectra for a given geometry. This problem can be solved via (slow) simulations, and is considered more feasible than the ill-posed inverse problem of inferring a geometry for on-demand spectra.
3 Proposed Method
This section presents the problem setup, the spectra2pix architecture and the loss function.
3.1 Problem Setup
Let $S$ be the set of all supported spectra. Let $G$ be the set of binary 2D square images of all geometries, where $G \subseteq \{0,1\}^{d \times d}$, and $d$ is the dimension of the images. Each geometry image $g \in G$ is associated with a valid pair of spectra $(s_h, s_v)$, where $s_h, s_v \in S$. Each element in the pair of spectra is associated with a different polarization (vertical or horizontal). Let $P$ be the set of all supported materials. In this study, without loss of generality, we use one material (gold), and a real valued epsilon host $\epsilon \in \mathbb{R}$.
We will define $M$ to be a model that maps pairs of spectra associated with material properties, into a 2D image that comprises the matched geometry. Given a set of quadruplet training elements
$\{(s_h, s_v, \epsilon, g)\}$,
our goal is to train a model $M$ such that for all quadruplets $(s_h, s_v, \epsilon, g)$ in the set, the generated image
$\hat{g} = M(s_h, s_v, \epsilon)$
approximates the label image $g$ with a high accuracy. To this end, we utilize a training procedure that minimizes a loss function, applied between the generated images and the ground truth images.
3.2 The Loss Function
Our loss function is defined as:
which relies solely on the pixelwise comparison between the generated image and the ground truth image.
By employing a pixelwise loss function on the generated images, our spectra2pix model learns to approximate the hidden inverse function between (1) spectra and material properties and (2) geometries.
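As a minimal sketch of what a pixelwise loss of this kind can look like, the snippet below compares a generated image with its ground truth pixel by pixel. The choice of mean squared error is an assumption made for illustration only; the exact penalty used by spectra2pix may differ, and the function name `pixelwise_loss` is hypothetical.

```python
import numpy as np


def pixelwise_loss(generated: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error over all pixels (assumed penalty, for illustration)."""
    if generated.shape != target.shape:
        raise ValueError("generated and target images must have the same shape")
    diff = generated.astype(np.float64) - target.astype(np.float64)
    return float(np.mean(diff ** 2))


# Toy 2x2 example: exactly one of the four pixels is off by 1.0.
g = np.array([[1.0, 0.0], [0.0, 1.0]])       # ground truth image
g_hat = np.array([[1.0, 0.0], [1.0, 1.0]])   # generated image
print(pixelwise_loss(g_hat, g))  # 0.25
```

Because the loss is averaged over pixels, its value does not depend on the image resolution, which makes numbers comparable across image sizes.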
3.3 Model Architecture
The architecture of spectra2pix is composed of two parts. The first part receives the vectorized representation of the pair of spectra and the material properties as input, and applies a set of parallel sequences of fully connected layers. Each sequence of fully connected layers receives a different part of the input data (the two polarizations and the host material), and utilizes a different set of learnt weights.
The second part of the model architecture receives the three outputs of the last fully connected layers from the first part and concatenates these three intermediate vectors into one unified representation. The unified vector is then transformed into a higher dimension by a fully connected layer. Next, the higher dimensional vector is reshaped into a matrix and forwarded through a sequence of three convolutional layers, each followed by a bias and a ReLU activation. Each convolutional layer incorporates ten filters with kernel size of, except for the last layer, which utilizes a single filter. The output of the last convolutional layer is the generated image, denoted by $\hat{g}$. The loss function is then applied between the ground truth g and the generated image. The Spectra2Pix model is illustrated in Fig. 2.
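The data flow described above can be sketched as a plain NumPy forward pass. This is an untrained illustration under stated assumptions, not the published implementation: the spectrum length, branch width, number of layers per branch, image size, kernel size, zero padding and the ReLU after the expanding layer are all hypothetical choices. Only the three parallel branches, the concatenation, the expanding fully connected layer, the reshape and the three convolutional layers (ten, ten and one filter) follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are illustrative assumptions, not values from the paper.
SPECTRUM_LEN = 43  # samples per spectrum (hypothetical)
HIDDEN = 32        # width of each fully connected branch (hypothetical)
IMG = 16           # generated image is IMG x IMG (hypothetical)
KERNEL = 3         # convolution kernel size (hypothetical)


def relu(x):
    return np.maximum(x, 0.0)


def dense(n_in, n_out):
    """Random (weights, bias) pair for one fully connected layer."""
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)


def conv2d_same(x, w, b):
    """Zero-padded convolution. x: (C_in, H, W), w: (C_out, C_in, k, k), b: (C_out,)."""
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return out


params = {
    # Part one: one branch per input part, each with its own weights.
    "branch_h": [dense(SPECTRUM_LEN, HIDDEN), dense(HIDDEN, HIDDEN)],
    "branch_v": [dense(SPECTRUM_LEN, HIDDEN), dense(HIDDEN, HIDDEN)],
    "branch_eps": [dense(1, HIDDEN), dense(HIDDEN, HIDDEN)],
    # Part two: expand the unified vector, then three convolutional layers.
    "expand": dense(3 * HIDDEN, IMG * IMG),
    "convs": [
        (rng.normal(0.0, 0.1, (10, 1, KERNEL, KERNEL)), np.zeros(10)),
        (rng.normal(0.0, 0.1, (10, 10, KERNEL, KERNEL)), np.zeros(10)),
        (rng.normal(0.0, 0.1, (1, 10, KERNEL, KERNEL)), np.zeros(1)),
    ],
}


def run_branch(x, layers):
    for w, b in layers:
        x = relu(x @ w + b)
    return x


def spectra2pix_forward(s_h, s_v, eps, p=params):
    """Map two spectra and an epsilon host value to an IMG x IMG image."""
    h = run_branch(s_h, p["branch_h"])
    v = run_branch(s_v, p["branch_v"])
    e = run_branch(np.array([eps]), p["branch_eps"])
    unified = np.concatenate([h, v, e])
    w, b = p["expand"]
    x = relu(unified @ w + b).reshape(1, IMG, IMG)
    for cw, cb in p["convs"]:
        x = relu(conv2d_same(x, cw, cb))  # bias, then ReLU, after each convolution
    return x[0]


image = spectra2pix_forward(rng.random(SPECTRUM_LEN), rng.random(SPECTRUM_LEN), 1.0)
print(image.shape)  # (16, 16)
```

In practice the same structure would be written in a deep learning framework so that the weights can be trained by backpropagation; the explicit loops here only make the tensor shapes visible.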
4 Experimental Results
4.1 The Dataset
In this work, we utilize the dataset from [malkiel2018deep, malkiel2018plasmonic]. This dataset comprises 13K samples of synthetic experiments. Each sample is associated with a geometry, a single polarization (vertical or horizontal) and material properties. By pairing the polarizations, we formed 6K experiments, each comprising a quadruplet of two spectra (one per polarization), material properties and a geometry. The original dataset contains four materials: Gold, Silver, Palladium and Aluminium. Since most of the experiments are associated with Gold or Silver, and since both materials show a strong correlation in their spectra, in this study we utilize only the experiments where the nanostructures are made of gold, without loss of generality. Nonetheless, we keep the variable host material values associated with each experiment. The epsilon host dielectric values vary in the range [1.0, 3.0].
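The pairing step can be sketched as a simple grouping operation. The record layout below (dictionaries with `geometry`, `material`, `eps_host`, `polarization` and `spectrum` keys) is a hypothetical one chosen for illustration; the published dataset may be organized differently.

```python
from collections import defaultdict


def pair_polarizations(samples, material="gold"):
    """Group single-polarization samples into (s_h, s_v, eps_host, geometry) quadruplets."""
    grouped = defaultdict(dict)
    for s in samples:
        if s["material"] != material:
            continue  # keep only the material under study
        key = (s["geometry"], s["eps_host"])
        grouped[key][s["polarization"]] = s["spectrum"]
    quadruplets = []
    for (geometry, eps_host), spectra in grouped.items():
        # Keep an experiment only if both polarizations are available.
        if "horizontal" in spectra and "vertical" in spectra:
            quadruplets.append((spectra["horizontal"], spectra["vertical"], eps_host, geometry))
    return quadruplets


geom = (1, 1, 1, 0, 0, 100.0, 60.0, 90.0)  # toy geometry vector
samples = [
    {"geometry": geom, "material": "gold", "eps_host": 1.0,
     "polarization": "horizontal", "spectrum": (0.9, 0.4, 0.7)},
    {"geometry": geom, "material": "gold", "eps_host": 1.0,
     "polarization": "vertical", "spectrum": (0.8, 0.6, 0.5)},
    {"geometry": geom, "material": "gold", "eps_host": 2.0,
     "polarization": "vertical", "spectrum": (0.7, 0.5, 0.6)},  # no horizontal partner
    {"geometry": geom, "material": "silver", "eps_host": 1.0,
     "polarization": "horizontal", "spectrum": (0.9, 0.5, 0.7)},  # filtered out
]
quadruplets = pair_polarizations(samples)
print(len(quadruplets))  # 1
```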
The geometries are composed of different combinations of edges, which together form a template of the shape “H”. All three data parts, geometry, spectrum and material properties, are represented as vectors. Specifically, for the geometries, an eight-dimensional encoding is used. Five dimensions encode the presence of each one of the five edges of the H shape (binary values). Two dimensions encode the size of (1) the outer edges (which share the same size) and (2) the inner edge. The last dimension represents the angle between the top left outer edge and the inner edge (angles are between 0 and 90 degrees).
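To make the eight-dimensional encoding concrete, the sketch below unpacks such a vector into named fields. The ordering of the entries within the vector, the field names and the class `HFamilyGeometry` are assumptions made for illustration; only the split into five presence flags, two sizes and one angle follows the description above.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class HFamilyGeometry:
    edges_present: Tuple[bool, ...]  # presence of each of the five edges
    outer_size: float                # shared size of the outer edges
    inner_size: float                # size of the inner edge
    angle_deg: float                 # angle between top left outer edge and inner edge


def decode_geometry(vec: Sequence[float]) -> HFamilyGeometry:
    """Unpack an eight-dimensional geometry vector (assumed field order)."""
    if len(vec) != 8:
        raise ValueError("expected an eight-dimensional encoding")
    edges = tuple(bool(round(v)) for v in vec[:5])
    outer, inner, angle = vec[5], vec[6], vec[7]
    if not 0.0 <= angle <= 90.0:
        raise ValueError("angle must lie between 0 and 90 degrees")
    return HFamilyGeometry(edges, outer, inner, angle)


example = decode_geometry([1, 0, 1, 0, 1, 120.0, 80.0, 45.0])
print(example.edges_present)  # (True, False, True, False, True)
```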
We transformed the above geometry representation into 2D binary images. A sample of the transformed images, along with the matched spectra of each geometry, can be seen in Fig. 3. The transformed dataset is attached as supplementary material and is publicly available at https://github.com/ItzikMalkiel/spectra2pix.
4.2 Towards Generalization
To study the ability of Spectra2Pix to generalize, we split the above dataset into train, validation and test sets. The test set contains all the geometries of the shape L and their variants, i.e. it contains all L shape geometries with different angles for the top left outer edge, including geometries that are relatively similar to L, such as ’U’ with a top left angle greater than 70 degrees. The train set contains all the remaining experiments, except for 5% that are held out as a validation set. In summary, the sizes of the train, validation and test sets used in this study are 3.3K, 150 and 700, respectively. A representative sample of the test set can be seen in Fig. 4.
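A family-level holdout split of this kind can be sketched as follows. The `shape` and `angle_deg` fields and the predicate `is_l_family` are hypothetical stand-ins for however the L-like geometries are identified in the actual dataset; the 5% validation fraction follows the text.

```python
import random


def split_by_family(samples, is_test_family, val_fraction=0.05, seed=0):
    """Hold out a whole shape family as the test set, then carve validation from the rest."""
    test = [s for s in samples if is_test_family(s)]
    rest = [s for s in samples if not is_test_family(s)]
    random.Random(seed).shuffle(rest)
    n_val = int(round(val_fraction * len(rest)))
    return rest[n_val:], rest[:n_val], test  # train, validation, test


def is_l_family(sample):
    """L shapes, plus U shapes whose top left angle exceeds 70 degrees."""
    return sample["shape"] == "L" or (sample["shape"] == "U" and sample["angle_deg"] > 70)


toy = (
    [{"shape": "H", "angle_deg": 90.0} for _ in range(95)]
    + [{"shape": "U", "angle_deg": 45.0} for _ in range(5)]
    + [{"shape": "L", "angle_deg": 30.0} for _ in range(15)]
    + [{"shape": "U", "angle_deg": 80.0} for _ in range(5)]
)
train, val, test = split_by_family(toy, is_l_family)
print(len(train), len(val), len(test))  # 95 5 20
```

Splitting on the family predicate before shuffling guarantees that no L-like geometry leaks into the train or validation sets, which is the property the generalization claim depends on.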
We train the Spectra2Pix network for 1M training steps, with a batch size of 64. The Adam optimizer is used with a learning rate of , , , and . We use the validation set for early stopping.
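Early stopping on a validation set can be sketched as below. The evaluation interval, the patience value and the callback interface (`train_step`, `validate`) are hypothetical; the text only states that the validation set is used for early stopping.

```python
def train_with_early_stopping(train_step, validate, max_steps, eval_every, patience):
    """Run training steps and stop once validation loss fails to improve `patience` times in a row."""
    best_loss = float("inf")
    best_step = 0
    bad_evals = 0
    for step in range(1, max_steps + 1):
        train_step(step)
        if step % eval_every == 0:
            val_loss = validate(step)
            if val_loss < best_loss:
                best_loss, best_step, bad_evals = val_loss, step, 0
            else:
                bad_evals += 1
                if bad_evals >= patience:
                    break
    return best_step, best_loss


# Toy run: the validation loss is minimal at step 30 and grows afterwards.
best_step, best_loss = train_with_early_stopping(
    train_step=lambda step: None,
    validate=lambda step: float((step - 30) ** 2),
    max_steps=100,
    eval_every=10,
    patience=2,
)
print(best_step, best_loss)  # 30 0.0
```

A real training loop would also save the model weights whenever a new best validation loss is reached, so that the checkpoint from `best_step` can be restored.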
At the end of training, we used the model to infer geometries for the test set. Figure 5 exhibits a representative sample of the test set predictions. Each row represents a different query. The left column exhibits the input spectra (both vertical and horizontal polarizations), the middle column presents the generated geometry and the right column shows the ground truth label g. For the first row, the spectra of an L shape geometry, along with an epsilon host of 1.0, were fed into the Spectra2Pix model. The model generated an image of an L shape with a somewhat similar size and angle, but with a different orientation. In the second row, the model was able to generate an L geometry with a similar size and angle, but in a mirror-symmetric pose, which does not affect the spectra. In addition, since we plot the raw values of the generated image, it can be seen for this sample that the model is not confident about the size of the bottom edge, as some artifacts are present on the left side of the generated image. For the third experiment, the model was able to infer a fairly accurate L shape. Overall, these results showcase the ability of spectra2pix to generalize to unseen geometries sampled from a distribution fairly different from the one the model was trained on.
4.3 Learning host material
Figure 6 showcases the ability of the Spectra2Pix model to learn the dependence on the epsilon host material. In this figure, we queried the network with two different pairs of spectra that are associated with the same geometry but with different epsilon host materials. The first pair corresponds to an “L” geometry hosted in a dielectric with an epsilon of 1.0. The second pair is associated with an identical geometry, but hosted in a dielectric with an epsilon of 2.0.
Additionally, we explored some of the limitations of the above method and dataset. Figure 7 showcases three representative samples of geometries for which the model produced non-optimal designs.
The observed discrepancies can be categorized into three classes. The first corresponds to low confidence of the model regarding the existence of some edges. For example, the first row in Fig. 7 presents a design of an L shape, where the model was able to infer a relatively solid design for a symmetric L (which shares the same spectra as the ground truth), but the generated image contains artifacts in a few places, especially where the inner edge would continue to the left; these look like an extra edge that forms a superposition of two L shapes (each a mirror image of the other). The second class relates to the existence of extra small edges. For example, in the second row of Fig. 7, the model was able to generate a similar L shape, though with an inaccurate angle and an extra small edge at the bottom. The extra small edge at the bottom has a negligible effect on the spectra for both polarizations, and might compensate for the discrepancy in the angle of the top left outer edge of the predicted geometry. The third class comprises failure cases where the model designs a different geometry than intended. These designs should be verified using simulations, since they might yield spectra similar to the input spectra although their shape differs from the ground truth geometry. Alternatively, a bi-directional model can shed light on the verification of such designs. An example of this class can be seen in the third row of Fig. 7, for which it also seems that the model predicted a superposition of two ’L’ shapes.
We attribute the above discrepancies to: (1) The complexity of the underlying physical process. (2) The difficulty of generalizing to completely unseen geometries, given a relatively small training set comprising only eight different shapes with 4K variants (where a large portion of the samples emphasizes the different values of the epsilon host, rather than the richness of the geometries). (3) The existence of multiple valid functions, since the same spectra and material properties can be matched with multiple geometries. For example, for some geometries such as L shapes, flipping the geometry doesn’t affect the spectra.
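One way to account for this ambiguity when comparing a generated image with its label is to take the smallest pixelwise error over the mirrored versions of the label. The sketch below is an illustrative evaluation helper, not part of the published method; it assumes that horizontal and vertical mirror images are spectrally equivalent, and it uses mean squared error as the pixelwise measure.

```python
import numpy as np


def flip_invariant_error(generated: np.ndarray, target: np.ndarray) -> float:
    """Smallest mean squared pixel error over the target and its mirror images."""
    candidates = [
        target,
        np.fliplr(target),
        np.flipud(target),
        np.flipud(np.fliplr(target)),
    ]
    return min(float(np.mean((generated - c) ** 2)) for c in candidates)


# Toy 3x3 "L" shape and its left-right mirror image.
label = np.array([[1.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [1.0, 1.0, 1.0]])
mirrored = np.fliplr(label)

plain_error = float(np.mean((mirrored - label) ** 2))  # penalizes the mirrored design
print(plain_error > 0.0, flip_invariant_error(mirrored, label))  # True 0.0
```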
The second difficulty mentioned above can be addressed by (A) utilizing a larger and richer dataset, which would ease the generalization and improve the robustness of such a design model, or by (B) leveraging a bi-directional architecture, which utilizes an extra model (say, a “pix2spectra” architecture) that predicts back the matched spectra of the generated image. The bi-directional model can regularize the training and encourage the spectra2pix model to produce images with higher confidence and fewer artifacts, since, given an accurate pix2spectra model, image artifacts and low confidence in the existence of edges would incur a higher penalty on the predicted spectra.
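As a hedged illustration of option (B), such a regularized objective could take the following form. Here $M$ is the spectra2pix model, $F$ is a hypothetical pix2spectra network that predicts the pair of spectra from an image and an epsilon host, $(s_h, s_v)$ are the input spectra for the two polarizations, $\epsilon$ is the epsilon host, $g$ is the ground truth image, and $\lambda \ge 0$ is an assumed weighting factor. None of these symbols or the specific form is taken from a published implementation.

```latex
\hat{g} = M(s_h, s_v, \epsilon), \qquad
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{pix}}\big(\hat{g},\, g\big)
  + \lambda\, \mathcal{L}_{\text{spec}}\big(F(\hat{g}, \epsilon),\, (s_h, s_v)\big)
```

The first term is the pixelwise comparison with the label, while the second term penalizes generated images whose predicted spectra deviate from the input spectra, which is where artifacts and uncertain edges would be punished.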
The third difficulty above, which indicates that the hidden mapping between spectra and geometry is not a well-defined function, can be addressed by leveraging Generative Adversarial Networks (GANs). A GAN based model, incorporating a discriminator network, may be able to detect low confidence in the existence of edges, image artifacts and superpositions of the same edge in a generated image, which would then encourage the generator model to avoid such behavior. GANs can also be used to produce a set of different geometries that match a single input spectra. In this work we leave the above for further investigation.
The use of machine learning techniques, and deep learning in particular, has spawned huge interest over the past few years in the nanophotonics community, due to the great promise these techniques offer for the inverse design of novel devices and functionalities. In this paper, we introduce spectra2pix, a model with enough degrees of freedom to conceptually allow the design of any 2D geometry. In addition, we present the ability of spectra2pix to successfully generalize to the design of a completely unseen sub-family of geometries. Our results highlight the importance and the generalization ability of deep neural networks, towards the goal of inverse design of any nanostructure with an at-will spectral response. To our knowledge, and compared to other work in the field, spectra2pix is the first model to present a generalization ability of designing a completely unseen sub-family of geometries sampled from a distribution fairly different from the one the model was trained on.