Designing compact integrated color filters with ultra-narrow bandwidth is of great importance for realizing practical multispectral and hyperspectral imaging. Each pixel of a hyperspectral imaging device records the spectrum of light from the environment, providing significantly more information in comparison with conventional imaging techniques, that can help with, e.g., materials identification or objects detection. Such devices can find a wide range of applications in different areas of science and technology, including medicine, material science, astronomy, environment monitoring to name a few.
Surface plasmon polaritons (SPPs) allow for extreme miniaturization of integrated photonic devices via strong light confinement that can result in very small wavelength of light (potentially dozens of times smaller than in free space) . Hence, such plasmonic devices look very promising as a platform for designing ultra-compact integrated narrow-band photonic filters [26, 3]. Periodic arrays of subwavelength holes or nanoslits in metal films enable efficient excitation of SPPs by satisfying momentum-matching with the addition of a grating wavevector. The grating materials, geometry, and symmetry control the excitation efficiency . In particular, periodic arrays of subwavelength apertures passing through an optically thick metal film exhibit enhanced transmission exclusively at conditions corresponding to constructive mutual interference between incident light and SPPs traveling along the surface between adjacent slits and acting as a band-pass color filter . However, periodicity of slits can be effectively achieved by placing reflective mirrors around a slit.
Designing such plasmonic mirror filters sets up a non-trivial optimization challenge: number of independent parameters can easily exceed a few dozens, optimization landscape itself is non-convex with many local minima. Additional challenge arises from the fact that the derivation of analytical model is nearly impossible due to near-field effects and complicated geometry, hence numerical simulations of underlying physical processes (governed by Maxwell’s equations) are required.
Here we present an application of an in-house developed optimization strategy based on multi-fidelity Gaussian processes  to this nanophotonics design problem. Our simulation setup allows us to easily control the fidelity of numerical calculations by changing geometric mesh size and total time duration of simulated physical processes. We compare it with a conventional Gaussian Processes approach and a commonly used algorithm—Particle Swarm Optimization, which is implemented in Lumerical commercial nanophotonics software111https://kb.lumerical.com.
2 Background and Related Work
2.1 Background on Nanophotonics Optimization Problem
We employ numerical methods to solve for an optimized design of this plasmonic color filter. We use Lumerical commercial implementation of the finite-difference time-domain (FDTD) method to simulate the transmission spectra of such devices. In post-processing analysis, we extract observable scalar figures of merit corresponding to the goodness of the spectrum such as transmission peak amplitude, its offset from designed wavelength, full-width half-maximum, signal-to-noise ratio, and combine them into one weighted figure of merit (FOM). Using this FOM, the evolution of the design can then be pursued as a minimization problem over a geometric parameter space, which can be driven using any of a variety of iteration schemes. Here, we will search over a five parameter domain, describing the geometry of our devices.
The filters center wavelength, linewidth, and amplitude are determined by the interaction of several physical processes including the amplitude and phase of photon-plasmon coupling at the slits, the strength of mutual coupling between the propagating waveguide channels, and the effective index of each participating electromagnetic mode. Due to the interplay of these several physical resonances, the FOM corresponding to our filters exhibits oscillations in parameter space that are likely to trap a local directed search method in a globally non-optimal local solution. Therefore, gradient descent or other local optimization modalities are excluded for this purpose. Conversely, the relatively large computational cost of the FDTD forward problem limits the applicability of purely stochastic approaches like evolutionary methods. Instead, we seek methods which execute global, derivative-free search with an efficient iteration strategy that calls the forward problem solver a limited number of times (under defined budget).
2.2 Numerical Approaches to Black-box Optimization
In this section, we review related numerical approaches for black-box optimization.
2.2.1 Classical Approaches on Single-fidelity Blackbox Optimization
There are many stochastic heuristics for finding approximate solutions of non-convex optimization problems, such as simulated annealing10], particle swarm optimization (PSO)  and many others. In this paper, we are using PSO as one of the baselines for comparison, as it has a wide use in nanophotonics community [18, 20] and it is implemented in Lumerical commercial nanophotonics software.
In the particle swarm algorithm, the potential solutions, called particles, are initialized at random positions and velocities, and then move within the parameter search space. The particles are subject to three forces as they move: spring force towards the personal best position ever achieved by that individual particle, spring force towards the global best position ever achieved by any particle, a frictional force, proportional to the velocity. At each iteration velocities are stochastically updated based on these forces and previous velocity values, then new particles locations are calculated as the old ones plus the velocities, modified to keep particles within bounds. The algorithm proceeds until a specified stopping criterion is met.
PSO is inspired by the behavior of animal aggregations like flocks of birds or insects swarming. Each particle is attracted to some degree to the best location it has found so far, and also to the best location any member of the swarm has found. After some steps, the population can coalesce around one location, or can coalesce around a few locations, or can continue to move.
Gaussian processes optimization
Optimizing an unknown and noisy function is a common task in Bayesian optimization. In real applications, such functions tend to be expensive to evaluate, for example tuning hyperparameters for deep learning models, so the number of queries should be minimized. As a way to model the unknown function, Gaussian process (GP)  is an expressive and flexible tool to model a large class of functions. A classical method for Bayesian optimization with GPs is GP-UCB  which treats Bayesian optimization as a multi-armed bandit problem and proposes an upper-confidence bound based algorithm for query selections. The authors provide a theoretical bound on the cumulative regret that is connected with the amount of mutual information gained through the queries.  directly incorporates mutual information into the UCB framework and demonstrated the empirical values of their method.
Entropy search  represents another class of GP-based Bayesian optimization approach. Its main idea is to directly search for the global optimum of an unknown function through queries. Each query point is selected based on its informativeness in learning the location for the function optimum. Predictive entropy search  addresses some computational issues from entropy search by maximizing the expected information gain with respect to the location of the global optimum. Max-value entropy search [25, 24] approaches the task of searching the global optimum differently. Instead of searching for the location of the global optimum, it looks for the value of the global optimum. This effectively avoids issues related to the dimension of the search space and the authors are able to provide regret bound analysis that the previous two entropy search methods lack.
2.2.2 Multi-fidelitiy Bayesian Optimization
Multi-fidelity optimization is a general framework that captures the trade-off between cheap low-quality and expensive high-quality data (cf. Fig. 1). There have been several works on using GPs to model functions of different fidelity levels. Recursive co-kriging [7, 15]
consider an autoregressive model for multi-fidelity GP regression, which assumes that the higher fidelity consists of a lower fidelity term and anindependent GP term which models the systematic error for approximating the higher-fidelity output. Therefore, one can model cross-covariance between the high-fidelity and low-fidelity functions using the covariance of the lower fidelity function only. Virtual vs Real 
extends this idea to Bayesian optimization. The authors consider a two-fidelity setting (i.e., virtual simulation and real system experiments), where they model the correlation between the two fidelities through co-kriging, and then apply entropy search to optimize the target output. Zhang et al. (2017) model the dependencies between different fidelities with convolved Gaussian processes , and then apply predictive entropy search (PES)  to efficient exploration.
Although these multi-fidelity heuristics have shown promising empirical results on some experimental datasets, little is known about their theoretical performance. Recently, Kandasamy et al. (2016) propose MF-GP-UCB (Multi-fidelity GP-UCB) 
, a principled for multi-fidelity Bayesian optimization. In particular, the authors consider an iterative two-stage optimization procedure and view each fidelity as an independent component, and at each iteration update the estimate of each fidelityonly based on observations from the corresponding fidelity. In a follow-up work , the authors address the disconnect issue by considering a continuous fidelity space and performing joint updates to effectively share information among different fidelity levels. However, as elaborated on in , these approaches are likely to pick sub-optimal actions in some pessimistic cases, due to the modeling assumption and the two-stage query selection criteria. In this paper, we focus on MF-MI-Greedy, a principled multi-fidelity algorithm as recently proposed in . We describe the details of the algorithm in §3, and evaluate it against the MF-GP-UCB algorithm as well as other single-fidelity baselines in §4.
3 Multi-fidelity Optimization for Photonic Nanostructure Design
3.1 Preliminary and Problem Formulation
Consider the problem of maximizing an unknown payoff function . We can probe the function by directly querying it at some and obtaining a noisy observation , where
denotes i.i.d. Gaussian white noise. In addition to the payoff function, we are also given access to oracle calls to some unknown auxiliary functions ; similarly, we obtain a noisy observation when querying at . Here, each could be viewed as a low-fidelity version of for . For example, if represents the actual reward obtained by running a real physical system with input , then may represent the simulated payoff from a numerical simulator at fidelity level .
We assume that multiple fidelities
are mutually dependent through some fixed, (possibly) unknown joint probability distribution. In particular, we model with a multiple output Gaussian process; hence the marginal distribution on each fidelity is a separate GP, i.e., , where specify the (prior) mean and covariance at fidelity level .
Let us use to denote the action of querying at . Each action incurs cost , and achieves reward
That is, performing (at the target fidelity) achieves a reward . We receive the minimal immediate reward with lower fidelity actions for , even though it may provide some information about and could thus lead to more informed decisions in the future. W.l.o.g., we assume that , and .
Let us encode an adaptive strategy for picking actions as a policy . In words, a policy specifies which action to perform next, based on the actions picked so far and the corresponding observations. We consider policies with a fixed budget . Upon termination, returns a sequence of actions , such that . Note that for a given policy , the sequenceand the (random) observations of the selected actions. Given a budget on , our goal is to maximize the expected cumulative reward, so as to identify an action with performance close to as rapidly as possible. Formally, we seek
3.2 The MF-MI-Greedy Algorithm
We briefly describe MF-MI-Greedy proposed in , a mutual information based multi-fidelity Gaussian process optimization algorithm. It consists of two components: an exploratory procedure to gather information about the target level fidelity function via querying lower fidelity functions; an exploitative procedure to optimize the target level fidelity with the previously gathered information.
MF-MI-Greedy considers an information-theoretic selection criterion for choosing low fidelity queries. The quality of a low fidelity query is measured as the information gain per unit cost, defined as the amount of entropy reduction in the posterior distribution of the target fidelity function divided by the cost of the query: . Here, denotes the set of previously selected actions, and denote the observation history. As shown in Algorithm 2, this criterion is used greedily to select queries for low fidelity functions. To ensure that the algorithm does not explore excessively, we consider the following stopping conditions: (i) when the budget is exhausted (Line 2), (ii) when a single target fidelity action is better than all the low fidelity actions in terms of the benefit-cost ratio (Line 2), and (iii) when the cumulative benefit-cost ratio is small (Line 2). Here, the parameter is set to be where is the allocated budget.
At the end of the exploration phase, MF-MI-Greedy updates the posterior distribution of the joint GP using the full observation history, and searches for a target fidelity action via the (single-fidelity) GP optimization subroutine SF-GP-OPT (Line 1). Here, SF-GP-OPT could be any off-the-shelf Bayesian optimization algorithm, such as GP-UCB , GP-MI , EST  and MVES , etc. Different from the previous exploration phase which seeks an informative set of low fidelity actions, the GP optimization subroutine aims to trade off exploration and exploitation on the target fidelity, and outputs a single action at each round. MF-MI-Greedy then proceeds to the next round until it exhausts the preset budget, and eventually outputs an estimator of the target function optimizer.
3.3 Practical Implementation
In Algorithm 2 and the algorithm used for SF-GP-OPT, we need to find the argmax of a function. For the photonic nanostructure experiment in §4, this optimization is over a discrete set of candidate queries. Naively, we would need to evaluate the function at each query point in order to determine the optimizer, which is a costly operation. Instead, we devise an approximate optimization step to address this computational challenge. We first directly optimize the function over its continuous domain and obtain an optimizer. Then we project the optimizer down to the candidate set by selecting the closest available query point based on Euclidean distance. This approximation scheme takes advantage of existing fast optimizers for continuous functions and becomes necessary for large candidate size.
4 Experimental Design for Photonic Nanostructures Discovery
Our nanophotonic structure is characterized by the five geometric parameters. For each parameter setting, we use a score, commonly called a figure-of-merit (FOM), to represent how well the resulting structure satisfies the desired color filtering property. By minimizing FOM, we can find a set of high-quality design parameters. Traditionally, FOMs can only be computed through the actual fabrication of a structure and subsequent measurements of its various physical properties, which is a time-consuming process. Alternatively, simulations can be utilized to estimate what physical properties a design will have, e.g. using the Lumerical software. By solving a 2D variant of Maxwell’s equations, we could simulate the transmission spectrum of a given nanophotonic device and then compute FOM from it. We could obtain different fidelity level data by controlling aspects of the numerical solution process.
We experiment with three design tasks for filtering light with wavelengths of 550 nm, 650 nm and 750 nm. For each task, we vary the conformal mesh size and the time-domain solver total time duration of simulated physical processes to obtain two sets of multi-fidelity data, each with three fidelity levels on 4983 designs.
The first set of data is based on different conformal mesh sizes. The mesh size determines how accurate the final results are, with finer meshes lead to more accurate results. We generated the lowest fidelity data using a mesh size of , the middle fidelity and the target fidelity . The costs, CPU time, are inverse proportional to the mesh size, so we use the following costs for our three fidelity function evaluation, respectively.
The second set of data is based on the different total time duration of simulated physical processes for the time-domain solver. Since the transmission spectrum is calculated through Fourier transform of the electromagnetic pulse, that is passed through the color filter, we expect more accurate solutions with longer physical simulation time duration. We generated the lowest fidelity data using a simulation time of 40 fs (femtoseconds), the middle fidelity 70 fs and the target fidelity 100 fs. The costs are proportional to the simulation time, so we use the following costsfor our three fidelity function evaluation, respectively.
4.2 Experimental Setup
To model the relationship between a low fidelity function and the target fidelity function , we use an additive model. Specifically, we assume that for all fidelity levels where is an unknown function characterizing the error incurred by a lower fidelity function. We use Gaussian processes to model and . Since is embedded in every fidelity level, we can use an observation from any fidelity to update the posterior for every fidelity level. We use square exponential kernels for all the GP covariances, with hyperparameter tuning scheduled periodically during optimization. Following prior work on practical Bayesian optimization , we use 10% of the total budget for initialization. For multi-fidelity methods, the initialization budget is spent on randomly querying the lowest fidelity function. For the single-fidelity method, it is spent on randomly querying the target fidelity function. For all experiments, we use a total budget of 100 times the cost of target fidelity function call
. Every method is run 20 times to compute its mean and standard error.
4.3 Compared Methods
Our framework is general and we could plug in different single fidelity Bayesian optimization algorithms for the SF-GP-OPT procedure in Algorithm 1. In our experiment, we choose to use GP-UCB as one instantiation. We compare with MF-GP-UCB  and GP-UCB . MF-GP-UCB relies on several hyperparameters in the algorithm, we keep the same approach to choosing them as described in .
Besides the Bayesian optimization based method, we also compare with a common heuristic called Particle Swarm Optimization, which is inspired by the social behavior of animals and is used for nanophotonic structure designs [18, 20]. We use built-in MATLAB implementation of this algorithm. We specify a population of 5 particles and run Swarm optimization for 20 iterations, totaling evaluations of the target fidelity function. All other algorithm parameters are kept at default MATLAB values.
4.4 Optimizing Figure of Merit
Figure 2 and Figure 3 show the results of this experiment. As usual, the -axis is the cost and -axis is Figure of Merit and smaller is better. After a small portion of the budget is used in initial exploration, MF-MI-Greedy (red) is able to arrive at a better final design compared with MF-GP-UCB, GP-UCB and Particle Swarm. MF-MI-Greedy tends to have a worse figure of merit at the beginning because the initial explorations in the lower fidelity do not yield FOM scores on the target fidelity, so essentially it has a late start in all the plots because it starts querying the target fidelity late. However, the advantage of exploring lower fidelities becomes apparent once the exploitation phase starts in the target fidelity level, as seen by the rapid convergence to low FOM designs.
In this paper, we investigate the problem of optimizing the transmission properties of plasmonic mirror color filters, where we have access to multiple numerical simulators with different fidelity levels and computational costs. We considered several derivative-free global optimization algorithms, including a commonly used approach in the nanophotonics community, and two recently developed multiple-fidelity Bayesian optimization approaches. Our results on several pre-collected nanophotonics datasets demonstrate the compelling performance of the multiple-fidelity Bayesian optimization approach. These experiments suggest that there is a great potential in utilizing cheap, multi-fidelity simulations to aid the discovery of the optimal photonic nanostructures.
-  Mauricio Alvarez and Neil D Lawrence. Sparse convolved gaussian processes for multi-output regression. In Advances in neural information processing systems, pages 57–64, 2009.
-  Eric Brochu, Vlad M Cora, and Nando De Freitas. A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599, 2010.
-  Jing Chen, Jian Yang, Zhuo Chena, Yi-Jiao Fang, Peng Zhan, and Zhen-Lin Wang. Plasmonic reflectors and high-q nano-cavities based on coupled metal-insulator-metal waveguides. AIP Advances, 2:012145, 2012.
Emile Contal, Vianney Perchet, and Nicolas Vayatis.
Gaussian process optimization with mutual information.
International Conference on Machine Learning, pages 253–261, 2014.
-  Thomas W. Ebbesen, Henri J. Lezec, H. F. Ghaemi, Tineke Thio, and Peter A. Wolff. Extraordinary optical transmission through sub-wavelength hole arrays. Nature, 391:667–668, 1998.
-  Dagny Fleischman, Luke A. Sweatlock, Hirotaka Murakami, and Harry Atwater. Hyper-selective plasmonic color filters. Optics Express, 25:27386–27395, 2017.
-  Alexander I. J. Forrester, András Sóbester, and Andy J. Keane. Multi-fidelity optimization via surrogate modelling. In Proceedings of the royal society of london a: mathematical, physical and engineering sciences, volume 463, pages 3251–3269. The Royal Society, 2007.
-  Philipp Hennig and Christian J. Schuler. Entropy search for information-efficient global optimization. Journal of Machine Learning Research, 13(Jun):1809–1837, 2012.
-  José Miguel Hernández-Lobato, Matthew W. Hoffman, and Zoubin Ghahramani. Predictive entropy search for efficient global optimization of black-box functions. In Advances in neural information processing systems, pages 918–926, 2014.
-  John H. Holland. Genetic algorithms and adaptation. Adaptive Control of Ill-Defined Systems. NATO Conference Series (II Systems Science), 16, 1984.
-  Kirthevasan Kandasamy, Gautam Dasarathy, Junier B. Oliva, Jeff Schneider, and Barnabás Póczos. Gaussian process bandit optimisation with multi-fidelity evaluations. In Advances in Neural Information Processing Systems, pages 992–1000, 2016.
-  Kirthevasan Kandasamy, Gautam Dasarathy, Jeff Schneider, and Barnabás Póczos. Multi-fidelity bayesian optimization with continuous approximations. In International Conference on Machine Learning, pages 1799–1808, 2017.
James Kennedy and Russel C. Eberhart.
Particle swarm optimization.
Proceedings of the IEEE International Conference on Neural Networks. Perth, Australia, page 1942–1945, 1995.
-  Scott Kirkpatrick, Charles Daniel Gelatt, and Mario P. Vecchi. Optimization by simulated annealing. Science, 220:671–680, 1983.
-  Loic Le Gratiet and Josselin Garnier. Recursive co-kriging model for design of computer experiments with multiple levels of fidelity. International Journal for Uncertainty Quantification, 4(5), 2014.
-  Stefan A. Maier, Mark L. Brongersma, Pieter G. Kik, Sheffer Meltzer, Ari A. G. Requicha, and Harry A. Atwater. Plasmonics—a route to nanoscale optical devices. Advanced materials, 13:1501–1505, 2001.
Alonso Marco, Felix Berkenkamp, Philipp Hennig, Angela P. Schoellig, Andreas
Krause, Stefan Schaal, and Sebastian Trimpe.
Virtual vs. real: Trading off simulations and physical experiments in reinforcement learning with bayesian optimization.In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 1557–1563, 2017.
-  James G. Mutitu, Shouyuan Shi, Caihua Chen, Timothy Creazzo, Allen Barnett, Christiana Honsberg, and Dennis W Prather. Thin film silicon solar cell design based on photonic crystal and diffractive grating structures. Optics express, 16(19):15238–15248, 2008.
-  Carl E. Rasmussen and Chris K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
-  Mehrdad Shokooh-Saremi and Robert Magnusson. Leaky-mode resonant reflectors with extreme bandwidths. Optics letters, 35(8):1121–1123, 2010.
-  Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. Practical bayesian optimization of machine learning algorithms. In Advances in neural information processing systems, pages 2951–2959, 2012.
-  Jialin Song, Yuxin Chen, and Yisong Yue. A general framework for multi-fidelity bayesian optimization with gaussian processes. arXiv preprint arXiv:1811.00755, 2018.
-  Niranjan Srinivas, Andreas Krause, Sham Kakade, and Matthias Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. In Proc. International Conference on Machine Learning (ICML), 2010.
-  Zi Wang and Stefanie Jegelka. Max-value entropy search for efficient bayesian optimization. arXiv preprint arXiv:1703.01968, 2017.
-  Zi Wang, Bolei Zhou, and Stefanie Jegelka. Optimization as estimation with gaussian processes in bandit settings. In Artificial Intelligence and Statistics, pages 1022–1031, 2016.
-  Sozo Yokogawa, Stanley P. Burgos, and Harry A. Atwater. Plasmonic color filters for CMOS image sensor applications. Nano. Lett., ACS Nano, 7:10038–10047, 2013.
-  Yehong Zhang, Trong Nghia Hoang, Bryan Kian Hsiang Low, and Mohan Kankanhalli. Information-based multi-fidelity bayesian optimization. NIPS Workshop on Bayesian Optimization, 2017.