1 Introduction
Designing compact integrated color filters with ultranarrow bandwidth is of great importance for realizing practical multispectral and hyperspectral imaging. Each pixel of a hyperspectral imaging device records the spectrum of light from the environment, providing significantly more information than conventional imaging techniques, which can help with, e.g., material identification or object detection. Such devices find a wide range of applications across science and technology, including medicine, materials science, astronomy, and environmental monitoring, to name a few.
Surface plasmon polaritons (SPPs) allow for extreme miniaturization of integrated photonic devices via strong light confinement, which can result in a very small wavelength of light (potentially dozens of times smaller than in free space) [16]. Hence, such plasmonic devices look very promising as a platform for designing ultracompact integrated narrowband photonic filters [26, 3]. Periodic arrays of subwavelength holes or nanoslits in metal films enable efficient excitation of SPPs by satisfying momentum matching with the addition of a grating wavevector. The grating materials, geometry, and symmetry control the excitation efficiency [5]. In particular, periodic arrays of subwavelength apertures passing through an optically thick metal film exhibit enhanced transmission exclusively at conditions corresponding to constructive mutual interference between incident light and SPPs traveling along the surface between adjacent slits, and thus act as a bandpass color filter [6]. Notably, the effect of slit periodicity can be achieved by placing reflective mirrors around a single slit.
Designing such plasmonic mirror filters sets up a nontrivial optimization challenge: the number of independent parameters can easily exceed a few dozen, and the optimization landscape itself is nonconvex with many local minima. An additional challenge arises from the fact that deriving an analytical model is nearly impossible due to near-field effects and the complicated geometry; hence, numerical simulations of the underlying physical processes (governed by Maxwell's equations) are required.
Here we present an application of an in-house optimization strategy based on multifidelity Gaussian processes [22] to this nanophotonics design problem. Our simulation setup allows us to easily control the fidelity of numerical calculations by changing the geometric mesh size and the total time duration of the simulated physical processes. We compare it with a conventional Gaussian process approach and a commonly used algorithm, Particle Swarm Optimization, which is implemented in the Lumerical commercial nanophotonics software (https://kb.lumerical.com).
2 Background and Related Work
2.1 Background on Nanophotonics Optimization Problem
We employ numerical methods to solve for an optimized design of this plasmonic color filter. We use the Lumerical commercial implementation of the finite-difference time-domain (FDTD) method to simulate the transmission spectra of such devices. In post-processing analysis, we extract observable scalar figures of merit corresponding to the goodness of the spectrum, such as the transmission peak amplitude, its offset from the designed wavelength, the full-width at half-maximum, and the signal-to-noise ratio, and combine them into one weighted figure of merit (FOM). Using this FOM, the evolution of the design can then be pursued as a minimization problem over a geometric parameter space, which can be driven using any of a variety of iteration schemes. Here, we search over a five-parameter domain describing the geometry of our devices.
The filter's center wavelength, linewidth, and amplitude are determined by the interaction of several physical processes, including the amplitude and phase of photon-plasmon coupling at the slits, the strength of mutual coupling between the propagating waveguide channels, and the effective index of each participating electromagnetic mode. Due to the interplay of these resonances, the FOM of our filters exhibits oscillations in parameter space that are likely to trap a local directed search in a globally nonoptimal local solution. Therefore, gradient descent and other local optimization methods are excluded for this purpose. Conversely, the relatively large computational cost of the FDTD forward problem limits the applicability of purely stochastic approaches such as evolutionary methods. Instead, we seek methods that execute a global, derivative-free search with an efficient iteration strategy that calls the forward-problem solver a limited number of times (under a defined budget).
2.2 Numerical Approaches to Black-box Optimization
In this section, we review related numerical approaches to black-box optimization.
2.2.1 Classical Approaches to Single-fidelity Black-box Optimization
Heuristics
There are many stochastic heuristics for finding approximate solutions of nonconvex optimization problems, such as simulated annealing [14], genetic algorithms [10], particle swarm optimization (PSO) [13], and many others. In this paper, we use PSO as one of the baselines for comparison, as it is widely used in the nanophotonics community [18, 20] and is implemented in the Lumerical commercial nanophotonics software.
In the particle swarm algorithm, the potential solutions, called particles, are initialized at random positions and velocities, and then move within the parameter search space. The particles are subject to three forces as they move: a spring force toward the personal best position ever achieved by that individual particle, a spring force toward the global best position ever achieved by any particle, and a frictional force proportional to the velocity. At each iteration, velocities are stochastically updated based on these forces and the previous velocity values; new particle locations are then calculated as the old ones plus the velocities, modified to keep particles within bounds. The algorithm proceeds until a specified stopping criterion is met.
PSO is inspired by the behavior of animal aggregations such as flocks of birds or swarms of insects. Each particle is attracted to some degree to the best location it has found so far, and also to the best location any member of the swarm has found. After some steps, the population can coalesce around one location, coalesce around a few locations, or continue to move.
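For concreteness, the update rules described above can be sketched in a few lines. The coefficient values, velocity initialization, and simple position clipping below are illustrative choices, not the exact settings of the Lumerical or MATLAB implementations:

```python
import numpy as np

def pso(fom, bounds, n_particles=5, n_iters=20, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer (minimization).

    fom    : callable mapping a parameter vector to a scalar figure of merit
    bounds : (lo, hi) arrays delimiting the search box
    w, c1, c2 : inertia ("friction") and the two spring-force weights
    """
    rng = np.random.default_rng(seed)
    lo, hi = (np.asarray(b, float) for b in bounds)
    dim = lo.size
    x = rng.uniform(lo, hi, size=(n_particles, dim))                     # positions
    v = rng.uniform(-(hi - lo), hi - lo, size=(n_particles, dim)) * 0.1  # velocities
    pbest, pbest_f = x.copy(), np.array([fom(p) for p in x])
    g = pbest[np.argmin(pbest_f)].copy()                                 # global best

    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # inertia + pull toward personal best + pull toward global best
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)                                       # stay in bounds
        f = np.array([fom(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        g = pbest[np.argmin(pbest_f)].copy()
    return g, pbest_f.min()
```

On a smooth toy objective such as the sphere function, even this tiny swarm reliably moves toward the minimum within a handful of iterations.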
Gaussian process optimization
Optimizing an unknown and noisy function is a common task in Bayesian optimization. In real applications, such functions tend to be expensive to evaluate (for example, tuning hyperparameters for deep learning models [21]), so the number of queries should be minimized. As a way to model the unknown function, a Gaussian process (GP) [19] is an expressive and flexible tool covering a large class of functions. A classical method for Bayesian optimization with GPs is GP-UCB [23], which treats Bayesian optimization as a multi-armed bandit problem and proposes an upper-confidence-bound-based algorithm for query selection. The authors provide a theoretical bound on the cumulative regret that is connected with the amount of mutual information gained through the queries. The work in [4] directly incorporates mutual information into the UCB framework and demonstrates the empirical value of the method.
Entropy search [8] represents another class of GP-based Bayesian optimization approaches. Its main idea is to directly search for the global optimum of an unknown function through queries. Each query point is selected based on its informativeness about the location of the function optimum. Predictive entropy search [9] addresses some computational issues of entropy search by maximizing the expected information gain with respect to the location of the global optimum. Max-value entropy search [25, 24] approaches the task differently: instead of searching for the location of the global optimum, it looks for its value. This effectively avoids issues related to the dimension of the search space, and the authors are able to provide a regret-bound analysis that the previous two entropy search methods lack.
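As a rough illustration of how a GP-based method selects queries, the following sketch computes a zero-mean GP posterior over a discrete candidate set and applies the GP-UCB rule; the kernel choice, lengthscale, noise level, and beta value are illustrative assumptions, not the settings of any cited work:

```python
import numpy as np

def rbf(a, b, ls=0.5):
    """Squared-exponential kernel matrix between row-vector sets a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def gp_posterior(X, y, Xs, noise=1e-4, ls=0.5):
    """Posterior mean/std of a zero-mean GP at candidate points Xs."""
    K = rbf(X, X, ls) + noise * np.eye(len(X))
    Ks = rbf(X, Xs, ls)
    sol = np.linalg.solve(K, Ks)
    mu = sol.T @ y
    var = 1.0 - (Ks * sol).sum(0)          # k(x, x) = 1 for this kernel
    return mu, np.sqrt(np.clip(var, 0, None))

def gp_ucb_step(X, y, candidates, beta=2.0):
    """One GP-UCB round: query the candidate maximizing mu + sqrt(beta) * sigma."""
    mu, sigma = gp_posterior(X, y, candidates)
    return candidates[np.argmax(mu + np.sqrt(beta) * sigma)]
```

The upper confidence bound trades off exploitation (high posterior mean) against exploration (high posterior uncertainty) in a single acquisition value.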
2.2.2 Multifidelity Bayesian Optimization
Multifidelity optimization is a general framework that captures the tradeoff between cheap low-quality and expensive high-quality data (cf. Fig. 1). There have been several works on using GPs to model functions at different fidelity levels. Recursive co-kriging [7, 15] considers an autoregressive model for multifidelity GP regression, which assumes that the higher fidelity consists of a lower-fidelity term and an independent GP term that models the systematic error of approximating the higher-fidelity output. Therefore, one can model the cross-covariance between the high-fidelity and low-fidelity functions using the covariance of the lower-fidelity function only. Virtual vs. Real [17] extends this idea to Bayesian optimization. The authors consider a two-fidelity setting (i.e., virtual simulation and real system experiments), where they model the correlation between the two fidelities through co-kriging, and then apply entropy search to optimize the target output. Zhang et al. (2017) [27] model the dependencies between different fidelities with convolved Gaussian processes [1], and then apply predictive entropy search (PES) [9] for efficient exploration.
Although these multifidelity heuristics have shown promising empirical results on some experimental datasets, little is known about their theoretical performance. Recently, Kandasamy et al. (2016) proposed MF-GP-UCB (Multifidelity GP-UCB) [11], a principled algorithm for multifidelity Bayesian optimization. In particular, the authors consider an iterative two-stage optimization procedure and view each fidelity as an independent component; at each iteration, the estimate of each fidelity is updated based only on observations from the corresponding fidelity. In a follow-up work [12], the authors address this disconnect by considering a continuous fidelity space and performing joint updates to effectively share information among different fidelity levels. However, as elaborated in [22], these approaches are likely to pick suboptimal actions in some pessimistic cases, due to the modeling assumptions and the two-stage query selection criteria. In this paper, we focus on MF-MI-Greedy, a principled multifidelity algorithm recently proposed in [22]. We describe the details of the algorithm in §3, and evaluate it against the MF-GP-UCB algorithm as well as other single-fidelity baselines in §4.
3 Multifidelity Optimization for Photonic Nanostructure Design
3.1 Preliminary and Problem Formulation
Consider the problem of maximizing an unknown payoff function $f : \mathcal{X} \rightarrow \mathbb{R}$. We can probe the function by directly querying it at some $x \in \mathcal{X}$ and obtaining a noisy observation $y = f(x) + \epsilon$, where $\epsilon$ denotes i.i.d. Gaussian white noise. In addition to the payoff function $f$, we are also given access to oracle calls to some unknown auxiliary functions $f_1, \dots, f_{m-1}$; similarly, we obtain a noisy observation $y_l = f_l(x) + \epsilon$ when querying $f_l$ at $x$. Here, each $f_l$ could be viewed as a low-fidelity version of $f \equiv f_m$ for $l \in \{1, \dots, m-1\}$. For example, if $f$ represents the actual reward obtained by running a real physical system with input $x$, then $f_l$ may represent the simulated payoff from a numerical simulator at fidelity level $l$.
We assume that the multiple fidelities $f_1, \dots, f_m$ are mutually dependent through some fixed, (possibly) unknown joint probability distribution $P(f_1, \dots, f_m)$. In particular, we model $(f_1, \dots, f_m)$ with a multiple-output Gaussian process; hence the marginal distribution on each fidelity is a separate GP, i.e., $f_l \sim \mathcal{GP}(\mu_l, k_l)$, where $\mu_l, k_l$ specify the (prior) mean and covariance at fidelity level $l$.
Let us use $a = \langle l, x \rangle$ to denote the action of querying $f_l$ at $x$. Each action incurs cost $c_l$, and achieves reward
$$r(\langle l, x \rangle) = \begin{cases} f(x) & \text{if } l = m, \\ r_{\min} & \text{otherwise.} \end{cases} \qquad (3.1)$$
That is, performing $\langle m, x \rangle$ (at the target fidelity) achieves a reward $f(x)$. We receive the minimal immediate reward $r_{\min}$ with lower fidelity actions $\langle l, x \rangle$ for $l < m$, even though such an action may provide some information about $f$ and could thus lead to more informed decisions in the future. W.l.o.g., we assume that $r_{\min} = 0$, and $c_l \le c_m$ for all $l$.
Let us encode an adaptive strategy for picking actions as a policy $\pi$. In words, a policy specifies which action to perform next, based on the actions picked so far and the corresponding observations. We consider policies with a fixed budget $\Lambda$. Upon termination, $\pi$ returns a sequence of actions $(a_1, \dots, a_T)$, such that $\sum_{t=1}^{T} c_{a_t} \le \Lambda$. Note that for a given policy $\pi$, the sequence $(a_1, \dots, a_T)$ is a random variable, dependent on the joint distribution $P(f_1, \dots, f_m)$ and the (random) observations of the selected actions. Given a budget $\Lambda$, our goal is to maximize the expected cumulative reward, so as to identify an action with performance close to $\max_x f(x)$ as rapidly as possible. Formally, we seek
$$\pi^* \in \arg\max_{\pi} \; \mathbb{E}\left[ \sum_{t=1}^{T} r(a_t) \right] \quad \text{s.t.} \quad \sum_{t=1}^{T} c_{a_t} \le \Lambda. \qquad (3.2)$$
3.2 The MF-MI-Greedy Algorithm
We briefly describe MF-MI-Greedy, proposed in [22], a mutual-information-based multifidelity Gaussian process optimization algorithm. It consists of two components: an exploratory procedure that gathers information about the target fidelity function by querying lower fidelity functions, and an exploitative procedure that optimizes the target fidelity using the previously gathered information.
Exploration
MF-MI-Greedy considers an information-theoretic selection criterion for choosing low fidelity queries. The quality of a low fidelity query $a$ is measured by its information gain per unit cost, defined as the amount of entropy reduction in the posterior distribution of the target fidelity function divided by the cost of the query: $I(f; y_a \mid y_A) / c_a$. Here, $A$ denotes the set of previously selected actions, and $y_A$ denotes the observation history. As shown in Algorithm 2, this criterion is used greedily to select queries for low fidelity functions. To ensure that the algorithm does not explore excessively, we consider the following stopping conditions: (i) when the budget is exhausted (Line 2), (ii) when a single target fidelity action is better than all the low fidelity actions in terms of the benefit-cost ratio (Line 2), and (iii) when the cumulative benefit-cost ratio is small (Line 2). Here, the threshold parameter for condition (iii) is set as a function of the allocated budget $\Lambda$.
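The benefit-cost selection rule can be illustrated with a small sketch in which entropy reduction is computed from a given joint Gaussian covariance; the function names and the flat indexing of (fidelity, point) pairs are our own illustrative choices, not the interface of [22]:

```python
import numpy as np

def entropy(cov, jitter=1e-9):
    """Differential entropy (up to additive constants) of a Gaussian."""
    sign, logdet = np.linalg.slogdet(cov + jitter * np.eye(len(cov)))
    return 0.5 * logdet

def greedy_low_fidelity_query(joint_cov, target_idx, cand, costs, noise=1e-2):
    """Pick the low fidelity query maximizing entropy reduction of the target
    fidelity (at reference entries `target_idx`) per unit cost.

    joint_cov  : prior covariance over all (fidelity, point) pairs, stacked
    target_idx : indices of the target-fidelity reference entries
    cand       : candidate query indices (low fidelities only)
    costs      : cost of each candidate query
    """
    H0 = entropy(joint_cov[np.ix_(target_idx, target_idx)])
    best, best_ratio = None, -np.inf
    for a, c in zip(cand, costs):
        k_a = joint_cov[target_idx, a]          # cross-covariance with target
        v_a = joint_cov[a, a] + noise           # observation variance of query
        post = joint_cov[np.ix_(target_idx, target_idx)] - np.outer(k_a, k_a) / v_a
        ratio = (H0 - entropy(post)) / c        # information gain per unit cost
        if ratio > best_ratio:
            best, best_ratio = a, ratio
    return best, best_ratio
```

A candidate strongly correlated with the target fidelity shrinks the target posterior the most, so it wins the benefit-cost comparison at equal cost.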
Exploitation
At the end of the exploration phase, MF-MI-Greedy updates the posterior distribution of the joint GP using the full observation history, and searches for a target fidelity action via the (single-fidelity) GP optimization subroutine SF-GP-OPT (Line 1). Here, SF-GP-OPT could be any off-the-shelf Bayesian optimization algorithm, such as GP-UCB [23], GP-MI [4], EST [25], MVES [24], etc. In contrast to the preceding exploration phase, which seeks an informative set of low fidelity actions, the GP optimization subroutine aims to trade off exploration and exploitation on the target fidelity, and outputs a single action at each round. MF-MI-Greedy then proceeds to the next round until it exhausts the preset budget, and eventually outputs an estimate of the target function optimizer.
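Putting the two phases together, the overall loop can be sketched at a high level; the callable interface and the budget bookkeeping below are illustrative, not the exact pseudocode of [22]:

```python
def mf_mi_greedy(explore_step, sf_gp_opt, update_posterior, budget):
    """High-level sketch of the explore/exploit loop (illustrative interface).

    explore_step(history)             -> (low-fidelity action or None, its cost)
    sf_gp_opt(history)                -> (target-fidelity action, its cost)
    update_posterior(history, action) -> records the new observation
    """
    history = []
    spent = 0.0
    while spent < budget:
        # Phase 1: greedily gather cheap information about the target fidelity.
        action, cost = explore_step(history)
        while action is not None and spent + cost <= budget:
            update_posterior(history, action)
            spent += cost
            action, cost = explore_step(history)
        # Phase 2: one round of single-fidelity Bayesian optimization
        # on the target fidelity, using everything learned so far.
        action, cost = sf_gp_opt(history)
        if spent + cost > budget:
            break
        update_posterior(history, action)
        spent += cost
    return history
```

Exploration stops whenever its own stopping conditions fire (here, when `explore_step` returns no action), after which the budget is spent on target-fidelity rounds.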
3.3 Practical Implementation
In Algorithm 2 and the algorithm used for SF-GP-OPT, we need to find the argmax of a function. For the photonic nanostructure experiment in §4, this optimization is over a discrete set of candidate queries. Naively, we would need to evaluate the function at each query point in order to determine the optimizer, which is a costly operation. Instead, we devise an approximate optimization step to address this computational challenge. We first directly optimize the function over its continuous domain and obtain an optimizer. Then we project the optimizer down to the candidate set by selecting the closest available query point in Euclidean distance. This approximation scheme takes advantage of existing fast optimizers for continuous functions and becomes necessary for large candidate sets.
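A minimal sketch of this optimize-then-project step, with cheap seeded random search standing in for whatever fast continuous optimizer is actually used:

```python
import numpy as np

def optimize_then_project(acq, candidates, n_samples=2048, seed=0):
    """Approximately maximize `acq` over the continuous box spanned by the
    candidate set, then snap the maximizer to the nearest candidate point
    (Euclidean distance), avoiding a full sweep over all candidates."""
    rng = np.random.default_rng(seed)
    lo, hi = candidates.min(0), candidates.max(0)
    xs = rng.uniform(lo, hi, size=(n_samples, candidates.shape[1]))
    x_star = xs[np.argmax([acq(x) for x in xs])]       # continuous optimizer
    d = np.linalg.norm(candidates - x_star, axis=1)    # Euclidean projection
    return candidates[np.argmin(d)]
```

For a smooth acquisition function, the projected point typically coincides with the true discrete argmax when the candidate grid is reasonably dense.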
4 Experimental Design for Photonic Nanostructure Discovery
4.1 Datasets
Our nanophotonic structure is characterized by five geometric parameters. For each parameter setting, we use a score, commonly called a figure of merit (FOM), to represent how well the resulting structure satisfies the desired color filtering property. By minimizing the FOM, we can find a set of high-quality design parameters. Traditionally, FOMs could only be computed through the actual fabrication of a structure and subsequent measurements of its various physical properties, which is a time-consuming process. Alternatively, simulations can be utilized to estimate the physical properties a design will have, e.g., using the Lumerical software. By solving a 2D variant of Maxwell's equations, we can simulate the transmission spectrum of a given nanophotonic device and then compute the FOM from it. We can obtain data at different fidelity levels by controlling aspects of the numerical solution process.
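A sketch of how such a weighted FOM might be assembled from a simulated spectrum; the individual terms, weights, and the crude background estimate are illustrative assumptions, not the exact definition used in our experiments:

```python
import numpy as np

def figure_of_merit(wl, T, target_wl, weights=(1.0, 1.0, 1.0, 1.0)):
    """Combine spectrum metrics into one weighted FOM (smaller is better).

    wl, T     : simulated wavelength grid and transmission spectrum
    target_wl : designed center wavelength of the filter
    """
    w_amp, w_off, w_fwhm, w_snr = weights
    i_peak = int(np.argmax(T))
    peak = T[i_peak]
    offset = abs(wl[i_peak] - target_wl)     # detuning from the design wavelength
    above = wl[T >= peak / 2]
    fwhm = above.max() - above.min()         # full-width at half-maximum (crude)
    background = np.median(T)                # crude out-of-band transmission level
    snr = peak / max(background, 1e-12)
    # penalize low peaks, detuning, and broad lines; reward high contrast
    return w_amp * (1 - peak) + w_off * offset + w_fwhm * fwhm - w_snr * np.log(snr)
```

A narrow peak centered at the target wavelength should score lower (better) than a broad, detuned one of the same height.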
We experiment with three design tasks for filtering light with wavelengths of 550 nm, 650 nm, and 750 nm. For each task, we vary the conformal mesh size and the total time duration of simulated physical processes in the time-domain solver to obtain two sets of multifidelity data, each with three fidelity levels on 4983 designs.
The first set of data is based on different conformal mesh sizes. The mesh size determines how accurate the final results are, with finer meshes leading to more accurate results. We generated the lowest fidelity data using a mesh size of , the middle fidelity and the target fidelity . The costs (CPU time) are inversely proportional to the mesh size, so we use the corresponding costs for our three fidelity function evaluations, respectively.
The second set of data is based on different total time durations of the simulated physical processes for the time-domain solver. Since the transmission spectrum is calculated through a Fourier transform of the electromagnetic pulse passed through the color filter, we expect more accurate solutions with a longer physical simulation time duration. We generated the lowest fidelity data using a simulation time of 40 fs (femtoseconds), the middle fidelity 70 fs, and the target fidelity 100 fs. The costs are proportional to the simulation time, so we use the corresponding costs for our three fidelity function evaluations, respectively.
4.2 Experimental Setup
To model the relationship between a low fidelity function $f_l$ and the target fidelity function $f$, we use an additive model. Specifically, we assume that $f_l(x) = f(x) + e_l(x)$ for all fidelity levels $l$, where $e_l$ is an unknown function characterizing the error incurred by a lower fidelity function. We use Gaussian processes to model $f$ and $e_l$. Since $f$ is embedded in every fidelity level, we can use an observation from any fidelity to update the posterior for every fidelity level. We use squared exponential kernels for all the GP covariances, with hyperparameter tuning scheduled periodically during optimization. Following prior work on practical Bayesian optimization [2], we use 10% of the total budget for initialization. For multifidelity methods, the initialization budget is spent on randomly querying the lowest fidelity function. For the single-fidelity method, it is spent on randomly querying the target fidelity function. For all experiments, we use a total budget of 100 times the cost of a target fidelity function call. Every method is run 20 times to compute its mean and standard error.
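Under the additive assumption, the implied cross-fidelity covariance is a shared term that is present for every pair of fidelities, plus an error term that appears only when two observations share a fidelity. A minimal sketch (lengthscales, variances, and function names are illustrative assumptions):

```python
import numpy as np

def sqexp(a, b, ls):
    """Squared exponential kernel matrix between row-vector sets a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def mf_cov(X1, l1, X2, l2, ls_f=0.5, ls_e=0.5, var_e=0.1):
    """Covariance under the additive model f_l(x) = f(x) + e_l(x):
    the shared term k_f couples every pair of fidelities, while the error
    term k_e only contributes when both observations share a fidelity."""
    same = (l1[:, None] == l2[None, :]).astype(float)
    return sqexp(X1, X2, ls_f) + var_e * same * sqexp(X1, X2, ls_e)

def mf_posterior_mean(X, l, y, Xs, target_level, noise=1e-4):
    """Posterior mean of the target fidelity at Xs given mixed-fidelity data."""
    K = mf_cov(X, l, X, l) + noise * np.eye(len(X))
    Ks = mf_cov(X, l, Xs, np.full(len(Xs), target_level))
    return Ks.T @ np.linalg.solve(K, y)
```

Because the shared term $f$ appears at every level, even a purely low-fidelity observation pulls the target-fidelity posterior mean toward the observed value.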
4.3 Compared Methods
Our framework is general, and we could plug in different single fidelity Bayesian optimization algorithms for the SF-GP-OPT procedure in Algorithm 1. In our experiment, we choose GP-UCB as one instantiation. We compare with MF-GP-UCB [11] and GP-UCB [23]. MF-GP-UCB relies on several hyperparameters; we choose them following the same approach as described in [11].
Besides the Bayesian optimization based methods, we also compare with a common heuristic called Particle Swarm Optimization, which is inspired by the social behavior of animals and is used for nanophotonic structure design [18, 20]. We use the built-in MATLAB implementation of this algorithm. We specify a population of 5 particles and run the swarm optimization for 20 iterations, totaling 100 evaluations of the target fidelity function. All other algorithm parameters are kept at their default MATLAB values.
4.4 Optimizing Figure of Merit
Figure 2 and Figure 3 show the results of this experiment. As usual, the x-axis is the cost and the y-axis is the figure of merit; smaller is better. After a small portion of the budget is used in initial exploration, MF-MI-Greedy (red) arrives at a better final design than MF-GP-UCB, GP-UCB, and Particle Swarm. MF-MI-Greedy tends to have a worse figure of merit at the beginning because its initial explorations in the lower fidelities do not yield FOM scores on the target fidelity, so it effectively has a late start in all the plots: it begins querying the target fidelity late. However, the advantage of exploring lower fidelities becomes apparent once the exploitation phase starts at the target fidelity level, as seen from the rapid convergence to low-FOM designs.
5 Conclusion
In this paper, we investigated the problem of optimizing the transmission properties of plasmonic mirror color filters, where we have access to multiple numerical simulators with different fidelity levels and computational costs. We considered several derivative-free global optimization algorithms, including a commonly used approach from the nanophotonics community and two recently developed multifidelity Bayesian optimization approaches. Our results on several pre-collected nanophotonics datasets demonstrate the compelling performance of the multifidelity Bayesian optimization approach. These experiments suggest great potential in utilizing cheap multifidelity simulations to aid the discovery of optimal photonic nanostructures.
References
 [1] Mauricio Alvarez and Neil D. Lawrence. Sparse convolved Gaussian processes for multi-output regression. In Advances in Neural Information Processing Systems, pages 57–64, 2009.
 [2] Eric Brochu, Vlad M. Cora, and Nando De Freitas. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599, 2010.
 [3] Jing Chen, Jian Yang, Zhuo Chen, Yi-Jiao Fang, Peng Zhan, and Zhen-Lin Wang. Plasmonic reflectors and high-Q nanocavities based on coupled metal-insulator-metal waveguides. AIP Advances, 2:012145, 2012.
 [4] Emile Contal, Vianney Perchet, and Nicolas Vayatis. Gaussian process optimization with mutual information. In International Conference on Machine Learning, pages 253–261, 2014.
 [5] Thomas W. Ebbesen, Henri J. Lezec, H. F. Ghaemi, Tineke Thio, and Peter A. Wolff. Extraordinary optical transmission through subwavelength hole arrays. Nature, 391:667–668, 1998.
 [6] Dagny Fleischman, Luke A. Sweatlock, Hirotaka Murakami, and Harry Atwater. Hyperselective plasmonic color filters. Optics Express, 25:27386–27395, 2017.
 [7] Alexander I. J. Forrester, András Sóbester, and Andy J. Keane. Multi-fidelity optimization via surrogate modelling. In Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, volume 463, pages 3251–3269. The Royal Society, 2007.
 [8] Philipp Hennig and Christian J. Schuler. Entropy search for information-efficient global optimization. Journal of Machine Learning Research, 13(Jun):1809–1837, 2012.
 [9] José Miguel Hernández-Lobato, Matthew W. Hoffman, and Zoubin Ghahramani. Predictive entropy search for efficient global optimization of black-box functions. In Advances in Neural Information Processing Systems, pages 918–926, 2014.
 [10] John H. Holland. Genetic algorithms and adaptation. Adaptive Control of IllDefined Systems. NATO Conference Series (II Systems Science), 16, 1984.
 [11] Kirthevasan Kandasamy, Gautam Dasarathy, Junier B. Oliva, Jeff Schneider, and Barnabás Póczos. Gaussian process bandit optimisation with multi-fidelity evaluations. In Advances in Neural Information Processing Systems, pages 992–1000, 2016.
 [12] Kirthevasan Kandasamy, Gautam Dasarathy, Jeff Schneider, and Barnabás Póczos. Multi-fidelity Bayesian optimization with continuous approximations. In International Conference on Machine Learning, pages 1799–1808, 2017.
 [13] James Kennedy and Russell C. Eberhart. Particle swarm optimization. In Proceedings of the IEEE International Conference on Neural Networks, Perth, Australia, pages 1942–1945, 1995.
 [14] Scott Kirkpatrick, Charles Daniel Gelatt, and Mario P. Vecchi. Optimization by simulated annealing. Science, 220:671–680, 1983.
 [15] Loic Le Gratiet and Josselin Garnier. Recursive co-kriging model for design of computer experiments with multiple levels of fidelity. International Journal for Uncertainty Quantification, 4(5), 2014.
 [16] Stefan A. Maier, Mark L. Brongersma, Pieter G. Kik, Sheffer Meltzer, Ari A. G. Requicha, and Harry A. Atwater. Plasmonics—a route to nanoscale optical devices. Advanced Materials, 13:1501–1505, 2001.
 [17] Alonso Marco, Felix Berkenkamp, Philipp Hennig, Angela P. Schoellig, Andreas Krause, Stefan Schaal, and Sebastian Trimpe. Virtual vs. real: Trading off simulations and physical experiments in reinforcement learning with Bayesian optimization. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 1557–1563, 2017.
 [18] James G. Mutitu, Shouyuan Shi, Caihua Chen, Timothy Creazzo, Allen Barnett, Christiana Honsberg, and Dennis W. Prather. Thin film silicon solar cell design based on photonic crystal and diffractive grating structures. Optics Express, 16(19):15238–15248, 2008.
 [19] Carl E. Rasmussen and Chris K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
 [20] Mehrdad Shokooh-Saremi and Robert Magnusson. Leaky-mode resonant reflectors with extreme bandwidths. Optics Letters, 35(8):1121–1123, 2010.
 [21] Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems, pages 2951–2959, 2012.
 [22] Jialin Song, Yuxin Chen, and Yisong Yue. A general framework for multi-fidelity Bayesian optimization with Gaussian processes. arXiv preprint arXiv:1811.00755, 2018.
 [23] Niranjan Srinivas, Andreas Krause, Sham Kakade, and Matthias Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. In Proc. International Conference on Machine Learning (ICML), 2010.
 [24] Zi Wang and Stefanie Jegelka. Max-value entropy search for efficient Bayesian optimization. arXiv preprint arXiv:1703.01968, 2017.
 [25] Zi Wang, Bolei Zhou, and Stefanie Jegelka. Optimization as estimation with Gaussian processes in bandit settings. In Artificial Intelligence and Statistics, pages 1022–1031, 2016.
 [26] Sozo Yokogawa, Stanley P. Burgos, and Harry A. Atwater. Plasmonic color filters for CMOS image sensor applications. ACS Nano, 7:10038–10047, 2013.
 [27] Yehong Zhang, Trong Nghia Hoang, Bryan Kian Hsiang Low, and Mohan Kankanhalli. Information-based multi-fidelity Bayesian optimization. NIPS Workshop on Bayesian Optimization, 2017.