I Introduction
Integrated photonic devices have led to game-changing new capabilities in applications ranging from high-speed communications [1] to next-generation quantum computing platforms [2] and machine learning hardware accelerators [3]. These platforms stand to benefit from photonic components with improved performance, lower losses, larger bandwidths, and more compact footprints. However, achieving such multifaceted specifications is extremely challenging through conventional intuition-backed design methodologies. In contrast, computational inverse design techniques can efficiently explore complex and high-dimensional design landscapes which are inaccessible to human designers. Enabled by adjoint variable methods [4, 5], gradient-based optimization techniques have proven to be among the most successful at producing photonic devices with high performance, with examples including wavelength-selective elements (e.g. multiplexers, filters, and resonators), signal routing elements (e.g. waveguide bends and couplers), and active components (e.g. modulators) [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16].
The approach in gradient-based design strategies is to first formulate an objective function, calculated in terms of a device’s simulated performance. Adjoint methods allow these objective functions and their gradients to be efficiently computed at a cost of only two full-wave simulations [4, 5], independently of the number of degrees of freedom in a design. When used in the context of modern automatic differentiation frameworks (e.g. JAX and TensorFlow), these techniques enable powerful end-to-end device optimization frameworks [14, 17]. However, guaranteeing that an optimized design yields the desired performance, while simultaneously satisfying the numerous design rules of modern semiconductor manufacturing processes [18], still poses a major challenge. Particularly relevant for integrated photonic devices are the capabilities of the lithographic systems, which define constraints such as the minimum feature length scale that can be reliably printed. Elements of a design which are smaller than this length scale may print inconsistently or may be completely absent in the fabricated structure. Moreover, inclusion of a design feature with sub-resolution size can have broad yield implications. Thus, semiconductor foundries specify design rules which define the minimum width and minimum spacing of design features. Compliance with these rules, as validated by a design rule check (DRC), is typically a prerequisite for fabrication.
There have been many proposals for DRC-compliant inverse design strategies. One class of strategies selects a design parameterization with intrinsic geometric length scale guarantees, e.g. by optimizing the placement of geometric primitives which already satisfy the desired minimum feature size constraints [19]. A related approach uses a parameterization which enables analytic constraints to be applied directly to the design degrees of freedom, e.g. by optimizing the coordinates of polygon vertices comprising shapes in the design mask [20]. However, the drawback of both strategies is that they offer a limited landscape of potential designs. For example, when optimizing the vertices of a polygon, an optimizer will be unable to alter the topology of a design by closing or opening holes.
In contrast, topology optimization strategies open up a much larger design landscape by parameterizing a design as a pixelated image, which is transformed by a sequence of convolutional filters and pixel-wise nonlinear functions [21, 22, 23, 24, 16]. These so-called density or three-field parameterizations allow a gradient-based optimization algorithm (e.g. L-BFGS or MMA [25]) to modify any pixel within the design and apply arbitrary changes to the topology. In practice, however, there is typically a trade-off between managing design feasibility (binarization of the pixels and feature size constraints) and attaining high performance, because a continuous optimizer must traverse infeasible regions of the design landscape before reaching feasible regions, which may or may not have high performance. The most common strategy in topology optimization is to begin with a continuous optimization phase before gradually enabling binarization and applying constraints. However, if binarization is ramped up too quickly, a continuous optimizer will stall and fail to make progress towards improving performance. Similarly, constraints or penalties on the feature size must be introduced in a way which will not push the optimizer away from high-performance regions of the parameter space, since there is no guarantee that it will be able to recover. Often, the detailed strategy and schedule for successfully enforcing feasibility is somewhat of an art and, moreover, depends on both the device being optimized and the performance objective, making it challenging to scale these strategies.
In this work we propose an always-feasible gradient-based inverse design framework with strict guarantees on DRC compliance. The key feature of our proposal is a conditional generator for feasible designs, which is combined with a straight-through estimator (STE) to enable gradient backpropagation. The combination of these two components yields a differentiable transform which can be incorporated into a computational graph, much like a convolutional filter or projection operator in density-based topology optimization. We apply our framework to several practical photonic design challenges (a waveguide bend, a beam splitter, a waveguide mode converter, and a demultiplexer), and demonstrate that the approach yields designs with high performance.
II Designs strictly satisfying length scale constraints
In this section, we introduce a mathematical framework for measuring whether a binarized pixel array is DRC-compliant, focusing on the minimum width and minimum spacing rules commonly required by semiconductor foundries. We consider binary arrays, and refer to the two pixel values as void and solid. During the course of a fabrication workflow, such designs may be converted to contours by e.g. marching squares or dual contouring [26]. Conceptually, all such designs can be created by repeated placements and translations of void or solid brushes (i.e. kernels, or clusters of pixels), with the trivial single-pixel brush able to produce all designs. If a design is feasible with some brush, the condition
(1) 
holds, where $\mathcal{O}_b(x)$ is the morphological opening of the design $x$ with brush $b$; this is computed by $\mathcal{D}_b(\mathcal{E}_b(x))$, where $\mathcal{D}_b$ and $\mathcal{E}_b$ are dilation and erosion operators [27]. Intuitively, morphological opening removes small features from an image, and if $x$ is unchanged by the operation then it only contains solid and void features that can be created with $b$.
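As a concrete illustration of the feasibility test, the following Python sketch implements erosion, dilation, and opening on small boolean arrays and checks a design against a brush. It is a minimal sketch under stated assumptions, not the implementation used in this work: the function names, the plus-shaped brush, the treatment of pixels beyond the array edge (ignored during erosion), and the choice to test both the solid pattern and its complement are all introduced here for illustration.

```python
# Hypothetical brush: a 3x3 "plus", given as (row, col) offsets from its center.
PLUS = [(-1, 0), (0, -1), (0, 0), (0, 1), (1, 0)]


def erode(x, brush):
    """Mark each position where the brush covers only True pixels.

    Assumption: brush pixels that fall outside the array are ignored.
    """
    rows, cols = len(x), len(x[0])

    def fits(i, j):
        for di, dj in brush:
            r, c = i + di, j + dj
            if 0 <= r < rows and 0 <= c < cols and not x[r][c]:
                return False
        return True

    return [[fits(i, j) for j in range(cols)] for i in range(rows)]


def dilate(touches, brush):
    """Stamp the brush at every True position of `touches`."""
    rows, cols = len(touches), len(touches[0])
    out = [[False] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if touches[i][j]:
                for di, dj in brush:
                    r, c = i + di, j + dj
                    if 0 <= r < rows and 0 <= c < cols:
                        out[r][c] = True
    return out


def opening(x, brush):
    """Morphological opening: erosion followed by dilation."""
    return dilate(erode(x, brush), brush)


def is_feasible(x, brush):
    """Assumed check: both solid and void regions survive opening unchanged."""
    inverse = [[not v for v in row] for row in x]
    return opening(x, brush) == x and opening(inverse, brush) == inverse


# A single plus-shaped solid feature on a 7x7 void background.
design = [[False] * 7 for _ in range(7)]
for di, dj in PLUS:
    design[3 + di][3 + dj] = True

# A lone solid pixel, which is smaller than the brush.
speck = [[False] * 7 for _ in range(7)]
speck[3][3] = True

print(is_feasible(design, PLUS), is_feasible(speck, PLUS))
```

Here the first design passes because every solid and void pixel can be painted by some placement of the brush, while the lone pixel is removed by the opening and therefore fails.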
Length scale in a topology optimization context is typically associated with the minimum diameter of circular features. Thus, we say that if a design can be realized with an approximately circular brush with diameter $d$, it has a minimum length scale of at least $d$. The minimum length scale of the design is the largest value of $d$ for which Eqn. (1) is satisfied with the corresponding brush. When contours are obtained by dual contouring (i.e. outlining pixels, for binary designs), a design with minimum length scale $d$ will also have related minimum width and spacing, equal to the minimum width of the brush itself. An example circular brush with associated measurements is shown in Fig. 1a and an example design which is feasible with the circular brush is shown in Fig. 2a.
We turn now to minimum width and spacing constraints, as might result from foundry design rules. Given a minimum width and spacing constraint [28, 29], there are many brushes for which the associated feasible designs satisfy the constraint. The smallest of these is the notched square, i.e. a square brush which is solid everywhere except at the corners, as shown in Fig. 1b. An example design is in Fig. 2b. When the constraint requires a physical minimum feature width and spacing of , designs feasible with the notched square brush of width will satisfy the constraint, where is the design pixel pitch. Thus, if we can require that our designs are feasible with a notched square brush of width , we are also guaranteed to satisfy width and spacing rules of size . In Sec. III we utilize this to develop an inverse design method which produces designs that are guaranteed to satisfy width and spacing rules.
Although reasoning about minimum width and spacing is straightforward for dual-contoured designs feasible with the notched-square brush, we have also extensively validated that such designs are design-rule compliant using industry-standard design software: KLayout [28] and Siemens Calibre Physical Verification [29].
III Inverse design subject to strict length scale constraints
In this section we describe our method of inverse design subject to strict length scale constraints. We have shown that designs which are feasible with a certain brush $b$ ($b$-feasible) will satisfy minimum length scale or width and spacing constraints. Properly, the search for a high-performing feasible design can be considered a combinatorial optimization problem. However, our approach is to approximately solve this problem by transforming it into an unconstrained stochastic gradient optimization problem. Critically, we use a conditional generator for feasible designs and a straight-through estimator (STE) for backpropagation; these two components are discussed subsequently.
III.1 Generator for feasible designs
In general, a binary design can be produced by dilation of binary arrays and with , i.e.
(2) 
The two arrays are termed the solid and void touches of the design. We consider the generation of feasible arrays as a process in which the solid or void touches are updated with a single-element change at each step. In this context, we introduce several touch states and pixel states which may be associated with each location at any step in the process.
Specifically, defining these for the solid case: existing solid pixels are those which are assigned to solid by some existing solid touch. Impossible solid touches are those which would assign an existing void pixel to solid. Valid solid touches are those which are not impossible and are not already existing solid touches. Possible solid pixels are those obtained by dilating all existing or valid touches with the brush, i.e. they are those pixels which are or may be solid in a feasible array. Required solid pixels are those which are not existing and not possible for void. Resolving solid touches are those which are valid and would assign solid to a required-solid pixel. Finally, free touches are valid but only color existing pixels. These solid touches and pixels are computed by
(3)  
(4)  
(5)  
(6)  
(7)  
(8)  
(9) 
Corresponding void touches and pixels can be computed with similar expressions. Fig. 3 shows an example of pixels and touches computed from the expressions above. A full example is worked through in the Appendix.
A scheme that generates feasible arrays is outlined in Algorithm 1. The algorithm begins with empty solid and void touch arrays and loops until all pixels are existing for either solid or void. At each step, the pixel and touch states are updated. If free touches exist, all are selected. If resolving touches exist, one of these is selected; otherwise, any of the valid touches are selected.
If touch selection is random, the algorithm simply yields a random feasible design. In practice, we greedily select the best touch as computed from a pixel reward array; the reward of a solid touch is the sum of elements set by the touch, while the reward of a void touch is the negative sum. With this change, the algorithm becomes a conditional generator of feasible designs, where the reward array biases the feasible design generation. The designs in Fig. 2 have been created by running the algorithm on the same random reward array.
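The greedy selection rule can be sketched in a few lines of Python. This is an illustrative sketch only: the names `touch_reward` and `best_touch`, the `assigned` mask, and the choice to count only pixels that are not yet assigned are assumptions, and the lists of valid touches are taken as given rather than computed from the touch and pixel states defined above.

```python
def touch_reward(theta, brush, center, assigned, solid):
    """Reward of placing the brush at `center`.

    theta:    2D list of pixel rewards.
    brush:    list of (row, col) offsets covered by the brush.
    assigned: 2D list of bools, True where a pixel is already solid or void.
    solid:    True for a solid touch, False for a void touch.
    Assumption: only in-bounds pixels newly set by the touch contribute.
    """
    i, j = center
    total = 0.0
    for di, dj in brush:
        r, c = i + di, j + dj
        if 0 <= r < len(theta) and 0 <= c < len(theta[0]) and not assigned[r][c]:
            total += theta[r][c]
    # A void touch is rewarded by the negative sum.
    return total if solid else -total


def best_touch(theta, brush, valid_solid, valid_void, assigned):
    """Greedily pick the highest-reward touch among the valid candidates."""
    candidates = [
        (touch_reward(theta, brush, p, assigned, True), "solid", p)
        for p in valid_solid
    ]
    candidates += [
        (touch_reward(theta, brush, p, assigned, False), "void", p)
        for p in valid_void
    ]
    return max(candidates)


# Toy 2x2 reward array with a trivial single-pixel brush.
theta = [[0.5, -2.0], [1.0, 0.2]]
unassigned = [[False, False], [False, False]]
positions = [(0, 0), (0, 1), (1, 0), (1, 1)]
choice = best_touch(theta, [(0, 0)], positions, positions, unassigned)
print(choice)
```

In this toy case the strongly negative reward at position (0, 1) makes a void touch there the most rewarding choice, which is the sense in which the reward array biases the generated design.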
To gain some intuition for why this scheme always produces feasible designs, consider that the algorithm will select exclusively solid or exclusively void until reaching a state where all remaining pixels are possible for both solid and void. In this state, no touch can lead to an invalid status. Since the starting state is one where all pixels are possible for both solid and void, we can see that the algorithm will never produce an invalid status.
We emphasize that the full space of designs feasible with the brush is accessible by Algorithm 1. This can be clearly seen, since e.g. if any feasible design is itself provided as the reward array, the output of the algorithm will simply be that design.
III.2 Straight-through estimator
Algorithm 1 is not differentiable, and so it cannot be directly used with backpropagation in a gradient-based optimization setting. As a workaround, we draw inspiration from the field of binary and quantized neural networks, noting that binary optimization can be viewed as topology optimization with a minimum length scale equal to a single pixel. Binary and quantized neural networks often make use of an STE in training, to backpropagate gradients through binary activation functions or to latent weights from which low-precision weights are obtained [30]. Specifically, in this approach one substitutes the gradient of a non-differentiable function with that of an estimator, i.e.
(10) 
A typical estimator for binarization (i.e. with the sign function) is identity, although it has been found that estimators which approximate their forwardpass counterpart can yield better results [31].
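The mechanics of a straight-through estimator can be shown with a toy example. The sketch below is not the method of this work: a sign function stands in for the non-differentiable generator, a quadratic loss stands in for the simulation and objective, the identity is used as the estimator, and all function names and the ±1 pixel values are assumptions chosen for illustration.

```python
def binarize(latent):
    """Non-differentiable forward pass: map each latent value to -1 or +1."""
    return [1.0 if v >= 0.0 else -1.0 for v in latent]


def loss_and_design_grad(design, target):
    """Toy differentiable loss on the binary design, with its gradient."""
    loss = sum((d - t) ** 2 for d, t in zip(design, target))
    grad = [2.0 * (d - t) for d, t in zip(design, target)]
    return loss, grad


def ste_latent_grad(latent, target):
    """Estimated gradient of the loss with respect to the latent values."""
    design = binarize(latent)
    loss, grad_design = loss_and_design_grad(design, target)
    # Straight-through step: the true derivative of binarize is zero almost
    # everywhere, so the identity is used in its place on the backward pass.
    grad_latent = list(grad_design)
    return loss, grad_latent


def optimize(latent, target, steps=5, lr=0.1):
    """Plain gradient descent on the latent values using the STE gradient."""
    for _ in range(steps):
        _, grad = ste_latent_grad(latent, target)
        latent = [v - lr * g for v, g in zip(latent, grad)]
    return latent


target = [-1.0, 1.0, 1.0]
final_latent = optimize([0.3, -0.2, 0.1], target)
print(binarize(final_latent))
```

Every intermediate design in this loop is binary, yet the latent values still receive a useful descent direction, which is the property exploited when the generator replaces the sign function.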
III.3 Computational graph
The complete computational graph for the inverse design problem is depicted in Fig. 4. A latent design is passed through a transform followed by the conditional generator, which produces a feasible design. We pass the design to an electromagnetic simulation engine which computes scattering matrix elements; these are the inputs to an objective function, which computes a scalar loss value. Optionally, we symmetrize the transform output to favor symmetric feasible designs. All operations in this graph are differentiable, with the exception of the generator — for which we use the STE. Thus, we can compute an estimated gradient of the loss with respect to the latent design.
For both the transform and the estimator, we have found that
(11) 
is a good choice, where the scalar hyperparameter lies in the range 2 to 8. The convolutional transform aids in the optimization, as it ensures that the reward array is smooth and has an associated length scale, determined by the brush size. The latent design is randomly initialized with a bias so that the first feasible design is fully solid, and it is updated iteratively using the Adam scheme [32], which is a common choice for stochastic optimization problems in machine learning. We use a learning rate of 0.01, with separate decay constants for the gradient and second moment exponential moving averages.
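For readers unfamiliar with the update rule, a single Adam step can be written out explicitly. The following is a minimal sketch: the learning rate of 0.01 is taken from the text, whereas `beta1=0.9`, `beta2=0.999`, and `eps=1e-8` are the defaults proposed by Kingma and Ba [32] and are used here only as placeholder assumptions, not as the values used in this work. The function name `adam_step` is likewise hypothetical.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a list of parameters.

    m, v: running averages of the gradient and squared gradient.
    t:    1-based step count, used for bias correction.
    beta1, beta2, eps are placeholder values (assumptions, see lead-in).
    """
    m = [beta1 * mi + (1.0 - beta1) * g for mi, g in zip(m, grad)]
    v = [beta2 * vi + (1.0 - beta2) * g * g for vi, g in zip(v, grad)]
    m_hat = [mi / (1.0 - beta1 ** t) for mi in m]
    v_hat = [vi / (1.0 - beta2 ** t) for vi in v]
    param = [
        p - lr * mh / (math.sqrt(vh) + eps)
        for p, mh, vh in zip(param, m_hat, v_hat)
    ]
    return param, m, v


# First step from zero-initialized moments: the update magnitude is close to
# the learning rate, independent of the gradient scale.
latent, m, v = adam_step([0.0], [2.0], [0.0], [0.0], t=1)
print(latent)
```

Because the estimated gradients produced by the STE can vary widely in scale between steps, this normalization of the step size is one plausible reason such an optimizer is convenient here.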
Notably, our method does not require scheduling of hyperparameters or changes in parameterization during the course of an optimization, and gives a feasible design at every optimization step. We expect that this simplicity could be advantageous from the perspective of a photonic designer tasked with creating a new component.
III.4 Objective function
In a practical design application, one is generally concerned with creating a component that satisfies a performance specification. We define the specification by defining a cutoff value for the magnitude for each relevant scattering parameter . There is also an implicit additional bound on the allowed values, imposed by the physics of the problem. For example, magnitudes greater than one are impossible for passive components. The range of allowed values is
. In general, the specification may constrain the behavior for various ports and multiple frequencies, and so each of these may be vectors. Our scalar objective function has the form,
(12) 
where is a vector of signs matching the size of , which are positive where the cutoff specifies the maximum value, and negative where it specifies the minimum value.
IV Nanophotonic optimization problems
In this section we apply our proposed inverse design framework to several integrated photonic components operating in the O-band, demonstrating designs for a waveguide bend, a mode converter, a beamsplitter, and a wavelength demultiplexer. All components are optimized for their characteristics within two 10 nm wavelength bands centered at 1270 nm and 1290 nm. Performance targets for the components are specified in terms of their scattering parameters, and are summarized in Table 1. Simulations of the components are performed using Ceviche [17], an open source 2D finite difference frequency domain (FDFD) simulator. Silicon and silicon oxide are used for solid and void materials, respectively. All components are coupled to waveguides having 400 nm width, and we deal with the fundamental waveguide mode unless noted otherwise.






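Table 1: Performance targets for the components, given as cutoff values for the scattering parameters in each wavelength band.
Parameter, band    Waveguide bend  Mode converter  Beamsplitter  Demultiplexer
|S11|, 1270 nm     20 dB           20 dB           20 dB         20 dB
|S11|, 1290 nm     20 dB           20 dB           20 dB         20 dB
|S21|, 1270 nm     0.5 dB          0.5 dB          3.5 dB        3 dB
|S21|, 1290 nm     0.5 dB          0.5 dB          3.5 dB        20 dB
|S31|, 1270 nm     —               —               3.5 dB        20 dB
|S31|, 1290 nm     —               —               3.5 dB        3 dB
|S41|, 1270 nm     —               —               20 dB         —
|S41|, 1290 nm     —               —               20 dB         —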
The waveguide bend features a design region, with waveguides connecting to the left (port 1) and bottom (port 2). Given excitation from port 1, we aim to maximize transmission to port 2 while keeping backreflection low. Diagonal reflection symmetry is imposed, so that excitation from port 2 yields identical behavior.
The spatial mode converter features a design region, with waveguides connecting to the left and right. We seek designs that maximally convert the fundamental waveguide mode on the left (port 1) to the second order mode on the right (port 2), with minimal backreflection into port 1.
The beamsplitter design region is in size, with two waveguides connecting to the left (port 1 and port 4) and two waveguides connecting to the right (port 2 and port 3). Given excitation from port 1, the beamsplitter aims to divide power equally into port 2 and port 3 while minimizing backreflection into port 1 or transmission into port 4. We impose reflection symmetry along the horizontal and vertical axes, so that excitation from any other port yields identical behavior.
The wavelength demultiplexer features a design region, with one waveguide connecting to the left (port 1) and two waveguides connecting to the right (port 2 and port 3). Given excitation from port 1, wavelengths in the first band are directed to port 2 while wavelengths in the second band are directed to port 3.
In the optimization context, we consider three wavelengths per band: the center and extremal wavelengths. The specifications are considered to be fulfilled if the criteria in Table 1 are satisfied for all wavelengths. Our design resolution matches the 10 nm simulation resolution, so that e.g. a circular brush with diameter 10 corresponds to a 100 nm length scale.
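The pass/fail criterion can be made concrete with a short Python sketch. It is illustrative only and rests on several assumptions: the cutoffs are interpreted as negative decibel values relative to unity (an upper bound of -20 dB on reflection and a lower bound of -0.5 dB on transmission for the waveguide bend), scattering magnitudes are converted with 20 log10, and the names `BEND_SPEC`, `to_db`, and `meets_spec` as well as the sample magnitudes are hypothetical.

```python
import math

PIXEL_PITCH_NM = 10  # design and simulation resolution stated in the text


def to_db(magnitude):
    """Convert a scattering parameter magnitude |S| to decibels."""
    return 20.0 * math.log10(magnitude)


# Assumed reading of the waveguide bend targets: (cutoff in dB, bound type).
BEND_SPEC = {"S11": (-20.0, "max"), "S21": (-0.5, "min")}


def meets_spec(response, spec):
    """Return True only if every parameter meets its cutoff at all wavelengths.

    response: dict mapping parameter name to a list of |S| values, one per
              simulated wavelength (three per band, two bands).
    """
    for name, (cutoff_db, kind) in spec.items():
        for magnitude in response[name]:
            value_db = to_db(magnitude)
            if kind == "max" and value_db > cutoff_db:
                return False
            if kind == "min" and value_db < cutoff_db:
                return False
    return True


# Made-up magnitudes at six wavelengths, for illustration only.
good = {"S11": [0.05, 0.08, 0.09, 0.05, 0.06, 0.07],
        "S21": [0.97, 0.98, 0.96, 0.97, 0.98, 0.97]}
bad = {"S11": good["S11"], "S21": [0.97, 0.98, 0.90, 0.97, 0.98, 0.97]}

brush_diameter_nm = 10 * PIXEL_PITCH_NM  # 10-pixel brush at 10 nm pitch
print(meets_spec(good, BEND_SPEC), meets_spec(bad, BEND_SPEC), brush_diameter_nm)
```

A single wavelength falling outside its cutoff is enough to fail the specification, which matches the requirement that the criteria hold for all wavelengths.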
V Results
V.1 Designs using 100 nm circular brush
Fig. 5 shows optimization results for the waveguide bend with 100 nm circular brush. Upon initialization, the design is entirely silicon with no enclosed oxide. The initial design is poor, with low transmission to the output port and substantial reflection back to the input port. Subsequent updates significantly modify the topology of the design, adding enclosed oxide features and developing isolated silicon features. We emphasize the key characteristic of our scheme — the design at every step is fully binary and satisfies the 100 nm length scale constraint.
The target performance specification is achieved in 28 optimization steps, and the design continues to improve with more iterations. The lowest loss in the first 160 steps is found at step 122. Notably, the topology of this design differs from that at step 28, due to the fusing of two void features and the elimination of a third, along with changes in the shape of some solid features. The ability of our scheme to change topology while remaining feasible stands in contrast to some other methods, where the optimizer is locked in to a fixed topology once the optimization reaches an advanced state.
For the mode converter, beamsplitter, and wavelength demultiplexer, the lowest-loss design from the first 160 optimization steps and corresponding field magnitudes are shown in Fig. 6. The scattering spectra for each are shown in Fig. 7, where each satisfies the performance target laid out in Table 1 and the 100 nm length scale constraint.
As we have shown, the topology of a design can change substantially during the course of the optimization. Since the response of a device is generally discontinuous with major changes in the design shape or topology, a noisy loss trajectory is expected. This is illustrated in Fig. 8, which shows the normalized loss value versus iteration for the waveguide bend of Fig. 5 and the other three components. Points where the performance target is achieved are also highlighted. For each component, the loss generally decreases with step, but noise in the loss value and occasional regressions to higher loss can be seen.
The waveguide bend and mode converter require fewer than 40 optimization iterations to achieve the target. Meanwhile, the beamsplitter and demultiplexer appear to be relatively more difficult problems, requiring approximately 100 steps to first reach the target. However, this is still well within the range of optimization steps required in an inverse design scheme [16]. In general, difficulty is directly related to the target specifications, length scale constraints, the physical size of the component, and the configuration of connecting waveguides. Reducing the length scale or increasing the design size will expand the solution space, and generally allow satisfactory solutions to be found with fewer optimization steps.
V.2 Designs using 100 nm notched square brush
Next, we turn to component designs generated using a 100 nm notched square brush. With the design resolution of 10 nm, these designs strictly satisfy an 80 nm minimum width and spacing constraint. The normalized loss trajectories for the components are plotted in Fig. 9. Scattering spectra and designs are found in the Appendix.
For all components, designs achieving the performance target from Table 1 are found. However, the loss trajectories differ in several respects. Specifically, the trajectories are noisier, they generally contain fewer points that achieve the spec, and the spec is achieved later in the optimization. We attribute this to the fact that the notched square is relatively larger in area, limiting the design space accessible to the optimizer and making the problem more challenging. We expect that optimal hyperparameters of the Adam algorithm used to drive our optimization are somewhat problem-dependent, and in future work it would be interesting to explore various configurations. In particular, reducing the learning rate as the optimization progresses, which has been shown to aid in training of quantized neural networks [33], could be beneficial.
V.3 Reliability and effect of length scale
Our results indicate that our inverse design framework is capable of finding designs with desired performance for a variety of fabrication-constrained photonic inverse design problems. A useful method will find such solutions for a range of length scale constraints, and do so reliably. To evaluate this, we consider the optimization problems above for circular and notched-square brushes with 60 nm, 80 nm, 100 nm, and 130 nm size. To study reliability of the method, we run 20 separate 500-step optimizations with different random initialization for each case, generating 320,000 feasible designs in the process.
Our analysis proceeds as follows: for each step of an optimization run, we identify whether the target performance from Table 1 has been achieved at that step or at any earlier step. Then, for each combination of component, brush shape, and brush size, we compute for each step the fraction of the 20 runs which achieve this target. The results are shown in Fig. 10.
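This reliability statistic is straightforward to compute. The sketch below assumes that "achieved" is cumulative, i.e. a run counts as successful at a given step if it met the target at that step or earlier; the function name `fraction_achieved`, the representation of each run by its first successful step (or `None`), and the example data are hypothetical.

```python
def fraction_achieved(first_success_steps, num_steps):
    """Fraction of runs that have met the target by each step.

    first_success_steps: per-run index of the first step meeting the target,
                         or None if the run never meets it.
    num_steps:           number of optimization steps per run.
    """
    n_runs = len(first_success_steps)
    return [
        sum(1 for s in first_success_steps if s is not None and s <= step) / n_runs
        for step in range(num_steps)
    ]


# Four made-up runs of five steps each; one run never reaches the target.
curve = fraction_achieved([2, None, 0, 3], num_steps=5)
print(curve)

# Total designs generated in the study described above:
# 4 components x 2 brush shapes x 4 brush sizes x 20 runs x 500 steps.
total_designs = 4 * 2 * 4 * 20 * 500
print(total_designs)
```

The resulting curve is non-decreasing by construction, and its final value is the fraction of runs that reach the target within the step budget.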
Turning first to the 60 nm brushes: we observe that for all components and both brush shapes, we are able to consistently find solutions which achieve the performance target. Random initialization affects the optimization trajectory, and the 20 different runs produce a distribution in the number of steps required to achieve the target. Comparing the circular and notched square brush, we find that with the circular brush the target performance is achieved slightly earlier, consistent with the hypothesis that the notched square presents a more challenging optimization problem. This finding will be repeated for other brush sizes discussed below. Across all component types, the range of steps required to achieve the target performance is at most tens of steps wide, and in practical application a single optimization run may actually be sufficient to find a good solution.
With the 80 nm brushes, again we are able to consistently achieve the target performance. In general, the number of steps required is larger than for the 60 nm brush. However, in some cases, such as the wavelength demultiplexer with circular brush, the steps required follow a similar distribution as with the 60 nm brush. This suggests that for some problems, decreasing the length scale below a threshold value has no benefit. We also see some outlier examples where the target fails to be achieved.
As the brush sizes are increased to 100 nm, the general pattern is repeated. The target performance is achieved with good consistency, but more steps are required and a larger number of outlier optimization runs fail to reach the target within 500 steps.
At 130 nm, the pattern is repeated yet again. With this brush size, designs for the waveguide bend and mode converter which achieve the target continue to be consistently found. For the beamsplitter and wavelength demultiplexer, however, approximately half or fewer optimization runs achieve the target. Therefore, in a practical application of our inverse design method with such large length scales, it may be beneficial to launch multiple optimization runs to ensure that at least one can reach the target performance.
Much of the data supports the conclusion that larger brushes present more challenging optimization problems. We expect this also for theoretical reasons, since the design space with small brushes is strictly larger than that for large brushes. However, with the 130 nm brush we observe that some of the beamsplitter designs achieve the target performance ahead of the best 80 nm and 100 nm brush runs. This warrants further investigation in future work.
VI Conclusion
We have described a novel inverse design method which produces designs that are guaranteed to satisfy length scale constraints throughout the course of an optimization. Our method uses a conditional generator for feasible designs, and adopts the straight-through estimator commonly used for quantized neural network training. This enables us to pose the problem of length-scale-constrained topology optimization as a simple unconstrained stochastic gradient optimization problem.
Our method is shown to reliably find good solutions to a variety of 2D nanophotonic inverse design problems. We believe the good performance of the method and its simplicity could make it a useful new scheme in practical applications of inverse design.
In the future, it would be important to use a realistic 3D electromagnetics simulator to design components that can be manufactured in commercial foundries. It would also be interesting to apply the method to topology optimization problems in other domains.
We also see possible improvements to the underlying method. Specifically, conditional generators for designs satisfying additional constraints would be valuable — such as minimum solid and void area, which are frequently included in foundry design rules. Even for width and spacing constraints addressed by the current scheme, it would be useful to develop new generators which are not conservative, i.e. which do not require a brush larger than the target minimum width. Finally, the estimator and transform functions warrant additional study, and we see potential for learned estimators.
References
[1] D. Marpaung, J. Yao, and J. Capmany, “Integrated microwave photonics,” Nature photonics 13, 80–90 (2019).
[2] J. M. Arrazola, V. Bergholm, K. Brádler, T. R. Bromley, M. J. Collins, I. Dhand, A. Fumagalli, T. Gerrits, A. Goussev, L. G. Helt, J. Hundal, T. Isacsson, R. B. Israel, J. Izaac, S. Jahangiri, R. Janik, N. Killoran, S. P. Kumar, J. Lavoie, A. E. Lita, D. H. Mahler, M. Menotti, B. Morrison, S. W. Nam, L. Neuhaus, H. Y. Qi, N. Quesada, A. Repingon, K. K. Sabapathy, M. Schuld, D. Su, J. Swinarton, A. Száva, K. Tan, P. Tan, V. D. Vaidya, Z. Vernon, Z. Zabaneh, and Y. Zhang, “Quantum circuits with many photons on a programmable nanophotonic chip,” Nature 591, 54–60 (2021).
[3] G. Wetzstein, A. Ozcan, S. Gigan, S. Fan, D. Englund, M. Soljačić, C. Denz, D. A. B. Miller, and D. Psaltis, “Inference in artificial intelligence with deep optics and photonics,” Nature 588, 39–47 (2020).
[4] G. Veronis, R. W. Dutton, and S. Fan, “Method for sensitivity analysis of photonic crystal devices,” Optics letters 29, 2288–2290 (2004).
[5] S. Molesky, Z. Lin, A. Y. Piggott, W. Jin, J. Vučković, and A. W. Rodriguez, “Inverse design in nanophotonics,” Nature photonics 12, 659–670 (2018).
[6] D. Sell, J. Yang, S. Doshay, R. Yang, and J. A. Fan, “Large-Angle, multifunctional metagratings based on freeform multimode geometries,” Nano letters 17, 3752–3757 (2017).
[7] T. W. Hughes, M. Minkov, I. A. D. Williamson, and S. Fan, “Adjoint method and inverse design for nonlinear nanophotonic devices,” ACS Photonics 5, 4781–4787 (2018).
[8] W. Jin, S. Molesky, Z. Lin, K.-M. C. Fu, and A. W. Rodriguez, “Inverse design of compact multimode cavity couplers,” Optics express 26, 26713–26721 (2018).
[9] J. Wang, Y. Shi, T. Hughes, Z. Zhao, and S. Fan, “Adjoint-based optimization of active nanophotonic devices,” Optics express 26, 3236–3248 (2018).
[10] Z. Lin, V. Liu, R. Pestourie, and S. G. Johnson, “Topology optimization of freeform large-area metasurfaces,” Optics express 27, 15765–15775 (2019).
[11] A. Y. Piggott, E. Y. Ma, L. Su, G. H. Ahn, N. V. Sapra, D. Vercruysse, A. M. Netherton, A. S. P. Khope, J. E. Bowers, and J. Vučković, “Inverse-Designed photonics for semiconductor foundries,” ACS Photonics 7, 569–575 (2020).
[12] K. Y. Yang, J. Skarda, M. Cotrufo, A. Dutt, G. H. Ahn, M. Sawaby, D. Vercruysse, A. Arbabian, S. Fan, A. Alù, and J. Vučković, “Inverse-designed non-reciprocal pulse router for chip-based LiDAR,” Nature photonics 14, 369–374 (2020).
[13] N. V. Sapra, K. Y. Yang, D. Vercruysse, K. J. Leedle, D. S. Black, R. J. England, L. Su, R. Trivedi, Y. Miao, O. Solgaard, R. L. Byer, and J. Vučković, “On-chip integrated laser-driven particle accelerator,” Science 367, 79–83 (2020).
[14] M. Minkov, I. A. D. Williamson, L. C. Andreani, D. Gerace, B. Lou, A. Y. Song, T. W. Hughes, and S. Fan, “Inverse design of photonic crystals through automatic differentiation,” ACS Photonics 7, 1729–1741 (2020).
[15] E. Tseng, S. Colburn, J. Whitehead, L. Huang, S.-H. Baek, A. Majumdar, and F. Heide, “Neural nano-optics for high-quality thin lens imaging,” Nature communications 12, 6493 (2021).
[16] A. M. Hammond, A. Oskooi, S. G. Johnson, and S. E. Ralph, “Photonic topology optimization with semiconductor-foundry design-rule constraints,” Optics Express 29, 23916–23938 (2021).
[17] T. W. Hughes, I. A. D. Williamson, M. Minkov, and S. Fan, “Forward-Mode differentiation of Maxwell’s equations,” ACS Photonics 6, 3010–3016 (2019).
[18] J. S. Orcutt and R. J. Ram, “Photonic device layout within the foundry CMOS design environment,” IEEE Photonics Technology Letters 22, 544–546 (2010).
[19] K. Wang, X. Ren, W. Chang, L. Lu, D. Liu, and M. Zhang, “Inverse design of digital nanophotonic devices using the adjoint method,” Photonics Research 8, 528–533 (2020).
[20] A. Michaels and E. Yablonovitch, “Leveraging continuous material averaging for inverse electromagnetic design,” Optics express 26, 31717–31737 (2018).
[21] W. R. Frei, D. A. Tortorelli, and H. T. Johnson, “Geometry projection method for optimizing photonic nanostructures,” Optics letters 32, 77–79 (2007).
[22] J. S. Jensen and O. Sigmund, “Topology optimization for nano-photonics,” Laser & photonics reviews 5, 308–321 (2011).
[23] F. Wang, J. S. Jensen, and O. Sigmund, “High-performance slow light photonic crystal waveguides with topology optimized or circular-hole based material layouts,” Photonics and Nanostructures - Fundamentals and Applications 10, 378–388 (2012).
[24] M. Zhou, B. S. Lazarov, F. Wang, and O. Sigmund, “Minimum length scale in topology optimization by geometric constraints,” Computer methods in applied mechanics and engineering 293, 266–282 (2015).
[25] K. Svanberg, “The method of moving asymptotes—a new method for structural optimization,” International journal for numerical methods in engineering 24, 359–373 (1987).
[26] T. Ju, F. Losasso, S. Schaefer, and J. Warren, “Dual contouring of Hermite data,” ACM transactions on graphics 21, 339–346 (2002).
[27] P. Soille, Morphological Image Analysis (Springer-Verlag, 2004).
[28] M. Koefferlein, KLayout Documentation (2021).
[29] Calibre Verification User’s Manual.
[30] Y. Bengio, N. Léonard, and A. Courville, “Estimating or propagating gradients through stochastic neurons for conditional computation,” (2013), arXiv:1308.3432 [cs.LG].
[31] P. Yin, J. Lyu, S. Zhang, S. Osher, Y. Qi, and J. Xin, “Understanding straight-through estimator in training activation quantized neural nets,” (2019), arXiv:1903.05662 [cs.LG].
[32] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” (2017), arXiv:1412.6980 [cs.LG].
[33] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, “Quantized neural networks: Training neural networks with low precision weights and activations,” J. Mach. Learn. Res. 18, 6869–6898 (2017).
Appendix
A1 Example feasible design generation
Figs. A1 and A2 illustrate the generation of a 6×8 feasible design in 12 steps, using Algorithm 1 and a notched-square brush having width 5.
A2 Designs using 100 nm notched square brush
Scattering spectra for components created with a 100 nm notched square brush are shown in Fig. A3. The corresponding designs are shown in Fig. A4. Each component is the lowest-loss component found in the first 160 optimization steps of a randomly initialized optimization run.