A constrained optimization problem involves the minimization of an objective function subject to constraints that limit the set of feasible solutions. Training a neural network to solve constrained optimization problems in a generative way (i.e., without a training set of problem parameters and optimal solutions) is as old as the field of deep learning itself [Hopfield and Tank, 1985]. Typically, the objective is modified by adding a barrier function that penalizes the objective value when solutions do not satisfy the constraints; the modified objective is then approximated with a neural network. This approach parallels interior point methods (IPMs), a family of classical operations research techniques for constrained optimization. In IPMs, a differentiable barrier function is first constructed, and gradient descent is then applied to the modified objective [Nemirovski and Nesterov, 1994]. IPMs have become indispensable for large-scale linear and quadratic convex optimization problems, not only because of their complexity guarantees, but also because the number of iterations required scales gracefully with the problem size [Gondzio, 2012]. However, a drawback of IPMs is that the problem needs to be well-defined. Specifically, to obtain these complexity guarantees, the objective function and all the constraints must be known a priori and the feasible set must be well-behaved, e.g., convex [Bubeck and Eldan, 2014]. Furthermore, these methods typically search for a single optimal solution, while in practice, multiple optimal solutions may exist and need to be determined.
In this paper, we study the task of generatively learning the optimal solution set of a constrained optimization problem, including situations where the feasible set cannot be explicitly defined. Our method combines ideas from IPMs and generative adversarial learning [Goodfellow et al., 2014]. We consider a general class of problems where constraints may be non-convex or even not explicitly defined. Instead, the feasible region is learned via a dataset of given feasible solutions, modeling the situation where we observe actions of decision makers who may be adhering to private, non-observable constraints. Analogous to IPMs, which use a barrier function to enforce feasibility and find an iteratively improving solution, our generative adversarial network (GAN) trains a discriminator to act as a barrier function and a generator to return improving solutions. We prove that our approach recovers analogous theoretical guarantees of IPMs. Moreover, our methodology allows the generator network to simultaneously learn the entire set of optimal solutions rather than converging to a single point. This is particularly important if decision makers wish to explore the Pareto frontier of optimal solutions to consider various trade-offs before implementing a particular optimal solution.
Our work is motivated by a planning problem in radiation therapy (RT) treatment optimization. RT is used to treat more than 50% of cancer patients worldwide [Delaney et al., 2005]. The current treatment optimization paradigm consists of a human-driven, iterative process of solving an approximation to the true non-convex, constrained optimization problem. That is, while a treatment planner generates deliverable treatments, an oncologist provides feedback on clinical acceptability until final approval is given. This iterative process traverses the Pareto frontier of deliverable plans while the planner simultaneously learns the oncologist's private preferences regarding the feasible set. As this process often requires several iterations over a span of days, there is significant interest in automating the pipeline using procedurally generated plans [McIntosh and Purdie, 2017]. The most common approach is to solve a convex approximation to the problem, and then measure the quality of the resulting solution against the true non-convex measures [Babier et al., 2018]. We propose a methodology that can directly produce deliverable plans. In our numerical results, we show that it is possible to generate high-quality solutions with these non-convex measures incorporated into a simplified treatment optimization model. To our knowledge, this is the first approach to RT treatment planning that directly tackles the core non-convexity of the problem using a deep learning approach.
Our specific contributions are as follows:
- We develop a novel approach to solving non-convex optimization problems, using a data-only specification of the feasible set. Our approach can generate the entire optimal solution set, a critical task when decision makers are interested in evaluating multiple optimal solutions.
- We provide the first integration of methodology from the domains of IPMs and GANs. Our proposed algorithm, IPMAN, replaces the traditional barrier with a discriminator and the optimal solution with the generator distribution. Our approach also provides a theoretical guarantee that by solving our modified problem with the discriminator as the barrier, any resulting solution is $\epsilon$-optimal with respect to the original, possibly non-convex, problem.
- We apply the IPMAN algorithm to a problem in radiation therapy treatment planning. We demonstrate that the generated plans satisfy the non-convex problem constraints and, by analyzing various objectives, grade better than a training set of deliverable plans.
2 Related Work
2.1 Interior Point Methods
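Interior point methods (IPMs), or barrier methods, are standard techniques for solving constrained optimization problems [Nemirovski, 2004], defined by an objective function $f(\mathbf{x})$ and a feasible set $\mathcal{X}$ in which the optimal decision vector $\mathbf{x}^*$ must reside. Consider the problem

$$\min_{\mathbf{x}} \; f(\mathbf{x}) \quad \text{subject to} \quad \mathbf{x} \in \mathcal{X}. \tag{1}$$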
Formulation (1) can be transformed into an unconstrained optimization problem by “dualization”, i.e., introducing a regularization parameter and a penalty function to address feasibility. For IPMs, this is achieved with a barrier function, , that satisfies two properties: (i) when ; and (ii) when . The resulting unconstrained optimization problem is
Given a differentiable barrier function and an initial solution, IPMs use the Newton method to iterate over the solution and the regularization parameter until convergence to an optimal solution [Boyd and Vandenberghe, 2004]. These methods have found most success in linear and quadratic optimization, where it is possible to give theoretical guarantees on optimality, as well as fast empirical convergence rates [Gondzio, 2012]. For arbitrary convex feasible sets, Nemirovski and Nesterov [1994] introduced the concept of universal self-concordant barriers, a family of functions with properties that guarantee convergence. The first explicit construction of such functions was only recently proposed [Bubeck and Eldan, 2014] and this has led to renewed interest in IPMs for more challenging convex optimization problems (e.g., Abernethy and Hazan, 2015, Karimi et al., 2017). We note that although the extant literature has primarily focused on problems with convex feasible sets, IPMs have also been adapted for problems with non-convex feasible sets [Vanderbei and Shanno, 1999, Benson et al., 2004, Hinder and Ye, 2018]. In this case, IPMs are more difficult to implement. Efficiency guarantees may not exist, and if they do, the functions must obey strict requirements (e.g., differentiability).
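To make the barrier idea concrete, the following is a minimal sketch of a classical log-barrier scheme on a one-dimensional toy problem: minimize $f(x) = x$ subject to $1 \le x \le 3$. The toy problem, the function names, the starting point, and the shrink factor for the regularization parameter are all illustrative assumptions and are not taken from the methods cited above.

```python
import math

def barrier_objective(x, lam):
    """f(x) + lam * B(x), with f(x) = x and a log barrier for 1 < x < 3."""
    return x - lam * (math.log(x - 1.0) + math.log(3.0 - x))

def newton_minimize(x, lam, iters=50):
    """Damped Newton iterations that keep the iterate strictly inside (1, 3)."""
    for _ in range(iters):
        grad = 1.0 - lam * (1.0 / (x - 1.0) - 1.0 / (3.0 - x))
        hess = lam * (1.0 / (x - 1.0) ** 2 + 1.0 / (3.0 - x) ** 2)
        step = -grad / hess
        t = 1.0
        cand = x
        # Halve the step until the candidate is strictly feasible
        # and does not increase the barrier objective.
        while t > 1e-12:
            trial = x + t * step
            if 1.0 < trial < 3.0 and barrier_objective(trial, lam) <= barrier_objective(x, lam):
                cand = trial
                break
            t *= 0.5
        x = cand
    return x

def interior_point(x0=2.0, lam=1.0, shrink=0.2, outer=10):
    """Alternate Newton minimization in x with shrinking the barrier weight lam."""
    x = x0
    for _ in range(outer):
        x = newton_minimize(x, lam)
        lam *= shrink
    return x

x_opt = interior_point()
print(x_opt)  # approaches the constrained minimizer x = 1 from the interior
```

As the barrier weight shrinks, the minimizer of the barrier objective moves toward the boundary point $x = 1$ while every iterate remains strictly feasible, which is the behavior the discriminator-based barrier in this paper is meant to reproduce without an explicit description of the constraints.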
These approaches require that the constraints and objective are known to the optimizer a priori and characterized by differentiable functions. In this paper, we extend the literature on IPMs by proposing a methodology that does not require an explicit description of the set of constraints or require the constraints to obey any particular properties. Instead, we define the feasible region using only a set of observed data points representing observed decisions by a user or set of users.
2.2 Generative Adversarial Networks
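GANs and their variants have revolutionized generative modeling for a variety of applications [Goodfellow et al., 2014, Taigman et al., 2016, Wu et al., 2016, Isola et al., 2017]. Let $\mathbf{x} \sim p_{\text{data}}$ denote samples from a data distribution and $\mathbf{z} \sim p_{\mathbf{z}}$ denote a latent distribution. In a GAN, a generator $G$ and a discriminator $D$ compete in a min-max game, where the generator learns to generate samples $G(\mathbf{z})$ and the discriminator learns to classify them as belonging to $p_{\text{data}}$ or not (see Goodfellow [2016] for more details). The loss function for this game is given below:

$$\min_{G} \max_{D} \; \mathbb{E}_{\mathbf{x} \sim p_{\text{data}}}\left[\log D(\mathbf{x})\right] + \mathbb{E}_{\mathbf{z} \sim p_{\mathbf{z}}}\left[\log\left(1 - D(G(\mathbf{z}))\right)\right].$$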
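A GAN that is trained to global optimality possesses the following properties.

Lemma 2.2 [Goodfellow et al., 2014]. Consider a GAN of sufficient capacity trained on a data distribution $p_{\text{data}}$. Let $p_G$ denote the generator output distribution.
(i) For any fixed generator $G$, the optimal discriminator is $D^*_G(\mathbf{x}) = \dfrac{p_{\text{data}}(\mathbf{x})}{p_{\text{data}}(\mathbf{x}) + p_G(\mathbf{x})}$.
(ii) For an optimal generator $G^*$, the optimal discriminator is $D^*(\mathbf{x}) = \tfrac{1}{2}$.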
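As a small numerical illustration of the optimal discriminator result of Goodfellow et al. [2014], the sketch below evaluates $p_{\text{data}}(x) / (p_{\text{data}}(x) + p_G(x))$ on a finite support. The dictionaries, point labels, and function name are hypothetical and chosen only for illustration.

```python
def optimal_discriminator(p_data, p_gen):
    """Return D*(x) = p_data(x) / (p_data(x) + p_gen(x)) on a finite support."""
    d_star = {}
    for x in set(p_data) | set(p_gen):
        num = p_data.get(x, 0.0)
        den = num + p_gen.get(x, 0.0)
        if den > 0.0:
            d_star[x] = num / den
    return d_star

# Data lives on {a, b}; a hypothetical imperfect generator puts mass on {b, c}.
p_data = {"a": 0.5, "b": 0.5}
p_gen = {"b": 0.5, "c": 0.5}

d_fixed = optimal_discriminator(p_data, p_gen)
print(d_fixed["a"], d_fixed["b"], d_fixed["c"])  # 1.0 0.5 0.0

# When the generator matches the data distribution, D* is 1/2 everywhere.
d_matched = optimal_discriminator(p_data, p_data)
print(sorted(d_matched.values()))  # [0.5, 0.5]
```

The point `c`, which only the generator proposes, receives the value 0, whereas every point in the support of the data receives a strictly positive value. This is the property that lets the logarithm of an optimal discriminator behave like a barrier.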
While Lemma 2.2 provides powerful theoretical guarantees for performance, several recent results demonstrate that global convergence of both generator and discriminator is usually not attainable [Li et al., 2017, Heusel et al., 2017, Nagarajan and Kolter, 2017, Mescheder et al., 2017, Arjovsky and Bottou, 2017]. In particular, Li et al. [2017] prove that there are trade-offs between generator and discriminator updates. Moreover, Arjovsky and Bottou [2017] show that while an optimal discriminator can be achieved, an optimal generator usually cannot. Fortunately, this inherent weakness of GANs is less concerning in our setting. While an optimal generator/discriminator pair is ideal, our main result below only requires that the discriminator be optimal for any arbitrary generator.
3 An Interior Point Algorithm using Adversarial Networks
We propose a two-stage method for solving an arbitrary constrained optimization problem by combining IPMs with deep learning. We train a GAN and use the discriminator as a barrier and the generator to provide an initial set. In Section 3.1, we prove the effectiveness of the discriminator as a barrier. In Section 3.2, we re-purpose the generator to converge to the optimal set for a desired objective function. The pseudo-code for the complete IPMAN algorithm is given in Algorithm 1.
3.1 The discriminator as the barrier
Consider the constrained optimization problem (1) with an objective function that we assume, for simplicity, is bounded below. We do not explicitly define the feasible set and make no assumptions on structure. Let denote an optimal solution and denote the set of optimal solutions. Further, let denote an arbitrary “feasible distribution”, i.e., any distribution whose support is . In the first stage, we train a GAN using a dataset of samples .
Consider a GAN trained on . For any fixed generator , let be the optimal discriminator. Then, for any , there exist constants , such that
We first prove that (3) is bounded and feasible. Note that, by assumption, is bounded below. Further, by definition, there exists an optimal solution to (1). From Lemma 2.2, the optimal discriminator satisfies for all , confirming that (3) is feasible and bounded.
For classical IPMs, $\epsilon$-optimality is proved by showing that the optimal solution to (3) satisfies the KKT conditions [Boyd and Vandenberghe, 2004]. However, in our data-driven setting, the feasible set is not explicitly defined by differentiable constraints and the standard approach (i.e., using the KKT conditions) cannot be applied. This necessitates the introduction of the additional parameter. Overall, this extra term yields a less “elegant” result, in exchange for generality. Fortunately, in practice, the extra parameter does not significantly affect implementation, as we describe below.
Theorem 3.1 proves that the barrier function is capable of guaranteeing $\epsilon$-optimality for any generator. That is, if the discriminator is trained to optimality, an explicit performance bound can be specified. In this context, the generator constructs an approximation of the feasible set and its purpose is analogous to the problem of generating a starting solution in an IPM. This approach sidesteps many of the difficulties of training a GAN to global optimality, as in practice, we need only train to an optimal discriminator and a sufficiently capable generator. Nevertheless, $\epsilon$-optimality is guaranteed only for specific choices of and , i.e., those that satisfy the requirements in the proof. Training a generator to optimality relaxes these necessary conditions. Let be the globally optimal generator/discriminator pair for a GAN trained on . For any and , (3) and (1) have the same optimal solutions.
Several prior results (e.g., Li et al., 2017, Arjovsky and Bottou, 2017) demonstrate that, except under very specific conditions, an optimal generator/discriminator pair is generally not achievable. Thus, in practice, we must rely on the results from Theorem 3.1. This introduces two issues that must be addressed to ensure $\epsilon$-optimality is preserved. First, must be carefully chosen. Second, for to be correctly specified, knowledge of the optimal solutions and is required.
To overcome these obstacles, note that training is an iterative procedure where and can be viewed as hyperparameters. That is, and are chosen first and then held fixed while the GAN is trained. After training concludes, and are updated and the GAN is trained again. This iterative procedure continues until a sufficiently small objective function value is obtained. However, notice that during the training phase, the term is constant. Thus, it can be removed from the loss function without loss of generality (i.e., set ). Consequently, our iterative procedure (omitting ) finds the optimal discriminator and the regularization parameter that minimizes (3). Then, to guarantee $\epsilon$-optimality, we choose where is a sufficiently small constant.
3.2 Generating the optimal set
Given a GAN trained on , the discriminator is a barrier function and the generator produces solutions from a distribution whose support is approximately . We now demonstrate how the generator can be used to learn . Consider a GAN trained on samples from until we reach an optimal discriminator. Suppose that we then freeze the discriminator weights and train the generator over the loss function:
There exists an optimal generating distribution whose support is .
For a generator with sufficient capacity, problem (4) is equivalent to
For any , there exists an optimal generating distribution with mass only on . This is observed by noting that is a global optimum and that the minimum of a set of values (in this case, objective function values) is smaller than or equal to the mean. Consider any distribution whose support is . By the same argument, the mean of the set equals the minimum because each point in attains the minimum. ∎
Classical methods for solving constrained optimization problems find a single optimal solution, although many may exist. Determining the complete optimal set is typically a difficult problem that relies on enumerative or intelligent search techniques (e.g., Cornuejols and Trick, 1998, Tantawy, 2007, Guenther et al., 2014). However, by leveraging a GAN, we can use the generator to learn the distribution of the optimal set. Note that our approach provides no guarantee that the complete optimal set can be learned. However, any generating distribution supported on a subset of , by definition, is also an optimal solution of (2). Further, our numerical experiments suggest that we often converge to the full optimal set or a sufficiently large subset.
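The second stage can be sketched on a toy problem as follows. This is only an illustration under several assumptions: the loss is taken to be the mean over generated points of the objective minus a weighted log of the frozen discriminator, a common barrier form that is assumed here rather than copied from problem (4); the "generator" is replaced by a set of free points updated by gradient descent instead of a neural network; and the discriminator is a hypothetical analytic function that is close to 1 inside the unit square. The constants `K` and `LAM` are likewise assumed.

```python
import math

K = 50.0    # sharpness of the hypothetical frozen discriminator
LAM = 0.1   # barrier weight (assumed value)

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def log_d(x1, x2):
    """Log of a hypothetical frozen discriminator, near 0 inside the unit square."""
    return sum(math.log(sigmoid(K * u)) for u in (x1, 1.0 - x1, x2, 1.0 - x2))

def objective(x1, x2):
    """Toy objective: every feasible point on the edge x1 = 0 is optimal."""
    return x1

def loss(x1, x2):
    return objective(x1, x2) - LAM * log_d(x1, x2)

def grad(x1, x2):
    """Analytic gradient of loss, using d/du log(sigmoid(K u)) = K * sigmoid(-K u)."""
    g1 = 1.0 - LAM * K * (sigmoid(-K * x1) - sigmoid(-K * (1.0 - x1)))
    g2 = -LAM * K * (sigmoid(-K * x2) - sigmoid(-K * (1.0 - x2)))
    return g1, g2

def train(points, lr=0.01, steps=2000):
    """Gradient descent on each point with the discriminator held fixed."""
    trained = []
    for x1, x2 in points:
        for _ in range(steps):
            g1, g2 = grad(x1, x2)
            x1, x2 = x1 - lr * g1, x2 - lr * g2
        trained.append((x1, x2))
    return trained

# Start from feasible points spread along the second coordinate.
start = [(0.5, 0.2 + 0.6 * i / 19) for i in range(20)]
final = train(start)
```

In this sketch the points slide toward the edge `x1 = 0` and stop slightly inside the square, where the barrier gradient balances the objective gradient, while their second coordinates remain spread out. The final points therefore cover a range of optimal solutions instead of collapsing to one, which mirrors the role the generator distribution plays in this section.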
4 Numerical results
In this section, we demonstrate how to apply our methodology to compute the optimal solution set for two sets of examples. First, to visualize how the IPMAN algorithm performs, we solve a synthetic two-dimensional optimization problem with a non-convex feasible set using several linear and nonlinear objective functions (see Section 4.1). Then, in Section 4.2, we explore how the IPMAN algorithm can be applied to a realistic non-convex optimization problem associated with radiation therapy treatment optimization. Code for all experiments is provided at https://github.com/rafidrm/ipman.
4.1 Synthetic examples
Figure 1: Output of IPMAN over a non-convex feasible set for various objectives. The beige dots represent feasible solutions. The blue dots represent realizations from the final generator distribution, i.e., the optimal set. To improve visibility, we removed outliers beyond the percentile.
We trained a basic GAN with one hidden layer and leaky ReLU for both generator and discriminator networks [Greydanus, 2017] to learn the following L-shaped feasible set :
The shape of was chosen because it was non-convex, easy to visualize, and optimal solutions could be verified analytically using the KKT conditions.
To obtain an optimal discriminator, we applied several modifications to the training procedure [Chintala et al., 2016]. First, to generate , we sampled uniformly within a slightly smaller subset of and added Gaussian noise to smooth the distribution at the boundary; this helped stabilize training. Second, we used a dataset of i.i.d. samples of infeasible solutions (). In later iterations of stage 1, we periodically replaced generator samples with the infeasible samples in order to better update the discriminator. Finally, we updated the discriminator ten times more frequently compared to the generator. All models were trained using the Adam optimizer [Kingma and Ba, 2014].
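The data preparation described above can be sketched as follows. The particular L-shaped set (the square $[0,2]^2$ with its upper-right quadrant removed), the margin, the noise level, and the bounding box for infeasible points are hypothetical choices made for illustration; they are not the values used in our experiments.

```python
import random

def in_l_shape(x, y, margin=0.0):
    """Membership test for a hypothetical L-shaped set, optionally shrunk by margin."""
    lo, hi, corner = margin, 2.0 - margin, 1.0 - margin
    inside_box = lo <= x <= hi and lo <= y <= hi
    in_cutout = x > corner and y > corner
    return inside_box and not in_cutout

def sample_feasible(n, rng, margin=0.05, noise=0.01):
    """Sample uniformly in a slightly smaller subset, then add Gaussian noise."""
    points = []
    while len(points) < n:
        x, y = rng.uniform(0.0, 2.0), rng.uniform(0.0, 2.0)
        if in_l_shape(x, y, margin):
            points.append((x + rng.gauss(0.0, noise), y + rng.gauss(0.0, noise)))
    return points

def sample_infeasible(n, rng, lo=-1.0, hi=3.0):
    """Rejection-sample points of a bounding box that lie outside the L-shaped set."""
    points = []
    while len(points) < n:
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        if not in_l_shape(x, y):
            points.append((x, y))
    return points

rng = random.Random(0)
feasible = sample_feasible(500, rng)
infeasible = sample_infeasible(500, rng)
```

Sampling inside a shrunken copy of the set before adding noise keeps almost all noisy points feasible while blurring the sharp edge of the distribution, and the infeasible pool gives the discriminator explicit negative examples once the generator samples stop being informative.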
After training the discriminator, we minimized several linear and nonlinear objective functions over this non-convex feasible region by learning the optimal solution set. We chose and for the first three problems and and for the last. Finally, we generated samples for each problem and removed outliers beyond the percentile.
We evaluated each model based on three different measures. We first considered the absolute objective function value error , where is the known optimal solution. This error is the empirical analogue of the -optimality guarantee. We also measured the Value-at-Risk (VaR) at the percentile; just as calculates the mean, VaR measures the worst generated error. Finally, because the functions grow at different rates, we also calculated the average distance to the optimal set . The final generator distributions are displayed in Figure 1. We present the scores in Table 1 and summarize the results below:
- Linear : Due to the smoothness of , the discriminator penalizes solutions where . As a result, the final distribution is cut off near the boundary. Nonetheless, the distribution converges to within with a few outliers exceeding .
- Quadratic : The generator distribution quickly converged to the optimal solution, as the optimal set is a singleton in the feasible set.
- Bilinear : The objective is non-convex with two optimal solutions at opposite ends of the feasible set. Although is disproportionately high due to quadratic growth from the optimal solution value, the value of suggests that the generated samples are very close to the optimal solutions.
- Rosenbrock : This is a standard test for non-convex optimization algorithms [Yang, 2010]. The function has a large easy-to-learn valley (there are many local minima) with a hard-to-find global minimum at . We quickly find the valley and slowly converge to the optimal solution.
In all cases, we observe that our method converges to the optimal solution, and when there are multiple global optima, our approach quickly produces many solutions in the optimal set. These examples demonstrate that the IPMAN algorithm is not only theoretically viable, but performs well empirically.
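The three evaluation measures used in this section can be computed along the following lines. This is a hedged sketch: the function names, the toy objective, the sample points, and the quantile level `q` are hypothetical, and the Value-at-Risk is implemented as a plain empirical quantile of the errors.

```python
import math

def abs_errors(samples, f, f_star):
    """Absolute objective error |f(x) - f*| for each generated sample."""
    return [abs(f(x) - f_star) for x in samples]

def value_at_risk(values, q):
    """Empirical q-quantile: smallest value with at least a fraction q of values below or equal."""
    ordered = sorted(values)
    k = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[k]

def mean_distance_to_set(samples, optimal_points):
    """Average Euclidean distance from each sample to its nearest optimal point."""
    total = sum(min(math.dist(x, o) for o in optimal_points) for x in samples)
    return total / len(samples)

# Hypothetical toy data: f(x) = x1 + x2 with a single optimum at the origin.
f = lambda x: x[0] + x[1]
samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (0.3, 0.4)]

errors = abs_errors(samples, f, f_star=0.0)
mean_error = sum(errors) / len(errors)
var_error = value_at_risk(errors, q=0.75)
mean_dist = mean_distance_to_set(samples, [(0.0, 0.0)])
```

The mean error summarizes typical performance, the quantile captures how bad the worse samples are, and the distance measure is insensitive to how quickly the objective grows away from the optimal set.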
4.2 Radiation therapy treatment planning
In this section, we apply our methodology to the problem of generating RT treatment plans for prostate cancer. Given the computed tomography (CT) images and treatment specifications for a patient [Breedveld and Heijmen, 2017], we used the IPMAN algorithm to generate an optimal dose distribution . The objective penalizes excess dose to healthy tissue and insufficient dose to the target structure. Formally, the optimization problem is
where is the excess dose penalty and is the prescribed dose to the target. Constraints (1) and (2) are non-convex Value-at-Risk constraints whereas (3) and (4) can be modeled as linear constraints.
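To see why Value-at-Risk constraints on dose are non-convex while maximum-dose constraints are not, consider the following sketch. The constraint forms (a minimum coverage fraction of voxels at or above a dose level, and a maximum dose), the function names, and the four-voxel plans are hypothetical simplifications and do not reproduce the clinical criteria used in problem (5).

```python
def satisfies_coverage(doses, min_dose, coverage):
    """True if at least a fraction `coverage` of voxels receive at least `min_dose`."""
    covered = sum(1 for d in doses if d >= min_dose)
    return covered >= coverage * len(doses)

def satisfies_max(doses, max_dose):
    """True if no voxel exceeds `max_dose`."""
    return max(doses) <= max_dose

def average_plan(plan_a, plan_b):
    """Voxel-wise midpoint of two dose vectors."""
    return [(a + b) / 2.0 for a, b in zip(plan_a, plan_b)]

# Two hypothetical plans that each cover half of the voxels with 10 units of dose.
plan_a = [10.0, 10.0, 0.0, 0.0]
plan_b = [0.0, 0.0, 10.0, 10.0]
midpoint = average_plan(plan_a, plan_b)

print(satisfies_coverage(plan_a, 10.0, 0.5))    # True
print(satisfies_coverage(plan_b, 10.0, 0.5))    # True
print(satisfies_coverage(midpoint, 10.0, 0.5))  # False: the feasible set is not convex
print(satisfies_max(midpoint, 10.0))            # True: the max constraint survives averaging
```

Both plans satisfy the coverage requirement, yet their midpoint does not, so the set of plans satisfying a Value-at-Risk style constraint is not convex. The maximum-dose constraint, in contrast, is preserved under averaging.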
We used two methods to generate the training data. To generate feasible solutions in , we solved convex approximations of problem (5). To better train the discriminator in later iterations, we also generated infeasible solutions. This constituted the training samples used in stage 1 of IPMAN.
We implemented a 3D variant of the Style Transfer GAN [Isola et al., 2017] by modifying the U-net architecture for voxels [Wu et al., 2016]. The network learns a mapping from 3D CT images to 3D dose distributions. In stage 1, we trained for iterations using the same optimizer settings as in Isola et al. [2017]. In later iterations, we periodically replaced generator samples with samples from the infeasible dataset for the discriminator. In stage 2, we froze the discriminator and re-trained the generator for the objective function in (5) plus a barrier. We set , and trained for iterations. As we optimized for the same patient, we used the same CT images with added noise.
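Table 2: Constraint satisfaction (✓ satisfied, ✗ violated) and average objective function value for each set of plans.

| Constraint | Criterion | DELIVERABLE | STAGE-ONE | IPMAN |
|---|---|---|---|---|
| (1) | VaR of dose to tumor | ✓ | ✓ | ✓ |
| (2) | VaR of dose to tumor | ✓ | ✓ | ✓ |
| (3) | Max of dose to urethra | ✓ | ✗ | ✓ |
| (4) | Max of dose to bladder | ✓ | ✗ | ✓ |
| | Average objective function value | 1.00 | 3.53 | 0.12 |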
In stage 1, we focused on ensuring that the GAN learned the non-convex VaR constraints at the cost of failing the convex constraints to the urethra and bladder. Using the objective function to penalize non-target coverage, we guided the GAN in stage 2 to correct the infeasibility in the urethra and bladder constraints while using the barrier function to ensure that the non-convex VaR constraints remained satisfied. After the second stage, we generated ten predictions (IPMAN) and compared them to all of the deliverable plans (DELIVERABLE) and the sample predictions after stage 1 only (STAGE-ONE). The results are presented in Table 2. Observe that the final generated dose distributions from the IPMAN algorithm satisfy all of the convex and non-convex criteria. Moreover, the objective function values significantly improve over the deliverable plans.
One drawback to the IPMAN approach is that areas of low dose, i.e., structures far away from the target, are smoothed (see Figure 2). This is because the IPMAN algorithm must learn to navigate a complex feasible region to obtain a globally optimal solution, i.e., the precise dose distribution to the target structure. This precision comes at a cost; information on dose delivery far from the target structure is lost. Nevertheless, the high dose region where IPMAN performs well is far more important to predict accurately, as this represents the intervention that treats the cancer.
We present a new methodology for solving constrained optimization problems that combines generative adversarial learning and interior point methods. Our approach extends previous IPM results to situations where the feasible set is non-convex or not even explicitly defined. Our proposed IPMAN algorithm achieves $\epsilon$-optimality with respect to the original problem. Moreover, when there are multiple global optima, our algorithm learns a distribution over the optimal set. We demonstrate the effectiveness of our framework by applying it to a problem in radiation therapy treatment optimization where the feasible set is non-convex.
This work represents the first attempt at using generative adversarial networks to learn optimality criteria for constrained optimization problems. While there are many advantages over classical methods (e.g., data-driven, addresses non-convexity) there are several new challenges. First, our methods bypass the well-known weaknesses of using GANs by requiring only that the discriminator be optimal for any generator. While this is achievable in theory, obtaining a highly accurate discriminator requires significant fine tuning. One method we found successful was to augment training with infeasible points. Second, the contribution of the barrier function to the objective must be carefully managed by determining a satisfactory value for the dual parameter. Nevertheless, the positive results on a difficult optimization problem (i.e., RT treatment optimization) suggest that generatively learning optimality is a viable approach that warrants further investigation.
Support for this research was provided by the Natural Sciences and Engineering Research Council of Canada.
- Abernethy and Hazan [2015] J. Abernethy and E. Hazan. Faster convex optimization: Simulated annealing with an efficient universal barrier. arXiv preprint arXiv:1507.02528, 2015.
- Arjovsky and Bottou [2017] M. Arjovsky and L. Bottou. Towards principled methods for training generative adversarial networks. arXiv preprint arXiv:1701.04862, 2017.
- Babier et al. [2018] A. Babier, J. J. Boutilier, M. B. Sharpe, A. L. McNiven, and T. C. Y. Chan. Inverse optimization of objective function weights for treatment planning using clinical dose-volume histograms. Phys Med Biol, 63(10):105004, May 2018. doi: 10.1088/1361-6560/aabd14.
- Benson et al. [2004] H. Y. Benson, D. F. Shanno, and R. J. Vanderbei. Interior-point methods for nonconvex nonlinear programming: jamming and numerical testing. Mathematical Programming, 99(1):35–48, 2004.
- Boyd and Vandenberghe [2004] S. Boyd and L. Vandenberghe. Convex optimization. Cambridge University Press, 2004.
- Breedveld and Heijmen [2017] S. Breedveld and B. Heijmen. Data for TROTS - the radiotherapy optimisation test set. Data Brief, 12:143–149, Jun 2017. doi: 10.1016/j.dib.2017.03.037.
- Bubeck and Eldan [2014] S. Bubeck and R. Eldan. The entropic barrier: a simple and optimal universal self-concordant barrier. arXiv preprint arXiv:1412.1587, 2014.
- Chintala et al. [2016] S. Chintala, E. Denton, M. Arjovsky, and M. Mathieu. How to train a GAN? Tips and tricks to make GANs work, 2016. URL https://github.com/soumith/ganhacks.
- Cornuejols and Trick [1998] G. Cornuejols and M. Trick. Quantitative methods for the management sciences 45-760 course notes, 1998.
- Delaney et al. [2005] G. Delaney, S. Jacob, C. Featherstone, and M. Barton. The role of radiotherapy in cancer treatment. Cancer, 104(6):1129–1137, 2005.
- Gondzio [2012] J. Gondzio. Interior point methods 25 years later. European Journal of Operational Research, 218(3):587–601, 2012.
- Goodfellow [2016] I. Goodfellow. NIPS 2016 tutorial: Generative adversarial networks. arXiv preprint arXiv:1701.00160, 2016.
- Goodfellow et al. [2014] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
- Greydanus [2017] S. Greydanus. Mnist-gan, 2017. URL https://github.com/greydanus/mnist-gan.
- Guenther et al. [2014] J. Guenther, H. K. H. Lee, and G. A. Gray. Finding and choosing among multiple optima. Applied Mathematics, 5(02):300, 2014.
- Heusel et al. [2017] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, G. Klambauer, and S. Hochreiter. GANs trained by a two time-scale update rule converge to a Nash equilibrium. arXiv preprint arXiv:1706.08500, 2017.
- Hinder and Ye [2018] O. Hinder and Y. Ye. A one-phase interior point method for nonconvex optimization. arXiv preprint arXiv:1801.03072, 2018.
- Hopfield and Tank [1985] J. J. Hopfield and D. W. Tank. “Neural” computation of decisions in optimization problems. Biological Cybernetics, 52(3):141–152, 1985.
- Isola et al. [2017] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. arXiv preprint, 2017.
- Karimi et al. [2017] M. Karimi, S. Luo, and L. Tunçel. Primal-dual entropy-based interior-point algorithms for linear optimization. RAIRO-Operations Research, 51(2):299–328, 2017.
- Kingma and Ba [2014] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Li et al. [2017] J. Li, A. Madry, J. Peebles, and L. Schmidt. Towards understanding the dynamics of generative adversarial networks. arXiv preprint arXiv:1706.09884, 2017.
- McIntosh and Purdie [2017] C. McIntosh and T. G. Purdie. Voxel-based dose prediction with multi-patient atlas selection for automated radiotherapy treatment planning. Phys Med Biol, 62(2):415–431, Jan 2017. doi: 10.1088/1361-6560/62/2/415.
- Mescheder et al. [2017] L. Mescheder, S. Nowozin, and A. Geiger. The numerics of GANs. In Advances in Neural Information Processing Systems, pages 1823–1833, 2017.
- Nagarajan and Kolter [2017] V. Nagarajan and J. Z. Kolter. Gradient descent GAN optimization is locally stable. In Advances in Neural Information Processing Systems, pages 5591–5600, 2017.
- Nemirovski [2004] A. Nemirovski. Interior point polynomial time methods in convex programming. Lecture notes, 2004.
- Nemirovski and Nesterov [1994] A. Nemirovski and Y. Nesterov. Interior-point polynomial methods in convex programming. Society for Industrial and Applied Mathematics, 1994.
- Taigman et al. [2016] Y. Taigman, A. Polyak, and L. Wolf. Unsupervised cross-domain image generation. arXiv preprint arXiv:1611.02200, 2016.
- Tantawy [2007] S. F. Tantawy. Using feasible direction to find all alternative extreme optimal points for linear programming problem. Journal of Mathematics and Statistics, 3(3):109–111, 2007.
- Vanderbei and Shanno [1999] R. J. Vanderbei and D. F. Shanno. An interior-point algorithm for nonconvex nonlinear programming. Computational Optimization and Applications, 13(1-3):231–252, 1999.
- Wu et al. [2016] J. Wu, C. Zhang, T. Xue, B. Freeman, and J. Tenenbaum. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In Advances in Neural Information Processing Systems, pages 82–90, 2016.
- Yang [2010] X.-S. Yang. Test problems in optimization. arXiv preprint arXiv:1008.0549, 2010.