A feasibility study of a hyperparameter tuning approach to automated inverse planning in radiotherapy

05/14/2021 ∙ by Kelsey Maass, et al. ∙ University of Washington 10

Radiotherapy inverse planning requires treatment planners to modify multiple parameters in the objective function to produce clinically acceptable plans. Due to manual steps in this process, plan quality can vary widely depending on planning time available and planner's skills. The purpose of this study is to automate the inverse planning process to reduce active planning time while maintaining plan quality. We propose a hyperparameter tuning approach for automated inverse planning, where a treatment plan utility is maximized with respect to the limit dose parameters and weights of each organ-at-risk (OAR) objective. Using 6 patient cases, we investigated the impact of the choice of dose parameters, random and Bayesian search methods, and utility function form on planning time and plan quality. For given parameters, the plan was optimized in RayStation, using the scripting interface to obtain the dose distributions deliverable. We normalized all plans to have the same target coverage and compared the OAR dose metrics in the automatically generated plans with those in the manually generated clinical plans. Using 100 samples was found to produce satisfactory plan quality, and the average planning time was 2.3 hours. The OAR doses in the automatically generated plans were lower than the clinical plans by up to 76.8 they were still between 0.57 they are clinically acceptable. For a challenging case, a dimensionality reduction strategy produced a 92.9 needed to optimize over the original problem. This study demonstrates our hyperparameter tuning framework for automated inverse planning can significantly reduce the treatment planner's planning time with plan quality that is similar to or better than manually generated plans.



There are no comments yet.


page 13

page 14

page 15

page 16

page 17

page 19

page 20

page 23

Code Repositories


Automatic parameter tuning for RayStation

view repo
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

In inverse planning for radiotherapy, the treatment planner translates a patient’s clinical goals into a mathematical objective function, which is then minimized to determine the optimal treatment plan. The objective function typically contains terms corresponding to different clinical goals, such as maximum doses or dose–volume objectives, and each term is assigned a weight factor to specify the relative importance of treating or sparing particular structures. Unfortunately, the treatment plan that minimizes a given objective function may not be clinically acceptable or optimal, either because clinical goals have not been met, or because further improvement can still be made, with respect to better planning target volume (PTV) coverage or lower doses for organ-at-risk (OAR). To find a suitable plan, treatment planners often modify the objective function parameters through an inefficient trial-and-error approach, where the influence of individual parameters on the resulting treatment plan is not always intuitive and can depend on a variety of factors including model formulation, optimization algorithm, and patient anatomy. This means that plan quality is often determined by the skills of an individual planner and the time available to try different parameter configurations [bohsung2005imrt, chung2008can].

1.1 Literature survey

To improve efficiency and plan quality, many efforts have been made to either guide or automate this “manual planning” step. One approach is to explore the relationships between the objective function parameters and the resulting dose distributions by means of an effect–volume histogram (EVH) [alber2002toolsI] or an analytical sensitivity analysis [alber2002toolsII, krause2008role, sobotta2008tools]. In the first case, the EVH can identify organ subvolumes where weight adjustment can have the greatest impact, detect conflicting objectives, and help determine if goals can be met by either changing weights or including additional objective terms. In the second case, a sensitivity analysis of the objective and constraint parameters can quantify the influence of each parameter with respect to the objective function, which can then be used to predict changes to the resulting plan without re-optimization. This requires knowledge of the specific form of the objective and its Lagrangian function, which may not be available for all treatment planning systems. In this case, an empirical sensitivity analysis can be performed by computing treatment plans for randomly sampled objective function parameters [lu2007reduced]

. This sensitivity information can then be used to classify parameters into three sets: a fixed set (often normalized to meet clinical specifications), a sensitive set, and an insensitive set, which allows planners to reduce the dimensionality of the problem by restricting their search to the sensitive set.

Another approach, often referred to as Knowledge-based Planning (KBP), uses a library of previously optimized plans to predict the dose–volume histograms (DVHs), dose objectives, and weight factors for a new patient. For example, the dose distributions and DVHs from reference plans can be used to guide the optimization of new plans via the constraint that OAR doses cannot be worse than the reference [fredriksson2012automated]. In addition, correlations between the spatial relationships amongst OARs and PTVs, such as what is captured in the overlap–volume histogram (OVH), and the dose delivered in prior plans can be leveraged to develop models to predict expected OAR DVHs from patient geometry. This can in turn be used to initialize planning goals for new patients [wu2011data] or to develop quality control tools that can automatically detect suboptimal plans [appenzoller2012predicting]. Features of patient anatomy can also be incorporated into models such as

-nearest neighbors or multinomial logistic regression to predict objective function weights to warm start the manual planning process

[boutilier2015models]. While these tools can reduce planning time by identifying a good starting point for a particular patient, or by indicating when an acceptable plan has been found, they still require the planner to manually tune plans by trial and error. Furthermore, they need access to previously optimized plans with similar patient geometry and objectives, which may not always be available, and the resulting plan quality can be dependent upon the quality of the training set [landers2018fully].

In an effort to automate the search for objective functions that result in clinically acceptable plans, it is helpful to view objective term weights as tunable hyperparameters, taking advantage of recent advances in hyperparameter optimization from the machine learning (ML) community. In the ML setting, hyperparameters can include values related to the structure of a learning model (e.g., regularization weights, number of features, or the architecture of a neural network) or the optimization approach used to train the model (e.g., learning rate, mini-batch size, or number of epochs). This approach seeks to minimize a loss function with respect to its hyperparameters, which can be challenging due the the fact that the function is typically nonconvex, expensive to evaluate (e.g., it may involve training an instance of a neural network), and does not have a closed-form expression or access to its derivatives


. In the same way that treatment plan quality often depends on the experience of the planner, the manual tuning of ML hyperparameters can be considered a “black art” requiring expert intuition and heuristics

[bergstra2012random, snoek2012practical].

To improve efficiency, results, and reproducibility, ML researchers have developed a variety of approaches for automated hyperparameter optimization. For a small number of hyperparameters, a naive grid search can be effective, but it suffers the curse of dimensionality and is thus infeasible for most problems

[bergstra2012random, hazan2017hyperparameter]. In some cases, when certain continuity and differentiability conditions are satisfied, gradients with respect to continuous hyperparameters can be calculated using back-propagation and automatic differentiation, which can allow for the optimization of a large number of hyperparameters [bengio2000gradient, maclaurin2015gradient]. Two important classes of hyperparameter optimization methods include model-free and model-based approaches. Some examples of model-free approaches include evolutionary [hansen2001completely, li2019generalized] and random searches [bergstra2012random, li2016hyperband], which are amenable to parallel computing. In contrast, in Sequential Model-based Optimization (SMBO), the loss function is approximated by a surrogate that is cheaper to evaluate, and new hyperparameters are chosen by optimizing an acquisition function based on the surrogate. Examples include Bayesian optimization using Gaussian processes [snoek2012practical, bergstra2011algorithms]

or random forests

[hutter2011sequential, thornton2013auto], which we discuss in more detail in Section 3.

Using a treatment plan utility function or other clinical criteria to quantitatively compare different treatment plans, the hyperparameter optimization approach for radiotherapy treatment planning iteratively selects new parameters to improve plan quality using various search methods. For example, [barbiere2002parameter]

applied a combination of grid search and linear interpolation to find parameter values that improved PTV coverage while satisfying OAR constraints,

[lu2007reduced] developed a recursive random search method to optimize a plan score function based on clinical objectives, [wang2017development] used a library of optimized plans with similar patient geometry to guide weight selection and adjustment to improve the plan DVH, [landers2018fully] used Bayesian optimization to update weights to meet or improve upon dose objectives predicted using KBP, and [shen2019intelligent]

used deep reinforcement learning to automate the trial-and-error process of tuning weights to lower OAR doses. Other aspects of the treatment planning workflow have also been explored, including a combination of KBP for personalized patient objective selection and an automated search for multileaf collimator (MLC) jaw configurations to lower OAR doses


While many of these studies have focused on automated methods to tune weight factors, it is not clear that these are always the most effective parameters of the fluence map optimization (FMO) objective function to consider. For example, [hunt2003treatment, hunt2002evaluation]

found that increasing the weight of an OAR term was more likely to worsen the PTV coverage than to lower the normal-tissue dose, and that decreases to the OAR dose only occurred for severe dose constraints. Furthermore, they noted that optimization results are often more sensitive to changes in the constraint dose, which tend to contribute quadratically to the objective, as apposed to changes to the linear weight factors. Rather than increasing the OAR weights, they suggested that the best plans result from objective functions where the OAR weights are equal to or less than the PTV weights and the user-specified dose values for the PTVs and OARs are more stringent than the actual clinical goals. Additionally, equalizing the importance of PTV and OAR weight factors may also reduce the probability of finding local minima in the search space

[rowbottom2002configuration], which can be a concern for some formulations involving dose–volume objectives. Of course, the influence of weight factors and dose parameters are determined by the specific model formulation used. However, even in approaches where updating weights and dose parameters is mathematically equivalent, modifying references doses is typically more natural for planners because the weights lack an intuitive clinical interpretation [wu2003treatment].

1.2 Contribution

In this paper, we explore the feasibility of using hyperparameter optimization to improve treatment plan quality and efficiency for patients treated with stereotactic body radiation therapy (SBRT) to their peripherally located lung tumors using the RayStation treatment planning system (TPS) [bodensteiner2018raystation]. Specifically, we automatically tune objective function parameters using random and Bayesian search routines implemented in the Python package Scikit-Optimize [head2020scikit], focusing on objective function dose values rather than weight factors. We compare our automatically generated plans for six patients with the clinical plans created by medically certified dosimetrists. Using intuitive utility functions that can be personalized to reflect individual patient goals, we demonstrate that it is often possible to decrease doses to OARs below clinical goal limits, resulting in treatment plans that are as good as or better than manually-tuned plans. Our approach can be applied to any TPS with scripting capabilities, does not require access to previously optimized plans, and can improve upon manual plans by exploring a larger parameter space than what is typically considered in clinical settings. Furthermore, Gaussian process models built within the Bayesian optimization routine can provide a posteriori sensitivity information in cases where access to specific optimization function information is restricted. All project code is made openly available on GitHub [maass2021figures].

The rest of this paper is organized as follows. We introduce our model formulation in Section 2, including information on clinical goals, TPS objective functions, and treatment plan utility functions. Next, we provide details on our solution method in Section 3 with a brief overview of Bayesian optimization and Gaussian processes. In Section 4, we demonstrate the feasibility of our approach through various numerical examples on lung tumors treated with SBRT using the RayStation TPS. Finally, we conclude and discuss directions of future work in Section 5.

2 Problem Formulation

In the inverse treatment planning workflow, clinical goals are specified by the physician, often in terms of minimum or maximum doses, dose–volume objectives, average doses, or uniform doses. It is then the job of the treatment planner to translate these goals into a mathematical objective function, which is optimized by a TPS to determine beamlet intensities that deliver an acceptable dose distribution to the patient. The TPS often calculates beamlet intensities by solving an FMO problem of the form x ≥0f(x; w, θ) := ∑_i=1^k w_i f_i(x; θ_i), where constituent functions penalize beamlet configurations that violate a particular clinical goal. Positive weight factors are used to specify the relative importance of treating or sparing particular structures, while nonnegative dose parameters are chosen based on prescription values . Some constituent function types, such as those corresponding to dose–volume objectives, require additional parameters related to volume limits. Examples of constituent functions formulated with quadratic penalties are given in Table 1, where maps the beamlet intensity values to the voxel doses for the th region of interest (ROI). Depending upon the TPS, the specific penalties used for each type of constituent function may not be available to the planner.

ROI Clinical Goal Constituent Function
PTV Uniform dose
PTV Minimum dose
OAR Maximum dose
Table 1: Examples of constituent functions using quadratic penalties.

If the treatment plan that minimizes a given objective function is not clinically acceptable, the planner will update the objective by modifying weights or parameters, adding constituent functions, or even creating optimization-only ROIs (often corresponding to hot or cold spots) [hunt2003treatment]. After re-optimizing with the new objective, the planner will again evaluate the plan based on their clinical goals and experience. Due to time constraints, this iterative process often ends once an acceptable plan is identified, even if more improvements could be made. To automate this trail-and-error process for improved efficiency and treatment plan quality, we formulate it as a hyperparameter optimization problem of maximizing a treatment plan utility function with respect to the objective function parameters.

For a fixed set of constituent functions, let

denote the vector of weight factors and

the vector of dose parameters. Viewing the results of the TPS as a black-box function, we let return the solution to (2) for a given and . We define the treatment plan utility function to quantify treatment plan quality with respect to the clinical goal reference doses encoded in the vector . Our hyperparameter optimization problem can then be expressed as [2] w¿0, θ≥0g[x(w,θ); γ] x(w,θ) ∈arg min_x≥0f(x; w, θ). Depending on the patient, the planner may choose to optimize over only a subset of the weight or dose vectors. For example, treatment plans are often normalized to meet specific prescriptions, such as the PTV D95 (the dose that 95% of the PTV volume receives), or the planner might know from experience that solutions are relatively insensitive to changes in a given variable. In this case, the dose parameters in these constituent functions could be set to their corresponding goal values .

3 Algorithmic Approach

The optimization problem given in (1) is challenging for a number of reasons, many of them due to the fact that the utility function depends on the results of the TPS. First, there is no closed-form expression for or its derivatives with respect to or . This means that there is no a priori information about the properties of the function (e.g., continuity or convexity) or the influence of the decision variables. Furthermore, each evaluation of requires the TPS to solve an instance of (2), which can be expensive. Depending upon the difficulty of the case and the settings of the TPS solver (e.g., stopping tolerance or maximum number of iterations), calculating the beamlet intensities typically takes 1–10 minutes. While this is nothing compared to the time it takes to train a deep neural network, the treatment planner has a fixed time budget and multiple cases to consider. Finally, the values sampled from the utility may be noisy. This uncertainty can arise from various sources, such as the TPS solver failing to converge to a solution before reaching the maximum number of iterations, or from the dose calculation algorithm used to approximate the resulting dose distribution [chen2014impact].

These challenges make (1) a good candidate for a Bayesian optimization approach, where the search for good parameters is guided by a surrogate model that is sequentially queried and refined, as described in Algorithm 1

. Specifically, starting with a prior over our utility landscape, we iteratively sample from the utility function and refine our surrogate using a Bayesian posterior update. To determine which points to sample (i.e., which parameters to use next in the TPS objective function), we optimize an acquisition function that quantifies the benefit of sampling from different locations. There are many choices for acquisition functions, such as Thompson sampling, probability of improvement, expected improvement, upper confidence bounds, and entropy search

[shahriari2015taking]. In essence, these functions are designed so that their optima reflect the dual objectives of exploration (sampling where the surrogate’s uncertainty is high) and exploitation (sampling where the predicted utility is high). Through this approach, we can simultaneously learn about the sensitivity of treatment plan quality with respect to the objective function parameters via the posterior surrogate model, while also identifying parameters that result in clinically superior treatment plans. For a more detailed overview of Bayesian optimization and its applications, see [shahriari2015taking, agnihotri2020exploring, jones1998efficient, mockus1989bayesian].

Specify surrogate model, acquisition function, and number of iterations .
for  do
     Optimize acquisition function to determine .
     Sample by solving (2) with TPS.
     Update surrogate model using data .
end for
Algorithm 1 Bayesian optimization for in (1).

In our implementation, we approximate the utility function using the nonparametric Gaussian process surrogate model. A Gaussian process, defined in terms of mean and covariance functions, is a generalization of a multivariate Gaussian probability distribution, which is specified in terms of a mean vector and covariance matrix. While a Gaussian distribution models random scalars or vectors, a Gaussian process describes a distribution over functions. For the treatment planning problem, our Gaussian process is a collection of random treatment plan utility variables

, any finite subset of which are jointly Gaussian distributed. In particular, for any input

, there is an associated random variable

with mean

and standard deviation

. For simplicity, the mean function is initially assumed to be zero, and the covariance is defined by a covariance function or kernel

, which specifies assumptions about the similarity between nearby points. Popular kernel choices include radial basis functions, Matérn covariance functions, rational–quadratic kernels, exp–sine–squared kernels, and dot product kernels

[scikit-learn]. For more details about Gaussian processes and kernel functions, see [gortler2019a, osborne2009gaussian, rasmussen2006gaussian].

4 Numerical Simulation

In this section, we demonstrate the feasibility of using hyperparameter optimization to automate the search for objective function parameters for SBRT with the RayStation TPS. Using Bayesian optimization routines from Scikit-Optimize, we automatically tune objective function dose values to calculate beamlet intensities for six peripherally located lung tumor cases. For all cases, the objective function in (2) consists of eleven constituent functions corresponding to the clinical goals given in Table 2, chosen according to RTOG 0915 [videtic2015nrg], with tunable weights and dose parameters . The MaxDVH type has the goal that the specified volume of an ROI does not exceed in dose. The D2cm is defined as a region that is at least 2 cm away from the tumor, and therefore the maximum dose in the D2cm indicates how quickly the dose falls off outside the PTV. While we leave the volume values of the dose–volume objectives fixed in this simulation, these could also be tuned using our approach. We treat the PTV D95 dose level as a fixed parameter, with all treatment plans normalized so that exactly 95% of the PTV receives 48 Gy.

ROI Type Dose () Volume
1 Chestwall MaxDVH 30 Gy 30 cm
2 D2cm MaxDose 24-30 Gy
3 Esophagus MaxDVH 18.8 Gy 5 cm
4 Lungs MaxDVH 11.6 Gy 1500 cm
5 Lungs MaxDVH 12.4 Gy 1000 cm
6 PTV MinDVH 48 Gy 95%
7 PTV MaxDose 80 Gy
8 Ribs MaxDVH 32 Gy 1 cm
9 Ribs MaxDose 40 Gy
10 Spinal Cord MaxDVH 13.6 Gy 1.2 cm
11 Spinal Cord MaxDose 26 Gy
Table 2: Clinical goals used in numerical examples.

Our treatment plan utility function in (1) is defined as a linear combination of terms corresponding to each clinical goal,


though the term corresponding to the PTV D95 could be omitted due to normalization. We use a combination of linear and linear–quadratic utility terms to reward plans that decrease doses below (or increase doses above) the clinical goal limits. Our linear utility terms can be interpreted as percent decrease (or increase) relative to the reference dose values. For example, we use linear terms


for ROIs with maximum-dose objectives, depicted as the solid blue line in Figure 1, where calculates the maximum dose delivered to the th ROI. While the linear utility terms have an intuitive interpretation, they do not distinguish between doses that meet a clinical goal from doses that do not. Specifically, a decrease to doses above is rewarded the same as an equal decrease to doses below (i.e., the slope of the utility is the same). For clinical goals with higher priority, we can instead use a linear–quadratic utility,


depicted as the dashed orange line in Figure 1, which prioritizes meeting these goals over decreasing the doses to ROIs that have already met their respective goals.

[scale=0.7]figures/fig0.png 2323.52424.52550-5-10-15-20Linear utilityLinear–quadratic utilityD2cm MaxDose (Gy)

Figure 1: Linear and linear–quadratic utility terms corresponding to a maximum-dose clinical goal of 24 Gy for the D2cm, an optimization-only ROI that includes all voxels within 2 cm of the PTV.

Our examples are implemented with RayStation 8B, which provides scripting capabilities in CPython 3.6.5. A call to our utility function updates the RayStation objective function weight factors and dose parameters, runs RayStation’s optimization routine, retrieves the resulting ROI doses, and returns the calculated utility value. Our hyperparameter optimization problem is solved using the SMBO package Scikit-Optimize, which is built upon Python packages NumPy, SciPy, and Scikit-Learn [scikit-learn, harris2020array, 2020SciPy-NMeth]. We compare the results of uniform random sampling to Bayesian optimization based on Gaussian processes using the functions dummy_minimize and gp_minimize

, respectively, though we note that Scikit-Optimize also provides methods that utilize decision trees (e.g., random forest and extra trees regressors) and gradient-boosted trees. Except for the number of iterations and initial random points (used to initialize the surrogate model), we use the default settings for

gp_minimize. This includes a Matérn kernel with hyperparameters (e.g., length scale, covariance amplitude, and additive Gaussian noise) automatically tuned, and the acquisition function gp_hedge, which randomly selects either the lower confidence bound, expected improvement, or probability of improvement acquisition function each iteration. While the default settings yield promising results, further improvements could be made by modifying the kernel and acquisition function settings.

We first explore the relative sensitivity of ROI doses with respect to the dose parameters and weight factors in Section 4.1, which motivates our choice to tune the dose values exclusively. Next, we illustrate the behavior of random and Bayesian search methods with reference to grid search results on a small 2-D problem in Section 4.2. The full 10-D problem is implemented in Section 4.3, with comparisons to clinical treatment plans. Finally, we demonstrate the potential benefits of dimensionality reduction in Section 4.4.

4.1 Dose sensitivity with respect to weight factors and dose parameters

For our first example, we demonstrate the relative sensitivity of ROI doses with respect to weight factors and dose parameters. This is done by first fixing all weights and dose parameters . Next, we compute treatment plans by separately varying the weight factor and dose parameter for the D2cm. In general we would expect the maximum dose to the D2cm to decrease as we increase and decrease ; however, this does not appear to be the case. In Figure 2, we plot the results of varying the D2cm weight (orange dots) and dose parameter (blue dots) for the first five patients. The maximum allowed dose , denoted by the green dotted line, is chosen for each patient based on the PTV volume.

[scale=0.5]figures/fig1a.png 04.89.614.419.22424262830(1)D2cm Weight ()[scale=0.5]figures/fig1b.png 05.210.415.620.8262224262830(2)D2cm Weight ()
[scale=0.5]figures/fig1c.png 04.89.614.419.2242223242526272829(3)[scale=0.5]figures/fig1d.png 0612182430242628303234(4)D2cm Parameter ()
[scale=0.5]figures/fig1e.png 04.89.614.419.2242426283032(5)D2cm Parameter ()

Figure 2: D2cm maximum-dose sensitivity with respect to objective function weight factors and dose parameters for the first five cases. None of the plans computed by varying (orange dots) satisfy the maximum-dose clinical goal for the D2cm, while varying (blue dots) leads to treatment plans that both meet the goal and decrease the dose further below (green dotted lines).

As we increase , the resulting D2cm maximum doses behave differently for each of the five patients. For example, the dose in case 1 is generally lower for higher values of , while the dose in case 2 doesn’t seem to follow any particular pattern. Notably, none of the plans computed by varying satisfy the maximum-dose clinical goal for the D2cm. Alternatively, as we decrease the maximum dose to the D2cm initially decreases, then begins to increase as the objective becomes too difficult for the TPS. This highlights the importance of identifying a good parameter range, because unrealistic parameters can lead to worse treatment plans. In all cases, varying leads to treatment plans that meet the maximum-dose clinical goal for the D2cm, and decreases the dose further below for cases 2–5. Based on the results of this section, and in order to keep our problem dimension low, we fix all weight factors at and tune only the dose parameters for the remainder of our examples.

4.2 Comparison of search methods and utility functions

While an exhaustive grid search is simple to implement, it is infeasible for most practical problems because the number of sample points increases exponentially with the number of hyperparameters [bergstra2012random, hazan2017hyperparameter]. In order to compare our Bayesian search method and uniform random sampling with a “ground truth” grid search, in this section we consider a toy problem with only two tunable parameters. Using our first lung case, we again vary the D2cm maximum-dose parameter , and we add an additional dimension by varying the ribs maximum-dose parameter . All weights are fixed at , and all remaining dose parameters are set to their limit values .

[scale=0.5]figures/fig2a.png 01020304005101520-40-2002040Chestwall MaxDVH(% Difference From )[scale=0.5]figures/fig2b.png 010203040-1000100D2cm MaxDose(% Difference From )[scale=0.5]figures/fig2c.png 010203040-50050Esophagus MaxDVH(% Difference From )
[scale=0.5]figures/fig2d.png 01020304005101520-50050Lungs MaxDVH(% Difference From )[scale=0.5]figures/fig2e.png 010203040-50-2502550Lungs MaxDVH(% Difference From )[scale=0.5]figures/fig2f.png 010203040-100-50050100PTV MaxDose(% Difference From )
[scale=0.5]figures/fig2g.png 01020304005101520-40-2002040Ribs MaxDVH(% Difference From )[scale=0.5]figures/fig2h.png 010203040-40-2002040Ribs Parameter ()Ribs MaxDose(% Difference From )[scale=0.5]figures/fig2i.png 010203040-40-2002040Ribs Parameter ()Spinal Cord MaxDVH(% Difference From )
[scale=0.5]figures/fig2j.png 01020304005101520-50-2502550Ribs Parameter ()Spinal Cord MaxDose(% Difference From )

Figure 3: ROI doses corresponding to clinical goals computed by varying D2cm and ribs maximum-dose parameters and . While the chestwall, esophagus, lungs, and spinal cord meet their respective clinical goals, the D2cm, PTV, and ribs have regions where their maximum-dose clinical goals are not met.

For our 2-D grid search, we calculate treatment plans for parameter pairs over the range with 1 Gy increments. In Figure 3 we see the resulting dose values for each clinical goal, excluding the PTV D95, plotted in terms of percent difference from . The clinical goals for the chestwall, esophagus, lungs, and spinal cord are met in all treatment plans , with doses far below in most cases. In contrast, the D2cm, PTV, and ribs are more affected by changes to and , with regions where the clinical goals are not met. The trade-offs between meeting these different ROI goals will presumably impact on the shape of the utility function.

4.2.1 Linear utility functions

We first consider the problem of maximizing the linear utility function using random sampling and Bayesian optimization with Gaussian processes. Each method is run for 50 iterations, where the first 10 iterations of the Bayesian routine correspond to the first 10 random samples. In the first row of Figure 4, we plot the random and Bayesian samples over the utility landscape calculated using the grid search results, along with the convergence of the two search methods. While the Bayesian approach converges to a maximum utility of 465.94 after 28 iterations, compared to the grid search optimum of 467.58, the random search takes 37 iterations to converge to a lower utility of 451. In the second row of Figure 4, we plot the mean and standard deviation of the Bayes posterior surrogate function. The mean function is qualitatively similar to the utility landscape, though naturally it has more accuracy and less uncertainty near the sampled points. Finally, in Figure 5 we include the doses delivered in the optimal treatment plans, expressed as the percent difference from . All of the clinical goals are met with the exception of the D2cm maximum dose, where the lowest and highest doses are achieved in the Bayesian and random solutions, respectively. Based upon the location of the optima and the ROI doses, it appears that the trade-off specified by the linear utility function favors lowering the dose to the ribs over meeting the maximum-dose limit for the D2cm.

[scale=0.7]figures/fig3a.png 01020304005101520100200300400Ribs Parameter ()Linear Utility ()[scale=0.52]figures/fig3b.png 01020304050340360380400420440460GridRandomBayesIteration ()
[scale=0.7]figures/fig3c.png 01020304005101520100200300400Ribs Parameter ()Surrogate Mean[scale=0.7]figures/fig3d.png 010203040051015200204060Ribs Parameter ()Surrogate Standard Deviation

Figure 4: Comparison of 2-D grid search, random sampling, and Bayesian optimization using the linear utility function. (Top) Random (purple) and Bayesian (green) samples are plotted over the utility landscape calculated with the grid search plans on the left, with convergence results on the right (optimal grid search utility in orange). (Bottom) Bayesian samples are plotted over the Bayes posterior surrogate function mean and standard deviation.

[scale=0.7]figures/fig3e.png GridRandomBayes200-20-40-60

Figure 5: Doses delivered in the plans calculated with the linear utility function using grid search, random sampling, and Bayesian optimization, expressed as percent difference from .

4.2.2 Linear–quadratic utility functions

To emphasize meeting the most important clinical goals, we use a combination of the linear and linear–quadratic utility terms. For example, we use linear–quadratic utilities for all ROIs except for the chestwall and ribs, assuming that they have lower priority in clinical practice than other ROIs. For the goals corresponding to these two ROIs, we use linear utility terms. In the first row of Figure 6, we plot the iterates of our random sampling and Bayesian optimization on top of the linear–quadratic utility landscape calculated from our grid search. By setting the lower limit of our color range to zero, we see that the region of positive utility mimics the shape of the space where the D2cm satisfies its maximum-dose clinical goal in Figure 3. This demonstrates the ability of the linear–quadratic utility to better prioritize meeting important clinical goals.

For this case, the best plan identified by the grid search has a utility of 445.13, while random sampling achieves an optimum of 442.32 after 28 iterations, and Bayesian optimization produces a maximum utility of 442.98 after 47 iterations. The second row of Figure 6 contains the mean and standard deviation functions of the Bayes posterior surrogate model. On the left, we see that the Bayesian method was able to identify a region of positive utility similar to the D2cm feasible region, which enabled it to avoid sampling bad parameter values, in contrast to the random search method. We compare the doses delivered to each ROI in Figure 7, where the maximum dose to the D2cm is reduced as compared to the plans generated with the linear utility function. In particular, the Bayesian method produces a plan that meets all of the clinical goals. We also see that while the grid search optimum has a higher utility, it is achieved through a different trade-off than that found by the other two plans. Specifically, the treatment plan identified though the grid search exchanges a higher D2cm dose for lower rib doses, which does not correspond to our clinical intentions. This illustrates the importance of the utility function design, and the difficulty of translating clinical intuition into a mathematical objective function.

[scale=0.7]figures/fig4a.png 010203040051015200100200300400Ribs Parameter ()Linear–Quadratic Utility ()[scale=0.52]figures/fig4b.png 01020304050410420430440GridRandomBayesIteration ()
[scale=0.7]figures/fig4c.png 010203040051015200100200300400Ribs Parameter ()Surrogate Mean[scale=0.7]figures/fig4d.png 01020304005101520050010001500Ribs Parameter ()Surrogate Standard Deviation

Figure 6: Comparison of 2-D grid search, random sampling, and Bayesian optimization using the linear–quadratic utility function. Lower limit of the color range is set to zero to demonstrate the shape of the positive utility region. (Top) Random (purple) and Bayesian (green) samples are plotted over the utility landscape calculated with the grid search plans on the left, with convergence results on the right (optimal grid search utility in orange). (Bottom) Bayesian samples are plotted over the Bayes posterior surrogate function mean and standard deviation.

[scale=0.7]figures/fig4e.png GridRandomBayes0-20-40-60

Figure 7: Doses delivered in the optimal plans calculated with the linear–quadratic utility function using grid search, random sampling, and Bayesian optimization, expressed as percent difference from .

4.3 Comparison with clinical plans

After demonstrating how our approach works for a small 2-D problem, we now consider the 10-D problem where each of the dose parameters are tuned over the range , with the exception of the fixed PTV D95 parameter . The lower bound of is chosen empirically based on the observation in Section 4.1 that unreasonably low parameters can lead to worse treatment plans. Because the PTV maximum-dose parameter cannot be less than the PTV D95 parameter, it is instead tuned over the range . As before, all weight factors are fixed at , and all terms in the utility function use the linear–quadratic utility, except for those corresponding to the chestwall and ribs, where we use the linear utility. For each case, we run random and Bayesian search for 100 iterations, where the first 20 iterations of the Bayesian routine correspond to the first 20 random samples. We compare the resulting optimal plans with the doses from clinical plans manually generated by a certified medical dosimetrist.

In Figure 8, we plot the convergence results along with the utility evaluated for the clinical and default plans, where all dose parameters are set to their clinical goal limits, i.e., . For most cases, the default utility is worse than the clinical utility; the utility is particularly low for the case 4 clinical plan because the spinal cord maximum dose–volume value is 33.85% above (see Figure 9), where a linear–quadratic utility term is used. After tuning the objective function dose parameters, the plans computed with Bayesian optimization achieve the highest utility, on average 3.38% higher than the plans computed with random sampling and 40.12% higher than the clinical utilities (median 12.83%). The random search method takes between 1.31–2.88 hours to complete 100 iterations, with a mean of 2.07 hours total computation time. This is on average 23.39 minutes faster than the Bayesian routine, which takes between 1.81–3.64 hours with a mean of 2.46 hours total. This is most likely due to the fact that the latter method also computes the Bayes posterior surrogate model and optimizes the acquisition function every iteration.

[scale=0.5]figures/fig5a.png -800-600-400-2000200400[scale=0.5]figures/fig5b.png 440450460470480
[scale=0.5]figures/fig5c.png -2500-2000-1500-1000-5000500[scale=0.5]figures/fig5d.png 410415420425430
[scale=0.5]figures/fig5e.png -1000-750-500-2500250500[scale=0.5]figures/fig5f.png 465470475480485490495
[scale=0.5]figures/fig5g.png -800-600-400-2000200400[scale=0.5]figures/fig5h.png 300310320330340350360370
[scale=0.5]figures/fig5i.png 020406080100-600-400-2000200400Iteration ()[scale=0.5]figures/fig5j.png 20406080100510512514516518520522ClinicalDefaultRandomBayesIteration ()

Figure 8: Convergence results for 10-D random (purple) and Bayesian (green) search methods, along with the clinical (orange) and default (blue) plan utilities. For all five cases, the Bayesian method achieves the highest utility.

Of course, a higher utility only matters if the resulting plans have improved with respect to the clinical goals. To evaluate plan quality, we include the ROI doses for each optimal plan as a percent difference from (Figure 9) and as a percent difference from the clinical values (Figure 10). For all cases, the Bayesian method meets the clinical goals for every term using the linear–quadratic utility. Furthermore, the automatically generated plans deliver up to a 76.75% lower dose than the clinical plans for all ROI goals that are violated in the latter (highlighted in red). Where the ROI doses are larger in the automatically generated plans, they are still between 0.57% above and 98.87% below the limit doses, indicating that they are clinically acceptable. In Figure 11, we plot the dose–volume histograms for each case, with dotted lines for the clinical plans, dashed lines for the plans computed through random sampling, and solid lines for the plans identified from Bayesian optimization. The most consistent trade-off made by the automatically generated plans, as compared to the clinical plans, appears to be an increased dose to the PTV in exchange for a decreased dose to the spinal cord.

[scale=0.55]figures/fig6.png ClinicalRandomBayesClinicalRandomBayesClinicalRandomBayesClinicalRandomBayesClinicalRandomBayes-80-60-40-2002040

Figure 9: Dose metrics in the optimal plans expressed as percent difference from . For all cases, the automatically generated plans meet the clinical goals for every term using the linear–quadratic utility. Goals that are not met by the clinical plans are highlighted in red.

[scale=0.55]figures/fig7.png RandomBayesRandomBayesRandomBayesRandomBayesRandomBayes-60-40-2002040

Figure 10: Dose metrics in the optimal plans expressed as percent difference from clinical plan doses. The automatically generated plans deliver up to 76.75% lower dose than the clinical plans for all clinical goals that are violated in the latter (highlighted in red).

[scale=0.5]figures/fig8a.png 010203040506070020406080100[scale=0.5]figures/fig8b.png 010203040506070020406080100
[scale=0.5]figures/fig8c.png 010203040506070020406080100[scale=0.5]figures/fig8d.png 0204060020406080100Dose (Gy)
[scale=0.5]figures/fig8e.png 010203040506070020406080100ChestwallD2cmEsophagusLungsPTVRibsSpinal CordDose (Gy)

Figure 11: Dose–volume histograms for plans computed with random and Bayesian search compared to clinical plans. Dotted, dashed, and solid lines correspond to the clinical, random sampling, and Bayesian optimization plans, respectively. The most consistent trade-off made by the automatically generated plans, as compared to the clinical plans, appears to be an increased dose to the PTV in exchange for a decreased dose to the spinal cord.

Finally, in Figure  12 we look at the distribution of optimal dose parameters , expressed as percent difference from , found for all automatically generated plans. We first observe that many of the optimal plans are achieved with parameters on the lower boundary of their search ranges, which is often much lower than the parameters values typically considered by a manual planner. This confirms the observation made in [holdsworth2012use] that more flexibility in the search space can lead to improved treatment plans. The optimal parameters with the widest ranges correspond to clinical goals that appear to be the easiest to meet across all five cases, specifically those for the esophagus and the lungs. This seems to suggest that the utility may be insensitive to changes in these parameters, though other factors, such as the correlation between these parameters, the ROI doses, and the utility values must also be considered. We also note that some optimal parameters never occur in the upper portion of their respective ranges (D2cm and spinal cord), while others never occur in the lower portion (chestwall and ribs). This may in part be due to the choice of linear vs. linear–quadratic utility terms.

[scale=0.7]figures/fig9.png -60-40-200Optimal Dose Parameters

Figure 12: Optimal dose parameters for all automatically generated plans, expressed as percent difference from .

4.4 Dimensionality considerations

In practice, Bayesian optimization methods have been shown to work efficiently for problems of moderate dimension; however, their success is often limited to no more than 10–20 parameters [wang2013bayesian, maclaurin2015gradient, kandasamy2015high]. Therefore, scaling these approaches to higher dimensions remains a challenge, and efforts to reduce problem dimension can lead to increased efficiency. A useful feature of many problem classes is low effective dimensionality: most dimensions don’t significantly affect the utility function, and many dimensions may be correlated. Low effective dimensionality is often present in the treatment planning problem, e.g. (2) and (1), and as mentioned in Section 1, there have been previous approaches to analytically and empirically identify important parameters for the treatment planning problem. In this section, we motivate future work in automatic parameter selection for our hyperparameter optimization approach by demonstrating the benefit, both in terms of efficiency and treatment plan quality, of identifying a subset of sensitive parameters and corresponding parameter ranges.

We consider a clinically challenging case, where PTV coverage in the clinical plan is compromised to meet the spinal cord clinical goal (though in our framework, the plan is normalized to meet the PTV D95 goal). To see if optimizing on a subset of the parameters can improve results, we compare plans optimized over both 10 and 4 dose parameters. Specifically, for the 10-D plan, we use the parameters from Section 4.3, and for the 4-D plan, we fix all parameters at , except for those corresponding the D2cm, PTV, and ribs maximum doses and the spinal cord maximum dose–volume objective. Furthermore, these parameters are tuned over the modified ranges of , , , and . These parameters and ranges are chosen empirically, based on correlations between ROI doses and parameter configurations in the results of the 10-D problem.

In Figure 13, we compare the convergence results of using the 4-D subset of our parameter space against the results using the 10-D parameter space. While the 10-D example achieved a maximum utility of 85.9 at iteration 91, the 4-D version finds a treatment plan with a higher utility of 165.68 after only 35 iterations. This improvement in utility corresponds to decreased doses in many of the ROIs, as seen in Figure 14. This example demonstrates the potential benefit of choosing a good set of parameters to improve the efficiency of the Bayesian optimization routine. While in this case we manually chose our parameter subset, methods to automatically discover sensitive parameters could further improve results.

[scale=0.5]figures/fig10a.png 020406080100-25000-20000-15000-10000-50000ClinicalDefault10-D Bayes4-D BayesIteration ()[scale=0.5]figures/fig10b.png 020406080100-3000-2000-10000Iteration ()

Figure 13: Convergence results for Bayesian search methods using a different choice and range of parameters, along with the clinical (orange) and default (blue) plan utilities. Results using a 4-D subset of the parameters (green) are compared to those using the full 10-D search space (purple).

[scale=0.7]figures/fig11.png Clinical10-D Bayes4-D Bayes50250-25-50-75

Figure 14: Doses delivered in the optimal plans expressed as percent difference from . The middle row contains doses from the 10-D Bayesian optimal plan, while the bottom row corresponds to the result computed using a 4-D subset of the parameters.

5 Conclusion and Future Work

The manual, trial-and-error approach prevalent in the radiotherapy inverse planning process to find optimal objective function parameters is inefficient, and the plan quality largely depends on the treatment planner’s skills and planning time allowed. In a busy clinic setting, this could potentially lead to suboptimal treatment plans. In this paper, we proposed a hyperparameter optimization framework to automate the search for optimal objective function parameters, which can help maintain plan quality and reduce treatment planning time. We demonstrated the feasibility and potential benefit of our model through numerical examples for peripherally located lung tumors treated with SBRT using the RayStation TPS. Specifically, we automatically generated treatment plans for six cases using both random sampling and Bayesian optimization with Gaussian processes to optimize linear and linear–quadratic utility functions. The average treatment planning time (no planner intervention required) was 2.3 hours, and plan quality improved for most cases. Our approach can be easily integrated with any commercial TPS with a scripting interface and does not require training a model on a library of previously optimized plans as in the KBP approach.

There are many potentials directions to expand upon the work in this paper. For example, developing methods to automatically identify sensitive parameters and effective parameter ranges could guide dimensionality reduction to improve efficiency and utility, as discussed in Section 4.4. When a library of previously optimized plans is available, our approach could also be used in conjunction with KBP methods. Other aspects of our implementation, including utility function design and the choice of the surrogate model’s kernel and acquisition functions, could also enhance performance. Additionally, further advances in ML hyperparameter optimization can continue to motivate future work, including techniques such as early stopping [li2016hyperband], parallel implementations for Bayesian optimization [snoek2012practical], taking advantage of utility functions with additive structure [kandasamy2015high], or collaborative approaches that can tune multiple similar problems simultaneously [bardenet2013collaborative].

Availability of Materials

The figures in this article, as well as the Python scripts necessary to reproduce them, are available openly under the CC-BY license [maass2021figures]. All examples are computed using the RayStation TPS [bodensteiner2018raystation] and the Python package Scikit-Optimize [head2020scikit].