I Introduction
There has been increasing interest in automated machine learning (AutoML) for improving data scientists’ productivity and reducing the cost of model building. A number of general or specialized AutoML systems have been developed [13, 30, 35, 11, 27, 21, 23], showing impressive results in creating good models with much less manual effort. Most of these systems support only a single objective, typically accuracy or error, to assess and compare models during the automation process. However, building and selecting machine learning models is inherently a multiobjective optimization problem, in which tradeoffs among accuracy, complexity, interpretability, fairness, or inference speed are desired. There is a plethora of metrics for describing model performance [10, 33], such as precision, recall, F1 score, AUC, informedness, markedness, and correlation, to name a few. In general, each measure has an inherent bias [33], and we typically expect data scientists to compare different performance measures when selecting the best models from a set of candidates. A data scientist might desire relatively accurate models with minimal memory footprints and/or faster inference speed. Alternatively, a data scientist might have business constraints that are difficult to incorporate into the machine learning model training algorithm itself. There could also be a number of segments inherent within the data where it is important to have comparable accuracy across all segments. When toggling between different performance measures and goals, the data scientist is really executing a manual multiobjective optimization. Arguably, they are mentally constructing a Pareto front and choosing the model that achieves the best compromise for their use case and criteria.
It is considered fruitless to search for a single measure that perfectly captures the multiple dimensions of interest in machine learning, as shown by Zitzler et al. [42] and paraphrased here:
Theorem 1.
In general, solution quality for the $m$-objective optimization problem cannot be reduced to fewer than $m$ performance measures.
To emphasize this observation, we include a hypothetical example. Consider the Matthews correlation coefficient (MCC) [40], which is considered a good metric for quantifying performance on binary classification problems even when the data are unbalanced:
$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}.$$
Now suppose we were to apply single objective optimization and discover two models (model A and model B) with performances shown in Table I.
Model  TP  FP  FN  TN  ACC  MCC  FPR

model A  900  500  100  8500  94.0%  0.73  5.6% 
model B  350  100  650  8900  92.5%  0.49  1.1% 
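As a check on Table I, the short Python sketch below recomputes accuracy, MCC, and FPR from the confusion-matrix counts using the standard MCC definition. The function name `metrics` is ours, and returning 0 when a marginal total is zero is a common convention, not part of the definition.

```python
import math


def metrics(tp, fp, fn, tn):
    """Return (accuracy, MCC, false positive rate) for one confusion matrix."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumption): MCC is reported as 0 when any marginal total is 0
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    fpr = fp / (fp + tn)
    return acc, mcc, fpr


# Confusion-matrix counts (TP, FP, FN, TN) from Table I
models = {"model A": (900, 500, 100, 8500), "model B": (350, 100, 650, 8900)}
for name, counts in models.items():
    acc, mcc, fpr = metrics(*counts)
    print(f"{name}: ACC={acc:.1%} MCC={mcc:.2f} FPR={fpr:.1%}")
# model A: ACC=94.0% MCC=0.73 FPR=5.6%
# model B: ACC=92.5% MCC=0.49 FPR=1.1%
```

Model A wins on both ACC and MCC, yet has five times the FPR of model B, which is exactly the tension a single objective hides.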
With MCC as the single objective to be maximized, an optimization algorithm would discard model B in favor of model A. However, which model is better depends entirely on context. For instance, in a credit card fraud setting, we might also want to reduce the false positive rate (FPR) because false positives are very costly [2]. We would then prefer to search around model B, attempting to improve MCC while maintaining FPR. With unconstrained single objective optimization, however, this preference is difficult to enforce during the optimization process.
One approach to addressing this problem is to aggregate multiple objectives into a single objective, usually through a linear weighting of the objectives. The main disadvantage of this approach is that many separate optimizations with different weighting factors must be performed to examine the tradeoffs among the objectives. Another popular approach is multiobjective optimization [24, 41], which generates a diverse set of Pareto-optimal models representing different tradeoffs among various performance metrics and goals. However, a potential drawback of pure multiobjective optimization is that the corresponding algorithms are designed to determine the entire Pareto front when, in practice, only part of the front may be of interest. For example, if false negative rate and false positive rate are considered together, the trivial models that always predict negative or always predict positive could be part of the Pareto front. It would be a waste of computational resources to train models that refine such regions of the Pareto front. Moreover, not all measures for assessing models can be easily formulated as objectives. Therefore, it can be very beneficial to guide model search to the desired area by using constraints.
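To make the weighting issue concrete, here is a toy sketch (our own illustration, not part of Autotune) that scalarizes the two Table I models with the linear weighting $w(1-\mathrm{MCC}) + (1-w)\,\mathrm{FPR}$, treating both terms as objectives to minimize. The weights tried are arbitrary.

```python
# Objectives to minimize for the two models of Table I: (1 - MCC, FPR).
# MCC values are the rounded figures from the table; FPR = FP / (FP + TN).
objectives = {
    "model A": (1 - 0.73, 500 / 9000),
    "model B": (1 - 0.49, 100 / 9000),
}


def best_by_weighted_sum(w):
    """Return the model minimizing the scalarized objective w*(1 - MCC) + (1 - w)*FPR."""
    return min(objectives, key=lambda m: w * objectives[m][0] + (1 - w) * objectives[m][1])


for w in (0.05, 0.1, 0.2, 0.5, 0.9):
    print(f"w = {w}: {best_by_weighted_sum(w)}")
# w = 0.05: model B
# w = 0.1: model B
# w = 0.2: model A
# w = 0.5: model A
# w = 0.9: model A
```

The preferred model flips near $w \approx 0.16$, a value that depends on the objectives' scales and is hard to guess in advance. In a real tuning run, each trial weight would be a separate optimization.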
In this work, we provide a constrained multiobjective optimization framework for automated machine learning. This framework is built on a suite of derivative-free search methods and supports multiple objectives and linear or nonlinear constraints. While the default search method works well in most settings, the hybrid framework is extensible, so other desirable search methods can be incorporated easily, with computing resources shared in a way that mitigates, and even exploits, inherent load imbalance. Moreover, redundant evaluations are intercepted and handled seamlessly, preventing similar algorithms within the hybrid strategy from performing redundant work. The approach works well on standard benchmark problems and shows promising results on real-world applications. Our main contributions in this work are:

To the best of our knowledge, this is the first general extensible constrained multiobjective optimization framework specifically designed for automated machine learning.

The Autotune framework embraces the no-free-lunch theorem in that new and diverse search algorithms fit well in the existing framework and may be added in a collaborative rather than competitive manner, permitting resource sharing and making completed evaluations available to all internal solvers that are capable of using them.

By supporting general constraints, we can help users focus on specific segments of the Pareto front and avoid spending computational time on models that are of little interest to them. Further, in certain cases the multiobjective problem is really a nonlinearly constrained problem in disguise; for example, one might wish only to optimize specificity and sensitivity while ensuring that overall accuracy does not degrade beyond a given threshold. The Autotune framework offers this flexibility.
II Related Work
claims that machine learning is inherently a multiobjective task and provides a compilation of various multiobjective applications including feature extraction, accuracy, interpretability, and ensemble generation. He et al. [22] use reinforcement learning to balance the tradeoff between accuracy and compression of neural networks. The approach is sequential and not targeted toward the general multiobjective problem. Asgari et al. [39] apply a specialized evolutionary algorithm to optimize the parameters of an autoencoder with respect to two objectives: reconstruction error and classification error. Loeckx [28] stresses the need for multiobjective optimization in the context of machine learning applied to structural and energetic properties of models, emphasizing that such an approach provides a gateway to hierarchy and abstraction. A novel multiobjective evolutionary algorithm (ENORA) was created to search for and select the optimal feature subset in the context of a multiclass classification problem [31]. Shenfield and Rostami [37] apply an evolutionary algorithm that optimizes neural network weights, biases, and structures to simultaneously optimize both overall and individual class accuracy. In RapidMiner [34], an evolutionary framework is proposed in which the user may manually design the evolutionary algorithm using drag-and-drop features.
A significant body of multiobjective research has been proposed in the context of neural architecture search (NAS). To simultaneously optimize accuracy and inference speed, Kim et al. [26] propose a multiobjective approach in which neural architectures are encoded using integer variables and optimized using a customized evolutionary algorithm. Elsken et al. [9] develop a novel evolutionary algorithm (LEMONADE) to optimize both accuracy and several model complexity measures, including the number of parameters. They propose a Lamarckian inheritance mechanism for warm-starting child networks with parent network predictive performance. Dong et al. [8] adopt progressive search to optimize for both device-related (inference speed and memory usage) and device-agnostic objectives (accuracy and model size). DVOLVER [29], an evolutionary approach inspired by NSGA-II [6], is created to find a family of convolutional neural networks with good tradeoffs between accuracy and computational resources.
Multiobjective optimization in machine learning seems to favor evolutionary algorithms. However, many other derivative-free optimization approaches have been enhanced in ways that make them appropriate for this setting, and they have complementary properties that, if combined, may create robust and powerful hybrid approaches. The derivative-free optimization community has been successfully handling these scenarios under conditions that are arguably as complex and challenging [6, 1, 3, 4]. For instance, inspired by direct-search methods, Custódio et al. [5] propose a novel algorithm called direct multisearch for optimization problems with multiple black-box objectives. Deb and Sundar [7] combine a preference-based strategy with an evolutionary multiobjective optimization methodology and demonstrate that a preferred set of solutions near a reference point can be found in parallel, rather than a single solution.
III Constrained Multiobjective Optimization Framework
Autotune is designed specifically to tune the hyperparameters and architectures of various machine learning model types, including decision trees, forests, gradient boosted trees, neural networks, support vector machines, factorization machines, Bayesian network classifiers, and more. The tuning process uses customizable hybrid strategies of search methods and multilevel parallelism (for both training and tuning). In this work, we focus on two key features of Autotune: multiple objectives and constraints.
The Autotune framework is shown in Figure 1. An extensible suite of search methods (also called solvers) is driven by the search manager, which controls concurrent execution of the search methods. The search methods propose candidate configurations that are stored in a dedicated pool. New search methods can easily be added to the framework. The model evaluator uses a distributed computing system to train and evaluate candidate models. The search manager supervises the entire search and evaluation process and collects the best models found. The pseudocode in Algorithm 1 provides a high-level algorithmic view of the Autotune framework.
Iiia DerivativeFree Optimization Strategy
Autotune can optimize general nonlinear functions over both continuous and integer variables. The functions do not need to be expressed in analytic closed form (i.e., black-box integration is supported), and they can be nonsmooth, discontinuous, and computationally expensive to evaluate. Problems can be single objective or multiobjective. The system is designed to run in either single-machine mode or distributed mode.
Because few assumptions are made about the objective and constraint functions, Autotune takes a parallel, hybrid, derivative-free approach similar to those used in Taddy et al. [38]; Plantenga [32]; Gray, Fowler, and Griffin [14]; and Griffin and Kolda [19]. Derivative-free methods are effective whether or not derivatives are available, provided that the number of variables is not too large (Gray and Fowler [15]). As a rule of thumb, derivative-free algorithms are rarely applied to black-box optimization problems with more than 100 variables. The term “black-box” emphasizes that the function is used only as a mapping operator, with no implicit assumption about its structure. In contrast, derivative-based algorithms commonly require the nonlinear objectives and constraints to be continuous and smooth and to have an exploitable analytic representation.
Autotune can simultaneously apply multiple instances of global and local search algorithms in parallel. This removes the need to first apply a global algorithm to find a good starting point for a local algorithm. For example, if the problem is convex, a local algorithm should be sufficient, and applying a global algorithm would create unnecessary overhead. If the problem instead has many local minima, failing to run a global search algorithm first could result in an inferior solution. Rather than attempting to guess which paradigm is best, the system performs global and local searches simultaneously while continuously sharing computational resources and function evaluations. Given a suitable number of threads and processors, the resulting run time and solution quality should be similar to having automatically selected the best global and local search combination. Moreover, because information is shared among simultaneous searches, this hybrid approach can be more robust than hybrid combinations that simply use the output of one algorithm to hot-start the second.
Autotune handles integer and categorical variables by using strategies and concepts similar to those in Griffin et al. [16]. This approach can be viewed as a genetic algorithm that includes an additional “growth” step, in which selected points from the population are allotted a small fraction of the total evaluation budget to improve their fitness score (that is, the objective function value) by using local optimization over the continuous variables.
Execution of the system is iterative; each iteration repeats the following steps:

Acquire new points from the solvers

Evaluate each of those points by calling the appropriate black-box functions (model training and validation)

Return the evaluated point values (model assessment metrics) back to the solvers
The search manager exchanges points with each solver in the list. During this exchange, the solver receives back all the points that were evaluated in the previous iteration. Based on those evaluated point values, the solver generates a new set of points it wants evaluated, and those new points are passed to the search manager to be submitted for evaluation. Solvers capable of “cheating” may also inspect evaluated points that were submitted by a different solver. As a result, search methods can learn from each other, discover new opportunities, and increase the overall robustness of the system.
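The acquire, evaluate, and return cycle can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Autotune's API. `RandomSolver`, `exchange`, and `search_manager` are hypothetical names. Every solver here is "cheating-capable", because it sees all points from the previous round. A cache intercepts duplicate proposals so no point is evaluated twice.

```python
import random


class RandomSolver:
    """Hypothetical stand-in for a real solver (GA, GSS, ...): proposes random grid points."""

    def __init__(self, dim, batch, seed):
        self.dim, self.batch = dim, batch
        self.rng = random.Random(seed)
        self.history = []  # every (point, value) pair this solver has been shown

    def exchange(self, evaluated):
        # Receive last iteration's results (including points proposed by other
        # solvers, i.e. this solver "cheats"), then propose a new batch.
        self.history.extend(evaluated)
        return [tuple(round(self.rng.random(), 1) for _ in range(self.dim))
                for _ in range(self.batch)]


def search_manager(solvers, blackbox, iterations):
    """Acquire -> evaluate -> return loop with a cache that intercepts duplicate points."""
    cache = {}       # point -> objective value
    last_round = []  # (point, value) pairs from the previous iteration
    for _ in range(iterations):
        # 1. Acquire new points from every solver, handing back last round's results
        proposals = [p for s in solvers for p in s.exchange(last_round)]
        # 2. Evaluate only points never seen before (sequentially here; a real
        #    system would dispatch these evaluations in parallel)
        for p in dict.fromkeys(proposals):
            if p not in cache:
                cache[p] = blackbox(p)
        # 3. Return this round's points and values to all solvers next iteration
        last_round = [(p, cache[p]) for p in dict.fromkeys(proposals)]
    return cache


calls = []


def sphere(p):
    calls.append(p)
    return sum(x * x for x in p)


solvers = [RandomSolver(dim=2, batch=5, seed=s) for s in (0, 1)]
results = search_manager(solvers, sphere, iterations=10)
print(len(results), "distinct points evaluated for 100 proposals")
```

Because the toy solvers propose points on a coarse grid, many proposals collide. The cache turns those collisions into free lookups, which is the kind of redundant work the framework intercepts.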
To make the best use of computing resources, Autotune supports multiple levels of parallelization that run simultaneously:

Each evaluation can use multiple threads and multiple worker nodes, and

Multiple evaluations can run concurrently
Evaluation sessions can be configured to minimize the overlap of worker nodes while still allowing resources to be shared. This design allows Autotune to make efficient use of compute grids of any size.
III-B Multiobjective Optimization Approach
When attempting to find the best machine learning model, it is very common to have several objectives. For instance, we might want to build models that maximize accuracy while also minimizing model size so that the models can be deployed on mobile devices. The desired result for such problems is usually not a single solution but rather a range of solutions that we can use to identify an acceptable compromise. Ideally, each solution represents a necessary compromise in the sense that no single objective can be improved without causing at least one remaining objective to deteriorate. The goal of Autotune in the multiobjective case is thus to provide the decision maker with a set of solutions that represent the continuum of best-case scenarios.
Mathematically, we can define multiobjective optimization in terms of dominance and Pareto optimality. For a $k$-objective minimization problem with objectives $f_1, \dots, f_k$, a point (solution) $x$ is dominated by a point $y$ if $f_i(y) \le f_i(x)$ for all $i = 1, \dots, k$ and $f_j(y) < f_j(x)$ for some $j \in \{1, \dots, k\}$.
A Pareto front contains only nondominated solutions. Figure 2 plots a Pareto front with respect to two minimization objectives, $f_1$ and $f_2$, along with a population of 10 points in the objective space. In this example, several points in the population are dominated by other population points. One point is not dominated by any other point in the population, yet it has not converged to the true Pareto front: there are points in its neighborhood with smaller values of $f_1$ and $f_2$ that have not yet been identified.
In the constrained case, a point $x$ is dominated by a point $y$ if $\theta(x) > \epsilon$ and $\theta(y) < \theta(x)$, where $\theta(x)$ denotes the maximum constraint violation at point $x$ and $\epsilon$ is the feasibility tolerance; thus, feasibility takes precedence over objective function values.
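A minimal Python sketch of these dominance tests and of filtering a population down to its nondominated points follows. The function names and the default tolerance `eps=1e-6` are our own. The text states only the rule for an infeasible $x$; the handling of the other cases (an infeasible $y$ never dominates a feasible $x$, and two feasible points fall back to ordinary dominance) is an assumption.

```python
from typing import Sequence


def dominates(fy: Sequence[float], fx: Sequence[float]) -> bool:
    """True if objective vector fy dominates fx (every objective is minimized)."""
    return (all(a <= b for a, b in zip(fy, fx))
            and any(a < b for a, b in zip(fy, fx)))


def constrained_dominates(fy, theta_y, fx, theta_x, eps=1e-6):
    """Feasibility-first dominance; theta_* is the maximum constraint violation."""
    if theta_x > eps:
        # x is infeasible: a point with a smaller maximum violation dominates it
        return theta_y < theta_x
    if theta_y > eps:
        # Assumption: an infeasible y never dominates a feasible x
        return False
    # Both feasible: ordinary Pareto dominance
    return dominates(fy, fx)


def nondominated(points):
    """Filter a list of objective vectors down to its nondominated (Pareto) subset."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


population = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
print(nondominated(population))  # [(1, 5), (2, 3), (4, 1)]
```

The quadratic-time filter is fine for populations of a few dozen points. Larger archives would typically use a sort-based nondominated sorting scheme instead.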
Unlike common multiobjective optimization approaches that rely solely on metaheuristics [9, 29, 37], the default approach employed by Autotune is a novel hybrid strategy that combines the global search emphasis of metaheuristics [12] with lesser-known, but efficient, direct local search methods [18]. The hybrid search strategy begins by creating a Latin Hypercube Sampling (LHS) of the search space. This LHS is used as the initial population for a Genetic Algorithm (GA) that searches the solution space for promising configurations. GAs enable us to attack multiobjective problems directly, evolving a set of Pareto-optimal solutions in one run of the optimization process instead of solving multiple separate problems. In addition, Autotune conducts local searches using a Generating Set Search (GSS) algorithm in neighborhoods around nondominated points to improve objective function values and reduce crowding distance. To measure convergence, Autotune uses a variation of the averaged Hausdorff distance [36] that is extended for general constraints.
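Two building blocks named here can be sketched with NumPy. The first is a Latin hypercube sample, in which each variable's range is split into as many equal strata as there are points and each stratum receives exactly one point. The second is the standard averaged Hausdorff distance $\Delta_p = \max(\mathrm{GD}_p, \mathrm{IGD}_p)$ between an approximation set and a reference set. The function names are ours. The distance shown is the plain, unconstrained definition, not Autotune's constraint-extended variant, which is not described in detail here.

```python
import numpy as np


def latin_hypercube(n, bounds, rng):
    """Draw n points; each variable's range is cut into n equal strata, one point per stratum."""
    d = len(bounds)
    lo, hi = np.asarray(bounds, dtype=float).T
    # Independent random permutation of the strata in each column, jittered within each stratum
    strata = np.column_stack([rng.permutation(n) for _ in range(d)])
    u = (strata + rng.random((n, d))) / n  # values in [0, 1)
    return lo + u * (hi - lo)


def averaged_hausdorff(approx, reference, p=1.0):
    """Standard averaged Hausdorff distance Delta_p = max(GD_p, IGD_p) between two point sets."""
    a = np.asarray(approx, dtype=float)
    r = np.asarray(reference, dtype=float)
    dist = np.linalg.norm(a[:, None, :] - r[None, :, :], axis=-1)  # |a| x |r| Euclidean distances
    gd = np.mean(dist.min(axis=1) ** p) ** (1.0 / p)   # approximation -> reference
    igd = np.mean(dist.min(axis=0) ** p) ** (1.0 / p)  # reference -> approximation
    return max(gd, igd)


rng = np.random.default_rng(0)
sample = latin_hypercube(50, [(0.0, 1.0), (100.0, 500.0)], rng)  # e.g. a population of 50
print(sample.shape)  # (50, 2)
print(averaged_hausdorff([[0.0, 0.0], [1.0, 0.0]], [[0.0, 0.0]]))  # 0.5
```

Taking the maximum of the two directions penalizes both an approximation that strays from the reference front (GD) and one that leaves parts of it uncovered (IGD).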
III-C Constraint Handling
In real-world use cases, it is common to encounter constraints that impose limits on the predictive models being used. Consider, for example, the Internet of Things (IoT). In the IoT setting, model size and inference speed are very important factors because models are typically deployed to edge computing devices. If a model requires too much memory for storage or is very slow to score, then it is not a good fit for edge computing. For mobile devices, models that need many computations during inference will consume too much power and should be avoided. In such cases, adding constraints when picking a model can be very effective: the constraints focus the search on the parts of the solution space that satisfy the business needs.
Autotune uses different strategies to handle different types of constraints. Linear constraints are handled by using both linear programming and strategies similar to those in [17], where tangent directions to nearby constraints are constructed and used as search directions. In this case, trial points that violate the linear constraints are first projected back to the feasible region before being submitted for evaluation. Nonlinear constraints are handled by using smooth merit functions [20]: nonlinear constraint violations are penalized with an L2-norm penalty term that is added to the objective value. In the context of constrained multiobjective optimization, when comparing points for domination, a feasible point is always favored over an infeasible one.
IV Experimental Results
While Autotune is designed specifically for automatically finding good machine learning models, the optimization process that drives it is applicable to general optimization problems. Therefore, to evaluate the performance of Autotune and its effectiveness at solving multiobjective optimization problems, we conducted a benchmark experiment by applying the Autotune system to a set of common multiobjective optimization benchmark problems. We present a sampling of the results here for two of the benchmark problems: ZDT1 and ZDT3, taken from [41]. For both of these problems, the true Pareto front is known, which provides a basis for comparison.
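The mathematical formulation for ZDT1 is:
$$\begin{aligned} f_1(x) &= x_1, \\ g(x) &= 1 + \frac{9}{n-1}\sum_{i=2}^{n} x_i, \\ f_2(x) &= g(x)\left[1 - \sqrt{f_1(x)/g(x)}\right], \end{aligned}$$
with $x_i \in [0, 1]$ for $i = 1, \dots, n$ and $n = 30$.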
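For readers who want to reproduce the benchmark, here is a direct NumPy transcription of the standard ZDT1 definition. The function name is ours, and it is not Autotune's test harness. It shows why the true front is easy to state: setting $x_2, \dots, x_n$ to 0 gives $g = 1$ and hence $f_2 = 1 - \sqrt{f_1}$.

```python
import numpy as np


def zdt1(x):
    """ZDT1 objectives (f1, f2) for x in [0, 1]^n; both are minimized."""
    x = np.asarray(x, dtype=float)
    f1 = x[0]
    g = 1.0 + 9.0 * np.sum(x[1:]) / (x.size - 1)
    f2 = g * (1.0 - np.sqrt(f1 / g))
    return float(f1), float(f2)


n = 30
on_front = zdt1([0.25] + [0.0] * (n - 1))   # x_2..x_n = 0 gives g = 1, so f2 = 1 - sqrt(f1)
off_front = zdt1([0.25] + [0.5] * (n - 1))  # any x_i > 0 (i >= 2) raises g and hence f2
print(on_front)  # (0.25, 0.5)
print(off_front)
```

Sampling $f_1$ on a grid with $x_2 = \dots = x_n = 0$ yields a reference set for the true front, against which an approximation can be scored.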
ZDT1 is a multiobjective optimization problem with two objectives ($f_1$, $f_2$) and 30 variables. Figure 3(a) shows Autotune’s results when running with a sufficiently large evaluation budget of 25,000 evaluations. The plot shows that Autotune has fully captured the true Pareto front: Autotune’s Pareto markers completely cover it. In real-world use cases, evaluation budgets are often limited by time and cost. Figure 3(b) shows Autotune’s results with a limited evaluation budget of 5000 evaluations. In this case, Autotune’s approximation of the Pareto front is far less complete, with significant gaps. Constraints can be added to the optimization to focus the search on a particular region of the solution space. To demonstrate the power of constraints in the Autotune multiobjective optimization framework, Figure 3(c) shows the results of rerunning Autotune on ZDT1, this time with an added constraint and the same limited budget of 5000 evaluations. This plot clearly shows how adding the constraint has focused the optimization on the lower-right section of the solution space, allowing Autotune to capture a much better representation of the true Pareto front in the constrained region.
The mathematical formulation for ZDT3 is:
$$\begin{aligned} f_1(x) &= x_1, \\ g(x) &= 1 + \frac{9}{n-1}\sum_{i=2}^{n} x_i, \\ f_2(x) &= g(x)\left[1 - \sqrt{f_1(x)/g(x)} - \frac{f_1(x)}{g(x)}\sin\left(10\pi f_1(x)\right)\right], \end{aligned}$$
with $x_i \in [0, 1]$ for $i = 1, \dots, n$ and $n = 30$.
ZDT3 has two objectives ($f_1$, $f_2$) and 30 variables. Figure 4(a) shows that Autotune is able to obtain the true Pareto front very well when given a sufficiently large evaluation budget of 25,000 objective evaluations. Figure 4(b) shows Autotune’s results with a limited evaluation budget of 5000 objective evaluations, where Autotune struggles to find a complete representation of the Pareto front. In particular, the left side of the plot shows only a few Pareto points found by Autotune. Figure 4(c) shows the results with the same limited budget of 5000 objective evaluations but with an added constraint. The plot clearly shows that Autotune does a much better job of representing the Pareto front in that area of the solution space.
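A NumPy sketch of ZDT3 (our own transcription of the standard definition) helps explain why it is harder under a tight budget. Its true front is disconnected. The snippet samples the curve $x_2 = \dots = x_{30} = 0$, keeps the nondominated samples, and counts the separate pieces.

```python
import numpy as np


def zdt3(x):
    """ZDT3 objectives (f1, f2) for x in [0, 1]^n; both are minimized."""
    x = np.asarray(x, dtype=float)
    f1 = x[0]
    g = 1.0 + 9.0 * np.sum(x[1:]) / (x.size - 1)
    h = 1.0 - np.sqrt(f1 / g) - (f1 / g) * np.sin(10.0 * np.pi * f1)
    return float(f1), float(g * h)


# Sample the curve x_2..x_30 = 0 (g = 1) on a fine grid of f1 values
grid = np.linspace(0.0, 1.0, 1001)
f2 = np.array([zdt3([t] + [0.0] * 29)[1] for t in grid])

# f1 increases along the grid, so a sample is nondominated exactly when its
# f2 is lower than every f2 seen at smaller f1
running_min = np.minimum.accumulate(f2)
on_front = np.concatenate(([True], f2[1:] < running_min[:-1]))

# Count maximal runs of consecutive nondominated samples
idx = np.flatnonzero(on_front)
segments = 1 + int(np.sum(np.diff(idx) > 1))
print(segments)  # 5
```

The gaps between the five pieces contain no Pareto-optimal points. A search spread evenly over the whole front therefore spends part of a small budget on regions that contribute nothing, which is one reason a constraint focused on one area helps.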
This experiment demonstrates that Autotune correctly captures the Pareto fronts of the benchmark problems when given adequate evaluation budgets. By using constraints, Autotune is able to significantly improve the search efficiency by focusing on the regions of the solution space that we are interested in.
V Case Studies
The case studies are much larger real-world machine learning applications, in which multiobjective optimization is used to tune a high-quality predictive model. The first data set comes from the Kaggle ‘Donors Choose’ challenge. The second data set is a sales leads data set. After a preliminary study of different model types, including logistic regression, decision trees, forests, and gradient boosted trees, the gradient boosted tree model type was selected for both case studies because the other model types all significantly underperformed. Table II presents the gradient boosted tree hyperparameters that are tuned, along with their ranges and default values.
Hyperparameter  Lower  Default  Upper 

Num Trees  100  100  500 
Num Vars to Try  1  all  all 
Learning Rate  0.01  0.1  1.0 
Sampling Rate  0.1  0.5  1.0 
Lasso  0.0  0.0  10.0 
Ridge  0.0  0.0  10.0 
Num Bins  20  20  50 
Maximum Levels  2  6  7 
For both studies, Autotune’s default hybrid strategy, which combines an LHS initial population with the GA and GSS algorithms, is used. The population size is 50 and the maximum number of iterations is 20. The tuning process is executed on a compute cluster containing 100 worker nodes. Individual model training uses multiple worker nodes, and multiple models are trained in parallel.
V-A Donors Choose Data
This case study involves building a model using data from the website DonorsChoose.org. This is a charity organization that provides a platform for teachers to request materials for projects. The business objective is to identify projects that are likely to attract donations based on the historical success of previous projects. Since DonorsChoose.org receives hundreds of thousands of proposals each year, automating the screening process and providing consistent vetting with a machine learning model allows volunteers to spend more time interacting with teachers to help develop successful projects. Properly classifying whether or not a project is “exciting” is a primary objective, but an important component of that is to minimize the number of projects improperly classified as exciting (false positives). This ensures that valuable human resources are not wasted vetting projects that are likely to be unsuccessful.
The data includes 24 attributes describing the project, including:

the type of school (metro, charter, magnet, year-round, NLNS)

school state/region

average household income for the region

grade level, subject, and focus area for the project

teacher information

various aspects of project cost
The data set contains 620,672 proposal records, of which roughly 18% were ultimately considered worthy of a review by the volunteers. A binary variable labeling whether or not the project was ultimately considered “exciting” is used as the target for predictive modeling. The data set was partitioned into 70% for training (434,470) and 30% for validation (186,202) for tuning the gradient boosted tree predictive model.
As noted in the data set description, using misclassification rate as the single objective is not sufficient: a successful predictive model is also expected to minimize the false positive rate. This makes the multiobjective optimization approach well suited to this study, with misclassification rate and false positive rate (FPR) as the two objectives. It is unlikely that tuning the models on any one of the more traditional machine learning metrics would produce the desired results.
The default gradient boosted tree model uses the default hyperparameter configuration listed in Table II. Its confusion matrix is shown in Table III. The default model predicts 5,562 false positives, a significant amount. The FPR on the validation data set is 3.6%. The overall misclassification rate on the validation set is high, around 15%, and needs to be improved, ideally while also improving FPR.
Target  Predicted False  Predicted True 

False  146,956  5,562 
True  22,963  10,721 
The multiobjective tuning results for the Donors Choose data set are shown in Figures 5 and 6. Figure 5 displays the entire set of evaluated configurations, along with the default model and the generated Pareto front, trading off the minimization of misclassification on the x-axis against the minimization of FPR on the y-axis. The cloud of points splits into two distinct branches, one trending toward a near-zero FPR and the other toward lower misclassification, resulting in a split set of Pareto points. The default configuration appears to be a near-equal compromise between the two objectives.
Several other tuning runs were executed with various traditional metrics (AUC, KS, MCE and F1) as a single objective. The best model configurations for each of the runs are superimposed on Figures 5 and 6. Nearly all of the single objective runs converged to similar values of misclassification and FPR. All of them sacrificed some FPR in the process, which is undesirable as defined by the conditions of this study.
While the near-zero FPR values are appealing, the accompanying increase in misclassification makes these configurations undesirable. It is more beneficial to look at models that reduce both objectives relative to the default model. For this reason, an additional tuning run was executed with an added constraint of misclassification < 0.15. The Pareto points for this tuning run are shown in Figure 6, which zooms in on the area around the points of interest and marks one of the Pareto points as the ‘Best’ overall model. The confusion matrix for this ‘Best’ model is shown in Table IV. The number of false positives is reduced by 8% (461) compared to the default model, but more importantly, the misclassification improved from 15% to 10%.
Target  Predicted False  Predicted True 

False  147,417  5,101 
True  13,650  20,034 
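The improvements quoted above follow directly from Tables III and IV. Both tables cover the 186,202 validation records. The small Python sketch below recomputes them. The helper name `summarize` is ours.

```python
def summarize(tn, fp, fn, tp):
    """Misclassification rate and false positive rate from confusion-matrix counts."""
    return {"misclassification": (fp + fn) / (tn + fp + fn + tp), "fpr": fp / (fp + tn)}


default = summarize(tn=146_956, fp=5_562, fn=22_963, tp=10_721)  # Table III
best = summarize(tn=147_417, fp=5_101, fn=13_650, tp=20_034)     # Table IV

for name, m in (("default", default), ("best", best)):
    print(f"{name}: misclassification={m['misclassification']:.1%}, FPR={m['fpr']:.1%}")
# default: misclassification=15.3%, FPR=3.6%
# best: misclassification=10.1%, FPR=3.3%
print(f"false positives removed: {5_562 - 5_101} ({(5_562 - 5_101) / 5_562:.1%})")
# false positives removed: 461 (8.3%)
```

The selected model improves both objectives at once. Most of the misclassification gain comes from the drop in false negatives (22,963 to 13,650), while FPR still edges down.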
V-B Sales Leads Data
Marketers often rely on machine learning models to accurately predict marketing actions and strategies that are most likely to succeed. In this case study, we use a data set collected by the marketing department at SAS Institute Inc. A key goal of this study is to provide the sales team of the company with an updated list of quality candidate leads. Supervised models are then built to identify and prioritize qualified leads across about 20 global regions. Machine learning qualifies leads by prioritizing known prospects and accounts based on their likelihood of acting.
The training data has about 200 candidate features spanning a four-year window. Web traffic data is a key feature category that includes page counts for several company websites as well as the referrer domain. Customer experience data, such as the number of whitepapers downloaded, webcasts watched, and live events attended, is also captured. A text analytics tool is used to standardize new features such as job function and department. The binary target for model training is labeled by marketing based on business rules and actual outcomes. The non-event class (not a lead) is downsampled using stratified sampling to obtain a 10% target event rate. The data set contains 962,670 observations. For the tuning process, the observations were partitioned into 42% for training (404,297), 28% for validation (269,556), and 30% for test (288,817).
Purchase propensity models are very difficult to build due to the unbalanced nature of the training data. It is very important to deliver a scoring model that captures the event well yet minimizes false negatives so that sales opportunities are not overlooked. Typically with unbalanced data, overall misclassification rate is not the preferred measure of model quality. Here we investigate several model quality measures along with a multiobjective tuning strategy that incorporates both overall model accuracy and minimizing the false negative rate (FNR).
The confusion matrix for a default gradient boosted tree model is shown in Table V. The default model predicts many more false negatives than false positives, which is the opposite of the desired scenario in this case: only 31% of the actual positives are captured.
Target  Predicted False  Predicted True 

False  276,718  1,193 
True  7,542  3,364 
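The 31% figure is the recall (true positive rate) implied by Table V. The counts sum to 288,817, the size of the test partition. A quick Python check of that figure and the related rates:

```python
# Table V: default gradient boosted tree model (counts sum to the 288,817-record test partition)
tn, fp, fn, tp = 276_718, 1_193, 7_542, 3_364

positives = fn + tp                  # actual leads
recall = tp / positives              # share of actual leads the model captures
fnr = fn / positives                 # false negative rate = 1 - recall
misclassification = (fp + fn) / (tn + fp + fn + tp)

print(f"recall={recall:.1%}, FNR={fnr:.1%}, misclassification={misclassification:.2%}")
# recall=30.8%, FNR=69.2%, misclassification=3.02%
```

With only about 3.8% of test records being leads, a 3% misclassification rate looks good while most leads are missed. This is why FNR is tuned as a separate objective.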
The multiobjective tuning results for the leads data set are shown in Figures 7 and 8. Figure 7 displays the entire set of evaluated configurations, along with the default model and the generated Pareto front, trading off the minimization of misclassification on the x-axis against the minimization of FNR on the y-axis. Most of the evaluated configurations perform better than the default model with respect to both overall misclassification and FNR. The Pareto front represents a set of tradeoff solutions, all of which are significantly better than the default model, cutting the FNR in half.
The Pareto front is shown in more detail in Figure 8, which shows more clearly that the solution generated by maximizing only KS for this unbalanced data set, given the same evaluation budget, underperforms relative to the Pareto front. The overall misclassification of this solution is similar to that of the highest-misclassification solution on the Pareto front, and its FNR is higher than that of all solutions on the Pareto front. When misclassification is minimized as a single objective, the resulting misclassification is similar to that of the lowest-misclassification solution on the Pareto front, but the FNR is higher. Reviewing the Pareto front, it is clear that the range of misclassification among its solutions is relatively small. If it is desirable to trade some false positives for a reduction in false negatives, over 300 additional sales leads can be captured by sacrificing just 0.05% in overall misclassification.
Constraints on both FNR and misclassification were applied in an attempt to identify more Pareto solutions with lower FNR. However, because the Pareto front in this case study is very narrow, with both objectives gravitating toward the lower left of the solution space, adding constraints identified no additional preferred Pareto solutions. Since the multiobjective optimization revealed very little tradeoff between the objectives, a final attempt to further reduce FNR was made by solving a single-objective constrained optimization problem. Figure 8 shows that minimizing FNR directly as a single objective does not achieve results as desirable as those found by the multiobjective tuning process. The solution with the lowest FNR was chosen as the ‘Best’ model, and its confusion matrix is given in Table VI. Compared to the default model, the number of false negatives is reduced by 3,007 (40%), from 7,542 to 4,535. The FNR is 0.4343 on the holdout test data; 56.6% of the actual positive leads are captured, a significant improvement over 31% with the default model.
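One way to picture constraints on both objectives, followed by choosing the lowest-FNR solution as the ‘Best’ model, is a post-hoc filter: discard evaluations that violate the bounds, keep the non-dominated feasible ones, and pick the smallest FNR. The sketch below assumes that simplified view. Autotune applies constraints during the search itself, which this sketch does not model, and the evaluations and bounds (`max_misc`, `max_fnr`) are hypothetical values chosen only for illustration.

```python
from typing import List, Optional, Tuple

# (misclassification rate, false negative rate); both objectives are minimized.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b in both objectives and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def constrained_front(points: List[Point], max_misc: float, max_fnr: float) -> List[Point]:
    """Drop points that violate either bound, then keep the non-dominated feasible ones."""
    feasible = [p for p in points if p[0] <= max_misc and p[1] <= max_fnr]
    return [p for p in feasible if not any(dominates(q, p) for q in feasible)]


def lowest_fnr(points: List[Point]) -> Optional[Point]:
    """Pick the solution with the smallest FNR, or None if nothing is feasible."""
    return min(points, key=lambda p: p[1]) if points else None


# Hypothetical evaluations and bounds, chosen only for illustration.
evals: List[Point] = [
    (0.0302, 0.69), (0.0215, 0.45), (0.0210, 0.47),
    (0.0220, 0.44), (0.0230, 0.46), (0.0212, 0.50),
]
front = constrained_front(evals, max_misc=0.0216, max_fnr=0.48)
print(sorted(front))      # [(0.021, 0.47), (0.0215, 0.45)]
print(lowest_fnr(front))  # (0.0215, 0.45)
```

In this post-hoc view, constraints can only remove candidates, so they cannot reveal Pareto solutions that were never evaluated; any benefit has to come from steering the search toward the constrained region.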
Table VI: Confusion matrix of the ‘Best’ (lowest-FNR) model
Target   Predicted False   Predicted True
False    276,482           1,429
True     4,535             6,371
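The improvement reported for the ‘Best’ model can be checked directly from the false-negative and true-positive counts in Tables V and VI, as in the short sketch below. Ratios computed from the table counts (for example, the capture rate) need not coincide exactly with the holdout-test figures quoted in the text.

```python
# Counts copied from Tables V and VI: false negatives and true positives.
default_fn, default_tp = 7_542, 3_364
best_fn, best_tp = 4_535, 6_371

# Both tables contain the same 10,906 actual positive leads.
positives = default_fn + default_tp
assert positives == best_fn + best_tp

fn_reduction = default_fn - best_fn      # fewer missed leads
relative = fn_reduction / default_fn     # fraction of the default's false negatives removed

print(fn_reduction, f"{relative:.1%}")   # 3007 39.9%
print(f"capture rate: {default_tp / positives:.1%} -> {best_tp / positives:.1%}")
# capture rate: 30.8% -> 58.4%
```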
VI Conclusions
Automation in machine learning improves model-building efficiency and creates opportunities for more applications. This work extends the general Autotune framework by implementing two novel features: multiobjective optimization and constraints. With multiobjective optimization, a set of models on a Pareto front is produced instead of a single model, and the preferred model can then be selected by balancing the different objectives. Constraints are also important in the model tuning process: they provide a way to enforce business restrictions or to improve search efficiency by pruning parts of the search space. The numerical experiments on benchmark problems demonstrate the effectiveness of our implementation of multiobjective optimization and constraint handling. The two case studies we presented show Autotune’s ability to find models that appropriately balance multiple objectives while also adhering to constraints. Future work to enhance Autotune includes simplifying the user’s experience when choosing metrics for objectives and constraints.
References
[1] (2008-02) Multiobjective optimization through a series of single-objective formulations. SIAM J. on Optimization 19 (1), pp. 188–210. ISSN 1052-6234.
[2]
[3] (2017) A multiobjective DIRECT algorithm towards structural damage identification with limited dynamic response information. CoRR.
[4] (2018-10-01) MultiGLODS: global and local multiobjective optimization using direct search. Journal of Global Optimization 72 (2), pp. 323–345. ISSN 1573-2916.
[5] (2011) Direct multisearch for multiobjective optimization. SIAM Journal on Optimization 21 (3), pp. 1109–1140.
[6] (2000) A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II. In Parallel Problem Solving from Nature PPSN VI, M. Schoenauer, K. Deb, G. Rudolph, X. Yao, E. Lutton, J. J. Merelo, and H. Schwefel (Eds.), Berlin, Heidelberg, pp. 849–858. ISBN 9783540453567.
[7] (2006) Reference point based multiobjective optimization using evolutionary algorithms. In Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, GECCO ’06, New York, NY, USA, pp. 635–642. ISBN 1595931864.
[8] (2018) PPP-Net: platform-aware progressive search for Pareto-optimal neural architectures.
[9] (2018) (Website).
[10] (2009) An experimental comparison of performance measures for classification. Pattern Recogn. Lett. 30 (1), pp. 27–38.
[11] (2015) Efficient and robust automated machine learning. In Advances in Neural Information Processing Systems 28, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (Eds.), pp. 2962–2970.
[12] (1989) Genetic algorithms in search, optimization and machine learning. 1st edition, Addison-Wesley Longman Publishing Co., Inc. ISBN 0201157675.
[13]
[14] (2010) Hybrid optimization schemes for simulation-based problems. Procedia Computer Science 1, pp. 1349–1357.
[15] (2011) The effectiveness of derivative-free hybrid methods for black-box optimization. International Journal of Mathematical Modeling and Numerical Optimization 2, pp. 112–133.
[16] (2011) Derivative-free optimization via evolutionary algorithms guiding local search (EAGLS) for MINLP. Pacific Journal of Optimization 7, pp. 425–443.
[17] (2008) Asynchronous parallel generating set search for linearly constrained optimization. SIAM Journal on Scientific Computing 30, pp. 1892–1924.
[18] (2008) Asynchronous parallel generating set search for linearly constrained optimization. SIAM Journal on Scientific Computing 30, pp. 1892–1924.
[19] (2010) Asynchronous parallel hybrid optimization combining DIRECT and GSS. Optimization Methods and Software 25, pp. 797–817.
[20] (2010) Nonlinearly constrained optimization using heuristic penalty methods and asynchronous parallel generating set search. Applied Mathematics Research Express 2010, pp. 36–62.
[21]
[22] (2018) ADC: automated deep compression and acceleration with reinforcement learning. CoRR abs/1802.03494.
[23] (2018-06-27) (Website) arXiv:1806.10282 [cs.LG].
[24] (2008) Pareto-based multiobjective machine learning: an overview and case studies. Trans. Sys. Man Cyber Part C 38 (3), pp. 397–415.
[25] Y. Jin (Ed.) (2006) Multiobjective machine learning. Studies in Computational Intelligence, Vol. 16, Springer. ISBN 9783540306764.
[26] (2017) NEMO: neuro-evolution with multiobjective optimization of deep neural network for speed and accuracy.
[27] (2017) Auto-WEKA 2.0: automatic model selection and hyperparameter optimization in WEKA. Journal of Machine Learning Research 18 (25), pp. 1–5.
[28] (2015-06) Beyond Mitchell: multiobjective machine learning – minimal entropy, energy and error. In 11th Metaheuristics International Conference (MIC), Agadir, Morocco.
[29] (2019) DVOLVER: efficient Pareto-optimal neural network architecture search.
[30]
[31] (2016) Big models for big data using multi objective averaged one dependence estimators. CoRR abs/1610.07752.
[32] (2009) HOPSPACK 2.0 user manual (v 2.0.2). Technical report, Sandia National Laboratories.
[33] (2008-01) Evaluation: from precision, recall and F-factor to ROC, informedness, markedness & correlation. Mach. Learn. Technol. 2.
[34]
[35]
[36] (2012) Using the averaged Hausdorff distance as a performance measure in evolutionary multiobjective optimization. IEEE Transactions on Evolutionary Computation 16, pp. 504–522.
[37] (2017) Multiobjective evolution of artificial neural networks in multiclass medical diagnosis problems with class imbalance. In CIBCB, pp. 1–8.
[38] (2009) Bayesian guided pattern search for robust local optimization. Technometrics 51, pp. 389–401.
[39] (2017) Pareto-optimal multiobjective dimensionality reduction deep autoencoder for mammography classification. Computer Methods and Programs in Biomedicine 145, pp. 85–93.
[40] (1975-11) Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta 405, pp. 442–451.
[41] (2000) Comparison of multiobjective evolutionary algorithms: empirical results. Evolutionary Computation 8 (2), pp. 173–195.
[42] (2002) Why quality assessment of multiobjective optimizers is difficult. In Proceedings of the 4th Annual Conference on Genetic and Evolutionary Computation, GECCO’02, San Francisco, CA, USA, pp. 666–674. ISBN 1558608788.