1 Introduction
Exploration of multiresponse physical/engineering systems with the objective of determining “good” points in the input space with desirable or targetted values of each output variable is typically a challenging problem. Such problems become even more complicated if the number of responses or outputs is large (possibly larger than the number of inputs), and certain combinations of inputs are “infeasible”, in the sense they do not produce any reasonable output. Consider a system with input variables and output variables . The problem is to determine “good” combinations of inputs that produce responses as close to some predefined “target” values as possible. We assume that the functional relationships for are in principal known, as is the case in computer experiments, but they may be expensive to compute. We will denote the vectors of the responses and targets by and respectively.
If , i.e., for a single response, this problem can be formulated as a response surface optimization problem Box and Draper (1987). When certain input combinations are infeasible, this becomes a response surface optimization problem with unknown constraints (associated with feasible regions). Such a problem has been addressed in engineering literature Henkenjohann et al. (2005). When , a widely used approach is to optimize some one dimensional loss function like the weighted squared error loss
(1)  
where are a set of weights associated with the responses Wu and Hamada (2009) and is a diagonal matrix with entries . Such a method can also be applied with unknown constraints (e.g. Henkenjohann and Kunert (2007) where ). However, one response may be closest to its target at a unique combination of inputs and farthest at a combination that is best for another response, leading to multiple local optima for the loss function.
We illustrate this situation with a toy example with . Let the input space be , and suppose the inputoutput relations are described by the following equations:
that is, the first response only depends on whereas the second response depends on both and . Let the target vector for be . Then for any weight vector , the loss function is minimized at two points: and where it attains value zero. Figure 1 shows the contour plot of the loss function with weights , in which the two optimum points are shown with diamond marks. However, as shown in the figure, loss function is virtually the same in the light blue band around these two optima  for example it is 0.0004 at . The yellow elliptic region near the bottom left corner shows the infeasible region, generated using the following logistic model:
and defining the binary feasibility variable (infeasible) or 1 (feasible) according as or .
Figure 1 makes it obvious that as increases, trying to formulate the problem as global optimization of the loss function may not be a great idea. Rather, the objective should be to detect several “good” points within the feasible input space by efficient search algorithms. This is particularly true for the material sciences application (explained in the next section) that motivated this research, where can be fairly large, ranging from 15 to 300, and is typically in the range of with strong correlation among groups of responses and unknown infeasible regions.
We propose a solution by developing a structured framework that aims at exploring the space of inputs and identifying points in the “best” regions where responses are reasonably close to their targets. The framework revolves around a smart experimental design strategy called the minimum energy design (MED) and supervised machine learning methods like classification. Unsupervised machine learning methods like clustering and simple data exploration and visualization techniques also form useful components of the framework. Such a combination, parallel to solving the engineering problem, facilitates important scientific understanding about the underlying process.
In the next section, we describe the motivating scientific problem. Section III describes the essential steps in the proposed framework assuming that the true input output relationships are known, or at least, can be simulated correctly whenever necessary. This assumption is true for all ReaxFF systems; however, in most cases the number of evaluations required to find a solution with the proposed approach may become prohibitive. In Section III, we also present some ideas to build surrogate models or “emulators” of the actual inputoutput functions, that can be used to overcome the problem of conducting a prohibitive number of expensive simulations. In Section IV we demonstrate the effectiveness of the proposed approach using a few simulation studies. Finally, we demonstrate applications of the proposed approach using two actual examples of the Molybdenum Disulfide (MoS) ReaxFF system and NickelChromium (NiCr) binary ReaxFF system in Section V, and present some conclusions and opportunities of future research in Section VI.
2 Motivating example: Optimization of ReaxFF systems
Recent advances in materials science have established ReaxFFbased atomistic simulations as a promising method for atomistic level investigations in materials science. The ReaxFF is a force field that incorporates complex functions with associated inputs in order to describe the inter and intraatomic interactions in materials systems. A typical ReaxFF force field consists of hundreds of parameters (inputs) per element type. During the development of a force field for a molecular system of interest, these parameters are optimized to reproduce reference values with reasonable accuracy. These reference values are molecular properties (e.g., bond lengths, bond angles, charges, and energies, etc.) of reference systems, also known as “gold standards”, obtained by quantum chemistry methods (e.g., Density Functional Theory (DFT)) or experiments. In our notation defined in Section 1, for a ReaxFF system ’s represent parameters, ’s represent the molecular properties and ’s represent the gold standards or targets.
The conventional optimization method that is widely used by the force field developers is a sequential one inputatattime parabolic extrapolation method Shchygol et al. (2019)
, which is not capable of switching between local minima to detect lowest loss regions in the input space. In contrast, this method is susceptible to being stuck in a local minimum, preventing parallelization of the optimization algorithm. Another big limitation of such onefactoratattime approaches is its failure to capture important interactions among inputs. Due to these limitations in the conventional method, considerable effort has been directed toward finding a solution to this optimization problem using heuristic methods that include simulated annealing and genetic algorithm
Iype et al. (2013); Larsson et al. (2013); Dittner et al. (2015) but none of them have been found efficient for exploring the parameter space comprehensively.The ReaxFF systems present several practical challenges to the exploration of complex input spaces in systems with multiple response. The true input output relationships are complex, typically involving systems of partial differential equations. Simulations for most ReaxFF systems are time consuming and expensive. The number of responses is large, making the loss function complex and multimodal. Another major complication arises from the fact that several parameter combinations do not produce meaningful outcomes, in the sense that either the simulation does not converge within the specified stopping time, or the results produce such a large discrepancy of the
’s from the ’s that they are not meaningful at all. For many ReaxFF systems, the percentage of such infeasible combinations is larger than the percentage of feasible ones, making the process of output data generation even more expensive.3 Essential steps involved in the proposed exploration approach
Space filling designs have found extensive use in efficient exploration of complex response surface and for identification of good regions in the input space. As explained in Joseph (2016), space filling designs focus on placing the design points in the experimental region in an intelligent manner so that there exist points everywhere in the experimental region with as few gaps or holes as possible. Therefore, ideally, if one can construct a very large spacefilling design to cover the entire design space, then it is possible to identify the good input regions. However, as the dimension of the input space increases, the total number of points required to cover the entire space become prohibitively large, and additional strategies are required to facilitate exploration. Thus, most strategies for exploring large and complex input spaces rely on a sequential strategy in which an initial exploration is done using a standard spacefilling design.
Our first step therefore is to create an initial spacefilling design of points that does an initial exploration of the input space. Each of the design points in this initial design is a combination of the inputs. Several of these points are expected to be infeasible, i.e., they do not produce any output. Let and denote the number of feasible and infeasible points respectively, such that . The feasible points generate an matrix of responses, where each column corresponds to an output .
The second step is to fit a classification model using the information on feasible and
infeasible points, that predicts the probability that a given combination of ReaxFF parameters will result in a feasible simulation output. The classification model can be parametric, e.g., logistic regression, or based on nonparametric or machine learning methods like random forests.
The third step is the most crucial in the exploration process and involves exploring the input space using the matrix of responses as the starting point, and finding the best regions with lowest loss making as few evaluations of new combinations as possible. This step will also incorporate the classification model fitted in the previous step. This will be done by application of a recently proposed design strategy called minimum energy designs (MED) Joseph et al. (2015).
As in the case of the ReaxFF problem, we assume that the true response functions are deterministic and known, but the feasibility region is unknown. However, with increase in the dimension of the input space , the actual number of evaluations of the true function required to obtain good results may be prohibitive. In such cases, one may need to substitute the simulator by a cheap or fast surrogate called the “emulator”. This is an optional fourth step in the optimization algorithm.
We now describe these four steps and illustrate each step using the toy example described in Section 1 and used to generate Figure 1.
3.1 Initial design and data generation
An initial space filling design is generated in which the levels of the input variables are simultaneously varied to produce different combinations. Simulations are conducted at each of these combinations and the outcome recorded. A dimensional input space can be explored using specific design strategies such as the Latin Hypercube sampling McKay et al. (1979). The design generated using such a sampling strategy is called a Latin hypercube design (LHD). However, as observed by several authors, an arbitrary LHD does not necessarily have good spacefilling properties and it is necessary to incorporate additional criteria like orthogonality of inputs and maximin distance (that maximizes the minimum distance between every pair of points in the design space). The initial design proposed for initial exploration of the input space is known as Orthogonalmaximin latin hypercube design (OMLHD) originally proposed in Joseph and Hung (2008)
. The OMLHD algorithm can generate parameter combinations within ranges specific to each parameter that are multidimensionally uniformly distributed by reducing the pairwise correlation and maximizing the distance between parameters.
The outcome of each simulation obtained from the initial OMLHD is recorded as follows: (i) a binary outcome variable taking values 0 and 1 according as whether the parameter combination is infeasible (does not produce any result) or feasible (produce results) and (ii) the values of the responses . The structure of the raw data matrix is shown in Table 1 in which the rows are arranged by values of without loss of generality. The data would thus consist of rows and columns ( input variables, responses, and one binary feasibility column). The response columns will be missing for the infeasible combinations, shown as rows in Table 1.
No  Inputs  Responses  Feasibility  

1  1  
1  
0  
0 
Figure 2 demonstrates initial exploration of the loss function in the toy example described in Section 1 using a 20point maximin LHD. The design generates points (90%) in the feasible region, which is consistent with the small size of the infeasible region. In our motivating example, this will typically not be the case, with the infeasible region often being larger than the feasible region making in most cases.
In this example, the initial design also finds four points in the desired region, and one of them is close to one of the two optima. Again, this will rarely be the case, because as the dimension of the input space increases, the volume of the desired region will typically be very small compared to the total input space.
3.2 Classification model
Having generated the data from the initial simulation experiment, the next task is to fit a classification model that predicts the feasibility of a combination of . Such a model is fit using the binary outcomes applying a suitable supervised classification algorithm. There are several classification algorithms in classical statistics and machine learning Hastie et al. (2009). Among these methods, logistic regression stands out as a versatile tool used in almost all scientific fields for several decades and is a natural candidate to be used. On the machine learning side, random forest classification Breiman (2001) has attained tremendous popularity in scientific and engineering applications in the recent years. In practice, multiple algorithms can be tried and the one with best outofsample prediction performance can be chosen. Such a strategy necessitates splitting the data matrix into training and testing sets, fitting models using the training set and comparing them on the basis of certain performance metrics using the testing set. Two performance measures – sensitivity and specificity – can be considered for comparing the approaches and identifying the best one. Let TP, TN, FP and FN denote the numbers of true positives (correct identification of feasibility), true negatives (correct identification of nonfeasibility), false positives (incorrect classification of a true infeasible point as feasible) and false negatives (incorrect classification of a true feasible point as infeasible). Then sensitivity is defined as the true positive rate measured as TP/(TP+FN), whereas specificity is defined as the true negative rate measured as TN/(FP+TN).
3.3 Finding points in the desired region using a Minimum Energy Algorithm
The primary objective of spacefilling experimental designs is to spread points uniformly over the input space. The foregoing discussion in Section 1 and Section 3.1
establishes that this is not the case in the current problem. We need a sequential design or active learning (
Sung and Niyogi (1995); Cohn et al. (1996)) strategy that helps avoid bad regions and generate more and more points from the desired region as the algorithm progresses. Thus the points sampled by the design should represent the response surface under exploration, which means more points should be generated in regions where is large and fewer points in regions where is small. In our case, because regions of lower loss are desirable, the response surface of interest can be taken as the inverse of the loss function in (1). Such an algorithm known as minimum energy designs (MED) was proposed by Joseph et al. (2015) Joseph et al. (2015). We now briefly describe the minimum energy algorithm and explain its usage in the context of our problem again using the toy example.Consider the problem of exploring a dimensional input space, where the range of each factor is scaled to . Thus the design space is . Let be the weight associated with the th design point . Visualize as a charged particle in the box and as the positive charge associated with it. Let denote the Euclidean distance between the points and . Then the design that minimizes the total potential energy for charged particles
is the MED. If the objective is to select samples that represent a positive response surface , the weight should be chosen as .
Based on the above idea, Joseph et al. (2019) Joseph et al. (2019) proposed an efficient algorithm for computation of MED that optimizes a variant of the above energy function, and selects points to represent a response surface . The algorithm requires the user to specify (i) an initial run design (ii) a function that computes the logarithm of (in our case, ) and (iii) the number of iterations . Each iteration, new design points are generated and new evaluations of the logarithm of are done at these points. Thus a total of evaluations of the function are necessary. This algorithm was implemented as the R package “mined” in Wang and Joseph (2018).
Direct application of the above algorithm to our problem, however, is not possible because even if the true response functions are known for , several points in the initial designs are likely to fall in the unknown infeasible region, not returning any value of the and consequently of . A naive way to avoid this problem is to use a modified response function that only considers the observed responses from the feasible input points, thereby automatically truncating the infeasible points. In the context of Table 1, this would essentially mean ignoring the lower half of the data matrix, and considering only with the points in the upper half consisting of the feasible points.
Noting that the infeasible region is essentially a “bad” region that the MED is trying to avoid, a better strategy is to modify the response function by “imputing” the missing responses with values that generates a large value of the loss function. Since to run the MED algorithm we need a function that returns a value of the one dimensional loss for a given input combination, we can simply replace the loss at each infeasible input combination by a value greater than or equal to the largest loss observed from the
feasible points in the initial design.We now illustrate this strategy again using the twodimensional toy example introduced in Section 1. Recall the 20point initial design shown in Figure 2, which generates two infeasible points for which the response values are missing. Among the remaining 18 feasible points, the point near the bottom right corner generates the worst response vector with the largest loss. Therefore, we impute the two missing response vectors with and use all the 20 points as inputs to the MED algorithm.
Figure 3 shows the points generated after 8 iterations of the MED algorithm with the contour of the loss function generated from the modified response function shown in the background. It is seen that the MED algorithm beautifully captures the best region, identifies the two optima, and also avoids the infeasible region. Note that one drawback of the R function “mined” is that it can produce points outside the input space in an attempt to broaden the search region. In this example, the algorithm ends up generating 12 points in the box that are shown in Figure 3.
Although in this particular implementation of MED we end up avoiding generating points in the feasible region, this is not guaranteed because of the MED algorithm’s inherent property of jumping out of good regions and exploring distant regions with uncertainties. Thus, we recommend using the classification model described in Section 3.2 to predict the feasibility outcome for each point generated by the MED as the last step.
This implementation of the MED algorithm requires evaluations of the response function. While this is quite reasonable for , the number of evaluations necessary will increase with the increase in . If the computation is expensive, one can use a surrogate model or emulator constructed from the initial design and use it as an approximation for the true simulator. We discuss a few strategies to fit such a surrogate model in the following subsection.
3.4 Surrogate model or emulator of the simulation model
Most methods for exploration of deterministic complex response surfaces use Gaussian process models as surrogates. However, while such models are flexible and capable of capturing complex nonlinear inputoutput relationships, even with a moderate number of predictors like 2025, they become computationally challenging due to identifiability issues associated with model parameters. This is the situation in our motivating problem. For example, the MoS ReaxFF system has 45 input parameters and 599 output properties.
On the other hand, a naive statistical approach may be to fit individual regression models of responses on the inputs, and then predicting the total error based on the individual predictions. However, such an approach is not wise, because there may exist strong correlations among several properties that may not be exploited in the process. Further, any reasonable regression model should at least consider secondorder terms, i.e., square terms and pairwise interactions among the parameters. For the MoS
system, inclusion of the second order terms takes the total number of predictors to 1080, larger than the number of feasible data points obtained in the training data set. This makes linear regression an impossible proposition, although penalized regression methods like Lasso
Tibshirani (1996) can be used. Finally, independent prediction of each individual response add up the individual noises or modelfitting errors associated with each fit, resulting in a large prediction error for a utility function that combines individual discrepancies.Another popular choice is to fit a multiresponse machine learning model such as a deep learning model and use it as a surrogate. While in recent years deep learning
LeCun et al. (2015) has emerged as a popular tool for modeling multipleinput multipleoutput data, there have also been several criticisms of the approach. First, as noted by Marcus (2018) “deep learning currently lacks a mechanism for learning abstractions through explicit, verbal definition, and works best when there are thousands, millions or even billions of training examples” and “is not an ideal solution in problems where data are limited”. Second, its relative opacity has been met with criticism Ribeiro et al. (2016). Third, like several machine learning algorithms, tuning deep learning is a not straightforward.For MoS ReaxFF system, we used a combination of simple exploratory techniques for dimension reduction of the response vector and a sequentially fit clustered penalized linear regression that involves first and second order terms of input variables to obtain a surrogate model. For NiCr ReaxFF system, we fit a deep learning model as the surrogate. Applications of this approach will be demonstrated in Section 5 and details are provided in the Appendix.
Let , denote the predictor of and denote the predictor of , the feasibility indicator at input combination . Then the combined predictor of the loss function at input combination is given by
(2) 
where is a number at least as large as the maximum observed loss.
4 Simulations
We now examine the effectiveness of the proposed approach using simulations from a multiresponse system. We use a slightly modified version of the DTLZ2 function, a popular test function in multiobjective problems Deb (1999); Huband et al. (2006). The advantage of this function is its flexibility  it can be extended to any input and output dimensions and , and adjusted to create a suitable example for our problem. We first consider the case of , defining the response functions as
where , . The domain of the functions is for , making the input space is . We also assume that the infeasible region is , where , , , . We set the target vector and the weight vector .
The MED algorithm is applied to this problem with initial design sizes and . Figure 4 shows a comparison of the loss function evaluated at the points in the initial design (a 100point maximim LHD) and the MED. Clearly, the MED substantially shrinks the distribution of the loss. The best point identified by the initial design is (0.4961, 0.8342, 0.3506, 0.4381), and it produces a response vector (0.4755, 0.2920, 0.5624, 0.8219) leading to a loss of 0.2407. The best point identified by the MED is (0.0958, 0.1688, 0.3319, 0.5227), resulting in a response vector (0.7086, 0.4070, 0.8338, 0.5663) and a loss of 0.1098.
Figure 5 shows the twodimensional contour plot of the loss function (1) against and , setting at their best setting (0.3319, 0.5227) identified by the MED. The figure shows how the MED beautifully identifies the two good regions.
To confirm that the performance of the MED is not a onetime phenomenon and is robust to the choice of a reasonable spacefilling initial design, we repeated the simulations 100 times, and the results are summarized in Figure 6
. The Table compares (i) the difference between the minimum losses identified by the initial design and the MED, (ii) the difference between the medium losses identified by the initial design and the MED and (iii) the ratio of the standard deviations of the losses identified by the initial design and the MED. Out of 100 repetitions of the simulation, in 72 cases the MED identified a point with minimum loss lower than the one identified by the initial design, and in 25 cases the point with the minimum loss identified by both designs were the same. Only on five occasions, the best point identified by the MED produced a marginally smaller loss than the best point identified by the initial design.
The simulations with the DTLZ2 function described above were also conducted by increasing the input dimension to 10, 15 and 20. The output dimension was kept the same as the input dimension. In each case, the size of the initial design was varied from 50 to 200 in steps of 50. In almost all of these cases, the proposed strategy was able to locate points that were better than the best ones identified by the initial design. The computation times associated with each such simulation settings were also explored, and a representative set of times are shown in Figure 7. As seen from the figure, the computation time taken by the MED algorithm with a 20dimensional DTZL2 function with an initial design of size 200 is approximately 200 seconds.
5 Applications
We now evaluate the proposed approach with MoS and NiCr ReaxFF systems. Different surrogate models were constructed in these two examples, follow by MED algorithm to generate new points.
5.1 Application to the MoS ReaxFF system
We now demonstrate the proposed approach using the MoS ReaxFF system, which has 45 input parameters (s) and 599 material properties or responses denoted by . Using the OMLHD design algorithm described in Section 3.1, 5000 different combinations of the 45 inputs were obtained. The spacefilling property of the OMLHD is illustrated in Figure 8 where 2D scatter plots of four pairs of parameters in the feasible design space are displayed.
Out of the simulation results produced by these 5000 combinations, 3846 were found infeasible and only the remaining 1154 simulations produced results, generating values of 599 different material characteristics. The outcome of each simulation was recorded in the format shown in Table 1 with , and . The distribution of the loss function computed from these 1154 points is summarized in Table 2.
min  median  mean  max 

96459  2606367  17268394  848122062 
The next step was to fit the classification model to predict the binary outcome variable from the variables. The data matrix (45 ReaxFF parameters and feasibility outcome , ignoring the data on responses ’s) was split into a training set consisting of 4000 data points and a testing set consisting of 1000 data points. The models fit with the training data were compared on the basis of their sensitivities and specificities (defined in Section 3.2) computed from the testing data.
A logistic regression classifier that included linear terms of all ReaxFF predictors resulted in a sensitivity of 0.6422 and specificity of 0.9296 based on the testing data. The performance of the random forest classifier, like several other machine learning algorithms, depends on the choice of the tuning parameters. Two of the most important tuning parameters are NTREE (the number of trees to grow) and MTRY (the number of variables that should be selected at a node split). The random forest classifier was fit with the training data with several combinations of NTREE and MTRY, and the performances are summarized in Table
3.Tuning parameters  Performance  

NTREE  MTRY  Sensitivity  Specificity 
200  5  0.3088  0.9837 
200  10  0.4706  0.9611 
200  15  0.5245  0.9497 
200  20  0.5147  0.9447 
200  25  0.5686  0.9372 
400  5  0.3039  0.9925 
400  10  0.4510  0.9686 
400  15  0.5245  0.9497 
400  20  0.5441  0.9410 
400  25  0.5343  0.9384 
600  5  0.2990  0.9899 
600  10  0.4461  0.9673 
600  15  0.5294  0.9472 
600  20  0.5245  0.9422 
600  25  0.5343  0.9397 
800  5  0.2843  0.9962 
800  10  0.4363  0.9673 
800  15  0.5098  0.9497 
800  20  0.5245  0.9435 
800  25  0.5490  0.9384 
1000  5  0.2941  0.9925 
1000  10  0.4461  0.9673 
1000  15  0.5000  0.9510 
1000  20  0.5392  0.9410 
1000  25  0.5441  0.9384 
It appears from Table 3 that in terms of predicting the feasibility of ReaxFF parameter combinations, the logistic regression based classifier outperforms the random forest classifier with respect to sensitivity, and is marginally inferior with respect to specificity. Considering the role of this classifier in predicting whether a promising new combination of ReaxFF parameters should be added to the exploration space, a higher true positive rate is possibly more important than achieving a higher true negative rate. This is because, a positive outcome (feasible combination) leads to a successful simulation and generates values of material characteristics, whereas a negative outcome or infeasible point does not add anything to the existing knowledge. From this perspective, the logistic regression based classifier was chosen over the random forest classifier.
As a precursor to fitting the surrogate model to predict ’s from the ’s, an exploratory analysis, following the guidelines provided in Appendix Exploratory analysis, dimension reduction and clustering of responses, was conducted with the matrix of s by summarizing the scaled individual errors (SIE) for and in an attempt to identify the ones that have major contributions to the total discrepancy. A graphical summary of the means and standard deviations of SIEs of the 599 responses (calculated from 1154 observations) is shown in Figure 9.
Figure 9 revealed a very interesting aspect  several (in fact 190 out of 599) of the responses appeared to have negligibly small standard deviations in the generated data, meaning that they remained more or less constant over the 1154 feasible ReaxFF parameter combinations in the initial design. Clearly prediction models involving such properties are meaningless. Interestingly, most of these 190 properties also had their means very close to the gold standards, meaning they could not be improved further. On the other extreme, there were some responses that varied widely across the parameter settings and were therefore potentially interesting candidates for model fitting. As proposed in Appendix Exploratory analysis, dimension reduction and clustering of responses, using the measure given by (3), 37 properties were identified as the top contributors to the total squared SIE . These 37 properties contributed to 96.7% of the total SIE in the generated data, i.e., where represents the set of indices of these properties. These 37 properties are represented by signs in both panels of Figure 9.
The 37 shortlisted properties were grouped into five clusters on the basis of correlations among the transformed responses , i.e, the logarithms of absolute individual differences from the target. The logic behind transforming the responses is explained in Appendix A clusterbased sequential penalized regression model. Figure 10 shows the heatmap of correlations.
Finally, the predictor of the loss function at a new input combination was obtained along the lines of (2) and (4). After running the MED algorithm with this predicted loss function, it was able to identify one point with a loss of approximately 80,000, providing a substantial (17%) improvement over the best point obtained from the initial 1154 points.
5.2 Application to the NiCr ReaxFF system
We now illustrate the proposed approach using another ReaxFF system NiCr, which has 16 input parameters and 90 material properties denoted by . The main difference in this application from the previous example of the system was the surrogate modeling approach. Unlike the clusterbased sequential penalized regression model used for exploring the system, we decided to fit a deep learning (DL) model as a surrogate. Fitting a DL model typically entails generating a large number of training data points. Therefore, using the OMLHD design algorithm described earlier, 79635 different combinations of the 16 inputs were obtained, and ReaxFF simulations were conducted with these combinations. Out of these 79635 combinations, 4999 were found infeasible. The rest produced meaningful values of the responses. The distribution of the loss function obtained from these points is summarized in Table 4.
min  median  mean  max 

2812  2416505  63081433 
The data obtained from the feasible region were split into two parts  a training set comprising of the points, and a testing set consisting of the remaining 20%. The training data were used to fit the DL model with two hidden layers and 90 nodes and the model was tested using mean absolute error (MAE) as the measure of prediction accuracy. Details on fitting the DL algorithm can be found in Sengul et al. (2020).
The trained DL model permitted prediction of material properties at any new input combination . Consequently, the predictor of was obtained along the lines of (2) and (4). The initial 200run design for MED was chosen from a region centered around the parameter combination with the smallest loss obtained from the initial design points. After running the MED algorithm with the surrogate loss function based on the DL model, several points with losses smaller than 2812 were identified. The best point with a loss 2316.17 provided a substantial () improvement over the best point obtained from the initial 79635 points. Our approach produced a significant improvement over the conventional method, which was not able to find input points with loss lower than 5614.26. In addition, it also stood out in terms of computation time by producing several good parameter combinations, including the optimum, in about two minutes compared to two weeks taken by an experienced force field developer using the conventional method.
6 Concluding remarks
In this article, we have proposed a framework for finding good points in an input space that produce a vector of responses close to their targets, where the “goodness” or “closeness” is defined in terms of a onedimensional loss function. This problem is different from traditional global optimization because of the possible existence of several good regions with almost similar values of the loss function, and there are large unknown input regions that are infeasible in the sense that they do not produce any output. The responses are assumed to be deterministic, but can be extended to accomodate noise. The problem is motivated by and applied to the ReaxFF optimization systems in physics, and provide initial results that are encouraging. Two key elements of the proposed approach are classifying the input space into feasible and infeasible regions and intelligently sampling points from the good regions using a method called minimum energy designs.
Another popular approach to define “goodness” of points in multiresponse systems is to find solutions that are Pareto optimal Lu et al. (2011); Lu and AndersonCook (2014); Lu et al. (2014); Chen et al. (2018), i.e., solutions that cannot be improved so as to make any one response closer to the target without making at least another response move farther away from its target. However, finding Pareto optimal solutions with unknown constraints, and input and output dimensions as high as what is encountered in a typical ReaxFF problem is rarely addressed. This can be an interesting direction for future research.
Acknowledgment
This work was supported by NSF NSF EAGER Grant No. DMR 1842952, DMR 1942922 and MRI 1626251.
Appendix
Exploratory analysis, dimension reduction and clustering of responses
Recall that our objective is to obtain input combinations that produce low discrepancies of responses from respective targets, i.e., for . We first take a close look at the matrix of scaled individual errors (SIE) , for and , where ’s denote weights, in an attempt to identify the ones that have major contributions to the total discrepancy. Plots of summary statistics of the SIE’s, i.e., their sample means , sample standard deviations for
or sample quantiles provide useful information on the distributions of SIEs for each response.
Based on such exploratory analysis, one can choose a subset of output characteristics that contributes to at least an aimed proportion of the total squared SIE as a criterion. That is, we can choose a subset of the set of indices satisfying , where
(3) 
represents the proportion of total squared SIE from all responses explained by the responses for in the data generated and is a chosen threshold, which could be 0.80 or 0.95, for example.
A clusterbased sequential penalized regression model
We now consider prediction of the responses for , where is the set of responses identified in the previous section. Let denote the cardinality of . A naive strategy is to fit independent regression models of each of the responses on the input variables . However, such a strategy does not take into account potential correlations present within the properties. We propose fitting a clusterbased penalized regression model to address this problem. Instead of modeling the responses , specially for the ReaxFF application, we propose modeling the transformed responses (TR) where is the target for and is the weight defined earlier. This transformation is justified by the fact that it is the absolute difference between the response and target that needs to be made small. Further, in the context of the ReaxFF problem, for most of the responses the distribution of the absolute individual error
appeared to have moderate to heavy skewness, that could be corrected using the log transformation.
The first task is to cluster the TR variables into clusters based on their correlation matrix. This can be done by using any clustering package in a standard statistical computing. Let denote the number of clusters and the number of TR variables in clusters respectively.
Let denote the TRs for cluster . We use the following algorithm to fit a set of predictor models for .

For each TR
in the cluster, using a Lasso regression
Tibshirani (1996), select significant predictors in the penalized linear model of on all first and second order terms, i.e., , and for . 
From these models, select the one with the maximum predictive power (determined using outofsample meansquared error, cross validation methods or adjusted ). This becomes the baseline model of the cluster, interpreted as the one with maximum predictive power solely based on the input variables . Let denote the TR in the baseline model, and we call the baseline TR in cluster . Let represent the baseline model.

Now consider the prediction of the remaining TRs in this cluster. Pick the TR, say , that can be best predicted using the predictors chosen by Lasso in step (i), and the baseline TR already identified in step (ii). Then update the model for by including the baseline TR as a predictor in addition to the terms if it satisfies a prespecified inclusion rule (like achieving a threshold improvement in outofsample error or adjusted ). Let denote this model.

Repeat step (iii) sequentially for the remaining TRs in the cluster, and include TRs from the previous steps to predict them in addition to the ’s if the inclusion rule is satisfied.
The above procedure is repeated for each cluster. Thus we have, for each cluster, a collection of models that predict each TR in that cluster. Let denote the predicted TR for the th response of cluster . Then the surrogate model for the loss function at input combination is
(4) 
where is a very large number and the predicted binary response outcome from the classification model.
References
 Box and Draper (1987) Box, G. E. P. and Draper, N. R. (1987). Empirical ModelBuilding and Response Surfaces. Wiley, New York, NY.
 Breiman (2001) Breiman, L. (2001). Random forests. Machine Learning, 45:5–32.
 Chen et al. (2018) Chen, P., Santner, T., and Dean, A. (2018). Sequential pareto minimization of physical systems using calibrated computer simulators. Statistica Sinica, 28:671–692.

Cohn et al. (1996)
Cohn, D., Ghahramani, Z., and Jordan, M. (1996).
Active learning with statistical models.
Journal of Artificial Intelligence Research
, page 129–145.  Deb (1999) Deb, K. (1999). Multiobjective genetic algorithms: Problem difficulties and construction of test problems. Evolutionary Computation, 7:205–230.
 Dittner et al. (2015) Dittner, M., Muller, J., Aktulga, H. M., and Hartke, B. (2015). Efficient global optimization of reactive forcefield parameters. J Comput Chem, 36:1550–1561.
 Hastie et al. (2009) Hastie, T., Tibshirani, R., and Friedman, J. H. (2009). The elements of statistical learning: data mining, inference, and prediction. New York: Springer.
 Henkenjohann et al. (2005) Henkenjohann, N., Gobel, R., Kleiner, M., and Kunert, J. (2005). An adaptive sequential procedure for efficient optimization of the sheet metal spinning process. Quality and Reliability Engineering International, 21:439–455.
 Henkenjohann and Kunert (2007) Henkenjohann, N. and Kunert, J. (2007). An efficient sequential optimization approach based on the multivariate expected improvement criterion. Quality Engineering, 19:267–280.
 Huband et al. (2006) Huband, S., Hingston, P., Barone, L., and While, L. (2006). A review of multiobjective test problems and a scalable test problem toolkit. IEEE Transactions on Evolutionary Computation, 10:477–506.
 Iype et al. (2013) Iype, E., Hutter, M., Jansen, A. P., Nedea, S. V., and Rindt, C. C. (2013). Parameterization of a reactive force field using a monte carlo algorithm. J Comput Chem, 34:1143–1154.
 Joseph (2016) Joseph, V. R. (2016). Spacefilling designs for computer experiments: A review. Quality Engineering, 28:28–35.
 Joseph et al. (2015) Joseph, V. R., Dasgupta, T., Tuo, R., and Wu, C. F. J. (2015). Sequential exploration of complex surfaces using minimum energy designs. Technometrics, 57:64–74.
 Joseph and Hung (2008) Joseph, V. R. and Hung, Y. (2008). Orthogonal maximin latin hypercube designs. Statistica Sinica, 18:171–186.
 Joseph et al. (2019) Joseph, V. R., Wang, D., Gu, L., Lv, S., and Tuo, R. (2019). Deterministic sampling of expensive posteriors using minimum energy designs. Technometrics, 61:297–308.
 Larsson et al. (2013) Larsson, H. R., van Duin, A. C., and Hartke, B. (2013). Global optimization of parameters in the reactive force field reaxff for sioh. J Comput Chem, 34:2178–2189.
 LeCun et al. (2015) LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature, 521:436–444.
 Lu et al. (2014) Lu, L., AndersonCook, C., and Lin, D. (2014). Optimal designed experiments using a Pareto front search for focused preference of multiple objectives. Computational Statistics and Data Analysis, 30:1178–1192.
 Lu and AndersonCook (2014) Lu, L. and AndersonCook, C. M. (2014). Balancing multiple criteria incorporating cost using Pareto front optimization for splitplot designed experiments. Quality and Reliability Engineering International, 30:37–55.
 Lu et al. (2011) Lu, L., AndersonCook, C. M., and Robinson, T. J. (2011). Optimization of designed experiments based on multiple criteria utilizing a Pareto frontier. Technometrics, 53:353–365.
 Marcus (2018) Marcus, G. (2018). Deep learning: A critical appraisal. arXiv preprint arXiv:1801.00631, 2018  arxiv.org, 36.
 McKay et al. (1979) McKay, M., Beckman, R., and Conover, W. (1979). A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21:239–245.
 Ribeiro et al. (2016) Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). Why should I trust you? : Explaining the predictions of any classifier. KDD ’16 Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144.
 Sengul et al. (2020) Sengul, M., Nayir, N., Gao, Y., Hung, Y., Dasgupta, T., and van Duin, A. (2020). An initial designenhanced deep learningbased optimization framework to parameterize multicomponent reaxff force fields. Technical report, chemrxiv.

Shchygol et al. (2019)
Shchygol, G., Yakovlev, A., Trnka, T., van Duin, A. C. T., and Verstraelen, T.
(2019).
Parameter optimization with montecarlo and evolutionary algorithms: Guidelines and insights.
J Chem Theory Comput, 15:6799–6812.  Sung and Niyogi (1995) Sung, K. and Niyogi, P. (1995). Active learning for function approximation. Proc. Advances in Neural Information Processing Systems, page 7.
 Tibshirani (1996) Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B, 58:267–288.
 Wang and Joseph (2018) Wang, D. and Joseph, V. R. (2018). mined: Minimum Energy Designs. R package version 1.02 — For new features, see the ’Changelog’ file (in the package source).
 Wu and Hamada (2009) Wu, C. F. J. and Hamada, M. S. (2009). Experiments: Planning, Analysis and Optimization. Wiley, Hoboken, NJ.
Comments
There are no comments yet.