Surrogate models were developed to resolve computational limitations in the analysis of massive datasets by replacing a resource-expensive procedure with a much cheaper approximation. They are especially useful in applications where numerous evaluations of an expensive procedure are required over the same or similar domains, e.g. in the parameter optimisation of a theoretical model. The term “metamodel” proves especially meaningful in this case, when the surrogate model approximates a computational process which is itself a model for a (perhaps unknown) physical process. There exists a spectrum between “physical” surrogates, which are constructed with some contextual knowledge in hand, and “empirical” surrogates, which are derived purely from samples of the underlying expensive model.
In this work, we develop an empirical surrogate model for the tritium breeding ratio (TBR) in an inertial confinement fusion (ICF) reactor. The expensive model that our surrogate approximates is a Monte Carlo (MC) neutronics simulation, Paramak, which returns a prediction of the TBR for a given configuration of a spherical ICF reactor. Although more expensive 3D parametric models exist, we chose the Paramak simulation for its preferable speed in dataset generation, allowing us to most fully demonstrate our methods.
Paramak facilitates simulation via an OpenMC neutronics workflow enclosed in a portable Docker container, which conveniently exposes an HTTP API using the Python 3 flask package. For the purposes of our work, we used this setup to simulate a point source with a Muir energy distribution around , approximating a deuterium-tritium (D-T) plasma neutron source.¹ Illustrated in Figure 1, the simulated geometry was deliberately left adjustable so that the dependency of TBR on various parameters may be explored. Nuclear data for simulation were extracted from the following sources, in order of preference: FENDL 3.1d; JEFF 3.3; and ENDF/B-VII.1. To maintain a model-agnostic approach, variance reduction (VR) techniques were not used to accelerate the MC neutronics simulation. It should be noted that, depending on the application, VR may constitute a viable alternative to the presented work.

¹ A bug in the Muir distribution involving erroneous normal sampling was recently uncovered (see https://github.com/openmc-dev/openmc/pull/1670), but is disregarded in the present approach-focused work.
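For orientation, querying such a containerized service from Python might look as follows. The route, request schema and "tbr" response field below are hypothetical placeholders: the actual API is defined by the flask app inside the Paramak container and may differ.

```python
import json
import urllib.request

# Hypothetical endpoint; the real route is defined by the container's flask app.
PARAMAK_URL = "http://localhost:8080/simulate"

def make_config(thickness: float, breeder_material: str) -> dict:
    """Assemble one (partial, illustrative) reactor configuration."""
    return {"thickness": thickness, "breeder_material": breeder_material}

def query_tbr(config: dict, timeout: float = 600.0) -> float:
    """POST one configuration to the service and return the predicted TBR.

    The JSON response field name ("tbr") is an assumption for illustration.
    """
    req = urllib.request.Request(
        PARAMAK_URL,
        data=json.dumps(config).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["tbr"]

if __name__ == "__main__":
    print(query_tbr(make_config(100.0, "Li4SiO4")))
```

Wrapping the HTTP call this way makes the expensive model interchangeable with any cheap stand-in during surrogate development.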
For the remainder of Section 1, we will define the TBR and further motivate our research. In Section 2 we will present our methodologies for the comparison testing of a wide variety of surrogate modelling techniques, as well as a novel adaptive sampling procedure suited to this application. After delivering the results of these approaches in Section 3, we will give our final conclusions and recommendations in Section 4.
1.1 Problem Description
Nuclear fusion technology relies on the production and containment of an extremely hot and dense plasma of hydrogen isotopes. The current frontier generation of fusion reactors, such as the Joint European Torus (JET) and the under-construction International Thermonuclear Experimental Reactor (ITER), makes use of both tritium and deuterium fuel. While at least one deuterium atom occurs for every molecules of naturally sourced water and may be easily distilled, tritium is extremely rare in nature. It may be produced indirectly through irradiation of heavy water (D2O) during nuclear fission, but only at very low rates which could never sustain industrial-scale fusion power.
Instead, modern D-T reactors rely on tritium breeding blankets, specialized layers of material which partially line the reactor and produce tritium upon neutron bombardment, e.g. by

  ⁶Li + n → ⁴He + T,    ⁷Li + n → ⁴He + T + n′,
where T represents tritium and ⁷Li, ⁶Li are the more and less common isotopes of lithium, respectively. The TBR is defined as the ratio between tritium generation in the breeding blanket and tritium consumption in the reactor. The reactor description in Paramak is facilitated by two classes of parameters (exhaustively listed in Table 1): while the geometry of a given reactor is described by continuous parameters, material selections are specified by discrete categorical parameters. For all parameters, we have attempted to cover the full theoretical range of values, even where those values are practically infeasible with current technology (e.g. packing fractions close to 1). Simulating broadly around typical values of parameters is necessary to construct the most accurate model of the typical range, and also aids in demonstrating the robustness of constructed models.
In our work, we set out to produce a fast TBR function which takes these same input parameters and approximates the MC model used in Paramak with the greatest achievable regression performance, while also minimising the required quantity of expensive samples.
Table 1: Parameters describing the reactor configuration.

Blanket parameters: breeder fraction†, breeder Li enrichment fraction, breeder material, breeder packing fraction, coolant fraction†, coolant material, multiplier fraction†, multiplier material, multiplier packing fraction, structural fraction†, structural material, thickness.

First wall parameters: armour fraction‡, coolant fraction‡, coolant material, structural fraction‡, structural material, thickness.
Labeling the expensive Paramak model f, a surrogate is a mapping f̂ such that f and f̂ minimize a selected dissimilarity metric. In order to be considered viable, f̂ is required to achieve an expected evaluation time lower than that of f. In this work, we consider two methods of producing viable surrogates: (1) a conventional decoupled approach, which evaluates f on a set of randomly sampled points and trains surrogates in a supervised scheme, and (2) an adaptive approach, which attempts to compensate for localized regression performance insufficiencies by interleaving multiple epochs of sampling and training.
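A minimal sketch of the decoupled approach (1), using a cheap analytic stand-in for the expensive model (the true samples come from Paramak) and a scikit-learn regressor from one of the families considered later:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def expensive_model(x: np.ndarray) -> np.ndarray:
    """Stand-in for the expensive Paramak evaluation (illustration only)."""
    return np.sin(np.pi * x[:, 0]) * np.cos(np.pi * x[:, 1]) + 0.1 * x[:, 2]

# (1) Sample the domain uniformly at random and evaluate the expensive model...
X = rng.uniform(0.0, 1.0, size=(2000, 3))
y = expensive_model(X)

# ...then train a surrogate in a single supervised pass.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
surrogate = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, surrogate.predict(X_test)))
```

All sampling and training happen once, decoupled from each other, in contrast with the adaptive approach of Section 2.2.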
Table 2: Surrogate families considered in this work.

Surrogate family                     Abbr.   Impl.
Support vector machines              SVM     SciKit (3)
Gradient boosted trees [7, 8, 13]    GBT     SciKit (11)
Extremely randomized trees           ERT     SciKit (7)
AdaBoosted decision trees            ABT     SciKit (3)
Gaussian process regression          GPR     SciKit (2)
k nearest neighbours                 KNN     SciKit (3)
Artificial neural networks           ANN     Keras (2)
Inverse distance weighting           IDW     SMT (1)
Radial basis functions               RBF     SMT (3)
Note that ABTs can be viewed as a subclass of GBTs.
Table 3: Metrics used for surrogate comparison.

Regression performance metrics: mean absolute error (MAE), standard deviation of error, coefficient of determination, adjusted coefficient of determination.
Computational complexity metrics: mean training time, mean prediction time, relative speedup.
This corresponds to the evaluation of Paramak on all points of the test set. For surrogates, the equivalent time period is referred to as “prediction time.”
For both methods, we selected state-of-the-art regression algorithms to perform surrogate training on sampled point sets. Listed in Table 2, these implementations define 9 surrogate families that are later reviewed in Section 3. We note that each presented algorithm defines hyperparameters that may influence its performance. Their problem-specific optimal values are searched for within the scope of this work, in particular in Experiments 1 & 2, which are outlined in Section 2.1.
To compare the quality of the produced surrogates, we define a number of metrics listed in Table 3. For regression performance analysis, we include a selection of absolute metrics to assess the models’ approximation capability and set practical bounds on the expected uncertainty of their predictions. In addition, we also track relative measures that are better suited for comparison between this work and others, as they are invariant with respect to the selected domain and image space. For analysis of computational complexity, surrogates are assessed in terms of wall time (captured by the Python 3 time package). This is motivated by common practical use cases of our work, where models are trained and used as drop-in replacements for Paramak. All times reported (training, test, evaluation) are normalized by the corresponding dataset size, i.e. they correspond to the time to process a single datapoint.
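The regression metrics of Table 3 can be computed directly. The helper below is a sketch assuming the standard definitions of MAE, error standard deviation, R² and adjusted R²:

```python
import numpy as np

def regression_metrics(y_true, y_pred, n_features):
    """Regression metrics of Table 3: MAE, standard deviation of the error,
    coefficient of determination (R^2), and adjusted R^2."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    n = y_true.size
    mae = np.abs(err).mean()
    std_err = err.std(ddof=1)                      # sample std of residuals
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    return {"MAE": mae, "std_err": std_err, "R2": r2, "adj_R2": r2_adj}
```

The adjusted R² penalizes dimensionality, which matters here since the full feature space mixes many continuous and categorical parameters.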
Even though some surrogates support acceleration by means of parallelisation, we used non-parallelized implementations. The only exception is ANNs, which require a considerable amount of processing power for training on conventional CPU architectures. Lastly, to prevent undesirable bias due to training set selection, all reported metrics are obtained via 5-fold cross-validation. In this setting, a sample set is uniformly divided into 5 disjoint folds, each of which is used as a test set for models trained on the remaining 4. Having repeated the same experiment for each division, the overall value of each metric is reported in terms of its mean and standard deviation over all folds.
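The 5-fold cross-validation and per-sample wall-time measurement described above might be sketched as follows, with a cheap analytic target standing in for the TBR samples:

```python
import time
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 4))
y = np.sin(np.pi * X).sum(axis=1)  # cheap stand-in target

scores, train_times = [], []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True,
                                 random_state=0).split(X):
    model = ExtraTreesRegressor(n_estimators=100, n_jobs=1, random_state=0)
    t0 = time.perf_counter()
    model.fit(X[train_idx], y[train_idx])
    # normalise wall time by training-set size, as described in the text
    train_times.append((time.perf_counter() - t0) / len(train_idx))
    scores.append(r2_score(y[test_idx], model.predict(X[test_idx])))

print(f"R^2 = {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```

Reporting the mean and standard deviation over folds, as done here, matches the protocol used for all metrics in this work.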
2.1 Decoupled Approach
Experiments related to the decoupled approach are organized in 4 parts, further described in this section. In summary, we aim to optimize the hyperparameters of each surrogate family separately and later use the best found results to compare surrogate families among themselves.
The objective of Experiment 1 is to simplify the regression task for surrogates prone to suboptimal performance in discontinuous spaces. To this end, training points are filtered to a single selected discrete feature assignment, and surrogates are trained only on the remaining continuous features. This is repeated several times to explore variance in behaviour, in particular for 4 distinct assignments obtained by setting the blanket and first wall coolant materials to one of: . Experiment 2 conventionally measures surrogate performance on the full feature space without any parameter restrictions. In both experiments, hyperparameter tuning is facilitated by Bayesian optimisation, where we select the hyperparameter configuration producing the model that maximizes . The process is terminated after 1000 iterations or 2 days, whichever condition is satisfied first. The results of Experiments 1 & 2 are depicted in Figures 3 & 4 respectively, and described in Section 3.1.1.
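As an illustration of the tuning loop, the following sketches Bayesian optimisation of a single hyperparameter using a Gaussian-process prior and an expected-improvement acquisition. The objective is a cheap stand-in for the cross-validated surrogate score, and the implementation details (kernel, acquisition, candidate grid) are assumptions for illustration, not those used in the study:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def objective(h):
    """Stand-in for 'score of a surrogate trained with hyperparameter h'."""
    return -(h - 0.3) ** 2 + 1.0

bounds = (0.0, 1.0)
X_obs = rng.uniform(*bounds, size=(3, 1))            # initial random evals
y_obs = np.array([objective(x[0]) for x in X_obs])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5),
                              normalize_y=True, alpha=1e-6)
for _ in range(20):
    gp.fit(X_obs, y_obs)
    # expected improvement over a dense candidate grid
    cand = np.linspace(*bounds, 200).reshape(-1, 1)
    mu, sigma = gp.predict(cand, return_std=True)
    best = y_obs.max()
    z = (mu - best) / np.maximum(sigma, 1e-12)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = cand[np.argmax(ei)]                     # most promising point
    X_obs = np.vstack([X_obs, x_next])
    y_obs = np.append(y_obs, objective(x_next[0]))

print("best hyperparameter:", X_obs[np.argmax(y_obs)][0])
```

In the actual experiments the objective is far more expensive (a full cross-validated training run), which is precisely why a sample-efficient optimiser is used.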
In Experiment 3, the 20 best-performing hyperparameter configurations for each model family are used to train surrogates on sets of various sizes to investigate their scaling properties. In particular, we consider sets of sizes in thousands of samples to better characterize, for each family, the relationship between sample count and the metrics of interest listed in Table 3. The results of this experiment are shown in Figure 5 and discussed in Section 3.1.2.
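The scaling study reduces to a loop of the following shape, shown here with a cheap analytic stand-in target and a single family (the actual experiment sweeps all families and the metrics of Table 3):

```python
import time
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)

def cheap_stand_in(X: np.ndarray) -> np.ndarray:
    """Illustrative replacement for expensive TBR samples."""
    return np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2

# fixed test set, shared across all training sizes
X_test = rng.uniform(size=(1000, 2))
y_test = cheap_stand_in(X_test)

for n in [250, 500, 1000, 2000]:
    X_train = rng.uniform(size=(n, 2))
    y_train = cheap_stand_in(X_train)
    t0 = time.perf_counter()
    model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
    t_train = (time.perf_counter() - t0) / n      # per-sample training time
    mae = mean_absolute_error(y_test, model.predict(X_test))
    print(f"n={n:5d}  MAE={mae:.4f}  train time/sample={t_train:.2e} s")
```

Plotting these per-size metrics against n yields scaling curves of the kind shown in Figure 5.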
2.2 Adaptive Approach
Adaptive sampling techniques appear frequently in the literature and have been specialized for surrogate modelling, where precision is implicitly limited by the quantity of training samples available from the expensive model. Garud's “Smart Sampling Algorithm” achieved notable success by incorporating surrogate quality and crowding distance scoring to identify optimal new samples, but was only tested on a single-parameter domain. We theorized that a nondeterministic sample generation approach, built around Markov chain Monte Carlo (MCMC) methods, would fare better for high-dimensional models by more thoroughly exploring all local optima in the feature space. MCMC produces each sample point from the previous point according to a shared proposal distribution. These sample points converge to a desired posterior distribution, so long as the acceptance probability meets certain statistical criteria (see for a review).
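A minimal random-walk Metropolis sampler of the kind referred to here, with a Gaussian proposal shared between steps:

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_samples, step=0.5, seed=0):
    """Random-walk Metropolis sampler.

    Each sample point is proposed from the previous one via a shared
    Gaussian proposal distribution; the acceptance rule ensures the chain
    converges to the density proportional to exp(log_target).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    chain = []
    for _ in range(n_samples):
        proposal = x + rng.normal(scale=step, size=x.shape)
        # accept with probability min(1, target(proposal) / target(x))
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal
        chain.append(x.copy())
    return np.array(chain)

# Example: draw from a 1-D standard normal posterior.
samples = metropolis_hastings(lambda v: -0.5 * np.sum(v ** 2),
                              x0=np.zeros(1), n_samples=5000)
```

Working in log space avoids numerical underflow for sharply peaked targets, which is relevant when the target is a localized error landscape.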
Many researchers have embedded surrogate methods into MCMC strategies for parameter optimisation [27, 12], in particular the ASMO-PODE algorithm, which makes use of MCMC-based adaptive sampling. Our approach draws inspiration from ASMO-PODE, but instead uses MCMC to generate samples which increase surrogate precision throughout the entire parameter space.
We designed the Quality-Adaptive Surrogate Sampling algorithm (QASS, Figure 2) to iteratively increment the training/test set with sample points which maximize surrogate error and minimize a crowding distance metric (CDM) in feature space. On each iteration following an initial training of the surrogate on uniformly random samples, the surrogate was trained and its absolute error calculated. MCMC was then performed on the error function generated by nearest-neighbour interpolation of these test error points. The resultant samples were culled according to the CDM, and the highest-error candidates were then selected for reintegration with the training/test set, beginning another training epoch. Validation was also performed during each iteration on independent, uniformly random sample sets.
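The loop just described can be sketched end to end. Everything below is illustrative rather than the exact QASS implementation: the stand-in expensive function, a tree-based surrogate in place of the ANN, and all step sizes and sample counts are assumptions.

```python
import numpy as np
from scipy.interpolate import NearestNDInterpolator
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)

def expensive(X):
    """Cheap stand-in for the expensive model (illustration only)."""
    return np.sin(2 * np.pi * X[:, 0]) * X[:, 1]

def crowding_cull(candidates, existing, keep):
    """Keep the `keep` candidates farthest from already-sampled points."""
    dists = np.min(np.linalg.norm(
        candidates[:, None, :] - existing[None, :, :], axis=-1), axis=1)
    return candidates[np.argsort(dists)[-keep:]]

def mcmc_on_error(err, x0, n_steps, step=0.1):
    """Random-walk MCMC targeting regions of high interpolated error."""
    x, out = x0.copy(), []
    for _ in range(n_steps):
        prop = np.clip(x + rng.normal(scale=step, size=x.shape), 0.0, 1.0)
        if rng.uniform() < min(1.0, err(prop) / max(err(x), 1e-12)):
            x = prop
        out.append(x.copy())
    return np.array(out)

# initial uniformly random training/test set
X = rng.uniform(size=(200, 2))
y = expensive(X)

for epoch in range(5):
    # train surrogate, measure absolute error on held-out points
    idx = rng.permutation(len(X))
    tr, te = idx[: len(X) // 2], idx[len(X) // 2:]
    model = ExtraTreesRegressor(random_state=0).fit(X[tr], y[tr])
    abs_err = np.abs(y[te] - model.predict(X[te]))

    # nearest-neighbour interpolation turns test errors into an error field
    interp = NearestNDInterpolator(X[te], abs_err)
    err = lambda p: float(interp(p.reshape(1, -1))[0])

    # propose candidates by MCMC, cull by crowding distance, re-evaluate
    cand = np.unique(mcmc_on_error(err, X[te[np.argmax(abs_err)]], 400), axis=0)
    new = crowding_cull(cand, X, keep=min(50, len(cand)))
    X = np.vstack([X, new])
    y = np.append(y, expensive(new))
```

Each epoch thus spends its expensive evaluations on regions where the current surrogate is demonstrably weakest, while the crowding cull discourages redundant clusters of samples.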
3.1 Decoupled Approach
3.1.1 Hyperparameter Tuning
The results displayed in Figure 3 indicate that in the first, simplified case, GBTs clearly appear to be the most accurate as well as the fastest surrogate family in terms of mean prediction time. Following that, we note that ERTs, SVMs and ANNs also achieved satisfactory results with respect to both examined metrics. While the remaining tested surrogate families do not exhibit prohibitive complexity, their regression performance falls below average.
Comparing these results with those of the second, unrestricted experiment (shown in Figure 4), we observe that many surrogate families consistently underperform. The least affected models appear to be GBTs, ANNs and ERTs, which are known to be capable of capturing relationships involving mixed feature types that were deliberately withheld in the first experiment. With only negligible differences, the first two of these families appear to be tied for the best performance as well as the shortest prediction time. We observe that ERTs and RBFs also demonstrated satisfactory results, clearly outperforming the remaining surrogates in terms of regression performance, and in some cases also in prediction time.
Following both hyperparameter tuning experiments, we conclude that while domain restrictions employed in the first case have proven effective in improving the regression performance of some methods, their performance fluctuates considerably depending on the selected slices. Furthermore, in all instances the best results are achieved by families of surrogates that do not benefit from this restriction (GBTs, ANNs, ERTs).
3.1.2 Scaling Benchmark
The results shown in Figure 5 suggest that, in terms of regression performance, the most accurate families from the previous experiments consistently maintain their relative advantage over others, even as more training points are introduced. While such families achieve nearly comparable performance on the largest dataset, on the smallest datasets tree-based ensemble approaches clearly outperform ANNs. This can be observed particularly on sets of sizes up to .
Consistent with our expectation, the shortest training times were achieved by instance-based learning methods (KNN, IDW) that are trained trivially at the expense of increased lookup complexity later during prediction. Furthermore, we observe that the majority of tree-based ensemble algorithms also perform and scale well, unlike RBFs and GPR which appear to behave superlinearly. We note that ANNs, which are the only family to utilize parallelisation during training, show an inverse scaling characteristic. We suspect that this effect may be caused by a constant multi-threading overhead that dominates the training process on relatively small sets.
Finally, all tested families with the exception of previously mentioned instance-based models offer desirable prediction times. Analogous to previous experiments, GBTs, ABTs and ANNs appear to be tied, as they not only exhibit comparable times but also similar scaling slopes. Following those, we note a clear hierarchy of ERTs, SVMs, GPR and RBFs, trailed by IDW and KNNs.
3.1.3 Model Comparison
In Experiment 4, we aim to create models that yield: (a) the best regression performance regardless of other features, (b) acceptable performance with the shortest mean prediction time, or (c) acceptable performance with the smallest training set. To this end, we trained 8 surrogates, which are presented in Figure 6 and Table 4. We compared these surrogates with the baseline represented by the Paramak per-sample evaluation time , which was measured earlier on a set of samples.
Having selected ANNs, GBTs, ERTs, RBFs and SVMs based on the results of Experiments 2 & 3, we utilized the best-performing hyperparameters. In pursuit of goal (a), the best approximator (Model 1, ANN) achieved and mean prediction time . These correspond to a standard error and a relative speedup with respect to Paramak. Satisfying goal (b), the fastest model (Model 2, ANN) achieved , , and . While these surrogates were trained on the entire available set of datapoints, to satisfy goal (c) we also trained a simpler model (Model 4, GBT) that achieved , , and with a training set of size only .
Table 4: Regression performance and computational complexity of the final models.

Models compared: 1 (ANN), 2 (ANN), 3 (GBT), 4 (GBT), 5 (ERT), 6 (ERT), 7 (RBF), 8 (SVM).
Metrics reported: regression performance (MAE [TBR], standard deviation of error [TBR], coefficient of determination [rel.], adjusted coefficient of determination [rel.]) and computational complexity (mean training time, mean prediction time, relative speedup [rel.]).
Overall, we found that due to their superior performance, boosted tree-based approaches seem advantageous for fast surrogate modelling on relatively small training sets (up to the order of ). Conversely, while neural networks perform poorly in such a setting, they dominate on larger training sets (at least of the order of ), both in terms of regression performance and mean prediction time.
3.2 Adaptive Approach
In order to test our QASS prototype, several functional toy theories for TBR were developed as alternatives to the expensive MC model. By far the most robust of these was the following sinusoidal theory over continuous parameters , with adjustable wavenumber parameter :
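The exact analytic form is not reproduced here; the snippet below is a hypothetical sinusoidal toy theory of the kind described, with k standing in for the adjustable wavenumber (the coefficients and structure used in the study may differ):

```python
import numpy as np

def toy_tbr(x, k=1.0):
    """Hypothetical sinusoidal toy TBR over continuous parameters x in [0, 1]^d.

    A larger wavenumber k yields a more oscillatory, harder-to-learn
    response surface, mimicking localized structure in the true model.
    """
    x = np.asarray(x, dtype=float)
    return float(np.mean(np.sin(np.pi * k * x) ** 2))
```

Because each evaluation is essentially free, such a toy theory allows QASS configurations to be compared over many iterations without incurring MC simulation cost.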
ANNs trained on this toy model demonstrated performance similar to those trained on the expensive MC model. QASS performance was verified by training an ANN on the sinusoidal theory for varied quantities of initial, incremental, and MCMC candidate samples.
An increase in initial samples with increment held constant had a strong impact on final surrogate precision, an early confirmation of basic functionality. An increase in MCMC candidate samples was seen to have a positive but very weak effect on final surrogate precision, suggesting that the runtime of MCMC on each iteration could be limited for increased efficiency. We also found that an optimum increment exists and is monotonic with initial sample quantity, above or below which models showed slower improvement on both the training and evaluation sets, and a larger minimum error on the evaluation set. This performance distinction will be far more significant for an expensive model such as Paramak, where the number of sample evaluations is the primary computational bottleneck.
A plateau effect in surrogate error on the evaluation set was universal to all configurations, and initially suspected to be a residual effect of retraining the same ANN instance without adjustment to data normalisation. A “Goldilocks scheme” for checking normalisation drift was implemented and tested, but did not affect QASS performance. Schemes in which the ANN is periodically retrained were also discarded, as the retention of network weights from one iteration to the next was demonstrated to greatly benefit QASS efficiency. Further insight came from direct comparison between QASS and a baseline scheme with uniformly random incremental samples, shown in Figure 7.
Such tests revealed that while QASS has unmatched performance on its own adaptively-sampled training set, it is outperformed by the baseline scheme on uniformly-random evaluation sets. We suspected that while QASS excels in learning the most strongly peaked regions of the TBR theory, this comes at the expense of precision in broader, smoother regions where uniformly random sampling suffices. Therefore a mixed scheme was implemented, with half MCMC samples and half uniformly random samples incremented on each iteration, which is also shown in Figure 7. An increase in initial sample size was observed to also resolve precision in these smooth regions of the toy theory, as the initial samples were obtained from a uniform random distribution. As shown in Figure 8, with initial samples it was possible to obtain a decrease in error as compared to the baseline scheme, from 0.0025 to 0.0015 mean absolute error. Comparing at the point of termination for QASS, this corresponds to a decrease in the number of total samples needed to train a surrogate with the same error.
We employed a broad spectrum of data analysis and machine learning techniques to develop fast and high-quality surrogates for the Paramak TBR model. After reviewing 9 surrogate model families, examining their behaviour on constrained and unrestricted feature spaces, and studying their scaling properties, we retrained the best-performing instances to produce models with properties desirable for practical use. The fastest surrogate, an artificial neural network trained on datapoints, attained an with mean prediction time of , representing a relative speedup of with respect to Paramak. Furthermore, we demonstrated the possibility of achieving comparable results using only a training set of size .
We further developed a novel adaptive sampling algorithm, QASS, capable of interfacing with any of the presented surrogate models. Preliminary testing on a toy theory, qualitatively comparable to Paramak, demonstrated the effectiveness of QASS and behavioural trends consistent with the design of the algorithm. With initial samples and 100 incremental samples per iteration, QASS achieved a decrease in surrogate error compared to a baseline random sampling scheme. Further optimisation over the hyperparameter space has strong potential to increase this performance by further reducing the number of necessary expensive samples, in particular by decreasing the required quantity of initial samples. This will allow for future deployment of QASS on the Paramak TBR model in combination with any of the most effective identified surrogates.
PM & GVG were supported by the STFC UCL Centre for Doctoral Training in Data Intensive Science (grant no. ST/P006736/1).
This project was supported by the EU Horizon 2020 research & innovation programme [grant No 758892, ExoAI]. N. Nikolaou acknowledges the support of the NVIDIA Corporation’s GPU grant.
This work has been carried out within the framework of the EUROfusion consortium and has received funding from the Euratom research and training programme 2014-2018 and 2019-2020 under grant agreement No 633053. The views and opinions expressed herein do not necessarily reflect those of the European Commission.
This work has been partly funded by the Institutional support for the development of a research organization (DKRVO, Czech Republic).
This work has also been part-funded by the RCUK Energy Programme (grant number EP/I501045/1).
- (2011) Announcement. Nuclear Data Sheets 112 (12), pp. ii. Special issue on the ENDF/B-VII.1 library.
- (2019) A Python surrogate modeling framework with derivatives. Advances in Engineering Software, pp. 102662.
- (2015) Keras. https://keras.io
- (1997) Improving regressors using boosting techniques. In ICML, Vol. 97, pp. 107–115.
- (2008) LIBLINEAR: a library for large linear classification. Journal of Machine Learning Research 9 (Aug), pp. 1871–1874.
- FENDL-3.1d: Fusion Evaluated Nuclear Data Library Ver. 3.1d. https://www-nds.iaea.org/fendl/
- (2001) Greedy function approximation: a gradient boosting machine. Annals of Statistics, pp. 1189–1232.
- (1999) Stochastic gradient boosting. Technical report, Department of Statistics, Stanford University, Stanford, CA.
- (2016) Smart sampling algorithm for surrogate model development. Computers & Chemical Engineering 96.
- (2006) Extremely randomized trees. Machine Learning 63 (1), pp. 3–42.
- (2011) Application of the two-stage Markov chain Monte Carlo method for characterization of fractured reservoirs using a surrogate flow model. Computational Geosciences 15 (4), pp. 691.
- An adaptive surrogate modeling-based sampling strategy for parameter optimization and distribution estimation (ASMO-PODE). Environmental Modelling & Software 95, pp. 61–75.
- (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Science & Business Media.
- (2013) Variance reduction techniques in Monte Carlo methods. In Encyclopedia of Operations Research and Management Science, S. I. Gass and M. C. Fu (Eds.), pp. 1598–1610.
- (2020) Surrogate Modelling of the Tritium Breeding Ratio. CDT DIS Note, Department of Physics and Astronomy, University College London.
- (2020) UCL TBR Group Project.
- (1975) On Bayesian methods for seeking the extremum. In Optimization Techniques IFIP Technical Conference, pp. 400–404.
- (2002) Response Surface Methodology: Product and Process Optimization Using Designed Experiments. 2nd edition, John Wiley & Sons, New York.
- (2019) Muir energy spectrum.
- (2011) Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12, pp. 2825–2830.
- (1968) A two-dimensional interpolation function for irregularly-spaced data. In Proceedings of the 1968 23rd ACM National Conference, pp. 517–524.
- (2012) Efficient MCMC for climate model parameter estimation: parallel adaptive chains and early rejection. Bayesian Analysis 7 (3), pp. 715–736.
- (2003) Optimization Using Surrogate Models. Ph.D. thesis, Technical University of Denmark.
- The Joint Evaluated Fission and Fusion File (JEFF) version 3.3. https://www.oecd-nea.org/dbdata/jeff/jeff33/
- (2020) Paramak.
- (2006) Gaussian Processes for Machine Learning. Vol. 2, MIT Press, Cambridge, MA.
- (2020) Surrogate-based Bayesian inverse modeling of the hydrological system: an adaptive approach considering surrogate approximation error. Water Resources Research 56 (1), e2019WR025721. doi:10.1029/2019WR025721
- (2018) An adaptive Kriging surrogate method for efficient joint estimation of hydraulic and biochemical parameters in reactive transport modeling. Journal of Contaminant Hydrology 216, pp. 50–57.