# Neural Networks for Full Phase-space Reweighting and Parameter Tuning

Precise scientific analysis in collider-based particle physics is possible because of complex simulations that connect fundamental theories to observable quantities. The significant computational cost of these programs limits the scope, precision, and accuracy of Standard Model measurements and searches for new phenomena. We therefore introduce Deep neural networks using Classification for Tuning and Reweighting (DCTR), a neural network-based approach to reweight and fit simulations using the full phase space. DCTR can perform tasks that are currently not possible with existing methods, such as estimating non-perturbative fragmentation uncertainties. The core idea behind the new approach is to exploit powerful high-dimensional classifiers to reweight phase space as well as to identify the best parameters for describing data. Numerical examples from e^+e^-→jets demonstrate the fidelity of these methods for simulation parameters that have a big and broad impact on phase space as well as those that have a minimal and/or localized impact. The high fidelity of the full phase-space reweighting enables a new paradigm for simulations, parameter tuning, and model systematic uncertainties across particle physics and possibly beyond.

## Appendix A Optimal Functions

The results presented here can be found (as exercises) in textbooks, but are repeated here for easy access. Let $X$ be some discriminating features and let $Y \in \{0, 1\}$ be another random variable representing class membership. Consider the general problem of minimizing some average loss for the function $f(x)$:

$$f = \operatorname*{argmin}_{f'} \, \mathbb{E}\big[\text{loss}(f'(X), Y)\big], \tag{5}$$

where $\mathbb{E}$ means 'expected value', i.e. average value or mean (sometimes represented as $\langle \cdot \rangle$). The expectation values are performed over the joint probability density of $(X, Y)$. One can rewrite Eq. 5 as

$$f = \operatorname*{argmin}_{f'} \, \mathbb{E}\Big[\mathbb{E}\big[\text{loss}(f'(X), Y) \,\big|\, X\big]\Big]. \tag{6}$$

The advantage of writing the loss as in Eq. 6 is that one can see that it is sufficient to minimize the function (and not functional) $\mathbb{E}[\text{loss}(z, Y) \mid X = x]$ for all $x$. To see this, let $g(x) = \operatorname{argmin}_z \mathbb{E}[\text{loss}(z, Y) \mid X = x]$ and suppose that $h(x)$ is a function with a strictly smaller loss in Eq. 5 than $g$. Since the average loss for $h$ is below that of $g$, by the intermediate value theorem, there must be an $x$ for which the average loss for $h$ is below that of $g$, contradicting the construction of $g$. (Footnote 2: The derivation below for the mean-squared error was partially inspired by Appendix A in Ref. Cranmer et al. (2015).)
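The pointwise argument can be checked numerically on a toy problem. The following is a minimal sketch, not taken from the paper: it assumes a discrete feature with three values, hypothetical class probabilities `p_true`, a squared-error loss, and a simple grid search over $z$. The helper name `pointwise_minimizer` is introduced here for illustration only.

```python
import numpy as np

def pointwise_minimizer(x, y, loss, z_grid):
    """For each distinct value of x, return the z on z_grid that minimizes
    the average of loss(z, y) over the samples sharing that x."""
    best = {}
    for value in np.unique(x):
        y_sel = y[x == value]
        avg_loss = np.array([loss(z, y_sel).mean() for z in z_grid])
        best[int(value)] = float(z_grid[np.argmin(avg_loss)])
    return best

def squared_error(z, y):
    return (z - y) ** 2

rng = np.random.default_rng(0)
n = 30_000
x = rng.integers(0, 3, size=n)          # toy discrete feature X in {0, 1, 2}
p_true = np.array([0.2, 0.5, 0.9])      # assumed Pr(Y=1 | X=x), illustration only
y = (rng.random(n) < p_true[x]).astype(float)

z_grid = np.linspace(0.0, 1.0, 1001)    # candidate outputs z, spacing 0.001
g = pointwise_minimizer(x, y, squared_error, z_grid)

# Empirical conditional mean of Y for each x, for comparison with g.
cond_mean = {int(v): float(y[x == v].mean()) for v in np.unique(x)}

for v in sorted(g):
    print(f"x={v}: pointwise minimizer={g[v]:.3f}, conditional mean={cond_mean[v]:.3f}")
```

In this sketch, minimizing the loss separately at each $x$ recovers the empirical conditional mean of $Y$ up to the grid spacing, which is the behavior the argument above relies on.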

Now, consider the case where the loss is cross-entropy:

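$$\max_z \, \mathbb{E}\big[Y \log(z) + (1 - Y) \log(1 - z) \,\big|\, X\big] \tag{7}$$

$$= \max_z \Big(\mathbb{E}[Y \mid X] \log(z) + \big(1 - \mathbb{E}[Y \mid X]\big) \log(1 - z)\Big), \tag{8}$$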

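where $X = x$ is fixed. Setting the derivative of Eq. 8 with respect to $z$ to zero gives $\mathbb{E}[Y \mid X = x]/z = (1 - \mathbb{E}[Y \mid X = x])/(1 - z)$, so Equation 7 is maximized for $z = \mathbb{E}[Y \mid X = x]$. Coincidentally, the exact same result holds if using mean squared error loss. When using either loss function with two outputs and the softmax activation for the last neural network layer, the first output will asymptotically approach $g(x) = \mathbb{E}[Y \mid X = x]$ and the other by construction will be $1 - g(x)$. The ratio of these two outputs is then: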

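$$\frac{g(x)}{1 - g(x)} = \frac{\mathbb{E}[Y \mid X = x]}{\mathbb{E}[1 - Y \mid X = x]} \tag{9}$$

$$= \frac{\Pr(Y = 1 \mid X = x)}{\Pr(Y = 0 \mid X = x)} \tag{10}$$

$$= \frac{p(x \mid Y = 1) \Pr(Y = 1)}{p(x \mid Y = 0) \Pr(Y = 0)} \tag{11}$$

$$= \text{Likelihood ratio} \times \frac{\Pr(Y = 1)}{\Pr(Y = 0)}. \tag{12}$$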

Therefore, the ratio of the two outputs is proportional to the likelihood ratio. The proportionality constant is the ratio of the fractions of the two classes used during training. In the paper, the two classes always have the same number of examples, and thus this factor is unity.
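The relation between the classifier output and the likelihood ratio can be illustrated on a case where the answer is known in closed form. The sketch below is an assumption-laden toy, not the networks used in the paper: it replaces the two-output softmax network with a one-dimensional logistic regression (a single sigmoid output, which is equivalent to a two-output softmax), fits it with Newton's method, and uses two unit-width Gaussians centered at 0 and 1, for which the likelihood ratio is $e^{x - 1/2}$. The function names and sample sizes are hypothetical.

```python
import numpy as np

def fit_logistic(x, y, n_iter=25):
    """Fit g(x) = sigmoid(w*x + b) by minimizing the cross-entropy
    with Newton's method. Returns the array (w, b)."""
    A = np.column_stack([x, np.ones_like(x)])      # features plus intercept
    theta = np.zeros(2)
    for _ in range(n_iter):
        g = 1.0 / (1.0 + np.exp(-A @ theta))
        grad = A.T @ (g - y)                       # gradient of summed cross-entropy
        hess = A.T @ (A * (g * (1.0 - g))[:, None])
        theta -= np.linalg.solve(hess, grad)
    return theta

def sample(n0, n1, rng):
    """Draw n0 examples from N(0, 1) with Y=0 and n1 from N(1, 1) with Y=1."""
    x = np.concatenate([rng.normal(0.0, 1.0, n0), rng.normal(1.0, 1.0, n1)])
    y = np.concatenate([np.zeros(n0), np.ones(n1)])
    return x, y

def true_likelihood_ratio(x):
    # p(x | Y=1) / p(x | Y=0) for N(1, 1) versus N(0, 1)
    return np.exp(x - 0.5)

def output_ratio(w, b, x):
    g = 1.0 / (1.0 + np.exp(-(w * x + b)))
    return g / (1.0 - g)

rng = np.random.default_rng(1)
x_test = np.linspace(-1.0, 2.0, 7)

# Balanced classes: the prior factor Pr(Y=1)/Pr(Y=0) is unity.
w, b = fit_logistic(*sample(50_000, 50_000, rng))
ratio_balanced = output_ratio(w, b, x_test)

# Imbalanced classes (3:1): the output ratio picks up the prior factor.
w3, b3 = fit_logistic(*sample(20_000, 60_000, rng))
ratio_imbalanced = output_ratio(w3, b3, x_test)
prior_factor = 60_000 / 20_000

print(ratio_balanced / true_likelihood_ratio(x_test))
print(ratio_imbalanced / (prior_factor * true_likelihood_ratio(x_test)))
```

With equal class sizes the printed ratios should be close to one, and with the 3:1 mixture they should be close to one only after dividing by the class-fraction factor, matching the role of $\Pr(Y=1)/\Pr(Y=0)$ in Eq. 12. A logistic model is used here only because it is exactly the right functional form for this Gaussian toy; a flexible neural network would be needed for realistic phase space.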