AI Safety for High Energy Physics

by Benjamin Nachman, et al.

The field of high-energy physics (HEP), along with many other scientific disciplines, is currently experiencing a dramatic influx of new methodologies powered by modern machine learning techniques. Over the last few years, a growing body of HEP literature has focused on identifying promising applications of deep learning in particular, and more recently these techniques have started to be realized in an increasing number of experimental measurements. The overall conclusion from this impressive and extensive set of studies is that rarer and more complex physics signatures can be identified with the new set of powerful tools from deep learning. However, there is an unstudied systematic risk associated with combining the traditional HEP workflow and deep learning with high-dimensional data. In particular, calibrating and validating the response of deep neural networks is in general not experimentally feasible, and therefore current methods may be biased in ways that are not covered by current uncertainty estimates. By borrowing ideas from AI safety, we illustrate these potential issues and propose a method to bound the size of unaccounted-for uncertainty. In addition to providing a pragmatic diagnostic, this work will hopefully begin a dialogue within the community about the robust application of deep learning to experimental analyses.



I Introduction

Experiments in collider-based high-energy physics (HEP) rely critically on detailed simulations, which model length scales ranging from sub-nuclear reactions to macroscopic detector dimensions, in order to connect fundamental theories to experimentally observable quantities. Typical experiments, such as measurements of physical constants or searches for new particle species, are designed using a blinded methodology and depend on these calibrated simulations to predict the relative rates of background and signal events. These predictions are in turn used to define the statistical significance and/or confidence intervals of the results observed after unblinding. While the simulations involved in this process are highly sophisticated, they are only an approximation to reality, and therefore systematic mismodeling must be accounted for by calibrating to data when possible and by assessing systematic uncertainties.

Traditionally, partitions of the data known as signal and control regions are defined by applying selective criteria on physical observables, in order to separate regions of the data that are expected to be sensitive to the phenomena of interest from regions dominated by well-understood phenomena. It is then possible to validate and/or calibrate the simulated background predictions against data observed in the signal-free control regions without biasing the blinded analysis. However, it is often the case that several different observables, perhaps following some complicated relationship, are useful for defining such regions of the data. A typical application of machine learning in HEP is to automate the construction of signal and control regions by reformulating the task as an optimization problem; for example, a binary classifier may be trained on simulations to label observed events as signal-like or background-like. While this has been done for years using “shallow” classifiers such as Boosted Decision Trees, the success of these methods has generally depended strongly on the choice of features input to the classifier, requiring a significant amount of effort in “feature engineering” to identify useful high-level observables.

With the more recent introduction of deep learning methods, it has become possible to construct increasingly elaborate classification models using higher-dimensional input features. Perhaps surprisingly, it has been shown that when provided with high-dimensional low-level (HDLL) features (i.e. observables that are minimally processed using physical intuition), deep neural networks are able to automatically learn to exceed the performance of networks trained on physically motivated high-level features. (There are now too many examples to cite them all here. References Larkoski et al. (2017); Radovic et al. (2018); Guest et al. (2018) are recent reviews of deep learning in HEP, and References Baldi et al. (2014); de Oliveira et al. (2016); Baldi et al. (2016); Guest et al. (2016) were the earliest applications of deep learning to collider-based HEP classification problems.)

Increasingly, experimentalists at the Large Hadron Collider (LHC) and elsewhere are taking this message seriously. While the first analysis-level deep learning results from the LHC are only starting to become public (see e.g. ATLAS Collaboration (2019b, a); CMS Collaboration (2019)), analysis-non-specific deep learning models have been used for a few years, notably with early successful applications in flavor tagging CMS Collaboration (2017); ATLAS Collaboration (2017). In addition, there is a plethora of experimental and phenomenological studies for additional methods which will likely be realized as part of physics analyses in the near future. This includes proposals to use the lowest-level inputs available from the detector, reaching input feature dimensionalities of  Andrews et al. (2018, 2019) and beyond.

A potential problem with this approach arises when combining deep learning on HDLL features with the conventional simulation-based analysis paradigm. In the traditional approach, uncertainties on the extrapolation between the control region and the signal region rely on simulation variations that can be validated on a small number of one-dimensional, physically motivated features. Correlated uncertainties covering the full HDLL feature space are often unknown or experimentally infeasible to constrain Nachman (2019). Moreover, low-dimensional validation may be insufficient: recent developments in the area of Artificial Intelligence (AI) safety have demonstrated that when neural networks operate on high-dimensional input spaces, their classification performance can be arbitrarily degraded by applying subtle variations to the input features Szegedy et al. (2014); Goodfellow et al. (2015).

To illustrate this challenge, we implement an adversarial attack demonstrating that small perturbations in detector-level measurements can have drastic effects on the performance of a neural network trained to identify signal-like events. This adversarial procedure can be conceptualized as a sort of demon Thomson (1874) which intercedes between a physical process and the observation of that process, in such a way as to maximally confound a neural network while remaining minimally noticeable by current experimental standards. While no such demon exists, we propose a procedure based on this concept as a diagnostic tool to evaluate the worst-case sensitivity of a deep network-based analysis to uncertainty from mismodeling high-dimensional correlations. This bound may then be used either to demonstrate that a given network architecture is robust against a certain class of systematic uncertainties, or as a guiding metric to aid in the future development of more robust networks.

II Benchmark Problem

As one of the first examples from HEP in which both feature engineering and deep learning have demonstrated promising advantages, jet classification Larkoski et al. (2017) is a natural testing ground for this study. Jets are collimated sprays of particles resulting from high-energy quark and gluon fragmentation, which are clustered Cacciari et al. (2008) into groups that approximate physical states in the original hard-scattering process. A typical problem is to identify whether a jet originated from a quark or gluon, or from the decay of some intermediate massive particle. In particular, we simulate two scattering processes: a background consisting of dijets, and a signal composed of -bosons produced in association with an energetic photon, in which the Lorentz-boosted boson subsequently decays to a pair of quarks.

All samples are simulated at parton level using MadGraph5 Alwall et al. (2014), with fragmentation by Pythia 8 Sjostrand et al. (2006, 2008), and an ATLAS-like detector simulation by Delphes 3.4.1 de Favereau et al. (2014). Activated regions of the simulated calorimeter (‘towers’) are clustered using the anti-$k_t$ algorithm with radius parameter $R$ Cacciari et al. (2008), and only the highest-$p_T$ jet of each event is considered for the classification task. (Using collider coordinates, $p_T$ is the particle momentum transverse to the collision axis, $\phi$ is the azimuthal angle, and the pseudorapidity is $\eta = -\ln\tan(\theta/2)$, where $\theta$ is the polar angle.) Furthermore, selected events must contain a jet passing requirements on its $p_T$ and mass. This jet must also contain at least three constituent towers.

After clustering, a jet is represented as a truncated list of constituent 4-momenta, $\{(p_{T,i}, \eta_i, \phi_i)\}_{i=1}^{64}$. (The architecture described below can accommodate a variable number of inputs; however, the adversarial setup is currently configured to output a fixed-size perturbation. Using more than 64 constituents changed the performance negligibly.) The resulting dimensionality of observable features for learning is thus about 200. The $p_T$ of each constituent is expressed in units of TeV, resulting in a maximum value of order unity, which is suitable for input to neural networks. Jets are then input to two different network architectures, which are trained to discriminate signal jets from background jets.
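The fixed-size input representation described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' preprocessing: the function name `jet_to_array`, ordering constituents by descending $p_T$ before truncation, and zero-padding unused slots are all assumptions.

```python
import numpy as np


def jet_to_array(constituents, n_max=64):
    """Fixed-size (n_max, 3) array of (pT [TeV], eta, phi) from (pT [GeV], eta, phi) tuples.

    Assumed conventions (hypothetical): the highest-pT constituents are kept
    when truncating, and unused rows are zero-padded.
    """
    arr = np.zeros((n_max, 3))
    ordered = sorted(constituents, key=lambda c: c[0], reverse=True)[:n_max]
    for i, (pt_gev, eta, phi) in enumerate(ordered):
        arr[i] = (pt_gev / 1000.0, eta, phi)  # GeV -> TeV keeps pT of order unity
    return arr
```

For example, `jet_to_array([(450.0, 0.1, 1.2), (120.0, -0.2, 1.0)])` returns a 64 × 3 array whose first row holds the 0.45 TeV constituent, giving 192 input features per jet.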

The first architecture, referred to as the High-level (HL) model, has a first layer with no learnable parameters, which computes four jet-level observables from the input constituents: two kinematic quantities, the invariant mass, and a jet substructure observable. The first three features simply represent the 4-vector associated with the entire jet; the substructure observable Larkoski et al. (2014) is theoretically motivated by the strong force and designed to identify jets with radiation patterns consistent with two sub-jet axes, characteristic of Lorentz-boosted boson decays. The remainder of the network is composed of fully connected layers with ReLU activations, followed by a single output neuron with sigmoid activation.
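As an illustration of what a parameter-free first layer can compute, the sketch below sums the constituents into a jet four-vector and returns its $p_T$, pseudorapidity, and invariant mass. It assumes massless towers and takes these three quantities as representative jet-level features; the helper name `jet_kinematics` is hypothetical, and the substructure observable is not implemented here.

```python
import numpy as np


def jet_kinematics(x):
    """Jet pT, pseudorapidity, and invariant mass from (pT, eta, phi) constituents.

    Constituents are treated as massless; zero-pT padding rows contribute nothing.
    """
    pt, eta, phi = x[:, 0], x[:, 1], x[:, 2]
    px = np.sum(pt * np.cos(phi))
    py = np.sum(pt * np.sin(phi))
    pz = np.sum(pt * np.sinh(eta))
    energy = np.sum(pt * np.cosh(eta))  # E = pT cosh(eta) for a massless particle
    jet_pt = np.hypot(px, py)
    jet_eta = np.arcsinh(pz / jet_pt)
    mass_sq = energy**2 - px**2 - py**2 - pz**2
    return jet_pt, jet_eta, np.sqrt(max(mass_sq, 0.0))  # clip tiny negative round-off
```

Because every operation is a fixed function of the inputs, such a layer has no trainable weights yet remains differentiable, so gradient-based attacks can pass through it.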

The second architecture, referred to as the Low-level (LL) model, is a Particle Flow Network (PFN) Komiske et al. (2019), also with a special non-parametric first layer. This layer shifts the $\eta$ and $\phi$ values of the constituents such that the jet axis is centered at zero. This common preprocessing step is done within the network so that adversarial attacks are unable to alter the jet origin. The benchmark LL model has three layers in the per-particle $\Phi$ subnetwork and three layers in the $F$ subnetwork. Both subnetworks use ReLU activations on all layers. The final layer is again a single neuron with sigmoid activation. We note that similar results were obtained using a simple fully connected network. However, the PFN architecture is invariant with respect to permutations of the input constituents, which simplifies the interpretation, as reordering caused by the adversary has no impact on the result. We suspect that any other similarly performing LL network (see e.g. Ref. Butter and others (2019)) will have a similar susceptibility to that of the PFN.
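The structure of the LL model can be illustrated with a minimal NumPy forward pass of a PFN, in which a per-constituent network $\Phi$ is summed over constituents and the result is passed to $F$. Layer widths, random untrained weights, masking of zero-padded constituents, and the use of the $p_T$-weighted centroid as the jet axis (ignoring $\phi$ wrap-around) are all assumptions for illustration; this is not the trained benchmark network.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_layers(sizes):
    """Random (untrained) weights and biases for a stack of dense layers."""
    return [(rng.normal(scale=0.5, size=(m, n)), rng.normal(scale=0.1, size=n))
            for m, n in zip(sizes[:-1], sizes[1:])]


PHI = init_layers([3, 32, 32, 16])  # per-constituent subnetwork Phi (hypothetical widths)
F = init_layers([16, 32, 32, 32])   # jet-level subnetwork F (hypothetical widths)
W_OUT = rng.normal(size=32)


def center(x):
    """Shift eta, phi so the pT-weighted centroid (assumed jet axis) sits at zero."""
    pt = x[:, 0]
    axis = (pt[:, None] * x[:, 1:]).sum(axis=0) / pt.sum()
    out = x.copy()
    out[:, 1:] -= axis
    return out


def pfn(x):
    """PFN-style forward pass: sigmoid(F(sum_i Phi(p_i))) over non-padded constituents."""
    x = center(x)                       # non-parametric first layer
    mask = (x[:, 0] > 0)[:, None]       # zero-pT rows are padding
    h = x
    for w, b in PHI:
        h = np.maximum(h @ w + b, 0.0)  # ReLU
    latent = (h * mask).sum(axis=0)     # permutation-invariant sum over constituents
    for w, b in F:
        latent = np.maximum(latent @ w + b, 0.0)
    return 1.0 / (1.0 + np.exp(-(latent @ W_OUT)))  # single sigmoid output
```

Because the centering happens inside `pfn`, shifting every constituent by a common offset in $\eta$ and $\phi$ leaves the output unchanged, and the sum over constituents makes it independent of their ordering.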

Both networks are implemented using Keras Chollet (2017) and TensorFlow Abadi et al. (2016), and are trained using the Adam optimizer Kingma and Ba (2014). The loss function is the binary cross-entropy for classifying signal and background events. After tuning the architectures via hyperparameter scans, we found that the LL network significantly outperforms the HL network in terms of the area under the ROC curve (AUC), as is often the case.
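For reference, the AUC used to compare the two networks can be computed without any ML framework as the probability that a randomly chosen signal jet receives a higher score than a randomly chosen background jet, counting ties as one half. This is a minimal sketch; the function name is hypothetical.

```python
import numpy as np


def roc_auc(sig_scores, bkg_scores):
    """AUC as P(score_signal > score_background), ties counted as 1/2 (Mann-Whitney form).

    Uses an (n_sig, n_bkg) comparison matrix, which is fine for modest sample sizes.
    """
    s = np.asarray(sig_scores, dtype=float)[:, None]
    b = np.asarray(bkg_scores, dtype=float)[None, :]
    return float(np.mean((s > b) + 0.5 * (s == b)))
```

For instance, `roc_auc([0.9, 0.8], [0.1, 0.85])` gives 0.75, since three of the four signal/background pairs are correctly ordered.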

III Methods

To demonstrate the potential sensitivity of HDLL networks to subtle mismodeling of their input features, we subject each of our benchmark networks to an adversarial attack. An adversarial attack exploits the gradient of the (fixed) target network with respect to its inputs, in order to shift those inputs to elicit the desired response from the classifier.

Note that in order to realize the specific mismodeling necessary to foil a particular classifier, the parameters of the target network must be known; hence the “demon” posited in Sec. I. Nonetheless, as this attack yields a gradient-based, locally optimal perturbation to a given input for a given network, it can be viewed as a worst-case scenario. Therefore, if a specific network is shown to be robust against the attack, it is also reasonable to conclude that the effects of any intractable systematic mismodelings present in physics simulations are safe to ignore. Conversely, if a network is shown to be sensitive to such attacks, the adversarially induced systematic shift can be viewed only as a (potentially weak) upper bound on more realistic systematic effects. In this case, a loose upper bound indicates that more careful scrutiny of the network’s systematic exposure may be warranted. The bound may also be used as a guiding metric in the development of more robust models, as discussed in Sec. V.


In this work we implement two different forms of adversarial attack. The first is based on the fast gradient sign method (FGSM) proposed in Ref. Goodfellow et al. (2015), which computes a bounded perturbation for a given input. The second is a broader attack utilizing an adversarial neural network that learns to construct malicious jets from arbitrary inputs. The former is the most literal realization of our “demon”, as each jet is individually modified to optimally confound the network. The latter demonstrates that there exists a universal mapping between one dataset (e.g. simulation) and another (e.g. experimental data) that systematically affects network performance. The adversary can therefore be thought of as a transfer function representing a measurement effect or theoretical mismodeling.

In both cases, our demon is tasked with transforming signal jets to induce a background-like response from the benchmark classifier. In such a scenario, an experimental analysis optimized using the simulated signal model would be very likely to reject true signal events as background. This in turn could lead to overconfidently excluding a theoretical hypothesis, or even missing a discovery of new physics altogether.

III.1 Fast Gradient Sign Method

The FGSM works by taking an individual jet $x$ and differentiating the loss function with respect to the input to compute a bounded perturbation $\delta$, such that the network’s response to the perturbed input $x + \delta$ tends towards some desired value. Specifically, the perturbation is given by:

$$\delta = -\epsilon \odot \mathrm{sign}\left(\nabla_x \mathcal{L}\big(f(x), y_B\big)\right), \qquad (1)$$

where $f$ is the target classifier network, $\mathcal{L}$ is the binary cross-entropy loss function, and $y_B$ is the label corresponding to background events. The sign of the gradient is scaled by a free parameter $\epsilon$, as described below. This ensures that the perturbation of each input observable is bounded while moving approximately in the direction normal to the decision boundary.

In our example, this gradient, like the input, is a $64 \times 3$ tensor, where the quantities on the second axis represent $p_T$, $\eta$, and $\phi$. We want to ensure that the perturbation is small relative to the experimental resolution, yet these observables have rather different scales. To accommodate this and ensure a physically relevant perturbation, $\epsilon$ in Eq. 1 is a vector $\epsilon = (\epsilon_{p_T}, \epsilon_\eta, \epsilon_\phi)$ that scales each feature axis separately. In principle, these components do not need to be the same, but here they are all chosen to be $0.001$. All multiplications in Eq. 1 follow TensorFlow broadcast semantics Abadi et al. (2016). This procedure is applied iteratively 10 times, so each component of the total perturbation is bounded by 0.01 in magnitude.
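The iterated, targeted FGSM step can be sketched in NumPy as below. To keep the example self-contained, the Keras classifier is replaced by a toy logistic-regression model whose input gradient is analytic; in a real analysis the gradient would come from automatic differentiation of the trained network. The helper names and the toy model are assumptions. With a per-step scale of 0.001 per feature and 10 iterations, each component of the accumulated perturbation stays within 0.01.

```python
import numpy as np


def fgsm_toward_background(x, grad_fn, eps, n_iter=10):
    """Iterated targeted FGSM: step against the sign of the background-label loss gradient.

    x: (n_constituents, 3) jet; grad_fn(x) returns dL(f(x), y_B)/dx with the same shape;
    eps: per-feature step sizes (eps_pt, eps_eta, eps_phi), broadcast over constituents.
    """
    x_adv = np.array(x, dtype=float)
    for _ in range(n_iter):
        x_adv = x_adv - np.asarray(eps) * np.sign(grad_fn(x_adv))
    return x_adv


# Toy stand-in for the trained classifier (assumption): logistic regression on the jet array.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 3))


def f(x):
    return 1.0 / (1.0 + np.exp(-np.sum(W * x)))


def grad_bce_background(x):
    # With background label y_B = 0, L = -log(1 - f(x)), so dL/dx = f(x) * W.
    return f(x) * W


eps = np.array([1e-3, 1e-3, 1e-3])
```

Stepping against the gradient of the loss for the background label lowers the classifier's signal score; in this toy, zero-padded rows are perturbed too, since no padding mask is applied.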

III.2 Adversarial Network Method

In this approach, given a target classifier $f$, we train a second neural network $g$. The goal is to learn a map $x \mapsto g(x)$ for an arbitrary signal or background jet $x$, such that the classifier network will perceive $g(x)$ as background. Because various observables can be validated against data for backgrounds, it is important that the background distributions of these observables match before and after the perturbation. For practical reasons, we enforce this by preventing the adversary from making large changes to these observables on a jet-by-jet basis for background events.

To further simplify training, we also bound the degree to which the adversary can modify constituents, and prevent it from creating new constituents within a jet. In particular, the adversary is unable to induce collinear splittings or add the spurious soft radiation against which physically motivated observables are often designed to be robust. Given these constraints on the form of the adversary, the attack presented here represents a worst-case scenario only for a specific class of mismodeling. As it turns out, even this restricted form of attack can have surprisingly large effects; we leave the assessment of sensitivity to more general attack models to future work.

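The adversarial network $g$ is trained by minimizing separate loss functions for signal and background jets $x$, defined in terms of the target classifier $f$:

$$\mathcal{L}_S = \mathrm{CE}\big(f(g(x)),\, y_B\big), \qquad \mathcal{L}_B = \lambda_f\, d\big(f(g(x)),\, f(x)\big) + \lambda_O \sum_i d\big(O_i(g(x)),\, O_i(x)\big),$$

where $d(\cdot,\cdot)$ denotes a distance between a quantity evaluated before and after the perturbation. $\mathrm{CE}$ is the categorical cross-entropy, which impels $g$ to modify signal jets so that they are labeled as background by $f$. The first term of $\mathcal{L}_B$ minimizes changes between the target network’s response to the jet before and after the adversarial perturbation. The functions $O_i$ represent any features of interest to be preserved. The tunable hyperparameters $\lambda_f$ and $\lambda_O$ encode the adversary’s preference to preserve the target network response and observable features, respectively, for background events.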
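A minimal sketch of the two adversary losses follows. It uses the two-class (binary) form of the cross-entropy, which matches a single sigmoid output, and assumes squared differences as the measure of change in the classifier response and in the preserved observables; both choices, and the function names, are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def binary_ce(p, y, tiny=1e-7):
    """Mean binary cross-entropy of sigmoid outputs p against label y."""
    p = np.clip(np.asarray(p, dtype=float), tiny, 1.0 - tiny)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def signal_loss(f_sig_perturbed, y_background=0.0):
    """Push perturbed signal jets toward the background label."""
    return binary_ce(f_sig_perturbed, y_background)


def background_loss(f_bkg, f_bkg_pert, obs_bkg, obs_bkg_pert, lam_f, lam_o):
    """Penalize changes to the classifier response and to preserved observables.

    obs_*: (n_jets, n_observables) arrays, e.g. jet mass and one more observable.
    Squared differences are an assumed choice of distance.
    """
    d_f = np.mean((np.asarray(f_bkg_pert) - f_bkg) ** 2)
    d_o = np.mean(np.sum((np.asarray(obs_bkg_pert) - obs_bkg) ** 2, axis=1))
    return float(lam_f * d_f + lam_o * d_o)
```

Raising `lam_f` or `lam_o` trades attack strength for background distributions that stay closer to the unperturbed ones.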

In our experiments, the adversary is a fully connected network with 4 hidden layers, each with 300 units and ReLU activation. The next layer has one unit per input feature, so that its output matches the shape of the jet. Analogously to the sign function in Eq. 1 and the bounding parameters in Sec. III.1, these outputs are bounded by applying a tanh activation, and the axes corresponding to $p_T$, $\eta$, and $\phi$ are scaled by parameters $\epsilon_{p_T}$, $\epsilon_\eta$, and $\epsilon_\phi$, respectively. The result represents a differential change in the input jet, $\delta$. The final layer is essentially a residual skip connection computing $x + \delta$, as described in Sec. III.1.
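The adversary architecture can be sketched as a NumPy forward pass. The weights here are random rather than trained, the output layer is assumed to have one unit per input feature (192 for a 64 × 3 jet), and the tanh bound is an assumption; the sketch only illustrates how the bounded residual connection keeps every component of the change within the corresponding scale parameter.

```python
import numpy as np

rng = np.random.default_rng(0)
N_CONST, N_FEAT = 64, 3
SIZES = [N_CONST * N_FEAT, 300, 300, 300, 300, N_CONST * N_FEAT]
# Random, untrained weights for illustration only.
LAYERS = [(rng.normal(scale=1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
          for m, n in zip(SIZES[:-1], SIZES[1:])]


def adversary(x, eps):
    """Return x + delta, with each column of delta bounded by the matching eps entry."""
    h = x.reshape(-1)
    for w, b in LAYERS[:-1]:
        h = np.maximum(h @ w + b, 0.0)          # 4 hidden layers, 300 ReLU units each
    w, b = LAYERS[-1]
    raw = (h @ w + b).reshape(N_CONST, N_FEAT)  # one output per input feature (assumed)
    delta = np.asarray(eps) * np.tanh(raw)      # bounded, scaled per (pT, eta, phi) axis
    return x + delta                            # residual skip connection
```

The residual form means an untrained or heavily constrained adversary starts close to the identity map, which is convenient when the background loss penalizes any change.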

A separate adversary is trained for each of the HL and LL benchmark networks. In both cases, the bounding magnitude of the constituent perturbations is fixed at a value slightly larger than the scale of the FGSM perturbations. Two observables are constrained in the background loss: the jet mass and one additional jet observable. The hyperparameters of the background loss are tuned by training until either convergence or until certain validation criteria are violated. The validation criteria are met when the Kolmogorov-Smirnov (KS) test statistic between the perturbed and unperturbed background distributions is below heuristically defined thresholds of 0.04 for each of the two constrained observables and 0.02 for the classifier response. In practice, these thresholds would be set by the data statistics as well as the size of known experimental uncertainties. A more realistic test is to consider the agreement between validation histograms evaluated in an unblinded control region, as illustrated in Fig. 1 for the case of the LL network.
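The KS-based validation criterion can be sketched as below. The thresholds are the values quoted in the text; the dictionary keys (in particular `second_observable`, standing in for the additional constrained observable) and the function names are hypothetical. If SciPy is available, `scipy.stats.ks_2samp` computes the same two-sample statistic.

```python
import numpy as np


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a(t) - ECDF_b(t)|."""
    a, b = np.sort(np.asarray(a, dtype=float)), np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])  # the ECDF difference is maximal at a sample point
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


# Thresholds quoted in the text; the key names are hypothetical.
THRESHOLDS = {"jet_mass": 0.04, "second_observable": 0.04, "classifier": 0.02}


def passes_validation(unperturbed, perturbed, thresholds=THRESHOLDS):
    """True if every monitored background distribution stays below its KS threshold."""
    return all(ks_statistic(unperturbed[k], perturbed[k]) < t for k, t in thresholds.items())
```

Note that the KS statistic alone ignores sample size; in an analysis, the acceptable value would be tied to the available statistics, as stated above.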

Figure 1: Illustration of a typical validation procedure. Pseudodata (black points) are sampled from the background (BG) distribution with the adversarial perturbation applied; solid histograms show the unperturbed BG model. Top: The unshaded control region in this case is defined where the signal efficiency is expected to be less than 10%; the shaded region would typically be blinded when designing an experiment. The green vertical line indicates the expected optimal signal region. Middle, Bottom: The jet and mass distributions for events in the control region. Good agreement is observed between the “observed” pseudodata and the expected background model in the control region for all three observables. The values are , , and respectively.

IV Results

To quantify the effect of these adversarial attacks, we consider a simplified example of a typical experimental analysis in HEP. If $S$ and $B$ are the predicted numbers of signal and background events, respectively, then in the asymptotic limit ($S \ll B$, $B \gg 1$; Cowan et al. (2011)), the expected statistical significance of an observation with respect to the background-only hypothesis is $S/\sqrt{B}$, in units of standard deviations. After considering only events that pass a classifier threshold, the relative change in the significance is

$$\frac{\epsilon_S S / \sqrt{\epsilon_B B}}{S / \sqrt{B}} = \frac{\epsilon_S}{\sqrt{\epsilon_B}},$$

where $\epsilon_S$ is the true positive rate (signal efficiency) and $\epsilon_B$ is the false positive rate (background efficiency). A classifier is only useful for improving the sensitivity of a search if this relative discovery significance exceeds unity. The relative discovery significance for both the LL and HL classifiers is shown in Fig. 2. As expected given its AUC (Sec. II), the LL classifier yields a more sensitive result than the HL classifier, with peak relative discovery significances of about 2.5 and 1.5, respectively. Figure 2 additionally shows the relative discovery significance after the application of the FGSM. This bounded perturbation is only applied to signal events and is designed to make them look more like background events. Both the LL and HL relative discovery significances are degraded by this perturbation by about 30%. Additionally, the optimal classifier threshold shifts for the HL case, so a threshold chosen based on the nominal simulation would actually yield a relative discovery significance less than unity in the perturbed simulation.
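The relative discovery significance, i.e. the signal efficiency divided by the square root of the background efficiency ($\epsilon_S/\sqrt{\epsilon_B}$), can be computed from per-jet classifier scores as a function of the threshold, as sketched below. Events are assumed to pass when their score exceeds the threshold; the function name is hypothetical.

```python
import numpy as np


def relative_significance(sig_scores, bkg_scores, thresholds):
    """eps_S / sqrt(eps_B) for the selection `score > threshold`, one value per threshold.

    Gives inf where no background passes (and nan if nothing passes at all),
    where the asymptotic S/sqrt(B) approximation is not meaningful anyway.
    """
    s = np.asarray(sig_scores, dtype=float)
    b = np.asarray(bkg_scores, dtype=float)
    t = np.asarray(thresholds, dtype=float)[:, None]
    eps_s = np.mean(s[None, :] > t, axis=1)  # signal efficiency (true positive rate)
    eps_b = np.mean(b[None, :] > t, axis=1)  # background efficiency (false positive rate)
    with np.errstate(divide="ignore", invalid="ignore"):
        return eps_s / np.sqrt(eps_b)
```

For example, with signal scores `[0.9, 0.8, 0.7, 0.2]`, background scores `[0.1, 0.2, 0.3, 0.6]`, and a threshold of 0.5, the efficiencies are 3/4 and 1/4, giving 1.5. Evaluating the same thresholds on perturbed signal scores yields curves analogous to those in Fig. 2.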

Figure 2: The relative discovery significance as a function of the classifier threshold for the HL and LL observables before and after the FGSM perturbation. Also shown is the effect induced by randomly perturbing constituents with the same bounds used for the FGSM.

The FGSM attack bounds the perturbations of the jet constituent four-vectors but is otherwise unconstrained. Figure 3 shows the modifications to various signal observables as a result of the FGSM attack. Due to the limited size of the perturbation, the qualitative shapes of the signal distributions of these jet observables, including the jet mass, are the same before and after the FGSM perturbation. Of these, the mass distribution is most affected. The distributions of the classifier outputs are shifted to the left, overlapping more with the background distribution, hence the degradation observed in Fig. 2.

Figure 3: The classifier output and various observables before and after the FGSM attack for signal and background. Only 1500 signal jets are used for the FGSM perturbation.

While the FGSM perturbation is illustrative, it is somewhat artificial, because only the signal events are attacked and there are no constraints on HL observables that can be validated. The adversarial attack described in Sec. III.2 avoids both of these issues, and the analog of Fig. 2 is presented in Fig. 4. For these discovery significance curves, both the signal and background distributions are perturbed by the same network, representing a consistent mismodeling between the jets expected by the simulation model (pre-perturbation) and the jets actually observed in our hypothetical scenario (post-perturbation). For the fully trained LL network, the degradation in the relative discovery significance is comparable to that of the FGSM. However, we were unable to train an adversary to produce as large an effect on the HL network. This is to be expected, as the HL network derives much of its classification power from the jet mass, which the adversarial network is constrained to change only minimally.

Figure 4 also demonstrates the effect of the adversarial attack on deliberately undertrained instances of the LL network. We found that when the classifier network’s training is interrupted early, its susceptibility to the adversarial attack is reduced, and that this susceptibility tends to increase with additional training. In particular, when the LL network is trained only to the same level of performance as the HL network, it is nearly impervious to the adversarial attack. We hypothesize that the additional information the LL network uses to outperform the HL network is more sensitive to small-scale perturbations than the theoretically motivated HL observables. Although there are occasional counterexamples to this trend due to random network initialization, it suggests that undertraining very sensitive HDLL networks could be a useful regularization technique to build in analysis robustness, while still providing a performance boost relative to HL architectures.

Figure 4: Effect of adversarial mismodeling on discovery significance for high-level and low-level feature networks. The vertical gray line indicates the expected optimal selection threshold, at which the expected significance differs by about 25% from the “true” significance obtained when the adversarial perturbation is taken into account. For the HL network, the expected sensitivity differs by about 15% from the true value. While the fully trained low-level network is expected to perform better than the high-level network, it is also more strongly affected by the adversarial attack. However, when the LL network is deliberately undertrained, its susceptibility is reduced. Also shown is the effect induced by randomly perturbing constituents by amounts drawn from a uniform distribution.


Representative HL features and the classifier distributions for the adversarial attack are presented in Fig. 5. Even though both signal and background jets are subjected to the same adversary, the background distributions are nearly identical before and after the perturbation. In contrast, the classifier response and mass distributions are noticeably distorted for the signal. This allows the systematic mismodeling induced by the adversary to go undetected in typical experimental conditions, as shown in Fig. 1. The green line delineating the signal region corresponds to the maximum discovery significance expected based on the simulated signal and background models. The shaded region, defined as the region in which expected signal efficiency exceeds 10%, is taken to be blinded during experimental design and validation phase. ‘Observations’ are samples from the perturbed simulation and the ‘Expected’ prediction is the unperturbed simulation. The jet and mass distributions in the validation region agree well between the Observed and Expected values to within statistical uncertainty. Despite this apparent agreement, due to the adversary’s effect on jets in the signal region, the discovery sensitivity for a potential signal at the predicted optimal working point is reduced by about 25% as shown in Fig. 4.

Figure 5: Comparison of the effect of the adversarial network perturbations on the LL and HL classifier responses, as well as on various jet observables.

V Discussion

In the traditional HEP search paradigm, simulations are used to extrapolate predictions from a control region to a signal region. Many sources of uncertainty on this extrapolation are well-constrained from auxiliary measurements on individual reconstructed objects, but others are mostly unvalidated from data. For example, the modeling of strong-force processes related to hadron formation is a complex multiscale process for which it is customary to compare two different models (such as Pythia Sjostrand et al. (2006, 2008) and Herwig Bahr and others (2008); Bellm and others (2016) or Sherpa Gleisberg et al. (2009); Bothmann and others (2019)). These algorithms model the same physical processes in different ways, with a mix of formal and phenomenological insight, and the difference between models is treated as a systematic uncertainty in statistical analyses. Given the high dimensionality of collider events, it is unlikely that a single nuisance parameter encoding the distance between two arbitrary models represents a reasonable prior to cover the distribution of all possible systematic mismodelings.

The results of Sec. IV show that small perturbations in the high-dimensional phase space of collider events can significantly change the scientific conclusions drawn from a given dataset. If a complete uncertainty model in high dimensions ensured that such perturbations were unphysical, the adversarial results would be irrelevant. However, as described above, this is not the current situation. Furthermore, networks trained on high-dimensional low-level inputs can outperform networks trained on high-level features precisely because they can take advantage of subtle correlations distributed across multiple dimensions. The modeling of such correlations is particularly difficult to validate, and significant physics input is required to build a full phase-space uncertainty model.

As is characteristic of problems involving nonconvex optimization, it is difficult to show that a particular adversarial attack is maximal. Therefore, the existence of an attack with a particular impact only provides a lower bound on the upper bound. Nonetheless, if a certain network architecture is repeatedly shown to be particularly difficult to attack, it may lend credibility to the currently accepted treatment of simply ignoring certain high-dimensional systematic uncertainties.

In any case, for now the method described here offers the only rigorous means of quantifying how sensitive an analysis procedure is to high-dimensional mismodeling. A given analysis procedure, including the control/signal region definition and any auxiliary features that will be validated in data, can be attacked to quantify the impact on the signal sensitivity. While general methods from AI safety may also be useful for making classifiers robust to attacks, Sec. IV demonstrated that networks based on physically motivated features can be less sensitive to adversarial perturbations, if only because it is feasible to ensure these observables are modeled reliably. However, it may be possible to design sensitive architectures that are able to leverage low-level information while remaining robust against adversarial attacks, for example by exploiting symmetries and other physically motivated constraints. Additionally, evidence suggests that purposefully undertraining HDLL networks may reduce systematic exposure while matching the performance of simpler HL network models.

Even though the adversarial methods presented in Sec. IV were able to make targeted attacks knowing the full form of the classifier, they are not the most general attacks possible. First, the perturbations were not allowed to split particles into multiple particles, nor were they able to add new particles. The modeling of such ‘soft’ and ‘collinear’ physics is particularly challenging, so such effects are an interesting class of perturbations for future studies. Second, the true values of individual features or combinations of features are not observable; only distribution-level statistics can be validated. In this work, per-constituent perturbations were constrained to ensure that observable features were approximately unperturbed on a jet-by-jet basis. However, it is worth considering adversarial examples that have limited resemblance to any particular jet in the original dataset, while preserving properties of ensembles of events (such as the jet mass distribution). One may be able to generalize the procedures described here using constraints on sufficiently large mini-batches or even on entire datasets.

VI Conclusions

The interest in deep learning methods for HEP has grown significantly since the first studies were published five years ago (Baldi et al., 2014). While these methods hold great promise for enhancing experimental sensitivity to new fundamental properties of nature, conventional analysis techniques must be updated to accommodate them. We have shown that neural networks using high-dimensional low-level features (and, to a lesser extent, high-dimensional high-level features) are highly sensitive to mismodeled inputs. Current uncertainty estimates may not be sufficient to cover the uncertainties involved when using high-dimensional features, and traditional validation methods may be ineffective in detecting such problems. We have proposed adversarial approaches to evaluate and compare the sensitivity of deep learning-based analysis procedures. While the resulting bound is crude, it may be used to demonstrate robustness against specific classes of uncertainty or to diagnose situations where further studies are needed. This work will hopefully begin a dialogue within the community about the robust application of deep learning to experimental measurements and searches.
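One hedged way to turn an attack into a bound on sensitivity is sketched below. It assumes, purely as an illustration, that sensitivity is summarized by the median discovery significance from the asymptotic formulae of Cowan et al. (2011), Z = sqrt(2[(s + b) ln(1 + s/b) - s]), and compares Z before and after the signal and background selection efficiencies are shifted by an attack. The event yields and efficiencies are hypothetical placeholders, not results of this study.

```python
import math


def asimov_significance(s, b):
    """Median discovery significance for s expected signal and b expected
    background events, from the asymptotic formulae of Cowan et al. (2011)."""
    return math.sqrt(2.0 * ((s + b) * math.log1p(s / b) - s))


def fractional_significance_loss(n_sig, n_bkg, eff, eff_attacked):
    """Relative drop in expected significance when the nominal (signal,
    background) efficiencies are replaced by their attacked values."""
    z_nominal = asimov_significance(n_sig * eff[0], n_bkg * eff[1])
    z_attacked = asimov_significance(n_sig * eff_attacked[0],
                                     n_bkg * eff_attacked[1])
    return (z_nominal - z_attacked) / z_nominal


# Hypothetical yields and efficiencies, for illustration only.
loss = fractional_significance_loss(n_sig=100, n_bkg=10_000,
                                    eff=(0.5, 0.01),
                                    eff_attacked=(0.4, 0.012))
print(f"expected significance reduced by {100 * loss:.1f}%")
```

In the limit s << b the formula reduces to the familiar s/sqrt(b). The resulting loss is only as meaningful as the perturbation budget used to produce the attacked efficiencies, which is why it serves as a crude bound rather than a calibrated uncertainty.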


We thank Paul Tipton for constructive feedback since the early stages of this project. We are also grateful to Jesse Thaler and Daniel Whiteson for insightful comments and discussions. This work is supported by the DOE under contracts DE-AC02-05CH11231 and DE-SC0017660. BPN would like to thank NVIDIA for providing Volta GPUs used for some of the numerical examples and the Aspen Center for Physics, which is supported by National Science Foundation grant PHY-1607611.


  • M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al. (2016) TensorFlow: a system for large-scale machine learning. In OSDI, Vol. 16, pp. 265–283. Cited by: §II, §III.1.
  • J. Alwall, R. Frederix, S. Frixione, V. Hirschi, F. Maltoni, O. Mattelaer, H. -S. Shao, T. Stelzer, P. Torrielli, and M. Zaro (2014) The automated computation of tree-level and next-to-leading order differential cross sections, and their matching to parton shower simulations. JHEP 07, pp. 079. External Links: Document, 1405.0301 Cited by: §II.
  • M. Andrews, M. Paulini, S. Gleyzer, and B. Poczos (2018) End-to-End Physics Event Classification with CMS Open Data: Applying Image-Based Deep Learning to Detector Data for the Direct Classification of Collision Events at the LHC. External Links: 1807.11916 Cited by: §I.
  • M. Andrews, J. Alison, S. An, P. Bryant, B. Burkle, S. Gleyzer, M. Narain, M. Paulini, B. Poczos, and E. Usai (2019) End-to-End Jet Classification of Quarks and Gluons with the CMS Open Data. External Links: 1902.08276 Cited by: §I.
  • ATLAS Collaboration (2017) Identification of Jets Containing b-Hadrons with Recurrent Neural Networks at the ATLAS Experiment. ATL-PHYS-PUB-2017-003. External Links: Link Cited by: §I.
  • ATLAS Collaboration (2019a) Search for direct top squark pair production in the 3-body decay mode with a final state containing one lepton, jets, and missing transverse momentum in √s = 13 TeV pp collision data with the ATLAS detector. ATLAS-CONF-2019-017. External Links: Link Cited by: §I.
  • ATLAS Collaboration (2019b) Search for non-resonant Higgs boson pair production in the bbℓνℓν final state with the ATLAS detector in pp collisions at √s = 13 TeV. External Links: 1908.06765 Cited by: §I.
  • M. Bahr et al. (2008) Herwig++ Physics and Manual. Eur. Phys. J. C58, pp. 639–707. External Links: Document, 0803.0883 Cited by: §V.
  • P. Baldi, K. Bauer, C. Eng, P. Sadowski, and D. Whiteson (2016) Jet Substructure Classification in High-Energy Physics with Deep Neural Networks. Phys. Rev. D93 (9), pp. 094034. External Links: Document, 1603.09349 Cited by: footnote 1.
  • P. Baldi, P. Sadowski, and D. Whiteson (2014) Searching for Exotic Particles in High-Energy Physics with Deep Learning. Nature Commun. 5, pp. 4308. External Links: Document, 1402.4735 Cited by: §VI, footnote 1.
  • J. Bellm et al. (2016) Herwig 7.0/Herwig++ 3.0 release note. Eur. Phys. J. C76 (4), pp. 196. External Links: Document, 1512.01178 Cited by: §V.
  • E. Bothmann et al. (2019) Event Generation with Sherpa 2.2. SciPost Phys. 7, pp. 034. External Links: Document, 1905.09127 Cited by: §V.
  • A. Butter et al. (2019) The Machine Learning Landscape of Top Taggers. SciPost Phys. 7, pp. 014. External Links: Document, 1902.09914 Cited by: §II.
  • M. Cacciari, G. P. Salam, and G. Soyez (2008) The Anti-k(t) jet clustering algorithm. JHEP 04, pp. 063. External Links: Document, 0802.1189 Cited by: §II, §II.
  • F. Chollet (2017) Keras. GitHub. Cited by: §II.
  • CMS Collaboration (2017) Heavy flavor identification at CMS with deep neural networks. CMS-DP-2017-005. External Links: Link Cited by: §I.
  • CMS Collaboration (2019) Search for direct top squark pair production in events with one lepton, jets and missing transverse energy at 13 TeV. CMS-PAS-SUS-19-009. External Links: Link Cited by: §I.
  • G. Cowan, K. Cranmer, E. Gross, and O. Vitells (2011) Asymptotic formulae for likelihood-based tests of new physics. Eur. Phys. J. C71, pp. 1554. Note: [Erratum: Eur. Phys. J. C73, 2501 (2013)] External Links: Document, 1007.1727 Cited by: §IV.
  • J. de Favereau, C. Delaere, P. Demin, A. Giammanco, V. Lemaitre, A. Mertens, and M. Selvaggi (2014) DELPHES 3, A modular framework for fast simulation of a generic collider experiment. JHEP 02, pp. 057. External Links: Document, 1307.6346 Cited by: §II.
  • L. de Oliveira, M. Kagan, L. Mackey, B. Nachman, and A. Schwartzman (2016) Jet-images – deep learning edition. JHEP 07, pp. 069. External Links: Document, 1511.05190 Cited by: footnote 1.
  • T. Gleisberg, S. Hoeche, F. Krauss, M. Schonherr, S. Schumann, F. Siegert, and J. Winter (2009) Event generation with SHERPA 1.1. JHEP 02, pp. 007. External Links: Document, 0811.4622 Cited by: §V.
  • I. J. Goodfellow, J. Shlens, and C. Szegedy (2015) Explaining and harnessing adversarial examples. In ICLR. External Links: Link Cited by: §I, §III.
  • D. Guest, K. Cranmer, and D. Whiteson (2018) Deep Learning and its Application to LHC Physics. Ann. Rev. Nucl. Part. Sci. 68, pp. 161–181. External Links: Document, 1806.11484 Cited by: footnote 1.
  • D. Guest, J. Collado, P. Baldi, S. Hsu, G. Urban, and D. Whiteson (2016) Jet Flavor Classification in High-Energy Physics with Deep Neural Networks. Phys. Rev. D94 (11), pp. 112002. External Links: Document, 1607.08633 Cited by: footnote 1.
  • D. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. External Links: 1412.6980 Cited by: §II.
  • P. T. Komiske, E. M. Metodiev, and J. Thaler (2019) Energy Flow Networks: Deep Sets for Particle Jets. JHEP 01, pp. 121. External Links: Document, 1810.05165 Cited by: §II.
  • A. J. Larkoski, I. Moult, and B. Nachman (2017) Jet Substructure at the Large Hadron Collider: A Review of Recent Advances in Theory and Machine Learning. External Links: 1709.04464 Cited by: §II, footnote 1.
  • A. J. Larkoski, I. Moult, and D. Neill (2014) Power Counting to Better Jet Observables. JHEP 12, pp. 009. External Links: Document, 1409.6298 Cited by: §II.
  • B. Nachman (2019) A guide for deploying Deep Learning in LHC searches: How to achieve optimality and account for uncertainty. External Links: 1909.03081 Cited by: §I.
  • A. Radovic, M. Williams, D. Rousseau, M. Kagan, D. Bonacorsi, A. Himmel, A. Aurisano, K. Terao, and T. Wongjirad (2018) Machine learning at the energy and intensity frontiers of particle physics. Nature 560 (7716), pp. 41–48. External Links: Document Cited by: footnote 1.
  • T. Sjostrand, S. Mrenna, and P. Z. Skands (2006) PYTHIA 6.4 Physics and Manual. JHEP 05, pp. 026. External Links: Document, hep-ph/0603175 Cited by: §II, §V.
  • T. Sjostrand, S. Mrenna, and P. Z. Skands (2008) A Brief Introduction to PYTHIA 8.1. Comput. Phys. Commun. 178, pp. 852–867. External Links: Document, 0710.3820 Cited by: §II, §V.
  • C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2014) Intriguing properties of neural networks. In ICLR. Cited by: §I.
  • W. Thomson (1874) Kinetic theory of the dissipation of energy. Nature 9 (232), pp. 441–444. External Links: Document, ISSN 1476-4687, Link Cited by: §I.