1 Introduction
An immense search effort by the LHC collaborations has successfully probed many extreme regions of the Standard Model phase space atlasexoticstwiki ; atlassusytwiki ; atlashdbspublictwiki ; cmsexoticstwiki ; cmssusytwiki ; cmsb2gtwiki ; lhcbtwiki . Despite strong theoretical and non-collider experimental motivation, there is currently no convincing evidence for new particles or forces of nature from the LHC searches. However, many final states are uncovered Kim:2019rhy ; Craig:2016rqv and the full hypervariate phase space accessible by modern detector technology is only starting to be probed holistically with deep learning methods Larkoski:2017jix ; Guest:2018yhq ; Abdughani:2019wuv ; Radovic:2018dip . There is a great need for new searches that can identify unexpected scenarios.
Until recently, nearly all model-independent searches relied heavily on simulation. Generically, these searches operate by comparing data with background-only simulation in a large number of phase space regions. Such searches have been performed without machine learning at D0
sleuth ; Abbott:2000fb ; Abbott:2000gx ; Abbott:2001ke , H1 Aaron:2008aa ; Aktas:2004pz , CDF Aaltonen:2007dg ; Aaltonen:2007ab ; Aaltonen:2008vt , CMS CMSPASEXO14016 ; CMSPASEXO10021 , and ATLAS Aaboud:2018ufy ; ATLASCONF2014006 ; ATLASCONF2012107 . A recent phenomenological study proposed extending this idea to deep learning classifiers DAgnolo:2018cun ; DAgnolo:2019vbw . While independent of signal models, these approaches depend on the fidelity of the background model simulation for both signal sensitivity and background accuracy. If the background simulation is inaccurate, then differences between simulation and (background-only) data will hide potential signals. Even if a biased simulation can find a signal, if the background is mismodeled, then the signal specificity will be poor.
A variety of approaches have been proposed to enhance signal sensitivity without simulations. Such proposals are based on clustering or nearest neighbor algorithms DeSimone:2018efk ; Mullin:2019mmh ; 1809.02977
, autoencoders Farina:2018fyg ; Heimel:2018mkt ; Roy:2019jae ; Cerri:2018anq ; Blance:2019ibf ; Hajer:2018kqm , probabilistic modeling Dillon:2019cqt , weak supervision Collins:2018epr ; Collins:2019jip , density estimation anode , and others AguilarSaavedra:2017rzt . These approaches must also be combined with a background estimation strategy. If simulation is used to estimate the background, then the specificity is the same as for the model-dependent searches. Many of these approaches can be combined with a resonance search, as explicitly demonstrated in Refs. Collins:2018epr ; Collins:2019jip . The background estimation strategy may impose additional constraints on the learning, such as the need for decorrelation between a resonant feature and other discriminative features Louppe:2016ylz ; Dolen:2016kst ; Moult:2017okx ; Stevens:2013dya ; Shimmin:2017mfk ; Bradshaw:2019ipy ; ATLPHYSPUB2018014 ; Xia:2018kgd ; Englert:2018cfo ; Wunsch:2019qbo ; Disco . A detailed overview of model-independent approaches can be found in Ref. anode .
While it is desirable to be robust to background model inaccuracies, it is also useful to incorporate information from Standard Model simulations. Even though these simulations are only an approximation to nature, they include an extensive set of fundamental and phenomenological physics models describing the highest energy reactions all the way to signal formation in the detector electronics. This paper describes a method that uses a background simulation in a way that depends as little on that simulation as possible. In particular, a model based on the Deep neural networks using Classification for Tuning and Reweighting (dctr) procedure Andreassen:2019nnm is trained in a region of phase space that is expected to be devoid of signals. In a resonance search, there is one feature where the signal is known to be localized and the sideband can be used to train the dctr model. This reweighting function learns to morph the simulation into the data and is parameterized in the feature(s) used to mask potential signals. Then, the model is interpolated to the signal-sensitive region and the reweighted background simulation can be used for both enhancing signal sensitivity and estimating the Standard Model background. As deep learning classifiers can naturally probe high dimensional spaces, this reweighting model can in principle exploit the full phase space for both enhancing signal sensitivity and specificity.
This paper is organized as follows. Section 2 introduces the Simulation Assisted Likelihood-free Anomaly Detection (salad) method. A dijet search at the LHC is emulated to illustrate the new method. The simulation and deep learning setup are introduced in Sec. 3 and then the application of dctr is shown in Sec. 4. The signal sensitivity and specificity are presented in Secs. 5 and 6, respectively. The paper ends with conclusions and outlook in Sec. 7.
2 Methods
Let $m$ be a feature (or set of features) that can be used to localize a potential signal in a signal region (SR). Furthermore, let $x$ be another set of features which are useful for isolating a potential signal. The prototypical example is a resonance search where $m$ is the single resonant feature, such as the invariant mass of two jets, while $x$ are other properties of the event, such as the substructure of the two jets. The salad method then proceeds as follows:

1. Train a classifier $f$ to distinguish data and simulation for $m \notin \mathrm{SR}$. This classifier is parameterized in $m$ by simply augmenting $x$ with $m$, $f = f(x, m)$ Cranmer:2015bka ; Baldi:2016fzo . If $f$ is trained using the binary cross entropy or the mean squared error loss, then asymptotically, a weight function $w(x|m)$ is defined by
$$w(x|m) \equiv \frac{f(x,m)}{1 - f(x,m)} = \frac{p(x|m,\mathrm{data})}{p(x|m,\mathrm{simulation})} \times \frac{p(\mathrm{data})}{p(\mathrm{simulation})}, \tag{1}$$
where the last factor in Eq. 1 is an overall constant that is the ratio of the total amount of data to the total amount of simulation. This property of neural networks to learn likelihood ratios has been exploited for a variety of full phase space reweighting and parameter estimation proposals in high energy physics Andreassen:2019nnm ; Brehmer:2018hga ; Brehmer:2018eca ; Brehmer:2018kdj ; Cranmer:2015bka ; Andreassen:2019cjw .

2. Simulated events in the SR are reweighted using $w(x|m)$. The function $w(x|m)$ is interpolated automatically by the neural network. A second classifier $g(x) \in [0, 1]$ is used to distinguish the reweighted simulation from the data. This can be achieved in the usual way with a weighted loss function such as the binary cross-entropy:
$$\mathrm{loss}(g(x)) = -\sum_{m_i \in \mathrm{SR}_\mathrm{data}} \log g(x_i) - \sum_{m_i \in \mathrm{SR}_\mathrm{simulation}} w(x_i|m_i) \log\left(1 - g(x_i)\right). \tag{2}$$
Events are then selected with large values of $g(x)$. Asymptotically (that is, with a sufficiently flexible neural network architecture, enough training data, and an effective optimization procedure), $g(x)$ will be monotonically related with the optimal classifier:
$$\frac{g(x)}{1 - g(x)} \propto \frac{p(x|\mathrm{signal+background})}{p(x|\mathrm{background})}. \tag{3}$$
It is important that the same data are not used for training and testing. The easiest way to achieve this is using different partitions of the data for these two tasks. One can make use of more data with a cross-validation procedure Collins:2018epr ; Collins:2019jip .

3. One could combine the previous step with a standard data-driven background estimation technique like a sideband fit or the ABCD method. However, one can also directly use the weighted simulation to predict the number of events that should pass a threshold requirement on $g(x)$:
$$N_\mathrm{predicted}(c) = \sum_{m_i \in \mathrm{SR}_\mathrm{simulation}} w(x_i|m_i)\, \mathbb{I}\left[g(x_i) > c\right], \tag{4}$$
for some threshold value $c$. The advantage of Eq. 4 over other data-based methods is that $g(x)$ could be correlated with $m$; for sideband fits, threshold requirements on $g(x)$ must not sculpt local features in the $m$ spectrum.
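The three steps above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: it assumes that Eq. 1 takes the standard likelihood-ratio form w = f/(1 - f), that Eq. 2 is a binary cross-entropy with per-event weights on the simulation term, and that Eq. 4 is a weighted count of simulated events above a threshold. The function names (`dctr_weight`, `weighted_bce`, `predicted_background`) and all toy numbers are hypothetical; in a real analysis the classifier outputs would come from trained neural networks.

```python
import math


def dctr_weight(f, eps=1e-7):
    """Step 1 (assumed form of Eq. 1): map a classifier output f in (0, 1) to f / (1 - f)."""
    f = min(max(f, eps), 1.0 - eps)  # clip so the ratio stays finite
    return f / (1.0 - f)


def weighted_bce(g_data, g_sim, w_sim, eps=1e-7):
    """Step 2 (assumed form of Eq. 2): data are labelled 1, weighted simulation 0."""
    loss = 0.0
    for g in g_data:
        loss -= math.log(max(g, eps))
    for g, w in zip(g_sim, w_sim):
        loss -= w * math.log(max(1.0 - g, eps))
    return loss


def predicted_background(g_sim, w_sim, c):
    """Step 3 (assumed form of Eq. 4): weighted number of simulated SR events with g > c."""
    return sum(w for g, w in zip(g_sim, w_sim) if g > c)


# Hypothetical toy inputs: outputs of the first classifier f(x, m) on three
# simulated signal-region events, and outputs of the second classifier g(x).
f_sim = [0.5, 0.6, 0.4]
w_sim = [dctr_weight(f) for f in f_sim]
g_sim = [0.2, 0.9, 0.7]
g_data = [0.8, 0.3]

print("weights:", w_sim)
print("weighted loss:", weighted_bce(g_data, g_sim, w_sim))
print("predicted background above c = 0.5:", predicted_background(g_sim, w_sim, c=0.5))
```

A classifier output of exactly 0.5 corresponds to a weight of one, so simulated events that the first classifier cannot tell apart from data are left unchanged.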
3 Simulation
A large-radius dijet resonance search is used to illustrate the salad method. The simulations are from the LHC Olympics 2020 community challenge R&D dataset gregor_kasieczka_2019_2629073 . The background process is generic parton scattering (labeled QCD for quantum chromodynamics) and the signal is a hypothetical $W'$ boson that decays into an $X$ boson and a $Y$ boson. Each of the $X$ and $Y$ decays to quarks. The masses of the $W'$, $X$, and $Y$ particles are 3.5, 0.5, and 0.1 TeV, respectively. The mass hierarchy between the $W'$ particle and its decay products means that the $X$ and $Y$ particles are Lorentz boosted in the lab frame and therefore their two-prong decay products are captured inside a single large-radius jet. Particle-level simulations are produced with Pythia 8 Sjostrand:2006za ; Sjostrand:2007gs or Herwig++ Bahr:2008pv without pileup or multiple parton interactions and a detector simulation is performed with Delphes 3.4.1 deFavereau:2013fsa ; Mertens:2015kba ; Selvaggi:2014mya . Particle flow objects are clustered into jets using the Fastjet Cacciari:2011ma ; Cacciari:2005hq implementation of the anti-$k_t$ algorithm Cacciari:2008gp using $R = 1.0$ as the jet radius. Events are selected by requiring at least one such jet with transverse momentum above a TeV-scale threshold. In the remaining studies, the Pythia dataset is treated as ‘data’, while the Herwig dataset is treated as ‘simulation’, to mimic the scenario in practice where the simulation differs from the data.
Figure 1 presents the invariant mass of the leading two jets. The jet selection is evident from the peak around 3 TeV. The signal peaks around the $W'$ mass and, aside from the kinematic feature from the jet selection, the background distribution is featureless. The spectra from Pythia and Herwig are nearly identical, which may be expected since the invariant mass is mostly determined by hard-scatter matrix elements and not final state effects.
To demonstrate the salad approach, two features of each of the leading jets are used for classification. (In principle, salad can readily accommodate the full phase space as used by the original dctr method Andreassen:2019nnm based on particle flow networks Komiske:2018cqr ; this will be explored in future studies.) The first feature is the jet mass and the second feature is the $N$-subjettiness ratio $\tau_{21}$ Thaler:2011gf ; Thaler:2010tr . This second feature is the most widely used feature for differentiating jets that have two hard prongs (as in the signal) from jets that have only one hard prong (as for most of the background). The two jets are ordered by their mass and the four features used for machine learning are presented in Fig. 2. As expected, the signal mass distributions show peaks at the $X$ and $Y$ masses and the $\tau_{21}$ distributions are concentrated at small values, indicating two-prong substructure. Pythia and Herwig differ mostly at low mass and across the entire $\tau_{21}$ distribution.
The baseline performance for classifying signal versus the QCD background is presented in Fig. 3. As is the case for all neural networks presented in the following sections, three fully connected layers with 100 hidden nodes on each intermediate layer are implemented using Keras keras and TensorFlow tensorflow with the Adam adam optimization algorithm. Rectified linear units are the activation function for all intermediate layers and the sigmoid is used for the final output layer. Networks are trained with binary cross entropy for 50 epochs with early stopping (with patience 10). The supervised classifier presented in Fig. 3 effectively differentiates signal from background, with a maximum significance improvement of about 10. It is expected that the performance of any model-independent approach will be bounded from above by the performance of this classifier.
4 Parameterized Reweighting with DCTR
The first step of the dctr reweighting procedure is to train a classifier to distinguish the ‘data’ (Pythia) from the ‘simulation’ (Herwig) in a sideband region. The output of such a classifier is shown in Fig. 4, where the signal region is defined as a window in the dijet invariant mass. There are about 850k events in the sideband region and 150k events in the signal region. Compared with the classifier in Fig. 3, the separation in Fig. 4 is less dramatic because Pythia and Herwig are much more similar to each other than the signal is to QCD. As expected, the network is a linear function of the likelihood ratio so the ratio plot in Fig. 4 is linear. Interestingly, the signal is more Herwig-like than Pythia-like. The reweighting function is applied to the Herwig events in Fig. 4 to show that the reweighted simulation (Sim.+dctr) looks nearly identical to the ‘Data’. All of the events used for Fig. 4 are independent from the ones used for training the network. Figure 5 shows that this reweighting works for all of the input distributions to the neural network as well.
The next step for salad is to interpolate the reweighting function. The neural network presented in Fig. 4 is trained conditional on the dijet invariant mass $m_{jj}$ and so it can be evaluated in the SR for values of the invariant mass that were not available during the network training. Note that the signal region must be chosen large enough so that the signal contamination in the sideband does not bias the reweighting function. For this example, for a 25% signal fraction in the signal region, the contribution in the sideband is about 1% and has no impact on the dctr model. Figure 6 shows a classifier trained to distinguish ‘data’ and ‘simulation’ in the signal region before and after the application of the interpolated dctr model. There is excellent closure, also for each of the input features to the classifier as shown in Fig. 7.
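The closure property described in this section can be illustrated with a one-dimensional toy in plain Python. This sketch is not the setup of the paper: it assumes unit-width Gaussian ‘data’ and ‘simulation’ with different means, and it replaces the trained neural network by the analytically known ideal classifier output p_data / (p_data + p_sim) for equal sample sizes. The interpolation in the resonant feature is not modelled here. All names and numbers are hypothetical.

```python
import math
import random


def ideal_classifier(x, mu_data=0.5, mu_sim=0.0):
    """Ideal data-vs-simulation classifier output for unit-width Gaussians."""
    p_data = math.exp(-0.5 * (x - mu_data) ** 2)
    p_sim = math.exp(-0.5 * (x - mu_sim) ** 2)
    return p_data / (p_data + p_sim)


def weighted_mean(values, weights):
    """Mean of `values` with per-event `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


rng = random.Random(0)  # fixed seed so the toy is reproducible
sim = [rng.gauss(0.0, 1.0) for _ in range(20000)]   # stand-in for 'simulation'
data = [rng.gauss(0.5, 1.0) for _ in range(20000)]  # stand-in for 'data'

# Reweighting function: classifier output f turned into f / (1 - f).
weights = []
for x in sim:
    f = ideal_classifier(x)
    weights.append(f / (1.0 - f))

raw_mean = sum(sim) / len(sim)
reweighted_mean = weighted_mean(sim, weights)
data_mean = sum(data) / len(data)

print("simulation mean:           ", raw_mean)
print("reweighted simulation mean:", reweighted_mean)
print("data mean:                 ", data_mean)
```

In this toy the unweighted simulation mean sits near zero while the reweighted mean moves to the data mean near 0.5, which is the one-dimensional analogue of the reweighted histograms agreeing with the ‘data’.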
5 Sensitivity
After reweighting the simulation in the signal region to match the data, the next step of the search is to train a classifier to distinguish the reweighted simulation from the data in the signal region. If the reweighting works exactly, then this new classifier will asymptotically learn the likelihood ratio of signal plus background to background only, $p(x|\mathrm{signal+background})/p(x|\mathrm{background})$ for the classification features $x$, which is the optimal classifier by the Neyman-Pearson lemma neyman1933ix . If the reweighting is suboptimal, then some of the classifier capacity will be diverted to learning the residual difference between the simulation and background data. If the reweighted simulation is nothing like the data, then all of the capacity will go towards this task and it will not be able to identify the signal. There is therefore a trade-off between how different the (reweighted) simulation is from the data and how different the signal is from the background. If the signal differs from the background much more than the simulation differs from the background data, it is possible that a suboptimally reweighted simulation will still be able to identify the signal (see Sec. 6 for problems with background estimation).
Figure 8 shows the sensitivity of the salad tagger to signal as a function of the signal-to-background ratio ($S/B$) in the signal region. In all cases, the background is the QCD simulation using Pythia. (The full one million Pythia events are divided into two pieces, one that acts as the test set for all methods and one that is used for further study. The remaining half is further split in half to represent the data or the simulation for the lines marked ‘Pythia’ in Fig. 8. For a fair comparison, the Herwig statistics are comparable to 25% of the full Pythia dataset.) The Pythia lines correspond to the case where the simulation follows the same statistics as the data (that is, Pythia). The area under the curve (AUC) should be as close to one as possible and a tagger that is operating uniformly at random will produce an AUC of 0.5. Anti-tagging (preferentially tagging events that are not signal-like) results in an AUC less than 0.5. The maximum significance improvement is calculated as the largest value of $\epsilon_S/\sqrt{\epsilon_B + 0.01\%}$, where $\epsilon_S$ and $\epsilon_B$ are the signal and background efficiencies and the 0.01% offset regulates statistical fluctuations at low efficiency.
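The two headline metrics can be made concrete with a short plain-Python sketch. It assumes that the significance improvement is the signal efficiency divided by the square root of the background efficiency plus an offset of 0.0001 (our reading of the 0.01% offset), and it computes the AUC with the trapezoidal rule. The helper names and the toy scores are hypothetical and chosen only for illustration.

```python
import math


def roc_points(scores_sig, scores_bkg):
    """(background efficiency, signal efficiency) for every observed threshold."""
    thresholds = sorted(set(scores_sig) | set(scores_bkg), reverse=True)
    points = [(0.0, 0.0)]
    for c in thresholds:
        eps_s = sum(s >= c for s in scores_sig) / len(scores_sig)
        eps_b = sum(s >= c for s in scores_bkg) / len(scores_bkg)
        points.append((eps_b, eps_s))
    return points


def auc(points):
    """Trapezoidal area under the ROC curve."""
    area = 0.0
    for (b0, s0), (b1, s1) in zip(points, points[1:]):
        area += 0.5 * (s0 + s1) * (b1 - b0)
    return area


def max_significance_improvement(points, offset=1e-4):
    """Largest eps_S / sqrt(eps_B + offset); the offset keeps eps_B = 0 finite."""
    return max(eps_s / math.sqrt(eps_b + offset) for eps_b, eps_s in points)


# Hypothetical tagger scores; evaluating the metrics needs signal labels.
toy_signal = [0.9, 0.8, 0.7, 0.4]
toy_background = [0.6, 0.5, 0.3, 0.2, 0.1]
toy_points = roc_points(toy_signal, toy_background)

print("AUC:", auc(toy_points))
print("max significance improvement:", max_significance_improvement(toy_points))

# A tagger operating uniformly at random has a diagonal ROC curve.
random_points = [(0.0, 0.0), (1.0, 1.0)]
print("random AUC:", auc(random_points))
```

For the toy scores the maximum is reached where three of the four signal events pass at zero background efficiency, so the value is set entirely by the offset. This is the low-efficiency regime that the offset is meant to regulate.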
At high $S/B$, the performance in Fig. 8 is similar to the fully supervised classifier presented in Sec. 3. As $S/B \to 0$, the Pythia curves approach the random classifier, with an AUC of 0.5 and a maximum significance improvement of unity. The Herwig curve has an AUC less than 0.5 as $S/B \to 0$ because the signal is more Herwig-like than Pythia-like (see Fig. 4) and thus a tagger that requires the features to be data-like (data = Pythia) will anti-tag the signal. Likewise, the efficiency of the tagger on the simulation is higher than 50% when placing a threshold on the NN that keeps 50% of the events in data. The maximum significance improvement quickly drops to unity for Herwig as $S/B$ decreases, indicating that the network is spending more capacity on differentiating Pythia from Herwig than on finding signal.
For all four metrics, salad significantly improves the performance of the Herwig-only approach. In particular, the salad tagger remains effective down to lower values of $S/B$ than the Herwig-only tagger, which provides useful discrimination power only at larger $S/B$. For the significance improvement and false positive rate at a fixed true positive rate, the salad tagger tracks the Pythia tagger almost exactly down to below 1%. The AUC is about half way between Pythia and Herwig at high $S/B$, which is indicative of poor performance at low efficiency.
Figure 8: Four metrics for the sensitivity of the salad tagger as a function of the signal-to-background ratio ($S/B$) in the signal region: the area under the curve (AUC) in the top left, the maximum significance improvement (top right), the false positive rate at a fixed 50% signal efficiency (bottom left), and the significance improvement at the same fixed 50% signal efficiency (bottom right). The evaluation of these metrics requires signal labels, even though the training of the classifiers themselves does not use signal labels. Error bars correspond to the standard deviation from training five different classifiers. Each classifier is itself the truncated mean over ten random initializations.
6 Background Estimation
The performance gains from Sec. 5 can be combined with a sideband background estimation strategy, as long as threshold requirements on the classifier do not sculpt bumps in the dijet invariant mass spectrum. However, there is also an opportunity to use salad to directly estimate the background from the interpolated simulation. Figure 9 illustrates the efficacy of the background estimation for a single classifier trained in the absence of signal. Without the dctr reweighting, the predicted background rate is too low by a factor of two or more below 10% data efficiency. With the interpolated reweighting function, the background prediction is accurate within a few percent down to about 1% data efficiency.
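The comparison of predicted and observed rates at a fixed data efficiency can be sketched as follows in plain Python. This is an illustration under assumptions, not the procedure used to make Fig. 9: the threshold is taken as the score of the last data event kept at the requested efficiency, the prediction is the sum of simulation weights above that threshold, and the scores and weights are small hypothetical lists standing in for classifier outputs and dctr weights.

```python
def predicted_over_observed(data_scores, sim_scores, sim_weights, efficiency):
    """Ratio of the weighted simulation yield to the data yield above a threshold.

    The threshold is chosen so that a fraction `efficiency` of the data passes.
    """
    ordered = sorted(data_scores, reverse=True)
    n_keep = max(1, round(efficiency * len(ordered)))
    threshold = ordered[n_keep - 1]
    observed = sum(1 for g in data_scores if g >= threshold)
    predicted = sum(w for g, w in zip(sim_scores, sim_weights) if g >= threshold)
    return predicted / observed


# Hypothetical classifier scores for 'data' and 'simulation' in the signal region.
data_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
sim_scores = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]

# Unit weights model the unweighted simulation; the second list is a
# hypothetical stand-in for reweighting that moves yield to high scores.
unit_weights = [1.0] * 10
toy_weights = [5.0 / 6.0] * 6 + [1.25] * 4

print("no reweighting:  ", predicted_over_observed(data_scores, sim_scores, unit_weights, 0.5))
print("with reweighting:", predicted_over_observed(data_scores, sim_scores, toy_weights, 0.5))
```

In this toy the unweighted simulation underpredicts the yield at 50% data efficiency, while the reweighted simulation matches it, mirroring qualitatively the behaviour described for Fig. 9.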
In practice, the difficulty in using salad to directly estimate the background is the estimation of the residual bias. One may be able to use validation regions between the signal region and sideband region, but these will never require as much interpolation as the signal region itself. One can rely on simulation variations and auxiliary measurements to estimate the systematic uncertainty from the direct salad background estimation, but estimating high-dimensional uncertainties is challenging Nachman:2019dol ; Nachman:2019yfl . With a low-dimensional reweighting or with a proper high-dimensional systematic uncertainty estimate, the parameterized reweighting used in salad should result in a lower uncertainty than a background estimate taken directly from simulation. In particular, any nuisance parameters that affect the sideband region and the signal region in the same way will cancel when reweighting and interpolating.
7 Conclusions
This paper has introduced Simulation Assisted Likelihood-free Anomaly Detection (salad), a new approach to search for resonant anomalies by using parameterized reweighting functions for classification and background estimation. The salad approach uses information from simulation in a way that is nearly background-model independent while remaining signal-model agnostic. The only requirement for the signal is that there is one feature where the signal is known to be localized. In the example presented in the paper, this feature was the invariant mass of two jets. The location of the resonance need not be known ahead of time and can be scanned using a series of signal and sideband regions. This scanning will result in a trials factor per non-overlapping signal region. An additional look-elsewhere effect is incurred by scanning the threshold on the neural network. In practice, one could use a small number of widely separated thresholds to be broadly sensitive. As long as the data used for training and testing are independent, there is no additional trials factor for the feature space used for classification. Strategies for maximally using the data for training can be found in Refs. Collins:2018epr ; Collins:2019jip .
While the numerical salad results presented here did not fully achieve the performance of a fully supervised classifier trained directly with inside knowledge about the data, there is room for improvement. In particular, a detailed hyperparameter scan could improve the quality of the reweighting. Additionally, calibration techniques could be used to further increase the accuracy Cranmer:2015bka . Future work will investigate the potential of salad to analyze higher-dimensional feature spaces as well as classifier features that are strongly correlated with the resonant feature. It will also be interesting to compare salad with other recently proposed model-independent methods. When the nominal background simulation is an excellent model of nature, salad should perform similarly to the methods presented in Refs. DAgnolo:2018cun ; DAgnolo:2019vbw and provide a strong sensitivity to new particles. In other regimes where the background simulation is biased, salad should continue to provide a physics-informed but still mostly background/signal model-independent approach to extend the search program for new particles at the LHC and beyond.
Code and data availability
The code can be found at https://github.com/bnachman/DCTRHunting and the datasets are available on Zenodo as part of the LHC Olympics gregor_kasieczka_2019_2629073 .
Acknowledgements.
BN would like to thank Jack Collins for countless discussions about anomaly detection, including ideas related to salad. Some of those discussions happened at the Aspen Center for Physics, which is supported by National Science Foundation grant PHY-1607611. This work was supported by the U.S. Department of Energy, Office of Science under contract DE-AC02-05CH11231. DS is supported by DOE grant DOE-SC0010008. DS thanks LBNL, BCTP and BCCP for their generous support and hospitality during his sabbatical year. BN would like to thank NVIDIA for providing Volta GPUs for neural network training.
References
 (1) ATLAS Collaboration, Exotic Physics Searches, 2019. https://twiki.cern.ch/twiki/bin/view/AtlasPublic/ExoticsPublicResults.
 (2) ATLAS Collaboration, Supersymmetry searches, 2019. https://twiki.cern.ch/twiki/bin/view/AtlasPublic/SupersymmetryPublicResults.
 (3) ATLAS Collaboration, Higgs and Diboson Searches, 2019. https://twiki.cern.ch/twiki/bin/view/AtlasPublic/HDBSPublicResults.
 (4) CMS Collaboration, CMS Exotica Public Physics Results, 2019. https://twiki.cern.ch/twiki/bin/view/CMSPublic/PhysicsResultsEXO.
 (5) CMS Collaboration, CMS Supersymmetry Physics Results, 2019. https://twiki.cern.ch/twiki/bin/view/CMSPublic/PhysicsResultsSUS.
 (6) CMS Collaboration, CMS Beyondtwogenerations (B2G) Public Physics Results, 2019. https://twiki.cern.ch/twiki/bin/view/CMSPublic/PhysicsResultsB2G.
 (7) LHCb Collaboration, Publications of the QCD, Electroweak and Exotica Working Group, 2019. http://lhcbproject.web.cern.ch/lhcbproject/Publications/LHCbProjectPublic/Summary_QEE.html.
 (8) J. H. Kim, K. Kong, B. Nachman, and D. Whiteson, The motivation and status of two-body resonance decays after the LHC Run 2 and beyond, arXiv:1907.06659.
 (9) N. Craig, P. Draper, K. Kong, Y. Ng, and D. Whiteson, The unexplored landscape of two-body resonances, Acta Phys. Polon. B50 (2019) 837, [arXiv:1610.09392].
 (10) A. J. Larkoski, I. Moult, and B. Nachman, Jet Substructure at the Large Hadron Collider: A Review of Recent Advances in Theory and Machine Learning, Phys. Reports (2019) [arXiv:1709.04464].
 (11) D. Guest, K. Cranmer, and D. Whiteson, Deep Learning and its Application to LHC Physics, Ann. Rev. Nucl. Part. Sci. 68 (2018) 161, [arXiv:1806.11484].
 (12) M. Abdughani, J. Ren, L. Wu, J. M. Yang, and J. Zhao, Supervised deep learning in high energy phenomenology: a mini review, Commun. Theor. Phys. 71 (2019) 955, [arXiv:1905.06047].
 (13) A. Radovic, M. Williams, D. Rousseau, M. Kagan, D. Bonacorsi, A. Himmel, A. Aurisano, K. Terao, and T. Wongjirad, Machine learning at the energy and intensity frontiers of particle physics, Nature 560 (2018) 41.
 (14) B. Knuteson. Ph.D. thesis, University of California at Berkeley (2000).
 (15) D0 Collaboration, B. Abbott et al., Search for new physics in data at DØ using Sherlock: A quasi model independent search strategy for new physics, Phys. Rev. D62 (2000) 092004, [hep-ex/0006011].
 (16) D0 Collaboration, V. M. Abazov et al., A Quasi model independent search for new physics at large transverse momentum, Phys. Rev. D64 (2001) 012004, [hep-ex/0011067].
 (17) D0 Collaboration, B. Abbott et al., A quasi-model-independent search for new high $p_T$ physics at DØ, Phys. Rev. Lett. 86 (2001) 3712–3717, [hep-ex/0011071].
 (18) H1 Collaboration, F. D. Aaron et al., A General Search for New Phenomena at HERA, Phys. Lett. B674 (2009) 257–268, [arXiv:0901.0507].
 (19) H1 Collaboration, A. Aktas et al., A General search for new phenomena in ep scattering at HERA, Phys. Lett. B602 (2004) 14–30, [hep-ex/0408044].
 (20) CDF Collaboration, T. Aaltonen et al., Model-Independent and Quasi-Model-Independent Search for New Physics at CDF, Phys. Rev. D78 (2008) 012002, [arXiv:0712.1311].
 (21) CDF Collaboration, T. Aaltonen et al., Model-Independent Global Search for New High-p(T) Physics at CDF, arXiv:0712.2534.
 (22) CDF Collaboration, T. Aaltonen et al., Global Search for New Physics with 2.0 fb$^{-1}$ at CDF, Phys. Rev. D79 (2009) 011101, [arXiv:0809.3781].
 (23) CMS Collaboration, MUSiC, a Model Unspecific Search for New Physics, in pp Collisions at , CMS-PAS-EXO-14-016 (2017).
 (24) CMS Collaboration, Model Unspecific Search for New Physics in pp Collisions at TeV, CMS-PAS-EXO-10-021 (2011).
 (25) ATLAS Collaboration, M. Aaboud et al., A strategy for a general search for new phenomena using data-derived signal regions and its application within the ATLAS experiment, Eur. Phys. J. C79 (2019) 120, [arXiv:1807.07447].
 (26) ATLAS Collaboration, A general search for new phenomena with the ATLAS detector in pp collisions at TeV, ATLAS-CONF-2014-006 (2014).
 (27) ATLAS Collaboration, A general search for new phenomena with the ATLAS detector in collisions at TeV, ATLAS-CONF-2012-107 (2012).
 (28) R. T. D’Agnolo and A. Wulzer, Learning New Physics from a Machine, Phys. Rev. D99 (2019) 015014, [arXiv:1806.02350].
 (29) R. T. D’Agnolo, G. Grosso, M. Pierini, A. Wulzer, and M. Zanetti, Learning Multivariate New Physics, arXiv:1912.12155.
 (30) A. De Simone and T. Jacques, Guiding New Physics Searches with Unsupervised Learning, Eur. Phys. J. C79 (2019) 289, [arXiv:1807.06038].
 (31) A. Mullin, H. Pacey, M. Parker, M. White, and S. Williams, Does SUSY have friends? A new approach for LHC event analysis, arXiv:1912.10625.
 (32) G. M. Alessandro Casa, Nonparametric semi-supervised classification for signal detection in high energy physics, arXiv:1809.02977.
 (33) M. Farina, Y. Nakai, and D. Shih, Searching for New Physics with Deep Autoencoders, arXiv:1808.08992.
 (34) T. Heimel, G. Kasieczka, T. Plehn, and J. M. Thompson, QCD or What?, SciPost Phys. 6 (2019) 030, [arXiv:1808.08979].
 (35) T. S. Roy and A. H. Vijay, A robust anomaly finder based on autoencoder, arXiv:1903.02032.
 (36) O. Cerri, T. Q. Nguyen, M. Pierini, M. Spiropulu, and J.-R. Vlimant, Variational Autoencoders for New Physics Mining at the Large Hadron Collider, JHEP 05 (2019) 036, [arXiv:1811.10276].
 (37) A. Blance, M. Spannowsky, and P. Waite, Adversarially-trained autoencoders for robust unsupervised new physics searches, JHEP 10 (2019) 047, [arXiv:1905.10384].
 (38) J. Hajer, Y.-Y. Li, T. Liu, and H. Wang, Novelty Detection Meets Collider Physics, arXiv:1807.10261.
 (39) B. M. Dillon, D. A. Faroughy, and J. F. Kamenik, Uncovering latent jet substructure, Phys. Rev. D100 (2019) 056002, [arXiv:1904.04200].
 (40) J. H. Collins, K. Howe, and B. Nachman, Anomaly Detection for Resonant New Physics with Machine Learning, Phys. Rev. Lett. 121 (2018) 241803, [arXiv:1805.02664].
 (41) J. H. Collins, K. Howe, and B. Nachman, Extending the search for new resonances with machine learning, Phys. Rev. D99 (2019) 014038, [arXiv:1902.02634].
 (42) B. Nachman and D. Shih, Anomaly Detection with Density Estimation, arXiv:2001.nnnnn.
 (43) J. A. Aguilar-Saavedra, J. H. Collins, and R. K. Mishra, A generic anti-QCD jet tagger, JHEP 11 (2017) 163, [arXiv:1709.01087].
 (44) G. Louppe, M. Kagan, and K. Cranmer, Learning to Pivot with Adversarial Networks, arXiv:1611.01046.
 (45) J. Dolen, P. Harris, S. Marzani, S. Rappoccio, and N. Tran, Thinking outside the ROCs: Designing Decorrelated Taggers (DDT) for jet substructure, JHEP 05 (2016) 156, [arXiv:1603.00027].
 (46) I. Moult, B. Nachman, and D. Neill, Convolved Substructure: Analytically Decorrelating Jet Substructure Observables, JHEP 05 (2018) 002, [arXiv:1710.06859].
 (47) J. Stevens and M. Williams, uBoost: A boosting method for producing uniform selection efficiencies from multivariate classifiers, JINST 8 (2013) P12013, [arXiv:1305.7248].
 (48) C. Shimmin, P. Sadowski, P. Baldi, E. Weik, D. Whiteson, E. Goul, and A. Søgaard, Decorrelated Jet Substructure Tagging using Adversarial Neural Networks, Phys. Rev. D96 (2017) 074034, [arXiv:1703.03507].
 (49) L. Bradshaw, R. K. Mishra, A. Mitridate, and B. Ostdiek, Mass Agnostic Jet Taggers, arXiv:1908.08959.
 (50) ATLAS Collaboration, Performance of mass-decorrelated jet substructure observables for hadronic two-body decay tagging in ATLAS, ATL-PHYS-PUB-2018-014 (2018).
 (51) L.-G. Xia, QBDT, a new boosting decision tree method with systematical uncertainties into training for High Energy Physics, Nucl. Instrum. Meth. A930 (2019) 15–26, [arXiv:1810.08387].
 (52) C. Englert, P. Galler, P. Harris, and M. Spannowsky, Machine Learning Uncertainties with Adversarial Neural Networks, Eur. Phys. J. C79 (2019), no. 1 4, [arXiv:1807.08763].
 (53) S. Wunsch, S. Jörger, R. Wolf, and G. Quast, Reducing the dependence of the neural network function to systematic uncertainties in the input space, arXiv:1907.11674.
 (54) G. Kasieczka and D. Shih, DisCo Fever: Robust Networks Through Distance Correlation, arXiv:2001.nnnnn.
 (55) A. Andreassen and B. Nachman, Neural Networks for Full Phase-space Reweighting and Parameter Tuning, arXiv:1907.08209.
 (56) K. Cranmer, J. Pavez, and G. Louppe, Approximating Likelihood Ratios with Calibrated Discriminative Classifiers, arXiv:1506.02169.
 (57) P. Baldi, K. Cranmer, T. Faucett, P. Sadowski, and D. Whiteson, Parameterized neural networks for high-energy physics, Eur. Phys. J. C76 (2016), no. 5 235, [arXiv:1601.07913].
 (58) J. Brehmer, G. Louppe, J. Pavez, and K. Cranmer, Mining gold from implicit models to improve likelihood-free inference, arXiv:1805.12244.
 (59) J. Brehmer, K. Cranmer, G. Louppe, and J. Pavez, A Guide to Constraining Effective Field Theories with Machine Learning, Phys. Rev. D98 (2018) 052004, [arXiv:1805.00020].
 (60) J. Brehmer, K. Cranmer, G. Louppe, and J. Pavez, Constraining Effective Field Theories with Machine Learning, Phys. Rev. Lett. 121 (2018) 111801, [arXiv:1805.00013].
 (61) A. Andreassen, P. T. Komiske, E. M. Metodiev, B. Nachman, and J. Thaler, OmniFold: A Method to Simultaneously Unfold All Observables, arXiv:1911.09107.
 (62) G. Kasieczka, B. Nachman, and D. Shih, R&D Dataset for LHC Olympics 2020 Anomaly Detection Challenge, Apr., 2019. https://doi.org/10.5281/zenodo.2629073.
 (63) T. Sjöstrand, S. Mrenna, and P. Z. Skands, PYTHIA 6.4 Physics and Manual, JHEP 05 (2006) 026, [hep-ph/0603175].
 (64) T. Sjöstrand, S. Mrenna, and P. Z. Skands, A Brief Introduction to PYTHIA 8.1, Comput. Phys. Commun. 178 (2008) 852, [arXiv:0710.3820].
 (65) M. Bahr et al., Herwig++ Physics and Manual, Eur. Phys. J. C58 (2008) 639–707, [arXiv:0803.0883].
 (66) DELPHES 3 Collaboration, J. de Favereau, C. Delaere, P. Demin, A. Giammanco, V. Lemaitre, A. Mertens, and M. Selvaggi, DELPHES 3, A modular framework for fast simulation of a generic collider experiment, JHEP 02 (2014) 057, [arXiv:1307.6346].
 (67) A. Mertens, New features in Delphes 3, J. Phys. Conf. Ser. 608 (2015) 012045.
 (68) M. Selvaggi, DELPHES 3: A modular framework for fast-simulation of generic collider experiments, J. Phys. Conf. Ser. 523 (2014) 012033.
 (69) M. Cacciari, G. P. Salam, and G. Soyez, FastJet User Manual, Eur. Phys. J. C72 (2012) 1896, [arXiv:1111.6097].
 (70) M. Cacciari and G. P. Salam, Dispelling the $N^{3}$ myth for the $k_t$ jet-finder, Phys. Lett. B641 (2006) 57, [hep-ph/0512210].
 (71) M. Cacciari, G. P. Salam, and G. Soyez, The anti-$k_t$ jet clustering algorithm, JHEP 04 (2008) 063, [arXiv:0802.1189].
 (72) P. T. Komiske, E. M. Metodiev, and J. Thaler, Energy Flow Networks: Deep Sets for Particle Jets, JHEP 01 (2019) 121, [arXiv:1810.05165].
 (73) J. Thaler and K. Van Tilburg, Maximizing Boosted Top Identification by Minimizing N-subjettiness, JHEP 02 (2012) 093, [arXiv:1108.2701].
 (74) J. Thaler and K. Van Tilburg, Identifying Boosted Objects with N-subjettiness, JHEP 03 (2011) 015, [arXiv:1011.2268].
 (75) F. Chollet, “Keras.” https://github.com/fchollet/keras, 2017.
 (76) M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., Tensorflow: A system for large-scale machine learning, in OSDI, vol. 16, p. 265, 2016.
 (77) D. Kingma and J. Ba, Adam: A method for stochastic optimization, arXiv:1412.6980.
 (78) J. Neyman and E. S. Pearson, On the problem of the most efficient tests of statistical hypotheses, Phil. Trans. R. Soc. Lond. A 231 (1933) 289.
 (79) B. Nachman, A guide for deploying Deep Learning in LHC searches: How to achieve optimality and account for uncertainty, arXiv:1909.03081.
 (80) B. Nachman and C. Shimmin, AI Safety for High Energy Physics, arXiv:1910.08606.