1 Introduction
Despite an impressive and extensive search program from ATLAS atlasexoticstwiki ; atlassusytwiki ; atlashdbspublictwiki , CMS cmsexoticstwiki ; cmssusytwiki ; cmsb2gtwiki , and LHCb lhcbtwiki for new particles and forces of nature, there is no convincing evidence for new phenomena at the Large Hadron Collider (LHC). However, there remain compelling theoretical (e.g. naturalness) and experimental (e.g. dark matter) reasons for fundamental structure to be observable with current LHC sensitivity. The vast majority of LHC searches are designed with specific signal models motivated by one of these reasons (e.g. gluino pair production from supersymmetry) in mind, and these searches are optimized with a heavy reliance on simulations, for both the signal and the Standard Model (SM) background. Given that it is impossible to cover every model with a specially optimized search (see e.g. Kim:2019rhy ; Craig:2016rqv for comprehensive lists of currently uncovered models), and given that there are vast regions of unexplored LHC phase space, it is critical to consider extending the search program to include more model-agnostic methods.
A variety of model-agnostic approaches have been proposed to search for physics beyond the Standard Model (BSM) at colliders. These approaches are designed to be broadly sensitive to anomalies in data without focusing on specific models. Yet, they have varying degrees of both signal-model and background-model independence, as there is often a tradeoff between the broadness of a search and how sensitive it is to particular classes of signal scenarios. Existing and proposed model-agnostic searches range from fully signal-model independent but fully background-model dependent sleuth ; Abbott:2000fb ; Abbott:2000gx ; Abbott:2001ke ; Aaron:2008aa ; Aktas:2004pz ; Aaltonen:2007dg ; Aaltonen:2007ab ; Aaltonen:2008vt ; CMSPASEXO14016 ; CMSPASEXO10021 ; Aaboud:2018ufy ; ATLASCONF2014006 ; ATLASCONF2012107 ; DAgnolo:2018cun ; DAgnolo:2019vbw (because they compare data to SM simulation), to varying degrees of partial signal-model and background-model independence Farina:2018fyg ; Heimel:2018mkt ; Roy:2019jae ; Cerri:2018anq ; Blance:2019ibf ; Hajer:2018kqm ; Collins:2018epr ; Collins:2019jip ; DeSimone:2018efk ; Mullin:2019mmh ; 1809.02977 ; Dillon:2019cqt ; AguilarSaavedra:2017rzt . A comprehensive overview of existing model-agnostic approaches and how they are classified in terms of signal and background model independence will be given in Section 2.

This paper introduces a new approach called ANOmaly detection with Density Estimation
(ANODE) that is complementary to existing methods and aims to be largely background and signal model agnostic. Density estimation, especially in high dimensions, has traditionally been a difficult problem in unsupervised machine learning. The objective of density estimation is to learn the underlying probability density from which a set of independent and identically distributed examples were drawn. In the past few years, there have been a number of breakthroughs in density estimation using neural networks and the performance of high dimensional density estimation has greatly improved. The idea of ANODE is to make use of these recent breakthroughs in order to directly estimate the probability density of the data. Assuming the signal is localized somewhere, one can attempt to use sideband methods and interpolation to estimate the probability density of the background. Then, one can use this to construct a likelihood ratio generally sensitive to new physics.
As with any search for BSM physics, it is not enough to have a discriminant that is sensitive to signals; one must also have a valid method of background estimation, otherwise it will be impossible to claim a discovery of new physics. The method of background estimation can further introduce possible sources of signal and background model dependence, and it is important to avail oneself of data-driven background methods in any truly model-agnostic search. This paper will explore two methods of data-driven background estimation, one based on importance sampling, and the other based on directly integrating the background density estimate obtained in the ANODE procedure.
Other neural network approaches to density estimation have been studied in high energy physics. Such methods include Generative Adversarial Networks (GANs) Goodfellow:2014upx ; deOliveira:2017pjk ; Paganini:2017hrr ; Paganini:2017dwg ; Butter:2019eyo ; Martinez:2019jlu ; Bellagente:2019uyp ; Vallecorsa:2019ked ; SHiP:2019gcl ; Carrazza:2019cnt ; Butter:2019cae ; Lin:2019htn ; DiSipio:2019imz ; Hashemi:2019fkn ; Chekalina:2018hxi ; ATLSOFTPUB2018001 ; Zhou:2018ill ; Carminati:2018khv ; Vallecorsa:2018zco ; Datta:2018mwd ; Musella:2018rdi ; Erdmann:2018kuh ; Deja:2019vcv ; Derkach:2019qfk ; Erbin:2018csv ; Erdmann:2018jxd ; Urban:2018tqv
Monk:2018zsb ; ATLSOFTPUB2018001 , physically-inspired networks Andreassen:2018apy ; Andreassen:2019txo , and normalizing flows pmlrv37rezende15 ; Albergo:2019eim . GANs are efficient for sampling from a density and are thus promising for accelerating slow simulations, but they do not provide an explicit representation of the density itself. For this reason, ANODE is built using normalizing flows pmlrv37rezende15 and in particular the recently proposed masked autoregressive flow (MAF) NIPS2017_6828 . These methods estimate densities by using a succession of neural networks to gradually map the original data to a transformed dataset that follows a simple distribution (e.g. normal or uniform).

The ANODE method is demonstrated using a simulated large-radius dijet search based on the LHC Olympics 2020 R&D dataset gregor_kasieczka_2019_2629073 . In particular, properties of hadronic jets are used as discriminating features to enhance a bump hunt in the invariant mass of pairs of jets. ANODE learns a parameterized density of the features using a sideband and this is combined with a density estimation of the same features in the signal region. The resulting likelihood ratio is able to enhance the significance of a traditional bump hunt by roughly a factor of seven. There is currently no dedicated search for generic dijet signatures where each of the jets can also originate from a BSM resonance Kim:2019rhy ; AguilarSaavedra:2017zuc ; AguilarSaavedra:2019adu ; Agashe:2018leo ; Agashe:2017wss . Therefore, this particular application could be directly useful for extending the LHC physics search program. Many other applications to resonant new physics searches involving jets and other final states are also possible.
In order to benchmark the performance of ANODE, it is compared with the CWoLa hunting method Collins:2018epr ; Collins:2019jip
. The CWoLa approach is also a neural-network-based resonance search, but does not involve density estimation. Instead, CWoLa hunting uses neural networks to identify differences between signal regions and neighboring sideband regions. By turning the problem into a supervised learning task Metodiev:2017vrx , CWoLa is able to effectively find rare resonant signals. However, CWoLa hunting has certain requirements on the independence of the discriminating features and the resonant feature. ANODE does not have this requirement, and the potential for exploiting correlated features is studied by introducing artificial correlations.

This paper is organized as follows. Section 2 reviews the landscape of model independent searches at the LHC to provide context for the ANODE method. Section 3 introduces the details of the ANODE approach and provides a brief introduction to normalizing flows. The remainder of the paper illustrates ANODE through an example based on a dijet search using jet substructure. Details of the simulated samples are provided in Sec. 4 and the results for the signal sensitivity and background specificity are presented in Sec. 5.1 and 5.2, respectively. A study of correlations between the discriminating features and the resonant feature is in Sec. 5.3. The paper ends with conclusions and outlook in Sec. 6.
2 An Overview of Model (In)dependent Searches
A viable search for new physics generally must have two essential components: it must be sensitive to new phenomena and it must also be able to estimate the background under the null hypothesis (Standard Model only). The categorization of a search’s degree of model (in)dependence requires consideration of both of these components. Figure 1 illustrates how to characterize model independence for both BSM sensitivity and SM background specificity. We will now consider each in turn.

2.1 BSM sensitivity
For BSM sensitivity, the various types of searches are categorized as follows:
Almost all searches at the LHC are optimized (with or without machine learning) using simulations of both the SM and particular signal models. This is represented as the lower-left corner of Fig. 1(a).
A handful of searches use signal simulation and unlabeled data to optimize the event selection. These are background model agnostic and are depicted in the upper-left corner of Fig. 1(a). For example, this was used in the recent $t\bar{t}h$ observation, using events with inverted selection criteria to define the background data sample for optimization Aaboud:2018urx ; Sirunyan:2018ouh .
A series of signal model agnostic, but background model-dependent searches have been performed by D0 sleuth ; Abbott:2000fb ; Abbott:2000gx ; Abbott:2001ke , H1 Aaron:2008aa ; Aktas:2004pz , CDF Aaltonen:2007dg ; Aaltonen:2007ab ; Aaltonen:2008vt , CMS CMSPASEXO14016 ; CMSPASEXO10021 , and ATLAS Aaboud:2018ufy ; ATLASCONF2014006 ; ATLASCONF2012107 . All of these searches share essentially the same approach: they compare histograms of data to histograms of SM simulations and look for discrepancies. Such searches are represented in the lower-right part of Fig. 1(a). Recently, there have been proposals to extend these searches with deep learning DAgnolo:2018cun ; DAgnolo:2019vbw .
More recently, a variety of approaches have been proposed, often relying on sophisticated deep learning techniques, that attempt to be both signal and background model agnostic, to varying degrees. These include approaches based on autoencoders Farina:2018fyg ; Heimel:2018mkt ; Roy:2019jae ; Cerri:2018anq ; Blance:2019ibf ; Hajer:2018kqm , weak supervision Collins:2018epr ; Collins:2019jip , nearest neighbor algorithms DeSimone:2018efk ; Mullin:2019mmh ; 1809.02977 , probabilistic modeling Dillon:2019cqt , and others AguilarSaavedra:2017rzt . These are indicated in the upper-right corner of Fig. 1(a).
In the upper-right corner of Fig. 1(a), we have also attempted to illustrate in finer detail the differences between some recent model-agnostic approaches. For example, the autoencoder is in the farthest corner since it assumes almost nothing about the signal or the background and can be run directly on the data, as long as the signal is sufficiently rare Farina:2018fyg ; Heimel:2018mkt . The tradeoff is that there is no optimality guarantee for the autoencoder: any signals that it does find will be found in a rather uncontrolled manner. Meanwhile, CWoLa hunting Collins:2018epr ; Collins:2019jip is somewhat more signal and background model-dependent than autoencoders, since this approach assumes that the signal is localized in a particular feature, and that there is an uncorrelated set of additional features on which one can train a classifier to distinguish the signal region from the sideband. In return, one obtains a guarantee of asymptotic optimality: the classifier approaches the likelihood ratio neyman1933ix in the limit of infinite statistics.
The ANODE method introduced in this paper complements the other recently proposed techniques and is asymptotically optimal. To do this, ANODE estimates the density of the background-only scenario using sidebands and compares that with the density estimated in a signal-sensitive region (details are in Sec. 3). Like the CWoLa hunting method, the new approach is broadly sensitive to resonant new physics and thus it is placed in the upper-right part of Fig. 1(a). The reason that ANODE is placed further toward the upper-right corner than CWoLa hunting is that it is less sensitive to correlations, a feature that is discussed more below.
2.2 Background estimation
A variety of methods are commonly used for background estimation and are highlighted in Fig. 1(b). Generally, background estimation is less dependent on the signal model than achieving signal sensitivity and therefore the axis range of Fig. 1(b) is more compressed than Fig. 1(a).
In some cases, the simulation is used to directly estimate the background. This is often the case for wellunderstood backgrounds such as electroweak phenomena or very rare processes that are difficult to constrain with data.
Most searches use data in some way to constrain the background prediction. One common approach is the control region method, where a search is complemented by an auxiliary measurement to constrain the simulation. Knowledge of the signal is used to ensure that the auxiliary measurement is not biased by the presence of signal.
The two most common methods for background estimates that do not directly use simulation are the ABCD method and the sideband method (bump hunt). The ABCD method operates by identifying two independent features, each of which is sensitive to the presence of signal. Four regions, labeled A, B, C, and D, are constructed by (anti-)requiring a threshold on the two features. The background rate in the most signal-sensitive region is estimated from the other three regions. Background simulations are required to verify the independence of the two features.
Finally, the sideband fit only requires that the background be smooth in the region of a potential signal so that a parametric (or non-parametric Frate:2017mai ) function can be fit to the sidebands and interpolated. However, this method only works for resonant new physics.
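The ABCD method described above can be sketched on toy data. In the following minimal example (the exponential features, sample size, and cut values are all invented for illustration), two independent background features are thresholded to define the four regions, and the yield in the doubly-selected region A is predicted as B·C/D:

```python
import random

def abcd_estimate(events, cut_f, cut_g):
    """Count the four ABCD regions and predict the background in region A
    (both cuts passed) as B*C/D, valid when f and g are independent."""
    A = B = C = D = 0
    for f, g in events:
        if f > cut_f and g > cut_g:
            A += 1          # signal-sensitive region
        elif f > cut_f:
            B += 1          # passes f, fails g
        elif g > cut_g:
            C += 1          # fails f, passes g
        else:
            D += 1          # fails both
    est_A = B * C / D if D > 0 else float("nan")
    return A, est_A

random.seed(0)
# Toy background: two independent exponentially falling features.
bkg = [(random.expovariate(1.0), random.expovariate(1.0)) for _ in range(200_000)]
true_A, pred_A = abcd_estimate(bkg, cut_f=1.5, cut_g=1.5)
print(true_A, pred_A)  # the prediction agrees at the percent level
```

If the two features are correlated, B·C/D acquires a bias, which is why simulations are used to verify independence.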
While strategies from Fig. 1(a) can often be matched with any approach in Fig. 1(b), there is often one combination that is used in practice. Table 1 provides examples of various searches and the background estimation technique that typically is associated with that search. Searches with a complex background may use multiple background estimation procedures.
ANODE can be combined with any background estimation technique, but it can also be used directly, since the background density is already estimated in order to construct a signal-sensitive classifier. Even though directly providing an accurate background estimate puts stringent requirements on the accuracy of the density estimation, it also reduces the need for a full decorrelation between the classification features and the resonant feature. A variety of decorrelation techniques exist Louppe:2016ylz ; Dolen:2016kst ; Moult:2017okx ; Stevens:2013dya ; Shimmin:2017mfk ; Bradshaw:2019ipy ; ATLPHYSPUB2018014 ; DiscoFever ; Xia:2018kgd ; Englert:2018cfo ; Wunsch:2019qbo , but ultimately decorrelating removes information available for classification.
Table 1: Examples of searches and the background estimation strategy typically associated with each.

Search | Typical Background Strategy | Recent Examples
MUSiC & the General Search | Pure MC Prediction | Aaboud:2018ufy ; CMSPASEXO14016
Pure electroweak processes | Pure MC Prediction | Aaboud:2017rel
SUSY with top quarks & bosons | Control Region Method | Aaboud:2017aeu ; CMSPASSUS19009
All-hadronic searches | ABCD Method | Aaboud:2017hdf ; Sirunyan:2018rlj
Long-lived particle searches | ABCD Method | Aaboud:2018aqj ; Sirunyan:2018vlw
BSM resonance searches | Sideband Method | Aad:2019hjw ; Sirunyan:2019vgj
CWoLa hunting | Sideband Method | Collins:2018epr ; Collins:2019jip
ANODE | Sideband or Direct Density | This paper
3 The ANODE Method
This section will describe the ANODE proposal for an unsupervised method to search for resonant new physics using density estimation.
Let $m$ be a feature in which a signal (if it exists) is known to be localized around some value $m_0$. The value of $m_0$ will be scanned for broad sensitivity and the following procedure will be repeated for each window in $m$. It is often the case that the width of the signal in $m$ is fixed by detector properties and is signal model independent. A window around $m_0$ is called the signal region (SR) and its complement is defined as the sideband region (SB). A traditional, unsupervised, model-agnostic search is to perform a bump hunt in $m$, using the SB to interpolate into the SR in order to estimate the background.
Let $x$ be some additional discriminating features in which the signal density is different than the background density. If we could find the region(s) of $x$ space where the signal differs from the background and then cut on $x$ to select these regions, we could improve the sensitivity of the original bump hunt in $m$. The goal of ANODE is to accomplish this in an unsupervised and model-agnostic way, via density estimation in the feature space $(x, m)$.
More specifically, ANODE attempts to learn two densities: $p_\text{data}(x|m)$ and $p_\text{background}(x|m)$ for $m \in \text{SR}$. Then, classification is performed with the likelihood ratio

$R(x|m) = \dfrac{p_\text{data}(x|m)}{p_\text{background}(x|m)}. \qquad (1)$

In the ideal case that the two densities are learned exactly, Eq. 1 is the optimal test statistic for identifying the presence of signal. In the absence of signal, $R(x|m) = 1$, so as long as the threshold satisfies $R_c > 1$, this leads to a zero-background search.

In practice, both $p_\text{data}(x|m)$ and $p_\text{background}(x|m)$ are approximations and so $R(x|m)$ is not unity in the absence of signal. The densities are estimated using conditional neural density estimation as described in Sec. 3.1. The function $p_\text{data}(x|m)$ is estimated in the signal region and the function $p_\text{background}(x|m)$ is estimated using the sideband region and then interpolated into the signal region. The interpolation is done automatically by the neural conditional density estimator. Effective density estimation will result in $R(x|m)$ in the SR that is localized near unity, and one can then enhance the presence of signal by applying a threshold $R_c > 1$ and selecting events with $R(x|m) > R_c$. The interpolated $p_\text{background}(x|m)$ can then also be used to estimate the background, as described in Sec. 3.2.
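The logic of the likelihood-ratio cut can be made concrete with a one-dimensional toy in which analytic densities stand in for the learned estimators (the densities, signal fraction, and threshold below are invented for illustration, not the paper's actual setup):

```python
import math, random

def p_bg(x):   # "background" density: standard normal
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def p_sig(x):  # localized "signal": narrow Gaussian at x = 2
    return math.exp(-((x - 2.0) ** 2) / (2 * 0.1 ** 2)) / (0.1 * math.sqrt(2 * math.pi))

f_sig = 0.01   # signal fraction in the data

def R(x):      # likelihood ratio of Eq. 1, with exact densities
    return ((1 - f_sig) * p_bg(x) + f_sig * p_sig(x)) / p_bg(x)

random.seed(1)
sample = [(random.gauss(2.0, 0.1), 1) if random.random() < f_sig
          else (random.gauss(0.0, 1.0), 0) for _ in range(100_000)]

R_c = 1.5      # threshold on the anomaly score
kept = [lab for x, lab in sample if R(x) > R_c]
purity_before = sum(lab for _, lab in sample) / len(sample)
purity_after = sum(kept) / len(kept)
print(purity_before, purity_after)  # the cut strongly enriches the signal
```

Because $R = 1$ wherever the data follow the background density, the threshold selects only the region where the signal creates an overdensity.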
3.1 Neural Density Estimation
The ANODE procedure as described in the previous subsection is completely general with regards to the method of density estimation. In this work we will demonstrate a proof-of-concept using normalizing flow models for density estimation. Since normalizing flows were proposed in Ref. pmlrv37rezende15 , they have generated much activity and excitement in the machine learning community, achieving state-of-the-art performance on a variety of benchmark density estimation tasks.
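Before the formalism, the core idea of a flow can be checked numerically in one dimension: pushing a density through an invertible map and multiplying by the Jacobian preserves normalization. The sketch below (illustrative only, with an invented Beta(2,2) density and a logit map) verifies this:

```python
import math

def f(x):        # invertible map: logit, sending (0, 1) to the real line
    return math.log(x / (1 - x))

def df_dx(x):    # derivative (Jacobian) of the map
    return 1 / (x * (1 - x))

def p_x(x):      # a density on (0, 1): Beta(2, 2)
    return 6 * x * (1 - x)

def p_z(z):      # transformed density: p_Z(z) = p_X(x) |dx/dz| = p_X(x) / |dz/dx|
    x = 1 / (1 + math.exp(-z))    # inverse of the logit map (sigmoid)
    return p_x(x) / df_dx(x)

# The transformed density should still integrate to one.
dz = 1e-4
integral = sum(p_z(-10 + dz * i) for i in range(200_001)) * dz
print(integral)  # ≈ 1
```

A trained flow does the same thing, except that the invertible map is built from neural networks fit to the data.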
The core idea behind a normalizing flow is to apply a change of variables from a random variable with a simple density (e.g. Gaussian or uniform) to one with a complex density that matches some training dataset. The transformation from one density $p_Z(z)$ describing random variable $Z$ to another density $p_X(x)$ describing random variable $X$ follows the usual change of variables formula using the Jacobian:

$p_X(x) = p_Z(z)\,\left|\det\dfrac{\partial z}{\partial x}\right|, \qquad (2)$

where $x$ and $z$ are realizations of $X$ and $Z$, respectively, $x$ and $z$ have the same dimension, and $z = f(x)$ for an invertible function $f$. The process in Eq. 2 can be repeated to build a normalizing flow:

$p_X(x) = p_Z(z)\,\prod_{i=1}^{n}\left|\det\dfrac{\partial z_i}{\partial z_{i-1}}\right|, \qquad (3)$

where $z_0 = x$, $z_n = z$, and $z_i = f_i(z_{i-1})$ for invertible functions $f_i$. The first neural density estimation with normalizing flows had the following form for $f$:

$f(z) = z + u\,h(w^\top z + b), \qquad (4)$

where $h$ is an elementwise nonlinearity and $u$, $w$, and $b$ are trainable parameters. The benefit of Eq. 4 is that the Jacobian evaluation is simple from the chain rule. Since the first development of normalizing flows, there has been significant development in extending their expressivity. One innovation is to combine flows with autoregressive density estimation NIPS2016_6581 . An autoregressive flow JMLR:v17:16272 modifies the change of variables so that for $x = f(z)$, each $x_i$ depends only on $z_1, \ldots, z_i$, where the indices denote the dimensions of $x$ and $z$. Any $f$ that satisfies this condition is amenable to neural density estimation because the Jacobian determinant evaluation is simple. In particular, the Jacobian is triangular and therefore the determinant is the product of the diagonal elements: $\prod_i \partial x_i / \partial z_i$. ANODE is built on a masked autoregressive flow (MAF) NIPS2017_6828 . For a MAF,

$x_i = z_i\,e^{\alpha_i} + \mu_i, \qquad \mu_i = \mu_i(x_1, \ldots, x_{i-1}), \quad \alpha_i = \alpha_i(x_1, \ldots, x_{i-1}), \qquad (5)$

where $\mu_i$ and $\alpha_i$ are arbitrary functions of the preceding dimensions and the $z_i$ follow the base density. As in Eq. 3, this procedure is repeated multiple times to build a deep autoregressive flow. The masking in MAF comes from its use of MADE pmlrv37germain15 to evaluate $\mu_i$ and $\alpha_i$ for all $i$ in one forward pass. This approach eliminates the need for the recursion in Eq. 5. MAF is nearly the same as inverse autoregressive flows (IAF) NIPS2016_6581 , which also use Gaussian autoregressions and are built on MADE. The main difference is that MAF is very efficient for density estimation but slow for sampling, while IAF is slow for density estimation and fast for sampling. As ANODE only needs to estimate the density without producing new samples, MAF is selected as the method of choice.
The estimation of $p(x|m)$ for ANODE requires that the MAF provides a conditional density. This can be accomplished by adding $m$ as an input to all of the functions $\mu_i$ and $\alpha_i$.
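A minimal illustration of the Gaussian autoregressive transform and its conditional extension is sketched below in pure Python. The simple fixed functions stand in for the MADE-parameterized $\mu_i$ and $\alpha_i$ (all names and functional forms are invented, not the actual ANODE architecture); density evaluation inverts the transform in one pass, with the triangular Jacobian contributing $-\sum_i \alpha_i$ to the log density, while sampling is sequential:

```python
import math

# Toy stand-ins for MADE outputs: mu_i and alpha_i may depend only on the
# preceding dimensions x_1..x_{i-1}, plus the conditioning variable m.
def mu(i, x_prev, m):
    return 0.5 * sum(x_prev) + 0.1 * m

def alpha(i, x_prev, m):
    return 0.05 * len(x_prev)

def log_standard_normal(z):
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)

def log_density(x, m):
    """One-pass density evaluation: z_i = (x_i - mu_i) * exp(-alpha_i),
    and the triangular Jacobian contributes -sum_i alpha_i."""
    logp = 0.0
    for i in range(len(x)):
        mu_i, a_i = mu(i, x[:i], m), alpha(i, x[:i], m)
        z_i = (x[i] - mu_i) * math.exp(-a_i)
        logp += log_standard_normal(z_i) - a_i
    return logp

def sample(z, m):
    """Sequential sampling: x_i = z_i * exp(alpha_i) + mu_i needs x_1..x_{i-1}."""
    x = []
    for i in range(len(z)):
        mu_i, a_i = mu(i, x, m), alpha(i, x, m)
        x.append(z[i] * math.exp(a_i) + mu_i)
    return x

x = sample([0.3, -1.2, 0.7], m=2.0)
print(x, log_density(x, m=2.0))
```

This makes the MAF/IAF tradeoff visible: the direction implemented as a single pass is fast, while the other direction requires the recursion.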
3.2 Estimating the Background
An anomaly detection technique is only useful for finding new particles if the Standard Model background can be estimated. As mentioned earlier, one benefit of the direct density estimation in ANODE is that the background can be directly estimated with $p_\text{background}(x|m)$. This results in multiple possibilities for background estimation that are considered in this work:
Direct density estimation. These methods use the interpolated $p_\text{background}(x|m)$ to directly compute the efficiency of the background after a threshold requirement on $R(x|m)$.
Density integration. One approach is to directly integrate $p_\text{background}(x|m)$ over the events with $R(x|m) > R_c$:

$\epsilon_\text{background}(R_c) = \int dx\; p_\text{background}(x|m)\,\mathbb{I}\left[R(x|m) > R_c\right]. \qquad (6)$
Importance sampling. Analytically integrating a function in high dimensions is impractical, so one can instead estimate the integral with importance sampling. An effective way to implement this sampling is to make the following observation:

$\epsilon_\text{background}(R_c) = \int dx\; p_\text{background}(x|m)\,\mathbb{I}\left[R(x|m) > R_c\right] = \int dx\; p_\text{data}(x|m)\,\dfrac{\mathbb{I}\left[R(x|m) > R_c\right]}{R(x|m)} = \left\langle \dfrac{\mathbb{I}\left[R(x|m) > R_c\right]}{R(x|m)} \right\rangle_{x \sim p_\text{data}(x|m)}. \qquad (7)$

The last line in Eq. 7 can be estimated by computing the fraction of events in the SR (representing the full $p_\text{data}(x|m)$ distribution) with $R(x|m) > R_c$ and then weighting each event in the counting by $1/R(x|m)$.
Sideband in $m$. As long as the requirement $R(x|m) > R_c$ does not sculpt a localized feature in the $m$ spectrum, one can estimate the background prediction by performing a fit to the $m$ spectrum in the SB and interpolating into the SR. This is a standard approach, as discussed in Sec. 1.
Further details about background estimation are presented in Sec. 5.2 for the numerical example described in the next section.
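The importance-sampling estimate of Eq. 7 can be illustrated with a one-dimensional toy (the densities, signal fraction, and threshold below are invented for illustration): events drawn from $p_\text{data}$ are counted with weight $1/R$ and compared with a direct Monte Carlo estimate from the true background density:

```python
import math, random

def p_bg(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def p_data(x):  # background plus a small localized overdensity
    sig = math.exp(-((x - 1.5) ** 2) / (2 * 0.1 ** 2)) / (0.1 * math.sqrt(2 * math.pi))
    return 0.99 * p_bg(x) + 0.01 * sig

def R(x):
    return p_data(x) / p_bg(x)

random.seed(2)
data = [random.gauss(1.5, 0.1) if random.random() < 0.01 else random.gauss(0, 1)
        for _ in range(500_000)]

R_c = 1.05
# Importance-sampling estimate (Eq. 7): count passing events, each weighted by 1/R.
eps_is = sum(1 / R(x) for x in data if R(x) > R_c) / len(data)

# Direct Monte Carlo estimate from the true background density, for comparison.
bg = [random.gauss(0, 1) for _ in range(500_000)]
eps_mc = sum(1 for x in bg if R(x) > R_c) / len(bg)
print(eps_is, eps_mc)
```

The $1/R$ weights exactly cancel the signal overdensity in expectation, so the weighted data reproduce the background efficiency without ever sampling from the background density.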
3.3 Comparison with the CWoLa hunting method
The CWoLa hunting method Collins:2018epr ; Collins:2019jip is a recently-proposed model-agnostic sideband method that also uses machine learning and will serve as a benchmark for ANODE. In the CWoLa hunting approach, the signal sensitivity is achieved by training a classifier to distinguish the SR from the SB. This classifier will approach the likelihood ratio $p_\text{SR}(x)/p_\text{SB}(x)$, which is optimal under certain conditions:

$R_\text{CWoLa}(x) = \dfrac{p_\text{SR}(x)}{p_\text{SB}(x)} = \dfrac{p_\text{SR}(x)}{p_\text{background,SB}(x)} = \dfrac{p_\text{SR}(x)}{p_\text{background,SR}(x)}, \qquad (8)$

where the second equality is true in the absence of signal in the sideband^1 and the third equality is true when $x$ and $m$ are independent for the background. The background is estimated using a sideband fit after placing a selection based on the above classifier.

^1 This is not strictly necessary; the classifier can still be optimal even if there is some signal in the sideband Metodiev:2017vrx .
A key assumption of the CWoLa method is that $x$ and $m$ are independent for the background. This condition is stronger than the smoothness requirement of the background fit, but is necessary for achieving signal sensitivity. In particular, in the presence of a dependence between $x$ and $m$, the CWoLa classifier will learn the true differences between the SB and SR. If these differences are larger than the difference between signal and background in the SR, the CWoLa classifier may not succeed in finding the signal.
In contrast, the ANODE method does not require any particular relationship between $x$ and $m$ to achieve signal sensitivity. In fact, the information about $m$ could be fully contained within $x$, and ANODE could still succeed in principle. Therefore, ANODE can make use of features which are strongly correlated with $m$, thus extending the potential sensitivity to new signals. This is possible because of the two-step density estimation, interpolating $p_\text{background}(x|m)$ from the sideband and then estimating $p_\text{data}(x|m)$ from the SR. Such an approach is not possible with CWoLa hunting, which directly learns the likelihood ratio. The only requirement for ANODE is that there are no nontrivial features in the SR that cannot be smoothly predicted from the SB. Section 5.3 illustrates the ability of ANODE to cope with correlated features.
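The contrast can be seen in a toy version of the CWoLa idea (everything below is invented for illustration): a per-bin ratio of SR to SB counts plays the role of the classifier output. With $x$ independent of the $m$ region for the background, the ratio deviates from one only where the signal lives; with an artificial correlation between $x$ and the region label, the learned ratio is dominated by the correlation instead:

```python
import random

random.seed(3)
N = 200_000

def make_x(in_sr, correlated):
    """Feature x for one event; background optionally shifted in the SR."""
    if in_sr and random.random() < 0.01:            # small signal, SR only
        return random.gauss(3.0, 0.2)
    shift = 0.5 if (correlated and in_sr) else 0.0  # artificial x-m correlation
    return random.expovariate(1.0) + shift

def per_bin_ratio(correlated, edges):
    sr = [make_x(True, correlated) for _ in range(N)]
    sb = [make_x(False, correlated) for _ in range(N)]
    ratio = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        n_sr = sum(lo <= x < hi for x in sr)
        n_sb = sum(lo <= x < hi for x in sb)
        ratio.append(n_sr / max(n_sb, 1))           # estimate of p_SR / p_SB
    return ratio

edges = [0.5 * i for i in range(13)]                # bins in x from 0 to 6
print(per_bin_ratio(correlated=False, edges=edges))  # near 1 except around x ~ 3
print(per_bin_ratio(correlated=True, edges=edges))   # dominated by the shift
```

In the correlated case the SR-vs-SB discriminant no longer isolates the signal, which is the failure mode that ANODE's interpolation is designed to avoid.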
4 Details of the Sample
A simulated resonance search using large-radius dijets is used to illustrate ANODE. The simulated datasets are from the LHC Olympics 2020 challenge research and development dataset gregor_kasieczka_2019_2629073 . For the background process, one million quantum chromodynamics (QCD) dijet events are simulated with Pythia 8 Sjostrand:2006za ; Sjostrand:2007gs without pileup or multiple parton interactions. The signal is a hypothetical $W'$ boson ($m_{W'} = 3.5$ TeV) that decays into an $X$ boson ($m_X = 500$ GeV) and a $Y$ boson ($m_Y = 100$ GeV), with the same simulation setup as the QCD dijets. The $X$ and $Y$ bosons decay promptly into pairs of quarks and, due to their large Lorentz boost in the lab frame, the resulting hadronic decay products are each captured by a single large-radius jet. The detector simulation is performed with Delphes 3.4.1 deFavereau:2013fsa ; Mertens:2015kba ; Selvaggi:2014mya and particle flow objects are clustered into jets using the FastJet Cacciari:2011ma ; Cacciari:2005hq implementation of the anti-$k_t$ algorithm Cacciari:2008gp with $R = 1$ as the jet radius. Events are selected by requiring at least one such jet with $p_T > 1.2$ TeV. While there exist LHC searches for the case that $X$ and $Y$ are electroweak bosons Aad:2019fbh ; Sirunyan:2019jbg , the generic case is currently not covered by a dedicated search.
The resonant feature will be the invariant mass of the leading two jets, $m_{JJ}$. These two jets are ordered by their mass so that by construction, $m_{J_1} \le m_{J_2}$. The discriminating features are four-dimensional, consisting of the observables:

$x = \left(m_{J_1},\; m_{J_2} - m_{J_1},\; \tau_{21}^{J_1},\; \tau_{21}^{J_2}\right), \qquad (9)$

where $\tau_{21} = \tau_2/\tau_1$ is the $n$-subjettiness ratio Thaler:2011gf ; Thaler:2010tr . This observable is the most widely used single feature for identifying jets with a two-prong substructure. While the ultimate goal of ANODE is to perform density estimation on high-dimensional, low-level features, there is already utility in a search with the high-level features of Eq. 9. Thus, to demonstrate how ANODE works, this will be the focus for the rest of this paper.
Simulated data are constructed by injecting 1000 signal events into the full background sample. A histogram of $m_{JJ}$ is presented in Fig. 2. As expected, the signal peaks near $m_{JJ} \approx 3.5$ TeV. The signal region is defined by $3.3 < m_{JJ} < 3.7$ TeV and the sideband is the rest of the $m_{JJ}$ spectrum. The simulated data are divided into two equal samples for training and testing, so that each sample contains half of the background and half of the signal events. In the SR, this corresponds to $S/\sqrt{B} \approx 1.6$, which would be the approximate significance from a sideband fit (ignoring the fit errors). Section 5.1 will show how much this can be enhanced by ANODE.
The additional four features for classification are shown in Fig. 3. For the signal, the lighter jet mass peaks near $m_Y \approx 100$ GeV and the difference between the jet masses peaks at about 400 GeV. The $\tau_{21}$ observables are lower for the two-prong signal jets than for the mostly one-prong background jets. Jet mass and $\tau_{21}$ are negatively correlated for QCD jets Dolen:2016kst , and so $\tau_{21}$ is higher for the lighter jet than for the heavier jet.
The conditional MAF (along with most methods of density estimation) has difficulty at sharp, discontinuous edges and boundaries, so we first transform the dataset before performing density estimation. First, all features are linearly scaled to lie between 0 and 1. Then, the logit transformation $x \mapsto \log\left(x/(1-x)\right)$ is applied to map the scaled features to the full real line. The Jacobian of this map is accounted for when computing probability densities in the original feature space. Even with this transformation, density estimation is difficult near the boundaries. Therefore, the scaled features are additionally required to lie away from the edges of the unit interval. This keeps 95% (72%) of the signal (background) in the SR. Below we will refer to this as the “fiducial region.” All results below are computed with respect to the number of events after this truncation.

5 Results
5.1 Sensitivity
The conditional MAF is optimized^2 using the log-likelihood loss function, $-\log p(x|m)$. All of the neural networks are trained with PyTorch NEURIPS2019_9015 . For the hyperparameters, there are 15 MADE blocks (one layer each) with 128 hidden units per block. Networks are optimized with Adam adam with fixed learning rate and weight decay. The SR and SB density estimators are each trained for 50 epochs. No systematic attempt was made to optimize these hyperparameters and it is likely that better performance could be obtained with further optimization. For the SR density estimator, the last epoch is chosen for simplicity and it was verified that the results are robust against this choice. The SB density estimator varies significantly from epoch to epoch. Averaging the density estimates pointwise over 10 consecutive epochs results in a stable result. Averaging over more epochs does not further improve the stability. All results with ANODE present the SB density estimator with this averaging scheme for the last 10 epochs.

^2 Based on code from https://github.com/ikostrikov/pytorchflows .
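The pointwise averaging over epochs can be sketched as follows, with invented stand-ins for the MAF evaluated after different training epochs, each distorted by a random multiplicative factor:

```python
import math, random

def noisy_checkpoint(seed):
    """A stand-in for one training epoch's density estimator: the true
    density times a random multiplicative distortion."""
    rng = random.Random(seed)
    bias = rng.gauss(0, 0.2)  # epoch-to-epoch fluctuation of the fit
    def density(x):
        true = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
        return true * math.exp(bias)
    return density

checkpoints = [noisy_checkpoint(s) for s in range(10)]  # "last 10 epochs"

def averaged_density(x):
    """Point-wise average of the saved checkpoints, as done for the SB estimator."""
    return sum(ck(x) for ck in checkpoints) / len(checkpoints)

x = 0.7
true = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
print(abs(checkpoints[0](x) - true), abs(averaged_density(x) - true))
```

Since the average lies within the spread of the individual estimates, it can never be worse than the worst single epoch at a given point, and it suppresses uncorrelated epoch-to-epoch fluctuations.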
Figure 4 shows a scatter plot of $\log p_\text{data}(x|m)$ versus $\log p_\text{background}(x|m)$ for the test set in the SR. As desired, the background is mostly concentrated around the diagonal $p_\text{data}(x|m) = p_\text{background}(x|m)$, while there is a long tail of signal events with $p_\text{data}(x|m)$ well above $p_\text{background}(x|m)$. This is exactly what is expected for this signal: it is an overdensity ($p_\text{data} > p_\text{background}$) in a region of phase space that is relatively rare for the background (small $p_\text{background}$).

The background density in Fig. 4 also shows that the scatter is narrower around the diagonal when the density is large and more spread out when the density is small. This is evidence that the density estimation is more accurate when the densities are high and worse when the densities are low. This is also to be expected: if there are many data points close to one another, it should be easier to estimate their density than if the data points are very sparse.
Another view of the results is presented in Fig. 5, with one-dimensional information about $R(x|m)$ in the SR. The left plot of Fig. 5 shows that the background is centered and approximately symmetric around $R(x|m) = 1$, with a standard deviation of approximately 17%. This width is due to various sources, including the accuracy of the SR density, the accuracy of the SB density, and the quality of the interpolation from SB to SR. Each of these sources has contributions from the finite size of the datasets used for training, the neural network flexibility, and the training procedure. The right plot of Fig. 5 presents the number of background and signal events as a function of the threshold $R_c$. The starting point is the original number of background (40,000) and signal (400) events in the SR window and the fiducial region. Starting from these low values of $S/B$ and $S/\sqrt{B}$, one can achieve a substantial improvement in both with a threshold requirement on $R(x|m)$. Figure 6 shows that the signal is clearly visible in the $m_{JJ}$ distribution after applying such a threshold requirement.

The performance of $R(x|m)$ as an anomaly detector is further quantified by the Receiver Operating Characteristic (ROC) and Significance Improvement Characteristic (SIC) curves in Fig. 7. These metrics are obtained by scanning $R_c$ and computing the signal efficiency (true positive rate) and background efficiency (false positive rate) after a threshold requirement on $R(x|m)$. The Area Under the Curve (AUC) for ANODE is 0.82. For comparison, the CWoLa hunting approach is also shown in the same plots. The CWoLa classifier is trained using sideband regions that are 200 GeV wide on either side of the SR. The sidebands are weighted to have the same number of events as each other and, in total, the same as the SR. A single neural network with four hidden layers of 64 nodes each is trained using Keras keras and TensorFlow tensorflow . Dropout JMLR:v15:srivastava14a of 10% is used for each intermediate layer. Intermediate layers use rectified linear unit activation functions and the last layer uses a sigmoid. The classifier is optimized using binary cross entropy and is trained for 300 epochs. As with ANODE, 10 epochs are averaged for the reported results^3.

^3 A different regularization procedure was used in Ref. Collins:2018epr ; Collins:2019jip based on the validation loss and folding. The averaging here is expected to serve a similar purpose.

The performance of ANODE is comparable to CWoLa hunting in Fig. 7, which does slightly better at higher signal efficiencies and much better at lower signal efficiencies. This may be a reflection of the fact that CWoLa makes use of supervised learning and directly approaches the likelihood ratio, while ANODE is unsupervised and attempts to learn the numerator and denominator of the likelihood ratio separately. With this dataset, ANODE is able to enhance the signal significance by about a factor of 7 and would therefore be able to achieve a local significance well above $5\sigma$, given that the starting value of $S/\sqrt{B}$ is 1.6.
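The ROC and SIC curves can be computed from any set of anomaly scores. A minimal sketch follows, with invented Gaussian toy scores (not the paper's actual score distributions), where the significance improvement is $\epsilon_S/\sqrt{\epsilon_B}$:

```python
import math, random

def roc_and_sic(scores_sig, scores_bkg, thresholds):
    """Signal efficiency, background efficiency, and significance improvement
    eps_S / sqrt(eps_B) as a function of the threshold on the anomaly score."""
    out = []
    for t in thresholds:
        eps_s = sum(s > t for s in scores_sig) / len(scores_sig)
        eps_b = sum(s > t for s in scores_bkg) / len(scores_bkg)
        sic = eps_s / math.sqrt(eps_b) if eps_b > 0 else float("inf")
        out.append((eps_s, eps_b, sic))
    return out

random.seed(5)
# Toy scores: background near 1, signal shifted upward (mimicking R in the SR).
bkg = [random.gauss(1.0, 0.15) for _ in range(100_000)]
sig = [random.gauss(1.6, 0.30) for _ in range(1_000)]

curve = roc_and_sic(sig, bkg, thresholds=[1.0 + 0.05 * i for i in range(20)])
best = max(pt[2] for pt in curve if pt[1] > 0)
print(best)  # maximal multiplicative gain over the raw S/sqrt(B)
```

Multiplying the best SIC value by the pre-selection $S/\sqrt{B}$ gives the achievable significance, which is how the factor-of-7 improvement quoted above translates into a discovery-level result.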
5.2 Background Estimation
This section explores the possibility of using the learned background density estimate to directly determine the background efficiency in the SR after a requirement on the anomaly score. Figure 8 presents a comparison between the integration methods (direct integration and importance sampling) described in Sec. 3.2 and the true background yields. Qualitatively, both methods are able to characterize the yield across several orders of magnitude in background efficiency. However, both methods diverge from the truth in the extreme tails of the distribution. The right plot of Fig. 8 offers a quantitative comparison between the methods. Down to low background efficiencies, both methods are accurate to within about 25%. The direct integration method has a smaller bias, of about 10%. This is consistent with Fig. 5, for which the standard deviation is between 10-20%.
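The idea behind both estimates can be sketched in one dimension, with closed-form toy densities standing in for the learned neural density estimates. The densities, the threshold, and the uniform proposal distribution below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def gauss(x, mu, sigma):
    """Normal probability density."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# toy stand-ins for the learned densities: background plus a small bump
def p_bkg(x):
    return gauss(x, 0.0, 1.0)

def p_data(x):
    return 0.99 * gauss(x, 0.0, 1.0) + 0.01 * gauss(x, 2.5, 0.2)

def anomaly_score(x):
    return p_data(x) / p_bkg(x)

cut = 1.5  # threshold on the density ratio

# direct integration: integrate p_bkg over the selected region on a grid
grid = np.linspace(-6.0, 6.0, 20_001)
dx = grid[1] - grid[0]
eff_direct = np.sum(p_bkg(grid) * (anomaly_score(grid) > cut)) * dx

# importance sampling: draw from a broad uniform proposal q, reweight by p_bkg/q
rng = np.random.default_rng(1)
x = rng.uniform(-6.0, 6.0, 200_000)
q = 1.0 / 12.0  # uniform proposal density on [-6, 6]
eff_is = np.mean(p_bkg(x) * (anomaly_score(x) > cut) / q)
```

Both numbers estimate the background efficiency of the cut, i.e. the fraction of background expected to survive the anomaly-score requirement, without ever labeling individual events.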
5.3 Performance on a Dataset with Correlated Features
The results presented in the previous sections have established that ANODE is able to identify the signal and estimate the corresponding SM backgrounds introduced in Sec. 4. One fortuitous aspect of the features chosen in Sec. 4 is that they are all relatively independent of the resonant mass. This is illustrated in Fig. 9, using the SR and neighboring sideband regions. As a result of this independence, the CWoLa method is able to find the signal, and presumably the ANODE interpolation from SB to SR is easier than it would be if there were a strong dependence.
The purpose of this section is to study the sensitivity of the ANODE and CWoLa hunting methods to correlations between the features and the resonant mass. Based on the assumptions of the two methods, it is expected that with strong correlations, CWoLa hunting will fail to find the signal, while ANODE should still be able to identify the presence of signal in the SR as well as estimate the background. To study this sensitivity in a controlled fashion, correlations are introduced artificially. In practice, adding more features will inevitably result in some dependence on the resonant mass; the artificial example here illustrates the challenges already present in low dimensions. New jet mass observables are created, which are linearly shifted:
m_J → m_J + c · m_JJ, (10)
where the constant c is fixed for this study. The resulting shifted lighter jet mass is presented in Fig. 10.
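The construction of such an artificially correlated feature can be sketched as follows. The mass ranges, the shift slope c = 0.1, and the sample size are illustrative assumptions, not the values used in the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

# toy stand-ins: a resonant mass spanning SR plus sidebands (GeV),
# and a jet mass that is initially independent of it
m_res = rng.uniform(2800.0, 4200.0, 100_000)
m_jet = rng.exponential(100.0, 100_000)

c = 0.1  # hypothetical shift slope, fixed for the study
m_jet_shifted = m_jet + c * m_res

# the shift turns an uncorrelated feature into a linearly correlated one
corr_before = np.corrcoef(m_jet, m_res)[0, 1]
corr_after = np.corrcoef(m_jet_shifted, m_res)[0, 1]
```

Any classifier that can distinguish SR from SB using the shifted jet mass alone will now do so even in the absence of signal, which is exactly the failure mode that breaks sideband-based classification.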
New ANODE and CWoLa models are trained using the shifted dataset and their performance is quantified in Fig. 11. As expected, the performance of the fully supervised classifier is nearly the same as in Fig. 7. ANODE is still able to significantly enhance the signal, with a maximum significance improvement near 4. While in principle ANODE could achieve the same classification accuracy on the shifted and nominal datasets, the performance on the shifted examples is not as strong as in Fig. 7: in practice, the interpolation of the background density into the SR is more challenging due to the linear correlations. This could possibly be overcome with improved training, better choices of hyperparameters, or more sophisticated density estimation techniques.
By construction, there are now bigger differences between the SR and SB than between the SR background and the SR signal. Therefore, the CWoLa hunting classifier is not able to find the signal. This is evident from the ROC curve in the left plot of Fig. 11, which shows that the signal-versus-background classifier is essentially random while the SR-versus-SB classifier has learned something nontrivial.
Lastly, Fig. 12 shows the performance of direct density estimation for the background prediction using the shifted dataset. The performance is comparable to the unshifted dataset (Fig. 8), meaning that ANODE could potentially be used as a complete anomaly detection method even in the presence of correlated feature spaces.
6 Conclusions
This paper has presented a powerful new model-independent search method called ANOmaly detection with Density Estimation (ANODE), which is built on neural density estimation. Unlike other approaches, ANODE directly learns the background probability density and the data probability density in a signal region. The ratio of these densities is a powerful classifier, and the background density can be directly used to estimate the background efficiency from a threshold requirement on the classifier. Finally, ANODE is robust against correlations in the data, which tend to break other model-agnostic sideband methods such as CWoLa.
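The core of the method can be sketched with one-dimensional stand-ins for the density estimates. Here simple maximum-likelihood Gaussian fits play the role of the neural density estimators; the distributions and sample sizes are illustrative assumptions:

```python
import numpy as np

def fit_gaussian(sample):
    """Maximum-likelihood Gaussian fit, standing in for a neural density
    estimator trained on the corresponding sample."""
    mu, sigma = sample.mean(), sample.std()

    def pdf(x):
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

    return pdf

rng = np.random.default_rng(3)
bkg = rng.normal(0.0, 1.0, 50_000)   # background-only sample (interpolated from SB)
sig = rng.normal(3.0, 0.5, 500)      # small signal contamination
data = np.concatenate([rng.normal(0.0, 1.0, 49_500), sig])  # signal-region "data"

p_bkg = fit_gaussian(bkg)    # estimate of the background density
p_data = fit_gaussian(data)  # estimate of the data density in the SR

def anomaly_score(x):
    # the density ratio approximates the optimal classifier
    return p_data(x) / p_bkg(x)
```

Events drawn from the signal population receive systematically larger scores than typical background, so a threshold on this ratio enhances the significance; by the Neyman-Pearson lemma, the exact likelihood ratio would be the optimal such test statistic.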
The results presented in this paper are meant to be a proof of concept of the general method, and there are many exciting future directions. For example, while this paper focused on collider searches for BSM, the ANODE method is completely general and could be applied to many areas beyond high energy physics, including astronomy and astrophysics. Similarly, while the demonstrations here were based on the innovative MAF density estimation technique, the ANODE method can be used in conjunction with any density estimation algorithm. Indeed, there are numerous other neural density estimation methods from the past few years that claim state-of-the-art performance, including Neural Autoregressive Flows DBLP:journals/corr/abs180400779 and Neural Spline Flows durkan2019neural ; exploring these would be an obvious way to attempt to improve the results in this paper. In addition, it would be interesting to attempt the ANODE method on even higher-dimensional feature spaces, all the way up to the full low-level feature set of the four-vectors of all the hadrons in the event. The prospects for the ANODE method are exciting: as the field of neural density estimation continues to grow within the machine learning community, ANODE will become more sensitive to resonant new physics in collider high energy physics and beyond.
Acknowledgements.
DS is grateful to Matt Buckley and John Tamanas for many fruitful discussions on neural density estimation. We are especially grateful to John Tamanas for help with the conditional MAF code. Additionally, we would like to thank Uroš Seljak for helpful discussions and Nick Rodd for helpful comments on the draft. This work was supported by the U.S. Department of Energy, Office of Science under contract DE-AC02-05CH11231. DS is supported by DOE grant DOE-SC0010008. DS thanks LBNL, BCTP and BCCP for their generous support and hospitality during his sabbatical year.
References
 (1) ATLAS Collaboration, Exotic Physics Searches, 2019. https://twiki.cern.ch/twiki/bin/view/AtlasPublic/ExoticsPublicResults.
 (2) ATLAS Collaboration, Supersymmetry searches, 2019. https://twiki.cern.ch/twiki/bin/view/AtlasPublic/SupersymmetryPublicResults.
 (3) ATLAS Collaboration, Higgs and Diboson Searches, 2019. https://twiki.cern.ch/twiki/bin/view/AtlasPublic/HDBSPublicResults.
 (4) CMS Collaboration, CMS Exotica Public Physics Results, 2019. https://twiki.cern.ch/twiki/bin/view/CMSPublic/PhysicsResultsEXO.
 (5) CMS Collaboration, CMS Supersymmetry Physics Results, 2019. https://twiki.cern.ch/twiki/bin/view/CMSPublic/PhysicsResultsSUS.
 (6) CMS Collaboration, CMS Beyond-two-generations (B2G) Public Physics Results, 2019. https://twiki.cern.ch/twiki/bin/view/CMSPublic/PhysicsResultsB2G.
 (7) LHCb Collaboration, Publications of the QCD, Electroweak and Exotica Working Group, 2019. http://lhcbproject.web.cern.ch/lhcbproject/Publications/LHCbProjectPublic/Summary_QEE.html.
 (8) J. H. Kim, K. Kong, B. Nachman, and D. Whiteson, The motivation and status of two-body resonance decays after the LHC Run 2 and beyond, arXiv:1907.06659.
 (9) N. Craig, P. Draper, K. Kong, Y. Ng, and D. Whiteson, The unexplored landscape of two-body resonances, Acta Phys. Polon. B50 (2019) 837, [arXiv:1610.09392].
 (10) B. Knuteson. Ph.D. thesis, University of California at Berkeley (2000).
 (11) D0 Collaboration, B. Abbott et al., Search for new physics in data at DØ using Sherlock: A quasi model independent search strategy for new physics, Phys. Rev. D62 (2000) 092004, [hep-ex/0006011].
 (12) D0 Collaboration, V. M. Abazov et al., A Quasi model independent search for new physics at large transverse momentum, Phys. Rev. D64 (2001) 012004, [hep-ex/0011067].
 (13) D0 Collaboration, B. Abbott et al., A quasi-model-independent search for new high-p_T physics at DØ, Phys. Rev. Lett. 86 (2001) 3712–3717, [hep-ex/0011071].
 (14) H1 Collaboration, F. D. Aaron et al., A General Search for New Phenomena at HERA, Phys. Lett. B674 (2009) 257–268, [arXiv:0901.0507].
 (15) H1 Collaboration, A. Aktas et al., A General search for new phenomena in ep scattering at HERA, Phys. Lett. B602 (2004) 14–30, [hep-ex/0408044].
 (16) CDF Collaboration, T. Aaltonen et al., Model-Independent and Quasi-Model-Independent Search for New Physics at CDF, Phys. Rev. D78 (2008) 012002, [arXiv:0712.1311].
 (17) CDF Collaboration, T. Aaltonen et al., Model-Independent Global Search for New High-p_T Physics at CDF, arXiv:0712.2534.
 (18) CDF Collaboration, T. Aaltonen et al., Global Search for New Physics with 2.0 fb^-1 at CDF, Phys. Rev. D79 (2009) 011101, [arXiv:0809.3781].
 (19) CMS Collaboration, MUSiC, a Model Unspecific Search for New Physics, in pp Collisions at , CMS-PAS-EXO-14-016 (2017).
 (20) CMS Collaboration, Model Unspecific Search for New Physics in pp Collisions at TeV, CMS-PAS-EXO-10-021 (2011).
 (21) ATLAS Collaboration, M. Aaboud et al., A strategy for a general search for new phenomena using data-derived signal regions and its application within the ATLAS experiment, Eur. Phys. J. C79 (2019) 120, [arXiv:1807.07447].
 (22) ATLAS Collaboration, A general search for new phenomena with the ATLAS detector in pp collisions at TeV, ATLAS-CONF-2014-006 (2014).
 (23) ATLAS Collaboration, A general search for new phenomena with the ATLAS detector in collisions at TeV, ATLAS-CONF-2012-107 (2012).
 (24) R. T. D’Agnolo and A. Wulzer, Learning New Physics from a Machine, Phys. Rev. D99 (2019), no. 1 015014, [arXiv:1806.02350].
 (25) R. T. D’Agnolo, G. Grosso, M. Pierini, A. Wulzer, and M. Zanetti, Learning Multivariate New Physics, arXiv:1912.12155.
 (26) M. Farina, Y. Nakai, and D. Shih, Searching for New Physics with Deep Autoencoders, arXiv:1808.08992.
 (27) T. Heimel, G. Kasieczka, T. Plehn, and J. M. Thompson, QCD or What?, SciPost Phys. 6 (2019), no. 3 030, [arXiv:1808.08979].
 (28) T. S. Roy and A. H. Vijay, A robust anomaly finder based on autoencoder, arXiv:1903.02032.
 (29) O. Cerri, T. Q. Nguyen, M. Pierini, M. Spiropulu, and J.R. Vlimant, Variational Autoencoders for New Physics Mining at the Large Hadron Collider, JHEP 05 (2019) 036, [arXiv:1811.10276].
 (30) A. Blance, M. Spannowsky, and P. Waite, Adversariallytrained autoencoders for robust unsupervised new physics searches, JHEP 10 (2019) 047, [arXiv:1905.10384].
 (31) J. Hajer, Y.Y. Li, T. Liu, and H. Wang, Novelty Detection Meets Collider Physics, arXiv:1807.10261.
 (32) J. H. Collins, K. Howe, and B. Nachman, Anomaly Detection for Resonant New Physics with Machine Learning, Phys. Rev. Lett. 121 (2018), no. 24 241803, [arXiv:1805.02664].
 (33) J. H. Collins, K. Howe, and B. Nachman, Extending the search for new resonances with machine learning, Phys. Rev. D99 (2019), no. 1 014038, [arXiv:1902.02634].
 (34) A. De Simone and T. Jacques, Guiding New Physics Searches with Unsupervised Learning, Eur. Phys. J. C79 (2019), no. 4 289, [arXiv:1807.06038].
 (35) A. Mullin, H. Pacey, M. Parker, M. White, and S. Williams, Does SUSY have friends? A new approach for LHC event analysis, arXiv:1912.10625.
 (36) A. Casa and G. Menardi, Nonparametric semisupervised classification for signal detection in high energy physics, arXiv:1809.02977.
 (37) B. M. Dillon, D. A. Faroughy, and J. F. Kamenik, Uncovering latent jet substructure, Phys. Rev. D100 (2019), no. 5 056002, [arXiv:1904.04200].
 (38) J. A. AguilarSaavedra, J. H. Collins, and R. K. Mishra, A generic antiQCD jet tagger, JHEP 11 (2017) 163, [arXiv:1709.01087].
 (39) I. J. Goodfellow, J. PougetAbadie, M. Mirza, B. Xu, D. WardeFarley, S. Ozair, A. Courville, and Y. Bengio, Generative Adversarial Networks, arXiv:1406.2661.
 (40) L. de Oliveira, M. Paganini, and B. Nachman, Learning Particle Physics by Example: Location-Aware Generative Adversarial Networks for Physics Synthesis, Comput. Softw. Big Sci. 1 (2017), no. 1 4, [arXiv:1701.05927].
 (41) M. Paganini, L. de Oliveira, and B. Nachman, Accelerating Science with Generative Adversarial Networks: An Application to 3D Particle Showers in Multilayer Calorimeters, Phys. Rev. Lett. 120 (2018), no. 4 042003, [arXiv:1705.02355].
 (42) M. Paganini, L. de Oliveira, and B. Nachman, CaloGAN : Simulating 3D high energy particle showers in multilayer electromagnetic calorimeters with generative adversarial networks, Phys. Rev. D97 (2018), no. 1 014021, [arXiv:1712.10321].
 (43) A. Butter, T. Plehn, and R. Winterhalder, How to GAN Event Subtraction, arXiv:1912.08824.
 (44) J. Arjona Martinez, T. Q. Nguyen, M. Pierini, M. Spiropulu, and J.-R. Vlimant, Particle Generative Adversarial Networks for full-event simulation at the LHC and their application to pileup description, in 19th International Workshop on Advanced Computing and Analysis Techniques in Physics Research: Empowering the revolution: Bringing Machine Learning to High Performance Computing (ACAT 2019), Saas-Fee, Switzerland, March 11-15, 2019. arXiv:1912.02748.
 (45) M. Bellagente, A. Butter, G. Kasieczka, T. Plehn, and R. Winterhalder, How to GAN away Detector Effects, arXiv:1912.00477.
 (46) S. Vallecorsa, F. Carminati, and G. Khattak, 3D convolutional GAN for fast simulation, EPJ Web Conf. 214 (2019) 02010.
 (47) SHiP Collaboration, C. Ahdida et al., Fast simulation of muons produced at the SHiP experiment using Generative Adversarial Networks, arXiv:1909.04451.
 (48) S. Carrazza and F. A. Dreyer, Lund jet images from generative and cycleconsistent adversarial networks, Eur. Phys. J. C79 (2019), no. 11 979, [arXiv:1909.01359].
 (49) A. Butter, T. Plehn, and R. Winterhalder, How to GAN LHC Events, SciPost Phys. 7 (2019) 075, [arXiv:1907.03764].
 (50) J. Lin, W. Bhimji, and B. Nachman, Machine Learning Templates for QCD Factorization in the Search for Physics Beyond the Standard Model, JHEP 05 (2019) 181, [arXiv:1903.02556].
 (51) R. Di Sipio, M. Faucci Giannelli, S. Ketabchi Haghighat, and S. Palazzo, DijetGAN: A Generative-Adversarial Network Approach for the Simulation of QCD Dijet Events at the LHC, JHEP 08 (2020) 110, [arXiv:1903.02433].
 (52) B. Hashemi, N. Amin, K. Datta, D. Olivito, and M. Pierini, LHC analysisspecific datasets with Generative Adversarial Networks, arXiv:1901.05282.
 (53) V. Chekalina, E. Orlova, F. Ratnikov, D. Ulyanov, A. Ustyuzhanin, and E. Zakharov, Generative Models for Fast Calorimeter Simulation: the LHCb case, EPJ Web Conf. 214 (2019) 02034, [arXiv:1812.01319].
 (54) ATLAS Collaboration, Deep generative models for fast shower simulation in ATLAS, ATL-SOFT-PUB-2018-001 (Jul, 2018).
 (55) K. Zhou, G. Endrodi, L.G. Pang, and H. Stocker, Regressive and generative neural networks for scalar field theory, Phys. Rev. D100 (2019), no. 1 011501, [arXiv:1810.12879].
 (56) F. Carminati, A. Gheata, G. Khattak, P. Mendez Lorenzo, S. Sharan, and S. Vallecorsa, Three dimensional Generative Adversarial Networks for fast simulation, J. Phys. Conf. Ser. 1085 (2018), no. 3 032016.
 (57) S. Vallecorsa, Generative models for fast simulation, J. Phys. Conf. Ser. 1085 (2018), no. 2 022005.
 (58) K. Datta, D. Kar, and D. Roy, Unfolding with Generative Adversarial Networks, arXiv:1806.00433.
 (59) P. Musella and F. Pandolfi, Fast and Accurate Simulation of Particle Detectors Using Generative Adversarial Networks, Comput. Softw. Big Sci. 2 (2018), no. 1 8, [arXiv:1805.00850].
 (60) M. Erdmann, L. Geiger, J. Glombitza, and D. Schmidt, Generating and refining particle detector simulations using the Wasserstein distance in adversarial networks, Comput. Softw. Big Sci. 2 (2018), no. 1 4, [arXiv:1802.03325].
 (61) K. Deja, T. Trzcinski, and Ł. Graczykowski, Generative models for fast cluster simulations in the TPC for the ALICE experiment, EPJ Web Conf. 214 (2019) 06003.
 (62) D. Derkach, N. Kazeev, F. Ratnikov, A. Ustyuzhanin, and A. Volokhova, Cherenkov Detectors Fast Simulation Using Neural Networks, in 10th International Workshop on Ring Imaging Cherenkov Detectors (RICH 2018), Moscow, Russia, July 29-August 4, 2018. arXiv:1903.11788.
 (63) H. Erbin and S. Krippendorf, GANs for generating EFT models, arXiv:1809.02612.
 (64) M. Erdmann, J. Glombitza, and T. Quast, Precise simulation of electromagnetic calorimeter showers using a Wasserstein Generative Adversarial Network, Comput. Softw. Big Sci. 3 (2019), no. 1 4, [arXiv:1807.01954].
 (65) J. M. Urban and J. M. Pawlowski, Reducing Autocorrelation Times in Lattice Simulations with Generative Adversarial Networks, arXiv:1811.03533.
 (66) J. W. Monk, Deep Learning as a Parton Shower, JHEP 12 (2018) 021, [arXiv:1807.03685].
 (67) A. Andreassen, I. Feige, C. Frye, and M. D. Schwartz, JUNIPR: a Framework for Unsupervised Machine Learning in Particle Physics, Eur. Phys. J. C79 (2019), no. 2 102, [arXiv:1804.09720].
 (68) A. Andreassen, I. Feige, C. Frye, and M. D. Schwartz, Binary JUNIPR: an interpretable probabilistic model for discrimination, Phys. Rev. Lett. 123 (2019), no. 18 182001, [arXiv:1906.10137].
 (69) D. Rezende and S. Mohamed, Variational inference with normalizing flows, in Proceedings of the 32nd International Conference on Machine Learning (F. Bach and D. Blei, eds.), vol. 37 of Proceedings of Machine Learning Research, (Lille, France), pp. 1530–1538, PMLR, 07–09 Jul, 2015.
 (70) M. S. Albergo, G. Kanwar, and P. E. Shanahan, Flowbased generative models for Markov chain Monte Carlo in lattice field theory, Phys. Rev. D100 (2019), no. 3 034515, [arXiv:1904.12072].
 (71) G. Papamakarios, T. Pavlakou, and I. Murray, Masked autoregressive flow for density estimation, in Advances in Neural Information Processing Systems 30 (I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, eds.), pp. 2338–2347. Curran Associates, Inc., 2017.
 (72) G. Kasieczka, B. Nachman, and D. Shih, R&D Dataset for LHC Olympics 2020 Anomaly Detection Challenge, Apr., 2019. https://doi.org/10.5281/zenodo.2629073.
 (73) J. A. AguilarSaavedra, Stealth multiboson signals, Eur. Phys. J. C77 (2017), no. 10 703, [arXiv:1705.07885].
 (74) J. A. AguilarSaavedra and F. R. Joaquim, The minimal stealth boson: models and benchmarks, JHEP 10 (2019) 237, [arXiv:1905.12651].
 (75) K. Agashe, J. H. Collins, P. Du, S. Hong, D. Kim, and R. K. Mishra, Detecting a Boosted Diboson Resonance, JHEP 11 (2018) 027, [arXiv:1809.07334].
 (76) K. Agashe, J. H. Collins, P. Du, S. Hong, D. Kim, and R. K. Mishra, Dedicated Strategies for Triboson Signals from Cascade Decays of Vector Resonances, Phys. Rev. D99 (2019), no. 7 075016, [arXiv:1711.09920].
 (77) E. M. Metodiev, B. Nachman, and J. Thaler, Classification without labels: Learning from mixed samples in high energy physics, JHEP 10 (2017) 174, [arXiv:1708.02949].
 (78) D. M. Blei, A. Y. Ng, and M. I. Jordan, Latent dirichlet allocation, J. Mach. Learn. Res. 3 (Mar., 2003) 993–1022.
 (79) ATLAS Collaboration, M. Aaboud et al., Observation of Higgs boson production in association with a top quark pair at the LHC with the ATLAS detector, Phys. Lett. B784 (2018) 173–191, [arXiv:1806.00425].
 (80) CMS Collaboration, A. M. Sirunyan et al., Measurements of Higgs boson properties in the diphoton decay channel in proton-proton collisions at 13 TeV, JHEP 11 (2018) 185, [arXiv:1804.02716].
 (81) J. Neyman and E. S. Pearson, On the problem of the most efficient tests of statistical hypotheses, Phil. Trans. R. Soc. Lond. A 231 (1933) 289.
 (82) M. Frate, K. Cranmer, S. Kalia, A. VandenbergRodes, and D. Whiteson, Modeling Smooth Backgrounds and Generic Localized Signals with Gaussian Processes, arXiv:1709.05681.
 (83) G. Louppe, M. Kagan, and K. Cranmer, Learning to Pivot with Adversarial Networks, arXiv:1611.01046.
 (84) J. Dolen, P. Harris, S. Marzani, S. Rappoccio, and N. Tran, Thinking outside the ROCs: Designing Decorrelated Taggers (DDT) for jet substructure, JHEP 05 (2016) 156, [arXiv:1603.00027].
 (85) I. Moult, B. Nachman, and D. Neill, Convolved Substructure: Analytically Decorrelating Jet Substructure Observables, JHEP 05 (2018) 002, [arXiv:1710.06859].
 (86) J. Stevens and M. Williams, uBoost: A boosting method for producing uniform selection efficiencies from multivariate classifiers, JINST 8 (2013) P12013, [arXiv:1305.7248].
 (87) C. Shimmin, P. Sadowski, P. Baldi, E. Weik, D. Whiteson, E. Goul, and A. Søgaard, Decorrelated Jet Substructure Tagging using Adversarial Neural Networks, Phys. Rev. D96 (2017) 074034, [arXiv:1703.03507].
 (88) L. Bradshaw, R. K. Mishra, A. Mitridate, and B. Ostdiek, Mass Agnostic Jet Taggers, arXiv:1908.08959.
 (89) ATLAS Collaboration, Performance of mass-decorrelated jet substructure observables for hadronic two-body decay tagging in ATLAS, ATL-PHYS-PUB-2018-014 (2018).
 (90) G. Kasieczka and D. Shih, DisCo Fever: Robust Networks Through Distance Correlation, arXiv:2001.nnnnn.
 (91) L.G. Xia, QBDT, a new boosting decision tree method with systematical uncertainties into training for High Energy Physics, Nucl. Instrum. Meth. A930 (2019) 15–26, [arXiv:1810.08387].
 (92) C. Englert, P. Galler, P. Harris, and M. Spannowsky, Machine Learning Uncertainties with Adversarial Neural Networks, Eur. Phys. J. C79 (2019), no. 1 4, [arXiv:1807.08763].
 (93) S. Wunsch, S. Jórger, R. Wolf, and G. Quast, Reducing the dependence of the neural network function to systematic uncertainties in the input space, arXiv:1907.11674.
 (94) ATLAS Collaboration, M. Aaboud et al., Search for heavy ZZ resonances in the and final states using proton-proton collisions at TeV with the ATLAS detector, Eur. Phys. J. C78 (2018), no. 4 293, [arXiv:1712.06386].
 (95) ATLAS Collaboration, M. Aaboud et al., Search for top-squark pair production in final states with one lepton, jets, and missing transverse momentum using 36 fb^-1 of TeV pp collision data with the ATLAS detector, JHEP 06 (2018) 108, [arXiv:1711.11520].
 (96) CMS Collaboration, Search for direct top squark pair production in events with one lepton, jets and missing transverse energy at 13 TeV, Tech. Rep. CMS-PAS-SUS-19-009, CERN, Geneva, 2019.
 (97) ATLAS Collaboration, M. Aaboud et al., Search for new phenomena with large jet multiplicities and missing transverse momentum using large-radius jets and flavour-tagging at ATLAS in 13 TeV collisions, JHEP 12 (2017) 034, [arXiv:1708.02794].
 (98) CMS Collaboration, A. M. Sirunyan et al., Search for pair-produced resonances decaying to quark pairs in proton-proton collisions at 13 TeV, Phys. Rev. D98 (2018), no. 11 112014, [arXiv:1808.03124].
 (99) ATLAS Collaboration, M. Aaboud et al., Search for long-lived particles produced in collisions at TeV that decay into displaced hadronic jets in the ATLAS muon spectrometer, Phys. Rev. D99 (2019), no. 5 052005, [arXiv:1811.07370].
 (100) CMS Collaboration, A. M. Sirunyan et al., Search for long-lived particles decaying into displaced jets in proton-proton collisions at 13 TeV, Phys. Rev. D99 (2019), no. 3 032011, [arXiv:1811.07991].
 (101) ATLAS Collaboration, G. Aad et al., Search for new resonances in mass distributions of jet pairs using 139 fb^-1 of collisions at TeV with the ATLAS detector, arXiv:1910.08447.
 (102) CMS Collaboration, A. M. Sirunyan et al., Search for high mass dijet resonances with a new background prediction method in proton-proton collisions at 13 TeV, arXiv:1911.03947.
 (103) D. P. Kingma, T. Salimans, R. Jozefowicz, X. Chen, I. Sutskever, and M. Welling, Improved variational inference with inverse autoregressive flow, in Advances in Neural Information Processing Systems 29 (D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, eds.), pp. 4743–4751. Curran Associates, Inc., 2016.
 (104) B. Uria, M.A. Côté, K. Gregor, I. Murray, and H. Larochelle, Neural autoregressive distribution estimation, Journal of Machine Learning Research 17 (2016), no. 205 1–37.
 (105) M. Germain, K. Gregor, I. Murray, and H. Larochelle, Made: Masked autoencoder for distribution estimation, in Proceedings of the 32nd International Conference on Machine Learning (F. Bach and D. Blei, eds.), vol. 37 of Proceedings of Machine Learning Research, (Lille, France), pp. 881–889, PMLR, 07–09 Jul, 2015.
 (106) T. Sjöstrand, S. Mrenna, and P. Z. Skands, PYTHIA 6.4 Physics and Manual, JHEP 05 (2006) 026, [hep-ph/0603175].
 (107) T. Sjöstrand, S. Mrenna, and P. Z. Skands, A Brief Introduction to PYTHIA 8.1, Comput. Phys. Commun. 178 (2008) 852, [arXiv:0710.3820].
 (108) DELPHES 3 Collaboration, J. de Favereau, C. Delaere, P. Demin, A. Giammanco, V. Lemaitre, A. Mertens, and M. Selvaggi, DELPHES 3, A modular framework for fast simulation of a generic collider experiment, JHEP 02 (2014) 057, [arXiv:1307.6346].
 (109) A. Mertens, New features in Delphes 3, J. Phys. Conf. Ser. 608 (2015) 012045.
 (110) M. Selvaggi, DELPHES 3: A modular framework for fastsimulation of generic collider experiments, J. Phys. Conf. Ser. 523 (2014) 012033.
 (111) M. Cacciari, G. P. Salam, and G. Soyez, FastJet User Manual, Eur. Phys. J. C72 (2012) 1896, [arXiv:1111.6097].
 (112) M. Cacciari and G. P. Salam, Dispelling the N^3 myth for the k_t jet-finder, Phys. Lett. B641 (2006) 57, [hep-ph/0512210].
 (113) M. Cacciari, G. P. Salam, and G. Soyez, The anti-k_t jet clustering algorithm, JHEP 04 (2008) 063, [arXiv:0802.1189].
 (114) ATLAS Collaboration, G. Aad et al., Search for diboson resonances in hadronic final states in 139 fb^-1 of collisions at TeV with the ATLAS detector, JHEP 09 (2019) 091, [arXiv:1906.08589].
 (115) CMS Collaboration, A. M. Sirunyan et al., A multidimensional search for new heavy resonances decaying to boosted WW, WZ, or ZZ boson pairs in the dijet final state at 13 TeV, arXiv:1906.05977.
 (116) J. Thaler and K. Van Tilburg, Maximizing Boosted Top Identification by Minimizing N-subjettiness, JHEP 02 (2012) 093, [arXiv:1108.2701].
 (117) J. Thaler and K. Van Tilburg, Identifying Boosted Objects with N-subjettiness, JHEP 03 (2011) 015, [arXiv:1011.2268].
 (118) A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, Pytorch: An imperative style, highperformance deep learning library, in Advances in Neural Information Processing Systems 32, pp. 8024–8035. Curran Associates, Inc., 2019.
 (119) D. Kingma and J. Ba, Adam: A method for stochastic optimization, arXiv:1412.6980.
 (120) F. Chollet, “Keras.” https://github.com/fchollet/keras, 2017.
 (121) M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., Tensorflow: A system for largescale machine learning., in OSDI, vol. 16, pp. 265–283, 2016.
 (122) N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: A simple way to prevent neural networks from overfitting, Journal of Machine Learning Research 15 (2014) 1929–1958.
 (123) C. Huang, D. Krueger, A. Lacoste, and A. C. Courville, Neural autoregressive flows, CoRR abs/1804.00779 (2018) [arXiv:1804.00779].
 (124) C. Durkan, A. Bekasov, I. Murray, and G. Papamakarios, Neural spline flows, arXiv:1906.04032.