The CERN Large Hadron Collider (LHC) collides protons every 25 ns. Each collision can result in any of hundreds of physics processes, and the total data volume far exceeds what the experiments could record. For this reason, the incoming data flow is typically filtered through a set of rule-based algorithms, designed to retain only events with particular signatures (e.g., the presence of a high-energy particle of some kind). Such a system, commonly referred to as a trigger, consists of hundreds of algorithms, each designed to accept events with a specific topology. The ATLAS Aaboud:2016leb and CMS Adam:2005zf trigger systems are based on this idea. In their current implementation, given the throughput capability and the typical event size, these two experiments can write on disk events/sec. A few processes, e.g., QCD multijet production, constitute the vast majority of the produced events. One is typically interested in selecting only a fraction of these events for further studies. On the other hand, the main interest of the LHC experiments lies in selecting and studying the many rare processes which occur at the LHC. In a typical data flow, these events are overwhelmed by the large number of QCD multijet events. The trigger system is put in place to make sure that the majority of these rare events are part of the stored events/sec.
Trigger algorithms are typically designed to maximize the efficiency (i.e., the true-positive rate), resulting in a non-negligible false-positive rate and, consequently, in a substantial waste of resources at trigger level (i.e., data throughput that could have been used for other purposes) and downstream (i.e., storage disk, processing power, etc.).
The most commonly used selection rules are inclusive, i.e., more than one topology is selected by the same requirement. The so-called isolated-lepton triggers are a typical example of this kind of algorithm. These triggers select events with a high-momentum electron or muon and no surrounding energetic particle, a typical signature of an interesting rare process, e.g., the production of a $W$ boson decaying to a neutrino and an electron or muon. With such a requirement, one can simultaneously collect $W$ bosons produced in the primary interaction ($W$ events) or from the cascade decay of other particles, e.g., top quarks (mainly in $t\bar{t}$ events, where a top quark-antiquark pair is produced). A sample selected this way is dominated by $W$ events but it retains a substantial () contamination from QCD multijet. The $t\bar{t}$ contribution is smaller than . Events from $t\bar{t}$ production are sometimes triggered by a set of dedicated lepton+jets algorithms, capable of using looser requirements on the lepton at the cost of introducing requirements on jets (a jet is a spray of hadrons, typically originating from the hadronization of gluons and quarks produced in the proton collisions). Due to this additional complexity, the use of these triggers in a data analysis comes with additional complications. For instance, the applied jet requirements distort the offline distributions of jet-related quantities. To avoid this effect, a typical data analysis applies a tighter offline selection. This means that many of the selected events close to the online-selection threshold are discarded. This is not necessarily the most cost-effective way to retain an unbiased dataset for offline analysis.
In this paper, we investigate the possibility of using machine learning to disentangle events from different event topologies at trigger level. Doing so, one could customize the trigger-selection strategy on individual processes (depending on the physics goals) while keeping the selection loose and simple. As a benchmark case, we consider a stream of data selected by requiring the presence of one electron or muon with transverse momentum $p_T > 23$ GeV (in this paper, we set units in such a way that $c = \hbar = 1$) and a loose requirement on the isolation. Details on the applied selection can be found in Sec. 2.
The considered benchmark sample is dominated by direct $W$ production, with a sizable contamination from QCD multijet events and a small contribution of $t\bar{t}$ events. Other interesting processes (e.g., , , and production) are usually selected with more exclusive and dedicated trigger algorithms (e.g., di-muon or di-electron triggers), or share the same kinematic properties as the two main interesting processes ($W$ and $t\bar{t}$). For the sake of simplicity, we ignore these sub-leading processes in our study, without compromising the validity of our conclusions. Fig. 1 shows the composition of a sample with one electron or muon within the defined acceptance ( GeV and pseudorapidity , where is the polar angle), before and after applying the trigger requirements ($p_T > 23$ GeV and loose isolation).
Such a loose set of requirements would translate into an event acceptance rate of Hz for a luminosity of cm s, well beyond the currently allocated budget for these triggers. We suggest that, using the score of our topology classifier, one could tune the amount of each process to be stored for further analysis, within the boundaries of the allocated resources (typically Hz). For instance, one might be interested in retaining all the $t\bar{t}$ events and some fraction of the $W$ events, while rejecting the QCD multijet events. We envision two main applications. For a given total rate, one could loosen the baseline trigger requirements, increasing the acceptance efficiency at no cost. Alternatively, for a given acceptance efficiency (true-positive rate), one could save resources by reducing the overall rate, rejecting the contribution of unwanted topologies (see Appendix A).
and gated recurrent units (GRUs)GRU . We consider four different representations of the collision events: (i) a set of physics-motivated high-level features, (ii) the raw image of the detector hits, (iii) a sequence of particles, characterized by a limited set of basic features (energy, direction, etc.), and (iv) an abstract representation of this list of particles as an image.
The paper is structured as follows. In Sec. 2 we describe the four data representations. In Sec. 3 we describe the corresponding classification models. Results are discussed in Sec. 4. In Sec. 5 we investigate how well the classifiers generalize to other topologies. In Sec. 6 we briefly discuss applications of machine learning algorithms to similar problems. Conclusions are given in Sec. 7. Appendix A describes a different scenario, in which the classifier is used to save resources by reducing the trigger acceptance rate, as opposed to using it to sustain a loose trigger selection that would otherwise require too many resources.
Synthetic data corresponding to $W$, $t\bar{t}$, and QCD multijet production topologies are generated using the PYTHIA8 event generation library pythia. The setup of the proton-beam simulation is loosely inspired by the LHC running configuration in 2015-2016: two proton beams, each with an energy of 6.5 TeV, generate on average 20 proton-proton collisions per crossing.
Generated samples are processed with the DELPHES library delphes, which applies a parametric model of a detector response. Detector performance is tuned to the CMS upgrade design foreseen for the High-Luminosity LHC CMS_TP, as implemented in the corresponding default card provided with DELPHES. We run the DELPHES particle-flow (PF) algorithm, which combines the information from all the CMS detector components to derive a list of reconstructed particles, the so-called PF candidates. For each particle, the algorithm returns the measured energy and flight direction. Each particle is associated to one of three classes: charged particles, photons, and neutral hadrons.
The basic event representation consists of a list of reconstructed PF candidates. For each candidate , the following information is given: (i) The particle four-momentum in Cartesian coordinates (, , , ); (ii) The particle three-momentum in cylindrical coordinates: the transverse momentum , the pseudorapidity , and the azimuthal angle ; (iii) The Cartesian coordinates (, , ) of the particle point of origin. For all neutral particles, (0, 0, 0) is used in the absence of pointing information; (iv) The electric charge; (v) The particle isolation with respect to charged particles (ChPFIso), photons (GammaPFIso), or neutral hadrons (NeuPFIso). For each particle class, the isolation is quantified as
where the sum extends over all the particles of the appropriate class with angular distance from the particle .
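As an illustration of the isolation quantity described above, the following sketch computes a relative isolation for one particle with respect to the particles of a single class. It assumes the common definition in which the transverse momenta of the neighbouring particles inside a cone are summed and divided by the transverse momentum of the particle itself. The cone radius `cone_radius=0.3`, the function names, and the argument layout are assumptions made for this example and are not taken from the text.

```python
import numpy as np

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance sqrt(d_eta^2 + d_phi^2), with d_phi wrapped into [-pi, pi)."""
    d_eta = eta1 - eta2
    d_phi = (phi1 - phi2 + np.pi) % (2.0 * np.pi) - np.pi
    return np.sqrt(d_eta**2 + d_phi**2)

def relative_isolation(pt, eta, phi, others_pt, others_eta, others_phi,
                       cone_radius=0.3):
    """Sum of pT of the other particles inside the cone, divided by the particle pT.

    `others_*` hold the particles of one class (charged, photon, or neutral
    hadron), excluding the particle under study. The cone radius is a
    hypothetical default.
    """
    others_pt = np.asarray(others_pt, dtype=float)
    dr = delta_r(eta, phi,
                 np.asarray(others_eta, dtype=float),
                 np.asarray(others_phi, dtype=float))
    return others_pt[dr < cone_radius].sum() / pt

# Usage: a 20 GeV lepton at (eta, phi) = (0, 0) with three nearby photons.
gamma_iso = relative_isolation(20.0, 0.0, 0.0,
                               [2.0, 3.0, 5.0], [0.1, 0.0, 1.0], [0.0, 0.2, 0.0])
```

Calling the function once per particle class would give three values playing the role of ChPFIso, GammaPFIso, and NeuPFIso.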
The particle identity is categorized via a one-hot-encoded representation (, , ), corresponding to a charged particle, a neutral hadron, or a photon. In addition, two boolean flags are stored ( and ) to identify if a given particle is an electron or a muon. In total, each particle is then described by 19 features.
The trigger selection is emulated by requiring all the events to include one isolated electron or muon with transverse momentum $p_T > 23$ GeV and particle-based isolation ISO < 0.45. This baseline selection, which follows the typical requirements of an inclusive single-lepton trigger algorithm, accepts QCD multijet events and events for every event. Despite its large $W$ and $t\bar{t}$ efficiency, this trigger selection comes with a large cost in terms of QCD multijet events written on disk and processed offline. The cost is even larger if the main physics target is $t\bar{t}$ events and the $W$ contribution is seen as an additional source of background (e.g., in a high-statistics scenario, with all measurements of $W$ properties limited in precision by systematic uncertainties).
All particles are ranked in decreasing order of . For each event, the isolated lepton is the first entry of the list of particles. To avoid double counting of this isolated lepton as a charged particle, each charged particle is required to have
. In addition to the isolated lepton, we consider the first 450 charged particles, the first 150 photons, and the first 200 neutral hadrons. This corresponds to a total of 801 particles per event, each characterized by the 19 features described above. If fewer particles are found in the event, zero padding is used to guarantee a fixed length of the particle list across different events. The events are then stored as numpy arrays in a set of compressed HDF5 files. The dataset is planned to be released on the CERN OpenData portal, accessible at opendata.cern.ch.
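The truncation and zero padding described above can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: the per-class block layout of the final array, the column index `PT_COLUMN` holding the transverse momentum, and the function names are hypothetical and are not specified in the text.

```python
import numpy as np

N_FEATURES = 19
PT_COLUMN = 4  # hypothetical position of pT among the 19 features

def pad_block(particles, max_len):
    """Rank particles by decreasing pT, keep the first max_len, zero-pad the rest."""
    particles = np.asarray(particles, dtype=np.float32).reshape(-1, N_FEATURES)
    order = np.argsort(-particles[:, PT_COLUMN], kind="stable")
    kept = particles[order][:max_len]
    block = np.zeros((max_len, N_FEATURES), dtype=np.float32)
    block[: len(kept)] = kept
    return block

def build_event(lepton, charged, photons, neutrals):
    """Stack lepton + 450 charged + 150 photons + 200 neutral hadrons: (801, 19)."""
    blocks = [
        pad_block(lepton, 1),
        pad_block(charged, 450),
        pad_block(photons, 150),
        pad_block(neutrals, 200),
    ]
    return np.concatenate(blocks, axis=0)

# Usage with toy inputs: 3 charged particles, no photons, 500 neutral hadrons.
rng = np.random.default_rng(0)
toy_lepton = rng.uniform(1.0, 2.0, size=N_FEATURES)
toy_charged = rng.uniform(1.0, 2.0, size=(3, N_FEATURES))
toy_charged[:, PT_COLUMN] = [1.0, 5.0, 3.0]
toy_photons = np.empty((0, N_FEATURES))
toy_neutrals = rng.uniform(1.0, 2.0, size=(500, N_FEATURES))
event = build_event(toy_lepton, toy_charged, toy_photons, toy_neutrals)
```

With this layout every event has the same shape regardless of its multiplicity, so a batch of events can be stacked into a single array before being written to file.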
In addition to this raw-event representation, we provide a list of physics-motivated high-level features, computed from the full event (the HLF dataset):
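The missing transverse energy $E_T^{\mathrm{miss}}$, defined as the absolute value of the missing transverse momentum, computed summing over the full list of reconstructed PF candidates:
$$E_T^{\mathrm{miss}} = \left| -\sum_{q} \vec{p}_T^{\,q} \right|$$
The squared transverse mass, $M_T^2$, of the isolated lepton $\ell$ and the $E_T^{\mathrm{miss}}$ system, defined as:
$$M_T^2 = 2\, p_T^{\ell}\, E_T^{\mathrm{miss}} \left(1 - \cos\Delta\phi\right)$$
with $p_T^{\ell}$ the transverse momentum of the lepton and $\Delta\phi$ the azimuthal separation between the lepton and the $\vec{E}_T^{\mathrm{miss}}$ vector.
The azimuthal angle of the $\vec{E}_T^{\mathrm{miss}}$ vector, $\phi^{\mathrm{miss}}$.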
The number of jets entering the sum.
The number of these jets identified as originating from a $b$ quark.
The isolated-lepton momentum, expressed in polar coordinates ($p_T$, $\eta$, $\phi$).
The three isolation quantities (ChPFIso, NeuPFIso, GammaPFIso) for the isolated lepton.
The lepton charge.
The flag for the isolated lepton.
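As a hedged illustration of two of the high-level features listed above, the sketch below computes the missing transverse energy and the squared transverse mass from particle-level inputs. It assumes the standard definitions: the missing transverse momentum is the negative vector sum of the transverse momenta of all particles, and $M_T^2 = 2\, p_T^{\ell}\, E_T^{\mathrm{miss}} (1 - \cos\Delta\phi)$. Function names are hypothetical.

```python
import numpy as np

def missing_et(pt, phi):
    """Return (magnitude, azimuthal angle) of the missing transverse momentum."""
    pt = np.asarray(pt, dtype=float)
    phi = np.asarray(phi, dtype=float)
    px = -(pt * np.cos(phi)).sum()
    py = -(pt * np.sin(phi)).sum()
    return np.hypot(px, py), np.arctan2(py, px)

def transverse_mass_squared(pt_lepton, phi_lepton, met, phi_met):
    """Squared transverse mass of the lepton + missing-momentum system."""
    return 2.0 * pt_lepton * met * (1.0 - np.cos(phi_lepton - phi_met))

# Usage: a 30 GeV lepton at phi = 0 recoiling against a 10 GeV particle at phi = pi.
met, phi_met = missing_et([30.0, 10.0], [0.0, np.pi])
mt2 = transverse_mass_squared(30.0, 0.0, met, phi_met)
```

In this toy event the momentum imbalance is 20 GeV and points opposite to the lepton, so the transverse mass takes its largest value for these magnitudes.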
The list of 801 particles is used to generate two visual representations of the events. In the first one, the (, ) plane corresponding to the detector acceptance is divided into a barrel region (), two end-cap regions ( and ), and two forward regions ( and ). The barrel and endcap regions of the electromagnetic calorimeter, as well as the endcap of the hadronic calorimeter (HCAL), are binned in cells of size . The barrel region of the HCAL is binned with cells of size . The forward regions are binned with cells of size 0.175 in , while the dimension in varies from 0.175 to 0.35. Each cell is filled with the scalar sum of the of the particles pointing to that cell. The three classes of particles (charged particles, photons, and neutral hadrons) are considered separately, resulting in three adjacent images. An example is shown in Fig. 2 for a event. This representation corresponds to the raw image recorded by the detector.
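The filling of the raw image can be sketched with a weighted two-dimensional histogram, where each cell accumulates the scalar sum of the transverse momenta of the particles pointing to it. The uniform grid used here (cells of 0.2 in pseudorapidity and $2\pi/32$ in azimuth) is a placeholder chosen for the example and does not reproduce the region-dependent cell sizes of the text. The function names and the input layout are assumptions as well.

```python
import numpy as np

def class_image(pt, eta, phi, eta_edges, phi_edges):
    """Scalar sum of pT per (eta, phi) cell for one particle class."""
    image, _, _ = np.histogram2d(eta, phi, bins=(eta_edges, phi_edges), weights=pt)
    return image

def raw_event_image(particles_by_class, eta_edges, phi_edges):
    """Place the charged, photon, and neutral-hadron images side by side."""
    images = [
        class_image(*particles_by_class[name], eta_edges, phi_edges)
        for name in ("charged", "photon", "neutral")
    ]
    return np.concatenate(images, axis=1)

# Hypothetical uniform binning, for illustration only.
eta_edges = np.linspace(-2.5, 2.5, 26)       # 25 cells of width 0.2
phi_edges = np.linspace(-np.pi, np.pi, 33)   # 32 cells

# Each entry is a (pt, eta, phi) triple of arrays.
toy_particles = {
    "charged": (np.array([10.0, 5.0]), np.array([0.05, 0.07]), np.array([0.10, 0.12])),
    "photon": (np.array([3.0]), np.array([1.0]), np.array([-1.0])),
    "neutral": (np.array([2.0]), np.array([-1.5]), np.array([2.0])),
}
image = raw_event_image(toy_particles, eta_edges, phi_edges)
```

Most cells of such an image are empty, which illustrates the sparsity of this representation.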
Recently, it was proposed to represent LHC collision events as abstract images where reconstructed physics objects (jets, in that case) are represented as geometric shapes whose size reflects the energy of the particle Madrazo. We generalize this approach by applying it to the full list of particles. Each particle is represented as a unique geometric shape, centered at the particle’s coordinates and with size proportional to its . The geometric shapes are chosen as follows: (i) pentagons for the selected isolated electron or muon; (ii) triangles for photons; (iii) squares for charged particles; (iv) hexagons for neutral hadrons. The images are digitized as arrays of size , where each of the first four channels contains a separate particle class, and the last channel contains the $E_T^{\mathrm{miss}}$, represented as a circle. As an example, the abstract representation for the event in Fig. 2 is shown in Fig. 3.
This abstract representation allows mitigating the sparsity problem of the raw images. On the other hand, there is no guarantee that the physics information is fully retained in this translation. As a result, there could be a reduction of discrimination power. This is one of the points we aim to investigate in this study.
3 Model description
In this section, we describe five types of multi-class classifiers, trained on the four data representations described in the previous section. We start by considering a state-of-the-art HEP application, based on the high-level features listed in Sec. 2. We then consider a convolutional neural network taking as input the raw images. This model offers the baseline point of comparison for the classifier using the abstract images. In order to have a fair comparison between the two approaches, the same kind of network architecture is used for the two sets of images. Next, we consider recurrent neural networks based on LSTMs and GRUs, trained directly on the lists of 801 particles. Finally, we consider a classifier taking both the high-level features and the list of 801 particles as inputs, using a combination of recurrent neural networks and fully connected neural networks.
The recurrent neural networks and feed-forward neural networks are implemented in Keras and trained using Theano theano as a back-end. The Adam optimizer Adam is used to adapt the learning rate. The training is capped at 50 epochs, and is stopped early if the validation loss shows no improvement after 8 epochs. Categorical cross entropy is used as the loss function. All trainings are performed on a cluster of GeForce GTX 1080 GPUs. In an early stage of this work, experiments on the recurrent models were performed on the CSCS Piz Daint supercomputer, using the mpi-learn library mpi-learn for multiple-GPU training.
3.1 High-level-features classifier using feed-forward neural networks
A fully connected feed-forward DNN based on a set of high-level features (HLF classifier) is the closest approach to the currently used rule-based trigger algorithms. We train a model of this kind taking as input the 14 features contained in the HLF dataset (see Sec. 2). The 14 features are normalized to take values between 0 and 1.
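The normalization of the 14 features to the range between 0 and 1 can be sketched as a per-feature min-max scaling. The text does not state which sample defines the minimum and maximum, so deriving them from the training sample and reusing them elsewhere is an assumption of this example, as are the function names.

```python
import numpy as np

def fit_min_max(features):
    """Per-feature minimum and maximum, computed on a reference (training) sample."""
    features = np.asarray(features, dtype=float)
    return features.min(axis=0), features.max(axis=0)

def apply_min_max(features, lo, hi):
    """Map each feature to [0, 1]; constant features are mapped to 0."""
    span = np.where(hi > lo, hi - lo, 1.0)
    return (np.asarray(features, dtype=float) - lo) / span

# Usage on a toy sample of 1000 events with 14 features.
rng = np.random.default_rng(1)
hlf_train = rng.normal(loc=50.0, scale=20.0, size=(1000, 14))
lo, hi = fit_min_max(hlf_train)
hlf_scaled = apply_min_max(hlf_train, lo, hi)
```

Events outside the reference sample can fall slightly outside the unit interval with this choice, which is usually harmless for a feed-forward network.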
The final network configuration is the result of an optimization process performed using the scikit-learn scikit-learn optimizer, which performs an exhaustive cross-validated grid search over a set of hyperparameters related to the network architecture and the training setup. The number of layers, the number of nodes in each layer, and the choice of optimizer have been considered in the scan. For a given number of layers, the discrimination performance was found to be constant over the considered range of number of nodes per layer. We believe that this is a direct consequence of the simplicity of the problem at hand: even relatively small networks achieve good classification performance. We then took the smallest network as the best compromise between performance and architecture minimality.
3.2 Raw-image and abstract-image classifiers using convolutional neural networks
To classify events represented as raw calorimeter images (raw-image classifier) and abstract images (abstract-image classifier), we use DenseNet-121, an instantiation of the Densely Connected Convolutional Network huang2017densely. The DenseNet-121 architecture includes 4 dense blocks, containing 6, 12, 24, and 16 dense layers, respectively. Each dense layer contains two 2D convolutional layers preceded by batch normalization layers. A dropout rate of 0.5 is applied after each dense layer. Between two subsequent dense blocks is a transition layer consisting of a batch normalization layer, a 2D convolutional layer, and an average pooling layer.
3.3 Particle-sequence classifier using recurrent neural networks
A particle-sequence classifier is trained using a recurrent layer, taking as input the 801 candidates. To feed these particles into a recurrent network, particles are ordered according to their increasing or decreasing distance from the isolated lepton. Different physics-inspired metrics are considered to quantify the distance (, , , antikt , or anti- kt ). The best results are obtained using the decreasing distance ordering.
We use gated recurrent units (GRU) to aggregate the input sequence of particle flow candidate features into a fixed size encoding. The fixed encoding is fed into a fully connected layer with 3 softmax activated nodes. Input data is standardized so that each feature has zero mean and unit standard deviation. The zero-padded entries in the particle sequence are skipped with the Masking layer. The best internal width of the recurrent layers was found to be 50, determined by k-fold cross validation on a training set of 300,000 events. We also considered using long short-term memory networks (LSTM) to replace the GRU, but we found that the GRU architecture outperformed the LSTM architecture for the same number of internal cells.
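To make the structure of this classifier concrete, the following NumPy sketch runs the forward pass of a single GRU layer of width 50 over a zero-padded particle sequence, skips the padded entries in the way a masking layer would, and applies a 3-node softmax output. It is not the Keras model used in the study: the weights are random and untrained, the update-gate convention is one of the common ones, and all names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(n_features=19, width=50, n_classes=3, seed=0):
    """Random (untrained) weights, for illustration only."""
    rng = np.random.default_rng(seed)
    return {
        "W": rng.normal(0.0, 0.1, (3, n_features, width)),  # input weights: z, r, candidate
        "U": rng.normal(0.0, 0.1, (3, width, width)),       # recurrent weights: z, r, candidate
        "b": np.zeros((3, width)),
        "W_out": rng.normal(0.0, 0.1, (width, n_classes)),
        "b_out": np.zeros(n_classes),
    }

def gru_classify(sequence, params):
    """Encode a (n_particles, n_features) sequence and return 3 class probabilities."""
    W, U, b = params["W"], params["U"], params["b"]
    h = np.zeros(U.shape[-1])
    for x in sequence:
        if not np.any(x):
            continue  # masked step: an all-zero (padded) entry leaves the state unchanged
        z = sigmoid(x @ W[0] + h @ U[0] + b[0])             # update gate
        r = sigmoid(x @ W[1] + h @ U[1] + b[1])             # reset gate
        h_cand = np.tanh(x @ W[2] + (r * h) @ U[2] + b[2])  # candidate state
        h = z * h + (1.0 - z) * h_cand
    logits = h @ params["W_out"] + params["b_out"]
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Usage: 5 real particles followed by zero padding up to 801 entries.
params = init_params()
real_particles = np.random.default_rng(1).normal(size=(5, 19))
padded = np.zeros((801, 19))
padded[:5] = real_particles
probabilities = gru_classify(padded, params)
```

Because padded entries are skipped, the output depends only on the real particles, so the fixed sequence length of 801 does not bias the encoding of low-multiplicity events.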
3.4 Inclusive classifier
In order to inject some domain knowledge in the GRU classifier, we consider a modification of its architecture in which the 14 features of the HLF dataset are concatenated to the output of the GRU layer after some dropout (see Fig. 4). As for the other classifiers, the final output layer consists of 3 nodes, activated by a softmax activation function. We refer to this model as inclusive classifier.
Each of the models presented in the previous section returns the probability of each event to be associated to a given topology:, , and . By applying a threshold requirement on or , one can define a or a classifier, respectively. By changing the threshold value, one can build the corresponding receiver operating characteristic (ROC) curve. Fig. 5 shows the comparison of the ROC curves for five classifiers: the DenseNets based on raw images and abstract images, the GRU using the list of particles, the DNN using the HLFs, and the inclusive classifier using both the HLFs and the list of particles. Results for both a and selectors are shown.
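The definition of a selector working point can be sketched as follows: the threshold is chosen as the score quantile that keeps a target fraction of the signal events, and the false-positive rate is the fraction of background events above that threshold. The toy score distributions and the function names are assumptions made for this example.

```python
import numpy as np

def threshold_at_tpr(signal_scores, target_tpr):
    """Score threshold that keeps approximately a fraction target_tpr of the signal."""
    return np.quantile(signal_scores, 1.0 - target_tpr)

def pass_fraction(scores, threshold):
    """Fraction of events with score at or above the threshold."""
    return float(np.mean(np.asarray(scores) >= threshold))

# Toy scores: signal peaks near 1, background near 0 (illustrative only).
rng = np.random.default_rng(2)
signal_scores = rng.beta(5.0, 1.0, size=1000)
background_scores = rng.beta(1.0, 5.0, size=1000)

threshold = threshold_at_tpr(signal_scores, 0.99)
tpr = pass_fraction(signal_scores, threshold)
fpr = pass_fraction(background_scores, threshold)
```

Scanning `target_tpr` over a grid of values and recording the corresponding `fpr` traces out the ROC curve.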
Acceptable results are obtained already with the raw-image classifier. On the other hand, the use of abstract images allows us to reach better performance. A further improvement is observed for those models not using an image-based representation of the event. The fact that the HLF selectors perform so well does not come as a surprise, given the considerable amount of physics knowledge implicitly provided by the choice of the relevant features. On the other hand, the fact that the particle-sequence classifier reaches performance comparable to that of the HLF selector is remarkable, as is the further improvement observed by merging the two approaches in the inclusive classifier. In some sense, the GRU layer is gaining a good part of the physics intuition that motivated the choice of the HLF quantities, but not entirely. Fig. 6 shows the Pearson correlation coefficients between the GRU scores ( and ) and the HLF quantities. As one would expect, exhibits a stronger correlation with those features that quantify jet activity, as well as with the b-jet multiplicity. On the contrary, events shows an anti-correlation with respect to jet quantities, since the production of associated jets in events is much more penalized than for events. As expected, both scores are anti-correlated with the isolation quantities, which take larger values for non-isolated leptons.
The performance of each of the five classifiers is summarized in Tab. 1 in terms of false-positive rate (FPR) and trigger rate (TR) as a function of the true-positive rate (TPR). The best QCD rejection is obtained by the inclusive classifier, which can retain 99% of the or events with a false-positive rate of .
|FPR @99% TPR||76.5%||43.6%||41.1%||15.2%||7.9%|
|FPR @95% TPR||41.3%||13.7%||7.3%||4.0%||1.3%|
|FPR @90% TPR||26.5%||6.7%||3.5%||1.8%||0.4%|
|TR @99% TPR||382 Hz||250 Hz||202 Hz||78 Hz||42 Hz|
|TR @95% TPR||208 Hz||82 Hz||39 Hz||22 Hz||9 Hz|
|TR @90% TPR||134 Hz||39 Hz||20 Hz||11 Hz||4 Hz|
|FPR @99% TPR||79.0%||58.6%||26.3%||20.0%||8.0%|
|FPR @95% TPR||60.5%||26.4%||10.6%||7.5%||2.7%|
|FPR @90% TPR||48.1%||14.9%||5.8%||3.7%||1.2%|
|TR @99% TPR||488 Hz||462 Hz||316 Hz||290 Hz||262 Hz|
|TR @95% TPR||454 Hz||366 Hz||258 Hz||249 Hz||239 Hz|
|TR @90% TPR||408 Hz||301 Hz||235 Hz||228 Hz||223 Hz|
selector. Rate values are estimated scaling the TPR and process-dependent FPR values by the acceptance and efficiency, assuming a leading-order (LO) production cross section and luminosity of 2cm s. TR values should be taken only as suggestions of the actual rates, since the accuracy is limited by the use of LO cross sections and a parametric detector simulation.
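The rate estimate described in this caption can be sketched as simple arithmetic: the rate of each process after the baseline selection is the product of cross section, luminosity, and acceptance times efficiency, and the trigger rate is the sum of these rates weighted by the fraction of events passing the classifier (the TPR for the target process, the FPR for the others). All numerical values below are placeholders and are not the values used in the paper.

```python
PB_TO_CM2 = 1.0e-36  # 1 pb expressed in cm^2

def baseline_rate_hz(cross_section_pb, luminosity_cm2_s, acceptance_times_efficiency):
    """Rate of one process after the baseline selection, in Hz."""
    return cross_section_pb * PB_TO_CM2 * luminosity_cm2_s * acceptance_times_efficiency

def trigger_rate_hz(baseline_rates, pass_fractions):
    """Total rate after the classifier requirement, summed over processes."""
    return sum(baseline_rates[name] * pass_fractions[name] for name in baseline_rates)

# Placeholder inputs (hypothetical cross sections and acceptances).
luminosity = 1.0e34  # cm^-2 s^-1
rates = {
    "signal": baseline_rate_hz(1.0e3, luminosity, 0.1),      # 1 Hz
    "background": baseline_rate_hz(1.0e6, luminosity, 0.01),  # 100 Hz
}
total = trigger_rate_hz(rates, {"signal": 0.99, "background": 0.08})
```

In this toy setup the background dominates the rate even after a 92% rejection, which is the regime in which a rare-process selector operates.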
The trigger baseline selection we use in this study, looser than what is used nowadays in CMS, gives an overall trigger rate (i.e., summing electron and muon events) of Hz, more than a factor two larger than what is currently allocated. Using the 99% working points of the two classifiers, one would reduce the overall rate to Hz (counting the overlap between the two triggers). This would be comparable to what is currently allocated for these triggers, but with a looser selection, i.e., with a less severe bias on the offline analysis. In addition, the trigger efficiency (the TPR) is so large that the bias imposed on offline quantities is quite minimal. This is illustrated in Fig. 7, where the dependence of the TPR on the most relevant HLF quantities is shown. In our experience, any rule-based algorithm with the same target trigger rate would result in larger inefficiencies at small values of at least some of these quantities, e.g., the lepton . One should also consider that the principle of a topology classifier could be generalized to other physics cases, as well as to other uses (e.g., labels for fast reprocessing or access to specific subsets of the triggered samples).
5 Impact on other topologies
While reducing the resource consumption of standard physics analyses is the main motivation behind this study, it is important to evaluate the impact of the proposed classifiers on other kinds of topologies. For this purpose, we consider a handful of beyond-the-standard-model (BSM) scenarios, and we compute the TPR as a function of the most relevant kinematic quantities, similar to what was done in Fig. 7 for the standard topologies.
We consider the following BSM processes:
: a heavy Higgs boson with mass 425 GeV decaying to a charged Higgs boson of mass 325 GeV and a boson. The then decays to a final state, where is the 125 GeV Higgs boson, which we force to decay to a bottom quark-antiquark pair. This model, introduced in Ref. baldi , generates a 22 topology similar to that given by events.
High-mass : a high-mass variation of the previous model, in which the and masses are set to 1025 GeV and 625 GeV, respectively.
: a light neutral scalar particle with mass 20 GeV, decaying to two neutral scalars of 5 GeV each, both decaying to muon pairs, for a total of four muons in the final state.
resonance with mass 300 GeV, decaying inclusively with -like couplings.
resonance with mass 600 GeV, decaying to a pair of electrons or muons.
These events are filtered with the baseline selection described in Sec. 2.
For each of these models, we consider the inclusive classifier and apply the 99%-TPR thresholds on and . We then consider the fraction of events passing at least one of the two selectors. Results are shown in Fig. 8 for the most relevant kinematic quantities. While the individual selectors might show local inefficiencies, the combination of the two trigger paths is perfectly capable of retaining any event with features different from that of a QCD multijet event. In this respect, the logical OR of our two exclusive topology classifiers is robust enough to also select a large spectrum of BSM topologies. On the other hand, one cannot guarantee that QCD-like topologies (e.g., a dark photon produced in jet showers and decaying to lepton pairs) would not be rejected, a limitation which also affects traditional inclusive trigger strategies.
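The combination of the two selectors and the resulting efficiency as a function of a kinematic quantity can be sketched as below. An event is kept if it passes at least one of the two score thresholds, and the efficiency is computed per bin of the chosen quantity. The function name, the toy scores, and the binning are assumptions made for this example.

```python
import numpy as np

def or_efficiency(score_a, score_b, thr_a, thr_b, variable, bin_edges):
    """Per-bin fraction of events passing at least one of the two selectors."""
    passed = (np.asarray(score_a) >= thr_a) | (np.asarray(score_b) >= thr_b)
    variable = np.asarray(variable, dtype=float)
    total, _ = np.histogram(variable, bins=bin_edges)
    kept, _ = np.histogram(variable[passed], bins=bin_edges)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(total > 0, kept / total, np.nan)  # NaN marks empty bins

# Usage: four toy events, binned in a kinematic variable (e.g., lepton pT in GeV).
efficiency = or_efficiency(
    score_a=[0.9, 0.1, 0.2, 0.8],
    score_b=[0.1, 0.95, 0.2, 0.1],
    thr_a=0.5,
    thr_b=0.5,
    variable=[10.0, 20.0, 30.0, 40.0],
    bin_edges=[0.0, 25.0, 50.0, 75.0],
)
```

Only the third toy event fails both selectors, so the first bin is fully efficient, the second is half efficient, and the empty third bin is reported as NaN.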
6 Related work
Several classification algorithms have been studied in the context of LHC physics applications, notably for jet tagging deOliveira:2015xxd ; Guest:2016iqz ; Macaluso:2018tck ; Datta:2017lxt ; Butter:2017cot ; Kasieczka:2017nvn ; Komiske:2016rsd ; Schwartzman:2016jqu and event topology identification baldi ; Bhimji:2017qvb ; Madrazo using feed-forward neural networks, convolutional neural networks or physics-inspired architectures. Lists of reconstructed particle momenta have been used as input to jet and event classifiers RecursiveJets ; Egan:2017ojy ; Cheng:2017rdo . These studies typically consider data analysis as the main use case, focusing on small FPR selections. This is the main difference with respect to this study, which is more related to an optimization of the data-taking procedure.
We show how deep neural networks can be used to train topology classifiers for LHC collision events, which could be used as a cleanup filter to select or reject specific event topologies in a trigger system. We consider several network architectures, applied to different representations of the same collision datasets. The best results are obtained by combining a set of physics-motivated high-level features with the output of a GRU layer applied to a list of particle-level features. For the most difficult case, i.e., selecting rare events, we show how a trigger based on this concept would retain 99% of the events while reducing the FPR by as much as times. We show that such a trigger would have a minimal impact on the main kinematic features of the event topologies under consideration. In addition, the logical OR of the $W$ and $t\bar{t}$ selections would also catch a broad class of new-physics topologies, on which the classifiers were not trained. In view of the challenging trigger environment foreseen for the High-Luminosity LHC, it would be important to test this trigger strategy as a way to preserve a good experimental reach with a substantial reduction of computational resources. In this respect, we look forward to the LHC Run III as an opportunity to test this technique with real data.
This work is partially supported by a grant from the Swiss National Supercomputing Center (CSCS) under project ID d59. We thank CERN OpenLab for supporting DW during his internship at CERN. We are grateful to Caltech and the Kavli Foundation for their support of undergraduate student research in cross-cutting areas of machine learning and domain sciences. Part of this work was conducted at "iBanks", the AI GPU cluster at Caltech. We acknowledge NVIDIA, SuperMicro and the Kavli Foundation for their support of "iBanks". TN would like to thank Duc Le for valuable discussions. This project is partially supported by the United States Department of Energy, Office of High Energy Physics Research under Caltech Contract No. DE-SC0011925. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement n 772369).
- (1) ATLAS Collaboration, M. Aaboud et al., Performance of the ATLAS Trigger System in 2015, Eur. Phys. J. C77 (2017), no. 5 317, [arXiv:1611.09661].
- (2) W. Adam et al., The CMS high level trigger, Eur. Phys. J. C46 (2006) 605–667, [hep-ex/0512077].
- (3) Y. LeCun, B. E. Boser, J. S. Denker, et al., Handwritten digit recognition with a back-propagation network, in Advances in Neural Information Processing Systems 2 (D. S. Touretzky, ed.), pp. 396–404. Morgan-Kaufmann, 1990.
- (4) S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Comput. 9 (Nov., 1997) 1735–1780.
- (5) K. Cho, B. van Merrienboer, D. Bahdanau, and Y. Bengio, On the properties of neural machine translation: Encoder-decoder approaches, CoRR abs/1409.1259 (2014) [arXiv:1409.1259].
- (6) T. Sjöstrand, S. Ask, J. R. Christiansen, et al., An Introduction to PYTHIA 8.2, Comput. Phys. Commun. 191 (2015) 159–177, [arXiv:1410.3012].
- (7) DELPHES 3 Collaboration, J. de Favereau, C. Delaere, P. Demin, et al., DELPHES 3, A modular framework for fast simulation of a generic collider experiment, JHEP 02 (2014) 057, [arXiv:1307.6346].
- (8) D. Contardo, M. Klute, J. Mans, L. Silvestris, and J. Butler, Technical Proposal for the Phase-II Upgrade of the CMS Detector, .
- (9) M. Cacciari, G. P. Salam, and G. Soyez, FastJet User Manual, Eur. Phys. J. C72 (2012) 1896, [arXiv:1111.6097].
- (10) M. Cacciari, G. P. Salam, and G. Soyez, The anti-$k_t$ jet clustering algorithm, Journal of High Energy Physics 2008 (2008), no. 04 063.
- (11) C. F. Madrazo, I. H. Cacha, L. L. Iglesias, and J. M. de Lucas, Application of a Convolutional Neural Network for image classification to the analysis of collisions in High Energy Physics, arXiv:1708.07034.
- (12) A. Paszke, S. Gross, S. Chintala, et al., Automatic differentiation in PyTorch, .
- (13) R. Al-Rfou, G. Alain, A. Almahairi, et al., Theano: A Python framework for fast computation of mathematical expressions, arXiv e-prints abs/1605.02688 (May, 2016).
- (14) D. P. Kingma and J. Ba, Adam: A Method for Stochastic Optimization, ArXiv e-prints (Dec., 2014) [arXiv:1412.6980].
- (15) D. Anderson, M. Spiropulu, and J.-R. Vlimant, An MPI-Based Python Framework for Distributed Training with Keras, arXiv:1712.05878.
- (16) F. Pedregosa, G. Varoquaux, A. Gramfort, et al., Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.
- (17) V. Nair and G. E. Hinton, Rectified linear units improve restricted Boltzmann machines, in Proceedings of ICML, vol. 27, pp. 807–814, 06, 2010.
- (18) G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, Densely connected convolutional networks, in , 2017.
- (19) S. Catani, Y. L. Dokshitzer, M. H. Seymour, and B. R. Webber, Longitudinally invariant clustering algorithms for hadron hadron collisions, Nucl. Phys. B406 (1993) 187–224.
- (20) P. Baldi, P. Sadowski, and D. Whiteson, Searching for exotic particles in high-energy physics with deep learning, Nature Communication 5 (07, 2014) 4308.
- (21) L. de Oliveira, M. Kagan, L. Mackey, B. Nachman, and A. Schwartzman, Jet-images — deep learning edition, JHEP 07 (2016) 069, [arXiv:1511.05190].
- (22) D. Guest, J. Collado, P. Baldi, et al., Jet Flavor Classification in High-Energy Physics with Deep Neural Networks, Phys. Rev. D94 (2016), no. 11 112002, [arXiv:1607.08633].
- (23) S. Macaluso and D. Shih, Pulling Out All the Tops with Computer Vision and Deep Learning, arXiv:1803.00107.
- (24) K. Datta and A. J. Larkoski, Novel Jet Observables from Machine Learning, arXiv:1710.01305.
- (25) A. Butter, G. Kasieczka, T. Plehn, and M. Russell, Deep-learned Top Tagging with a Lorentz Layer, arXiv:1707.08966.
- (26) G. Kasieczka, T. Plehn, M. Russell, and T. Schell, Deep-learning Top Taggers or The End of QCD?, JHEP 05 (2017) 006, [arXiv:1701.08784].
- (27) P. T. Komiske, E. M. Metodiev, and M. D. Schwartz, Deep learning in color: towards automated quark/gluon jet discrimination, JHEP 01 (2017) 110, [arXiv:1612.01551].
- (28) A. Schwartzman, M. Kagan, L. Mackey, B. Nachman, and L. De Oliveira, Image Processing, Computer Vision, and Deep Learning: new approaches to the analysis and physics interpretation of LHC events, J. Phys. Conf. Ser. 762 (2016), no. 1 012035.
- (29) W. Bhimji, S. A. Farrell, T. Kurth, et al., Deep Neural Networks for Physics Analysis on low-level whole-detector data at the LHC, in 18th International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT 2017) Seattle, WA, USA, August 21-25, 2017, 2017. arXiv:1711.03573.
- (30) G. Louppe, K. Cho, C. Becot, and K. Cranmer, QCD-aware recursive neural networks for jet physics, arXiv:1702.00748.
- (31) S. Egan, W. Fedorko, A. Lister, J. Pearkes, and C. Gay, Long Short-Term Memory (LSTM) networks with jet constituents for boosted top tagging at the LHC, arXiv:1711.09059.
- (32) T. Cheng, Recursive Neural Networks in Quark/Gluon Tagging, arXiv:1711.02633.
In this paper, we showed how one could use a topology classifier to keep the overall trigger rate under control while operating triggers with otherwise unsustainable loose selections. In this appendix we discuss how topology classifiers could be used to save resources for a pre-defined baseline trigger selection by rejecting events associated to unwanted topologies. In this case, the main goal is not to reduce the impact of the online selection. Instead, we focus on reducing resource consumption downstream for a given trigger selection.
To this purpose, we consider a copy of the dataset described in Sec. 2, obtained by tightening the $p_T$ threshold from 23 to 25 GeV and the isolation requirement from ISO < 0.45 to ISO < 0.20. Doing so, the sample composition changes as follows: 7.5% QCD; 92% $W$; 0.5% $t\bar{t}$. With such selections, the trigger acceptance rate would decrease from 690 Hz to 390 Hz, closer to what is currently allocated for these triggers in the CMS experiment.
We then define a set of trigger filters applying a lower threshold to the normalized score of the classifier, choosing the threshold value that corresponds to a certain TPR value. The result is presented in Table 2, in terms of the FPR and the trigger rate.
|FPR @99% TPR||76.7%||55.5%||44.3%||13.4%||10.2%|
|FPR @95% TPR||43.5%||20.2%||9.1%||2.1%||1.5%|
|FPR @90% TPR||24.8%||9.9%||4.2%||0.6%||0.5%|
|TR @99% TPR||285 Hz||230 Hz||219 Hz||57 Hz||42 Hz|
|TR @95% TPR||148 Hz||85 Hz||37 Hz||10 Hz||9 Hz|
|TR @90% TPR||73 Hz||42 Hz||19 Hz||4 Hz||4 Hz|
|FPR @99% TPR||81.3%||68.9%||45.7%||17.3%||14.9%|
|FPR @95% TPR||58.4%||43.9%||19.6%||6.1%||5.2%|
|FPR @90% TPR||46.9%||30.2%||11.7%||3.0%||2.5%|
|TR @99% TPR||385 Hz||384 Hz||376 Hz||363 Hz||362 Hz|
|TR @95% TPR||367 Hz||360 Hz||349 Hz||343 Hz||342 Hz|
|TR @90% TPR||343 Hz||336 Hz||328 Hz||325 Hz||324 Hz|
The trigger baseline selection we use in this study, close to what is used nowadays in CMS for muons, gives an overall trigger rate (i.e., summing electron and muon events) of 390 Hz (i.e., 190 Hz per lepton flavor). If one were willing to take (as an example) half the $W$ events and all the $t\bar{t}$ events, this number could be reduced to Hz using the inclusive selectors presented in this study (taking into account the partial overlap between the two triggers). A more classic approach would consist in prescaling the isolated-lepton triggers, i.e., randomly accepting half of the events. The effect on $W$ events would be the same, but one would lose half of the $t\bar{t}$ events while still writing 15 times more QCD than $t\bar{t}$ events. In this respect, the strategy we propose would be more flexible and cost-effective.
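The prescale comparison above can be checked with a short calculation based on the numbers quoted in this appendix (390 Hz total rate, with a composition of 7.5%, 92%, and 0.5%). The process labels `W` and `ttbar` attached to the last two fractions are inferred from the context, and the function name is hypothetical.

```python
BASELINE_RATE_HZ = 390.0
COMPOSITION = {"QCD": 0.075, "W": 0.92, "ttbar": 0.005}  # labels inferred from context

def prescaled_rates(total_rate_hz, composition, prescale):
    """Per-process rates when only one event out of `prescale` is randomly accepted."""
    return {name: total_rate_hz * fraction / prescale
            for name, fraction in composition.items()}

rates = prescaled_rates(BASELINE_RATE_HZ, COMPOSITION, prescale=2)
total_rate = sum(rates.values())
qcd_to_ttbar = rates["QCD"] / rates["ttbar"]
```

A random prescale scales every process by the same factor, so the total rate is halved while the ratio of QCD to rare events stays at 15, which is the inefficiency a topology classifier is meant to remove.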