In general-purpose particle detectors, the particle flow algorithm may be used to reconstruct a coherent particle-level view of the event by combining information from the calorimeters and the trackers, significantly improving the detector resolution for jets and the missing transverse momentum. In view of the planned high-luminosity upgrade of the CERN Large Hadron Collider, it is necessary to revisit existing reconstruction algorithms and ensure that both the physics and computational performance are sufficient in a high-pileup environment. Recent developments in machine learning may offer a prospect for efficient event reconstruction based on parametric models. We introduce MLPF, an end-to-end trainable machine-learned particle flow algorithm for reconstructing particle flow candidates based on parallelizable, computationally efficient, scalable graph neural networks and a multi-task objective. We report the physics and computational performance of the MLPF algorithm on a synthetic dataset of ttbar events in HL-LHC running conditions, including the simulation of multiple interaction effects, and discuss potential next steps and considerations towards ML-based reconstruction in a general-purpose particle detector.
Reconstruction algorithms at general-purpose high-energy particle detectors aim to provide a coherent, well-calibrated physics interpretation of the collision event. Variants of the particle-flow (PF) algorithm have been used at the PETRA Behrend and others (1982), ALEPH Buskulic and others (1995), CMS Sirunyan and others (2017) and ATLAS Aaboud and others (2017) experiments to reconstruct a particle-level interpretation of high-multiplicity hadronic collision events, given individual detector elements such as tracks and calorimeter clusters from a multi-layered, heterogeneous, irregular-geometry detector. The PF algorithm generally correlates tracks and calorimeter clusters from detector layers such as the electromagnetic calorimeter (ECAL), hadron calorimeter (HCAL) and others to reconstruct charged and neutral hadron candidates as well as photons, electrons, and muons with an optimized efficiency and resolution. Existing PF reconstruction implementations are optimized using simulation for each specific experiment because detailed detector characteristics and geometry must be considered for the best possible physics performance.
Recently, there has been significant interest in adapting the PF reconstruction approach for future high-luminosity experimental conditions at the CERN Large Hadron Collider (LHC), as well as for proposed future collider experiments like the Future Circular Collider (FCC). While reconstruction algorithms are often based on an imperative, rule-based approach, the use of supervised machine learning (ML) to define reconstruction parametrically based on data and simulation samples may improve the physics reach of the experiments while offering a modern computing solution that could scale better with the expected progress on ML-specific computing infrastructures, e.g., at high-performance computer centers. In addition to potentially improving the physics performance, one of the motivations for developing ML-based reconstruction is an improved computational performance over standard algorithms in a high-luminosity configuration, which ultimately would allow a more detailed reconstruction to be deployed given a fixed computing budget, as ML algorithms are well-suited to emerging highly parallel computing architectures.
ML-based reconstruction approaches have been proposed for various tasks, including PF Duarte and Vlimant (2020). The clustering of energy deposits in a detector with a realistic, irregular geometry using graph neural networks was first proposed in Ref. Qasim et al. (2019). The ML-based reconstruction of overlapping signals without a regular grid was further developed in Ref. Kieseler (2020), where an optimization scheme for reconstructing a variable number of particles based on a potential function, the object condensation approach, was proposed. The clustering of energy deposits from particle decays with potential overlaps is an essential input to PF reconstruction. In Ref. Di Bello et al. (2020), various ML models including GNNs
and computer-vision models have been studied for reconstructing neutral hadrons from multi-layered granular calorimeter images and tracking information. In particle gun samples, the
ML-based approaches achieved a significant improvement in neutral hadron energy resolution over the default algorithm, an important step towards a fully parametric, simulation-driven reconstruction using ML.

In this paper, we build on the previous ML-based reconstruction approaches by extending the ML-based PF algorithm to reconstruct particle candidates in events with a large number of simultaneous pileup (PU) collisions. In Section 2, we propose a benchmark dataset that has the main components for a particle-level reconstruction of charged and neutral hadrons with PU. In Section 3, we build on the existing ML-based reconstruction and propose a GNN-based machine-learned particle-flow (MLPF) algorithm whose runtime scales approximately linearly with the input size. Furthermore, in Section 4, we characterize the performance of the MLPF model on the benchmark dataset in terms of hadron reconstruction efficiency, fake rate, and resolution, comparing it to the baseline PF reconstruction, while also demonstrating on synthetic data that MLPF reconstruction can be computationally efficient and scalable. Finally, in Section 5, we discuss potential issues and next steps for ML-based PF reconstruction.
We use pythia 8 Sjöstrand et al. (2006, 2008) and delphes 3 de Favereau et al. (2014) from the HepSim software repository Chekanov (2015) to generate a particle-level dataset of 50,000 top quark-antiquark pair events produced in proton-proton collisions at a center-of-mass energy of 14 TeV, overlaid with minimum bias events corresponding to an average PU of 200. The dataset consists of detector hits as the input, generator particles as the ground truth, and reconstructed particles from delphes for additional validation. The delphes model corresponds to a CMS-like detector with a multi-layered charged particle tracker, an electromagnetic calorimeter, and a hadron calorimeter.
Although this simplified simulation does not include important physics effects such as pair production, bremsstrahlung, nuclear interactions, electromagnetic showering, or a detailed detector simulation, it allows the study of overall per-particle reconstruction properties for charged and neutral hadrons in a high-PU environment. Different reconstruction approaches can be developed and compared on this simplified dataset, where the expected performance is straightforward to assess, including from the aspect of computational complexity.
The inputs to PF are charged particle tracks and calorimeter clusters. We use these high-level detector inputs (elements), rather than low-level tracker hits or unclustered calorimeter hits, to closely follow how PF is implemented in existing reconstruction chains, where successive reconstruction steps are decoupled such that each step can be optimized and characterized individually. In this toy dataset, tracks are characterized by transverse momentum (pT) [1], charge, and the pseudorapidity and azimuthal angle coordinates on the inner (η, φ) and outer (η_outer, φ_outer) surfaces of the tracker. The track η and φ coordinates are additionally smeared with a 1% Gaussian resolution to model a finite tracker resolution. Calorimeter clusters are characterized by electromagnetic or hadron energy and (η, φ) coordinates. In this simulation, an event has several thousand detector inputs on average.

[1] As is common in collider physics, we use a Cartesian coordinate system with the z axis oriented along the beam axis, the x axis on the horizontal plane, and the y axis oriented upward. The x and y axes define the transverse plane, while the z axis identifies the longitudinal direction. The azimuthal angle φ is computed with respect to the x axis. The polar angle θ is used to compute the pseudorapidity η = −ln tan(θ/2). The transverse momentum pT is the projection of the particle momentum on the (x, y) plane. We use natural units.

The targets for PF reconstruction are stable generator-level particles that are associated with at least one detector element, as particles that leave no detector hits are not reconstructable. Generator particles are characterized by a particle identification (PID), which may take one of the following categorical values: charged hadron, neutral hadron, photon, electron, or muon. In case multiple generator particles deposit their energy completely in a single calorimeter cluster, we treat them as reconstructable only in aggregate: the generator particles are merged by adding their momenta, and the merged particle is assigned the PID of the highest-energy sub-particle. In addition, since charged hadrons outside the tracker acceptance are indistinguishable from neutral hadrons, we relabel generated charged hadrons outside the tracker acceptance as neutral hadrons. We also set a lower energy threshold on reconstructable neutral hadrons, based on the delphes rule-based PF reconstruction, ignoring neutral hadrons that do not pass this threshold. A single event from the dataset is visualized in Fig. 1, demonstrating the input multiplicity and particle distribution in the event.
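As an illustration of the merging rule described above (sum the momenta of the generator particles, keep the PID of the highest-energy sub-particle), a minimal numpy sketch follows; the array layout and the helper name are our choices for illustration, not part of the delphes dataset:

```python
import numpy as np

def merge_generator_particles(energies, momenta, pids):
    """Merge generator particles that fully deposit into a single calorimeter
    cluster: sum the momentum vectors and assign the PID of the
    highest-energy sub-particle (illustrative sketch)."""
    merged_momentum = np.sum(momenta, axis=0)    # add the momenta
    merged_pid = pids[int(np.argmax(energies))]  # PID of highest-energy particle
    return merged_momentum, merged_pid

# Example: two photons fully absorbed by one ECAL cluster.
E = np.array([10.0, 2.0])
p = np.array([[5.0, 1.0, 8.6],
              [1.0, 0.5, 1.6]])
pid = np.array([22, 22])
mom, merged_pid = merge_generator_particles(E, p, pid)
```

In this aggregate treatment, the individual sub-particles are no longer considered separately reconstructable targets.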
We show the differential distributions of the generator-level particles in the simulated dataset in Fig. 2.
We also store the PF candidates reconstructed by delphes for comparison purposes. The delphes rule-based PF algorithm is described in detail in Ref. de Favereau et al. (2014), identifying charged and neutral hadrons based on track and calorimeter cluster overlaps and energy subtraction. Photons, electrons, and muons are identified by delphes based on the generator particle associated with the corresponding track or calorimeter cluster. Each event is now fully characterized by the set of generator particles (target vectors) and the set of detector inputs (input vectors), with

Y = {y_j}, j = 1 … N_gen,    (1)
y_j = [PID, pT, E, η, φ, q],    (2)
X = {x_i}, i = 1 … N_elem,    (3)
x_i = [type, pT, E, η, φ, η_outer, φ_outer, q].    (4)
For input tracks, only the type, pT, η, φ, η_outer, φ_outer, and charge features are filled. Similarly, for input clusters, only the type, energy, η, and φ entries are filled. Unfilled features for both tracks and clusters are set to zero. In future iterations of MLPF, it may be beneficial to represent input elements of different type with separate data matrices to improve the computational efficiency of the model.
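The zero-filled input representation could be sketched as follows; the feature ordering and the integer type codes are our assumptions for illustration, not the dataset's convention:

```python
import numpy as np

# Assumed feature layout: [type, pT, E, eta, phi, eta_outer, phi_outer, q]
N_FEAT = 8

def track_features(pt, eta, phi, eta_outer, phi_outer, q):
    """Tracks fill type, pT, eta, phi, eta_outer, phi_outer, q; energy stays 0."""
    x = np.zeros(N_FEAT)
    x[0] = 1  # illustrative type code for tracks
    x[1], x[3], x[4], x[5], x[6], x[7] = pt, eta, phi, eta_outer, phi_outer, q
    return x

def cluster_features(energy, eta, phi, is_ecal):
    """Clusters fill type, energy, eta, phi; track-only entries stay 0."""
    x = np.zeros(N_FEAT)
    x[0] = 2 if is_ecal else 3  # illustrative type codes for ECAL/HCAL clusters
    x[2], x[3], x[4] = energy, eta, phi
    return x

# A track and an HCAL cluster stacked into one event matrix.
X = np.stack([track_features(12.3, 0.5, 1.1, 0.52, 1.12, -1),
              cluster_features(25.0, 0.5, 1.1, is_ecal=False)])
```

With a single shared feature matrix, the type code lets the model distinguish tracks from clusters despite the zero-padded entries.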
Functionally, the detector is modelled in simulation by a function S(Y) = X that produces the set of detector signals X from the generator-level particles Y for an event. Reconstruction imperfectly approximates the inverse of that function, R ≈ S⁻¹, such that R(X) ≈ Y. In the following section, we approximate the reconstruction as set-to-set translation and implement a baseline MLPF reconstruction using graph neural networks.
For a given set of detector inputs X, we want to predict a set of particle candidates Ŷ that closely approximates the target generator particle set Y. The target and predicted sets may have a different number of elements, depending on the quality of the prediction. For use in ML training with gradient descent, this requires a computationally efficient set-to-set metric L(Y, Ŷ) to be used as the loss function.

We simplify the problem numerically by first zero-padding the target set Y such that |Y| = |X|. This turns the problem of predicting a variable number of particles into a multi-classification prediction by adding an additional "no particle" class to the classes already defined by the target PIDs, and is based on Ref. Kieseler (2020). Since the target set now has a predefined size, we may compute a loss function that approximates the reconstruction quality element-by-element:

L(y_j, ŷ_j) = CLS(c_j, ĉ_j) + Σ_k α_k REG(p_{j,k}, p̂_{j,k}),    (5)
L(Y, Ŷ) = Σ_j L(y_j, ŷ_j),    (6)

where the target values y_j = [c_j, p_j] and predictions ŷ_j = [ĉ_j, p̂_j] are decomposed such that the multi-classification is encapsulated in the one-hot encoded target classes c_j and predicted class scores ĉ_j, while the momentum and charge regression values are contained in p_j. We use CLS to denote the multi-classification loss (e.g., categorical cross-entropy), while REG denotes the regression loss (e.g., mean-squared error) for the momentum components, weighted appropriately by coefficients α_k. This per-particle loss function serves as a baseline optimization target for the ML training. Further physics improvements may be reached by extending the loss to take into account event-level quantities, either by using an energy flow distance as proposed in Refs. Komiske et al. (2019a, b); Romao et al. (2020), or by using a generative adversarial network (GAN) setup, optimizing the reconstruction network in tandem with a classifier that is trained to distinguish between the target and reconstructed events, given the detector inputs.
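The element-by-element loss in Eqs. (5) and (6) can be sketched in numpy as below; the class count, the single weighting coefficient alpha, and the array shapes are illustrative assumptions, and the actual training uses a deep-learning framework:

```python
import numpy as np

def mlpf_loss(c_true, c_scores, p_true, p_pred, alpha=1.0):
    """Per-event MLPF loss sketch: categorical cross-entropy over PID classes
    (including the zero-padded "no particle" class) plus a weighted
    mean-squared error on the regressed momentum components."""
    # Softmax over class scores, then cross-entropy against one-hot targets.
    e = np.exp(c_scores - c_scores.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)
    cls = -np.sum(c_true * np.log(probs + 1e-12), axis=1)
    # Mean-squared error on the momentum/charge regression values.
    reg = np.mean((p_true - p_pred) ** 2, axis=1)
    return np.sum(cls + alpha * reg)  # sum over the (padded) target elements

# Two target slots: one real particle and one "no particle" pad.
c_true = np.array([[0.0, 1.0], [1.0, 0.0]])   # classes: [no-particle, hadron]
c_scores = np.array([[-2.0, 2.0], [3.0, -1.0]])
p_true = np.array([[10.0, 0.1], [0.0, 0.0]])
p_pred = np.array([[9.0, 0.2], [0.0, 0.0]])
loss = mlpf_loss(c_true, c_scores, p_true, p_pred)
```

Because the padded target has a fixed size, the loss is a plain elementwise sum and remains differentiable with respect to both the class scores and the regressed momenta.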
Furthermore, for PF reconstruction, the target generator particles are often geometrically and energetically close to well-identifiable detector inputs. In physics terms, a charged hadron is reconstructed based on a track, while a neutral hadron candidate can always be associated to at least one primary source cluster, with additional corrections taken from other nearby detector inputs. Therefore, we may choose to preprocess the inputs such that for a given arbitrary ordering of the detector inputs (sets of vectors are represented as matrices with some arbitrary ordering for ML training), the target set is arranged such that if a target particle can be associated to a single detector input, it is arranged to be in the same location in the sequence. This data preprocessing step speeds up model convergence, but does not introduce any additional assumptions to the model.
Given the set of detector inputs for the event , we adopt a message passing approach for reconstructing the PF candidates . First, we need to construct a trainable graph adjacency matrix for the given set of input elements, represented with the graph building block in Fig. 3. The input set is heterogeneous, containing elements of different type (tracks, ECAL clusters, HCAL clusters) in different feature spaces. Therefore, defining a static neighborhood graph in the feature space in advance is not straightforward. A generic approach to learnable graph construction using kNN in an embedding space, known as GravNet, has been proposed in Ref. Qasim et al. (2019), where the authors demonstrated that a learnable, dynamically-generated graph structure significantly improves the physics performance of an ML-based reconstruction algorithm for calorimeter clustering.
However, naive kNN graph implementations have O(N²) time complexity: for each of the N set elements, we must order the other N−1 elements by distance and pick the k closest. For reconstruction, given equivalent physics performance, both computational efficiency (a low overall runtime) and scalability (subquadratic time and memory scaling with the input size) are desirable.
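For contrast, a naive kNN graph built from the full pairwise distance matrix requires O(N²) time and memory, as in this simple numpy sketch (function name and shapes are ours for illustration):

```python
import numpy as np

def naive_knn_graph(Z, k):
    """Brute-force kNN: materializes the full N x N distance matrix,
    which is exactly what a subquadratic approach must avoid."""
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)  # (N, N) distances
    np.fill_diagonal(d2, np.inf)                         # exclude self-edges
    neighbors = np.argsort(d2, axis=1)[:, :k]            # k closest per element
    adj = np.zeros_like(d2, dtype=bool)
    np.put_along_axis(adj, neighbors, True, axis=1)
    return adj

Z = np.random.default_rng(1).normal(size=(50, 3))
A = naive_knn_graph(Z, k=4)
```

For the thousands of elements per high-PU event, both the (N, N) intermediate array and the per-row sort make this approach impractical.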
We build on the GravNet approach Qasim et al. (2019) by using an approximate kNN graph construction algorithm based on locality sensitive hashing (LSH) to improve the time complexity of the graph building step. The LSH approach was recently proposed in Ref. Kitaev et al. (2020) for approximating, and thus speeding up, ML models that take into account element-to-element relations through an optimizable mechanism known as self-attention Vaswani et al. (2017). The method divides the input into bins using a hash function, such that nearby elements are likely to be assigned to the same bin. Each bin contains only a small number of elements, such that constructing a kNN graph within the bin is fast.
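Taking the bin assignments as given (e.g., produced by the hash function described below), the per-bin kNN construction could be sketched as follows; the function and variable names are illustrative:

```python
import numpy as np

def knn_in_bins(Z, bins, k):
    """Build kNN edges only inside each bin; the full event graph is the
    union of the disjoint per-bin subgraphs, so no N x N matrix is needed."""
    edges = []
    for b in np.unique(bins):
        idx = np.flatnonzero(bins == b)       # elements assigned to this bin
        Zb = Z[idx]
        d2 = ((Zb[:, None, :] - Zb[None, :, :]) ** 2).sum(-1)
        np.fill_diagonal(d2, np.inf)
        kk = min(k, len(idx) - 1)             # small bins may have < k neighbors
        if kk <= 0:
            continue
        nn = np.argsort(d2, axis=1)[:, :kk]
        for row, cols in zip(idx, nn):
            for c in cols:
                edges.append((row, idx[c]))   # sparse edge list
    return edges

rng = np.random.default_rng(2)
Z = rng.normal(size=(200, 4))
bins = rng.integers(0, 8, size=200)           # stand-in for LSH bin indices
edges = knn_in_bins(Z, bins, k=3)
```

With bins of bounded size, the cost per bin is constant, so the total work grows approximately linearly with the number of input elements.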
In the kNN+LSH approach, the input elements x_i are projected into a d-dimensional embedding space by a trainable, elementwise feed-forward network z_i = FFN(x_i). As in Ref. Kitaev et al. (2020), we then assign each element to one of B bins indexed by integers using h(z_i), where h is a hash function that assigns nearby z_i to the same bin with a high probability. We define the hash function as h(z) = argmax([zR; −zR]), where [u; v] denotes the concatenation of two vectors u and v, and R is a random projection matrix of size d × B/2 drawn from the normal distribution at initialization.
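A sketch of such a random-rotation hash in numpy, with illustrative dimensions (d-dimensional embeddings, B bins):

```python
import numpy as np

def lsh_hash(Z, R):
    """Random-rotation LSH in the spirit of Reformer: project the embedded
    elements Z of shape (n, d) with a random matrix R of shape (d, B/2)
    and take the argmax over the concatenation [ZR; -ZR], yielding a bin
    index in [0, B)."""
    proj = Z @ R                                         # (n, B/2)
    return np.argmax(np.concatenate([proj, -proj], axis=1), axis=1)

rng = np.random.default_rng(0)
d, n_bins = 16, 8
R = rng.normal(size=(d, n_bins // 2))  # fixed at initialization, not trained
Z = rng.normal(size=(100, d))          # embedded input elements
bins = lsh_hash(Z, R)
```

Nearby embeddings are likely to share the same maximal projected direction, so small perturbations rarely change the assigned bin.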
We now build kNN graphs based on the embedded elements in each of the LSH bins, such that the full sparse graph adjacency A of the input set is defined by the sum of the subgraphs. The embedding function can be optimized with backpropagation and gradient descent using the values of the nonzero elements of A. Overall, this graph building approach has approximately linear time complexity in the number of input elements and does not require the allocation of a full N × N matrix at any point. The LSH step generates disjoint subgraphs in the full event graph. This is motivated by physics, as we expect subregions of the detector to be reconstructable approximately independently. The existing PF algorithm in the CMS detector employs a similar approach by producing disjoint PF blocks as an intermediate step of the algorithm Sirunyan and others (2017).

Having built the graph dynamically, we now use a variant of message passing Gilmer et al. (2017) to create hidden encoded states of the input elements, taking into account the graph structure. As a first baseline, we use a variant of the graph convolutional network (GCN) that combines local and global node-level information Kipf and Welling (2017); Wu et al. (2019); Xin et al. (2020). This choice is motivated by implementation and evaluation efficiency in establishing a baseline. This message passing step is represented in Fig. 3 by the GCN block. Finally, we decode the encoded nodes
to the target outputs with an elementwise feed-forward network that combines the hidden state with the original input element using a skip connection.

We use a joint graph building step, but separate graph convolution and decoding layers for the multi-classification and the momentum and charge regression subtasks. This allows each subtask to be retrained separately, in addition to a combined end-to-end training, should the need arise. The classification and regression losses are combined with constant empirical weights such that they have an approximately equal contribution to the full training loss. It may be beneficial to use dedicated multi-task training strategies such as gradient surgery Yu et al. (2020) to further improve the performance across all subtasks.
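The encode, graph-convolve, and skip-connected decode chain described above might be sketched as follows; all weights, shapes, and the mean-aggregation choice are our assumptions (the actual model is implemented in TensorFlow):

```python
import numpy as np

def elu(x):
    """Exponential linear unit activation."""
    return np.where(x > 0, x, np.exp(x) - 1.0)

def gcn_layer(H, A, W):
    """One graph convolution: normalize the adjacency (with self-loops),
    aggregate neighbor features, then apply a learned projection."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)
    return elu((A_hat / deg) @ H @ W)         # mean aggregation + projection

def decode(H, X, W_dec):
    """Elementwise decoding with a skip connection to the original inputs."""
    return np.concatenate([H, X], axis=1) @ W_dec

rng = np.random.default_rng(3)
N, F, D, OUT = 6, 8, 16, 7                    # illustrative sizes
X = rng.normal(size=(N, F))                   # input element features
A = (rng.random((N, N)) < 0.4).astype(float)  # stand-in sparse adjacency
W_in, W_gcn = rng.normal(size=(F, D)), rng.normal(size=(D, D))
H = gcn_layer(elu(X @ W_in), A, W_gcn)        # encode + message passing
Y_hat = decode(H, X, rng.normal(size=(D + F, OUT)))  # per-element outputs
```

The skip connection gives the decoder direct access to the raw element features, so the message-passing layers only need to supply the neighborhood context.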
The multi-classification prediction outputs for each node are converted to particle probabilities with the softmax operation. We choose the PID with the highest probability for the reconstructed particle candidate, while ensuring that the probability meets a threshold that matches a fake rate working point defined by the baseline delphes PF reconstruction algorithm.
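The candidate selection described above can be sketched as: apply a softmax to the class scores, take the argmax PID, and demote the node to "no particle" when the top probability falls below the working-point threshold. The class ordering, the `NO_PARTICLE` index, and the threshold value are illustrative assumptions:

```python
import numpy as np

NO_PARTICLE = 0  # assumed index of the "no particle" class

def select_pid(scores, threshold=0.5):
    """Convert per-node class scores to PIDs: softmax, argmax, and a
    probability threshold chosen to match a target fake-rate working point."""
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)
    pid = probs.argmax(axis=1)
    pid[probs.max(axis=1) < threshold] = NO_PARTICLE  # below threshold: no candidate
    return pid

scores = np.array([[0.1, 4.0, 0.2],   # confidently class 1
                   [0.4, 0.5, 0.6]])  # no confident class
pids = select_pid(scores)
```

Raising the threshold trades reconstruction efficiency for a lower fake rate, which is how the working point can be matched to the baseline algorithm.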
The predicted graph structure is an intermediate step in the model and is not used in the loss function explicitly—we only optimize the model with respect to reconstruction quality. However, using the graph structure in the loss function when a known ground truth is available may further improve the optimization process. In addition, access to the predicted graph structure may be helpful in evaluating the interpretability of the model.
The set of networks for graph building, message passing, and decoding has been implemented with TensorFlow 2.3 and can be trained end-to-end using gradient descent. The inputs are zero-padded to a fixed number of elements, with a fixed LSH bin size that determines the number of bins, and a fixed number of nearest neighbors. We use two hidden layers with 256 units each for each encoding and decoding net, with two successive graph convolutions between the encoding and decoding steps. Exponential linear unit (ELU) activations Clevert et al. (2016) were used between hidden layers, and linear activations were used for the outputs. Overall, the model has approximately 1.5 million trainable weights and 25,000 constant weights for the random projections. For optimization, we used the Adam algorithm Kingma and Ba (2015) for 300 epochs, training on a subset of the events, with a disjoint subset used for testing. The events are processed in minibatches of five simultaneous events per graphics processing unit (GPU); we train for approximately 24 hours on five RTX 2070S GPUs using data parallelism. We report the results of the multi-task learning problem in the next section. The code and dataset to reproduce the training are made available on the Zenodo platform Pata et al. (2021a, b).

In the model assessment, we focus on the charged and neutral hadron performance in the simulation events that were not used for training. In typical PF reconstruction, charged hadrons are reconstructed based on tracking information, while neutral hadrons are reconstructed from HCAL clusters not matched to tracks. In Fig. 4, we see that both the baseline rule-based PF in delphes and the MLPF model generally predict the charged and neutral particle multiplicity with a high degree of correlation, suggesting that the multi-classification model is appropriate for reconstructing variable-multiplicity events. We note that the particle multiplicities from the MLPF model generally correlate better with the generator-level target than those from the rule-based PF.
In Fig. 5, we compare the per-particle multi-classification confusion matrix for both reconstruction methods. We see an overall similar classification performance, with the neutral hadron identification efficiency being around 0.9 for both, and slightly higher for the MLPF algorithm (0.906 for MLPF versus 0.888 for the rule-based PF). Improved Monte Carlo generation, subsampling, or weighting may further improve the reconstruction performance for particles or kinematic configurations that occur rarely in a physical simulation. In this set of results, we apply no weighting on the events or on the particles in the event.

In Fig. 6, we see that the pT-dependent charged hadron efficiency (true positive rate) for the MLPF model is somewhat higher than for the rule-based PF baseline, while the fake rate (false positive rate) is equivalently zero, as the delphes simulation includes no fake tracks. From Fig. 7, we observe a similar result for the energy-dependent efficiency and fake rate of neutral hadrons. Both algorithms exhibit a turn-on at low energies and show a constant behaviour at high energies, with MLPF being comparable to or slightly better than the rule-based PF baseline.
Furthermore, we see in Figs. 8 and 9 that the energy (pT) and angular resolutions of the MLPF algorithm are generally comparable to the baseline for neutral (charged) hadrons.
Overall, these results demonstrate that formulating PF reconstruction as a multi-task ML problem of simultaneously identifying charged and neutral hadrons in a high-PU environment and predicting their momentum may offer comparable or improved physics performance over hand-written algorithms in the presence of sufficient simulation samples and careful optimization. The performance characteristics for the baseline and the proposed MLPF model are summarized in Table 1.
We also characterize the computational performance of the GNN-based MLPF algorithm. In Fig. 10, we see that the average inference time scales roughly linearly with the input size, which is necessary for scalable reconstruction at high PU. We also note that the GNN-based MLPF algorithm runs natively on a GPU, with the current runtime at around 50 ms/event on a consumer-grade GPU for a full 200 PU event. The algorithm may be relatively simple to port efficiently to any computing architecture that supports common ML frameworks like TensorFlow without significant investment. This includes GPUs and potentially even field-programmable gate arrays or ML-specific processors such as the GraphCore intelligence processing units Mohan et al. (2020) through specialized ML compilers Duarte et al. (2018); Iiyama and others (2021); Heintz et al. (2020). These coprocessing accelerators can be integrated into existing CPU-based experimental software frameworks as a scalable service that grows to meet the transient demand Duarte et al. (2019); Krupa et al. (2020); Rankin et al. (2020).
|                    | Charged hadrons |       | Neutral hadrons |       |
| Metric             | Rule-based PF   | MLPF  | Rule-based PF   | MLPF  |
| Efficiency         | 0.903           | 0.952 | 0.888           | 0.906 |
| Fake rate          | 0               | 0     | 0.191           | 0.069 |
| pT (E) resolution  | 0.211           | 0.137 | 0.351           | 0.324 |
| η resolution       | 0.245           | 0.250 | 0.05            | 0.059 |
| φ resolution       | 0.009           | 0.004 | 0.032           | 0.013 |
We have proposed an algorithm for MLPF reconstruction in a high-pileup environment for a general-purpose multilayered particle detector, based on transforming input sets of detector elements to the output set of reconstructed particles. The MLPF implementation with GNNs is based on graph building with an LSH approximation for kNN, dubbed kNN+LSH, and message passing using graph convolutions. Based on a benchmark particle-level dataset generated using pythia 8 and delphes 3, the MLPF GNN reconstruction offers physics performance for charged and neutral hadrons comparable to the baseline rule-based PF algorithm in delphes, demonstrating that a purely parametric ML-based PF reconstruction can reach the physics performance of existing reconstruction algorithms, while allowing for greater portability across various computing architectures at a possibly reduced cost. The inference time empirically scales approximately linearly with the input size, which is useful for efficient evaluation in the high-luminosity phase of the LHC. In addition, the ML-based reconstruction model may offer useful features for downstream physics analysis, such as per-particle probabilities for different reconstruction interpretations, uncertainty estimates, and optimizable particle-level reconstruction for rare processes, including displaced signatures.
The MLPF model can be further improved with a more physics-motivated optimization criterion, i.e. a loss function that takes into account event-level, in addition to particle-level differences. While we have shown that a per-particle loss function already converges to an adequate physics performance overall, improved event-based losses such as the object condensation approach or energy flow may be useful. In addition, an event-based loss may be defined using an adversarial classifier that is trained to distinguish the target particles from the reconstructed particles.
Reconstruction algorithms need to adapt to changing experimental conditions—this may be addressed in MLPF by a periodic retraining on simulation that includes up-to-date running condition data such as the beam-spot location, dead channels, and latest calibrations. In a realistic MLPF training, care must be taken that the reconstruction qualities of rare particles and particles in the low-statistics tails of distributions are not adversely affected and that the reconstruction performance remains uniform. This may be addressed with detailed simulations and weighting schemes. In addition, for a reliable physics result, the interpretability of the reconstruction is essential. The reconstructed graph structure can provide information about causal relations between the input detector elements and the reconstructed particle candidates.
In order to develop a usable ML-based PF reconstruction algorithm, a realistic high-pileup simulated dataset that includes detailed interactions with the detector material needs to be used for the ML model optimization. To evaluate the reconstruction performance, efficiencies, fake rates, and resolutions for all particle types need to be studied in detail as a function of particle kinematics and detector conditions. Furthermore, high-level derived quantities such as pileup-dependent jet and missing transverse momentum resolutions must be assessed for a more complete characterization of the reconstruction performance. With ongoing work in ML-based track and calorimeter cluster reconstruction upstream of PF, and ML-based reconstruction of high-level objects including jets and jet classification probabilities downstream of PF, care must be taken that the various steps are optimized and interfaced coherently.
Finally, the MLPF algorithm is inherently parallelizable and can take advantage of hardware acceleration of GNNs via GPUs, FPGAs or emerging ML-specific processors. Current experimental software frameworks can easily integrate coprocessing accelerators as a scalable service. By harnessing heterogeneous computing and parallelizable, efficient ML, the burgeoning computing demand for event reconstruction tasks in the high-luminosity LHC era can be met while maintaining or even surpassing the current physics performance.
Krupa et al. (2020). GPU coprocessors as a service for deep learning inference in high energy physics. Submitted to Mach. Learn.: Sci. Technol. arXiv:2007.10359. Cited in Section 4.