Scalable Deep Convolutional Neural Networks for Sparse, Locally Dense Liquid Argon Time Projection Chamber Data

by   Laura Dominé, et al.

Deep convolutional neural networks (CNNs) show strong promise for analyzing scientific data in many domains, including particle imaging detectors such as the liquid argon time projection chamber (LArTPC). Yet the high sparsity of LArTPC data challenges traditional CNNs, which were designed for dense data such as photographs. A naive application of CNNs to LArTPC data results in inefficient computation and poor scalability to large LArTPC detectors such as those of the Short Baseline Neutrino Program and the Deep Underground Neutrino Experiment. Recently, Submanifold Sparse Convolutional Networks (SSCNs) have been proposed to address this challenge. We report their performance on a 3D semantic segmentation task on simulated LArTPC samples. In comparison with standard CNNs, we observe that the memory and wall-time costs for inference are reduced by factors of 364 and 33 respectively, without loss of accuracy. The same factors for 2D samples are found to be 93 and 3.1 respectively. Using SSCN, we present the first machine learning-based approach to the reconstruction of Michel electrons using public 3D LArTPC samples. We find a Michel electron identification efficiency of 93.9% with a true positive rate of 98.8%. Reconstructed Michel electron clusters yield 96.1% average pixel clustering efficiency and 97.3% purity. These results show the strong promise of scalable data reconstruction techniques using deep neural networks for large-scale LArTPC detectors.




I Introduction

Deep convolutional neural networks (CNNs) have become the standard state-of-the-art machine learning (ML) technique in computer vision, natural language processing, and other scientific research domains LeCun et al. (2015). Applications of CNNs are actively being developed for neutrino oscillation experiments Acciarri et al. (2017a); Adams et al. (2018); Radovic et al. (2018) that employ a liquid argon time projection chamber (LArTPC). LArTPCs are a type of particle imaging detector that records 2D or 3D images of charged particles' trajectories with a fine resolution (~mm/pixel). Current and future LArTPC experiments include MicroBooNE Acciarri et al. (2017b), the Short Baseline Near Detector (SBND) Antonello et al. (2015), ICARUS Amerio et al. (2004) and the Deep Underground Neutrino Experiment (DUNE) Acciarri et al. (2016).

Particle trajectories in LArTPC data, many of which have the shape of 1D lines, are recorded in 2D or 3D matrix format with an approximate pixel resolution of 3 mm to 5 mm. Each image has millions to billions of pixels for large LArTPC detectors (e.g. MicroBooNE produces 80-megapixel images). The trajectories are produced by ionization electrons, and are thin (a few pixels in width) and continuous. Depending on the experimental environment, each recorded event may contain a few to dozens of particle trajectories. LArTPC images are therefore generally sparse, yet locally dense (i.e. there are no gaps between the pixels that form a trajectory). This characteristic poses two serious challenges for the application of CNNs. First, dense convolutions are computationally inefficient on LArTPC data, which is mostly filled with zeros. Second, in photographs, for which CNNs were originally developed, all pixels carry information; the ability of CNNs to automatically extract signal features may be degraded when they are applied to mostly zero-filled LArTPC data.
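To make the "sparse, yet locally dense" picture concrete, the following minimal sketch converts a toy dense 3D volume into a list of active voxel coordinates and measures its occupancy. The helper names (`to_sparse`, `occupancy`), the nested-list volume and the toy track are illustrative assumptions, not part of the paper's software.

```python
# Illustrative sketch only: `to_sparse` and `occupancy` are hypothetical helpers.

def to_sparse(volume, threshold=0.0):
    """Return (coords, values) for voxels whose value exceeds `threshold`."""
    coords, values = [], []
    for x, plane in enumerate(volume):
        for y, row in enumerate(plane):
            for z, v in enumerate(row):
                if v > threshold:
                    coords.append((x, y, z))
                    values.append(v)
    return coords, values

def occupancy(volume, threshold=0.0):
    """Fraction of voxels that are active (above threshold)."""
    total = sum(len(row) for plane in volume for row in plane)
    coords, _ = to_sparse(volume, threshold)
    return len(coords) / total

# Toy 8x8x8 volume containing one straight, gap-free "track" along z.
N = 8
volume = [[[0.0] * N for _ in range(N)] for _ in range(N)]
for z in range(N):
    volume[3][4][z] = 1.5

coords, values = to_sparse(volume)
print(len(coords), occupancy(volume))
```

Even in this tiny example fewer than 2% of the voxels are active, and a sparse representation stores only those voxels. For a single straight track the active fraction shrinks further as the volume grows, since the track length scales with the side length while the voxel count scales with its cube.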

Recently a Submanifold Sparse Convolutional Network (SSCN) Graham and van der Maaten (2017); Graham et al. (2018) has been proposed to address these concerns for data represented as sparse matrices or point clouds. In this paper we demonstrate that SSCN holds strong promise for analyzing LArTPC image data with respect to both accuracy and computational efficiency, and is therefore scalable to future large detectors including DUNE. Our contributions include:

  • Demonstration of the better performance and scalability of sparse techniques for LArTPC data reconstruction tasks.

  • Study of typical mistakes made by the algorithms and mitigation methods.

  • First ML-based approach for the reconstruction of Michel electrons using a public simulation sample, and quantification of the purity and efficiency of this approach.

All studies in this paper are reproducible using a Singularity software container Sochat VV (2017), our implementation of semantic segmentation algorithms, and public data samples Adams et al. (2019) provided and maintained by the DeepLearnPhysics collaboration.

Section II gives an overview of the public data set used in this benchmark. Section III details the design of the neural network, U-ResNet, chosen for studying the impact of SSCN. Section IV describes our experiments, including the performance metrics and training setup. Section V presents the results, including the performance comparison between the SSCN (sparse) and standard (dense) implementations of U-ResNet. We discuss causes of poor performance of U-ResNet and propose mitigation methods in Section VI. Lastly, in Section VII, we present our approach to reconstructing Michel electrons in the public simulation sample using the sparse U-ResNet, together with its results.

Figure 1: Simulated LArTPC event data (left) and labels (right). The data shows energy deposits from charged particle trajectories. The color corresponds to an energy scale. In the label image, each pixel is assigned one of five colors: minimum ionizing particles (MIP) in cyan, heavily ionizing particles (HIP) in blue, electromagnetic showers in green, delta rays in yellow and Michel electrons in orange.

II Data Set

II.1 Particle Images

In this paper we use 2D and 3D LArTPC simulation samples made publicly available by the community Adams et al. (2019). These are images of particles traversing a cubic volume of liquid argon, whose side length can be 192 px, 512 px or 768 px. The spatial resolution of each pixel is 3 mm. The dataset contains 100,000 images for each size. We split the whole sample into 80% and 20% fractions as train and test sets respectively. There are two sources of particles in this dataset:

  • Single, isolated particle: an electron, muon, anti-muon or proton. 1 to 10 such particles are generated in a larger volume, and some are captured and recorded in data.

  • Multi-particle vertex: multiple particles produced at the same 3D point, including electrons, gamma rays, muons, anti-muons, charged pions, and protons.

Particle interactions with the liquid argon medium are simulated using Geant4 Agostinelli et al. (2003) and LArSoft Snider and Petrillo (2017). The particle energy depositions are recorded in each pixel. The drift simulation is not included in this dataset. However, energy depositions are smeared by a few pixels to mimic a diffusion effect while total energy is conserved.

II.2 Labels

Among the benchmark tasks available for this public dataset, we choose semantic segmentation. The task is to predict the particle class at the pixel level. The labels in the dataset for supervised learning include five possible classes for each pixel:

  • Michel electrons from the decay of muons

  • Delta ray electrons from hard scattering

  • Showers created by electromagnetic particles (electrons, positrons and photons) with kinetic energy above the critical energy in argon

  • Protons, referred to as heavily ionizing particles (HIP)

  • Minimum ionizing particles (MIP) such as muons or pions

Class       HIP     MIP     Shower   Delta rays   Michel
Fraction    17 %    34 %    47 %     1 %          1 %
Table 1: Average fractions of non-zero pixels per event for each class in the dataset.

The statistics of each class in the dataset are shown in Table 1. Figure 1 shows an example of a simulated image from this dataset and the corresponding pixel-wise labels. More details about the dataset can be found in the reference Adams et al. (2019).

Figure 2: U-ResNet architecture for semantic segmentation. In this example we say that the U-ResNet has a depth of 3 since we perform 3 downsamplings. Turquoise boxes represent convolutions with stride 2 that increase the number of filters. Dark blue boxes are transpose convolutions with stride 2 that decrease the number of filters. Purple boxes are convolutions with stride 1 that decrease the number of filters. The spatial size of feature maps is constant across the horizontal dimension.

III Network architectures

III.1 Dense U-ResNet: baseline

We use a network architecture which we call U-ResNet. It is a hybrid between two popular architectures: U-Net Ronneberger et al. (2015) and ResNet He et al. (2016).

U-Net is an auto-encoder network architecture (Figure 2) which has been successful for medical image segmentation. It is made of two parts. The first half reduces the spatial size of the input image (using strided convolutions in our case) through several convolution blocks. This part learns the image features on different scales in a hierarchical manner, yielding a tensor with a low spatial resolution but a large number of channels. These channels contain a lot of compressed feature information: hence it is called the encoder part of the U-Net. The number of downsizing operations is referred to as depth in this paper, and affects the receptive field area of the network. The second half of the network applies several up-sampling (we use transpose convolutions) and convolution operations to this tensor. It is called the decoding path. Concatenating the feature map of the previous layer in the decoding path with the feature map of the same spatial size in the encoding path helps the decoder to restore the original image resolution. The input image has a single channel, but the output has as many channels as there are classes.

U-Net is a generic CNN architecture. In our case each convolution block, whether up-sampling or down-sampling, is made of two convolution layers, followed by a batch normalization and a rectified linear unit (ReLU) function. Following the ResNet architecture, we also add residual skip connections, which allow the network to learn faster and be deeper. In our implementation the number of filters at each layer increases with depth as a power law for our dense U-ResNet and linearly for our sparse U-ResNet.
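As a rough illustration of how depth and filter growth shape the encoder, the sketch below lists the spatial size and filter count at each level. The exact growth formulas are assumptions made for this example (doubling per level for the "power" case, `base * (level + 1)` for the "linear" case), as is the exact halving of the spatial size by each stride-2 convolution; `layer_plan` is a hypothetical helper.

```python
# Illustrative sketch: the growth formulas below are assumptions, not the
# paper's exact configuration.

def layer_plan(image_size, depth, base_filters, growth):
    """Return [(spatial_size, n_filters)] for encoder levels 0..depth."""
    plan = []
    size = image_size
    for level in range(depth + 1):
        if growth == "power":      # assumed: filters double at every level
            filters = base_filters * 2 ** level
        elif growth == "linear":   # assumed: filters grow by base_filters per level
            filters = base_filters * (level + 1)
        else:
            raise ValueError("growth must be 'power' or 'linear'")
        plan.append((size, filters))
        size //= 2                 # a stride-2 convolution halves each spatial dimension
    return plan

dense_plan = layer_plan(512, depth=5, base_filters=16, growth="power")
sparse_plan = layer_plan(512, depth=5, base_filters=16, growth="linear")
print(dense_plan)
print(sparse_plan)
```

Under these assumptions a depth-5 network with 16 initial filters reaches 512 filters at the deepest level in the power-law case, against 96 in the linear case, which shows why the growth rule matters for memory.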

The strong performance of this network for two-class semantic segmentation (between particle track and electromagnetic shower) at the pixel level on real detector data was already demonstrated by the MicroBooNE experiment Adams et al. (2018), which makes it a network of choice to benchmark a sparse technique on LArTPC simulation data.

III.2 Submanifold Sparse Convolutional Networks

The key element of these networks Graham and van der Maaten (2017) is a so-called submanifold sparse convolution operation. It was designed for cases where the effective dimension of the data is lower than the ambient space, for example a 2D surface or a 1D curve in a 3D space. For such cases the standard dense convolutions are not suited for several reasons:

  • Traditional convolutions involve a number of dense matrix multiplication operations, which are computationally inefficient for sparse data.

  • The submanifold dilation problem described in the reference Graham and van der Maaten (2017): a single non-zero site in the image yields $3^d$ non-zero sites in the next feature map after a dense convolution with a filter of size 3, where $d$ is the spatial dimension (in our case $d = 2$ or $d = 3$). After 2 convolutions there will be $5^d$ non-zero sites, and so on. This inescapable growth "dilates" the originally sparse image, which becomes denser.
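The dilation effect in the second item can be checked by counting active sites directly. The sketch below assumes a filter of width 3, generic non-zero weights and an unbounded grid, and tracks which sites can become non-zero after each dense convolution; `dilate` and `active_site_counts` are hypothetical helper names.

```python
from itertools import product

def dilate(active, d, f=3):
    """Sites that can be non-zero after one dense convolution of filter width f."""
    r = f // 2
    offsets = list(product(range(-r, r + 1), repeat=d))
    return {tuple(c + o for c, o in zip(site, off))
            for site in active for off in offsets}

def active_site_counts(d, n_layers, f=3):
    """Number of non-zero sites after each of n_layers dense convolutions,
    starting from a single active site."""
    active = {(0,) * d}
    counts = []
    for _ in range(n_layers):
        active = dilate(active, d, f)
        counts.append(len(active))
    return counts

print(active_site_counts(2, 3))  # 2D
print(active_site_counts(3, 3))  # 3D
```

The counts follow $(2k + 1)^d$ after $k$ layers, so in 3D a single active voxel can spread to 343 voxels after only three dense convolutions.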

The idea of SSCNs is to keep the same level of sparsity throughout the network computations, especially convolutions. SSCNs have been shown to require significantly fewer computations while outperforming dense CNNs on two 3D semantic segmentation challenges in the field of computer vision Graham et al. (2018).

The reference Graham and van der Maaten (2017) defines two new operations. The first is a sparse convolution $SC(m, n, f, s)$ with $m$ input features, $n$ output features, filter size $f$ and stride $s$. It addresses the first issue mentioned above and works in the same way as a standard convolution, except that it assumes the input from non-active pixels, which are zero or close to zero, is exactly zero. The output feature map has a size $(l - f + s)/s$, where $l$ is the size of the input. The second is a submanifold sparse convolution $SSC(m, n, f)$, defined with similar notation as a modified $SC(m, n, f, s = 1)$: the input is padded with $(f - 1)/2$ zeros on each side to ensure that the output image has exactly the same size. An output pixel is nonzero if and only if the central pixel of the receptive field is nonzero in the input feature map. The SSC operation tackles the second issue by constraining the output sparsity. In order to build complete CNNs based on these two operations, the authors also define a set of other custom operations, such as activation functions and batch normalization layers, by restricting the corresponding standard operations to the set of nonzero pixels.
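The two operations can be illustrated with a toy single-channel 2D version written in plain Python. This is a sketch of the idea only, not the SparseConvNet implementation: `sc_output_size` and `ssc_2d` are hypothetical helpers, the feature map is a dictionary of active sites, and the filter width is assumed to be odd.

```python
def sc_output_size(l, f, s):
    """Output size of a sparse convolution with input size l, filter size f, stride s."""
    return (l - f + s) // s

def ssc_2d(active, weights):
    """Toy submanifold sparse convolution (single channel, stride 1, odd filter width).

    active  : dict {(row, col): value} holding only the active sites
    weights : f x f nested list
    The output is computed only at sites that are active in the input,
    so the set of active sites is unchanged.
    """
    f = len(weights)
    r = f // 2
    out = {}
    for (i, j) in active:
        acc = 0.0
        for di in range(-r, r + 1):
            for dj in range(-r, r + 1):
                v = active.get((i + di, j + dj))
                if v is not None:          # inactive neighbours contribute zero
                    acc += weights[di + r][dj + r] * v
        out[(i, j)] = acc
    return out

ones = [[1.0] * 3 for _ in range(3)]
feature_map = {(0, 0): 1.0, (0, 1): 2.0, (5, 5): 3.0}
result = ssc_2d(feature_map, ones)
print(result)
print(sc_output_size(192, 2, 2))
```

With a 3x3 filter of ones, the two neighbouring sites each sum to 3.0 and the isolated site keeps its value. No new sites are created, in contrast to the dense case, where each layer enlarges the non-zero region.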

IV Experiments

We perform two sets of experiments. First, we compare the performance between dense and sparse U-ResNet using several evaluation metrics for 2D and 3D samples. Secondly we study the variation of the performance of sparse U-ResNet with key architecture hyper-parameters and different image sizes.

IV.1 Evaluation Metrics

The network is trained by minimizing a loss which is a softmax cross-entropy loss averaged over all the pixels of an image. We define different metrics of interest:

  • Non-zero accuracy: fraction of non-zero pixels whose label is correctly predicted.

  • Class-wise non-zero accuracy: for each event and for each class, fraction of non-zero pixels in that class that are correctly predicted.

  • Resources usage during the training and testing time:

    • GPU memory occupied

    • Computation wall time
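The two accuracy metrics above can be written down in a few lines. The sketch below works on flattened per-pixel lists for a single event; the function names, the integer class encoding and the toy values are assumptions made for illustration.

```python
def nonzero_accuracy(energy, labels, preds):
    """Fraction of non-zero pixels whose label is correctly predicted."""
    idx = [k for k, e in enumerate(energy) if e != 0]
    if not idx:
        return None  # undefined for an empty event
    return sum(labels[k] == preds[k] for k in idx) / len(idx)

def classwise_nonzero_accuracy(energy, labels, preds, n_classes=5):
    """Per-class fraction of non-zero pixels of that class correctly predicted."""
    acc = {}
    for c in range(n_classes):
        idx = [k for k, e in enumerate(energy) if e != 0 and labels[k] == c]
        acc[c] = sum(preds[k] == c for k in idx) / len(idx) if idx else None
    return acc

# Toy event with six pixels, two of which carry no energy.
energy = [0.0, 1.2, 0.5, 0.0, 2.0, 0.3]
labels = [0, 1, 1, 0, 2, 4]
preds  = [3, 1, 2, 0, 2, 4]

overall = nonzero_accuracy(energy, labels, preds)
per_class = classwise_nonzero_accuracy(energy, labels, preds)
print(overall, per_class)
```

Zero-valued pixels are excluded from both metrics, so the wrong prediction on the first (empty) pixel does not count against the network.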

IV.2 Implementation and Training Details

All networks were implemented using the PyTorch Paszke et al. (2017) (version 1.0) deep learning framework. SSCN relies on the SparseConvNet library. We use LArCV2 to interface with the LArTPC data files. To train the networks we used the Adam optimizer Kingma and Ba (2014) with the default learning rate of 0.001. We trained the networks for 30k iterations in 3D and 40k iterations in 2D. We used NVIDIA V100 GPUs with 32 GB of memory. On 3D images of size 192px, approximately 10 and 212 hours were required for convergence of the sparse and dense networks respectively.

V Results

Notation: we write, for example, [2D, 512px, 5-16] to represent "2D images of size 512px, and U-ResNet of depth 5 with 16 filters".

V.1 Sparse vs dense U-ResNet

We start by comparing the performance of the dense and sparse U-ResNet using the non-zero accuracy metric as well as the computational resource usage at train and inference (or test) time.

                             Dense    Sparse
Batch size                   4        4        64       64       64
Image size                   192px    192px    192px    512px    768px
Nonzero accuracy mean        92%      94%      98%      99%      99%
Nonzero accuracy std         0.096    0.088    0.049    0.014    0.0037
NVIDIA V100 GPU
  Memory (test) [GB]         16       0.044    0.19     0.67     1.3
  Memory (train) [GB]        26x4     0.21     1.3      5.1      9.3
  Wall-time (test) [s]       3.3      0.10     0.66     2.4      4.4
  Wall-time (train) [s]      25       0.21     1.2      5.0      8.8
Intel Xeon Silver 4110 CPU
  Memory (test) [GB]         -        0.57     0.81     1.9      3.0
  Memory (train) [GB]        -        0.59     1.9      3.9      4.0
  Duration (test) [s]        -        0.25     1.7      8.0      16
  Duration (train) [s]       -        1.1      6.1      24       47
Table 2: Sparse and dense U-ResNet scalability with the 3D image spatial size. The dense U-ResNet could not fit 3D images of size 512px or 768px on a single GPU. Both sparse and dense networks here have a depth of 5 and 16 filters.

As shown in Table 2, for a fixed 3D image size of 192px and identical training parameters (notably batch size), the mean non-zero accuracy over the whole dataset is 2 percentage points higher for the sparse U-ResNet than for its dense counterpart. However, using the exact same architecture does not do justice to the real strength of SSCN: the GPU memory usage and computation time are drastically cut down using sparse convolutions, which allows us to train the sparse U-ResNet with a dramatically larger batch size and a larger 3D image size. Harnessing both of these advantages allows one to beat the baseline dense 3D U-ResNet by a large margin in non-zero accuracy.
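The reduction factors quoted for 3D inference follow directly from the batch-size-4, 192px columns of Table 2. The short calculation below reproduces them; the dictionary keys are illustrative names, and the values are copied from the table.

```python
# Values copied from Table 2 (NVIDIA V100 GPU, batch size 4, 192px, test time).
dense = {"memory_gb": 16, "wall_time_s": 3.3}
sparse = {"memory_gb": 0.044, "wall_time_s": 0.10}

memory_factor = dense["memory_gb"] / sparse["memory_gb"]
time_factor = dense["wall_time_s"] / sparse["wall_time_s"]

print(round(memory_factor), round(time_factor))
```

Rounded to the nearest integer, these ratios give the factors of 364 (memory) and 33 (wall-time).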

Figures 3 and 4 show the variation of memory and computation wall-time for the sparse U-ResNet in [2D, 512px, 5-16]. The latter grows linearly but slowly as a function of batch size, which makes larger batch sizes practical not only for training but also for inference. In particular, the sparse U-ResNet can easily process a whole MicroBooNE event with a conventional GPU (memory of 4 to 11 GB). The resource usage scales well enough with the batch size to handle the ICARUS detector, which is about 6 times larger than MicroBooNE. At batch size 88, which is the maximum possible for a single NVIDIA V100 GPU with the dense version, the reduction factors for memory and computation wall-time with the sparse U-ResNet are 93 and 3.1 respectively. Further, because the computational cost scales with the nonzero pixel count instead of the total pixel count in the bounded volume, the sparse U-ResNet will be an ideal solution for the DUNE far detector, which will be sparser in the absence of cosmic rays. These benefits also apply to the training phase of an algorithm. Figure 5 shows how using the sparse U-ResNet speeds up the training by several orders of magnitude. This is crucial for reconstruction algorithm R&D work, which often requires a short turn-around time for development.

Figure 3: GPU memory usage as a function of batch size at inference time [2D, 512px, 5-16].
Figure 4: Computation wall-time as a function of batch size at inference time [2D, 512px, 5-16].
Figure 5: Nonzero accuracy as a function of wall-time during the training [3D, 192px, 5-16]. The sparse U-ResNet uses a batch size of 64, dense U-ResNet uses a batch size of 4.
Figure 6: Standard deviation of the mean softmax value of pixels predicted as shower pixels in an image, as a function of the training iteration step. The sparse U-ResNet appears to learn in a more uniform manner across the pixels [2D, 512px, 5-16].
Figure 7: Top: energy depositions in the image, the pixel color corresponds to an energy scale. Bottom: labels, each color corresponds to a different class. Electromagnetic shower pixels are colored in purple and MIP pixels are in green.
Figure 8: Dense U-ResNet evolution of softmax value for EM shower (top) and MIP (bottom) across training iterations.
Figure 9: Sparse U-ResNet evolution of softmax value for EM shower (top) and MIP (bottom) across training iterations.

Finally, the evolution of the softmax scores for different classes across iterations indicates that the sparse U-ResNet may be learning more uniformly over pixels than its dense equivalent. Figure 6 shows how the standard deviation of the mean softmax value in the image evolves with the training iterations. The dense network shows a much higher variance early in training; the two variances converge after about 1000 iterations. This observation is illustrated in Figure 7, in which a MIP trajectory crosses an EM shower. Figures 8 and 9 compare how the softmax scores for track and shower particles change over training iterations for the dense and sparse U-ResNet respectively. The difference appears most strikingly at iteration 40, where the dense network is extremely confident in some pixels (yellow ones) and still very unsure about others (in dark blue), while the sparse one is increasing its confidence level much more uniformly across all pixels.

V.2 Sparse U-ResNet Performance Variation

Figure 10: Memory usage of sparse U-ResNet with depth 6, 3D 512px images and a varying number of initial filters.
Figure 11: Computation wall-time of sparse U-ResNet with depth 6, 3D 512px images and a varying number of initial filters.

We study the influence of the two main parameters of the network architecture on performance and resource usage: the depth (number of downsampling operations) and the number of filters in the first layer. Table 3 shows the non-zero accuracy, and Figures 10 and 11 show the computational resource usage. The filter count has the larger effect on accuracy, while it also causes a linear increase in memory usage. The increase in computation wall-time between 8 and 32 filters is comparatively small.

Filters     8         16        32
Depth 6     98.94%    99.16%    99.23%
Depth 5     98.86%    99.07%    99.06%
Depth 4     98.74%    99.00%    99.07%
Table 3: Comparison of the non-zero accuracy at inference time on the test set of 3D 512px images for sparse U-ResNet, for different depths and initial number of filters.

Test image     192px                       768px
Train image    192px    512px    768px     192px    512px    768px
HIP            96.0%    95.6%    93.7%     98.8%    99.0%    98.9%
MIP            96.2%    96.6%    95.4%     99.4%    99.7%    99.6%
EM shower      97.6%    96.9%    96.6%     99.5%    99.6%    99.7%
Delta rays     74.3%    76.7%    75.1%     85.9%    89.6%    90.1%
Michel         36.5%    42.6%    43.9%     62.6%    70.0%    70.4%
Overall        98.0%    98.1%    97.7%     98.9%    99.2%    99.3%
Table 4: Class-wise non-zero accuracy. Comparing the performance of sparse U-ResNet with different 3D image sizes at train and test time. The batch size, the depth, and the initial number of filters are 64, 5, and 16 respectively.

Table 4 shows the result of comparing network class-wise non-zero accuracies for varying 3D image sizes at the train and test times. For a given image size at train time, using a larger image size at test time systematically improves the performance. This table also shows that the class-wise accuracy of delta rays and Michel electrons is lower than other classes across all image sizes.

V.3 Mistakes Analysis

One may suspect that poor performance is partially due to particle trajectories being cut off at the boundaries of the recorded volume. We looked at the distribution of the fraction of misclassified pixels as a function of their distance to the boundaries, which is defined as

$$d = \min_{i} \, \min(x_i, \; N - 1 - x_i)$$

where $x_i$ is the zero-based coordinate of the pixel along axis $i$, $N$ is the image side length in pixels, and $i$ runs from 1 to 3 for 3D data. In other words, the distance of a pixel to the image boundaries is the distance from the pixel to the closest face of the cubic image.
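The boundary distance can be computed with a one-line function. This sketch assumes zero-based integer pixel coordinates, so that the far face along each axis sits at index `size - 1`; `boundary_distance` is a hypothetical helper name.

```python
def boundary_distance(coords, size):
    """Distance (in pixels) from a pixel to the closest face of a cubic image.

    coords : zero-based pixel coordinates, one per axis
    size   : image side length in pixels
    """
    return min(min(x, size - 1 - x) for x in coords)

print(boundary_distance((0, 100, 100), 192))    # pixel on a face
print(boundary_distance((95, 96, 10), 192))     # closest face is along the third axis
print(boundary_distance((190, 96, 96), 192))    # one pixel away from the far face
```

Binning the misclassified pixels by this quantity gives the kind of distribution discussed in the text.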

Figure 12 shows that, in general, pixels are more likely to be misclassified near the image boundaries, as expected. However, this trend is not clearly visible for Michel electrons and delta rays, so it does not explain the poor performance on these two classes. We investigated possible explanations beyond the originally planned experiments, and report our findings in the following sections.

Figure 12: Fraction of misclassified pixels as a function of the pixel distance to the image boundaries [3D, 192px, 5-16].

Figure 13 shows some 3D images of size 512px, which are examples of typical mistakes made by the sparse U-ResNet. This includes: Michel electrons mistaken for an electromagnetic shower and vice-versa, HIP mistaken for a MIP and the track-like beginning of a short EM shower mistaken for a MIP.

Figure 13: Typical mistakes of sparse U-ResNet [3D, 512px, 5-16]. Images are selected among the worst 0.05% with respect to the non-zero accuracy metric. Mistakes are circled in red. Left column: data. Middle column: labels. Right column: predictions of the network. First row: an electromagnetic shower is mistaken for a Michel electron. Second row: a Michel electron is mistaken for an electromagnetic shower. Third row: a part of a HIP is mistaken for a MIP. Fourth row: a short shower is mistaken for a (MIP) track.

VI Michel Electrons and Delta Rays

We propose two hypothetical explanations for the low prediction accuracies of U-ResNet on Michel and delta ray pixels. The first is the statistical imbalance in the fraction of pixels of each class: delta rays and Michel electrons each represent about 1% of the total pixels. The second is an ambiguous definition of these two classes: both Michel electrons and delta rays can emit gamma rays (e.g. Bremsstrahlung radiation), which appear to be indistinguishable from the EM shower class. During training, we employed a softmax loss for classifying pixels under the assumption of exclusive class definitions, which may not hold for these classes.

We implemented two changes in order to test these hypotheses. The first is a modification to the pixel labels used in our supervised training. For Michel electrons and delta rays, pixels are re-labeled as EM shower except for those that belong to a primary ionization trajectory, which carries distinctive features. Secondly, we experimented with a pixel-wise loss weighting factor to accommodate the statistical imbalance across the five classes. This allows U-ResNet to focus more on pixels with low statistics, inspired by attention mechanisms.
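A pixel-wise weighted softmax cross-entropy can be sketched as follows. The weighting rule used here, inverse class frequency taken from the fractions in Table 1, is an assumption made for illustration; the text does not specify the exact weighting scheme, and the function names are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_cross_entropy(logits_per_pixel, labels, class_weights):
    """Weighted mean over pixels of the softmax cross-entropy."""
    num, den = 0.0, 0.0
    for logits, y in zip(logits_per_pixel, labels):
        w = class_weights[y]
        num += -w * math.log(softmax(logits)[y])
        den += w
    return num / den

# Class order: HIP, MIP, shower, delta rays, Michel (fractions from Table 1).
fractions = [0.17, 0.34, 0.47, 0.01, 0.01]
inverse_freq = [1.0 / f for f in fractions]   # assumed weighting rule
uniform = [1.0] * 5

# Pixel A: confidently correct HIP. Pixel B: undecided on a Michel pixel.
logits = [[5.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0]]
labels = [0, 4]

plain = weighted_cross_entropy(logits, labels, uniform)
weighted = weighted_cross_entropy(logits, labels, inverse_freq)
print(plain, weighted)
```

With inverse-frequency weights the undecided Michel pixel dominates the loss, so the gradient pushes harder on the rare class than it would under a plain average.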

Train data    Regular    Regular      Relabeled    Relabeled+Weights
Test data     Regular    Relabeled    Relabeled    Relabeled
HIP           98.0%      98.1%        98.1%        99.3%
MIP           99.4%      99.2%        99.4%        98.1%
Shower        99.4%      97.9%        99.2%        99.2%
Delta rays    85.7%      94.8%        96.0%        97.2%
Michel        56.6%      94.4%        94.7%        95.7%
Overall       99.2%      99.2%        99.6%        99.1%
Table 5: A comparison of class-wise nonzero accuracies between 3 flavors of sparse U-ResNet: regular, trained with the relabeled dataset, and trained with both the relabeled dataset and the weighting scheme [3D, 512px, 5-32]. We also compare the performance of the regular sparse U-ResNet on a relabeled test dataset.

We train a sparse U-ResNet using the re-labeled dataset and optionally the pixel-wise loss weighting scheme. The results are presented in Table 5. First, when evaluated on the relabeled test set, the non-zero accuracy on Michel electrons rises from 56.6% to more than 94%, regardless of whether the U-ResNet was trained on the regular or the relabeled dataset. This implies that the algorithm did learn the distinctive features of the primary ionization trajectories of Michel electrons and delta rays even without relabeling. Secondly, we see a slight improvement for delta rays and EM shower pixels by training with the re-labeled dataset. Finally, pixel-wise loss weighting further improved the accuracy of both Michel and delta ray classes as expected.

VII Michel Electron Reconstruction

Finally we present a study on reconstructing the Michel electron energy spectrum using the public simulation sample. The Michel electron is one of the well-understood physics signals, and is thus useful for detector calibration. This analysis has been done with real data by LArTPC experiments including MicroBooNE and ICARUS Acciarri et al. (2017c); Amoruso et al. (2004). Our contribution is to show the first ML-based approach with quantification of both the efficiency and purity of the reconstructed signal.

VII.1 Reconstruction Method

Our goal is to quantify the efficiency and purity of clustering Michel electron energy depositions using only the primary ionization component of the trajectory. We use the 3D 512px images from the re-labeled sample presented in the previous section. After running the U-ResNet for semantic segmentation, we isolate the pixels that belong to each of the five classes. We run a common density-based spatial clustering algorithm, DBSCAN Ester et al. (1996), to identify the individual predicted Michel electron clusters and MIP clusters. We then select the candidate Michel electron clusters that are attached to the edge of a predicted MIP cluster. Here "attached" is defined as a distance of less than 1px between the nearest pixels of a Michel electron cluster and a MIP cluster. The "edginess" of a given pixel is evaluated by masking the surrounding pixels within a radius of 15px and checking that the DBSCAN algorithm finds only one cluster when run over the remaining MIP cluster pixels.
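The selection logic can be sketched in plain Python on a toy geometry. Several choices below are assumptions made for illustration rather than the paper's settings: the minimal DBSCAN is a stand-in for a library implementation, `eps=1.5` and `min_samples=1` are arbitrary, the attachment test uses `<= max_gap` so that face-adjacent pixels on an integer grid count as attached, and all function names are hypothetical.

```python
import math

def dbscan(points, eps, min_samples):
    """Minimal DBSCAN; returns one cluster id per point (-1 for noise)."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) < min_samples:
            labels[i] = -1               # noise for now, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbours(j)
            if len(nb_j) >= min_samples: # core point: keep expanding
                queue.extend(nb_j)
    return labels

def n_clusters(points, eps=1.5, min_samples=1):
    return len({c for c in dbscan(points, eps, min_samples) if c != -1})

def closest_pair(michel, mip):
    """Return (distance, mip_pixel) for the closest Michel/MIP pixel pair."""
    return min((math.dist(p, q), q) for p in michel for q in mip)

def is_edge(pixel, mip, radius=15.0):
    """Mask MIP pixels within `radius` of `pixel`; an edge leaves exactly one cluster."""
    remaining = [q for q in mip if math.dist(q, pixel) > radius]
    return n_clusters(remaining) == 1

def select_michel(michel, mip, max_gap=1.0):
    gap, touch = closest_pair(michel, mip)
    return gap <= max_gap and is_edge(touch, mip)

# Toy geometry: a straight 40-pixel MIP track along x.
mip = [(x, 0, 0) for x in range(40)]
michel_at_end = [(40, y, 0) for y in range(5)]      # attached to the track end
michel_at_middle = [(20, y, 0) for y in range(1, 6)]  # attached mid-track

print(select_michel(michel_at_end, mip), select_michel(michel_at_middle, mip))
```

Masking around the track end leaves a single MIP fragment, so the first candidate is kept. Masking around the middle splits the track into two fragments, so the second candidate is rejected. In this sketch a MIP cluster shorter than the masking radius leaves no pixels at all and is therefore also rejected, which is a design choice of the example.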

VII.2 Performance Metrics

After identifying candidate Michel electron clusters, we match each of them to a true Michel cluster by maximizing the overlap pixel count between the true and predicted Michel clusters. We can then define several performance metrics. Let $N_{pred}$ be the total number of pixels in the predicted Michel electron cluster, $N_{true}$ the total number of pixels in the matched true Michel electron cluster, and $N_{\cap}$ the number of pixels that belong to the intersection of the candidate and matched Michel electron clusters. We then define the clustering efficiency and purity as $N_{\cap} / N_{true}$ and $N_{\cap} / N_{pred}$ respectively.

Similarly, if $T$ is the total number of true Michel electron clusters in the sample, $C$ is the total number of candidate Michel electron clusters, and $M$ is the number of matched candidate Michel electron clusters over the whole sample, then we define the ID efficiency and purity as $M / T$ and $M / C$ respectively.
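These definitions translate directly into set operations on pixel coordinates. In the sketch below clusters are Python sets of coordinate tuples; the function names and the toy clusters are illustrative assumptions.

```python
def match_cluster(candidate, true_clusters):
    """Index of the true cluster sharing the most pixels with `candidate`
    (None if there is no overlap), together with the overlap count."""
    best, best_overlap = None, 0
    for idx, true in enumerate(true_clusters):
        overlap = len(candidate & true)
        if overlap > best_overlap:
            best, best_overlap = idx, overlap
    return best, best_overlap

def cluster_metrics(candidate, true):
    """Return (clustering efficiency, clustering purity) for a matched pair."""
    overlap = len(candidate & true)
    return overlap / len(true), overlap / len(candidate)

def id_metrics(n_true, n_candidates, n_matched):
    """Return (ID efficiency, ID purity) over the whole sample."""
    return n_matched / n_true, n_matched / n_candidates

candidate = {(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)}
true_clusters = [
    {(1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 0, 0), (5, 0, 0)},
    {(9, 9, 9)},
]

idx, overlap = match_cluster(candidate, true_clusters)
efficiency, purity = cluster_metrics(candidate, true_clusters[idx])
print(idx, overlap, efficiency, purity)
```

In this toy case the candidate recovers 3 of the 5 true pixels (efficiency 0.6) and 3 of its 4 pixels are correct (purity 0.75).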

VII.3 Results

Figure 14 shows the pixel count of matched true Michel electron clusters against the pixel count of reconstructed Michel electron clusters. As expected, most of the clusters lie on the diagonal. The majority of strayed clusters are present below the diagonal and are under-clustered.

Figure 14: A comparison of pixel counts between the true and candidate Michel electron clusters.

Table 6 shows the evaluation metrics with and without an analysis quality cut, which requires reconstructed Michel electron clusters to contain 10 or more pixels. We find that 91.8% of the reconstructed Michel electrons have both cluster efficiency and purity above 95%. The MicroBooNE collaboration has published a Michel electron reconstruction study with 2% ID efficiency and 80-90% ID purity, where the focus of the analysis was to maximize the purity of the sample for accurate energy reconstruction Acciarri et al. (2017c). The outcome of this study with the public simulation sample cannot be directly compared with results obtained on real detector data, because the public simulation sample lacks complicated detector effects. However, the results demonstrate the promise of an ML-based reconstruction approach.

Cut                   None      10
Sample size           7105      7068
ID purity             98.8 %    99.2 %
ID efficiency         93.9 %    93.4 %
Cluster efficiency    96.1 %    96.7 %
Cluster purity        97.3 %    97.8 %
Table 6: ID purity and efficiency as well as cluster purity and efficiency of reconstructed Michel electrons. The sample size is the number of true positives. The cluster efficiency and purity are averaged over all reconstructed Michel electron clusters.

Finally, using the matched candidate Michel electrons, we compare the reconstructed and true energy distribution in the primary ionization component. Figure 15 shows a reasonable agreement. In order to reconstruct the total true Michel electron energy, the reconstruction step needs to account for EM shower pixels resulting from Bremsstrahlung radiation as described in the MicroBooNE publication Acciarri et al. (2017c). This is out of the scope of this paper.

Figure 15: Energy spectrum of Michel electrons. The true Michel electron energy is the total energy of the Michel electron. The primary ionization energy is the energy of Michel after relabeling. The candidate Michel energy is the sum of pixel values predicted as Michel electron by the U-ResNet [3D, 512px, 5-32].

VIII Conclusions

In this paper, we demonstrated the strong performance of SSCN against our baseline dense CNN for LArTPC data reconstruction, specifically for the task of semantic segmentation to identify five particle classes at the pixel level. We employed U-ResNet, an architecture pioneered by the MicroBooNE collaboration, and showed that the implementation using SSCN drastically reduces the computational resource usage. For U-ResNet under the same conditions of batch size 4 with 192px 3D images, SSCN reduces the computational cost in memory and wall-time at inference by factors of 364 and 33 respectively, as shown in Table 2. For 2D samples, using batch size 88, those reduction factors are 93 and 3.1 respectively. While a naive application of a standard CNN to 3D data (e.g. the DUNE near detector) comes with prohibitive and extremely inefficient computational resource usage, we demonstrated that SSCN can mitigate such costs and generalize U-ResNet to 3D data samples without loss in the algorithm performance.

We presented the first demonstration of reconstructing Michel electron clusters, defined as the primary ionization component of a trajectory, using a primarily ML-based method. Our result using the public simulation sample shows a naive approach with DBSCAN on U-ResNet output can yield 93.9% Michel electron identification efficiency with 98.8% true positive rate. Pixel clustering efficiency for reconstructed Michel electrons is found to be 96.1% with the purity of 97.3%. In particular, 91.8% of reconstructed Michel electrons are found to carry both the efficiency and purity of clusters above 95%.

SSCN is a solution to address scalable CNN applications for LArTPC data, which is generically sparse but locally dense. Furthermore, SSCN is a generic alternative to dense CNN, and can be applied to tasks beyond semantic segmentation including image classification, object detection and more. We strongly recommend SSCN for any CNN applications that exist for LArTPC experiments including MicroBooNE, ICARUS, SBND, and DUNE.

IX Acknowledgement

This work is supported by the U.S. Department of Energy, Office of Science, Office of High Energy Physics, and Early Career Research Program under Contract DE-AC02-76SF00515.