Deep Ensemble Analysis for Imaging X-ray Polarimetry

07/08/2020 ∙ by A. L. Peirson, et al. ∙ Stanford University 0

We present a method for enhancing the sensitivity of X-ray telescopic observations with imaging polarimeters, with a focus on the gas pixel detectors (GPDs) to be flown on the Imaging X-ray Polarimetry Explorer (IXPE). Our analysis measures photoelectron directions, X-ray absorption points and X-ray energies for 2-8keV event tracks, with estimates for both the statistical and systematic (reconstruction) uncertainties. We use a weighted maximum likelihood combination of predictions from a deep ensemble of ResNet convolutional neural networks, trained on Monte Carlo event simulations. We define a figure of merit to compare the polarization bias-variance trade-off in track reconstruction algorithms. For power-law source spectra, our method improves on current state-of-the-art (and previous deep learning approaches), providing  45 increase in effective exposure times. For individual energies, our method produces 20-30 polarized events, while keeping residual systematic modulation within 1 sigma of the finite sample minimum. Absorption point location and photon energy estimates are also significantly improved. We have validated our method with sample data from real GPD detectors.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 3

page 5

page 8

page 9

page 10

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

X-ray polarization measurements offer rich opportunities to probe the magnetic field topology and emission physics of high energy astrophysical sources [krawczynski_using_2019]. However, in the classical soft X-ray band (1-10 keV) such measurements have long been elusive, with only the Crab nebula providing a solid detection [weisskopf_measurement_1976]. Happily, the recent development of photo-electron tracking detectors [costa_efficient_2001] has greatly improved soft X-ray polarimetry prospects. These imaging X-ray polarimeters offer lower background and better control of systematic signals, and allow the study of important extended sources, such as Supernova Remnants (SNR) and Pulsar Wind Nebulae (PWNe). The gas pixel detector (GPD) [see bellazzini_sealed_2007] has brought this capability to the PolarLight CubeSat test [feng_x-ray_2020], the scheduled IXPE mission [sgro_imaging_2019], and the potential Chinese mission, eXTP [zhang_extp_2017].

IXPE [weisskopf_overview_2018, odell_imaging_2018, planned for launch in 2021] will use three co-aligned X-ray telescopes, whose focal planes are imaged by GPDs with hexagonal pixels. IXPE’s sensitivity is limited by the track analysis algorithm used to recover source polarization, spatial structure and energy, given a measured set of electron track images. In this work, we demonstrate a substantial improvement over the current state of the art track reconstruction. While the results shown here are specific to IXPE’s GPDs, the methods are general, and can be applied to other imaging detector geometries.

In the keV range, the cross-section for photoelectron emission is proportional to cos, where

is the angle between the normal incidence X-ray’s electric vector position angle (EVPA) and the azimuthal emission direction of the photoelectron. By measuring a large number of individual photoelectron emission angles, one can recover the above distribution to extract the source polarization parameters: polarization fraction (

, or equivalently ) and EVPA (). In practice, the recovery of photoelectron emission angles from track images is imperfect. Track images are noisy due to Coulomb scattering and diffusion, and, especially for low energies, are often barely resolved from the Bragg peak emission at their ends. In many cases a secondary Auger electron track further complicates the analysis. The modulation factor , defined as the recovered for a 100% polarized source, provides a useful description of the quality of the image reconstruction. For a measured source the true polarization is then given by calibrating: . Track reconstruction challenges make highly energy dependent [muleri_spectral_2010].

Critically, track reconstruction methods must not introduce significant bias for an unpolarized source (). For IXPE the hexagonal GPD pixels and a rotation between the telescopes are designed to minimize such x-y systematic biases. Thus the efficacy of an X-ray polarimeter depends on the recovered at both and polarization. For imaging X-ray polarimeters one also wishes to reconstruct the X-ray absorption (conversion) points and event energy.

Figure 1: Example IXPE GPD electron track images. The colormap represents charge deposited in each pixel. The top row is simulated (5.9 keV, 2.7 keV), the bottom shows real events of similar energy. For the simulated events, the blue dots and lines mark the true absorption point and initial direction. Lower energy tracks are typically smaller and less elliptical, thus harder to reconstruct.

The current track reconstruction method for the GPD is a moment analysis described by

bellazzini_novel_2003. Impressive accuracies for the absorption point and EVPA angle are achieved from a simple re-weighted combination of track moments, with the track barycenter replacing the moment localization at low energies. Track energy estimates are proportional to the total collected GPD charge. The track ellipticity also provides a rough proxy for track reconstruction quality. High ellipticity tracks typically have more accurate angle estimates. However, simple moments cannot capture all of the image information, especially for long high energy tracks, and so a more sophisticated image analysis scheme should allow improved track parameters, as well as better assessment of reconstruction quality. Recently, machine learning techniques have been discussed as useful for X-ray polarization measurements [moriakov_inferring_2020]; this is an ideal problem for such image analysis.

1.1 Deep learning for X-ray polarimetry

Deep neural networks (NNs) have achieved state-of-the-art performance on a wide variety of machine learning tasks and are becoming increasingly popular in domains such as speech recognition [graves_connectionist_2006]

, natural language processing

[young_recent_2018], bioinformatics [tang_recent_2019]

and especially computer vision

[krizhevsky_imagenet_2012]

. Going from track images to numerical estimates can be classified as a computer vision problem, so it is not surprising that NNs would be well suited to track reconstruction. The Cherenkov Telescope Array (CTA)

[brill_investigating_2019] team have applied closely related deep learning methods to differentiate between cosmic rays and gamma rays. Notably they also have to deal with a hexagonal pixel grid. The IceCube collaboration has begun use of graph neural networks to identify 3D neutrino tracks with great success [choma_graph_2018].

kitaguchi_convolutional_2019 have recently described a NN photoelectron track analysis for the non-imaging detector geometry that was intended for the PRAXyS X-ray polarimetry mission [tamagawa_x-ray_2017]. Using convolutional neural networks (CNNs), they show significant improvements in modulation factor over a standard moment analysis for a square pixel grid polarimeter while maintaining modulation for unpolarized data. While an excellent start, this analysis did not recover track absorption points or energies, showed unexplained biases at the level, used event cuts rather than weights and provide only a binned polarization analysis. They also trained for only a handful of event energies and were not able to validate against real detector data. We have been able to improve on this analysis addressing all of the issues above while delivering superior performance across a wide (and continuous) range of energies.

The cornerstone of our approach involves deep ensembles [lakshminarayanan_simple_2017]. These not only provide more accurate and less biased estimates than single NNs, but also give state-of-the-art estimates of predictive uncertainty. Using uncertainties in each of our track angle estimates, we developed an unbinned weighted maximum likelihood estimate (WMLE) approach to determine the final polarization parameters. This removes the need for excising data, making use of all measurements. With bootstrap analysis [efron_introduction_1994] we are able to infer the final error on our polarization estimates and define an appropriate figure-of-merit (FoM) to compare different track reconstruction approaches.

This paper describes our track reconstruction algorithm using deep ensembles. Section 2 explains the extraction of angles, absorption points and energies from individual tracks. In section 3 we describe our WMLE approach that takes an ensemble of track angles and uncertainties to final polarization parameters and their errors. There we define a figure-of-merit to compare the moment analysis, our approach and that of kitaguchi_convolutional_2019. Section 4 briefly outlines our NN training procedure and selection. Section 5 shows our results and their interpretation. We conclude the study, mentioning prospects for additional improvements, in section 6.

2 Deep ensembles for track reconstruction

We considered end-to-end deep learning approaches (as suggested by kitaguchi_convolutional_2019) to go directly from a set of tracks to source polarization. There are a number of difficulties with this approach. Training would be very expensive (perhaps infeasible) and it is not obvious how to account for different track ensemble sizes. Also the observer may wish to adjust partitioning of the data set into e.g. time, energy and spatial subsets, with differing polarization. The best partition will often not be obvious before analysis starts, thus combining the properties of individual tracks allows a more efficient exploration of binning options. Finally detector-dependent artifacts can best be handled from individual tracks (with fine spatial positioning), rather than PSF-weighted ensembles. Forming such ensembles in a second analysis tier allows additional flexibility.

For these reasons we do not consider a direct end-to-end approach but use a two step process: (1) extract features from individual tracks (angles, uncertainties, absorption points, energies) (2) combine an ensemble of the features to measure the final polarization statistics. However we do use the expected properties of polarized and unpolarized ensembles to guide the training and select the most effective networks. We first describe step (1): the event characterization.

Figure 2: Example square conversions of a 6.4 keV hexagonal track (left panel). The six panels to the right show shifts along the 120

GPD axes; shifting odd rows (upper) or even rows (lower). For each hexagonal track, NNs are fed column-wise pairs of square conversions, along with the energy, absorption point (blue dot) and track angle (blue line) as labels.

2.1 Deep Ensembles

To extract the angles, absorption points and energies from individual tracks we use a supervised deep learning technique known as deep ensembles [lakshminarayanan_simple_2017]. Deep ensembles are made up of an ensemble of individual NNs, each trained independently on the same data set to predict the desired output features. It has been shown that different random initializations of the same NN at the start of training leads to widely different prediction functions [fort_deep_2019]. Deep ensembles exploit this property by incorporating the results of many differently initialized NNs, increasing the diversity of predictors. Considering all of the individual NN predictions together leads to a more robust, accurate, and generalizable model with better uncertainty estimates.

In our case, we have an image to feature(s) regression problem. Convolutional NNs (CNNs) have been designed with an inductive bias appropriate for image regression problems. So our deep ensemble will be made up of individual CNNs.

Deep ensembles provide estimates of the predictive uncertainty. There are two germane types of uncertainty one can model [kendall_what_2017]. ‘Aleatoric uncertainty’ captures noise inherent in the observations. This is equivalent to statistical uncertainty. On the other hand, ‘epistemic uncertainty’ accounts for uncertainty in the model parameters, uncertainty which captures our ignorance about which model generated our collected data. This uncertainty can be reduced given enough data, and is often referred to as model uncertainty or systematic uncertainty. We will model both of these uncertainties using deep ensembles and use them in our final polarization predictions in §3.

2.2 Hexagonal to square conversion

The hexagonal grid used in the IXPE GPDs is designed to minimize polarization systematics, since hexapolar grid effects are orthogonal to the quadrupolar polarization signal. Example imaged photoelectron tracks at different energies are shown in fig. 1. Unfortunately, a hexagonal grid is not natively compatible with typical CNNs. It is possible to transform from a hexagonal to a square grid, however a naive transformation can lead to polarization biases and suboptimal NN performance. This is partly because the CNN convolutional kernels are not spatially equivariant in hexagonal space.

There are two main ways of converting between hexagonal and square grids: interpolation and pixel shifting. Interpolation places a fine square grid on top of the hexagonal image and interpolates. We avoid using interpolation since it adds noise to the raw data and is not easily reversible. Pixel shifting rearranges pixels by shifting alternate rows and then rescaling. The HexagDLy

[steppa_hexagdly_2019] software, designed for use in CTA, allows vanilla CNNs to operate on hexagonal images. It does this by pixel shifting the images to square arrays and then applying specialized convolutional kernels that preserve equivariance in hexagonal space. Unfortunately, in practice, this method proved too slow to train for the large event sets required for polarization estimation. Accordingly, we simply pixel shift each track along each of the six hexagonal axes (to avoid bias). A single hexagonal track produces six square conversions (fig. 2), two for each angle. A single training example for the NNs is formed by stacking the corresponding square conversion pair. The stacked images can be treated as color channels in an rgb image CNN problem. At test time all 3 pairs are evaluated by the NNs and the predicted angles are rotated back to their original direction. Details on pixel shifting can be found in the HexagDLy documentation.

It should be noted that clean track images, such as those shown in fig. 1, already require a set of thresholding and clustering steps to isolate individual tracks from a detector (or simulation) snapshot. These are at present handled by IXPE’s GPD software. The track rectification into the six square projections is simply an additional pre-processing step that we apply in collecting a data set for supervised training.

2.3 Training criteria

In a typical CNN regression problem, during training the CNN takes as input a single image with feature label and outputs single prediction . The NN weights are optimized to minimize to the mean squared error (MSE) on the training data set . In order to model the statistical uncertainty in predictions, individual networks in a deep ensemble each minimize the negative log-likelihood:

(1)

where corresponds to the predicted mean and to the predicted variance. Thus the NN produces an estimate of the feature and its statistical error . In practice however we train the NN to predict the log variance

(2)

since this is more numerically stable [kendall_what_2017].

We estimate the track angle and its statistical error with the following loss function,

(3)

where and play the roles of the features . We parameterize the track angle as a unit 2D vector to incorporate periodicity. By adding a dipolar loss term , with hyper-parameter , we account for the EVPA ambiguity. Without this additional term the sign of the principal axis for poorly resolved tracks is ambiguous and the CNN hedges its bets by selecting orthogonal to the principal axis; low energy polarization resolution suffers. Indeed kitaguchi_convolutional_2019 include only this latter term in their loss function. However, since recovering the full directional information is important for reducing detector bias and aids in post-processing analysis, a loss function with both terms produces the best results.

Since we also want to measure the absorption point and event energy, we include the two additional terms in the loss function:

(4a)
(4b)

where are the event energies and the coordinates of the absorption point in the rectified grid. Since our primary objective is polarization, we do not include separate location and energy error parameters in the minimization. Nevertheless, the track angle error above provides some measure of track reconstruction quality and can improve the localization and energy measurements.

In sum, our total multi-component loss function is

(5)

with hyperparameters

that are tuned during training (§4). The final term is a -norm regularization on the NN parameters . This is a common regularization in machine learning to prevent overfitting. Each individual CNN is trained to minimize eq. 5. The CNN computations take individual (square) track images as input and output the feature vector

(6)

In practice, after training a set of NNs, we select a best-performing subset for inclusion in the final deep ensemble. This selection is described in §5. We can model the systematic uncertainty in our feature estimates by combining the final feature predictions from all the NNs in our ensemble. For each track, , a deep ensemble of networks produces output feature vectors (eq. 6). The distribution over the feature vectors provides a systematic uncertainty estimate.

Figure 3: Distribution of deep ensemble predicted statistical errors across the IXPE energy spectrum (left). The right-hand plot shows the distribution for two specific energies keV (red), keV (blue). Low energy tracks tend to cluster around higher predictive uncertainty. High energy tracks have a tail of poorly predicted examples. Many of these correspond to the events converting outside the gas volume (see falling black curve in fig. 4).

3 Polarization estimation

Unlike many typical deep learning problems, polarization estimation requires predicting ensemble statistics from a large number of events. We have split the problem in two, first extracting features via NNs (§2). Here we combine our deep ensemble feature predictions to produce best estimates for the polarization fraction , EVPA and their errors.

The basic problem is to estimate and from a set of measured track angles . As described in the introduction, the track angles exhibit a sinusoidal modulation with period

(7)

where , and . The intrinsic polarization fraction is then given by . While simple analyses often bin the data set and then fit to estimate , this provides an inevitable loss of information, and poor performance when the bin counts are limited. Maximum likelihood methods are preferable, being unbinned. kislat_analyzing_2015 have developed an unbinned method, working with total Stokes parameters. They define

(8a)
(8b)

where is total number of track angles . Then can be simply calculated as

(9a)
(9b)

The strength of this kind of analysis is that the errors on the Stokes parameters can be computed easily and are well behaved. This method is unbiased and faster than a full maximum likelihood fit. This is the polarization estimation method used in current IXPE analysis.

However, when estimating the polarization of spatial, spectral and temporal features, the different events can give quite varied constraining power. For example, low-energy short tracks inevitably have poorly constrained (and, given the typical astrophysical energy spectrum and the detector response, most events will have low energy!). A subset of events with higher energy or cleaner, longer tracks may be most useful. For this reason it is essential to incorporate some form of quality control in the tracks we use for our polarization estimates. Traditionally one applies track cuts, but this is a sub-optimal since one can throw away a large fraction of the data. A weighting scheme should be preferred. Fig. 3 shows the distribution of NN predicted statistical errors for track data sets spanning all energies. These are equivalent to the track reconstruction quality. Even high energy data sets show tails of poorly reconstructed tracks.

3.1 Importance weighted maximum likelihood

Our deep NN ensembles provide us with statistical and systematic error estimates for each event. Our approach is to incorporate these into our measurement of the polarization parameters. In maximum likelihood estimation, the negative log-likelihood of the polarization parameters given the measured track angles is minimized over the likelihood domain:

(10)

where eq. 7 is used as the likelihood function and we have dropped constant terms, since they do not affect the minimization. The optimal values of eq. 10, , are the final polarization estimates. An approximation of errors on these estimates (or equivalently on the Stokes fluxes from the source) can be calculated analytically using Fisher information. We make the substitution to get

(11a)
(11b)

Note both errors depend strongly on the modulation factor , with higher modulation factors leading to smaller prediction errors. There is also a dependence on the total number of measured tracks , making extensive track cuts detrimental.

We modify equation 10 to an importance-weighted maximum likelihood estimation [hu_weighted_2002] problem

(12)

where is the number of NNs in the deep ensemble and are the predicted track angles and statistical errors respectively. The th NN in the ensemble makes a prediction () for the th track. By including the full distribution of track angle predictions from our ensemble we account for systematic/model uncertainties. The predicted statistical errors are used as importance weights [karampatziakis_online_2011, hu_weighted_2002], so that low tracks are weighted more strongly. The parameter controls a simple weighting scheme: for the log-likelihood is unweighted and we recover the Stokes method (Equations 9 and 10). With high the best-measured tracks dominate the polarization estimate. This increases but leads to larger fluctuations in the estimated values, as well as increased sensitivity to reconstruction biases (e.g. when aligns with the underlying hexagonal grid, giving cleaner tracks). In §3.2 we provide a prescription for selecting . While this analysis depends on the source spectrum, for many purposes a default will suffice.

We can rewrite eq. 12 as a convex optimization problem:

(13)

where and . By recasting eq. 12 as a convex optimization problem, we have a guaranteed globally optimal solution for . We can solve eq. 13

quickly and efficiently using second order Newton methods. In practice we use the robust open source software IpOpt

[wachter_implementation_2006].

Unfortunately, with importance weights, the minimization objective is no longer a normalized likelihood and we lack a simple closed-form expressions for the error formula like eqs. 11a, 11b. We turn to parametric bootstrapping to calculate error estimates on .

3.2 Figure of merit

We require a figure-of-merit (FoM) for the polarization measurement, a scalar value representing the signal to noise ratio of a given detector and analysis scheme. The standard figure-of-merit used in X-ray polarimetry currently is the minimum detectable polarization (MDP)

[weisskopf_understanding_2010]. MDP

is the polarization fraction that has a 1% probability of being exceeded by chance for an unpolarized source. The probability

of measuring modulation and EVPA given that the true modulation and phase are and is [weisskopf_understanding_2010]

(14)

Here is the total number of measured tracks as usual. The MDP may be found by integrating eq. 14 for , resulting in

(15)

where accounts for imperfect polarization recovery; the source polarization will be . Note the similarity to eq. 11a: MDP is effectively a (inverse) ratio of recovered signal to noise . For a given detector, track reconstruction algorithms with lower MDPs are better.

MDP works well as a FoM if track angle estimates are unbiased and follow simple Gaussian statistics. Of course prediction bias and incompletely understood correlations in the NN predictions can violate these assumptions. However, with numerical simulations we can study the variance of our NN results and measure the effective , if any, from the offset and from the width. This allows us to use MDP as our ’Figure of Merit’ for comparing analysis schemes, including cuts and weights, when we replace by in Eq. 15, where we determine by fitting Eq. 14 to a bootstrap-sampled distribution of for an unpolarized source. In this fit, we maximize the likelihood , where is our set of bootstrap samples. The optimal value defines . The expected source polarization gives a measurement of the modulation bias in our track angle estimator and its direction. An ideal track angle estimator would have and (in this case our FoM is equivalent to the MDP). We do not include the estimator bias directly in the FoM because any useful estimator should either have a negligible bias or the bias, if known, should be divided out (as in kitaguchi_convolutional_2019); is simply a tool to evaluate any residual bias in the track reconstruction. Indeed, we find that the are small, consistent with the residual statistical polarization in our finite-sized event sets.

Figure 4: Modulation response for analysis of simulated GPD tracks. Top: The available signal for a 100% polarized source, after dilution by window/GEM conversion events (black line). Standard moments analysis (blue solid line, or with 20% ellipticity cuts, dashed blue line) is compared with the deep ensemble analysis with uniform weighting (red line). The cyan and red dashed lines show the improvement using and weights, respectively. Middle: Response to an unpolarized signal. Line types as above. Bottom: EVPA error for a polarized signal.

Bootstrapping works for any algorithm (when one can compute sufficient bootstrap samples), so this FoM can be used for all polarization estimation methods discussed, with or without event cuts or weights. For a given set of tracks, the weighting scheme for our importance weighted MLE (eq. 12) is chosen by evaluating the FoM for a set of s and choosing the best performer (with negligible bias). This is the FoM we use to evaluate the performance of different track reconstruction algorithms in §5. We show examples of the bootstrap fits in §5.

Figure 5: Track angle recovery for unpolarized (top row) and polarized (bottom row) simulated data for and keV. The ground truth angle distribution is shown in black; standard moment analysis in blue and unweighted NNs in red.

4 NN training and selection

Here we describe the training procedure for individual NNs (§4.1) in the ensemble and the process of NN selection for ensemble membership.

4.1 Data Sets

Our training and validation data sets consist of simulated tracks (e.g. fig. 1) generated via a Monte Carlo geant4 [agostinelli_geant4simulation_2003] simulation, part of IXPE’s GPD software suite [bellazzini_novel_2003]. The track energies uniformly span keV, IXPE’s most sensitive range and are unpolarized (uniform track angle distribution). We simulate for the expected IXPE gas pressure of 680 mbar. Each track is labelled with its 2D track angle vector , absorption point coordinates (on the square grid) and its energy . This gives a final feature vector . We find training on a power law distribution of energies improves performance: the majority of tracks IXPE will detect are at low energy and will be difficult to measure. We simulate million tracks with energy distributions and . We split these 3 million tracks into a training set and a validation set, where the validation set makes up of the total. Additionally, we have simulated track test sets. These are used in the net selection process ( tracks, at a range of energies). Finally, we have access to actual GPD calibration data at keV polarized (100%) and keV unpolarized. Results on both simulated and real track data are presented in §5.

Figure 6: Bootstrapped modulation distributions for a PL2 test data set (1.07 million simulated tracks with 2000 bootstrap samples). The residual modulation from the original simulated polarization set are the black histogram, NN are in red and moment analysis in blue. The left panel (a) show the uniformly weighted NNs and standard moment analysis; the right panel (b) shows weighted NNs and moments with standard cuts. The solid lines show the fits of Eq. 14 to these distributions (the fitting also includes the bootstrapping distribution on , not shown here). From these fits we get an estimate of (where ) for each method to use in the FoM (Eq. 14). The FoM results incorporating these fits are in Table 1.

4.2 Training

We use a ResNet-19 [he_deep_2015] convolutional NN architecture as our base NN. ResNets and their variants (like DenseNet [huang_densely_2018]) are the current state-of-the-art in image classification. They contain ’skip’-connections in between their layers that lead to faster and more robust training. This particular architecture is large enough to over-fit the training set, and trains in a reasonable amount of time (

GPU hours). Before training we center the square pixel arrays and rescale, subtracting the pixel-wise mean from each track image and dividing by the pixel-wise standard deviation (where the mean and standard deviation are calculated over the full training set). The track energy labels are similarly processed. Centering and re-scaling the training data helps prevent vanishing and exploding gradients during the NN training procedure and lead to faster convergence. We use stochastic gradient descent with momentum as our optimizing algorithm, typical in computer vision tasks

[sutskever_importance_2013], with a stepped decaying learning rate starting at 0.01. We choose batch sizes of between tracks. The training procedure seeks to minimize the loss function given by eq. 5 over the NN parameters. We tune the hyperparameters to minimize the MSE on the predicted track angle for the validation set, while retaining energy and absorption point accuracy. We find, typically, , and work well for angle accuracy and minimum NN bias (important for track selection, §4.3). The values of and are small, reflecting our choice to focus on the measurement of track angles.

4.3 Selection

The training procedure above minimizes the track reconstruction errors (5), but does not minimize the polarization estimation error. We use network selection to effect this minimization. To select NNs for the deep ensemble, we train individual NNs in the manner described above. We use the entire training data set to train each NN since deep NNs typically perform better with more data. After training these individual NNs, we down-select to the best performing , as measured by a quality metric. Since we save NN checkpoints throughout the training, we consider NNs at all middle-late stages of their training in our selection. This is a form of early-stopping to prevent over-fitting on the training set.

When measuring an unpolarized, finite size data set any method will have a mean square error, MSE, in the determined for the events generated with . In addition, the ensemble will have a residual total polarization , which combines finite sample size effects with reconstruction inaccuracies and ML bias, if any. A good analysis maximizes and minimizes MSE and . These quantities vary strongly with energy. Accordingly we define a NN ‘quality’ measure as

(16)

where we combine over energy bins of width comparable to the detector resolution. While each NN attempts to minimize the MSE, ranking networks by the product (16) selects the subset with the minimum polarization bias.

In principle, absorption point and energy resolution errors could be factored into the down-selection, but since polarization is our main interest (and the most difficult to train), we use the basic defined above. The results in §5 obtain from an ensemble of the 14 networks with the highest .

5 Results

All of the training, testing and validation have been performed on the simulated event track images. These have been used to train NNs and down-select to an ensemble having maximum sensitivity and minimum residual error. We have performed limited tests with real GPD data to ensure that we can obtain comparable accuracy with these images, but unsimulated physical effects in the GPD hardware and the peculiarities of individual flight GPD will require networks tuned for specific devices. Here we focus on the general performance.

Method (%) (%) MDP(%)
True 100. 0.29 0.93 1.01
Mom 26.6 0.13 -0.10 1.02 5.11
Mom w/ cut 30.2 0.10 -0.93 0.83 5.00
NN 27.2 0.17 -0.89 1.08 4.77
NN 32.6 0.17 -0.65 0.95 4.26
NN 36.2 0.20 -0.50 0.79 4.21
NN 40.4 0.25 -0.38 0.58 4.41
Table 1: Sensitivity analysis for 2-8 keV photons with a spectrum and IXPE’s energy response. ‘True’ describes the statistical residual polarization of unpolarized events, which leave insignificant and (see Eq. 14). gives the sensitivities for the various cuts and weights with 2-8 keV photons.
Method (%) (%) MDP(%)
True 99.9 0.59 0.60 0.98
Mom 28.7 0.32 -0.48 1.00 4.69
Mom w/ cut 33.6 0.19 -0.45 0.81 4.48
NN 30.5 0.34 -0.42 1.08 4.28
NN 37.5 0.36 -0.35 0.93 3.74
NN 42.0 0.37 -0.32 0.75 3.75
NN 47.6 0.39 -0.28 0.54 3.88
Table 2: Sensitivity analysis, as for Table 1, but for 2-8 keV photons with a spectrum and IXPE’s energy response.
(%) (%)
Energy [keV] 3.69 4.51 6.40 5.89
Mom 35.3 37.0 47.2 0.8
Mom w/ cut 39.7 41.6 54.2 0.8
NN 37.5 41.6 54.1 0.7
NN 42.2 48.3 62.2 0.7
Table 3: Preliminary modulation factor results with simulation trained NN applied to real detector data (190k unpolarized tracks for and 40k polarized tracks). We recover simulated data performance on real data within low track-number statistical errors, except at low energies where known, unsimulated detector artefacts are present. These can be mitigated in post-processing.

5.1 Polarization

To maximize polarization sensitivity, the appropriate metric for performance is the FoM defined in §3.2. However, it is instructive to examine the and unpolarized energy dependence across the IXPE spectral range. We show these results for simulated data in fig. 4 for our method with and without weights, and the moment analysis with and without cuts. In the upper panel, the black line at top shows the modulated signal from a simulated data set corresponding to a 100% polarized () source. The decrease toward high energy represents the increasing fraction of events which convert in the detector window or the GEM foil. These are truncated and/or highly scattered, with a fraction of the normal gas energy deposition and retain little polarization information. Thus perfect reconstruction can at best reach the black line. The blue solid line show the performance of the mission default (Moments) analysis. If one knows the event energy a priori (e.g. for a calibration source) one can cut on recovered energy to remove many of the incomplete tracks, improving the polarization response (blue dashed line). For astrophysical sources with unknown event energies, this simple effective prescription cannot be applied. More sophisticated shape cuts could in principle recover a fraction of this improvement – however one must recall that these cuts decrease the sample size and when considering the polarization sensitivity of a given data set, this substantially reduces the gains from the cuts.

The performance of our initial NN analysis (red solid line) is better than the moments analysis and at high energies, even matches moments after the ellipticity cuts. Of course, we can use our event quality metric to make a better weighted measurements (red and cyan dashed lines) and this substantially increases the modulation sensitivity, at a modest cost in the effective number of sample events.

While high polarization sensitivity is desired, it is essential that the analysis chain not induce spurious signals from unpolarized sources. The middle panel summarizes tests for this effect. The finite size of the simulated unpolarized data set guarantees a residual statistical polarization. This ‘ground truth’ is shown by the black dotted line; an ideal measurement should not induce polarization significantly in excess of this value. The colored traces follow the scheme above. We see that the NN analysis induces no significant polarization and, like the moments analysis, meets mission requirements at all energies. Here we are measuring a small residual signal so the energy bins are larger, containing test tracks.

We can compare with the PRAXyS NN simulation results of kitaguchi_convolutional_2019 for the non-imaging low pressure PRAXyS detector. While their sensitivity (at three energies) is roughly at the position of our (cyan) curve, their unpolarized signal had large systematics. In part this is because their networks had very large prediction biases (large in the middle panel) – they were forced to divide out the pattern of these biases to obtain acceptable 1% polarization levels. Our NN ensemble produces negligible bias at these energies, so we did not have to perform such a normalization, yet our residual polarization is substantially lower at all energies.

Figure 7: Photon absorption point localization. The NN analysis (red) does appreciably better that the moments analysis and keeps pace with the barycenter estimate at the lowest energies. All methods are adequate, as localization is much better than the size of the point source image produced by IXPE’s mirror resolution.

The bottom panel of Fig. 4 shows the quality of EVPA recovery for the polarized case. Again NN performance is comparable to moments in ground truth recovery. Both methods have somewhat increased error at low energy, due to low recovered and limited counts analyzed in these energy bins (eq. 11b).

Figure 8: Fractional width of the event energy estimate. The distributions are measured with the Median Absolute Deviation (MAD) statistic, which is less sensitive to the long tail from partial tracks (see Fig. 9). Values are converted as FWHM=3.46MAD, appropriate for a Gaussian peak. The NN analysis does slightly better than the standard PI count summing, especially at high energy; the sharp drop at 2keV may be an effect of our limited training energy range. The theoretical resolution limit for the GPD Fano factor is shown by the dashed line.

To help visualize the reconstruction accuracy, we show the binned modulation curves for 0% and 100% polarization for two energies in fig. 5. Recall that we do not simply fit these histograms to recover polarization parameters (as in some less sophisticated analysis). Nevertheless, these are useful to show the lack of bias in our angle predictions. Some ML polarization analyses suffer from strong prediction bias, producing imperfect symmetry and narrow peaks in the angular distribution; no such artifacts are evident in our reconstructions. We can see that the unpolarized distributions for our NN method are as flat as those for the moments, but the NNs recover significantly more modulation in the polarized data.

Fig. 6 shows the bootstrap distributions used to calculate values used in the denominators in the FoM (eq. 15).

Tables 1 and 2 show our FoM results for two power law datasets and several values of the importance weighting control parameter . These source spectrum power laws (PL1 , and PL2 ) extend across the whole IXPE energy spectrum. They are convolved with IXPE’s effective area function [weisskopf_imaging_2016], to provide event energy distributions expected for real astrophysical sources. We compute modulations for the 2-8 keV events only, with values normalized to events in this range, as achievable for a moderately bright X-ray source.

Weighted NNs outperform the uncut moment analysis by in this FoM, and by when ellipticity cuts are applied. Note that for these spectra, shallow weighting is best. Since the FoM is proportional

(for an unbiased estimator), the number of photons

required for an observation, the FoM improvement corresponds to a 1-% increase in effective exposure time. The effective area peaks at keV, greatly favouring low energy tracks in the analyzed data set. Of course this disfavors our NN analysis which tends to perform best at high energies, but is conservative and realistic for soft astrophysical sources. Sources with harder spectra (e.g. highly absorbed AGN or accreting X-ray pulsars) will show even larger NN performance boosts. Note that we also do not include the events detected outside of the  keV energy range. A small MDP improvement can be expected from inclusion of the many (poorly measured) low energy events, especially for very soft sources.

All of the results shown so far are on simulated track data. Table 3 shows our preliminary results on real track data. We are able to maintain much of the NN improvements over moments on real data.

Figure 9: Left: Recovered Energy distribution. NN estimates are somewhat tighter than the standard count summing energy estimate. They also truncate the tail due to low recovered energy partial tracks.

5.2 Absorption points

Our NN analysis also returns the photon absorption point and energy. Fig. 7 shows the performance compared with the standard moment analysis for the localization. In the standard pipeline, the localization shifts to the event barycenter (black curve) at low energies. It seems that our trained NN automatically shifts to a similar estimate at low energies as the red curve is better at all . In practice the performance of all methods is adequate for IXPE imaging; the dotted line shows one quarter of the mirror Half Power Diameter (HPD). However, the excellent NN photon localization on the detector can be very helpful in characterizing and mitigating position-dependent detector effects.

5.3 Energy

For the energy resolution, we find that the NN out-performs moments for all energies in the IXPE spectrum. Results are shown in figs. 8 and 9. While we expect that all sources observed by IXPE will have much higher resolution spectra available (e.g. from CCD data), this energy improvement can help in isolating polarization signatures to narrow spectra features, e.g. the Fe K line. Further simulation with real astronomical spectra are needed to see if such line affects are within IXPE’s reach.

6 Summary and Discussion

We have developed a state-of-the-art track reconstruction and polarization prediction framework for imaging X-ray polarimeters. We recover X-ray photon track angles, absorption points and energies using deep ensembles; predictive errors in track angle are used to inform a weighted maximum likelihood estimation of source polarization parameters. We define a new FoM for polarization recovery, using bootstrap error distributions, which allows us to compare, on equal footing, our method with the IXPE project’s current moment analysis, including cuts to the data sets and weights from the reconstruction quality measurements. We have tested our method with simulated and real detector data. On simple simulated power law spectra, our analysis implies that the increases to IXPE’s polarization sensitivity provided by this method can increase the effective exposures by as much as . Preliminary results suggest improvements on real data are similar.

Further verification with real X-ray calibration data sets is needed before this method can be applied to flight data. The networks would also need to be tuned, with calibration data, for the peculiarities of each flight detector. Of course the weighting scheme could be tuned for individual source spectra. Indeed, considering variation in Fig. 3 the optimal weighting will be an energy dependent but we feel it likely that adoption of a standard weighting scheme will help in the reproducible analysis of data from IXPE and related missions; weighting seems a good default. The method that we present here is general and may offer even larger gains for future missions with larger effective areas and higher spatial resolution from tighter mirror PSFs, such as eXTP and future proposed electron tracking polarization projects.

7 Acknowledgements

We would like to thank Bruce Tidor and Kevin Shi of MIT for early efforts related to this project, Fabio Muleri for detector calibration data and Marius Tirlea for discussions on statistical models. We would also like to thank the NASA FINESST grant 80NSSC19K1407 and NASA grant NNM17AA26C for supporting this work. The Italian contribution to IXPE is supported by the Italian Space Agency through the agreement ASI-INFN n.2017-13-H.O. Funding for this work was provided in part by contract 80MSFC17C0012 from the Marshall Space Flight Center (MSFC) to MIT in support of IXPE, a NASA Astrophysics Small Explorers mission.

References