A deep ensemble approach to X-ray polarimetry

by   A. L. Peirson, et al.
Stanford University

X-ray polarimetry will soon open a new window on the high energy universe with the launch of NASA's Imaging X-ray Polarimetry Explorer (IXPE). Polarimeters are currently limited by their track reconstruction algorithms, which typically use linear estimators and do not consider individual event quality. We present a modern deep learning method for maximizing the sensitivity of X-ray telescopic observations with imaging polarimeters, with a focus on the gas pixel detectors (GPDs) to be flown on IXPE. We use a weighted maximum likelihood combination of predictions from a deep ensemble of ResNets, trained on Monte Carlo event simulations. We derive and apply the optimal event weighting for maximizing the polarization signal-to-noise ratio (SNR) in track reconstruction algorithms. For typical power-law source spectra, our method improves on the current state of the art, providing a  40 exposure times for a given SNR.



There are no comments yet.


page 1

page 2

page 3

page 4


Deep Ensemble Analysis for Imaging X-ray Polarimetry

We present a method for enhancing the sensitivity of X-ray telescopic ob...

First optical reconstruction of dust in the region of SNR RX J1713.7-3946 from astrometric Gaia data

The origin of the radiation observed in the region of the supernova remn...

Fast Computational Ghost Imaging using Unpaired Deep Learning and a Constrained Generative Adversarial Network

The unpaired training can be the only option available for fast deep lea...

deepCR: Cosmic Ray Rejection with Deep Learning

Cosmic ray (CR) identification and removal are critical components of im...

DeepSource: Point Source Detection using Deep Learning

Point source detection at low signal-to-noise is challenging for astrono...

First Full-Event Reconstruction from Imaging Atmospheric Cherenkov Telescope Real Data with Deep Learning

The Cherenkov Telescope Array is the future of ground-based gamma-ray as...

Towards Automatic Threat Detection: A Survey of Advances of Deep Learning within X-ray Security Imaging

X-ray security screening is widely used to maintain aviation/transport s...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Measuring X-ray polarization, the degree of order in X-ray electric field oscillations, has been a major goal in astrophysics for the last 50 years. X-ray polarization measurements offer rich opportunities to probe the magnetic field topology and emission physics of high energy astrophysical sources, such as accreting black holes and astrophysical jets krawczynski_using_2019; weisskopf_overview_2018. The recent development of photoelectron tracking detectors bellazzini_novel_2003 has greatly improved the prospects of doing so. The gas pixel detector (GPD) bellazzini_sealed_2007 has brought soft X-ray polarimetry (1-10 keV) to the PolarLight CubeSat test feng_x-ray_2020, the scheduled NASA IXPE mission sgro_imaging_2019, and the potential Chinese mission, eXTP zhang_extp_2017.

Imaging X-ray polarization telescopes feed GPDs which directly image charge tracks formed from photoelectrons scattered by incoming X-ray photons. IXPE (weisskopf_overview_2018; odell_imaging_2018, planned for launch December 9th 2021) will use three co-aligned X-ray telescopes, whose focal planes are imaged by GPDs with hexagonal pixels. IXPE’s sensitivity is limited by the track analysis algorithm used to recover source polarization, spatial structure and energy, given a measured set of electron track images. In the keV range, the cross-section for photoelectron emission is proportional to cos, where

is the angle between the normal incidence X-ray’s electric vector position angle (EVPA) and the azimuthal emission direction of the photoelectron. By measuring a large number of individual photoelectron emission angles

, one can recover the above distribution to extract the source polarization parameters: polarization fraction () and electric vector position angle (EVPA, ). In practice, the recovery of photoelectron emission angles from track images is imperfect. Track images are noisy due to Coulomb scattering and diffusion, and, especially for low energies, are often barely resolved. Emission angle estimates are highly heteroskedastic.

The current track reconstruction method for GPDs is a moment analysis described by


. Impressive accuracies are achieved from a simple weighted combination of track moments. However, simple moments cannot capture all of the image information, especially for long high energy tracks, and so a more sophisticated image analysis scheme may lead to improved track angle recovery. The problem of recovering polarization parameters from a dataset of (IXPE) electron track images has recently been announced as an open problem in the machine learning community


In this work, we aim to both improve individual event estimates and explicitly model their uncertainty to increase final polarization sensitivity. We demonstrate a two step method: (1) Use a deep ensemble of ResNets he_deep_2015 to predict electron track angles and their uncertainties. (2) Combine predicted angles and their uncertainties in a weighted estimator of the polarization parameters that maximizes the SNR. Our empirical findings indicate a substantial improvement over the current state of-the-art track reconstruction bellazzini_novel_2003; kitaguchi_convolutional_2019-1. While the results shown here are specific to IXPE’s GPDs, the methods are general, and can be applied to other imaging detector geometries. Extended results, including spatial and energy resolution are discussed in peirson_deep_2021; peirson_towards_2021.

2 Step I: deep ensemble

Figure 1: Example square conversions of a 6.4 keV hexagonal track (left panel). The six panels to the right show shifts along the 120

GPD axes; shifting odd rows (upper) or even rows (lower). For each hexagonal track, the ResNet-18s are fed column-wise pairs of square conversions, along with the initial photoelectron direction (blue line) as a label.

To extract the angles from individual tracks we use a supervised deep learning technique known as deep ensembles lakshminarayanan_simple_2017

. Deep ensembles are made up of an ensemble of individual neural network models, each trained independently on the same data set to predict the desired output features. Deep ensembles provide estimates of the predictive uncertainty by exploiting different random initializations of the same model at the start of training which leads to widely different prediction functions

fort_deep_2019. In terms of performance and scalability, deep ensembles remain the current state of the art in uncertainty quantification hoffmann_deep_2021

. In our case, we have an image to feature regression problem. Convolutional neural networks have been designed with a spatial inductive bias appropriate for image regression problems. So our deep ensemble will be made up of

individual ResNet-18s he_deep_2015.

To make the hexagonal track images admissable inputs to standard ResNet architectures, we first convert the hexagonal images to square image arrays by shifting every other column and rescaling the distance between points, as described in steppa_hexagdly_2019. Since there are two possible shifts (odd and even rows), we apply both and stack the two shifted images, similar to color channels in images. We do this to more closely approximate spatial equivariance of the ResNet convolution kernels in the hexagonal space. At test time, we apply the deep ensemble to the same track 3 times, each time rotated by in hexagonal space. We find this reduces all relevant prediction bias on (and later ) introduced when converting from hexagonal to square coordinates. Fig. 1 summarizes this process. An alternative solution is to use a model with native hexagonal convolutions, such as steppa_hexagdly_2019. In practice, we found it more effective and expedient to leverage existing square convolution architectures.

For part (1) of our method, we need to predict emission angle and its associated uncertainty for each track. Instead of the Gaussian negative log-likelihood (NLL) used in lakshminarayanan_simple_2017

, we use the von Mises (VM) distribution NLL as our loss function, the maximum entropy distribution for circular data with specified expectation value. This more appropriately reflects the distribution of our estimators

, which are clearly periodic. The VM distribution is parameterized by concentration parameter ; for large

the VM converges to a Gaussian with variance

and for small

it approaches a uniform distribution. We parameterize the true angle

as a 2D vector to capture the periodicity. Only , as opposed to , is required for polarization estimation since ; see eq. 2). Each of the ResNet-18 models in our ensemble computes the loss as


where is the modified Bessel function of the first kind. Each ResNet-18 in the ensemble () outputs the 3-vector for track . Then and defines the predicted aleatoric uncertainty. The final track angle prediction is the circular average over the ensemble predictions . The epistemic uncertainty on is also assumed to follow a VM distribution, with concentration parameter ; we estimate it using the appropriate maximum likelihood estimator for given the independent sample set , see appendix. The total predictive error on each track angle is given by summing the aleatoric and epistemic variances:


Our intial dataset consists of 3.5 million GEANT4 agostinelli_geant4simulation_2003 Monte Carlo simulated tracks, where each track is labelled with its 2D emission angle vector, an example of which is shown in fig. 1. The track energies uniformly span keV, IXPE’s most sensitive range, and are unpolarized (uniform track angle distribution). Since we don’t know the true event energy, we want a model that can make predictions for tracks of all energies. We have confirmed the simulated track data matches real flight detector data to extremely high precision, and that our method is robust to any remaining covariate shift, maintaining the relative improvement shown in §4 for real detector data.


We apply pixelwise normalization to the square track images. Each ResNet-18 model is trained with a Momentum Optimizer, and a step-wise learning rate beginning at and ending at on 2 NVIDIA GeForce RTX 2080 GPUs. ResNets with batch size are all considered for selection in the final ensemble. We randomly select trained ResNets to compose the final ensemble, whose distributed results estimate the epistemic error.

3 Step II: polarization estimation

The basic problem is to estimate and from a set of measured track angles . As described in the introduction, true track angles exhibit a sinusoidal modulation with period


where , and . One could estimate using the maximum likelihood estimator (MLE) for eq.2. Equivalently, kislat_analyzing_2015

have shown that the minimum variance unbiased estimator for

is given by


where , and . However, since are imperfectly recovered, we adjust eq. 2 to include the VM uncertainty on our estimates, . Computing the convolution of eq. 2 with the VM, we find:


In other words, follow the same distribution as but smeared by a modulation factor . Current analyses kitaguchi_convolutional_2019-1; moriakov_inferring_2020 treat as constant for all tracks and calculate it based on broadband calibration measurements: no connection to individual track predictions. In effect, they assume homoskedastic emission angle measurements. Here, we explicitly model the heteroskedasticity of our predictions by using a deep ensemble (§2) to estimate event uncertainties and include them in the likelihood for final predictions.

Instead of explicitly maximizing a likelihood function based on eq.5 and our predictions to estimate , we can define event weights and use eq.3-4 by defining , , and . As N approaches infinity, this converges to the MLE kislat_analyzing_2015. The posterior distribution is given in appendix §A.2.

Figure 2: Measured modulation factor as a function of predicted track weight for large test data sets of keV simulated events. Each bin is calculated from 20,000 individual track events. Open circles represent a PL1 source, closed circles a flat spectrum.
Method MDP(%)
Moment Analysis 4.45 0.02
Deep ensemble 4.21 0.02
Deep ensemble w/ weights 3.38 0.01
Table 1: Sensitivity analysis for 2-8 keV photons with a spectrum.

Figure of merit.

We require a figure-of-merit to define the quality of polarization reconstruction using different methods. Aligned with prior work, we adopt the standard figure-of-merit used in X-ray polarimetry: minimum detectable polarization (MDP) weisskopf_understanding_2010. MDP

is the polarization fraction that has a 1% probability of being exceeded by chance for an unpolarized (

) source. This can be found by integrating the posterior distribution (see appendix): for an unweighted polarization estimate. Here is the empirical modulation factor for the entire set of track observations (i.e. for ). For a weighted estimate is replaced with kislat_analyzing_2015. MDP is effectively a (inverse) ratio of recovered signal to noise . For a given detector, track reconstruction algorithms with lower MDP are better. In the appendix, we prove our chosen weighting minimizes the MDP (maximizes SNR).

4 Results and discussion

We evaluate our two step method and the standard moment analysis on a power law 1 dataset with tracks. Once emission angles are estimated, all methods use eqs.3-4 for polarization prediction. To further strengthen our analysis, we ablate using predicted uncertainties weights. Table 1 shows the results. Our weighted approach using deep ensemble predictions yields significant improvement over the standard analysis, with a decrease in MDP. The equivalent decrease in IXPE’s required exposure times to reach the same SNR is . Our method without any uncertainty weighting provides a significantly smaller improvement. The gain in sensitivity of our method comes in a small part from an improvement in accuracy, and in a larger part from properly modelling heteroskedasticity, using deep ensemble predicted uncertainties as appropriate event weights. In fig. 2 we assess the trustworthiness of our deep ensemble predicted VM uncertainties. We find the empirical modulation factor matches the deep ensemble predicted modulation factor (event weights) very closely for a selection of test spectra. In the appendix, we visualize our deep ensemble at multiple energies to assess any prediction bias.

Our results on real (non-simulated) detector data indicates similar exposure time reduction. A decrease in exposure time of

adds public value since NASA-funded projects like IXPE can observe significantly more sources, at better SNR, over the mission lifetime. We expect our approach of using deep ensemble predicted uncertainties to take into account heteroskedasticity could be applied to other problems in engineering and the physical sciences, for example in high energy particle physics: fitting Cauchy distributions to the frequencies of noisy (deep learning) measured decay states.


Appendix A Appendix

a.1 Epistemic error estimator

With the epistemic uncertainties assumed to follow ; can be estimated from the output of a deep ensemble with M models :


with the modified Bessel functions and .

a.2 Posterior distribution on

For a weighted polarization estimate with observations and weights using eqs. 34, kislat_analyzing_2015 derive the joint error distribution (posterior) for the polarization fraction and EVPA estimators as






This assumes a uniform prior over . Here, is the empirical modulation factor for the entire set of track observations (i.e. for ). is effectively the instrument polarization response.

Confidence intervals and the MDPA.3, the 99% upper limit for

) are derived from this posterior probability distribution. In cases where

and are not close to 0, the Gaussian approximation for the marginalized errors below is sufficient


High and high are both desirable to minimize the errors on recovered polarization parameters.

a.3 Minimum detectable polarization (MDP)

The MDP is calculated for an unpolarized source and is given by (using posterior eq. 8)


a.4 Maximising the SNR

We define the signal-to-noise ratio (SNR)


This is simply the inverse of the MDP (without constants); an optimal weighting scheme should maximize the SNR for a fixed number of events . We can expand the SNR explicitly using our weighted estimators from §3,


Expanding, squaring, and dropping constants (which do not affect maximization) we obtain


The estimators

are random variables. The true values

, also random variables, follow the distribution eq. 2 (since they are perfectly known). Assuming the are unbiased estimators of (approximately true for both moment analysis and ResNet-18s, §A.5) we can say


where the measurement errors are independent random variables drawn from the same family of distributions with


The specific distribution of the measurement errors will depend on the estimation method; however since are periodic, should follow a periodic distribution. For any distribution with the above properties, we can find the distribution for as the convolution of the distributions


where and . In other words, the distribution of estimators are the same as the distribution of the true values but with a reduced modulation factor . The measurement noise will decrease the sinusoidal modulation by a factor for the specific event .

Knowing the distributions of , eq. 20, we take the expectation over SNR (dropping constant )


Finally maximizing this expression with respect to in the large limit, we find


i.e., the optimal weight for an event with observation is given by its expected . Note for this already holds with high accuracy; useful polarization measurements typically have .

Under the assumption of von Mises distributed errors


we have shown in §3 that


These weights maximize the SNR.

Figure 3: Track angle recovery for unpolarized, , (left two panels) and 100% polarized, , (right two panels) simulated data for and  keV. The true photoelectron angle distribution is shown in black; standard moment analysis reconstruction is in blue and ResNet-18 deep ensemble in red.

a.5 Recovered emission angles

We show the recovered distribution for our deep ensemble and the moment analysis at two example energies in fig.3. The unpolarized examples show negligible residual polarization in our method while the polarized examples suggest an increase in deep ensemble polarization signal (), especially at higher energies. As described in the main text, the improvement in the actual polarization estimator is even greater when weighting using event uncertainties. All plots suggest a lack of prediction bias in our method.