Measuring X-ray polarization, the degree of order in X-ray electric field oscillations, has been a major goal in astrophysics for the last 50 years. X-ray polarization measurements offer rich opportunities to probe the magnetic field topology and emission physics of high energy astrophysical sources, such as accreting black holes and astrophysical jets krawczynski_using_2019; weisskopf_overview_2018. The recent development of photoelectron tracking detectors bellazzini_novel_2003 has greatly improved the prospects of doing so. The gas pixel detector (GPD) bellazzini_sealed_2007 has brought soft X-ray polarimetry (1-10 keV) to the PolarLight CubeSat test feng_x-ray_2020, the scheduled NASA IXPE mission sgro_imaging_2019, and the potential Chinese mission, eXTP zhang_extp_2017.
Imaging X-ray polarization telescopes feed GPDs which directly image charge tracks formed from photoelectrons scattered by incoming X-ray photons. IXPE (weisskopf_overview_2018; odell_imaging_2018, planned for launch December 9th 2021) will use three co-aligned X-ray telescopes, whose focal planes are imaged by GPDs with hexagonal pixels. IXPE’s sensitivity is limited by the track analysis algorithm used to recover source polarization, spatial structure and energy, given a measured set of electron track images. In the keV range, the cross-section for photoelectron emission is proportional to cos, where
is the angle between the normal incidence X-ray’s electric vector position angle (EVPA) and the azimuthal emission direction of the photoelectron. By measuring a large number of individual photoelectron emission angles, one can recover the above distribution to extract the source polarization parameters: polarization fraction () and electric vector position angle (EVPA, ). In practice, the recovery of photoelectron emission angles from track images is imperfect. Track images are noisy due to Coulomb scattering and diffusion, and, especially for low energies, are often barely resolved. Emission angle estimates are highly heteroskedastic.
The current track reconstruction method for GPDs is a moment analysis described bybellazzini_novel_2003
. Impressive accuracies are achieved from a simple weighted combination of track moments. However, simple moments cannot capture all of the image information, especially for long high energy tracks, and so a more sophisticated image analysis scheme may lead to improved track angle recovery. The problem of recovering polarization parameters from a dataset of (IXPE) electron track images has recently been announced as an open problem in the machine learning communitymoriakov_inferring_2020.
In this work, we aim to both improve individual event estimates and explicitly model their uncertainty to increase final polarization sensitivity. We demonstrate a two step method: (1) Use a deep ensemble of ResNets he_deep_2015 to predict electron track angles and their uncertainties. (2) Combine predicted angles and their uncertainties in a weighted estimator of the polarization parameters that maximizes the SNR. Our empirical findings indicate a substantial improvement over the current state of-the-art track reconstruction bellazzini_novel_2003; kitaguchi_convolutional_2019-1. While the results shown here are specific to IXPE’s GPDs, the methods are general, and can be applied to other imaging detector geometries. Extended results, including spatial and energy resolution are discussed in peirson_deep_2021; peirson_towards_2021.
2 Step I: deep ensemble
To extract the angles from individual tracks we use a supervised deep learning technique known as deep ensembles lakshminarayanan_simple_2017
. Deep ensembles are made up of an ensemble of individual neural network models, each trained independently on the same data set to predict the desired output features. Deep ensembles provide estimates of the predictive uncertainty by exploiting different random initializations of the same model at the start of training which leads to widely different prediction functionsfort_deep_2019. In terms of performance and scalability, deep ensembles remain the current state of the art in uncertainty quantification hoffmann_deep_2021
. In our case, we have an image to feature regression problem. Convolutional neural networks have been designed with a spatial inductive bias appropriate for image regression problems. So our deep ensemble will be made up ofindividual ResNet-18s he_deep_2015.
To make the hexagonal track images admissable inputs to standard ResNet architectures, we first convert the hexagonal images to square image arrays by shifting every other column and rescaling the distance between points, as described in steppa_hexagdly_2019. Since there are two possible shifts (odd and even rows), we apply both and stack the two shifted images, similar to color channels in images. We do this to more closely approximate spatial equivariance of the ResNet convolution kernels in the hexagonal space. At test time, we apply the deep ensemble to the same track 3 times, each time rotated by in hexagonal space. We find this reduces all relevant prediction bias on (and later ) introduced when converting from hexagonal to square coordinates. Fig. 1 summarizes this process. An alternative solution is to use a model with native hexagonal convolutions, such as steppa_hexagdly_2019. In practice, we found it more effective and expedient to leverage existing square convolution architectures.
For part (1) of our method, we need to predict emission angle and its associated uncertainty for each track. Instead of the Gaussian negative log-likelihood (NLL) used in lakshminarayanan_simple_2017
, we use the von Mises (VM) distribution NLL as our loss function, the maximum entropy distribution for circular data with specified expectation value. This more appropriately reflects the distribution of our estimators, which are clearly periodic. The VM distribution is parameterized by concentration parameter ; for large
the VM converges to a Gaussian with varianceand for small
it approaches a uniform distribution. We parameterize the true angleas a 2D vector to capture the periodicity. Only , as opposed to , is required for polarization estimation since ; see eq. 2). Each of the ResNet-18 models in our ensemble computes the loss as
where is the modified Bessel function of the first kind. Each ResNet-18 in the ensemble () outputs the 3-vector for track . Then and defines the predicted aleatoric uncertainty. The final track angle prediction is the circular average over the ensemble predictions . The epistemic uncertainty on is also assumed to follow a VM distribution, with concentration parameter ; we estimate it using the appropriate maximum likelihood estimator for given the independent sample set , see appendix. The total predictive error on each track angle is given by summing the aleatoric and epistemic variances:
Our intial dataset consists of 3.5 million GEANT4 agostinelli_geant4simulation_2003 Monte Carlo simulated tracks, where each track is labelled with its 2D emission angle vector, an example of which is shown in fig. 1. The track energies uniformly span keV, IXPE’s most sensitive range, and are unpolarized (uniform track angle distribution). Since we don’t know the true event energy, we want a model that can make predictions for tracks of all energies. We have confirmed the simulated track data matches real flight detector data to extremely high precision, and that our method is robust to any remaining covariate shift, maintaining the relative improvement shown in §4 for real detector data.
We apply pixelwise normalization to the square track images. Each ResNet-18 model is trained with a Momentum Optimizer, and a step-wise learning rate beginning at and ending at on 2 NVIDIA GeForce RTX 2080 GPUs. ResNets with batch size are all considered for selection in the final ensemble. We randomly select trained ResNets to compose the final ensemble, whose distributed results estimate the epistemic error.
3 Step II: polarization estimation
The basic problem is to estimate and from a set of measured track angles . As described in the introduction, true track angles exhibit a sinusoidal modulation with period
where , and . One could estimate using the maximum likelihood estimator (MLE) for eq.2. Equivalently, kislat_analyzing_2015
have shown that the minimum variance unbiased estimator foris given by
In other words, follow the same distribution as but smeared by a modulation factor . Current analyses kitaguchi_convolutional_2019-1; moriakov_inferring_2020 treat as constant for all tracks and calculate it based on broadband calibration measurements: no connection to individual track predictions. In effect, they assume homoskedastic emission angle measurements. Here, we explicitly model the heteroskedasticity of our predictions by using a deep ensemble (§2) to estimate event uncertainties and include them in the likelihood for final predictions.
Instead of explicitly maximizing a likelihood function based on eq.5 and our predictions to estimate , we can define event weights and use eq.3-4 by defining , , and . As N approaches infinity, this converges to the MLE kislat_analyzing_2015. The posterior distribution is given in appendix §A.2.
|Moment Analysis||4.45 0.02|
|Deep ensemble||4.21 0.02|
|Deep ensemble w/ weights||3.38 0.01|
Figure of merit.
We require a figure-of-merit to define the quality of polarization reconstruction using different methods. Aligned with prior work, we adopt the standard figure-of-merit used in X-ray polarimetry: minimum detectable polarization (MDP) weisskopf_understanding_2010. MDP
is the polarization fraction that has a 1% probability of being exceeded by chance for an unpolarized () source. This can be found by integrating the posterior distribution (see appendix): for an unweighted polarization estimate. Here is the empirical modulation factor for the entire set of track observations (i.e. for ). For a weighted estimate is replaced with kislat_analyzing_2015. MDP is effectively a (inverse) ratio of recovered signal to noise . For a given detector, track reconstruction algorithms with lower MDP are better. In the appendix, we prove our chosen weighting minimizes the MDP (maximizes SNR).
4 Results and discussion
We evaluate our two step method and the standard moment analysis on a power law 1 dataset with tracks. Once emission angles are estimated, all methods use eqs.3-4 for polarization prediction. To further strengthen our analysis, we ablate using predicted uncertainties weights. Table 1 shows the results. Our weighted approach using deep ensemble predictions yields significant improvement over the standard analysis, with a decrease in MDP. The equivalent decrease in IXPE’s required exposure times to reach the same SNR is . Our method without any uncertainty weighting provides a significantly smaller improvement. The gain in sensitivity of our method comes in a small part from an improvement in accuracy, and in a larger part from properly modelling heteroskedasticity, using deep ensemble predicted uncertainties as appropriate event weights. In fig. 2 we assess the trustworthiness of our deep ensemble predicted VM uncertainties. We find the empirical modulation factor matches the deep ensemble predicted modulation factor (event weights) very closely for a selection of test spectra. In the appendix, we visualize our deep ensemble at multiple energies to assess any prediction bias.
Our results on real (non-simulated) detector data indicates similar exposure time reduction. A decrease in exposure time of
adds public value since NASA-funded projects like IXPE can observe significantly more sources, at better SNR, over the mission lifetime. We expect our approach of using deep ensemble predicted uncertainties to take into account heteroskedasticity could be applied to other problems in engineering and the physical sciences, for example in high energy particle physics: fitting Cauchy distributions to the frequencies of noisy (deep learning) measured decay states.
Appendix A Appendix
a.1 Epistemic error estimator
With the epistemic uncertainties assumed to follow ; can be estimated from the output of a deep ensemble with M models :
with the modified Bessel functions and .
a.2 Posterior distribution on
For a weighted polarization estimate with observations and weights using eqs. 3–4, kislat_analyzing_2015 derive the joint error distribution (posterior) for the polarization fraction and EVPA estimators as
This assumes a uniform prior over . Here, is the empirical modulation factor for the entire set of track observations (i.e. for ). is effectively the instrument polarization response.
) are derived from this posterior probability distribution. In cases whereand are not close to 0, the Gaussian approximation for the marginalized errors below is sufficient
High and high are both desirable to minimize the errors on recovered polarization parameters.
a.3 Minimum detectable polarization (MDP)
The MDP is calculated for an unpolarized source and is given by (using posterior eq. 8)
a.4 Maximising the SNR
We define the signal-to-noise ratio (SNR)
This is simply the inverse of the MDP (without constants); an optimal weighting scheme should maximize the SNR for a fixed number of events . We can expand the SNR explicitly using our weighted estimators from §3,
Expanding, squaring, and dropping constants (which do not affect maximization) we obtain
are random variables. The true values, also random variables, follow the distribution eq. 2 (since they are perfectly known). Assuming the are unbiased estimators of (approximately true for both moment analysis and ResNet-18s, §A.5) we can say
where the measurement errors are independent random variables drawn from the same family of distributions with
The specific distribution of the measurement errors will depend on the estimation method; however since are periodic, should follow a periodic distribution. For any distribution with the above properties, we can find the distribution for as the convolution of the distributions
where and . In other words, the distribution of estimators are the same as the distribution of the true values but with a reduced modulation factor . The measurement noise will decrease the sinusoidal modulation by a factor for the specific event .
Knowing the distributions of , eq. 20, we take the expectation over SNR (dropping constant )
Finally maximizing this expression with respect to in the large limit, we find
i.e., the optimal weight for an event with observation is given by its expected . Note for this already holds with high accuracy; useful polarization measurements typically have .
Under the assumption of von Mises distributed errors
we have shown in §3 that
These weights maximize the SNR.
a.5 Recovered emission angles
We show the recovered distribution for our deep ensemble and the moment analysis at two example energies in fig.3. The unpolarized examples show negligible residual polarization in our method while the polarized examples suggest an increase in deep ensemble polarization signal (), especially at higher energies. As described in the main text, the improvement in the actual polarization estimator is even greater when weighting using event uncertainties. All plots suggest a lack of prediction bias in our method.