RoboPol (Ramaprakash et al., 2019) is a four-channel optical polarimeter installed on the 1.3 m telescope at the Skinakas Observatory in Crete, Greece, and is primarily used for polarimetry of point sources in the R band. Its successor, the Wide Area Linear Optical Polarimeter (WALOP), is under development at the Inter-University Centre for Astronomy and Astrophysics (IUCAA) in Pune, India. Images contain artefacts resulting from dust patterns, cosmic ray hits, satellite trails, and pixel bleeding, all of which contaminate the information from celestial objects. With the increasing number of images taken every night by such instruments, it is necessary to automate the analysis of the data. However, with humans taken out of the loop, it is possible for artefacts to be misidentified as sources and used in the analysis. This would lead to erroneous results, so the detection of such artefacts is imperative.
Early work on the detection of artefacts in astronomical images dates to the early 2000s, when Storkey et al. (2004) used computer vision techniques such as the Hough Transform to detect linear artefacts like satellite trails, scratches, and diffraction spikes near bright stars. These methods were concerned with the detection of linear features and highlighted some of the difficulties of using the Hough Transform when dealing with light-density variations.
Later the focus shifted to object identification, followed by feature extraction for each object and classification on those features to separate out artefacts, using methods like decision trees and random forests (Donalek et al., 2008). Recent years have seen the compilation of terrestrial datasets like ImageNet (Deng et al., 2009), consisting of over a million labelled images in a thousand categories such as human faces, digits, vehicles, flowers, and animals, followed by the development of deep learning libraries and models trained on these datasets, e.g. the VGG16 architecture (Simonyan & Zisserman, 2014).
With deep learning it is possible to skip the sometimes subjective step of feature extraction and go straight to classification after obtaining a labelled dataset (see e.g. Cabrera-Vives et al., 2017; Duev et al., 2019a,b). This comes at the cost of explainability, but with proper validation and test datasets the results are still reliable. Additional ways to improve robustness and speed up convergence, using techniques like Mask R-CNN and linear scaling combined with normalization, are discussed in recent papers such as He et al. (2017); Gonzalez et al. (2018); Burke et al. (2019).
Our task here is to classify objects in RoboPol images into stars and artefacts. RoboPol images contain reflections of bright stars caused by the interface between the two Wollaston prisms used in the instrument, and it is this dominant class of artefacts that we target here. The interface between the Wollaston prisms is shown in the optical design diagram of Ramaprakash et al. (2019), and an example of the reflection artefact is shown in Figure 1. The green box in the upper left quadrant contains two horizontally extended artefacts separated vertically; a few stars in the vicinity are also included in the box.
In this paper, we propose to solve the problem of artefact detection for RoboPol images using an appropriately designed Convolutional Neural Network (CNN). In Section 2, we introduce our approach to the problem of detecting artefacts in RoboPol images. We detail the implementation of our method including pre-processing steps, CNN architecture, and visualization of the output, and in Section 3 we discuss our findings and future possibilities.
The RoboPol database consists of tens of thousands of images taken between 2013 and 2019. We first generate a data-set containing stars and artefacts and then develop a CNN to perform the classification.
The following is an outline of our method:
1. Create training data for artefacts and stars from RoboPol images through manual labeling. This includes data for validation and testing. The manual labeling was done by visually inspecting about 100 images and recording the pixel coordinates of the artefacts.
2. Develop a CNN Architecture tuned through hyperparameter variation.
3. Train the model using training data obtained in step 1.
4. Validate the model using validation and testing data.
5. Implement the model to find artefacts in an arbitrary RoboPol image.
2.1 Training Data for Artefacts and Stars
Reflection artefacts and stars in RoboPol images have x and y extents from several pixels to a few tens of pixels. We chose a size of 64x64 pixels for our cutouts, with the artefacts and stars centered. Each star appears at four locations due to splitting of light from a single source within the instrument, with the locations lying at the vertices of a diamond. A detailed design implementation is available in Ramaprakash et al. (2019).
For each image we generate a catalog of sources (including stars, reflection artefacts, and any other connected brightness peaks) using SExtractor (https://sextractor.readthedocs.io). We have roughly 10 artefacts and about 250 stars per image. The catalog comes with flags indicating various conditions such as saturation, proximity to another source, and proximity to the edge of the image (https://sextractor.readthedocs.io/en/latest/Flagging.html). About 70% of the visually inspected artefacts had no error flag set, which indicated that relying on flags alone is not sufficient to separate out artefacts. To obtain training data for stars, we make sure that from every image we extract stars of varying brightness and not just from a narrow brightness range: in each image we select one star per magnitude bin over a range of up to 5 magnitudes. This gives our training data an approximately uniform distribution of brightness and ensures that we are not biasing our neural network by providing training images from a limited magnitude range. We do not use all sources, so that the sets of stars and artefacts stay roughly equal in size, and hence balanced for the classification process.
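The per-magnitude-bin star selection described above can be sketched as follows; the function name and binning details are illustrative, not taken from the actual pipeline:

```python
import numpy as np

def sample_stars_per_mag_bin(mags, n_bins=5, rng=None):
    """Pick one star per 1-magnitude bin from a catalog's
    instrumental magnitudes; returns the selected indices."""
    rng = np.random.default_rng() if rng is None else rng
    edges = np.arange(np.floor(mags.min()), np.floor(mags.min()) + n_bins + 1)
    picks = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = np.flatnonzero((mags >= lo) & (mags < hi))
        if in_bin.size:                      # skip empty magnitude bins
            picks.append(rng.choice(in_bin))
    return np.array(picks)
```

Sampling this way from every image keeps the star set spread across the brightness range rather than dominated by faint sources.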
2.2 CNN Architecture
We follow the now-standard image classification model developed by the Visual Geometry Group (VGG) at Oxford, UK (Simonyan & Zisserman, 2014). Our implementation uses 3 convolution layers, 3 max-pooling layers, and 2 fully connected layers (see Figure 2). The hidden layers use ReLU (Rectified Linear Unit) activations, and the output layer uses a sigmoid activation. The first, second, and third convolution layers consist of 32, 64, and 128 filters respectively, each with a kernel size of 3x3 and a stride length of 1. The max-pooling layers use a kernel size of 2x2 pixels. After the third max-pooling layer we use a dropout layer with a node-dropping probability of 0.4 for regularization, ensuring that no single parameter of the network acquires a very high coefficient (Srivastava et al., 2014). The total number of trainable parameters in our configuration is 2,452,993, and we use the Adam optimizer to perform back-propagation (Kingma & Ba, 2014). The loss function used is the binary cross-entropy loss.
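An architecture of this shape can be sketched in Keras as follows. The padding mode and the width of the first fully connected layer are not stated in the text, so they are assumptions here, and the exact parameter count may differ from the quoted 2,452,993:

```python
from tensorflow.keras import layers, models, optimizers

def build_model(input_shape=(64, 64, 1)):
    """Sketch of the 3-conv-layer classifier described above."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), strides=1, activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), strides=1, activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), strides=1, activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.4),                   # regularization
        layers.Flatten(),
        layers.Dense(512, activation="relu"),  # assumed width
        layers.Dense(1, activation="sigmoid"), # star probability
    ])
    model.compile(optimizer=optimizers.Adam(learning_rate=0.001),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model
```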
2.3 Data Augmentation and Training
While going through the images we saw that most of the artefacts are due to internal reflections and have a horizontal streak-like shape. For proper training of the neural network we need a large data-set of training images. Therefore, to augment the number of training images, we rotated the cutouts by 180 degrees, which preserves the horizontal nature of the artefacts.
To split the data into training and validation sets we used an 80-20 ratio, shuffling the data-set randomly while splitting to avoid bias. For training, we had a total of 836 images of stars and 925 images of artefacts. The training data was augmented using the ImageDataGenerator class in the high-level Keras (www.keras.io) API of Python. We performed horizontal and vertical flipping, width and height shifts, and shearing. The shifts were applied to account for possible inaccuracies in the centering of the samples. The validation and training images were both normalized to [0, 1] by dividing by 255, the maximum value of an 8-bit image.
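The augmentations described above (normalization, 180-degree rotation, flips, and small shifts) can be sketched in NumPy as follows; the shift magnitude and the per-transform probabilities are illustrative choices, and the actual training used Keras's ImageDataGenerator:

```python
import numpy as np

def augment(cutout, rng):
    """Randomly augment a 64x64 cutout: 180-degree rotation
    (preserves the horizontal streak shape of the artefacts),
    horizontal/vertical flips, and small pixel shifts."""
    img = cutout / 255.0                       # normalize to [0, 1]
    if rng.random() < 0.5:
        img = np.rot90(img, 2)                 # 180-degree rotation
    if rng.random() < 0.5:
        img = img[:, ::-1]                     # horizontal flip
    if rng.random() < 0.5:
        img = img[::-1, :]                     # vertical flip
    dx, dy = rng.integers(-3, 4, size=2)       # +/- 3 pixel shift
    img = np.roll(img, (int(dy), int(dx)), axis=(0, 1))
    return img
```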
The hardware used for performing training was a 2.3 GHz Intel Core i5 processor and the total training time was 40 minutes.
2.4 Training Performance
The total number of images chosen for training the model was 1408 (80% of the 1761 images), and the remaining 353 images were reserved for validation. The training data was used to update the parameters of the model, while the validation data was used only to evaluate the model's performance after every update. The batch size used was 4. A total of 100 epochs were used in training, with the steps per epoch set to 1408/4, i.e. 352. We used a learning rate of 0.001. The training accuracy reached about 95% while the validation accuracy was close to 96% at the end of 100 epochs, as seen in Figure 4.
Besides making small changes to the hyperparameters above, we also implemented networks with 2 and 4 convolution layers to see whether there was any advantage in using shallower (2-layer) or deeper (4-layer) CNNs. The training and validation accuracy reached about 90% in the shallower network, and about 96% in the deeper network. Although training and validation accuracy may be good indicators of the proper working of a CNN in a binary classification problem, there are other important quantities to consider when the costs of misclassification are high. For example, we are interested in the false positive rate (artefacts wrongly classified as stars) and the false negative rate (stars wrongly classified as artefacts), as well as the precision (fraction of sources classified as stars that truly are stars) and recall (fraction of true stars correctly classified). These numbers are summarized through two metrics, viz. the F1 score and the Matthews correlation coefficient, and indicate whether our model is working as expected.
We need our system to have high precision and recall scores, and the F1 score summarizes the two by taking their harmonic mean. The Matthews correlation coefficient (MCC) is akin to a correlation coefficient between the predicted labels and the true labels; a value of +1 indicates perfect correlation between the two. Equations 1, 2, 3 and 4 give the formulae for precision, recall, F1 score and MCC respectively. Note that TP, FP, TN, FN stand for True Positives, False Positives, True Negatives and False Negatives respectively.
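The four metrics follow their standard definitions and can be written directly in terms of the confusion-matrix counts:

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Standard definitions of precision, recall, F1 and MCC
    from the confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return precision, recall, f1, mcc
```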
Figures 5 and 6 show the confusion matrix and the ROC curve respectively. The confusion matrix gives the number of true positives, true negatives, false positives, and false negatives. The Receiver Operating Characteristic (ROC) curve measures the trade-off between the true positive rate and the false positive rate for different values of the classifier threshold. The threshold is a value between 0 and 1: if the predicted probability of a source being a star is greater than the threshold, we classify the source as a star, otherwise as an artefact. Our threshold is set to 0.5. Ideally, we want the false positive rate to be 0 and the true positive rate to be 1, so that the area under the ROC curve equals 1. The Area Under the Curve (AUC) for our model is 0.996, while that for a random classifier is 0.5, as shown by the red dotted line. A zoom-in of Figure 6 is shown in Figure 7, which marks the true positive and false positive rates at different thresholds. It shows that the point for our threshold of 0.5 coincides with that for 0.75, indicating that our classifier achieves the same true positive rate and false positive rate even at a higher, more conservative threshold. Table 1 compares the performance of our model with the shallower and deeper networks. Based on this comparison, we chose the model with 3 convolution layers: the 4-layer model actually shows a reduction in F1 score and MCC despite being deeper and hence computationally more expensive, possibly due to over-fitting. Figures 8 and 9 show some of the false negatives and false positives.
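The threshold sweep behind the ROC curve can be sketched as follows; this is a generic implementation, not the code used to produce Figure 6:

```python
import numpy as np

def roc_curve_points(probs, labels, thresholds=None):
    """Compute (FPR, TPR) pairs by sweeping the star-probability
    threshold from 1 down to 0."""
    if thresholds is None:
        thresholds = np.linspace(1.0, 0.0, 101)
    probs = np.asarray(probs)
    labels = np.asarray(labels, dtype=bool)
    fpr, tpr = [], []
    for t in thresholds:
        pred = probs > t                       # classify as star
        tpr.append(np.sum(pred & labels) / labels.sum())
        fpr.append(np.sum(pred & ~labels) / (~labels).sum())
    return np.array(fpr), np.array(tpr)

def auc(fpr, tpr):
    """Area under the ROC curve via the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
```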
Table 1. Performance comparison of the networks (columns: Parameter | 2 Layers | 3 Layers | 4 Layers).
2.5 Testing and Implementation
We used our model to test 100 randomly chosen images, distinct from the training set, from the 40,000 RoboPol images taken during 2013 and 2014. Our goal was to classify all the sources in each of the 100 images into stars and artefacts and analyse the results. For each image, we obtained a list of sources, their positions, instrumental magnitudes, and extraction error flags using SExtractor. A histogram of the predicted probability of the sources being stars is shown in Figure 13. Out of 91,000 sources, 88,000 were classified as stars (90 to 100% probability of being a star) and 2500 as artefacts (0 to 10% probability of being a star). The inset plot shows that in the remaining probability range there are fewer than 10 objects in each bin of size 10%; that is, very few sources had probabilities between 10 and 90%.
Figures 10 and 11 show the results of classification on test images. Each image contains a single source with a known label; at the top of each image is the predicted probability of the source being a star. Sources in Figure 10 are artefacts while those in Figure 11 are stars. In both categories, our classification probabilities are almost always above 90%.
The implementation pipeline takes an input image and the corresponding SExtractor file as its arguments and produces a list of the locations of the detected artefacts, which are also marked in the original image with their associated probabilities. A decision-logic diagram is shown in Figure 12.
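A minimal sketch of such a pipeline step follows; the function name, signature, and cutout handling are illustrative assumptions, with `predict_fn` standing in for the trained model:

```python
import numpy as np

def find_artefacts(image, catalog, predict_fn, cutout=64, threshold=0.5):
    """Cut out each cataloged source, score it with the classifier
    (predict_fn returns the probability of being a star), and return
    the positions and scores of the sources flagged as artefacts."""
    half = cutout // 2
    padded = np.pad(image, half, mode="constant")  # handle edge sources
    artefacts = []
    for x, y in catalog:                   # catalog pixel coordinates
        xi, yi = int(round(x)), int(round(y))
        stamp = padded[yi:yi + cutout, xi:xi + cutout] / 255.0
        p_star = float(predict_fn(stamp[None, ..., None]))
        if p_star <= threshold:            # below threshold -> artefact
            artefacts.append((x, y, p_star))
    return artefacts
```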
2.6 Visualization with Saliency Maps
A saliency map helps us find the pixels in an input image that need to change the least to activate the output node; in practice, we compute the gradient of the output score with respect to the input image. To visualize a saliency map, the positive gradients are kept, giving the locations of the pixels that activate the output, in other words the location of the object of the relevant class in the input image. A saliency map thus shows the salient features of the input image that maximize the class score. A detailed mathematical treatment can be found in Simonyan et al. (2013). Figure 14 shows the saliency maps for 9 different input images. Observe that for the image in row 3, column 3, only the artefact is visualized in the saliency map and not the star at the top right corner of the image; SExtractor's ellipticity measure alone would not have been sufficient to separate such an artefact.
We used Convolutional Neural Networks to solve the problem of detecting artefacts in polarimetric images. Although the use of CNNs in astronomical image classification is not new, this is the first time they have been used for detecting artefacts in polarimetric images. The efficiency of the method shows its suitability for upcoming polarimetry surveys such as the Polar Areas Stellar Imaging in Polarization High Accuracy Experiment (Tassis et al., 2018), which will use the novel Wide Area Linear Optical Polarimeter (WALOP). Our implementation suggests that this method can be reliably used for detecting other kinds of artefacts as well, given enough training data. The RoboPol instrument operates down to 16th magnitude in the R band. Figure 15 shows that our deep learning model can classify stars down to 15.9 magnitude with a prediction probability better than 0.9. We have also plotted the signal-to-noise ratio (SNR) of the stars on a separate axis, and see that our model works down to an SNR of about 15. Thus, our implementation works well for objects within the magnitude range RoboPol observes.
In this paper, we do not use the spatial correlation of stars appearing in a diamond pattern in RoboPol images, because the diamond structure is specific to the RoboPol polarimeter design and would not be present in a single image from future polarimeters such as WALOP.
In the RoboPol data-set, the majority of artefacts were due to scattering of light from off-axis stars at the interface of the Wollaston prisms (Ramaprakash et al., 2019). Our method demonstrates that a binary classifier trained on images of stars and artefacts can successfully differentiate between them. Our training data does not contain enough examples of artefacts such as satellite trails or bleeding pixels, so learning them is non-trivial without aggressive data augmentation. However, there already exist methods to remove such artefacts, and out-of-distribution detection networks (Huang et al., 2019) can also be used to detect such infrequent outliers. The final pipeline can incorporate such methods to deliver artefact-free products.
The work has been funded by the National Science Foundation under NSF grant 161547. AM acknowledges support from the NSF (1640818, AST-1815034) and IUSSTF (JC-001/2017). KT acknowledges support from the European Research Council under the European Union's Horizon 2020 research and innovation programme, grant agreement No. 771282.
- Burke et al. (2019) Burke C. J., Aleo P. D., Chen Y.-C., Liu X., Peterson J. R., Sembroski G. H., Lin J. Y.-Y., 2019, MNRAS
- Cabrera-Vives et al. (2017) Cabrera-Vives G., Reyes I., Förster F., Estévez P. A., Maureira J.-C., 2017, ApJ, 836, 97
- Deng et al. (2009) Deng J., Dong W., Socher R., Li L.-J., Li K., Fei-Fei L., 2009, in 2009 IEEE Conference on Computer Vision and Pattern Recognition. pp 248–255
- Donalek et al. (2008) Donalek C., Mahabal A., Djorgovski S., Marney S., Drake A., Glikman E., Graham M., Williams R., 2008, in AIP Conference Proceedings. pp 252–256
- Duev et al. (2019a) Duev D. A., et al., 2019a, MNRAS, p. 2039
- Duev et al. (2019b) Duev D. A., et al., 2019b, MNRAS, 486, 4158
- Gonzalez et al. (2018) Gonzalez C. G., Absil O., Van Droogenbroeck M., 2018, Astronomy & Astrophysics, 613, A71
- He et al. (2017) He K., Gkioxari G., Dollár P., Girshick R. B., 2017, arXiv preprint arXiv:1703.06870
- Huang et al. (2019) Huang Y., Dai S., Nguyen T., Baraniuk R. G., Anandkumar A., 2019, arXiv preprint arXiv:1907.04572
- Kingma & Ba (2014) Kingma D. P., Ba J., 2014, arXiv preprint arXiv:1412.6980
- Kotikalapudi & contributors (2017) Kotikalapudi R., contributors 2017, keras-vis, https://github.com/raghakot/keras-vis
- Ramaprakash et al. (2019) Ramaprakash A. N., et al., 2019, MNRAS, 485, 2355
- Simonyan & Zisserman (2014) Simonyan K., Zisserman A., 2014, arXiv preprint arXiv:1409.1556
- Simonyan et al. (2013) Simonyan K., Vedaldi A., Zisserman A., 2013, CoRR, abs/1312.6034
- Srivastava et al. (2014) Srivastava N., Hinton G., Krizhevsky A., Sutskever I., Salakhutdinov R., 2014, The Journal of Machine Learning Research, 15, 1929
- Storkey et al. (2004) Storkey A. J., Hambly N. C., Williams C. K., Mann R. G., 2004, MNRAS, 347, 36
- Tassis et al. (2018) Tassis K., et al., 2018, arXiv e-prints, p. arXiv:1810.05652