Point Proposal Network: Accelerating Point Source Detection Through Deep Learning

by Duncan Tilley, et al.
University of Pretoria

Point source detection techniques are used to identify and localise point sources in radio astronomical surveys. With the development of the Square Kilometre Array (SKA) telescope, survey images will see a massive increase in size from Gigapixels to Terapixels. Point source detection has already proven to be a challenge in recent surveys performed by SKA pathfinder telescopes. This paper proposes the Point Proposal Network (PPN): a point source detector that utilises deep convolutional neural networks for fast source detection. Results measured on simulated MeerKAT images show that, although less precise when compared to leading alternative approaches, PPN performs source detection faster and is able to scale to large images, unlike the alternative approaches.




1 Introduction

The long wavelengths observed in radio astronomy allow wide fields of view to be imaged with modern interferometers, while being sensitive to important radio source populations, including radio galaxies, supernovae, and a suite of explosive phenomena across the Universe. The Square Kilometre Array (SKA) telescope is a proposed radio interferometer that, when completed, will be able to conduct surveys with an unparalleled combination of sensitivity, angular resolution and imaging fidelity [1].

The SKA pathfinders such as the MeerKAT in South Africa and the ASKAP in Australia are already conducting both wider and deeper surveys than before, providing a steady build-up to the arrival of the SKA. In addition, wide-field Very Long Baseline Interferometry (VLBI) is a burgeoning field which routinely produces Terapixel images within a single field of view.

It is a necessary step in the scientific pipeline to extract properties of observable objects, or sources, from a survey image. At the very least, the position of a candidate source must be known to perform any further processing. Detecting or classifying sources is in some cases done by hand, as is the noble goal of the Radio Galaxy Zoo [2]. For the sake of time, however, automated techniques are used, which usually provide the minimal information necessary to describe point sources.

As radio telescope survey speeds and angular resolutions increase, the size of surveyed images increases. For this reason, recent years have seen the introduction of many new source detection techniques which attempt to improve the reliability [3][4][5] or, in a few cases, the speed [6] of previous techniques. Many recent surveys have also analysed the performance of existing source detection techniques when applied to modern telescopes [7][8][9]. These surveys emphasise the growing need for novel detection methods.

This paper limits its investigation to point sources rather than extended sources, the former having an extent significantly smaller than the instrument’s point spread function full width at half maximum (FWHM). This limitation is justified by the fact that point sources dominate detections in current surveys from the MeerKAT and ASKAP.

Most point source detectors roughly follow the same skeleton consisting of 3 steps [7]. The following is a brief description of each step:

  1. Background estimation, whereby the background noise of an image is estimated and subtracted from the original image in order to obtain a clearer image.

  2. Source detection, which usually involves performing the computationally expensive flood-fill algorithm to detect contiguous islands of pixels that lie above a certain threshold. These islands are then identified as candidate sources.

  3. Source characterisation, in which various characteristics of each source are calculated. The simplest of these is the sky coordinates (right ascension, RA and declination, DEC) of the point source’s centre.

Hancock et al. [7] add the final step of cataloguing sources, which will be considered as part of source characterisation here, as few detectors perform any notable post-processing. As mentioned, the flood-filling approach utilised by point source detectors is expensive and does not scale well to larger images.
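The detection step of this traditional skeleton can be illustrated with a minimal sketch. The function name `find_islands`, the use of 4-connectivity and the `min_area` parameter are illustrative assumptions and are not taken from any specific detector; the sketch only shows why the cost grows with the number of pixels, since every pixel must be visited for each threshold that is applied.

```python
from collections import deque


def find_islands(image, threshold, min_area=1):
    """Return centroids (row, col) of 4-connected islands of pixels above threshold.

    `image` is a list of rows of numbers. This is an illustrative sketch,
    not the implementation used by any particular source finder.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Breadth-first flood fill starting from this seed pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Small islands are discarded; the rest are reduced to a centroid.
            if len(pixels) >= min_area:
                cy = sum(p[0] for p in pixels) / len(pixels)
                cx = sum(p[1] for p in pixels) / len(pixels)
                centroids.append((cy, cx))
    return centroids


example = [
    [0, 0, 0, 0],
    [0, 5, 5, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 9],
]
islands = find_islands(example, threshold=1)
```

In this toy image the two bright pixels in the second row form one island and the isolated pixel in the corner forms another.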

This paper proposes a new point source detection technique that utilises modern advances in object detection. The technique is called the Point Proposal Network (PPN), which uses a deep convolutional neural network (CNN) [11][12][13] to perform all three steps outlined above in a single evaluation of the CNN. The technique therefore eliminates the need for a flood-filling iteration and aims to provide a faster and more scalable point source detection method for use in current and future large surveys.

The remainder of this paper is laid out as follows. Section 2 discusses related works in point source and object detection. Section 3 describes the PPN approach in full. Section 4 discusses the experiments that were performed and illustrates the performance of PPN in comparison to DeepSource [5], both in terms of accuracy and speed. Finally, Section 5 provides concluding remarks.

2 Related Work

2.1 Point Source Detectors

SExtractor [10] is an early point source detector that first introduced the skeleton discussed in the introduction. This point source detector influenced virtually all later point source detectors, with many of them introducing only small variants to the different steps [4][3][6].

DeepSource is a point source detector that was recently proposed by Sadr et al. [5]. To the authors’ knowledge, it is the first point source detector that involves neural networks within the detection pipeline. DeepSource uses a CNN to perform background noise estimation by directly transforming the image to a cleaner version. This has improved the accuracy of point source detections in the presence of noise; however, the source detection is still performed using a variant of flood filling and suffers from the same poor scaling as earlier techniques.

2.2 Object Detectors

R-CNN is one of the first object detectors that utilised a deep CNN [14] and led to the development of many other modern object detectors. A notable variant of this technique is Faster R-CNN [15] which uses a CNN to perform localisation by proposing regions of possible objects. This CNN is aptly named the Region Proposal Network (RPN).

The output of the RPN is then used to generate subsets of the input image, which are fed through additional layers to perform classification. The use of an RPN greatly improves on the speed of R-CNN, but multiple evaluations of the later layers are still required for each proposed area. Other methods have proposed ways to perform both localisation and classification in a single evaluation of a CNN, such as YOLO and SSD [16][17][18].

The approaches used by these techniques allow for faster object detection. This makes them strong candidates for situations where high-speed detection of objects is required, such as in videos or possibly very large images. These techniques are what led to the point source detector proposed in this paper.

3 Point Proposal Network

In this section, the details of PPN are discussed. An implementation of PPN, including data simulation, can be found at https://github.com/tilleyd/point-proposal-net. Section 3.1 first fully describes the technique, with Section 3.2 showing the steps taken to choose the values for hyper-parameters and develop the full network architecture.

3.1 Proposed Method

The PPN approach described here considers point source detection as a regression problem. The network receives a single image as input and directly provides proposed positions of point sources. The technique therefore does not follow the traditional point source detection steps and provides a faster way of performing point source detection.

The architecture of PPN is based on the Faster R-CNN Region Proposal Network (RPN) [15]. The RPN is specifically designed for only proposing the regions of objects and does not include classification information in the direct output, making it suitable for this problem. In addition, the RPN architecture and loss function are fairly loosely defined, allowing modification to perform point proposals instead of region proposals.

3.1.1 Architecture

Similar to the RPN, PPN performs regression based on anchors placed across the input image. However, in the case of PPN each anchor only outputs a single offset to a possible point source’s centre and doesn’t include any bounding box information. To this end, the anchors will instead be referred to as the offset origins.

PPN is separated into two logical parts: the base layers and the proposal layers. The base layers can have an arbitrary architecture and are explored later in this section. The shape of the base layers’ output feature map defines the number of origins: the feature map’s row and column counts give the number of rows and columns of origins. The number of output channels does not affect the shape of the final output.

The proposal layers are a number of convolutional layers that use the features extracted by the base layers to perform regression. The final output of the proposal layers is a regression matrix and a confidence matrix, each with one entry per origin. The architecture of these layers is shown in Fig. 1.

Fig. 1: An illustration of the 3 convolutional layers responsible for producing point proposals from the base layers’ feature map. The layers use the features extracted by the base layers and create separate confidence and regression matrices. These layers are zero-padded with no activation functions. Only the confidence output is passed through a Sigmoid activation function to produce an output between 0 and 1.

The last dimension of the regression matrix contains, for each origin, a tuple of offsets from that origin’s position to a possible point source. The offsets are normalised such that an offset of 1 or -1 will line up with the neighbouring origin. The last dimension of the confidence matrix contains a probability score for each origin, which indicates the probability that there is in fact a point source at the position pointed to by that origin’s offsets.

The base layers of PPN are crucial to effectively extract features from images. Experiments with the base layers (see Section 3.2.1) showed that the best performing architecture among those evaluated is a 31-layer ResNet [19] architecture, as illustrated in Fig. 2. More residual blocks were added to the earlier layers in order to minimise the amount of information lost when down-sampling the feature maps. This was shown to perform slightly better than giving an equal number of blocks to each section, possibly due to the small size of point sources being highly susceptible to information loss.

Fig. 2: The 31-layer ResNet base layer architecture. Each convolutional layer is zero-padded and followed by batch normalisation, before being passed through a ReLU activation function. The slanted convolutional layers use a stride of 2 in order to down-sample the input feature map.

3.1.2 Training

PPN uses a supervised learning approach to minimise the loss function. Since the output of PPN is two separate matrices, the loss function consists of two parts. The first part uses focal loss [20] for the confidence output (Eq. 1) and the second uses the mean-squared error for the regression output (Eq. 2).

The use of focal loss allows more control over the landscape of the loss function by exponentially increasing the loss contribution of “difficult” samples and decreasing that of “easy” samples. This is controlled by the $\gamma$ hyper-parameter. Additionally, an $\alpha$ weighting factor is used for positive samples. The choice of the focal loss hyper-parameters $\gamma$ and $\alpha$ is explored in Section 3.2.2.


The confidence ground truth is 1 for origins that lie within a given radius of a point source (the positive-labelling radius), and 0 if there is no point source close enough. The regression ground truth is the normalised offset from the origin to the nearest point source and is only calculated if the confidence ground truth is 1.

Two flag values, each either 1 or 0, control which origins contribute to each part of the loss function. The confidence flag is 1 only for origins whose confidence ground truth is 1, or whose nearest point source is further away than a second radius (the negative-labelling radius). The regression flag is 1 only for origins whose confidence ground truth is 1.

The flags allow only origins that are sufficiently close to a point source to contribute to the regression loss, and only origins that are either sufficiently close or sufficiently far to contribute to the confidence loss. Origins whose distance to the nearest point source falls between the two radii will not contribute to the loss function.

The confidence and regression losses are combined as in Eq. 3. Following the technique used by the RPN, two balancing coefficients are introduced to ensure that both parts have an equal weighting. The Faster R-CNN authors [15] recommend simple fixed values for these coefficients. These recommended values do not perform any balancing, but will instead result in the loss values being averaged across the batch.
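The structure of the combined loss can be sketched as follows. This is an illustration under stated assumptions rather than the exact form of Eqs. 1 to 3: the function name `ppn_loss`, the argument names, the choice to apply the $\alpha$ weight to positive samples only, and the normalisers `n_conf` and `n_reg` are all hypothetical. The focal term follows the general form given in [20].

```python
import math


def ppn_loss(conf_pred, conf_true, reg_pred, reg_true, conf_flag, reg_flag,
             gamma=0.0, alpha=1.0, n_conf=1.0, n_reg=1.0, eps=1e-7):
    """Flag-masked focal loss plus squared-error regression loss (sketch).

    Every argument up to `reg_flag` is a flat sequence with one entry per
    origin; `reg_pred` and `reg_true` hold (dx, dy) offset tuples.
    `n_conf` and `n_reg` stand in for the balancing coefficients (assumed).
    """
    conf_loss = 0.0
    reg_loss = 0.0
    for p, y, (dx, dy), (tx, ty), fc, fr in zip(
            conf_pred, conf_true, reg_pred, reg_true, conf_flag, reg_flag):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        if fc:
            if y == 1:
                # Positive sample: alpha-weighted, down-weighted when p is high.
                conf_loss += -alpha * (1.0 - p) ** gamma * math.log(p)
            else:
                # Negative sample: down-weighted when p is already low.
                conf_loss += -(p ** gamma) * math.log(1.0 - p)
        if fr:
            reg_loss += (dx - tx) ** 2 + (dy - ty) ** 2
    return conf_loss / n_conf + reg_loss / n_reg


# One positive origin with a confidence of 0.5 and an offset error of 0.5.
single = ppn_loss([0.5], [1], [(0.0, 0.0)], [(0.5, 0.0)], [1], [1])
```

With `gamma=0` the confidence term reduces to a weighted binary cross-entropy, and an origin whose flags are both 0 contributes nothing to either part.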


For this paper, all training was done using the Adam optimiser [21] with a dropout rate [22] of 20% in all layers. A training set of 8192 simulated image patches (see Section 4.1) is used with a batch size of 128. Additionally, validation and testing sets of 512 image patches each are used to validate the model. During training, the model with the lowest validation loss value is saved and recalled at the end before testing.

3.1.3 Inferencing

The PPN input has a fixed shape, so all images given to the network must have the same shape. It is impractical to scale an arbitrary image to the required size, as this would cause a lot of information to be lost. To allow use on differently sized images, the input image is segmented into smaller patches, which may optionally overlap. The patches are then individually passed through the network to obtain the output proposals for each patch.
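A minimal sketch of the patching step is given below. The helper names `patch_starts` and `split_into_patches` are hypothetical, and shifting the final patch back so that it ends on the image border (instead of padding the image) is an assumed design choice, not necessarily what the PPN implementation does.

```python
def patch_starts(image_size, patch_size, overlap=0):
    """Start coordinates of patches along one image axis (sketch)."""
    if not 0 <= overlap < patch_size:
        raise ValueError("overlap must be non-negative and smaller than patch_size")
    stride = patch_size - overlap
    starts = list(range(0, max(image_size - patch_size, 0) + 1, stride))
    # Add one final patch that ends exactly on the image border if needed.
    if starts[-1] + patch_size < image_size:
        starts.append(image_size - patch_size)
    return starts


def split_into_patches(height, width, patch_size, overlap=0):
    """Top-left (row, col) coordinates of every patch covering the image."""
    return [(y, x)
            for y in patch_starts(height, patch_size, overlap)
            for x in patch_starts(width, patch_size, overlap)]


corners = split_into_patches(8, 8, patch_size=4)
```

Each returned coordinate is the patch origin that must later be added back to the proposals made within that patch. An image smaller than the patch size would still need padding, which this sketch does not handle.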

The regression output is in the form of normalised offsets, which must first be denormalised back into pixel coordinates. Next, the positions of each individual origin must be added to the regression offsets, which results in positions within the patch. To obtain positions within the original image, the patch’s origin must also be added to all the positions. These are all matrix additions, which can be efficiently computed on a GPU and are incorporated directly into the PPN model.
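The three additions described above can be sketched in plain Python. The function name `decode_proposals` is hypothetical, and the sketch assumes that origins sit at the centres of equally sized grid cells within a square patch; the actual placement of origins in PPN may differ.

```python
def decode_proposals(offsets, confidences, patch_size, patch_origin=(0, 0)):
    """Convert normalised offsets to image pixel positions (sketch).

    offsets[i][j] is the (dx, dy) tuple of the origin in row i, column j,
    in units of the spacing between neighbouring origins.
    Returns a list of (x, y, confidence) tuples.
    """
    n_rows, n_cols = len(offsets), len(offsets[0])
    step_y = patch_size / n_rows  # pixel spacing between origin rows
    step_x = patch_size / n_cols  # pixel spacing between origin columns
    origin_y, origin_x = patch_origin
    proposals = []
    for i in range(n_rows):
        for j in range(n_cols):
            dx, dy = offsets[i][j]
            # Denormalise, add the origin's position, then the patch's origin.
            x = origin_x + (j + 0.5) * step_x + dx * step_x
            y = origin_y + (i + 0.5) * step_y + dy * step_y
            proposals.append((x, y, confidences[i][j]))
    return proposals


zero = [[(0.0, 0.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)]]
conf = [[0.9, 0.1], [0.2, 0.3]]
decoded = decode_proposals(zero, conf, patch_size=8)
```

Under these assumptions an offset of 1 in the horizontal direction lands exactly on the neighbouring origin, which matches the normalisation described in Section 3.1.1.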

A problem that arises with PPN is that a large number of duplicate proposals are made. Multiple origins may propose the same point, and duplicates may also arise as a result of patching (whereby points at the edges of overlapping patches are proposed in both patches). Duplicate removal is performed on the combined proposals for the entire image using a form of Non-Maximum Suppression (NMS). In short, the algorithm takes the most confident point and removes all points within a suppression radius of that point. During this process, points with a confidence score below a confidence threshold are also removed. The procedure is presented in the following algorithm.

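  remove all points with a confidence score below the confidence threshold
  while unprocessed points remain do
    select the most confident unprocessed point and keep it
    remove all other points within the suppression radius of the selected point
  end while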
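The duplicate removal described in this section can be sketched as a short greedy routine. The function name `suppress_duplicates` and its parameter names are hypothetical, and the quadratic-time comparison is chosen for clarity rather than speed.

```python
def suppress_duplicates(points, radius, min_confidence):
    """Greedy point-based non-maximum suppression (sketch).

    `points` is a list of (x, y, confidence) tuples. Points are visited from
    most to least confident; a point is kept only if no already-kept point
    lies within `radius` of it.
    """
    candidates = sorted(
        (p for p in points if p[2] >= min_confidence),
        key=lambda p: p[2],
        reverse=True,
    )
    kept = []
    radius_sq = radius * radius
    for x, y, conf in candidates:
        if all((x - kx) ** 2 + (y - ky) ** 2 > radius_sq for kx, ky, _ in kept):
            kept.append((x, y, conf))
    return kept


proposals = [(0, 0, 0.9), (0.5, 0, 0.8), (5, 5, 0.7), (9, 9, 0.1)]
unique = suppress_duplicates(proposals, radius=1.0, min_confidence=0.5)
```

Visiting points in order of decreasing confidence is equivalent to repeatedly taking the most confident remaining point and discarding its neighbours. In the example, the second proposal is suppressed by the first and the last is dropped for low confidence.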

3.1.4 Performance Metrics

In order to measure the performance of PPN, multiple metrics are taken into account. Precision and recall metrics are primarily used, defined in Eq. (4). Precision is the ratio of the correct detections to the total number of detections made, measuring how reliable the results are. Recall is the ratio of correct detections to the total number of point sources in the image, measuring how complete the results are.


A prediction will be considered a true positive when it is within a given matching radius of the ground truth centre of a point source. Additionally, if either a ground truth or a prediction has been counted, it is excluded from further checks. This ensures that when two or more predictions are near a single source, only one will count as a true positive. Conversely, if two or more true sources surround a single prediction, only one match will be counted as a true positive.

Note that the value of the matching radius does not directly impact the performance of PPN, but rather determines how harsh the performance metrics are. The matching radius is therefore fixed to 0.4 for the process of hyper-parameter tuning and for all recall measurements. In Section 4, the precision is evaluated at different values of the matching radius, which shows how close the inferred sources are to the true centres.

In addition to precision and recall, the F1 score is used as defined in Eq. (5). This provides a balanced combination of both precision and recall and is mostly used in this study to compare different configurations of hyper-parameters to choose the best balanced model. In practice, however, it may be beneficial to consider either recall or precision to be more important.
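The metrics of this section can be sketched together with the one-to-one matching rule. The function name `match_and_score` is hypothetical, and matching each prediction to its nearest unmatched ground truth is an assumed tie-breaking rule; the standard definitions of precision, recall and the F1 score (the harmonic mean of the two) are used.

```python
import math


def match_and_score(predictions, truths, radius):
    """Return (precision, recall, f1) for point predictions (sketch).

    `predictions` and `truths` are lists of (x, y) positions. Each ground
    truth and each prediction can take part in at most one match.
    """
    unmatched = list(truths)
    true_positives = 0
    for px, py in predictions:
        best_index, best_distance = None, radius
        for index, (tx, ty) in enumerate(unmatched):
            distance = math.hypot(px - tx, py - ty)
            if distance <= best_distance:
                best_index, best_distance = index, distance
        if best_index is not None:
            unmatched.pop(best_index)  # this ground truth is now used up
            true_positives += 1
    false_positives = len(predictions) - true_positives
    false_negatives = len(truths) - true_positives
    precision = true_positives / (true_positives + false_positives) if predictions else 0.0
    recall = true_positives / (true_positives + false_negatives) if truths else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


scores = match_and_score([(0, 0), (0.1, 0), (5, 5)], [(0, 0), (9, 9)], radius=0.4)
```

In the example, the second prediction is a duplicate of an already matched source and therefore counts as a false positive, giving one true positive, two false positives and one false negative.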


3.2 Hyper-Parameter Selection

Hyper-parameter selection is important to maximise the performance of PPN. In order to choose good values for hyper-parameters, a manual search of different values was performed while attempting to maximise the precision and recall for images in the validation set. This section first discusses the selection of base layers in Section 3.2.1, then explores the values of all hyper-parameters used during both training and inferencing in Sections 3.2.2 and 3.2.3.

3.2.1 Base Architecture

Three different architectures were initially evaluated during base layer selection. The first is a 13-layer VGGNet variant, which is similar to VGG16 [23] but with the fully-connected layers removed. The second architecture is a 17-layer ResNet architecture. The third and final architecture is a custom 51-layer “pyramid” model, which builds on the premise that down-sampling should be minimised to prevent information loss. To this end, the architecture relies on the reduction in feature map size that occurs when no padding is used in convolutional layers to funnel down to the desired size.

The results showed that the ResNet architecture is the best choice as it acquires a higher precision than VGGNet, though both are strong contenders for recall. The pyramid proved difficult to train and has poor performance in comparison to the other architectures. The ResNet architecture was explored further, with different sizes being tested. The results of testing a 9-layer up to a 61-layer ResNet are shown in Fig. 3.

Fig. 3: The scores and inference speeds of PPN when using different sized ResNet architectures as base layers. Inference speeds were measured on images.

The graph shows the evaluation speed versus the F1 score of the different architectures. These results show that there are only marginal improvements in the F1 score up to the 31-layer ResNet, with the score becoming worse for the 61-layer architecture. It may be beneficial to instead use the smaller architectures, since even the 9-layer architecture is able to perform well, but ResNet-31 was chosen for the purposes of this paper as it has the highest F1 score, however marginal the difference.

3.2.2 Focal Loss Parameters

The performance of focal loss [20] as shown in Eq. (1) is sensitive to the values of $\gamma$ and $\alpha$, so different values were explored in combination. Following the recommendation of the focal loss authors, the biases of the final layer are initialised to $b = -\log((1 - \pi) / \pi)$ when using focal loss, where $\pi$ is a small prior probability. This forces the confidence matrix to initially provide a confidence roughly equal to $\pi$, which limits the loss contribution of the many easy negative samples during early training.
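The effect of this bias initialisation can be checked numerically. The helper names `prior_bias` and `sigmoid` are hypothetical, and the prior of 0.01 is the example value from the focal loss paper [20]; whether the same value was used for PPN is an assumption.

```python
import math


def prior_bias(prior):
    """Bias that makes a sigmoid output equal to `prior` for zero input."""
    return -math.log((1.0 - prior) / prior)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


prior = 0.01  # assumed example value, taken from the focal loss paper
bias = prior_bias(prior)
initial_confidence = sigmoid(bias)
```

With zero-valued weights the final layer outputs only its bias, so every origin starts with a confidence equal to the prior instead of 0.5.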

The measured performance of different models is shown in Table I. In the absence of focal loss ($\gamma = 0$), precision worsens as $\alpha$ increases, with an improvement in recall. This is expected as the positive samples carry more weight, causing the network to learn to identify more difficult samples while also causing it to misidentify more noise. This pattern, however, disappears when $\gamma > 0$.

Increasing tends to decrease recall, which is especially evident when comparing the results of to those of . The precision also decreases slightly as is increased, though the value increases as increases. The shortcoming of focal loss in this case may be attributed to the “difficult” samples of this problem being particularly difficult; the faint sources are effectively indistinguishable from the background noise. The focal loss puts less focus on the easier samples while still being unable to correctly detect the difficult ones, leading to an overall worse recall.

As a result, the values of $\gamma$ and $\alpha$ that maximise the F1 score are chosen. With the chosen values, the loss function is equivalent to a normal weighted binary cross-entropy.

γ α Precision Recall F1
0.0 0.5
0.0 1.0
0.0 2.0
1.0 0.5
1.0 1.0
1.0 2.0
2.0 0.5
2.0 1.0
2.0 2.0
TABLE I: Precision, Recall and F1 Scores of PPN for Different Focal Loss Parameters

3.2.3 Other Parameters

The chosen values of the remaining parameters are shown in Table II. The search for values was in no way exhaustive, focussing mainly on the value of . An exhaustive search quickly becomes infeasible when considering the number of hyper-parameters and the time required to train the CNN. However, the selected hyper-parameters are sufficiently well-tuned to demonstrate the effectiveness of PPN.

Parameter Value
Input shape
Origin shape
TABLE II: Chosen Values of PPN Hyper-Parameters

The input and origin shapes were chosen such that they are compatible with most existing architectures: the input shape is a common one, and a feature map of the origin shape is easy to reach from it. If denser images are expected, the origin shape should be adjusted instead, as it directly affects the number of proposals PPN is able to make. Increasing the input shape may lead to a small increase in speed (as fewer patches and thus fewer evaluations are required), but will require more memory to store the model.

The radius within which an origin is labelled positive was chosen as the minimal radius that is able to cover the entire image, which is the distance from an origin to the centre of a group of four origins. This ensures that no point source is unreachable when constructing the ground truth regression values, while also reducing the number of duplicate proposals.
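The geometric argument can be verified with a short calculation. The sketch assumes that origins lie on a regular square grid and that distances are measured in units of the origin spacing; the function names are hypothetical.

```python
import math


def covering_radius(spacing=1.0):
    """Distance from an origin to the centre of a square of four origins."""
    return math.hypot(spacing / 2.0, spacing / 2.0)


def distance_to_nearest_origin(x, y, spacing=1.0):
    """Distance from (x, y) to the closest origin on a square grid."""
    nearest_x = round(x / spacing) * spacing
    nearest_y = round(y / spacing) * spacing
    return math.hypot(x - nearest_x, y - nearest_y)


# Sample positions inside one grid cell and record the worst case.
worst = max(distance_to_nearest_origin(i / 20, j / 20)
            for i in range(21) for j in range(21))
```

The worst case occurs at the centre of the cell and equals half the cell diagonal, roughly 0.707 of the origin spacing, so any smaller radius would leave positions that no origin can reach.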

The radius beyond which an origin is labelled negative was minimally experimented with. When it is larger than the positive-labelling radius, the range in which origins do not contribute to the loss function very clearly affects the results. As the gap increases (i.e. more origins are ignored), the number of false positives increases, which drastically reduces precision. Therefore, the two radii were set equal, which produced the best results.

The value of the confidence threshold was shown to have very little impact on the results. The specificity of the confidence values is very high, meaning that most guesses provide either a low or a high probability.

Finally, the impact of the NMS radius was iteratively explored and shows a strong influence on the balancing of precision and recall. Its effect on precision and recall values is illustrated in Fig. 4. As the radius increases (therefore removing more points during NMS), the recall value lowers, but the precision increases. This is a result of removing more duplicates, but also removing some true positives. Conversely, decreasing the radius increases the recall and decreases the precision. Again, this is a result of including more results, and introducing more false positives.

Fig. 4: The precision, recall and scores for PPN when performing NMS with different values of . The score is maximised at .

The NMS radius is an interesting parameter to consider when using PPN in practice, given the strong influence it has on the precision and recall values. It can therefore be used to steer the results towards whichever score is to be maximised. For the purposes of this paper, the radius that maximises the F1 score was used.

4 Experimental Outcomes

This section first describes the method used to generate simulated images in Section 4.1. Sections 4.2 and 4.3 then respectively compare the accuracy and speed of PPN to that of DeepSource based on different experiments.

4.1 Image Simulation

The true sky can never be known; in order to be used for training, images need to have all sources labelled, including those that humans may have missed. This section provides information on how radio survey images are simulated in order to obtain a sufficiently large, labelled data set.

Parameter Value
Synthesis time 1 h
Integration time 5 min
Centre frequency 1.4 GHz
Channel width 0.5 MHz
Number of channels 10
System equivalent flux density 450 Jy
Pixel scale
Clean iterations 10000
Weighting Briggs 1.5
Number of point sources 120-150
Image size
TABLE III: Parameters Used for Simulation of MeerKAT Survey Images

Images were simulated with the Stimela package (https://github.com/SpheMakh/stimela) using the parameters provided in Table III, which are in part similar to those used by Sadr et al. [5]. The parameters simulate observations imaged by the MeerKAT telescope, and have different field centres sampled uniformly at random from a selected range.

To simulate an image, an empty measurement set which contains the visibilities of an interferometer is first created. Visibilities are measurements that represent Fourier components of the sky brightness distribution. Next, it is filled with the expected thermal noise from the MeerKAT setup listed in Table III and then imaged. This creates a “noise” image that contains no point sources, from which the background root mean square (rms; equivalent to the standard deviation) can easily be calculated.

With the background rms calculated, the point sources are generated and added to the image through the Tigger-LSM software (https://github.com/ska-sa/tigger-lsm). Point sources are (uniformly) randomly distributed across the image, allowing sources to overlap. The peak flux of the sources is a controlled variable, and is calculated as a multiple of the background rms. The peak flux is allocated by separating the sources into 30 equal bins, where the sources in each bin will have an equal flux. This provides 30 discrete values for flux. Examples of generated images are shown in Fig. 5.

For use in both PPN and DeepSource, sky coordinates are converted to pixel coordinates to obtain the ground truth locations of all point sources. The flux is linearly scaled across the image such that all values fall within a fixed range. For PPN, images are then separated into patches with an overlap of 4 pixels.

Fig. 5: Examples of simulated MeerKAT images. The images were simulated at two corners of the selected field centre range, as well as in the centre of the range. The ground truth positions of point sources are also shown. Note that these images are cropped sections from the full images.

4.2 Precision and Recall

This section compares the precision and recall (as described in Section 3.1.4) of PPN to DeepSource. The recall for sources in each bin (i.e. at different flux levels) is calculated individually, which shows the performance of the techniques at different noise levels. The precision is measured across all sources at different values of the matching radius.

For DeepSource, 20 thresholds from 0.1 to 1.0 with logarithmic intervals are used, with . After identifying islands of pixels, those with a pixel surface area of less than 3 pixels are removed. The centres of the sources are taken as the mean position of the pixels in the island. These hyper-parameters provided maximum recall for sources with in the validation set.
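The threshold schedule can be illustrated with a small helper. The function name `log_spaced` is hypothetical; it simply places values at equal ratios between the two endpoints, which is what `numpy.geomspace(0.1, 1.0, 20)` would also return.

```python
def log_spaced(start, stop, count):
    """Return `count` values from start to stop at equal logarithmic intervals."""
    ratio = (stop / start) ** (1.0 / (count - 1))
    return [start * ratio ** i for i in range(count)]


thresholds = log_spaced(0.1, 1.0, 20)
```

Consecutive thresholds differ by a constant factor of about 1.13, so the lower end of the range is sampled more densely than the upper end.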

50 new images were created for testing the final model. Together, the images contain a total of 6350 sources, providing just over 200 sources in each of the 30 flux level bins. Fig. 6 shows examples of output produced by both PPN and DeepSource. The measured precision and recall scores are shown graphically in Fig. 7 and listed in Table IV. The listing contains only partial results for the sake of brevity.

Fig. 7(b) shows that both PPN and DeepSource perform well at the higher flux levels, with the recall worsening as the flux decreases and becoming poor at the lowest levels. Both methods recall more than 80% of sources above a certain flux level; however, PPN maintains a higher recall at all levels except the very lowest.

The precision in Fig. 7(a) shows excellent results for both methods, although DeepSource dominates PPN in this regard. The graph shows that DeepSource in general has a higher precision than PPN, and also detects the source centres closer to the ground truth centre. This is indicated by the slower decline at lower values of the matching radius.

One might wonder how larger values of the NMS radius (explored in Section 3.2.3) affect both the recall and precision of the final outcomes. To this end, PPN was additionally evaluated with a larger NMS radius, shown as a separate PPN entry in both Fig. 7 and Table IV. The results show a higher precision and a recall closer to what DeepSource achieves. The precision still drops at lower matching radii, since the regression is no more accurate than before, but the overall precision goes up as more false positives are removed.

These experiments show that PPN is capable of proposing point sources fairly accurately. The recall is comparable to that of DeepSource; however, the precision is worse due to the inaccuracy of regression. When reliability is the most important aspect, PPN should not be the first choice. However, the next experiment shows the trade-off that could be made in favour of speed, which will be necessary for SKA pathfinders as the survey image sizes increase.

Fig. 6: Example outputs produced by DeepSource and PPN, shown alongside the original image and the ground truth. The precision of DeepSource is evident in these examples as the detected point centres are almost always right on the true centre. PPN is in most cases at least on the point source, but is further away from the true centres. A few shortcomings can be seen here: DeepSource tends to miss more sources, which leads to its lower recall, and PPN suffers from bad precision in dense clusters, as can be seen in image (b). Note that these images are cropped sections from the full images.

Fig. 7: Precision and recall scores measured for PPN and DeepSource. Fig. 7(a) shows the precision of each method (across all point sources) as a function of the matching radius. This representation of the precision indicates how close to the centres the predictions tend to be, as a smaller matching radius will only consider closer predictions to be true. Fig. 7(b) shows the recall as a function of point source flux. Figs. 7(c) and 7(d) show the same measurements for PPN when evaluated with the larger NMS radius.

TABLE IV: Precision and Recall Scores Measured for DeepSource and PPN
DeepSource PPN PPN
DeepSource PPN PPN

4.3 Evaluation Speed

The poor scaling ability of point source detectors is a large motivation for the development of PPN. This section breaks down the inference time (i.e. compute time) of both PPN and DeepSource444All experiments were performed on the Ilifu infrastructure using a single Nvidia Tesla P100 12GB.. All time measurements are in wall-clock seconds.

For the experiment, images from size up to were generated, with each category being 1024 pixels larger on both dimensions than the previous. This results in 16 size categories, for which 50 images were generated each. The number of point sources are also increased as to maintain a similar density of sources across the image surface.

Timing starts once the model has been created and the image already resides in memory. The recorded times therefore exclude the time to create the model, as this need only be done once, and the time to load the image into main memory, as this affects both techniques equally. The times do, however, include the copying of data to and from the GPU.
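
A minimal sketch of such a timing protocol is given below. It assumes the model and image are prepared before the call, so that only the processing stages are timed. The function name `time_stages`, the stage names and the `repeats` parameter are hypothetical and are not taken from the authors' code.

```python
import time


def time_stages(stages, image, repeats=3):
    """Return the mean wall-clock seconds spent in each stage.

    `stages` is a list of (name, function) pairs that are applied in order,
    each receiving the output of the previous one. Model creation and image
    loading are expected to happen before this call, so they are not timed.
    """
    totals = {name: 0.0 for name, _ in stages}
    for _ in range(repeats):
        data = image
        for name, fn in stages:
            start = time.perf_counter()
            data = fn(data)
            totals[name] += time.perf_counter() - start
    means = {name: total / repeats for name, total in totals.items()}
    means["total"] = sum(means.values())
    return means


# Placeholder stages standing in for patching, CNN evaluation and NMS.
demo_stages = [
    ("patching", lambda img: [img]),
    ("cnn", lambda patches: [sum(p) for p in patches]),
    ("nms", lambda scores: scores),
]
demo_times = time_stages(demo_stages, [1.0, 2.0, 3.0])
```

When a stage runs on a GPU, many frameworks launch work asynchronously, so a device synchronisation is typically needed before reading the clock. That call is framework-specific and is omitted from this sketch.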

A modification of DeepSource was necessary to cope with memory constraints. DeepSource is defined to handle one image at a time, but the CNN model grows drastically as the images become larger. For this reason, the model was limited to dealing with images. For larger images, the image was patched in the same manner as for PPN. The patches are fed individually through the CNN and the results are integrated back, with overlapping sections being averaged.
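
The patch-and-reintegrate step can be sketched as follows, assuming NumPy is available and that the per-patch model returns an output with the same shape as its input. The patch size, the overlap and the function names are hypothetical; the values actually used in the experiments are not stated in this section.

```python
import numpy as np


def patch_starts(length, patch, stride):
    """Start offsets of patches of size `patch` that cover [0, length)."""
    if length <= patch:
        return [0]
    starts = list(range(0, length - patch, stride))
    starts.append(length - patch)  # last patch is aligned to the image edge
    return starts


def apply_patched(image, model, patch=256, overlap=32):
    """Apply `model` to overlapping patches and average the overlaps.

    `model` is assumed to map a 2-D array to a 2-D array of the same shape.
    `overlap` must be smaller than `patch` so that the stride is positive.
    """
    height, width = image.shape
    stride = patch - overlap
    total = np.zeros((height, width), dtype=np.float64)
    count = np.zeros((height, width), dtype=np.float64)
    for y in patch_starts(height, patch, stride):
        for x in patch_starts(width, patch, stride):
            tile = image[y:y + patch, x:x + patch]
            total[y:y + patch, x:x + patch] += model(tile)
            count[y:y + patch, x:x + patch] += 1.0
    # Every pixel is covered by at least one patch, so count is never zero.
    return total / count
```

Keeping a separate `count` array makes the averaging independent of how many patches happen to overlap at a given pixel, which matters near the image edge where the last patch is shifted inwards.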

The individual steps of each technique are recorded separately to provide a detailed breakdown of where computation is spent. For PPN, these include the patching of the original image, the evaluation of the CNN, and non-maximum suppression for removal of duplicates. For DeepSource, these include the evaluation of the CNN, thresholded blob detection (TBD), and also additional patching and reintegration of the image as explained above.
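
The duplicate-removal step for PPN can be illustrated with a distance-based form of non-maximum suppression. The suppression criterion used by the authors is not given in this section, so the fixed `radius` and the `(x, y, score)` tuple layout below are assumptions.

```python
import math


def point_nms(detections, radius):
    """Keep only the highest-scoring detection in each neighbourhood.

    `detections` is a list of (x, y, score) tuples. Detections are visited
    in order of decreasing score, and one is kept only if no already-kept
    detection lies within `radius` pixels of it.
    """
    kept = []
    for x, y, score in sorted(detections, key=lambda d: d[2], reverse=True):
        if all(math.hypot(x - kx, y - ky) > radius for kx, ky, _ in kept):
            kept.append((x, y, score))
    return kept
```

This simple version compares each candidate with every kept detection. For large images, a spatial grid or tree would probably be needed to keep the step cheap, which is in line with the observation that the NMS share of the compute time grows with image size.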

It is difficult to measure a precise time for TBD, as the number of iterations may vary between images based on and the number of thresholds used. Additionally, TBD iterations may be parallelised, providing some speed-up. For this reason, only a single iteration of TBD is performed at . This decision can be seen as measuring a lower bound estimate of the DeepSource implementation’s computational time.
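
A single iteration of thresholded blob detection can be sketched as a flood fill over all pixels above one threshold. This is an illustration of the general technique rather than the DeepSource implementation: the 4-connectivity, the unweighted centroid and the function name are assumptions.

```python
from collections import deque


def blob_centres(image, threshold):
    """Return the (row, col) centroid of each 4-connected blob above `threshold`.

    `image` is a list of equal-length rows of numbers.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    centres = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Breadth-first flood fill starting from this seed pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                pr, pc = queue.popleft()
                pixels.append((pr, pc))
                for nr, nc in ((pr - 1, pc), (pr + 1, pc), (pr, pc - 1), (pr, pc + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and image[nr][nc] > threshold):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            # Unweighted centroid of the blob's pixels.
            centres.append((sum(p[0] for p in pixels) / len(pixels),
                            sum(p[1] for p in pixels) / len(pixels)))
    return centres
```

Each call visits every pixel of the image once, so repeating the procedure over several thresholds multiplies that cost, which is why a single iteration gives only a lower bound on the total time.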

The measured total inference times are illustrated in Fig. 8. The times measured for the individual steps are listed in Table V, which contains only selected image sizes for the sake of brevity. The results show that PPN is faster than DeepSource for both small and large images. The largest tested images at are processed by PPN in under 20% of the time used by DeepSource.

Fig. 8 suggests that PPN has a time complexity of for an image of size , though more detailed analysis will be necessary to confirm. DeepSource has a slightly worse complexity as a result of the flood-fill algorithm. The step pattern present in the DeepSource timings indicates the effect of the model’s size limit. Every time an image becomes sufficiently large, more individual evaluations of the CNN are required, leading to an increase in the total CNN evaluation time.
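
One way to carry out the more detailed analysis mentioned above is to estimate an empirical scaling exponent from measured timings by fitting a straight line to log(time) against log(side length). The sketch below uses clearly synthetic timings generated from a formula, not measurements from the paper, and the function name is hypothetical.

```python
import math


def scaling_exponent(sizes, times):
    """Least-squares slope of log(time) against log(side length).

    For square images, a slope near 2 would correspond to a run time that
    is proportional to the number of pixels.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic example only: times generated to grow with the pixel count.
synthetic_sizes = [1024 * k for k in range(1, 5)]
synthetic_times = [1e-7 * s * s for s in synthetic_sizes]
synthetic_slope = scaling_exponent(synthetic_sizes, synthetic_times)
```

A fit of this kind is sensitive to fixed overheads at small sizes and to step changes such as those caused by the model's size limit, so the exponent is best read as an indication rather than a proof of complexity.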

Fig. 9(a) shows a breakdown of the PPN steps. The CNN evaluation takes up the majority of the computation time, as can be expected given the depth of the CNN and the simplicity of the other steps involved. As the images become larger, the fraction of time spent on patching and NMS grows, though the majority of the time is still spent on CNN evaluation.

Fig. 9(b) shows a breakdown of the DeepSource steps. Here the CNN evaluation time is less than that of the single TBD iteration. This emphasises the poor complexity of the flood-fill algorithm, the effects of which will worsen with more TBD iterations. The time spent patching the image is longer than the patching performed by PPN, due to the reintegration step after CNN evaluation.

PPN is able to process images faster than the current state-of-the-art machine-learning-based approach. The performance gain becomes more prominent as the images become larger, making PPN the more scalable approach.

Fig. 8: The measured total compute times for PPN and DeepSource at different image sizes. The figure plots the square root of the time against the image size, which gives an indication of the per-pixel time complexity.
Fig. 9: A visualisation of the mean compute time compositions for PPN and DeepSource. Fig. 9(a) shows the composition for PPN and Fig. 9(b) for DeepSource. The compositions for the smallest and largest images are also shown in each case.
TABLE V: Measured Evaluation Times (sec) for PPN and DeepSource at Different Image Sizes
PPN columns: Size, Patching, CNN, NMS, Total
DeepSource columns: Size, Patching, CNN, TBD, Total

5 Conclusion

This paper proposed the Point Proposal Network, which is a point source detection technique that uses modern object-detection techniques based on deep CNNs. The technique addresses the slow processing speeds and poor scalability of existing techniques.

PPN was compared to DeepSource, the state-of-the-art machine-learning-based approach to point source detection. Experiments showed that PPN has a similar ability to recall point sources, though its precision is weaker. Further experiments showed that PPN is able to process images much faster than DeepSource and is able to scale to larger images.

There is a lot of room to further develop and investigate the PPN approach. PPN can possibly be used in a larger pipeline, where PPN is first used to estimate where point sources are. The estimates could then be used to produce sub-images on which other methods can be used to produce more reliable results. PPN can also be expanded by including additional characteristics in the regression output. This may include the variances of a Gaussian distribution fitting the source, or even possible classification scores. Many different possibilities can be explored.
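
The pipeline idea above, in which PPN proposals select sub-images for a slower and more reliable method, can be sketched as a simple cut-out step. This is a speculative illustration of the suggested future work: the window size, the `(x, y)` proposal layout and the function name are assumptions.

```python
def cutouts(image, proposals, half_size):
    """Return (x, y, sub_image) for a square window around each proposal.

    `image` is a list of equal-length rows and `proposals` is a list of
    (x, y) positions, with x indexing columns and y indexing rows.
    Windows are clipped at the image border rather than padded.
    """
    rows, cols = len(image), len(image[0])
    result = []
    for x, y in proposals:
        cx, cy = int(round(x)), int(round(y))
        r0, r1 = max(0, cy - half_size), min(rows, cy + half_size + 1)
        c0, c1 = max(0, cx - half_size), min(cols, cx + half_size + 1)
        result.append((x, y, [row[c0:c1] for row in image[r0:r1]]))
    return result
```

Each sub-image could then be passed to a more precise source finder, so that the expensive method only processes regions that the fast detector has flagged.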

The significant source detection challenges that will be faced by the SKA and its pathfinders require a suite of robust, sophisticated and scalable methods to fully extract the scientific value of increasingly deep views of the Universe. PPN could well contribute to that with further testing and development.


The authors would like to thank the Inter-University Institute for Data-Intensive Astronomy (IDIA) for sponsoring the research and providing the computational infrastructure used throughout.


  • [1] R. P. Norris, J. Afonso, D. Bacon, R. Beck, M. Bell, R. J. Beswick, P. Best, S. Bhatnagar, A. Bonafede, G. Brunetti et al., “Radio continuum surveys with square kilometre array pathfinders,” Publications of the Astronomical Society of Australia, vol. 30, 2013.
  • [2] J. K. Banfield, O. I. Wong, K. W. Willett, R. P. Norris, L. Rudnick, S. S. Shabala, B. D. Simmons, C. Snyder, A. Garon, N. Seymour et al., “Radio galaxy zoo: host galaxies and radio morphologies derived from visual inspection,” Monthly Notices of the Royal Astronomical Society, vol. 453, no. 3, pp. 2326–2340, 2015.
  • [3] C. A. Hales, T. Murphy, J. R. Curran, E. Middelberg, B. M. Gaensler, and R. P. Norris, “BLOBCAT: software to catalogue flood-filled blobs in radio images of total intensity and linear polarization,” Monthly Notices of the Royal Astronomical Society, vol. 425, no. 2, pp. 979–996, 2012.
  • [4] N. Mohan and D. Rafferty, “PyBDSF: Python Blob Detection and Source Finder,” Astrophysics Source Code Library, record ascl:1502.007, 2015.
  • [5] A. V. Sadr, E. E. Vos, B. A. Bassett, Z. Hosenie, N. Oozeer, and M. Lochner, “DeepSource: Point Source Detection using Deep Learning,” 2018. [Online]. Available: http://arxiv.org/abs/1807.02701http://dx.doi.org/10.1093/mnras/stz131
  • [6] D. Carbone, H. Garsden, H. Spreeuw, J. Swinbank, A. van der Horst, A. Rowlinson, J. Broderick, E. Rol, C. Law, G. Molenaar, and R. Wijers, “PySE: Software for extracting sources from radio images,” Astronomy and Computing, vol. 23, pp. 92–102, 2018.
  • [7] P. J. Hancock, T. Murphy, B. M. Gaensler, A. Hopkins, and J. R. Curran, “Compact continuum source finding for next generation radio surveys,” Monthly Notices of the Royal Astronomical Society, vol. 422, no. 2, pp. 1812–1824, 2012.
  • [8] M. T. Huynh, A. Hopkins, R. Norris, P. Hancock, T. Murphy, R. Jurek, and M. Whiting, “The Completeness and Reliability of Threshold and False-discovery Rate Source Extraction Algorithms for Compact Continuum Sources,” Publications of the Astronomical Society of Australia, vol. 29, no. 3, pp. 229–243, 2012.
  • [9] A. M. Hopkins, M. T. Whiting, N. Seymour, K. E. Chow, R. P. Norris, L. Bonavera, R. Breton, D. Carbone, C. Ferrari, T. M. O. Franzen et al., “The ASKAP/EMU Source Finding Data Challenge,” Publications of the Astronomical Society of Australia, vol. 32, 2015.
  • [10] E. Bertin and S. Arnouts, “SExtractor: Software for source extraction,” Astronomy and Astrophysics Supplement Series, vol. 117, no. 2, pp. 393–404, 1996.
  • [11] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
  • [12]

    A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in

    Advances in neural information processing systems, 2012, pp. 1097–1105.
  • [13] I. Goodfellow, Y. Bengio, and A. Courville, Deep learning.   MIT press, 2016.
  • [14] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in

    Proceedings of the IEEE conference on computer vision and pattern recognition

    , 2014, pp. 580–587.
  • [15] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Advances in neural information processing systems, 2015, pp. 91–99.
  • [16] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779–788.
  • [17] J. Redmon and A. Farhadi, “YOLO9000: Better, faster, stronger,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 7263–7271.
  • [18] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu, and A. C. Berg, “SSD: Single shot multibox detector,” in European conference on computer vision.   Springer, 2016, pp. 21–37.
  • [19] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2016-December, pp. 770–778, 2016.
  • [20] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, “Focal Loss for Dense Object Detection,” in The IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2980–2988.
  • [21] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” 2014. [Online]. Available: http://arxiv.org/abs/1412.6980
  • [22] N. Srivastava, G. Hinton, and A. Krizhevsky, “Dropout: a simple way to prevent neural networks from overfitting,” The journal of machine learning research, vol. 15, no. 1, pp. 1929–1958, 2014.
  • [23] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” sep 2014. [Online]. Available: http://arxiv.org/abs/1409.1556