Observational astronomy is a favorable field for computer vision applications and is currently experiencing an accelerating uptake of convolutional neural networks (CNNs). These methods have drastically improved various object recognition tasks on natural images, such as object classification and detection (Russakovsky et al., 2015). Just in the past year there have been numerous applications of CNNs in astrophysics, including galaxy shape estimation (Ribli et al., 2019), supernovae detection (Reyes et al., 2018), and radio source morphology classification (Wu et al., 2019). Such progress strongly motivates the adoption of CNNs in star cluster analysis. Furthermore, CNNs perform inference by processing all pixels of an image, which is beneficial for the parameter derivation task as demonstrated by Whitmore et al. (2011), who used pixel-to-pixel variations to infer cluster ages.
In Paper I (Bialopetravičius et al., 2019) we have implemented a CNN-based algorithm to simultaneously derive age, mass, and size of clusters in the low signal-to-noise regime. The algorithm was applied to M31 clusters, cataloged by The Panchromatic Hubble Andromeda Treasury (PHAT) survey. We have found that even when including information from all pixels and using accurate flux calibrations, interstellar extinction still plays a major role in influencing the results of parameter inference.
Numerous previous studies have explored physical parameter inference by taking into account the extinction problem, but were focused on the cases of resolved stellar or integrated cluster photometry. Among them are works by Bridžius et al. (2008), who used analytically integrated stellar luminosities, Fouesneau & Lançon (2010) and de Meulenaer et al. (2013, 2014), who used stochastically sampled stellar luminosities according to the stellar initial mass function (IMF), and SLUG, developed by Krumholz et al. (2015), which is one of the most mature codes in stochastic cluster population simulation and inference.
In this work we extend the CNN architecture proposed in Paper I to allow the inference of a cluster’s interstellar extinction directly from images. With an eye towards automated star cluster detection, we also explore indicators of cluster presence in images. The outputs of the network were modified to infer multiple cluster parameters jointly, which allows the degeneracies between them to be expressed in the outputs of the network, instead of relying on single-point estimates. This is especially useful when visualizing and dealing with age-extinction degeneracies.
We used the M83 galaxy HST survey (Blair et al., 2014), which covers the entire disk of this face-on galaxy in a number of passbands. This allows us to investigate the effects of extinction in a variety of dense and sparse environments. Previous studies of the M83 star cluster population were based on aperture photometry, such as Ryon et al. (2015) covering the whole galactic disk, Bastian et al. (2011) who studied a smaller part of the galaxy in detail, and Harris et al. (2001) covering its central region.
We trained the CNN on realistic mock observations and tested on mock clusters, as well as validated on the aforementioned real cluster catalogs.
We also experimented with the re-normalization of image fluxes for each passband separately when training the network; the results suggest that precise photometric calibrations may not be necessary to derive star cluster parameters. This was done in the vein of Dieleman et al. (2015), where JPEG color images were used to classify galaxies, achieving reliable results. This brings the analysis of astronomical images closer to the methods used on natural images, which rarely have accurate flux calibrations.
The paper is organized as follows. Section 2 provides details about the M83 survey data, the mock cluster bank construction, the added new parameters, and training data preparation. Section 3 describes the proposed CNN and its training methodology. Section 4 presents the results of testing the method on mock as well as validating on real M83 clusters previously studied using integral photometry. Section 5 discusses the CNN parameter inference results in an astrophysical context.
2.1 M83 mosaics
The M83 mosaic project data observed by the HST Wide Field Camera 3 (WFC3) (Blair et al., 2014; Dopita et al., 2010) was obtained from the Mikulski Archive for Space Telescopes (https://archive.stsci.edu/prepds/m83mos/). We use stacked, defect-free mosaic images of 7 WFC3 fields, which are calibrated photometrically (pixel values are in counts per second) and astrometrically (with available world coordinate system information). The details of image processing are provided by Blair et al. (2014).
The mosaics cover the whole extent of the galaxy, from the dense center to its sparse outskirts where stellar background contamination is low. For the analysis we selected wide passband images that cover the whole galaxy without gaps: F336W, F438W, and F814W. All three mosaics are of the same size and in a tangential projection with a common scale (0.04 arcsec/pixel).
In Paper I the M31 images were masked for saturated stars and extended objects in order to prevent unreliable CNN training. The distance to M31 is 785 kpc (McConnachie et al., 2005), whereas M83 lies at 4.5 Mpc (Thim et al., 2003), so only a few saturated stars are visible. Because the area covered by extended objects is negligible in comparison to genuine stellar backgrounds, we decided to skip the masking step altogether and use all of the available mosaic area when selecting backgrounds for artificial clusters.
2.2 Mock cluster generation
Mock clusters were generated with different ages, masses, and sizes, and were affected by various levels of extinction. A fixed metallicity (Hernandez et al., 2019) and a standard extinction law were assumed. To generate a cluster, its parameters were sampled independently of each other either from continuous (for mass and size) or discrete (for age and extinction) ranges. For a cluster to be included in the bank it also had to be brighter than a defined magnitude limit, as discussed below, to create suitable data for network training. Note that we do not perform grid sampling of cluster parameters, in which all possible combinations of their discrete values are generated; this would be computationally expensive and, as further results show, is not necessary for the network to learn cluster features from a limited number of examples.
The age of each cluster was drawn with uniform probability from a logarithmic range with a step of 0.05 dex, which corresponds to 71 discrete ages in the isochrone bank. Mass for each cluster was drawn with uniform probability from a logarithmic range as a floating point number. The age and mass ranges were chosen in order to cover the majority of M83 clusters studied by Bastian et al. (2011). Extinction was drawn with uniform probability from a range of 0 to 3 mag with a step of 0.1 mag, which corresponds to 31 discrete extinctions in the isochrone bank. We define the cluster size as the radius of a circle on the sky enclosing half of the stars of a cluster. The spatial distributions of stars were drawn from the Elson-Fall-Freeman (EFF) (Elson et al., 1987) profile:
$$\mu(r) = \mu_0 \left(1 + \frac{r^2}{a^2}\right)^{-\gamma/2},$$
where $\mu_0$ is the central surface number density, $a$ is the scale parameter, and $\gamma$ is the power-law slope. The parameters $a$ and $\gamma$ were drawn with uniform probability from logarithmic ranges as floating point numbers, such that the resulting cluster size stays within fixed angular limits. These values at the assumed distance of M83 (Thim et al., 2003, 4.5 Mpc) roughly correspond to real cluster sizes in M83 (Bastian et al., 2011).
The stars of the clusters were generated as follows. Given the initial mass of a cluster, star masses were sampled according to the Kroupa (2001) IMF from Padova PARSEC isochrones (http://stev.oapd.inaf.it/cgi-bin/cmd; Bressan et al., 2012, release 1.2S), obtaining the absolute star magnitudes for passbands F336W, F438W, and F814W. Then, the absolute magnitudes were transformed to apparent magnitudes at the distance of M83 (Thim et al., 2003, 4.5 Mpc) and converted to the WFC3 camera counts per second for the three passbands using calibrations provided by Dressel (2012). Finally, the spatial 2D positions of stars were generated by sampling their distances from the cluster’s center according to the EFF profile (with the given profile parameters) and then distributing them symmetrically around the center.
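The position-sampling step can be sketched with inverse-transform sampling. This is only an illustration under stated assumptions: it uses the standard untruncated EFF surface density, proportional to (1 + r²/a²)^(−γ/2) with γ > 2, and a uniform azimuthal angle. The function names and the example values of `a` and `gamma` are hypothetical and are not taken from the original pipeline.

```python
import numpy as np

def eff_half_number_radius(a, gamma):
    """Radius enclosing half of the stars for an untruncated EFF profile (gamma > 2)."""
    return a * np.sqrt(2.0 ** (2.0 / (gamma - 2.0)) - 1.0)

def sample_eff_positions(n_stars, a, gamma, rng):
    """Draw projected (x, y) star positions by inverting the EFF cumulative star count."""
    if gamma <= 2.0:
        raise ValueError("an untruncated EFF profile needs gamma > 2 for a finite star count")
    u = rng.uniform(0.0, 1.0, n_stars)            # u in [0, 1), so 1 - u is never zero
    r = a * np.sqrt((1.0 - u) ** (-2.0 / (gamma - 2.0)) - 1.0)
    phi = rng.uniform(0.0, 2.0 * np.pi, n_stars)  # isotropic around the cluster center
    return r * np.cos(phi), r * np.sin(phi)

rng = np.random.default_rng(0)
x, y = sample_eff_positions(200_000, a=0.1, gamma=3.0, rng=rng)  # a in arcsec (assumed)
r_h_sampled = np.median(np.hypot(x, y))
r_h_expected = eff_half_number_radius(0.1, 3.0)
```

The median sampled radius should agree with the analytic half-number radius, which gives a quick consistency check between the profile parameters and the cluster size definition.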
The GalSim package (Rowe et al., 2015) was used to draw the individual stars of the clusters using TinyTim-generated (http://tinytim.stsci.edu/cgi-bin/tinytimweb.cgi) point spread functions (PSFs) (Krist et al., 2011) for each of the three passbands. Every star in the cluster was drawn separately for each passband using the appropriate PSF scaled by the star’s flux in counts per second. For a single cluster this produces three images, which can then be visualized as either RGB pictures or given to a CNN as 3D (width × height × passband) arrays. Artificial clusters were then placed on backgrounds cut from the M83 mosaics. See Figs. 1 and 2 for examples of the generated mock clusters.
To explore the photometric properties of the cluster bank, we show integrated color-color and color-magnitude diagrams in Fig. 3. The magnitudes depicted were obtained solely from integrating the total flux of mock clusters and therefore are an idealized case, which does not take into account the variations of background and spatial positions of stars. The only source of stochastic effects in such a case is IMF sampling. Panels are dedicated to illustrate the influence of age, extinction, and mass present in the bank. The effects of these parameters are in different directions in the color-color and color-magnitude space. The oldest clusters are red (panel a) and low-luminosity (panel e) objects. Clusters with high extinction are reddened (panel b), and the lowest mass clusters are faintest (panel g).
The last column (Fig. 3, panels d and h) shows distributions of star clusters filtered by mass (as specified on the color bar on top) and by extinction. The simple stellar population (SSP) tracks centered on the specified masses are shown as black curves. In both color-color and color-magnitude space, it can be seen that lower mass clusters are more widely distributed due to the stochastic IMF sampling. The effects of mass on cluster magnitude can be seen again in Fig. 3h as vertical shifts of the SSP tracks.
This means that a point in color-color and color-magnitude space cannot be uniquely mapped to a point in cluster parameter space. This is worsened by stochastic IMF sampling effects and results in degeneracies that any parameter inference method has to deal with. In cases like this any additional sources of information, such as individual image pixel values, are welcome.
Objects fainter than the adopted magnitude limits in the three passbands were not included in the final cluster bank due to their low signal, to mimic the age/mass/extinction selection effects existing in magnitude-limited real cluster samples. As adding these extremely faint clusters to real backgrounds would result in mock images that are below the detection limit, the CNN would be forced to learn a cluster’s parameters on what is effectively just a plain background image. Therefore, the applied magnitude cuts are necessary to provide the CNN with a balanced dataset. For the F814W band this is illustrated by the shaded gray area in Fig. 3. See the lower-left corner of panel d in Figs. 1 and 2 for examples of such barely visible clusters.
2.3 Mock cluster properties
Samples of artificial clusters were generated with the described parameters and placed on real backgrounds of M83. In order to realistically model photon noise the following steps were applied. A cutout image of an M83 background from a random position in the mosaics is selected and its median value is determined. This median is then added to the image of an artificial cluster, multiplied by the exposure time to get photon counts, and then each pixel is sampled from a Poisson distribution, with its mean set to the value of the pixel. The median is then subtracted back from this image, the real background image is added and photon counts are transformed back to counts per second.
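The noise-injection steps above can be sketched as follows. This is a minimal illustration rather than the original implementation: the function name, the toy images, and the exposure time are hypothetical, and clipping negative Poisson means to zero is an added assumption for backgrounds with a negative median.

```python
import numpy as np

def place_cluster_on_background(cluster_cps, background_cps, exposure_s, rng):
    """Combine a noiseless mock cluster with a real background cutout (both in counts/s)."""
    median_cps = np.median(background_cps)
    # Add the background median to the cluster and convert to expected photon counts.
    expected_counts = (cluster_cps + median_cps) * exposure_s
    # Poisson means must be non-negative; the clipping is an assumption of this sketch.
    noisy_counts = rng.poisson(np.clip(expected_counts, 0.0, None)).astype(float)
    # Subtract the median again, add the real background, and return to counts/s.
    noisy_counts -= median_cps * exposure_s
    noisy_counts += background_cps * exposure_s
    return noisy_counts / exposure_s

rng = np.random.default_rng(1)
background = rng.normal(0.05, 0.01, size=(64, 64))  # toy background in counts/s
cluster = np.zeros((64, 64))
cluster[32, 32] = 2.0                               # toy single-star "cluster"
image = place_cluster_on_background(cluster, background, exposure_s=1000.0, rng=rng)
```

Adding the median before sampling means that the cluster's photon noise is drawn at a realistic sky level, while the real background keeps its own, already present, noise.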
We also define a cluster parameter constructed to approximate signal-to-noise, such that higher values are assigned to clusters that stand out relative to their stochastic stellar backgrounds. Its definition combines three quantities: the integral flux of the cluster within its half-number radius, the standard deviation of the background’s pixel values in a 25 pix (1 arcsec) radius aperture, and the number of pixels within an enlarged radius. The enlarged radius is the cluster’s half-number radius increased to account for PSF size, which has the largest effect on the most compact clusters. By construction, the parameter compares the cluster’s mean flux per pixel with the standard deviation of the background it is placed on.
See Figs. 1 and 2 for a variety of values of this parameter, displayed as yellow text in the corner of each image, for clusters with extinction up to 1 mag. See Fig. 4 for samples of clusters covering the full range of extinction (up to 3 mag) used in this study, which illustrate the effect of background crowding on the parameter. It can be seen that its values correlate well with the ability to resolve clusters by eye, the best tool for cluster detection to date.
Note that for real clusters it is not possible to infer the properties of the background covered by the cluster’s light. However, by placing mock objects onto backgrounds, we can compute this parameter beforehand and train the network to infer it from real observations.
2.4 Training data preparation
To minimize the influence of photometric image calibration accuracy, the counts per second of each passband of a cluster’s image were individually normalized to a mean of 0 and a standard deviation of 1. They were then rescaled with the arcsinh function. The resulting images were 64 × 64 pixels in size, which corresponds to about 2.6 × 2.6 arcsec, or roughly 56 × 56 pc at the distance of M83 (Thim et al., 2003, 4.5 Mpc). Examples of the generated clusters with different ages, masses, and sizes, and without extinction, covering most of the parameter space, are shown in Fig. 1. A series of different examples (star position and mass sampling), but with extinction applied, are shown in Fig. 2. We generated 50,000 such images of mock clusters as a training sample for the CNN. The backgrounds were also precomputed for efficiency, resulting in 80,000 cutouts that were combined with the cluster images.
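A minimal sketch of the per-passband normalization, assuming images are stored as (height, width, passband) arrays; the function name and the toy data are illustrative. The final comparison shows why this step reduces sensitivity to calibration: multiplying any passband by a positive constant leaves the normalized image unchanged.

```python
import numpy as np

def normalize_passbands(image):
    """Standardize each passband of a (height, width, passband) image, then apply arcsinh."""
    mean = image.mean(axis=(0, 1), keepdims=True)
    std = image.std(axis=(0, 1), keepdims=True)
    return np.arcsinh((image - mean) / std)

rng = np.random.default_rng(2)
image = rng.gamma(2.0, 1.0, size=(64, 64, 3))      # toy counts/s in three passbands
rescaled = image * np.array([0.5, 2.0, 10.0])      # mimic per-band calibration offsets
normalized = normalize_passbands(image)
same = np.allclose(normalized, normalize_passbands(rescaled))
```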
3 Convolutional Neural Network
Following the work in Paper I, the ResNet-50 (He et al., 2016) architecture was used as a basis for our CNN. In addition, a series of modifications were made to it in order to accommodate the different survey images, the higher number of predicted parameters, as well as the degeneracies between them. See Figs. 5 and 6 for details on the structure of the modified CNN.
In Paper I we used a method by Dieleman et al. (2015) to rotate the input image multiple times and pass it through the same convolutional layers; to simplify the network we omitted this step. The input image size was decreased to 64 × 64 pixels to account for the smaller angular size of the clusters due to the more distant galaxy. Three input channels were used, corresponding to the F336W, F438W, and F814W passbands.
In Paper I, the cluster’s parameters were predicted via linear output layers by treating it as a regression problem. This meant that each parameter was predicted independently. However, due to age/extinction degeneracies and age/extinction/mass selection effects (shown in Fig. 3) this approach is no longer viable.
Therefore, we predict all of the parameters on a grid, with the positions on it corresponding to the parameter values. This essentially transforms the regression problem into classification, allowing the network to predict each parameter in multiple locations of the parameter space, properly representing some degenerate cases such as low-extinction and old-age being just as likely as high-extinction and young-age.
The network’s outputs are four groups of layers branching out in parallel. The first group predicts age, extinction, and mass; the second, cluster size; the third, the cluster/background class; and the fourth, the cluster signal-to-noise parameter (see bottom of Fig. 5).
Fig. 7 depicts the four output layer activations. We grouped age, extinction, and mass into a single output layer to allow the degeneracies between these parameters to be expressed in the network architecture itself. This was done by predicting them as activations on a 3D grid, with 20 bins for age, 10 for extinction, and 14 for mass. When flattened, this results in a softmax layer with 20 × 10 × 14 = 2800 neurons. A separate group of neurons was used to encode the likelihood of a cluster’s presence in the image. For the remaining parameters, size and signal-to-noise, single-dimensional grids were used.
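To make the flattened 3D output concrete, the sketch below reshapes a 2800-element softmax vector back into an (age, extinction, mass) cube and marginalizes it. The axis order and the row-major flattening are assumptions of this illustration, and the random logits merely stand in for a real network output.

```python
import numpy as np

N_AGE, N_EXT, N_MASS = 20, 10, 14              # bin counts quoted in the text

def to_grid(flat_activations):
    """Reshape a flattened softmax output into an (age, extinction, mass) cube."""
    return np.asarray(flat_activations).reshape(N_AGE, N_EXT, N_MASS)

rng = np.random.default_rng(3)
logits = rng.normal(size=N_AGE * N_EXT * N_MASS)
flat = np.exp(logits - logits.max())
flat /= flat.sum()                             # softmax over all 2800 neurons

cube = to_grid(flat)
age_ext = cube.sum(axis=2)                     # marginalize over mass
age_mass = cube.sum(axis=1)                    # marginalize over extinction
best_bins = np.unravel_index(np.argmax(cube), cube.shape)
```

Because the softmax is taken over the whole cube, a degenerate solution can appear as two separated peaks in `age_ext`, which independent per-parameter outputs could not represent.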
Each of the four groups of output parameters was represented as softmax activations:
$$\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_j e^{z_j}}, \qquad (3)$$
where $\mathbf{z}$ is the vector of activations of a whole layer, and $i$ specifies the index of a neuron (position on the parameter grid). The network was implemented with the Keras (https://keras.io/) and TensorFlow (https://www.tensorflow.org/) packages.
3.2 Training and inference
When training the network, we wish to infer both the class, which indicates the presence of a cluster, and the cluster’s astrophysical parameters at the same time. To that end, learning the class was modeled as a simple binary classification task. The network is trained on batches of 512 images, half of which are images of backgrounds, and the other half are images of backgrounds combined with clusters as described in Section 2.2. For the images with only background in them we set the class to 0, while for the samples with clusters we set it to 1.
In addition, for background images we zero out the training loss gradients for all cluster parameters. In effect this causes gradient updates to only be derived from the class and signal-to-noise parameters, both of which are set to 0, indicating that the background contains no cluster. Training proceeds by sampling from M83 backgrounds (25,000 images) and the cluster bank (50,000 mock clusters) separately, combining the cluster and background images on the fly, effectively giving us over a billion unique training samples (25,000 × 50,000 combinations).
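The batch composition can be sketched as below. This is an illustration only: the function name, the toy array sizes, and the use of a 0/1 weight vector to switch off the cluster-parameter losses for pure backgrounds are assumptions, and the photon-noise step of Section 2.3 is omitted for brevity.

```python
import numpy as np

def make_batch(clusters, backgrounds, rng, batch_size=512):
    """Assemble one batch: half pure backgrounds, half backgrounds with a mock cluster."""
    half = batch_size // 2
    bg_idx = rng.integers(0, len(backgrounds), size=batch_size)
    cl_idx = rng.integers(0, len(clusters), size=half)
    images = backgrounds[bg_idx]               # fancy indexing returns a fresh copy
    images[half:] += clusters[cl_idx]          # second half receives a cluster
    is_cluster = np.zeros(batch_size)
    is_cluster[half:] = 1.0                    # class target: 0 = background, 1 = cluster
    # Weight that zeroes the age/extinction/mass and size losses for pure backgrounds.
    param_loss_weight = is_cluster.copy()
    return images, is_cluster, param_loss_weight

rng = np.random.default_rng(4)
clusters = rng.random((100, 64, 64, 3))        # toy stand-in for the cluster bank
backgrounds = rng.random((50, 64, 64, 3))      # toy stand-in for background cutouts
images, is_cluster, weight = make_batch(clusters, backgrounds, rng, batch_size=16)
```

Drawing cluster and background indices independently is what makes the number of distinct training images scale with the product of the two bank sizes.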
The usual way to encode real-valued parameters as bins is called one-hot encoding. The parameter space is divided into bins and the bin at the position of the parameter’s value is set to 1. This array is then passed as a target vector for the network. One-hot encoding is ideal for categorical classification, where only one of the target bins is true at a time. However, for binned real-valued parameters this has the unfortunate side effect of penalizing bins far away from the target just as much as bins near it. We solve this by inserting a Gaussian distribution centered on the true value of the parameter (see Fig. 7). For size and signal-to-noise this is a simple 1D Gaussian, with a standard deviation equal to 0.5 times the width of a bin. For age, mass, and extinction a 3D Gaussian was used, with a standard deviation equal to 0.25 times the width of a bin.
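A minimal sketch of such Gaussian soft targets, working in units of bins. Normalizing the target to unit sum is an assumption made here so that it can be compared with a softmax output; the number of size bins (12) and the example positions are hypothetical.

```python
import numpy as np

def gaussian_target(true_bin_pos, shape, sigma_bins):
    """Soft target: an N-D Gaussian (in bin units) centered on the true fractional bin position."""
    axes = np.meshgrid(*[np.arange(n) for n in shape], indexing="ij")
    sq_dist = sum((ax - pos) ** 2 for ax, pos in zip(axes, true_bin_pos))
    target = np.exp(-0.5 * sq_dist / sigma_bins ** 2)
    return target / target.sum()               # unit-sum normalization is an assumption

# 1D case, e.g. cluster size on a hypothetical 12-bin grid.
size_target = gaussian_target((7.3,), shape=(12,), sigma_bins=0.5)
# 3D case for (age, extinction, mass) on the 20 x 10 x 14 grid.
cube_target = gaussian_target((10.2, 4.0, 6.7), shape=(20, 10, 14), sigma_bins=0.25)
cube_peak = np.unravel_index(np.argmax(cube_target), cube_target.shape)
```

Unlike a one-hot vector, the target keeps information about where the true value lies between bin centers, since neighboring bins receive unequal weights.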
To obtain parameter estimates from this network we need a way to transform the network’s output activations back into single-point estimates that can then be analyzed. The 1D and 3D histograms, depicted in Fig. 7, need to be “unfolded”. This was done by finding the bin with the highest value in the histogram, which represents the most likely set of parameters inferred by the network, and calculating a weighted average within a radius of 3 bin widths. In effect this produces a real-valued single-point estimate in between the bins instead of a discrete-valued one. Examples of inference results on mock and real clusters, with both the raw activation outputs and the derived single-point estimates, are shown in Figs. 9 and 21.
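The unfolding step can be sketched as follows, returning the estimate in fractional bin coordinates. Interpreting the "radius of 3 bin widths" as a Euclidean distance in bin units is an assumption of this sketch, as are the function name and the toy activation profile; converting bin coordinates to physical values would additionally need the grid's bin centers.

```python
import numpy as np

def unfold(activations, radius_bins=3.0):
    """Single-point estimate (fractional bin coordinates) from an N-D activation histogram."""
    activations = np.asarray(activations, dtype=float)
    peak = np.unravel_index(np.argmax(activations), activations.shape)
    axes = np.meshgrid(*[np.arange(n) for n in activations.shape], indexing="ij")
    dist = np.sqrt(sum((ax - p) ** 2 for ax, p in zip(axes, peak)))
    # Keep only activations within the chosen radius of the highest bin.
    weights = np.where(dist <= radius_bins, activations, 0.0)
    return tuple(float((ax * weights).sum() / weights.sum()) for ax in axes)

bins = np.arange(20)
activations = np.exp(-0.5 * ((bins - 8.4) / 0.8) ** 2)  # toy 1D output peaking between bins
estimate = unfold(activations)
```

Restricting the average to the neighborhood of the highest bin keeps a secondary peak of a bimodal output from dragging the estimate toward a value that neither solution supports.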
For computing the training gradients for the network the categorical cross-entropy loss function was used:
$$L = -\sum_i y_i \log \sigma(\mathbf{z})_i, \qquad (4)$$
where $\sigma(\mathbf{z})_i$ is the neuron’s activation as described in Eq. 3, and $y_i$ is the target output for the given training cluster image.
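As a numerical illustration of the loss, the sketch below evaluates a softmax and the categorical cross-entropy in NumPy, with an optional per-sample weight that can zero out the contribution of pure background images. The function names, the small `eps` guard, and the toy targets are assumptions of this sketch, not the Keras implementation used for training.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax along the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def categorical_cross_entropy(targets, activations, weights=None, eps=1e-12):
    """Mean over samples of -sum_i y_i log(activation_i), optionally weighted per sample."""
    per_sample = -(targets * np.log(activations + eps)).sum(axis=-1)
    if weights is not None:
        per_sample = per_sample * weights
    return per_sample.mean()

targets = np.array([[0.0, 0.1, 0.8, 0.1], [0.25, 0.25, 0.25, 0.25]])
good = softmax(np.array([[0.0, 1.0, 4.0, 1.0], [0.0, 0.0, 0.0, 0.0]]))
bad = softmax(np.array([[4.0, 1.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0]]))
loss_good = categorical_cross_entropy(targets, good)
loss_bad = categorical_cross_entropy(targets, bad)
# A zero weight removes the first sample's contribution to the loss and its gradient.
loss_masked = categorical_cross_entropy(targets, bad, weights=np.array([0.0, 1.0]))
```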
The Adam optimizer (Kingma & Ba, 2014) was used to update the network weights at each step of training. We experimented with a range of initial learning rates, with the learning rate decaying by the final iteration of training in all experiments. The learning rate that gave the best performance on the mock validation set was used for the final training of the network. The best CNN model was selected by picking the training iteration during which the CNN’s loss was the lowest on the validation set. The training accuracy track of the class parameter for the resulting model is shown in Fig. 8. This is the only parameter for which accuracy can be meaningfully calculated, because the other parameters are encoded as Gaussian distributions.
3.3 Output activations and stochastic effects
Three types of stochastic effects play a major role in the variation of CNN-inferred cluster parameters: 1) stellar mass sampling, 2) star spatial position randomization, and 3) background field.
We combine stellar mass and position sampling into one stochastic factor as a property of the cluster itself, while leaving the background choice as a property of its environment. We study both effects separately by: a) generating 100 different clusters with fixed parameters and placing them on the same background, and b) placing the same cluster on 100 different backgrounds.
Fig. 9 displays the influence of stochastic effects on the inference results of mock clusters. The left column shows clusters of fixed mass, with extinctions of 0.5, 1.5, and 2.5 mag and logarithmic ages of 7.5, 8.5, and 9.5. The right column shows clusters of fixed extinction, with logarithmic masses of 4.0, 4.5, and 5.0 and the same ages. Cluster sizes are fixed for all cases. The top row shows the results of inference when stellar IMF sampling and spatial positions are varied while holding the cluster parameters constant. The middle row shows the results of inference when background images are varied while using the same cluster image. The cyan circles correspond to the true values of parameters. The grayscale colormaps are raw CNN outputs for one specific case and magenta circles show 100 single-point estimates obtained for different random cases. The bottom row shows visualizations of the clusters with fixed background in the same format as Fig. 4. The signal-to-noise parameter value is displayed on the bottom-left of each image. Note that the CNN predicts ages, masses, and extinctions as one 3D cube, while the outputs shown here are marginalized either over mass (left column) or extinction (right column).
In Fig. 9 it can be seen that the inference results for clusters with high visibility are all tightly packed for both types of stochastic effects (top and middle rows). This applies for both the spread of the CNN activation maps (grayscale) as well as the single-point estimates on different cluster images (magenta dots).
However, as clusters get fainter, and especially when they disappear into the background, the spread of activation maps (grayscale) as well as single-point estimates (magenta dots) gets wider. Background variability has a significantly larger influence on the spread of parameter estimates than stellar sampling effects.
It is worth noting that for old clusters CNN output activations are elongated, attempting to represent age/extinction degeneracies. For a small number of cases bimodal solutions are obtained. However, for cases where clusters are completely invisible both activations and single-point estimates can end up tightly packed. This highlights the importance of the signal-to-noise parameter.
We note that less than 1% of the mock clusters show the bimodal distribution of activations. About 20% of the mock sample shows an extended unimodal distribution, while the rest of the results are symmetric and unimodal. Therefore, selecting the highest activation and obtaining single-point estimates from it is a viable approach, as that captures most of the information present in the CNN outputs.
For some clusters a systematic bias of the inference results can be observed, where the spread of activations would not sufficiently explain cluster misclassifications. This mainly occurs for the barely-visible clusters, where only cluster sampling is varied, implying that with a sufficiently difficult background and for a faint cluster, its parameter estimates may not be reliable and uncertainties may be underestimated.
However, Fig. 9 also illustrates the possibility to quantify the uncertainties of single inference results either from the extent of activation maps or by sampling random backgrounds, adding them to a cluster’s image, and re-running inference. The former can produce tightly packed (underestimated) activation maps for some high-uncertainty samples, making them unreliable for low-visibility scenarios. The latter can also introduce additional effects, depending on the used background sampling method, as well as the tendency to overestimate the uncertainties on real clusters, since the background effects would get doubled.
In subsequent sections the single-point estimates are analyzed with respect to inferred parameter accuracy and the age/extinction degeneracy.
4.1 Tests on mock clusters
To test the performance of the CNN, we built a separate bank of 5,000 artificial clusters. Their parameters were drawn from the same distributions as described in Section 2.2. The backgrounds for these mock clusters were also sampled from the used M83 mosaic, making sure that they are not the same as the backgrounds used for training. The inferred parameter values were obtained as described in Section 3.2.
Differences between CNN-derived single-point estimates of age, mass, extinction, and size vs. true parameters are shown in Fig. 10. The spread of errors is visualized as a hexagonal density map with the count bins scaled logarithmically in order to highlight the spread of outliers. Dashed lines represent the error bounds containing 95% of the inference results for each parameter. Note that because of the magnitude cuts introduced in the mock cluster bank, discussed in Section 2.2, the parameter distributions are not uniform. For example, there are relatively fewer low-mass old-age clusters. In all of the panels, the clusters that are classified as much younger than the true given values are shown as red points, while the clusters classified as much older are highlighted in blue.
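The 95% error bounds can be computed along the following lines. This sketch assumes a symmetric bound on the absolute error; the function name and the toy data (a hypothetical logarithmic mass range with Gaussian scatter) are illustrative and do not reproduce the paper's results.

```python
import numpy as np

def error_bound(true_values, derived_values, fraction=0.95):
    """Absolute error below which the given fraction of inference results fall."""
    errors = np.abs(np.asarray(derived_values) - np.asarray(true_values))
    return float(np.quantile(errors, fraction))

rng = np.random.default_rng(5)
true_log_mass = rng.uniform(3.5, 5.5, 5000)                     # hypothetical test bank
derived_log_mass = true_log_mass + rng.normal(0.0, 0.2, 5000)   # toy inference scatter
bound = error_bound(true_log_mass, derived_log_mass)
inside = np.mean(np.abs(derived_log_mass - true_log_mass) <= bound)
```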
Fig. 10a shows no significant difference between the true and derived age values for younger clusters, and the distribution over all ages is symmetrical about the diagonal. For 95% of all inference results the deviation from the true values stays within 0.9 dex, as shown by the dashed lines. For the oldest clusters a large scatter in both directions, towards older and younger ages, can be seen.
Fig. 10c shows the true and derived extinction values. For 95% of all inference results the deviation from the true values stays within 1.4 dex, as shown by the dashed lines. The highlighted blue and red clusters are classified as having significantly higher and lower extinction respectively. This can be explained by the age-extinction degeneracy, as older clusters with low extinction are hard to distinguish from younger clusters with high extinction, and vice versa, when using only three photometric passbands.
In Fig. 3a-b the age-extinction degeneracy can be seen in the lower S-shaped part of the color-color distribution of clusters. Sufficiently old clusters with high extinction can be located in the same color-color area as clusters with low extinction. These effects have also been observed when using analytically integrated stellar luminosities (Bridžius et al., 2008) and remain when stochastic effects of IMF sampling are included (de Meulenaer et al., 2014).
Fig. 10b shows the true and derived mass values. Overall no systematic effects can be seen. For 95% of all inference results the deviation from the true values stays within 0.4 dex, as shown by the dashed lines.
Fig. 10d shows the true and derived size values. No systematic effects can be seen. For 95% of all inference results the deviation from the true values stays within 0.2 dex, as shown by the dashed lines. However, for the smallest clusters the error spread is as low as 0.1 dex, while for the largest clusters the error spread goes up to 0.2 dex. This can be explained by the larger clusters having lower signal-to-noise, as their stars are spread out over a larger area.
Although in Fig. 10b due to the age-extinction degeneracy we observe underestimated and overestimated cluster masses, size errors shown in panel d show no such bias. This can be explained by mass being a function of a cluster’s magnitude as can be seen in Fig. 3g, which makes the network mispredict its value if age and extinction are also mispredicted. However, size has no impact on cluster magnitude or color.
As we use images normalized in a passband-independent manner, the influence of calibration accuracy on our method was also explored. Fig. 11 shows results obtained on the same dataset as Fig. 10, only with the CNN trained on images with background fluxes that were varied from image to image. The flux scaling factor was sampled independently for each passband from a Gaussian with a mean of 1 and a standard deviation of 0.2. After multiplying the background image flux by this factor the cluster images were added and the final images normalized as usual. This encourages the network to learn parameter inference regardless of whether the calibrations for backgrounds match mock clusters well. As can be seen when comparing Figs. 10 and 11, the inference results are very similar, only with the error spread increasing for each parameter by about 10%. This implies that accurate calibrations, while still associated with slightly more precise results, are not essential for a CNN to derive cluster parameters.
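The background flux variation described above can be sketched as a simple augmentation step. The function name and the toy arrays are hypothetical; the Gaussian factor (mean 1, standard deviation 0.2, one per passband) and the subsequent per-passband normalization with arcsinh rescaling follow the text.

```python
import numpy as np

def augment_background_flux(cluster, background, rng, sigma=0.2):
    """Scale each background passband by a random factor, add the cluster, then normalize."""
    factors = rng.normal(1.0, sigma, size=background.shape[-1])  # one factor per passband
    image = cluster + background * factors
    mean = image.mean(axis=(0, 1), keepdims=True)
    std = image.std(axis=(0, 1), keepdims=True)
    return np.arcsinh((image - mean) / std), factors

rng = np.random.default_rng(6)
cluster = np.zeros((64, 64, 3))
cluster[30:34, 30:34, :] = 5.0                        # toy compact cluster
background = rng.gamma(2.0, 0.05, size=(64, 64, 3))   # toy background cutout
image, factors = augment_background_flux(cluster, background, rng)
```

Only the background is rescaled, so the relative flux of cluster and background changes from image to image, which is what distinguishes this test from a global rescaling that the normalization would remove entirely.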
Fig. 12 shows the derived class and signal-to-noise values for the 5,000 test mock clusters, as well as a random sample of 5,000 M83 background images. As can be seen in the histogram on top, the class is predicted as close to 1 for the vast majority of mock cluster images, and as close to 0 for the majority of background images. This suggests that the fraction of background images that are classified as clusters are likely to correspond to real clusters. The signal-to-noise parameter is highly correlated with the class, again showing high values for the majority of mock clusters and low values for the majority of backgrounds. The few remaining mock clusters classified as background have very low signal-to-noise values, which indicates faint, nearly invisible objects such as those seen in Fig. 4.
Fig. 13 illustrates selection effects by showing the derived age, extinction, mass, and size parameters of the test mock clusters, with the color bar representing the derived signal-to-noise parameter value for each cluster. In Fig. 13a it can be seen that mass and age are correlated as expected with the derived signal-to-noise parameter: clusters with lower mass and older ages tend to be less visible (this can also be seen in Figs. 1 and 2). The same is true for extinction (panels b and d), as higher extinctions tend to make clusters less visible, and size (panels c, d, and f), as more concentrated clusters stand out relative to their backgrounds.
Even though the cluster-related parameter inference results for background images have no inherent meaning, the CNN produces values for all of its output neurons regardless. Looking at these values can provide us with additional insights. For example, we would expect backgrounds to be classified as low-mass extended objects. Fig. 14 shows the derived parameters for the background images from Fig. 12, with dot size and color indicating the class. Black dots are images with the class close to 0, while red circles are images with the class close to 1. As can be seen in Fig. 14e, the vast majority of the backgrounds are classified as low-mass extended objects as expected, with some probable cluster images being spread out more evenly through the parameter space. The derived age values of these images are spread out through the whole age range (panels a, b, and c); however, extinctions are heavily correlated with ages, as seen in panel b. As the network is trained to predict extinction and age values regardless of what the cluster’s background looks like, there is no intuitive value that should be predicted for background images in this case. In effect the CNN avoids areas of age-extinction parameter space where the appearance of an observed object is either extremely red (high-extinction, high-age) or extremely blue (low-extinction, low-age), which can only be associated with genuine clusters, resulting in this diagonal effect.
The parameter was shown to be usable in differentiating between cluster and background images, while the parameter is correlated well with those cluster parameter ranges, which can show more confidently identified clusters. We conclude that these parameters can be useful indicators in star cluster search applications.
4.2 Validation with cataloged clusters
To validate our method on real clusters we used three previous M83 HST star cluster studies with published catalogs. These are the study covering the whole galactic disk (7 WFC3 fields) by Ryon et al. (2015, R15), the study of two WFC3 fields by Bastian et al. (2011, B11), and the study of the galaxy’s central region by Harris et al. (2001, H01).
The study by Bastian et al. (2011) comprises 939 objects. We discarded objects with missing parameter values, leaving 889 objects to compare with the CNN inference results. Bastian et al. (2011) estimated the cluster age, mass, and extinction by comparing the integral photometry of the observed clusters to SSP models. Meanwhile, the sizes of clusters were estimated by fitting spatial models to F438W, F555W, and F814W band images. For this comparison we took the median value of these three size estimates. As the cluster magnitudes used by Bastian et al. (2011) were corrected for Galactic extinction, we shift the extinction values of those objects by 0.3 mag (https://irsa.ipac.caltech.edu/applications/DUST/). This was done so that we could compare CNN-derived values directly, because we compute total extinctions for clusters regardless of the dust source.
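The catalog preparation described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `prepare_catalog`, the array layout, and the use of NaN to mark missing values are assumptions made for this example. Only the per-object median over the three band sizes and the 0.3 mag foreground shift are taken from the text.

```python
import numpy as np

GALACTIC_FOREGROUND_MAG = 0.3  # foreground shift quoted in the text


def prepare_catalog(sizes_by_band, extinction, age, mass):
    """Drop incomplete objects, take the median size, add foreground extinction.

    sizes_by_band has shape (n_objects, 3), one size per band (assumed order
    F438W, F555W, F814W). Missing values are assumed to be stored as NaN.
    """
    sizes_by_band = np.asarray(sizes_by_band, dtype=float)
    extinction = np.asarray(extinction, dtype=float)
    age = np.asarray(age, dtype=float)
    mass = np.asarray(mass, dtype=float)

    # Keep only objects for which every parameter is present.
    complete = (
        np.isfinite(sizes_by_band).all(axis=1)
        & np.isfinite(extinction)
        & np.isfinite(age)
        & np.isfinite(mass)
    )

    return {
        # One size per object: the median of the three band estimates.
        "size": np.median(sizes_by_band[complete], axis=1),
        # Catalog values exclude Galactic dust, so add it back to get totals.
        "extinction_total": extinction[complete] + GALACTIC_FOREGROUND_MAG,
        "age": age[complete],
        "mass": mass[complete],
    }
```

With this convention a cataloged extinction of 0 mag becomes 0.3 mag, so the shifted values can be compared directly with the total extinctions derived by the CNN.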
Figs. 15 and 16 show a comparison between Bastian et al. (2011) and CNN-derived values. In Fig. 15 the red and blue dots represent clusters with significantly overestimated and underestimated extinction values, respectively. They were defined as clusters that are outside the dashed lines in panel c, which represent the area containing 95% of mock cluster parameter derivations. This mirrors the situation with mock objects in Fig. 10, as the majority of clusters with overestimated extinctions end up with underestimated ages, and vice versa for clusters with underestimated extinction values. These effects can again be attributed to the age-extinction degeneracy. In Fig. 16 the green dots represent images classified by the network as likely to be real clusters (), while the magenta dots are objects with . The vast majority of the objects are classified as likely clusters.
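One way to implement the outlier selection described above is sketched below. It is a simplified illustration under stated assumptions: the band is taken to be a constant interval of mock residuals (derived minus true) with equal tails, whereas the dashed lines in Fig. 15c may well depend on the parameter value. The function name `flag_extinction_outliers` and its arguments are hypothetical.

```python
import numpy as np


def flag_extinction_outliers(mock_true, mock_derived, catalog, cnn, coverage=0.95):
    """Flag objects whose CNN-minus-catalog extinction lies outside the band
    containing `coverage` of the mock (derived - true) residuals."""
    mock_resid = np.asarray(mock_derived, float) - np.asarray(mock_true, float)

    # Equal-tailed interval, e.g. the 2.5th and 97.5th percentiles for 95%.
    tail = 100.0 * (1.0 - coverage) / 2.0
    lower, upper = np.percentile(mock_resid, [tail, 100.0 - tail])

    diff = np.asarray(cnn, float) - np.asarray(catalog, float)
    overestimated = diff > upper   # plotted as red dots in Fig. 15
    underestimated = diff < lower  # plotted as blue dots in Fig. 15
    return overestimated, underestimated, (lower, upper)
```

The two boolean masks can then be reused in the age panels to check whether overestimated extinctions coincide with underestimated ages, as expected from the age-extinction degeneracy.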
Overall the derived ages and masses show a reasonable correlation between Bastian et al. (2011) and CNN-derived values. Many of the objects have cataloged mag values (shown as mag in the figures, accounting for Galactic extinction). The CNN derives higher extinctions for some of these clusters; however, visual inspection has revealed that Galactic dust is unlikely to be the only source of extinction for the majority of them. The sizes show good agreement for most of the objects, although there is a subset of objects with somewhat overestimated values.
For the comparison with Ryon et al. (2015) we used 478 objects which had sizes obtained by 2D spatial model fitting as well as age and mass estimates derived using spectral energy distribution fitting. We also took 45 objects from Harris et al. (2001) with their age, mass, and extinction estimates obtained by comparing the cluster photometry to theoretical population synthesis models. Fig. 17 shows our results compared against both of these catalogs. Ryon et al. (2015) objects are denoted as green dots with the parameter error bounds marked with black lines. Harris et al. (2001) objects are marked as large cyan circles. For both of these catalogs a reasonable agreement with the CNN-derived values can be seen, with only masses being slightly overestimated. However, there is some age estimate divergence over , which is similar to the situation in Fig. 15.
By comparing our results with those of other authors, we have shown that the CNN is capable of deriving the parameters of real clusters. The agreement between the values is reasonable and follows the results obtained with mock clusters. However, due to the age-extinction degeneracy with the three passbands used, the results for clusters older than are ambiguous and should be interpreted carefully.
We have shown, in terms of quantitative error analysis, the applicability of a CNN-based method in deriving a variety of star cluster parameters from M83 mosaic images. However, the final aim for this method is to be of use in star cluster search and automatic catalog construction. To this end, a closer look at the derived parameters is needed, both in relation to each other and in their context within the galaxy. In this chapter we look at the derived values of the Bastian et al. (2011) sample of objects in more detail.
Fig. 18 shows the inferred age, extinction, mass, and size parameters of the Bastian et al. (2011) object sample. The objects are colored as in Fig. 16, with mock results shown in the background. The clusters cover the whole parameter range well, with samples being classified as expected: as low-mass objects (panel a). The minimal extinction line, with a large number of clusters around it, seen in panels b, d, and f, coincides with mag, expected due to Galactic dust foreground in the direction of M83. Lines of constant density are shown in panel e. The majority of the objects fall within 10 and , which is consistent with results for clusters of the M31 galaxy (Vansevičius et al., 2009).
Fig. 19 shows Bastian et al. (2011) objects marked on two fields of the M83 mosaic. Objects of are marked as blue circles in panel a, objects are marked as orange circles in panel b, and objects are marked as red circles in panel c. Panel d shows all of the objects marked as dots, with mag colored cyan, and mag colored magenta. The spatial distribution of objects is sensible, with young star clusters grouping around the galaxy’s spiral arms, near the dust clouds where they were formed, and old clusters spread out more evenly throughout the galaxy, as they had more time to drift away. The extinction distributions are less clear-cut; however, some crowding of high-extinction objects around dust-heavy regions can be seen, as is expected. The spatial distributions of age-selected clusters in Fig. 19 correspond well to the results obtained by Fouesneau et al. (2012) using UBVIH fluxes to measure ages, masses, and extinctions in the central region of M83. Sánchez-Gil et al. (2019) have derived age maps for the M83 galaxy’s stellar populations younger than 20 Myr, which corresponds to the lower age range of clusters in this study.
Although we studied clusters with masses , this does not imply that only such clusters are detectable with the HST/WFC3 observations of M83. In fact, clusters with masses as low as have been studied by Whitmore et al. (2011) and Andrews et al. (2014). However, such clusters are dominated by stochastic effects of IMF sampling, making the analysis of the effects of extinction problematic. The lower limit of masses was selected to focus on the effects of extinction as well as to align with the range of clusters used by Bastian et al. (2011). The presented CNN classifies lower mass clusters as being at the lower limit of this range.
Fig. 20 shows the binned mass distributions obtained with the CNN. The gray outline shows all of the cluster distributions, with blue dots representing clusters of (panel a), and (panel b). The red lines represent Schechter-type mass functions (Portegies Zwart et al., 2010) with various amounts of truncation. The solid red line follows the non-truncated power law , the dashed red line follows , and the dotted red line follows . The power law distributions fit the data well for both of the age cuts; however, there is a lack of low-mass clusters () for the mid-age data sample. This is due to selection effects, with fewer star clusters being detectable at those ages (see Fig. 18a). Similar cluster mass distributions and selection effects have been found in M31 (Vansevičius et al., 2009) and M33 (de Meulenaer et al., 2015) star cluster samples.
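For reference, a Schechter-type mass function is a power law with an exponential truncation at high mass, dN/dM ∝ M^(-β) exp(-M/M_c). The sketch below evaluates this general form per logarithmic mass bin; the function name `schechter_dn_dlogm` and its parameter names are assumptions for this example, and the values in the usage note are placeholders, not the slope or truncation masses used for the lines in Fig. 20.

```python
import numpy as np


def schechter_dn_dlogm(mass, beta, m_cut=np.inf, norm=1.0):
    """Schechter-type mass function per unit log10(mass).

    dN/dM = norm * M**(-beta) * exp(-M / m_cut). Multiplying by M * ln(10)
    converts it to dN/dlog10(M), which is what logarithmic bins sample.
    m_cut = inf (the default) gives the non-truncated power law.
    """
    mass = np.asarray(mass, dtype=float)
    dn_dm = norm * mass ** (-beta) * np.exp(-mass / m_cut)
    return dn_dm * mass * np.log(10.0)
```

For example, `schechter_dn_dlogm(np.logspace(3.5, 5.5, 20), beta=2.0, m_cut=1e5)` evaluates a truncated curve over the studied mass range using placeholder parameters; omitting `m_cut` yields the pure power law for comparison.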
Fig. 21 shows examples of inference results on 3 distinct Bastian et al. (2011) clusters chosen to illustrate the variety of CNN outputs (previously sketched in Fig. 7). The top row shows a young, low-mass cluster. The inferred age and mass match the Bastian et al. (2011) parameters well. Extinction is derived to be slightly higher; however, the value is very close when Galactic extinction is accounted for. The parameter is derived to be 15, which corresponds well to similar-looking clusters in Fig. 2b.
In the middle, a cluster of , medium extinction and high mass is shown. The age, mass, and size correspond well to the values derived by Bastian et al. (2011). Extinction is derived as slightly higher; however, it is still within the range of the CNN’s activations. The cluster is classified as brighter by the CNN, with .
On the bottom an older cluster is depicted. Its mass and size estimates correspond well; however, extinction is overestimated in comparison to Bastian et al. (2011). Furthermore, the neuron activations show a diagonal pattern highlighting the age-extinction degeneracy, which is hard to resolve with the three passbands used. However, the higher-extinction results are more likely, as a significant amount of the field seen in the leftmost panel appears reddened, which suggests the presence of dust obscuring the cluster.
As detailed in Section 3.3, there is a correlation between the spread of CNN output activations and the scatter of inferred cluster parameters; therefore, activation maps can be used to estimate cluster parameter uncertainties. We checked that less than 1% of the clusters show bimodal activation distributions and about 20% of the samples show an extended unimodal distribution (see Fig. 9 for examples). The rest of the results are unimodal. Therefore, selecting the highest activation and interpolating it is a viable approach to provide inferred parameter estimates. However, there are some cases where there is a systematic bias in the derived results. This usually occurs for the nearly invisible clusters as well as clusters with high extinctions and older ages, where age-extinction degeneracies make inference unreliable with the photometric passbands used. This means that the extent of activation maps alone, while informative, is not a reliable uncertainty estimate in all cases. Additional insights on the reliability of inference results can be gained by performing the random background sampling test described in Section 3.3.
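The idea of selecting the highest activation and interpolating it can be sketched for a single parameter as follows. This is a one-dimensional simplification with assumed details: the three-point parabolic interpolation, the uniform bin spacing, and the name `peak_estimate` are illustrative choices, not a description of the interpolation actually used with the network.

```python
import numpy as np


def peak_estimate(activations, bin_centers):
    """Return the parameter value at the interpolated activation maximum.

    A parabola is fitted through the highest bin and its two neighbours
    (uniform bin spacing is assumed). If the maximum is at either end of
    the range, the centre of that bin is returned unchanged.
    """
    a = np.asarray(activations, dtype=float)
    x = np.asarray(bin_centers, dtype=float)

    i = int(np.argmax(a))
    if i == 0 or i == len(a) - 1:
        return x[i]

    left, mid, right = a[i - 1], a[i], a[i + 1]
    denom = left - 2.0 * mid + right
    if denom == 0.0:
        return x[i]

    # Vertex of the parabola, in units of bins relative to bin i.
    offset = 0.5 * (left - right) / denom
    return x[i] + offset * (x[i + 1] - x[i])
```

Such a point estimate is only meaningful for unimodal activations; for the bimodal and extended cases mentioned above, the full activation map carries information that a single peak discards.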
These results further validate the applicability of the CNN in deriving the parameters of star clusters in realistic scenarios. In addition, the and parameters act as accurate proxies for cluster presence in images. Utilizing this method for constructing a full catalog of M83 clusters is left for the subsequent paper in the series.
We have extended the method introduced in Paper I to infer cluster ages, masses, sizes, extinctions, as well as to account for the degeneracies between them. Additional parameters were added for identifying the presence of clusters on background images of M83, and judging their visibility (signal-to-noise).
To train this network a bank of mock clusters was generated utilizing three photometric passbands in the context of the M83 galaxy. The CNN was verified on mock images of artificial clusters with ages, , between 6.6 and 10.1, masses, , between 3.5 and 5.5, sizes between 0.04 and 0.4 arcsec, and extinctions mag. Parameters derived by the CNN have shown good agreement with the true parameters for , with higher age estimates being unreliable due to the age-extinction degeneracy.
Real cluster parameter inference tests were performed with three different M83 cluster catalogs from Bastian et al. (2011), Ryon et al. (2015), and Harris et al. (2001) and have shown consistent results.
We have demonstrated that a CNN can perform evolutionary (age, mass), structural (size), and environmental (extinction) star cluster parameter inference. In addition, the network is capable of giving an indication of cluster presence in images. Therefore, the created CNN is a useful tool for further research in constructing a full pipeline of star cluster detection and parameter inference.
Acknowledgements. This research was funded by a grant (No. LAT-09/2016) from the Research Council of Lithuania. This research made use of Astropy, a community-developed core Python package for Astronomy (Astropy Collaboration, 2018). Some of the data presented in this paper were obtained from the Mikulski Archive for Space Telescopes (MAST). STScI is operated by the Association of Universities for Research in Astronomy, Inc., under NASA contract NAS5-26555. We are thankful to the anonymous referee who helped improve the paper.
- Andrews et al. (2014) Andrews, J. E., Calzetti, D., Chandar, R., et al. 2014, ApJ, 793, 4
- Bastian et al. (2011) Bastian, N., Adamo, A., Gieles, M., et al. 2011, MNRAS, 417, L6
- Bialopetravičius et al. (2019) Bialopetravičius, J., Narbutis, D., & Vansevičius, V. 2019, A&A, 621, A103
- Blair et al. (2014) Blair, W. P., Chandar, R., Dopita, M. A., et al. 2014, ApJ, 788, 55
- Bressan et al. (2012) Bressan, A., Marigo, P., Girardi, L., et al. 2012, MNRAS, 427, 127
- Bridžius et al. (2008) Bridžius, A., Narbutis, D., Stonkutė, R., Deveikis, V., & Vansevičius, V. 2008, Baltic Astronomy, 17, 337
- de Meulenaer et al. (2013) de Meulenaer, P., Narbutis, D., Mineikis, T., & Vansevičius, V. 2013, A&A, 550, A20
- de Meulenaer et al. (2014) de Meulenaer, P., Narbutis, D., Mineikis, T., & Vansevičius, V. 2014, A&A, 569, A4
- de Meulenaer et al. (2015) de Meulenaer, P., Narbutis, D., Mineikis, T., & Vansevičius, V. 2015, A&A, 581, A111
- Dieleman et al. (2015) Dieleman, S., Willett, K. W., & Dambre, J. 2015, MNRAS, 450, 1441
- Dopita et al. (2010) Dopita, M. A., Blair, W. P., Long, K. S., et al. 2010, ApJ, 710, 964
- Dressel (2012) Dressel, L. 2012, Wide Field Camera 3 Instrument Handbook for Cycle 21 v. 5.0
- Elson et al. (1987) Elson, R. A. W., Fall, S. M., & Freeman, K. C. 1987, ApJ, 323, 54
- Fouesneau & Lançon (2010) Fouesneau, M. & Lançon, A. 2010, A&A, 521, A22
- Fouesneau et al. (2012) Fouesneau, M., Lançon, A., Chandar, R., & Whitmore, B. C. 2012, ApJ, 750, 60
- Harris et al. (2001) Harris, J., Calzetti, D., Gallagher, J. S., III, Conselice, C. J., & Smith, D. A. 2001, AJ, 122, 3046
- He et al. (2016) He, K., Zhang, X., Ren, S., & Sun, J. 2016, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770
- Hernandez et al. (2019) Hernandez, S., Larsen, S., Aloisi, A., et al. 2019, ApJ, 872, 116
- Kingma & Ba (2014) Kingma, D. P. & Ba, J. 2014, arXiv e-prints, arXiv:1412.6980
- Krist et al. (2011) Krist, J. E., Hook, R. N., & Stoehr, F. 2011, in Proc. SPIE, Vol. 8127, Optical Modeling and Performance Predictions V, 81270J
- Kroupa (2001) Kroupa, P. 2001, MNRAS, 322, 231
- Krumholz et al. (2015) Krumholz, M. R., Fumagalli, M., da Silva, R. L., Rendahl, T., & Parra, J. 2015, MNRAS, 452, 1447
- McConnachie et al. (2005) McConnachie, A. W., Irwin, M. J., Ferguson, A. M. N., et al. 2005, MNRAS, 356, 979
- Portegies Zwart et al. (2010) Portegies Zwart, S. F., McMillan, S. L. W., & Gieles, M. 2010, ARA&A, 48, 431
- Reyes et al. (2018) Reyes, E., Estévez, P. A., Reyes, I., et al. 2018, 2018 International Joint Conference on Neural Networks (IJCNN), 1
- Ribli et al. (2019) Ribli, D., Dobos, L., & Csabai, I. 2019, arXiv e-prints, arXiv:1902.08161
- Rowe et al. (2015) Rowe, B. T. P., Jarvis, M., Mandelbaum, R., et al. 2015, Astronomy and Computing, 10, 121
- Russakovsky et al. (2015) Russakovsky, O., Deng, J., Su, H., et al. 2015, International Journal of Computer Vision, 115, 211
- Ryon et al. (2015) Ryon, J. E., Bastian, N., Adamo, A., et al. 2015, MNRAS, 452, 525
- Sánchez-Gil et al. (2019) Sánchez-Gil, M. C., Alfaro, E. J., Cerviño, M., et al. 2019, MNRAS, 483, 2641
- Thim et al. (2003) Thim, F., Tammann, G. A., Saha, A., et al. 2003, ApJ, 590, 256
- Vansevičius et al. (2009) Vansevičius, V., Kodaira, K., Narbutis, D., et al. 2009, ApJ, 703, 1872
- Whitmore et al. (2011) Whitmore, B. C., Chandar, R., Kim, H., et al. 2011, ApJ, 729, 78
- Wu et al. (2019) Wu, C., Wong, O. I., Rudnick, L., et al. 2019, MNRAS, 482, 1211