1 Cloud Classification is key for modelling Climate Change
Clouds play a crucial role in the climate system. They are the source of all precipitation and have a significant impact on the Earth's radiative budget. Crucially, any changes in clouds affect the environment, and these changes in turn feed back on cloud formation and behaviour. These feedbacks are a primary source of uncertainty in climate model projections (Knutti et al., 2017; Rotstayn and Collier, 2015; Stephens, 2005), as the mechanisms and relationships between clouds, climate and global circulation remain only partially understood (Bony et al., 2015). It is, for example, unclear how warmer sea surface temperatures will affect clouds and convective organization (Bony et al., 2015), or whether they could trigger climate transitions through cloud breakup (Schneider et al., 2019). Comprehensive use of the vast observational data available is crucial to improve our understanding of these processes and their representation in climate models.
Clouds can form and develop through several different pathways, depending on their environment and the convective energy available. It is common to categorise clouds into different types based on their properties to better analyse them. The International Satellite Cloud Climatology Project (ISCCP) dataset (Rossow and Schiffer, 1991) provides a global classification of clouds at a 10 km resolution, based on a network of geostationary meteorological satellites. Satellite-based observations of clouds can be made using either passive imagery or active radar instruments. While high-resolution hyperspectral imagery is available from both polar-orbiting and geostationary satellites, providing excellent coverage at high temporal resolution, specific cloud properties (such as their exact height and droplet size distribution) must be inferred indirectly. The ISCCP classification relies on a simple assessment of the relationship between the clouds' inferred height and optical thickness (Rossow and Schiffer, 1991). Conversely, the CloudSat cloud radar does provide direct measurements of clouds and their properties. This comes with a drawback, as it operates with a narrow swath and has a 16-day repeat cycle, so it does not provide global coverage at 1 km resolution even after a full cycle.
To overcome these limitations, in this paper we introduce Cumulo, a new dataset which combines the global 1 km-resolution imagery of the Moderate Resolution Imaging Spectroradiometer (MODIS) with the accurately measured properties of the CloudSat products. It contains one year of 1354 x 2030 pixel hyperspectral images from MODIS, combined with pixel-width 'tracks' of cloud labels from CloudSat, corresponding to the eight World Meteorological Organization (WMO) genera (Fig. 1). While both datasets are publicly available, the extraction, cleaning and alignment of the data required specialist domain knowledge and extensive compute resources.
We apply a deep generative model architecture on one month of Cumulo, and present, for the first time to our knowledge, global high resolution spatiotemporal cloud classification derived from a combination of active and passive satellite sensors. We show that our results are physically reasonable in terms of locations of occurrences of the given classes and liquid water path distributions.
Related Work. Muhlbauer et al. (2014) classify MODIS imagery into three types of mesoscale cellular convection using a 3-layer neural network. While these classifications are global, they describe only a particular climatology. Zhang et al. (2019) classify imagery from the geostationary Himawari-8 satellite into WMO cloud classes (taken from CloudSat) using a random forest. In contrast, our dataset provides global coverage (versus East Asia and the Western Pacific for Himawari-8). Rasp et al. (2019) crowd-sourced human-level classifications of shallow trade clouds into four types: 'sugar', 'flower', 'fish' and 'gravel'. They evaluate an object detection method (Lin et al., 2017) and a semantic segmentation method (Ronneberger et al., 2015) to classify clouds into these types. Similar to Muhlbauer et al. (2014), this work addresses only a small aspect of cloud variability.
2 Cumulo: A global dataset for cloud classification
The proposed dataset contains 105,120 geolocated hyperspectral images and provides a combination of channels from different sources (see Table 1): the selected radiance channels from MODIS AQUA Calibrated Radiances fully capture the physical properties needed for cloud classification and are meant to be used for training; the MODIS AQUA Cloud Product channels are retrieved features describing cloud physical properties, useful for validation; the MODIS AQUA Cloud Mask detects the presence of a cloud; and 2B-CLDCLASS-LIDAR provides the types of clouds spotted at different heights along the track of the satellites. The possible cloud types, corresponding to the eight WMO genera, are stratus (St), stratocumulus (Sc), cumulus (Cu, including cumulus congestus), nimbostratus (Ns), altocumulus (Ac), altostratus (As), deep convection (cumulonimbus, Dc), and high clouds (Ci, cirrus and cirrostratus). Refer to Table 2 for a description of the classes.
Notice that some channels are available only during daytime, because they rely directly or indirectly on daylight radiances. In general, missing values due to artefacts were filled with the nearest (in time and space) available values.
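The nearest-value fill mentioned above can be sketched as follows for the spatial case. This is a minimal illustration using SciPy's Euclidean distance transform, one possible implementation and not necessarily the one used to build the dataset:

```python
import numpy as np
from scipy import ndimage

def fill_nearest(channel):
    """Replace NaN pixels with the value of the nearest valid pixel.

    `channel` is a 2-D array for one swath channel; the distance transform
    returns, for every pixel, the indices of the closest non-NaN pixel.
    (Hypothetical helper for illustration only.)
    """
    mask = np.isnan(channel)
    if not mask.any():
        return channel
    # For each NaN pixel, find the indices of the nearest valid pixel.
    idx = ndimage.distance_transform_edt(
        mask, return_distances=False, return_indices=True
    )
    return channel[tuple(idx)]
```

A temporal extension would simply apply the same idea along a third (time) axis.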
Table 1: Channels provided by Cumulo, their source, purpose, and availability.

| Source | Channel | MODIS band(s) | Purpose | Availability |
|---|---|---|---|---|
| MODIS Calibrated Radiances | shortwave visible (red) | 1 | land/shadow/cloud/aerosols boundaries | daytime |
| MODIS Calibrated Radiances | shortwave near infrared | 2 | land/shadow/cloud/aerosols boundaries | daytime |
| MODIS Calibrated Radiances | longwave thermal infrared | 20-23 | surface/cloud temperature | always |
| MODIS Calibrated Radiances | shortwave near infrared | 26 | cirrus clouds water vapor | daytime |
| MODIS Calibrated Radiances | longwave thermal infrared | 27 | water vapor | always |
| MODIS Calibrated Radiances | longwave thermal infrared | 29 | cloud properties | always |
| MODIS Calibrated Radiances | longwave thermal infrared | 33-36 | cloud top altitude | always |
| MODIS Cloud Mask | cloud mask | - | cloud detection | always |
| 2B-CLDCLASS-LIDAR | cloud layer type | - | target classes | always |
| MODIS Cloud Product | liquid water path | - | physical validation | always |
| MODIS Cloud Product | cloud optical thickness | - | physical validation | daytime |
| MODIS Cloud Product | cloud effective radius | - | physical validation | daytime |
| MODIS Cloud Product | cloud particle phase | - | physical validation | daytime |
| MODIS Cloud Product | cloud top pressure | - | physical validation | always |
| MODIS Cloud Product | cloud top height | - | physical validation | always |
| MODIS Cloud Product | cloud top temperature | - | physical validation | always |
| MODIS Cloud Product | cloud effective emissivity | - | physical validation | always |
| MODIS Cloud Product | surface temperature | - | physical validation | always |
Table 2: The eight WMO cloud genera (the 2B-CLDCLASS-LIDAR classes) and their typical properties.

| Label | Cloud type | Thickness | Base height | Liquid water path | Precipitation |
|---|---|---|---|---|---|
| 0 | Cirrus and cirrostratus (Ci) | moderate | 7.0 km | 0. | none |
| 1 | Altostratus (As) | moderate | 2.0-7.0 km | 0. | none |
| 2 | Altocumulus (Ac) | shallow/moderate | 2.0-7.0 km | 0. | virga possible |
| 3 | Stratus (St) | shallow | 0-2.0 km | 0. | none/slight |
| 4 | Stratocumulus (Sc) | shallow | 0-2.0 km | 0. | drizzle/snow possible |
| 5 | Cumulus (Cu) | shallow/moderate | 0-3.0 km | 0. | drizzle/snow possible |
| 6 | Nimbostratus (Ns) | thick | 0-4.0 km | 0. | prolonged rain/snow |
| 7 | Deep Convection (Dc) | thick | 0-3.0 km | 0. | intense rain/hail |
More precisely, each satellite image (swath) is acquired at a given time (one swath every five minutes) and at a given location (each pixel is associated with a latitude-longitude pair). In the following, we distinguish:

- the thirteen training channels coming from MODIS AQUA Calibrated Radiances (https://modis.gsfc.nasa.gov/data/dataprod/mod02.php);
- the eight validation channels coming from MODIS AQUA Cloud Product (https://modis.gsfc.nasa.gov/data/dataprod/mod06.php), which provide physical and radiative cloud properties obtained by combining infrared emission and solar reflectance techniques applied to the original MODIS bands;
- the cloud mask derived from MODIS Cloud Mask (https://modis.gsfc.nasa.gov/data/dataprod/mod35.php), marking certainly cloudy pixels as 1 and any other pixel as 0;
- the label mask derived from 2B-CLDCLASS-LIDAR (http://www.cloudsat.cira.colostate.edu/data-products/level-2b/2b-cldclass-lidar), indicating for each pixel the number of occurrences of each cloud class (refer to Table 2) identified at different heights (up to a fixed maximum number of layers).
Overall, Cumulo provides a comprehensive set of features both for identifying cloud types and for validating any finding, along with ready-to-use and accurate cloud annotations at high spatial resolution. From a Machine Learning perspective, Cumulo presents several challenges: supervision is available for only 1 in every 1354 pixels (weakly-labelled data), pixels can be annotated with multiple types of clouds (multi-labelled data), and many cloud classes, such as deep convective clouds, are underrepresented (class imbalance).
3 Applying Cloud Classification globally
In this section, we provide baseline performance analysis of one of the tasks that can be performed using Cumulo: semi-supervised classification of clouds at a global and daily scale.
Given the small number of labels, we find that performing this experiment at the pixel level achieves better classification performance than applying common semantic segmentation models (such as Papandreou et al. (2015), Li et al. (2018) and Ronneberger et al. (2015)) that require full label masks or annotations at the instance or image level. We consider tiles of 3x3 pixels extracted from the first month of data (January 2008) and predict a label for each tile. When annotations are available, we use the most frequent cloud type identified within each tile as the target. We gathered two sets of tiles: a set of labelled tiles sampled around annotated pixels on the satellite's track, and a sample of unlabelled tiles randomly selected from the non-annotated regions. We used the available cloud mask to restrict classification to tiles with a high probability of cloud cover. We trained a hybrid Invertible Residual Network (Nalisnick et al., 2019), which allows us to (i) harness both the labelled and unlabelled sets and (ii) learn a representation in which the class distributions can be further subdivided into fine-grained classes. Indeed, cloud types are not limited to the well-studied WMO genera: identifying further species of clouds is an open question in the cloud community. Our model combines a deep generative flow network with a linear classifier, trained simultaneously by maximizing the joint log-likelihood over the tiles and their labels (whenever supervision is not available, the label entropy is minimised instead of the cross-entropy). Classes are weighted in the objective to encourage better classification of under-represented classes.
3.2 Formal problem setting
We segment each tensor $x \in \mathbb{R}^{1354 \times 2030 \times 13}$ into non-overlapping tiles of size 3x3 pixels, e.g. $x_i \in \mathbb{R}^{3 \times 3 \times 13}$, and aim at learning a mapping from any tile $x_i$ to a class label $y_i \in \{0, \dots, 7\}$. As target values, we retain the most frequent cloud type occurring within the label mask associated with a tile.
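The tiling and target-assignment step described above can be sketched as follows. This is a hypothetical helper for illustration (the tile size and the use of -1 for unlabelled pixels are assumptions, not the dataset's actual encoding):

```python
import numpy as np

TILE = 3  # tile side length in pixels

def extract_tiles(radiances, label_mask):
    """Cut a (H, W, C) swath into non-overlapping TILE x TILE tiles and assign
    each tile the most frequent cloud label found inside it, or None when the
    tile contains no annotated pixel (label -1 marks unlabelled pixels here).
    """
    H, W, _ = radiances.shape
    tiles, labels = [], []
    for i in range(0, H - TILE + 1, TILE):
        for j in range(0, W - TILE + 1, TILE):
            tiles.append(radiances[i:i + TILE, j:j + TILE])
            patch = label_mask[i:i + TILE, j:j + TILE]
            valid = patch[patch >= 0]
            labels.append(int(np.bincount(valid).argmax()) if valid.size else None)
    return tiles, labels
```

Tiles whose label is `None` would go into the unlabelled set, the rest into the labelled set.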
For learning, we recall that we make use of two sets of tiles of equal size $N$: a set of labeled tiles $\mathcal{L}$, sampled around annotated pixels on the satellite's track, and a sample of unlabeled tiles $\mathcal{U}$, randomly selected from the non-annotated regions. We deploy a hybrid Invertible Residual Network (Nalisnick et al., 2019), parameterised by $\theta$, which allows us to learn a latent representation for modelling the true distribution of the data by the decomposition

$$p_\theta(x, y) = p(y \mid z)\, p(z) \left| \det \frac{\partial f_\theta(x)}{\partial x} \right|,$$

where $z = f_\theta(x)$ is the latent representation of $x$ and $p(z)$ its prior distribution.
The peculiarity of flow networks, such as the hybrid IResNet, is that they consist of a series of mappings $f_\theta = f_L \circ \dots \circ f_1$ that are invertible for any input $x$. Crucially, there is a direct and invertible relationship between each image $x$ and each latent point $z$:

$$z = f_\theta(x), \qquad x = f_\theta^{-1}(z).$$
Thanks to these constraints on the architecture of the network, it is possible to optimise $\theta$ by maximising the joint likelihood of the swaths $x$ and their target label masks $y$, rewritten as:

$$\log p_\theta(x, y) = \log p(y \mid z) + \log p(z) + \log \left| \det \frac{\partial f_\theta(x)}{\partial x} \right|, \tag{6}$$

where the last two terms are given by the change of variable formula. As supervision is not provided for all the tiles, $\log p(y \mid z)$ in Eq. (6) can be estimated only for tiles from the labeled set $\mathcal{L}$. For the tiles of the unlabeled set $\mathcal{U}$, the label entropy $H\!\left(p(\cdot \mid z)\right)$ is minimised instead, to promote sharp predictions. The overall objective function, maximising the log-likelihood, takes the following form:

$$\max_\theta \; \sum_{(x_i, y_i) \in \mathcal{L}} \log p_\theta(x_i, y_i) \;+\; \sum_{x_j \in \mathcal{U}} \left[ \log p_\theta(x_j) - H\!\left(p(\cdot \mid z_j)\right) \right].$$
We randomly split the labelled tiles into training (70%), validation (10%) and test (20%) sets. We report test classification accuracies, F1 scores and Intersection over Union indices, per class and on average, in Table 3 for the model with the best mean accuracy on the validation set. Figure 2 shows the predictions obtained over one day of images and the occurrences (gridded in latitude and longitude) of three predicted classes over the month. Predicted classes (Fig. 2a) are spatially contiguous across swaths (this is not a constraint of our algorithm). The occurrences appear spatially coherent, with Sc clouds occurring mostly over upwelling regions of the major oceans (Fig. 2c), Dc clouds more confined to equatorial regions (Fig. 2d), and Ci (high) clouds more globally widespread (Fig. 2b). The highest occurrence of Ci clouds appears roughly over the inter-tropical convergence zone and is spatially correlated with Dc clouds, in agreement with Mace et al. (2006). Most interestingly, the heatmaps show great spatial similarity with those reported by Sassen et al. (2008b, a, 2009), where the authors studied occurrences of cloud classes labeled by CloudSat over periods of one year (Sassen et al., 2008b, a) and two years (Sassen et al., 2009). All occurrences are shown in Fig. 3.
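The occurrence gridding behind these heatmaps can be sketched as follows. This is a hypothetical helper for illustration; the grid resolution and cell convention are assumptions, not the values used for the figures:

```python
import numpy as np

def grid_occurrences(lats, lons, preds, cls, res=1.0):
    """Count occurrences of one predicted class `cls` on a regular
    latitude-longitude grid with `res` degrees per cell.

    `lats`, `lons` are per-tile coordinates in degrees; `preds` are the
    per-tile predicted class labels. Returns a (180/res, 360/res) array.
    """
    n_lat, n_lon = int(180 / res), int(360 / res)
    ii = ((lats + 90.0) / res).astype(int).clip(0, n_lat - 1)
    jj = ((lons + 180.0) / res).astype(int).clip(0, n_lon - 1)
    grid = np.zeros((n_lat, n_lon))
    sel = preds == cls
    np.add.at(grid, (ii[sel], jj[sel]), 1)  # unbuffered in-place counting
    return grid
```

`np.add.at` is used so that repeated hits on the same cell are all counted, which plain fancy-index assignment would not guarantee.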
As an additional physics-based evaluation, we consider the distributions of the liquid water path (LWP) and cloud optical thickness (COT) variables for the predicted classes and for the ground truth given by CloudSat (note that neither LWP nor COT was used for training). LWP is the total amount of liquid water present in the whole atmospheric column above a given point. COT is a measure of the thickness between the bottom and top of a cloud. Qualitatively, differences between the predicted distributions and the ground truth are minimal for both variables. Results are shown in Figure 4. In Table 4 we also provide a quantitative comparison between the predicted distributions and the CloudSat ones, by means of both the Kullback-Leibler divergence and the Wasserstein distance. We note that the class "Deep Convection" is the one with the largest accuracy, despite relatively large differences from the ground truth values in both the LWP and COT variables (see Table 4).
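Such a comparison can be computed from the per-class LWP or COT samples as sketched below. This is an illustrative recipe, not the authors' exact procedure: the histogram-based KL estimate, the bin count, and the smoothing constant are all arbitrary choices:

```python
import numpy as np
from scipy.stats import entropy, wasserstein_distance

def compare_distributions(pred_values, true_values, bins=50):
    """Compare predicted vs. CloudSat samples of one variable (e.g. LWP)
    for one class: KL divergence between shared-support histograms and the
    1-D Wasserstein distance between the raw samples."""
    lo = min(pred_values.min(), true_values.min())
    hi = max(pred_values.max(), true_values.max())
    p, _ = np.histogram(pred_values, bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(true_values, bins=bins, range=(lo, hi), density=True)
    eps = 1e-12                     # smoothing to avoid empty-bin divisions
    kl = entropy(p + eps, q + eps)  # scipy's entropy(pk, qk) is KL(pk || qk)
    wd = wasserstein_distance(pred_values, true_values)
    return kl, wd
```

Identical samples yield zero for both metrics; larger values indicate a predicted distribution drifting away from the CloudSat reference.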
In general, as the available supervision is minimal, we argue that it is inadequate to gauge the quality of a method solely by its accuracy on test samples. A physics-based evaluation is better tailored to cloud classification studies: it does not suffer from the minimal supervision and should always complement the more basic metrics reported in Table 3.
In this work we first proposed Cumulo, a new benchmark dataset for training and evaluating global cloud classification models. It consists of one year of 1 km resolution MODIS hyperspectral images merged with pixel-width tracks of CloudSat labels. We believe this is an important step in engaging the Machine-Learning community to develop innovative methods and solutions to climate-related problems. In particular, the proposed dataset presents several important challenges: (a) labels make up less than 0.1% of the data (weakly-labelled data), (b) a pixel can have multiple labels (multi-labelled data), (c) certain types of clouds (e.g., deep convection) are underrepresented (class imbalance). Moreover, within a single cloud class we can still distinguish a rich variety of cloud organizations at the mesoscale (from 5 to several hundred kilometers) and slightly different physical properties, implying the existence of sub-classes for each given class. Proposing novel unsupervised models that directly discover fine-grained classes with access only to the observed coarse labels could be an important new line of research for both the Climate and Machine-Learning communities.
We also provided a first high-resolution spatiotemporal cloud classification baseline on Cumulo. To complement the standard ML prediction scores, we made use of the validation channels of Cumulo to analyse the results from a physical perspective. First, the occurrences over the analysed month are qualitatively similar to those of previous studies. Second, we found that our baseline results are physically reasonable in terms of the liquid water path and cloud optical thickness distributions of the predicted classes. The reported analysis is quantitative and physical, but limited to January 2008. Since CloudSat needs 16 days to complete a cycle, we leave a rigorous comparison of the predicted monthly occurrences, using one full year of classifications, for future studies.
The code used for extracting Cumulo is hosted at https://github.com/FrontierDevelopmentLab/CUMULO. The dataset will soon be made publicly available. In the meantime, please contact us for any request.
This work is the result of the 2019 ESA Frontier Development Lab (FDL, https://fdleurope.org/) Atmospheric Phenomena and Climate Variability challenge. We are grateful to all organisers, mentors and sponsors for providing us with this opportunity. We thank Google Cloud for providing the computing and storage resources to complete this work. Finally, we thank Yarin Gal for helpful discussions and Sylvester Kaczmarek for his help and support in coordinating the work.
References

- Bony et al. (2015). Clouds, circulation and climate sensitivity. Nature Geoscience 8(4), 261.
- Knutti et al. (2017). Beyond equilibrium climate sensitivity. Nature Geoscience 10, 727–736.
- Li et al. (2018). Weakly- and semi-supervised panoptic segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), 102–118.
- Lin et al. (2017). Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, 2980–2988.
- Mace et al. (2006). Association of tropical cirrus in the 10–15-km layer with deep convective sources: an observational study combining millimeter radar data and satellite-derived trajectories. J. Atmos. Sci. 63, 480–503.
- Muhlbauer et al. (2014). Climatology of stratocumulus cloud morphologies: microphysical properties and radiative effects. Atmospheric Chemistry and Physics 14(13), 6695–6716.
- Nalisnick et al. (2019). Hybrid models with deep and invertible features. arXiv preprint arXiv:1902.02767.
- Papandreou et al. (2015). Weakly- and semi-supervised learning of a DCNN for semantic image segmentation. arXiv preprint arXiv:1502.02734.
- Rasp et al. (2019). Combining crowd-sourcing and deep learning to understand meso-scale organization of shallow convection. arXiv preprint arXiv:1906.01906.
- Ronneberger et al. (2015). U-Net: convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 234–241.
- Rossow and Schiffer (1991). ISCCP cloud data products. Bulletin of the American Meteorological Society 72(1), 2–20.
- Rotstayn and Collier (2015). Why does aerosol forcing control historical global-mean surface temperature change in CMIP5 models? Journal of Climate 28, 6608–6625.
- Sassen et al. (2008a). Classifying clouds around the globe with the CloudSat radar: 1-year of results. Journal of Geophysical Research: Atmospheres 113(D8).
- Sassen et al. (2008b). Global distribution of cirrus clouds from CloudSat/Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observations (CALIPSO) measurements. Journal of Geophysical Research: Atmospheres 113(D00A12).
- Sassen et al. (2009). Cirrus clouds and deep convection in the tropics: insights from CALIPSO and CloudSat. Journal of Geophysical Research: Atmospheres 114(D00H06).
- Schneider et al. (2019). Possible climate transitions from breakup of stratocumulus decks under greenhouse warming. Nature Geoscience 12(3), 163.
- Stephens (2005). Cloud feedbacks in the climate system: a critical review. Journal of Climate 18(2), 237–273.
- Zhang et al. (2019). Development of a high spatiotemporal resolution cloud-type classification approach using Himawari-8 and CloudSat. International Journal of Remote Sensing 40(16), 6464–6481.