- Atkinson, B. W., and J. W. Zhang, 1996: Mesoscale Shallow Convection in the Atmosphere. Reviews of Geophysics, doi:10.1029/96RG02623.
- Bony, S., and J. Dufresne, 2005: Marine boundary layer clouds at the heart of tropical cloud feedback uncertainties in climate models. Geophysical Research Letters, 32 (20), L20806, doi:10.1029/2005GL023851.
- Bony, S., J.-L. Dufresne, H. Le Treut, J.-J. Morcrette, and C. Senior, 2004: On dynamic and thermodynamic components of cloud changes. Climate Dynamics, 22 (2), 71–86, doi:10.1007/s00382-003-0369-6.
- Bony, S., and Coauthors, 2017: EUREC4A: A Field Campaign to Elucidate the Couplings Between Clouds, Convection and Circulation. Surveys in Geophysics, 38 (6), 1529–1568, doi:10.1007/s10712-017-9428-0.
- Boucher, O., and Coauthors, 2013: Clouds and aerosols, 571–657. Cambridge University Press, Cambridge, UK, doi:10.1017/CBO9781107415324.016.
- Caron, M., P. Bojanowski, A. Joulin, and M. Douze, 2018: Deep Clustering for Unsupervised Learning of Visual Features. https://arxiv.org/abs/1807.05520.
- Chollet, F., and Others, 2015: Keras. https://keras.io/.
- Gneiting, T., and A. E. Raftery, 2007: Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association, 102 (477), 359–378, doi:10.1198/016214506000001437.
- He, K., X. Zhang, S. Ren, and J. Sun, 2015: Deep Residual Learning for Image Recognition. https://arxiv.org/abs/1512.03385.
- Hennon, C. C., and Coauthors, 2015: Cyclone Center: Can Citizen Scientists Improve Tropical Cyclone Intensity Records? Bulletin of the American Meteorological Society, 96 (4), 591–607, doi:10.1175/BAMS-D-13-00152.1.
- Hong, S., S. Kim, M. Joh, and S.-k. Song, 2017: GlobeNet: Convolutional Neural Networks for Typhoon Eye Tracking from Remote Sensing Imagery. https://arxiv.org/abs/1708.03417.
- Kurth, T., and Coauthors, 2018: Exascale Deep Learning for Climate Analytics. https://arxiv.org/abs/1810.01993.
- LeCun, Y., Y. Bengio, and G. Hinton, 2015: Deep learning. Nature, 521 (7553), 436–444, doi:10.1038/nature14539.
- Lin, T.-Y., P. Goyal, R. Girshick, K. He, and P. Dollár, 2017: Focal Loss for Dense Object Detection. https://arxiv.org/abs/1708.02002.
- Liu, Y., E. Racah, J. Correa, A. Khosrowshahi, D. Lavers, K. Kunkel, M. Wehner, and W. Collins, 2016: Application of Deep Convolutional Neural Networks for Detecting Extreme Weather in Climate Datasets. https://arxiv.org/abs/1605.01156.
- Muhlbauer, A., I. L. McCoy, and R. Wood, 2014: Climatology of stratocumulus cloud morphologies: microphysical properties and radiative effects. Atmospheric Chemistry and Physics, 14 (13), 6695–6716, doi:10.5194/acp-14-6695-2014.
- Racah, E., C. Beckham, T. Maharaj, S. E. Kahou, Prabhat, and C. Pal, 2016: ExtremeWeather: A large-scale climate dataset for semi-supervised detection, localization, and understanding of extreme weather events. https://arxiv.org/abs/1612.02095.
- Rasp, S., M. S. Pritchard, and P. Gentine, 2018: Deep learning to represent subgrid processes in climate models. Proceedings of the National Academy of Sciences of the United States of America, 115 (39), 9684–9689, doi:10.1073/pnas.1810286115.
- Rauber, R. M., and Coauthors, 2007: Rain in Shallow Cumulus Over the Ocean: The RICO Campaign. Bulletin of the American Meteorological Society, 88 (12), 1912–1928, doi:10.1175/BAMS-88-12-1912.
- Reichstein, M., G. Camps-Valls, B. Stevens, M. Jung, J. Denzler, N. Carvalhais, and Prabhat, 2019: Deep learning and process understanding for data-driven Earth system science. Nature, 566 (7743), 195–204, doi:10.1038/s41586-019-0912-1.
- Ronneberger, O., P. Fischer, and T. Brox, 2015: U-Net: Convolutional Networks for Biomedical Image Segmentation. https://arxiv.org/abs/1505.04597.
- Stevens, B., S. C. Sherwood, S. Bony, and M. J. Webb, 2016a: Prospects for narrowing bounds on Earth’s equilibrium climate sensitivity. Earth’s Future, 4 (11), 512–522, doi:10.1002/2016EF000376.
- Stevens, B., and Coauthors, 2016b: The Barbados Cloud Observatory: Anchoring Investigations of Clouds and Circulation on the Edge of the ITCZ. Bulletin of the American Meteorological Society, 97 (5), 787–801, doi:10.1175/BAMS-D-14-00247.1.
- Stevens, B., and Coauthors, 2019a: A high-altitude long-range aircraft configured as a cloud observatory: the NARVAL expeditions. Bulletin of the American Meteorological Society, BAMS-D-18-0198.1, doi:10.1175/BAMS-D-18-0198.1.
- Stevens, B., and Coauthors, 2019b: Sugar, Gravel, Fish, and Flowers: Mesoscale cloud patterns in the Tradewinds. Quart. J. Roy. Meteor. Soc., submitted.
- Tobin, I., S. Bony, and R. Roca, 2012: Observational Evidence for Relationships between the Degree of Aggregation of Deep Convection, Water Vapor, Surface Fluxes, and Radiation. Journal of Climate, 25 (20), 6885–6904, doi:10.1175/JCLI-D-11-00258.1.
- Vial, J., J.-L. Dufresne, and S. Bony, 2013: On the interpretation of inter-model spread in CMIP5 climate sensitivity estimates. Climate Dynamics, 41 (11), 3339–3362, doi:10.1007/s00382-013-1725-9.
- Wood, R., and D. L. Hartmann, 2006: Spatial Variability of Liquid Water Path in Marine Low Cloud: The Importance of Mesoscale Cellular Convection. Journal of Climate, 19 (9), 1748–1764, doi:10.1175/JCLI3702.1.
- Xie, J., R. Girshick, and A. Farhadi, 2016: Unsupervised Deep Embedding for Clustering Analysis. https://arxiv.org/abs/1511.06335v2.
- Young, G. S., D. A. R. Kristovich, M. R. Hjelmfelt, and R. C. Foster, 2002: Rolls, Streets, Waves, and More: A Review of Quasi-Two-Dimensional Structures in the Atmospheric Boundary Layer. Bulletin of the American Meteorological Society, 83 (7), 997–1001, doi:10.1175/1520-0477(2002)083<0997:RSWAMA>2.3.CO;2.
Region selection criteria
The regions were selected ahead of the classification days using a similarity analysis of atmospheric conditions, targeting conditions that resemble those encountered during the DJF season east of Barbados, where these patterns were first identified (Stevens et al., 2019b).
Because the mesoscale organization of shallow cumulus is a relatively new research topic, the meteorological conditions influencing it are largely a matter of educated guesswork. Lower-tropospheric stability (LTS), 10 m surface wind speed (FF10), and total column water vapour (TCWV) are three parameters one might plausibly expect to describe the meteorological setting to a sufficient degree. Starting with the interannual seasonal means of these atmospheric properties in the region east of Barbados, we searched for climatologically similar regions and seasons within a 120°-wide latitudinal belt (60°N to 60°S) around the globe. We used k-means clustering with eight clusters to find similar patterns within our search perimeter. As input to the algorithm we used the climatological means of LTS, FF10 and TCWV for each of the four seasons. The eight clusters explain more than 90% of the variance in the dataset and provide regions large enough to fit 21° longitude by 14° latitude boxes reasonably well.
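The clustering step can be sketched as follows. The feature values, the standardisation step, and all numbers below are illustrative assumptions, not the exact preprocessing used in the analysis:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical climatological means per grid point: lower-tropospheric
# stability (LTS), 10 m wind speed (FF10) and total column water vapour
# (TCWV). In the real analysis these are stacked for the four seasons.
rng = np.random.default_rng(0)
n_points = 1000
features = np.column_stack([
    rng.normal(15, 3, n_points),   # LTS in K (illustrative values)
    rng.normal(7, 2, n_points),    # FF10 in m/s
    rng.normal(30, 8, n_points),   # TCWV in kg/m^2
])

# Standardise so each variable contributes comparably to the distance
features = (features - features.mean(axis=0)) / features.std(axis=0)

# Eight clusters, as in the region-selection analysis
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(features)
labels = kmeans.labels_  # cluster index per grid point
```

Regions belonging to the same cluster as the box east of Barbados would then be candidate analogue regions.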
Fig. S1 shows the clusters for the four seasons. Our analysis indicates that the meteorological conditions over the Northwestern Atlantic change with season. This is not surprising given the migration of the ITCZ, but it illustrates that we should not expect to see the same cloud patterns, or at least the same distribution of patterns, throughout the year. The final choice of seasons and regions was made to match the climate of region 1 in DJF (Table S1).
Table S1: Selected regions and seasons.

| Region | Longitude | Latitude | Seasons |
|--------|-----------|----------|---------|
| 1 | 61°W to 40°W | 10°N to 24°N | DJF, MAM |
| 2 | 159°E to 180° | 8°N to 22°N | DJF |
| 3 | 135°W to 114°W | 1°S to 15°S | DJF, SON |
Agreement metrics
In the paper we use two different metrics for agreement. The first, the agreement score, is used to quantify inter-human agreement and is defined as follows: in what percentage of cases, if one user drew a box of a certain class, did another user also draw a box of the same class, under the condition that the boxes overlap? The overlap is measured using the Intersection-over-Union (IoU) metric, and for this score an IoU larger than 0.1 is required. While this threshold might seem low, for two equally sized boxes it corresponds to an overlap of roughly 18% of each box, almost one fifth. Changing the threshold changes the absolute values but not the relative agreement for each of the patterns. To measure inter-human agreement, for each image all pairs of users are compared against each other and the results are subsequently averaged.
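For reference, a minimal IoU computation for axis-aligned boxes; the `(x1, y1, x2, y2)` corner convention is an assumption:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Two equally sized 10x10 boxes sharing a 4x5 region clear the 0.1
# threshold only narrowly: IoU = 20 / (100 + 100 - 20) ≈ 0.11
print(iou((0, 0, 10, 10), (6, 5, 16, 15)))
```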
The second metric is the pixel accuracy, used to compare the machine learning models to the human predictions. Here, for each pixel and each pattern, the prediction of one user (or a machine learning model) is compared to that of another user. Pixels where both users predict no pattern are omitted from this score.
The reason for using two different metrics is that, while the first is easily understandable and interpretable, it is not a proper score (Gneiting and Raftery, 2007): predicting the truth does not necessarily yield the best value. For example, because of the IoU threshold, predicting larger boxes would result in a higher agreement score. The pixel agreement, in contrast, is a proper score and is therefore suited to comparing inter-human agreement with the two deep learning algorithms.
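The pixel agreement can be sketched for two label masks as follows, assuming background is encoded as 0 (the actual label encoding in the pipeline may differ):

```python
import numpy as np

def pixel_agreement(mask_a, mask_b, background=0):
    """Fraction of pixels where two label masks agree, ignoring
    pixels that both users left unlabelled (background)."""
    labelled = (mask_a != background) | (mask_b != background)
    if not labelled.any():
        return float("nan")
    return float((mask_a[labelled] == mask_b[labelled]).mean())

a = np.array([[0, 1], [2, 2]])
b = np.array([[0, 1], [2, 3]])
# Three pixels carry a label for at least one user; two of them agree,
# so the agreement is 2/3.
print(pixel_agreement(a, b))
```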
Deep learning models
Two deep learning models are used: one for object detection and one for semantic segmentation. For object detection, we use RetinaNet (Lin et al., 2017), specifically the following Keras (Chollet and Others, 2015) implementation: https://github.com/fizyr/keras-retinanet, which uses a ResNet-50 (He et al., 2015) backbone. The original images have a resolution of 2100 by 1400 pixels; for RetinaNet they were downscaled to 1050 by 700 pixels, which is necessary to fit a batch (batch size = 4) into GPU RAM.
For semantic segmentation, we first converted each human classification, i.e., all boxes drawn by one user for an image, to a mask. Where boxes of different patterns overlap, the mask takes the value of the smaller box. The number of overlapping boxes is small overall, however, so the resulting error is most likely negligible. To create the segmentation model, we used the fastai Python library (https://docs.fast.ai/). The network has a U-Net (Ronneberger et al., 2015) architecture with a ResNet-50 backbone. For the segmentation model the images were downscaled to 700 by 466 pixels (batch size = 6).
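The box-to-mask conversion can be sketched as follows; painting boxes in order of decreasing area means the smaller box wins wherever boxes overlap. The box tuple layout and class encoding are illustrative assumptions:

```python
import numpy as np

def boxes_to_mask(boxes, height, width, background=0):
    """Rasterise labelled boxes (x1, y1, x2, y2, class_id) into a mask.
    Larger boxes are painted first, so overlapping pixels end up with
    the class of the smaller box."""
    mask = np.full((height, width), background, dtype=np.int32)
    def area(b):
        return (b[2] - b[0]) * (b[3] - b[1])
    for x1, y1, x2, y2, cls in sorted(boxes, key=area, reverse=True):
        mask[y1:y2, x1:x2] = cls
    return mask

# A small class-2 box inside a large class-1 box keeps its own label
boxes = [(0, 0, 6, 6, 1), (2, 2, 4, 4, 2)]
mask = boxes_to_mask(boxes, 8, 8)
```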
To create the prediction masks, a Gaussian filter with a half-width of 10 pixels was first applied to smooth the predicted probability fields. Then, each pixel was assigned the pattern with the highest probability, provided that probability exceeded 30%. This last step counteracts the tendency to predict background, which is by far the most common class in the training set.
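This post-processing can be sketched with `scipy.ndimage.gaussian_filter`; the mapping from the 10-pixel half-width to the filter's `sigma` parameter, and the array layout, are assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def probs_to_mask(probs, sigma=10, threshold=0.3, background=-1):
    """probs: (n_patterns, H, W) array of per-pattern probabilities.
    Smooth each pattern's field, then assign the most likely pattern
    to every pixel whose maximum probability exceeds the threshold;
    all other pixels stay background."""
    smoothed = np.stack([gaussian_filter(p, sigma) for p in probs])
    best = smoothed.argmax(axis=0)
    return np.where(smoothed.max(axis=0) > threshold, best, background)
```

Without the threshold, the argmax alone would rarely pick a pattern, because background dominates the predicted probabilities.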
To create the heatmaps, the segmentation algorithm was used. Predictions were created for one 21° longitude by 14° latitude region at a time, with the window sliding in 10.5° and 7° increments over the globe. The highest pattern probability across overlapping images was then taken to create the global mask. This was necessary because the algorithm tends to predict background at the edges of an image, a consequence of the human labelers not drawing boxes that extend all the way to the edge. The climatology was created from one year of Aqua data.
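The merging step can be sketched as a running pixel-wise maximum over the window predictions; placing windows by pixel offsets is an illustrative simplification of the lat-lon gridding:

```python
import numpy as np

def merge_windows(global_shape, predictions):
    """Combine per-window probability fields into a global field by
    taking, at each pixel, the maximum over all overlapping windows.
    predictions: list of (row, col, probs) with probs of shape
    (n_patterns, win_h, win_w); global_shape: (n_patterns, H, W)."""
    merged = np.zeros(global_shape)
    for row, col, probs in predictions:
        h, w = probs.shape[1:]
        view = merged[:, row:row + h, col:col + w]
        np.maximum(view, probs, out=view)  # in-place max into the slice
    return merged

# Two overlapping 2x2 windows on a 4x4 grid: overlapping pixels keep
# the larger of the two probabilities.
merged = merge_windows((1, 4, 4), [
    (0, 0, np.full((1, 2, 2), 0.2)),
    (1, 1, np.full((1, 2, 2), 0.5)),
])
```

Taking the maximum rather than the mean avoids diluting a confident interior prediction with the background-heavy edges of a neighbouring window.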