Combining crowd-sourcing and deep learning to understand meso-scale organization of shallow convection

06/05/2019 · Stephan Rasp et al. · Universität München

The discovery of new phenomena and mechanisms often begins with a scientist's intuitive ability to recognize patterns, for example in satellite imagery or model output. Typically, however, such intuitive evidence turns out to be difficult to encode and reproduce. Here, we show how crowd-sourcing and deep learning can be combined to scale up the intuitive discovery of atmospheric phenomena. Specifically, we focus on the organization of shallow clouds in the trades, which play a disproportionately large role in the Earth's energy balance. Based on visual inspection, four subjective patterns of organization were defined: Sugar, Flower, Fish and Gravel. On cloud-labeling days at two institutes, 67 participants classified more than 30,000 satellite images on a crowd-sourcing platform. Physical analysis reveals that the four patterns are associated with distinct large-scale environmental conditions. We then used the classifications as a training set for deep learning algorithms, which learned to detect the cloud patterns with human accuracy. This enables analyses far beyond the human classifications. As an example, we created global climatologies of the four patterns. These reveal geographical hotspots that provide insight into the interaction of mesoscale cloud organization with the large-scale circulation. Our project shows that combining crowd-sourcing and deep learning opens new data-driven ways to explore cloud-circulation interactions and serves as a template for a wide range of possible studies in the geosciences.



Supplemental Methods

Region selection criteria

The regions were selected ahead of the classification days based on a similarity analysis of atmospheric conditions, targeting regions and seasons that resemble the conditions encountered during the DJF season east of Barbados, where these patterns were first found (Stevens et al., 2019b).

Because the mesoscale organization of shallow cumulus is a relatively new research topic, the meteorological conditions influencing it are primarily an educated guess. Lower-tropospheric stability (LTS), near-surface (10 m) wind speed (FF10) and total column water vapour (TCWV) are three parameters one could naively expect to describe the meteorological setting to a sufficient degree. Starting with the inter-annual seasonal mean of these atmospheric properties in the region east of Barbados, we searched for climatologically similar regions and seasons within a 120°-wide latitudinal belt (60°N to 60°S) around the globe. We used k-means clustering with eight clusters to find similar patterns within our search perimeter. As input to the algorithm we used the climatological means of LTS, FF10 and TCWV for each of the four seasons. The eight clusters explain more than 90% of the variance in the dataset and provide regions large enough to fit 21° longitude by 14° latitude boxes reasonably well.
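
A minimal sketch of this clustering step, assuming the seasonal climatological means have already been computed as gridded arrays (the file names, array shapes and scikit-learn usage are illustrative, not the authors' code):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical inputs: seasonal climatological means on a lat/lon grid
# restricted to 60N-60S, each with shape (n_seasons, n_lat, n_lon).
lts, ff10, tcwv = np.load("lts.npy"), np.load("ff10.npy"), np.load("tcwv.npy")

def cluster_season(season_idx, n_clusters=8):
    """Cluster grid points of one season by their (LTS, FF10, TCWV) values."""
    features = np.stack(
        [lts[season_idx].ravel(),
         ff10[season_idx].ravel(),
         tcwv[season_idx].ravel()],
        axis=1,
    )
    # Standardize so each variable contributes comparably to the distance.
    features = (features - features.mean(axis=0)) / features.std(axis=0)
    labels = KMeans(n_clusters=n_clusters, random_state=0).fit_predict(features)
    return labels.reshape(lts[season_idx].shape)  # cluster map for plotting
```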

Figure S1: Cluster analysis of LTS, FF10 and TCWV separated by season (DJF, MAM, JJA, SON). The colors identify the eight clusters resulting from the k-means algorithm. For a better visual impression the clusters are sorted by cluster-mean column-integrated moisture, with cluster 1 being the driest. Black boxes indicate the regions chosen for human classification.

Fig. S1 shows the clusters for the four seasons. Our analysis indicates that the meteorological conditions over the Northwestern Atlantic change with season. This is not surprising given the migration of the ITCZ, but it illustrates that we should not expect to see the same cloud patterns, or at least the same distribution of patterns, throughout the year. The final choice of seasons and regions was made to match the climate of region 1 in DJF (Table S1).

Domain   Bounds                     Seasons used
1        61°W–40°W; 10°N–24°N       DJF, MAM
2        159°E–180°E; 8°N–22°N      DJF
3        135°W–114°W; 1°S–15°S      DJF, SON
Table S1: Selected domains used for human classification of cloud patterns.

Agreement metrics

In the paper we use two different metrics for agreement. The first, the agreement score, is used to compare inter-human agreement and is defined as follows: “In what percentage of cases, if one user drew a box of a certain class, did another user also draw a box of the same class, under the condition that the boxes overlap?” The overlap is measured using the Intersection-over-Union (IoU) metric, and an IoU larger than 0.1 is required. While this threshold might seem low, for two equally sized boxes of area A with intersection I, an IoU of 0.1 means I/(2A − I) = 0.1, i.e. an overlap of roughly 18% of each box, nearly one fifth of its area. Changing the threshold changes the absolute values but not the relative agreement for each of the patterns. To measure inter-human agreement, for each image all pairs of users are compared against each other and the scores are subsequently averaged.
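
As a concrete illustration, here is a minimal sketch of the IoU computation and the pairwise box-matching idea behind the agreement score (the box representation and helper names are ours, not taken from the paper's code):

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def agreement(boxes_u1, boxes_u2, threshold=0.1):
    """Fraction of user 1's boxes matched by a same-class box of user 2.

    boxes_u* map a pattern name to a list of (x1, y1, x2, y2) boxes.
    """
    matched = total = 0
    for pattern, boxes in boxes_u1.items():
        for box in boxes:
            total += 1
            if any(iou(box, other) > threshold
                   for other in boxes_u2.get(pattern, [])):
                matched += 1
    return matched / total if total else float("nan")
```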

The second metric is the pixel accuracy, used to compare the machine-learning models to the human classifications. Here, for each pixel, the agreement of one user (or a machine-learning prediction) with another user is computed for each pattern. Pixels where both users predict no pattern are omitted from this score.
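
A short sketch of this pixel accuracy under our reading of the definition, with an assumed mask encoding (0 for background, 1–4 for the four patterns):

```python
import numpy as np

def pixel_accuracy(mask_a, mask_b):
    """Fraction of pixels where two label masks agree.

    Masks are integer arrays of equal shape; 0 = no pattern, 1-4 = patterns.
    Pixels where both masks are background are excluded, so the score is
    not inflated by the dominant empty class.
    """
    considered = (mask_a != 0) | (mask_b != 0)
    if not considered.any():
        return float("nan")
    return np.mean(mask_a[considered] == mask_b[considered])
```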

The reason for using two different metrics is that, while the first metric is easily understandable and interpretable, it is not a proper score (Gneiting and Raftery, 2007). This means that predicting the truth does not necessarily yield the best score; for example, because of the IoU threshold, predicting larger boxes would result in a higher agreement score. The pixel accuracy, in contrast, is a proper score and is therefore suited to comparing inter-human agreement with the two deep learning algorithms.

Deep learning models

Two deep learning models are used, one for object detection and one for semantic segmentation. For object detection, RetinaNet (Lin et al., 2017) is used, specifically the following Keras (Chollet and Others, 2015) implementation: https://github.com/fizyr/keras-retinanet, which uses a ResNet-50 (He et al., 2015) backbone. The original images have a resolution of 2100 by 1400 pixels; for RetinaNet they were downscaled to 1050 by 700 pixels, which is necessary to fit a batch (batch size = 4) into GPU RAM.
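
A minimal sketch of how such a detector can be set up with the fizyr keras-retinanet package, based on our reading of that repository's training script; the optimizer settings are illustrative and not necessarily the authors' exact configuration:

```python
import keras
from keras_retinanet import losses
from keras_retinanet.models import backbone

# Four foreground classes: Sugar, Flower, Fish and Gravel.
model = backbone('resnet50').retinanet(num_classes=4)
model.compile(
    # The standard RetinaNet losses: smooth L1 for box regression,
    # focal loss for classification.
    loss={'regression': losses.smooth_l1(),
          'classification': losses.focal()},
    optimizer=keras.optimizers.Adam(lr=1e-5, clipnorm=0.001),
)
```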

For semantic segmentation, we first converted each human classification, i.e. all boxes drawn by one user for an image, to a mask. Sometimes boxes for different patterns overlap; in this case, the mask is assigned the value of the smaller box. Overall, however, the number of overlapping boxes is small, so the resulting error is most likely negligible. To create a segmentation model, we used the fastai Python library (https://docs.fast.ai/). The network architecture has a U-Net (Ronneberger et al., 2015) structure with a ResNet-50 backbone. For the segmentation model the images were downscaled to 700 by 466 pixels (batch size = 6).
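
A sketch of the box-to-mask conversion with the smaller-box-wins rule (the class encoding and function names are ours):

```python
import numpy as np

PATTERNS = {"Sugar": 1, "Flower": 2, "Fish": 3, "Gravel": 4}  # 0 = background

def boxes_to_mask(boxes, height, width):
    """Rasterize one user's boxes into a labeled mask.

    boxes: list of (pattern_name, x1, y1, x2, y2) tuples in pixel coordinates.
    Painting boxes from largest to smallest means that wherever two boxes
    of different patterns overlap, the smaller box ends up on top.
    """
    mask = np.zeros((height, width), dtype=np.uint8)
    by_size = sorted(boxes,
                     key=lambda b: (b[3] - b[1]) * (b[4] - b[2]),
                     reverse=True)
    for pattern, x1, y1, x2, y2 in by_size:
        mask[y1:y2, x1:x2] = PATTERNS[pattern]
    return mask
```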

To create the prediction masks, a Gaussian filter with a half-width of 10 pixels was first applied to smooth the predicted probability fields. Then each pixel was assigned the pattern with the highest probability, provided that this probability exceeded 30%; otherwise the pixel was left as background. This last step counteracts the tendency to predict background, which is by far the most common class in the training set.
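
A sketch of this post-processing step under the description above; note that the paper's 10-pixel half-width is mapped to scipy's `sigma` argument only loosely here, so treat the smoothing scale as illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def probs_to_mask(probs, threshold=0.3, smooth_sigma=10):
    """Turn per-pattern probabilities of shape (4, H, W) into a (H, W) mask.

    0 = background, 1-4 = the four patterns.
    """
    smoothed = np.stack([gaussian_filter(p, sigma=smooth_sigma) for p in probs])
    best = smoothed.argmax(axis=0) + 1          # most likely pattern, 1-4
    # Keep the best pattern only where its probability clears the threshold.
    mask = np.where(smoothed.max(axis=0) > threshold, best, 0)
    return mask.astype(np.uint8)
```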

Global heatmaps

To create the heatmaps, the segmentation algorithm was used. Predictions were created for one 21° longitude by 14° latitude region at a time, with the window sliding in 10.5° and 7° increments over the globe. For overlapping images the highest pattern probability was taken to create the global mask. This was necessary because the algorithm tends to predict background at the edges of an image, a consequence of the human labelers not drawing boxes that extend all the way to the edge of the image. The climatology was created from one year of Aqua data.
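
A sketch of the sliding-window merging, assuming a global per-pattern probability grid is accumulated with a pixelwise maximum; the grid resolution, latitude belt and the `predict_window` helper are hypothetical, and dateline wrap-around is omitted for brevity:

```python
import numpy as np

def global_probabilities(predict_window, n_lon=3600, n_lat=1800):
    """Accumulate window predictions into global per-pattern maxima.

    predict_window(lon0, lat0) is assumed to return a (4, h, w) probability
    array for the 21 deg x 14 deg window whose lower-left corner sits at
    (lon0, lat0); windows advance by half their size (10.5 and 7 deg).
    """
    deg = n_lon / 360                        # grid cells per degree
    h, w = int(14 * deg), int(21 * deg)
    global_probs = np.zeros((4, n_lat, n_lon))
    for lat0 in np.arange(-60, 60 - 14 + 1e-6, 7.0):
        for lon0 in np.arange(-180, 180 - 21 + 1e-6, 10.5):
            probs = predict_window(lon0, lat0)
            j = int((lat0 + 90) * deg)
            i = int((lon0 + 180) * deg)
            region = global_probs[:, j:j + h, i:i + w]
            np.maximum(region, probs, out=region)   # keep pixelwise max
    return global_probs
```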

Figure S2: (Top row) Total size of classifications for the two deep learning algorithms for a random validation dataset. (Bottom row) Mean pixel accuracy for the two algorithms stratified by pattern, also for a random validation set.