Rotation Equivariant Deforestation Segmentation
Deforestation has become a significant contributing factor to climate change and, due to this, both classifying the drivers and predicting segmentation maps of deforestation have attracted significant interest. In this work, we develop a rotation equivariant convolutional neural network model to predict the drivers and generate segmentation maps of deforestation events from Landsat 8 satellite images. This outperforms previous methods in classifying the drivers and predicting the segmentation map of deforestation, offering a 9% improvement in classification accuracy and a 7% improvement in segmentation accuracy. In addition, this method predicts stable segmentation maps under rotation of the input image, which ensures that predicted regions of deforestation are not dependent upon the rotational orientation of the satellite.
Deforestation has been greatly accelerated by human activities, with many drivers leading to a loss of forest area. Deforestation has a negative impact on natural ecosystems, biodiversity, and the climate, and it is becoming a force of global importance (Foley et al., 2005). Deforestation for palm plantations is projected to contribute 18-22% of Indonesia's CO2-equivalent emissions (Carlson et al., 2013). Furthermore, deforestation in the tropics contributes roughly 10% of annual global greenhouse gas emissions (Arneth et al., 2019). In addition, over one quarter of global forest loss is due to deforestation in which the land is permanently changed for the production of commodities, including beef, soy, palm oil, and wood fiber (Curtis et al., 2018). Climate tipping points occur when a small change in forcing triggers a strongly nonlinear response in the internal dynamics of part of the climate system (Lenton, 2011). Deforestation is one of the contributors that can cause climate tipping points (Lenton, 2011). Therefore, understanding the drivers of deforestation is of significant importance.
The availability of, and advances in, high-resolution satellite imaging have enabled applications in mapping to develop at scale (Roy et al., 2014; Verpoorter et al., 2012, 2014; Janowicz et al., 2020; Karpatne et al., 2018)et al., 2019; Descals et al., 2019; Poortinga et al., 2019; Hethcoat et al., 2019; Sylvain et al., 2019; Irvin et al., 2020). However, none of these previous methods leverage advances in group equivariant convolutional networks (Cohen and Welling, 2016b, a; Weiler and Cesa, 2019), and as such the methods are not stable with respect to transformations that would naturally occur during the capture of such data.
In this work we train models to classify drivers of deforestation and generate a segmentation map of the deforestation area. For this we build both a conventional convolutional model and a group equivariant convolutional model, and compare their classification accuracy, their segmentation accuracy, and the stability of the segmentation maps they produce. We show that the group equivariant model, with translation and rotation equivariant convolutions, not only improves classification and segmentation accuracy, but also has the desired property that its segmentation map is stable under natural transformations of the data capture method, namely rotations of the satellite imaging.
A CNN is, in general, composed of multiple convolutional layers alongside other layers. These convolutional layers are translation equivariant: if the input signal is translated, the resulting output feature map is translated accordingly. Translation equivariance is a useful inductive bias to build into a model for image analysis because the data is known to have a translational symmetry, i.e. if an image of an object is translated one pixel to the left, it is still an image of the same object. This translational symmetry can be expressed through the group $(\mathbb{R}^2, +)$ consisting of all translations of the plane $\mathbb{R}^2$.
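The translation equivariance described above can be checked numerically. The following is a minimal sketch, assuming a single-channel image and periodic (wrap-around) boundaries so that the property holds exactly; the helper `circular_conv2d` is a hypothetical name introduced for illustration and is not part of the models in this work.

```python
import numpy as np

def circular_conv2d(image, kernel):
    """Cross-correlate a 2D image with a kernel using periodic boundaries."""
    out = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    for i in range(kh):
        for j in range(kw):
            # After this roll, position (y, x) holds image[y + i - kh//2, x + j - kw//2].
            shifted = np.roll(image, shift=(kh // 2 - i, kw // 2 - j), axis=(0, 1))
            out += kernel[i, j] * shifted
    return out

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))
kernel = rng.normal(size=(3, 3))
shift = (2, -3)  # translate 2 pixels down and 3 pixels left

# Translate the input, then convolve.
shifted_then_conv = circular_conv2d(np.roll(image, shift, axis=(0, 1)), kernel)
# Convolve, then translate the output by the same amount.
conv_then_shifted = np.roll(circular_conv2d(image, kernel), shift, axis=(0, 1))

translation_equivariant = np.allclose(shifted_then_conv, conv_then_shifted)
print(translation_equivariant)
```

With zero padding instead of periodic boundaries the two results would agree only away from the image border.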
This leads us to consider whether the data has additional symmetries, so that the corresponding symmetry groups can also be utilised in a model. Steerable CNNs define feature spaces of steerable feature fields $f: \mathbb{R}^2 \to \mathbb{R}^c$, where a $c$-dimensional feature vector $f(x) \in \mathbb{R}^c$ is linked to each point $x$ of the base space (Cohen and Welling, 2016a). Steerable CNNs are equipped with a transformation law that specifies how the features transform under actions of the symmetry group. The transformation law is fully characterized by the group representation $\rho$. A group representation specifies how the $c$ channels of the feature vector $f(x)$ mix under transformations. For a network layer to be equivariant it must satisfy the transformation law, see Figure 1. This places a constraint on the kernel, reducing the space of permissible kernels to those which satisfy the equivariance constraint. As the goal is to build linear layers that combine translational symmetry with the symmetry of another group, the vector space of permissible kernels forms a subspace of that used in a conventional CNN. This increases the parameter efficiency of the layers, similar to how a CNN increases parameter efficiency over an MLP (Weiler and Cesa, 2019).
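To make the kernel constraint concrete, the sketch below considers the simplest case only: a map between two scalar (trivial) feature fields under the four rotations by multiples of 90 degrees. In that case the constraint reduces to the kernel being invariant under those rotations, and a free kernel can be projected onto the permissible subspace by averaging over the group. The periodic boundaries, the helper names, and the choice of 90 degree rotations are assumptions made for illustration; the models in this work rely on E2CNN rather than on this construction.

```python
import numpy as np

def circular_conv2d(image, kernel):
    """Cross-correlate a 2D image with a kernel using periodic boundaries."""
    out = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    for i in range(kh):
        for j in range(kw):
            shifted = np.roll(image, shift=(kh // 2 - i, kw // 2 - j), axis=(0, 1))
            out += kernel[i, j] * shifted
    return out

def project_to_c4_invariant(kernel):
    """Average a kernel over its four 90-degree rotations."""
    return sum(np.rot90(kernel, k) for k in range(4)) / 4.0

rng = np.random.default_rng(0)
image = rng.normal(size=(9, 9))
free_kernel = rng.normal(size=(3, 3))
constrained_kernel = project_to_c4_invariant(free_kernel)

def equivariance_error(kernel):
    """Largest difference between conv(rotate(x)) and rotate(conv(x))."""
    rotate_then_conv = circular_conv2d(np.rot90(image), kernel)
    conv_then_rotate = np.rot90(circular_conv2d(image, kernel))
    return float(np.abs(rotate_then_conv - conv_then_rotate).max())

error_free = equivariance_error(free_kernel)
error_constrained = equivariance_error(constrained_kernel)

# The constrained 3x3 kernel has three independent values
# (centre, edge, corner) instead of nine.
n_independent = len(np.unique(np.round(constrained_kernel, 12)))
print(error_free, error_constrained, n_independent)
```

The drop from nine free values to three illustrates, in a toy setting, why restricting to permissible kernels improves parameter efficiency.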
One particular group of interest for satellite imagery is the orthogonal group $O(2)$. The orthogonal group consists of all continuous rotations and reflections leaving the origin invariant. In addition to the orthogonal group, its discrete subgroups are of interest: the cyclic group, $C_N$, and the dihedral group, $D_N$, which consist of discrete rotations by angles that are multiples of $2\pi/N$ and, in the case of the dihedral group, reflections also. These rotational symmetries are of interest for analysing satellite imagery as there is no global orientation of the images collected, i.e. if an image of a forest is captured, it is still an image of the same forest if it is rotated by an angle or reflected.
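As a small illustration of these discrete groups, the sketch below builds the elements of $C_N$ and $D_N$ as 2x2 matrices and checks that they are orthogonal and that $C_N$ is closed under composition. The function names and the choice of the reflection axis are assumptions made for this example.

```python
import numpy as np

def cyclic_group(n):
    """Rotation matrices for angles k * 2*pi/n, k = 0, ..., n-1."""
    angles = 2.0 * np.pi * np.arange(n) / n
    return [np.array([[np.cos(t), -np.sin(t)],
                      [np.sin(t),  np.cos(t)]]) for t in angles]

def dihedral_group(n):
    """The n rotations of C_n plus each rotation composed with a reflection."""
    reflection = np.array([[1.0, 0.0],
                           [0.0, -1.0]])  # reflection about the horizontal axis
    rotations = cyclic_group(n)
    return rotations + [r @ reflection for r in rotations]

def contains(group, matrix):
    """True if the matrix matches some group element up to rounding."""
    return any(np.allclose(matrix, g) for g in group)

c8 = cyclic_group(8)
d8 = dihedral_group(8)

# Every element preserves lengths and angles, so it lies in O(2).
all_orthogonal = all(np.allclose(g.T @ g, np.eye(2)) for g in d8)
# Composing two rotations of C_8 gives another rotation of C_8.
c8_closed = all(contains(c8, a @ b) for a in c8 for b in c8)
print(len(c8), len(d8), all_orthogonal, c8_closed)
```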
The dataset used is the same as that used by Irvin et al. (2020), where forest loss event coordinates and driver annotations were curated by Austin et al. (2019). Random samples of primary natural forest loss events were obtained from maps published by Global Forest Change (GFC) at 30m resolution from 2001 to 2016. These images were annotated by an expert interpreter (Austin et al., 2019). The drivers are grouped into categories determined to be feasible to identify using 15m resolution Landsat 8 imagery, while ensuring sufficient representation of each category in the dataset (Irvin et al., 2020). The mapping between the expert labelled deforestation driver category and the driver group used as a classification target is provided in Table 3. The dataset consists of 2,756 images, segmentation maps, and class labels; we follow the training/validation/testing set splits provided by Irvin et al. (2020).
We use a U-Net (Ronneberger et al., 2015) architecture for the task of segmentation and attach an MLP to the lowest dimensional feature space for classification. In one model we use translation equivariant convolutional layers, while in the other we use translation and rotation equivariant convolutional layers. For the rotation equivariant version we choose the cyclic group $C_8$ of discrete rotations by multiples of $2\pi/8$ as the symmetry group. The input to the model is therefore three trivial representations, while hidden layers are multiple regular representations of the group, chosen to match the size of the feature spaces in the non-rotation equivariant model, and the output is a single trivial representation. An example of how a trivial representation and a regular representation transform the output feature space is given in Figure 1 (a) and (b) respectively. Building a model in this way ensures that the output segmentation map is stable under rotations of the input image.
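The difference between the trivial and regular representations of $C_8$ can be sketched with plain matrices. In the example below, which is illustrative and independent of E2CNN, the regular representation cyclically shifts eight orientation channels, and taking the maximum over those channels gives a value that no longer changes under the group action. Pooling over orientations is shown only as one common way to pass from a regular to a trivial field; it is an assumption here, not a statement of how the output layer in this work is built.

```python
import numpy as np

N = 8  # order of the cyclic group C_8 (rotations by multiples of 45 degrees)

def trivial_repr(k):
    """Trivial representation: every rotation acts as the 1x1 identity."""
    return np.eye(1)

def regular_repr(k, n=N):
    """Regular representation: rotation by k steps cyclically shifts n channels."""
    return np.roll(np.eye(n), shift=k % n, axis=0)

# A representation must respect composition: rho(a) rho(b) = rho(a + b).
is_representation = all(
    np.array_equal(regular_repr(a) @ regular_repr(b), regular_repr(a + b))
    for a in range(N) for b in range(N)
)

rng = np.random.default_rng(0)
hidden_feature = rng.normal(size=N)               # one regular feature vector
rotated_hidden = regular_repr(3) @ hidden_feature  # same vector after 3 rotation steps

# The channels are permuted, so the maximum over orientations is unchanged
# and therefore transforms under the trivial representation.
pooled_before = hidden_feature.max()
pooled_after = rotated_hidden.max()
print(is_representation, np.isclose(pooled_before, pooled_after))
```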
The model trained with rotation equivariance outperforms the non-rotation equivariant model for classification of the drivers of deforestation, as shown in Table 1. Because the convolutional kernels are constrained to be rotation equivariant in the better performing model, it can use its features more efficiently: model parameters are not spent learning similar features at different orientations. As a result the model is better able to distinguish between the different deforestation drivers. In addition to classification accuracy, the rotation equivariant model achieves better test segmentation accuracy, as demonstrated in Table 2. One cause of this benefit is that the model can share learned segmentation features across the different orientations that occur across the images in the dataset.
Table 1: Classification of the drivers of deforestation.
|UNET - CNN||90.3||60.6||57.9||56.3|
|UNET - C8 Equivariant||82.7||67.1||63.0||64.3|
Table 2: Segmentation of deforestation.
|UNET - CNN||72.9||68.7||67.8||67.9|
|UNET - C8 Equivariant||84.1||71.3||72.3||72.3|
Furthermore, Figure 2 shows the segmentation map predictions of the non-rotation equivariant and rotation equivariant models, to compare the stability of segmentation under rotation. The segmentation map predicted by the non-rotation equivariant model changes as the image is rotated, which would be highly undesirable in practice, as the rotational orientation of the satellite should not affect the predicted segmentation map of deforestation. In contrast, the rotation equivariant model's segmentation map prediction is stable under rotation, which is a desirable property of the model.
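Stability under rotation can also be quantified rather than inspected visually. The sketch below is one hypothetical way to do so, not the evaluation used in this work: it rotates the input, rotates the prediction back, and measures the intersection over union with the unrotated prediction. It is restricted to rotations by multiples of 90 degrees to avoid interpolation, and the two toy predictors stand in for trained models.

```python
import numpy as np

def iou(mask_a, mask_b):
    """Intersection over union of two boolean masks."""
    mask_a = mask_a.astype(bool)
    mask_b = mask_b.astype(bool)
    union = np.logical_or(mask_a, mask_b).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(mask_a, mask_b).sum() / union)

def rotation_stability(predict, image):
    """Mean IoU between the reference mask and realigned masks of rotated inputs."""
    reference = predict(image)
    scores = []
    for k in (1, 2, 3):  # 90, 180 and 270 degrees
        rotated_prediction = predict(np.rot90(image, k))
        realigned = np.rot90(rotated_prediction, -k)
        scores.append(iou(reference, realigned))
    return float(np.mean(scores))

def pointwise_model(image):
    """Toy predictor that thresholds each pixel independently."""
    return image > 0.5

def directional_model(image):
    """Toy predictor that responds only to left-to-right intensity increases."""
    return np.diff(image, axis=1, prepend=0.0) > 0.1

rng = np.random.default_rng(0)
image = rng.random((16, 16))

stable_score = rotation_stability(pointwise_model, image)
unstable_score = rotation_stability(directional_model, image)
print(stable_score, unstable_score)
```

A score of 1 means the prediction rotates exactly with the input; lower scores indicate that the predicted region depends on the orientation of the image.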
We develop a U-Net style model for classification and segmentation of deforestation that makes use of translation and rotation equivariant convolutions. To the best of our knowledge this is the first study to make use of rotation equivariance in deforestation segmentation. The improved weight sharing through consideration of known symmetries in the data improves the classification accuracy of the model by 9%. Furthermore, the rotation equivariant model predicts segmentation maps that are stable under rotation. In a practical application of this model this would ensure that deforestation segmentation is not dependent upon the rotational orientation of the satellite, which does not hold true for other models. Finally, the rotation equivariant model is 7% more accurate than the non-rotation equivariant model for the segmentation maps it produces when compared to ground truth segmentation. The improvement in both classification of deforestation drivers and segmentation of deforestation will allow conservation and management policies to be implemented more routinely based on model predictions from satellite data.
Cohen and Welling (2016). Group equivariant convolutional networks. International Conference on Machine Learning, pp. 2990–2999.
Irvin et al. (2020). ForestNet: Classifying Drivers of Deforestation in Indonesia Using Deep Learning on Satellite Imagery. arXiv preprint arXiv:2011.05479.
Janowicz et al. (2020). GeoAI: Spatially Explicit Artificial Intelligence Techniques for Geographic Knowledge Discovery and Beyond. Taylor & Francis.
|Expert Labelled Deforestation Driver Category||Classification Target Driver Group|
|Oil palm plantation||Plantation|
|Other large-scale plantations||Plantation|
|Small-scale agriculture||Smallholder agriculture|
|Small-scale mixed plantation||Smallholder agriculture|
|Small-scale oil palm plantation||Smallholder agriculture|
Equivariance places a constraint on the kernels used by the model such that the model respects symmetries in the data. An alternative approach is to use data augmentation, which is generally easier to implement. On the other hand, data augmentation effectively increases the size of the dataset and therefore makes training slower. Building equivariant models guarantees the model's behaviour under certain symmetries, whereas data augmentation does not. Furthermore, equivariance can reduce the number of parameters required in the model and increase training efficiency. Therefore, in this work, given that we have a known symmetry group, equivariant models are a sensible choice.
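For comparison, the data augmentation alternative can be sketched as follows. This is a hypothetical helper, assuming images stored as height x width x channels arrays and using only rotations by multiples of 90 degrees; it shows how each training pair becomes several pairs, which is the source of the additional training cost mentioned above.

```python
import numpy as np

def augment_with_rotations(image, mask):
    """Return the four 90-degree rotations of an (image, mask) training pair."""
    pairs = []
    for k in range(4):
        # Rotate the spatial axes of the image and apply the same rotation to the mask.
        rotated_image = np.rot90(image, k, axes=(0, 1))
        rotated_mask = np.rot90(mask, k)
        pairs.append((rotated_image, rotated_mask))
    return pairs

# A tiny 2x2 RGB image with a mask marking the top-left pixel.
image = np.arange(2 * 2 * 3).reshape(2, 2, 3)
mask = np.array([[1, 0],
                 [0, 0]])

pairs = augment_with_rotations(image, mask)
print(len(pairs))
```

Unlike an equivariant layer, this only encourages a model to behave consistently on rotated inputs; it does not guarantee it.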
For both the non-rotation equivariant and rotation equivariant models we use the same model architecture, with the key difference that the convolutional layers are either rotation equivariant or non-rotation equivariant depending on the choice of model. The model architecture is a U-Net style model, which makes use of a convolutional block composed of two convolutional layers, two batch normalisation layers, and two dropout layers. The model consists of five convolutional blocks with downsampling in between each and five convolutional blocks with upsampling in between each. Further, a skip connection is placed between convolutional blocks, connecting each upsampled layer with the downsampled layer of the same shape. In addition, there is a flatten layer and three multi-layer perceptron layers providing the driver classification output from the lowest dimensional space. We build the model using PyTorch (Paszke et al., 2019), and for the rotation equivariant layers we make use of E2CNN (Weiler and Cesa, 2019).
The non-rotation equivariant model has 3.7 million trainable parameters and the rotation equivariant model has 3.0 million trainable parameters. Each model was trained on a Titan Xp GPU in less than 30 minutes and required approximately 3 GiB of memory.