
Exploring Wilderness Using Explainable Machine Learning in Satellite Imagery

Wilderness areas offer important ecological and social benefits and therefore warrant monitoring and preservation. Yet what makes a place "wild" is only vaguely defined, making the detection and monitoring of wilderness areas via remote sensing techniques a challenging task. In this article, we explore the characteristics and appearance of the vague concept of wilderness areas via multispectral satellite imagery. For this, we apply a novel explainable machine learning technique to a curated dataset that is tailored to the task of investigating wild and anthropogenic areas in Fennoscandia. The dataset contains Sentinel-2 images of areas representing 1) protected areas with the aim of preserving and retaining the natural character and 2) anthropogenic areas consisting of artificial and agricultural landscapes. With our technique, we predict continuous, detailed and high-resolution sensitivity maps of unseen remote sensing data with regard to wild and anthropogenic characteristics. Our neural network provides an interpretable activation space in which regions are semantically arranged with regard to wild and anthropogenic characteristics and certain land cover classes. This increases confidence in the method and allows for new explanations of the investigated concept. Our model advances explainable machine learning for remote sensing, offers opportunities for comprehensive analyses of existing wilderness, and has practical relevance for conservation efforts. Code and data are available at and, respectively.



1 Introduction

Within a very short period, from a geophysical point of view, humans have greatly expanded and strongly influenced Earth’s environment (Steffen et al., 2011). Areas without human pressure have been greatly reduced and hardly exist in most regions of the world (Allan et al., 2017). While urbanization and agriculture have brought many benefits, land use has had immense and unprecedented ecological impacts. However, countless species, including humans, depend on natural ecosystem functions: water cycles provide freshwater, forests produce oxygen and sequester carbon dioxide, and bees pollinate the plants whose fruits feed humans and animals. Disturbing ecosystems affects biodiversity, air quality, pathogen spread, and more. The global consequences of human land use are well described by Foley et al. (2005).

Wilderness areas can offer important ecological and social benefits. Understanding wilderness and being able to quantify it is therefore of high importance. Yet, what makes places “wild” and others not is only vaguely defined. In this article, we investigate the characteristics of wilderness areas using explainable machine learning (ML) techniques in satellite imagery, accompanied by philosophical reflections on the normative foundations of the concept of wilderness.

Wilderness mapping approaches such as the human influence index by Sanderson et al. (2002) provide important insights into the global distribution of wilderness areas. The human influence index is based on population density, land transformation, accessibility indicators such as roads, and electrical power infrastructure. Human-defined scoring methods result in an index, globally mapped with a resolution of 1 km². Because such mapping approaches rest on many assumptions, they offer limited insight when the aim is to better understand the concept of wilderness itself. Furthermore, they are restricted to the low spatial and temporal resolution of the data they are based on.

Using satellite imagery when investigating undisturbed areas offers huge advantages. One of them is independence from the presence of infrastructure such as roads. Beyond that, no further costs are incurred, unlike in the case of aerial flights, which makes satellites well suited for observing broad regions. Furthermore, satellites repeatedly fly over the same areas, making it possible to continuously observe changes in Earth’s land uses. Some satellites, including Landsat 8 of the USGS National Land Imaging Program and Sentinel-2 of the Copernicus program, produce multispectral imagery along with infrared channels. Their equipment makes them ideal to investigate vegetation and thus wilderness with an emphasis on habitat-friendly areas. Several remote sensing applications have been used for monitoring protected areas and are reviewed by Wang et al. (2020).

Explainable Machine Learning.

ML models are able to find patterns and relations in large data sets that are not recognizable by humans. Ma et al. (2019) give a review of the variety of remote sensing applications that can be approached with deep learning, including a critical conclusion and open challenges. Although ML applications are widely used, they are frequently viewed with suspicion due to their lack of interpretability. Addressing this challenge, Roscher et al. (2020b) present the usefulness of such models being interpretable and explainable. There are various approaches in the field of explainable ML; Samek et al. (2021) review the important ones specifically for deep learning applications. Roscher et al. (2020a) discuss explainable ML approaches in remote sensing, specifically in the field of bio- and geosciences.

A common explainable ML approach to interpreting models for image analysis is to derive saliency maps, which highlight regions that are important for the model’s decision. Ways to achieve this are gradient-based approaches such as Gradient-weighted Class Activation Mapping (Grad-CAM) by Selvaraju et al. (2017). Here, class-specific gradient information of convolutional neural networks is used to find relevant areas in the input image. A different approach is to occlude parts of the input image and identify changes in the model’s outcome, as done by Zeiler, Fergus (2014). They replace patches in the input image with a specific value to derive occlusion sensitivity maps. Petsiuk et al. (2018) extend this approach by occluding random areas in the image instead of uniformly shaped patches (Randomized Input Sampling for Explanation, RISE). Many of these and other methods have been evaluated on remote sensing data by Kakogeorgiou, Karantzalos (2021). They applied several explainable ML methods to models for land cover classification using Sentinel images and concluded that Grad-CAM, occlusions and LIME by Ribeiro et al. (2016) lead to the most reliable results.
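To make the patch-based occlusion idea concrete, the following is a minimal NumPy sketch and not the implementation of any of the cited works. The function name `occlusion_sensitivity`, the patch size, the fill value and the toy scoring function `toy_predict` are all assumptions chosen for illustration; in practice `predict` would be a trained model.

```python
import numpy as np

def occlusion_sensitivity(image, predict, patch=4, fill=0.0):
    """Return a map of score changes caused by occluding each patch.

    image   : array of shape (channels, height, width)
    predict : hypothetical callable mapping an image to a scalar score
    """
    _, h, w = image.shape
    baseline = predict(image)
    sens = np.zeros((h, w))
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            occluded = image.copy()
            occluded[:, top:top + patch, left:left + patch] = fill
            # Positive values: occluding the patch lowered the score.
            sens[top:top + patch, left:left + patch] = baseline - predict(occluded)
    return sens

# Toy stand-in for a trained model (assumption): mean of the top-left quadrant.
def toy_predict(img):
    return float(img[:, :8, :8].mean())

image = np.ones((3, 16, 16))
sensitivity = occlusion_sensitivity(image, toy_predict)
```

In this toy setting only patches inside the top-left quadrant change the score, so only those receive a non-zero sensitivity.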

According to Roscher et al. (2020a), explainable ML methods for remote sensing tasks are usually applied with the intention to align models with prior knowledge but have been less frequently used to uncover novel scientific insights. For this, common methods for saliency maps are only suitable to a limited extent, because they derive interpretations from single input images with the purpose of explaining the model’s decision for these specific samples. In that respect, our approach differs, because we do not focus on explaining the functionality of the model, but rather use explainable ML to explore a vaguely defined concept, namely wilderness.

Research with a focus on understanding concepts has been done, among others, by Levering et al. (2020), who explain scenicness from Sentinel-2 imagery with an interpretable-by-design model. The model has an interpretable bottleneck to generate Semantically Interpretable Activation Maps proposed by Marcos et al. (2019). Within this bottleneck, a land cover classification vector is predicted when predicting the scenicness score from a Sentinel-2 image, which allows explaining the predicted scenicness score with land cover classes.

In the context of this work, interpretability and explainability have both methodological and conceptual relevance. Methodologically, they contribute to improving and validating ML models for remote sensing applications using satellite imagery via focusing on a vague concept. Because of this, wilderness offers an interesting target for the development and application of ML-enabled remote sensing, as well as for the development of explainable ML. An outcome of explainable ML applications in this context is the opportunity for reflexivity regarding our own presuppositions and biases regarding what constitutes wild areas, and what properties inform such categorizations.

Our Research and Objectives.

The research presented here builds upon the idea of analyzing activation maps within an activation space as suggested by Stomberg et al. (2021). By analyzing the activation space with a clustering technique, they are able to find semantic concepts that are sensitive to wild or non-wild characteristics. In this article, we extend their approach with the following objectives:

  1. We ground our study with a reflection on the philosophical and ethical dimensions of the concept of wilderness.

  2. These considerations are the foundation for the new AnthroProtect dataset, which is tailored to the task of investigating the appearance of wilderness and anthropogenic areas in Fennoscandia using multispectral satellite imagery.

  3. Using this dataset, we present an interpretable-by-design model to investigate the appearance of wilderness. With our method, regions in satellite images are put into a semantic formation which allows a fundamental sensitivity analysis resulting in sensitivity maps with high resolution.

  4. We compare our results with occlusion sensitivity maps as by Zeiler, Fergus (2014) and test our method on the Places365 dataset by Zhou et al. (2018) to show the advantages and generalizability of our method.

  5. We relate our method and results to conceptual debates about the idea of wilderness.

2 Wilderness: In Search of a Vague Concept

As discussed in the introduction, it is generally agreed that wilderness areas offer important ecological and social benefits, and therefore warrant preservation. Yet, exactly what makes a place “wild” is indeterminate. Policy-oriented definitions often highlight qualitative and relational characteristics rather than quantitative biophysical criteria. For example, Alterra et al. (2013, p. 10) defines wilderness as “an area governed by natural processes. It is composed of native habitats and species, and large enough for the effective ecological functioning of natural processes. It is unmodified or only slightly modified and without intrusive or extractive human activity, settlements, infrastructure or visual disturbance.” Similarly, the International Union for Conservation of Nature (IUCN) considers wilderness spaces to be “usually large unmodified or slightly modified areas, retaining their natural character and influence without permanent or significant human habitation, which are protected and managed so as to preserve their natural condition” (Dudley, 2008). What is found in these and other (policy-oriented) definitions is a set of generalizable characteristics, such as natural processes, native species, and the absence of human activity or settlement. This opens interesting but under-explored questions regarding how novel ML-enabled remote sensing techniques can offer insights into vaguely defined concepts.

We also acknowledge from the outset that the idea of wilderness is not only vaguely defined but ethically and politically contested. It has been argued that wilderness is best understood as socially constructed: “[W]hat constitutes wilderness is not the specific biophysical properties of an area but rather the specific meanings ascribed to it according to cultural patterns of interpretation” (Kirchhoff, Vicenzotti, 2014, p. 444). This has led to an ongoing debate regarding the role of wilderness in environmental decision-making and/or conservation efforts, captured in the edited volumes by Callicott, Nelson (1998) and Nelson, Callicott (2008). The primary critiques of wilderness are conceptual, targeting the meanings and associations it supports. These include the positioning of wilderness as places free of humans – and therefore sites of more “authentic nature” - and thus reinforcing a dualism between humans and nature; ignoring the historical presence of aboriginal peoples; and, overlooking the temporal dimension of ecological processes as opposed to a static and idealized state. For in-depth discussions of these critiques, see Callicott (1998), Cronon (1995) and Vogel (2015). Further, meanings associated with wilderness itself are not static but historically and culturally contingent. For example, Kirchhoff, Vicenzotti (2014) sketch changing European perceptions of wilderness: from a site of evil and danger in pre-modern times to a place for positive aesthetic experiences and symbolic of freedom for Enlightenment and Romantic thinkers, to contemporary associations with “natural” ecological conditions and a place for leisure and thrill.

We are sympathetic to these important critiques. However, we do not endorse the position that exercises in attempting to define wilderness via spatial parameters have no conceptual or practical relevance, as some posit (e.g., Kirchhoff, Vicenzotti, 2014). While there is undoubtedly a need to be cautious with the label of “wilderness” due to historical contingencies and political concerns, a move towards relativism overlooks the very real material conditions we must address in the 21st century. Indeed, in debates over the concept of wilderness, the underlying goals are not in question (e.g., conservation, sustainable development, biodiversity, etc.). Rather, it is the idea of wilderness as a driver of environmentalism that is under scrutiny, as explained in the introductions by Callicott, Nelson (1998) and Nelson, Callicott (2008). We begin from the position that there is practical, conceptual, and methodological value in utilizing ML-enabled analyses of remote sensing data to better understand and analyze our conceptions and categorizations of wild places.

Before expanding on the value of this project, however, we want to clarify a few points on the goals and scope. We do not make any claims that “wilderness” areas as categorized here are in any way completely free of human influence. Given the global impact of human activities, the notion of sites with pristine nature completely (and historically) untouched by human actions, especially in Europe, is tenuous at best. Relatedly, we do not posit that current wilderness areas are completely free of human presence or intervention. Indeed, definitions of wilderness such as the IUCN’s provided above concede an active role for humans as managers of these spaces. Acknowledging this means we are not searching for some idealized or romanticized notion of “authentic” or “true” nature. Instead, we are searching for places in which the qualities of wilderness as defined above are best preserved and promoted. So while there is undoubtedly symbolism and cultural meaning attached to such areas, it would seem odd to insist there is not some set of underlying features that are (ecologically) desirable. “Wilderness” is then a shorthand used to categorize such areas. Conversely, there are areas of pervasive and continuous human influence, which are actively and intentionally maintained for specific human purposes or functions (e.g., cities and communities, energy infrastructures, agriculture, etc.). This creates a categorically different type of human presence and intervention, which we here initially classify as “anthropogenic” in our training data.

A final important clarification is the relation between “wild” and “anthropogenic” classifications. Alterra et al. (2013, p. 12) acknowledge that wild areas, as per their own definition, are few and fragmented across the continent. Because of this, “[i]n the European context [it] is important to notice that there is a spectrum of more or less wild areas according to the intensity of human interference. In that sense, wilderness is a relative concept which can be measured along a ‘continuum’, with wilderness at one end and marginal used land at the other.” Seeing wild-to-anthropogenic spaces on a continuum can help to avoid the human-nature dualism that could be reinforced via conceptions of wilderness (Dill, 2021). Further, it does not assert that “wild” spaces are free of human influences, but rather that anthropogenic influences in these areas are presumably synergistic rather than toxic or destructive. Dill (2021) points out that certain extremes are in this sense anthropogenic or minimally wild (e.g., urban metropolitan cores). Conversely, other spaces (e.g., deep in the Amazon) exemplify the characteristics we would classify as wild or minimally anthropogenic. Understanding the gradients in-between – and importantly the characteristics and/or processes we use to locate places along such a continuum – then becomes an important task. And it is one that ML is uniquely suited to assist with.

These clarifications lead to the most important practical applicability of this work. While we should be cautious with the label of “wilderness,” there are undoubtedly places and conditions where, at the least, certain processes and biophysical conditions are better able to thrive – places that are relatively less urbanized, or less negatively affected by human settlements, infrastructures, transportation networks, industrial processes, etc. The preservation and promotion of such places is a critical and timely task. Take for instance the Sustainable Development Goals (SDGs) of the United Nations. SDG 15 is focused on protecting, restoring, and promoting sustainable land use, including forest management and safeguarding critical biodiversity areas (United Nations, 2021, p. 56). Being able to analyze the extent and characteristics of non-anthropogenic areas, even if approximate, is a valuable tool. Further, analyses that can be updated, improved, and compared over time can allow for the monitoring and tracking of conservation areas, reforestation efforts, etc. As such, it is a useful tool that could be utilized as an SDG indicator, and more generally for policy-making and land-use planning.

In searching for areas that align with our conceptualization of wilderness on the European continent, we identify protected areas within Fennoscandia as a suitable test case. This is consistent with the human influence index by Sanderson et al. (2002) and the wilderness quality index by Fisher et al. (2010), which map wide areas within Fennoscandia as areas with relatively minimal (disruptive) anthropogenic influence. However, we are aware that over the last 300 years the landscape of Fennoscandia, namely its forests, has seen anthropogenically driven changes before regulations were introduced to protect them. This affects the southern regions of Fennoscandia more than the north. Kouki et al. (2001) and Östlund et al. (1997) give a detailed overview of forest fragmentation in Fennoscandia and the transformation of the boreal forest landscape in Scandinavia, respectively.

3 Data

3.1 AnthroProtect: Collecting Satellite Imagery in Protected and Anthropogenic Areas in Fennoscandia

The AnthroProtect dataset is built for the purpose of investigating the appearance of wilderness and anthropogenic areas in Fennoscandia using multispectral satellite imagery. It is founded on the philosophical and ethical considerations in Section 2. In this article AnthroProtect is used 1) to train an ML model for wilderness classification, 2) to verify the semantic formation of regions within the so-called activation space, and 3) to investigate specific regions with identified sensitivity maps. Workflow and code for data extraction are based on Google Earth Engine by Gorelick et al. (2017). The full dataset and the code for data export are available at and, respectively. In the following, we explain our requirements for the dataset and its characteristics.


Although wilderness is an abstract concept whose appearance we want to investigate, we have to make some basic assumptions to collect data for our interpretable-by-design model:

1) We assume that wilderness is a geographically and culturally diverse concept since it appears in very different ways on Earth: tropical forests, deserts and Antarctica can all be associated with wilderness although they have vastly different appearances, flora and fauna, and ecosystems. Therefore, we focus the research of this paper on Fennoscandia, or more specifically the countries Norway, Sweden and Finland.

2) We further assume that within Fennoscandia, vegetation provides significant indicators for the appearance of wilderness. Therefore, we use multispectral imagery of the Sentinel-2 satellites whose instruments are specialized in vegetation. We decide to use Sentinel-2 over Landsat 8, due to the better spatial resolution and the additional red-edge bands. There also has been research by Astola et al. (2019) concluding that Sentinel-2 outperforms Landsat 8 in forest parameter prediction in Finland.

Binary classes.

To train our ML model, we need labeled data with binary classes associated with wild and anthropogenic, respectively. For this purpose, we first locate areas with certain conditions and then export Sentinel-2 images within these areas. In the following, we describe our strategy to find suitable areas for both classes:

To find areas that are associated with wilderness, we use the World Database on Protected Areas (WDPA) by UNEP-WCMC, IUCN (2021) which contains polygons of protected areas according to the categories of Dudley (2008). We consider terrestrial areas of categories Ia (strict nature reserve), Ib (wilderness area) and II (national park) with a minimum area of 50 km². The corresponding class is called protected in the following.

To find areas that are not associated with wilderness, we use the Copernicus CORINE Land Cover dataset by European Environment Agency (2018). First, we locate areas with land cover classes 1 (artificial surfaces) and 2 (agricultural areas). Then, three morphological functions are applied to these areas in the following order: 1) closing to remove holes and gaps, 2) opening to increase compactness and filter small structures, 3) dilation to create a buffer. For closing and opening we use a circle with a radius of 2 km, respectively, and for dilation a circle with a radius of 1 km. Finally, all areas are filtered for a minimum area of 50 km². We call the corresponding class anthropogenic in the following.
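The order of the three morphological operations can be sketched with SciPy on a boolean raster. This is only an illustration under assumptions: the function names `disk` and `anthropogenic_mask`, the pixel size and the toy mask are hypothetical, the actual workflow is based on Google Earth Engine, and the final 50 km² area filter is omitted here.

```python
import numpy as np
from scipy import ndimage

def disk(radius_px):
    """Boolean circular structuring element with the given pixel radius."""
    y, x = np.ogrid[-radius_px:radius_px + 1, -radius_px:radius_px + 1]
    return x * x + y * y <= radius_px * radius_px

def anthropogenic_mask(land_cover_mask, pixel_size_m=100):
    """Closing, opening and dilation of a boolean land cover mask.

    pixel_size_m is an assumed raster resolution used to convert the
    2 km and 1 km radii from the text into pixels.
    """
    r_2km = int(round(2000 / pixel_size_m))
    r_1km = int(round(1000 / pixel_size_m))
    closed = ndimage.binary_closing(land_cover_mask, structure=disk(r_2km))
    opened = ndimage.binary_opening(closed, structure=disk(r_2km))
    return ndimage.binary_dilation(opened, structure=disk(r_1km))

# Toy mask on a coarse 500 m grid (assumption, keeps the example small).
mask = np.zeros((60, 60), dtype=bool)
mask[20:40, 20:40] = True   # compact artificial/agricultural area
mask[30, 30] = False        # small hole inside it
mask[5, 5] = True           # isolated small structure
result = anthropogenic_mask(mask, pixel_size_m=500)
```

In the toy example the hole is filled by the closing, the isolated pixel is removed by the opening, and the remaining area grows by a two-pixel (1 km) buffer.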

Sentinel-2 data.

For each protected and anthropogenic area, we export one multispectral Sentinel-2 image composite with the following workflow: 1) The atmospherically corrected Sentinel-2 products (Level-2A) are used with a resolution of 10 meters. 2) Images are filtered for the time period of summer 2020 (July 1st to August 30th). 3) A mask for clouds, cirrus and cloud shadows is created using the Quality-60m band (QA60) and scene classification map (SCL) provided by Sentinel-2. Only images with a mask fraction of less than 5 % within the region of interest are taken. 4) The masked areas are dilated with a radius of 100 meters to prevent artifacts at the transitions. 5) The 25th percentile of all images, leaving out masked areas, is calculated for each pixel and band to create a single image composite. 6) The following ten bands are exported: B2, B3, B4, B5, B6, B7, B8, B8A, B11, B12. 7) We look at red-green-blue images made of bands B4, B3, B2 and remove images of two regions of interest because of strong artifacts. 8) Each region of interest is tiled into images of size 256 × 256 pixels, which corresponds to 2560 × 2560 meters. Samples for each category are shown in Figure 2.
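Steps 5 and 8 of this workflow can be illustrated with a short NumPy sketch. It is not the Earth Engine export code; the function names `percentile_composite` and `tile`, the array layout and the toy data are assumptions, and incomplete tiles at the border are simply dropped here.

```python
import numpy as np

def percentile_composite(stack, masks, q=25):
    """Per-pixel, per-band percentile over time, ignoring masked pixels.

    stack : float array of shape (time, bands, height, width)
    masks : bool array of shape (time, height, width), True where an
            observation is covered by clouds, cirrus or cloud shadows
    """
    # Broadcast the mask over the band axis and blank out masked values.
    data = np.where(masks[:, None], np.nan, stack)
    return np.nanpercentile(data, q, axis=0)

def tile(image, size=256):
    """Cut (bands, height, width) into non-overlapping size x size tiles."""
    _, h, w = image.shape
    return [image[:, r:r + size, c:c + size]
            for r in range(0, h - size + 1, size)
            for c in range(0, w - size + 1, size)]

# Toy time series: five acquisitions with constant values 0..4.
stack = np.arange(5, dtype=float).reshape(5, 1, 1, 1) * np.ones((5, 2, 8, 8))
masks = np.zeros((5, 8, 8), dtype=bool)
masks[4, 0, 0] = True  # last acquisition is cloudy at pixel (0, 0)
composite = percentile_composite(stack, masks)
tiles = tile(composite, size=4)
```

With the actual data, `size=256` at 10 m resolution gives the 2560 × 2560 meter tiles described above.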

Land cover data.

Besides Sentinel-2 images, the following land cover data is exported for each region. In this article, we use this land cover data for evaluation purposes only: 1) a specific composite of the Sentinel-2 scene classification map (SCL), 2) the Copernicus CORINE Land Cover dataset by European Environment Agency (2018), 3) the MODIS Land Cover Type 1 by Friedl, Sulla-Menashe (2019), 4) the Copernicus Global Land Service by Buchhorn et al. (2020) and 5) the ESA GlobCover by Arino et al. (2012).

Data split.

The data is divided into three subsets for training, validation and testing. The three subsets are intended to be independent of each other, spatially consistent and categorically consistent. To ensure categorical consistency, the data split is performed separately for each category (protected Ia, Ib and II, and anthropogenic). To ensure independence and spatial consistency, spatial clusters are built as follows: In a first step, data samples are separated into different clusters if their distance is larger than 10 km, using DBSCAN developed by Ester et al. (1996). This yields some large clusters, so in a second step these large clusters are spatially subdivided multiple times using the k-means algorithm by Lloyd (1982). We use scikit-learn by Pedregosa et al. (2011) to perform both clustering algorithms.

Subsequently, all samples within one cluster are assigned to the same dataset. We choose the split fractions to be 80% / 10% / 10%. Hereby, samples within very small clusters are assigned to the training dataset.
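A rough scikit-learn sketch of such a cluster-based split is given below. It is not the published code: the function names, the size threshold `max_size` for subdividing clusters, the greedy assignment of clusters to subsets and the planar coordinates in kilometers are all assumptions, and the special handling of very small clusters is not modeled.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

def spatial_clusters(coords_km, eps_km=10.0, max_size=50, seed=0):
    """Cluster id per sample; samples closer than eps_km get chained together."""
    # min_samples=1 means no sample is treated as noise.
    labels = DBSCAN(eps=eps_km, min_samples=1).fit_predict(coords_km)
    next_id = labels.max() + 1
    for lab in np.unique(labels):
        idx = np.flatnonzero(labels == lab)
        if len(idx) > max_size:
            # Subdivide an oversized cluster with k-means.
            k = int(np.ceil(len(idx) / max_size))
            sub = KMeans(n_clusters=k, n_init=10,
                         random_state=seed).fit_predict(coords_km[idx])
            labels[idx] = np.where(sub == 0, lab, next_id + sub - 1)
            next_id += k - 1
    return labels

def split_by_cluster(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole clusters to subsets 0/1/2 (train/val/test)."""
    rng = np.random.default_rng(seed)
    ids, sizes = np.unique(labels, return_counts=True)
    targets = np.array(fractions) * len(labels)
    counts = np.zeros(len(fractions))
    subset_of = {}
    for i in rng.permutation(len(ids)):
        # Greedy choice: the subset that is furthest below its target size.
        s = int(np.argmax(targets - counts))
        subset_of[ids[i]] = s
        counts[s] += sizes[i]
    return np.array([subset_of[lab] for lab in labels])

# Toy data: two groups of sample locations about 100 km apart.
rng = np.random.default_rng(0)
coords = np.vstack([rng.uniform(0, 5, size=(30, 2)),
                    rng.uniform(100, 105, size=(30, 2))])
labels = spatial_clusters(coords, max_size=20)
subsets = split_by_cluster(labels)
```

Because whole clusters are assigned, the resulting subset sizes only approximate the requested fractions.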

Our data splitting procedure reduces the chance that nearby samples appear in different datasets, which prevents validation and test results from appearing better than they are. The resulting data split is visualized in Figure 1 and the sizes of each subset and category are listed in Table 1.

Figure 1: Locations of the AnthroProtect data samples. Shown are the 7,003 protected and 16,916 anthropogenic samples which are split into three independent subsets for training (80%), validation (10%) and testing (10%). For better clarity, the coloring of both categories differs only for the training set. [The plot is created with Plotly Technologies Inc. (2015) and copyright holders of the map are Carto and OpenStreetMap contributors.]
Figure 2: Samples of Sentinel-2 images. Shown are three samples for each category. From top to bottom: 1) anthropogenic, 2) WDPA Ia, 3) WDPA Ib, 4) WDPA II. [Copernicus Sentinel data 2020.]

class          category   # train   # val   # test   # total
protected      Ia             295      37       37       369
protected      Ib           3,601     465      446     4,512
protected      II           1,693     220      209     2,122
anthropogenic  -           13,534   1,670    1,712    16,916
total                      19,123   2,392    2,404    23,919

Table 1: Number of samples in the AnthroProtect dataset separated by categories and datasets.

Investigative areas.

Besides the mentioned protected and anthropogenic regions, some further Sentinel-2 images are exported for regions that are of interest for investigation. This includes several villages, forests, power plants, wind parks, airports and more. For many of these regions, time series of the years 2017 to 2021 are included in this dataset. It is ensured that all investigative regions do not overlap with samples of the training, validation or test set.

3.2 Places365

We also apply our ML method to chosen classes of the Places365-Standard dataset by Zhou et al. (2018) to test the generalizability of our method and investigate a possible different behavior for a different type of data. We use the version of the small images of Places365 which provides images of 256 × 256 pixels in size. This is equal to the Sentinel-2 images’ height and width of the AnthroProtect dataset and therefore ensures comparability. The number of channels, though, is three (red, green, blue) instead of ten.

In this article, we present experiments with some classes of the Places365 dataset. For each of the classes, the Places365-Standard dataset provides 5,000 training images and 100 validation images. We randomly split the training set into two subsets so that, for each class, we get 4,900 training samples and 100 test samples.

4 Methodology

4.1 Activation Space Occlusion Sensitivity (ASOS)

In this section, we present an interpretable-by-design model and a methodology developed to investigate the appearance of vaguely defined concepts: Activation Space Occlusion Sensitivity (ASOS). The code for the presented methodology and experiments is available at

ASOS is a two-step procedure: First, a neural network consisting of an image-to-image network and a binary classification head is trained to classify weakly defined labels. In the second step, a functional relationship between the activation values at the interface of the two networks and the sensitivity to the classification decision is established. Having a trained image-to-image network and a functional relationship between activation values and sensitivity, we predict sensitivity maps and identify the key characteristics of an image that lead to the classification decision.

Figure 3: Architecture of our neural network. The network consists of a modified U-Net, followed by a binary image classifier. It takes multispectral images and predicts a value between 0 and 1. The activation map at the interface between both networks has the same height and width as the input image. [The figure has been created with PlotNeuralNet by Iqbal (2018). On the left-hand side it contains a Sentinel-2 image of the Copernicus Sentinel data 2020.]

Neural network architecture.

For the image-to-image network we choose a modified form of the U-Net by Ronneberger et al. (2015) with the following characteristics: 1) Our U-Net consists of four encoding and four decoding steps. Instead of two convolutional layers per encoding or decoding step, our U-Net has only one such layer. Furthermore, we reduce the number of output channels for each convolution as shown in Figure 3. 2) We add batch normalization after each convolutional layer. 3) By adding padding to each convolution, we preserve the image size at each skip connection. 4) We replace the deconvolutional upsampling with bilinear upsampling as proposed in Odena et al. (2016) to prevent checkerboard artifacts. 5) Instead of a single-channel input image, our U-Net takes $c_\text{in}$-channel input images. Furthermore, we consider the number of output maps as a hyperparameter $c_\text{out}$. 6) The output maps are not batch-normalized and are activated with the hyperbolic tangent function (tanh) so that the activation maps have values in the range of -1 to 1. With this architecture, the predicted activation map of an image with shape ($c_\text{in}$, $h$, $w$) has shape ($c_\text{out}$, $h$, $w$), so height $h$ and width $w$ remain unchanged.

The classifier network has significantly fewer parameters than the image-to-image network and consists of three convolutional layers followed by two fully connected linear layers. Each convolutional layer doubles the number of channels, has a kernel size of 5, a stride of 3 and is ReLU activated. The output of the last convolutional layer is flattened. Two fully connected linear layers follow with 128 and 1 neuron(s), respectively. The first linear layer is ReLU activated and the last one is activated with Sigmoid, yielding predictions in the range of 0 to 1.
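To get a feeling for the size of the classifier, the feature map shapes can be traced with the standard convolution output formula. The sketch below assumes no padding in the classifier convolutions and three input channels (the case shown in Figure 5); neither detail is stated in the text, and the helper names are hypothetical.

```python
def conv_output_size(size, kernel=5, stride=3, padding=0):
    """Spatial size after one convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1

def classifier_shapes(channels, size=256, n_layers=3):
    """(channels, size) after each conv layer, doubling channels each time."""
    shapes = []
    for _ in range(n_layers):
        channels *= 2
        size = conv_output_size(size)
        shapes.append((channels, size))
    return shapes

# Assumption: activation maps with 3 channels and 256 x 256 pixels.
shapes = classifier_shapes(channels=3)
flattened = shapes[-1][0] * shapes[-1][1] ** 2
print(shapes, flattened)  # [(6, 84), (12, 27), (24, 8)] 1536
```

Under these assumptions the first linear layer would receive 1,536 input features, which is consistent with the classifier being much smaller than the U-Net.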

Both U-Net and classifier are treated as a single neural network during training and therefore trained end-to-end. However, the activation map at the interface is essential to perform ASOS. The described architecture is visualized in Figure 3. We want to emphasize though, that various architectures lead to similar results, which is discussed in Section 6.

CutMix and loss function.

Our goal is to determine sensitivities by occluding certain areas in the activation maps after the model was trained. For this purpose, the neural network must be able to output continuous classification scores in the range of 0 to 1. Therefore, we randomly apply CutMix similar to Yun et al. (2019) on training samples with a chance of 80% while training the model. If CutMix is applied, a stripe is cut from another random training sample and pasted into the actual one. The stripe is cut and pasted at a random edge of the images and makes up a random amount between 0 and 50%. The label is adjusted proportionally which results in a continuous value. Resulting images and their labels are shown in Figure 4.

Yun et al. (2019) use CutMix as a data augmentation technique with cross-entropy as the loss function. We, on the other hand, use CutMix to increase the model’s ability to predict continuous labels between 0 and 1 and therefore use mean square error as the loss function.
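The stripe-based CutMix described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the function name `cutmix_stripe`, the channel-first array layout and the rounding of the stripe width to whole pixels are assumptions.

```python
import numpy as np


def cutmix_stripe(img_a, label_a, img_b, label_b, rng, p=0.8, max_fraction=0.5):
    """Paste a stripe of img_b onto a random edge of img_a (arrays are C x H x W)."""
    if rng.random() >= p:  # CutMix is applied with probability p
        return img_a.copy(), float(label_a)
    _, h, w = img_a.shape
    fraction = rng.uniform(0.0, max_fraction)  # stripe covers 0 to 50% of the image
    edge = int(rng.integers(4))                # 0: top, 1: bottom, 2: left, 3: right
    mixed = img_a.copy()
    if edge < 2:
        n = int(round(fraction * h))
        rows = slice(0, n) if edge == 0 else slice(h - n, h)
        mixed[:, rows, :] = img_b[:, rows, :]
        area = n / h
    else:
        n = int(round(fraction * w))
        cols = slice(0, n) if edge == 2 else slice(w - n, w)
        mixed[:, :, cols] = img_b[:, :, cols]
        area = n / w
    # The label is adjusted in proportion to the pasted area.
    label = (1.0 - area) * label_a + area * label_b
    return mixed, label
```

Because the resulting label is continuous, a regression loss such as mean squared error is a natural fit, as stated in the text.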

Figure 4: CutMix. CutMix is applied with a chance of 80% on training samples. From left to right: 1) two anthropogenic areas with label 0, 2) anthropogenic and protected area with label 0.16, 3) protected and anthropogenic area with label 0.72, 4) two protected areas with label 1. [Copernicus Sentinel data 2020.]

Activation space.

After training the neural network, all correctly predicted training samples are used to define the activation space as follows: 1) For each correctly predicted training sample, we predict an activation map with one value per channel and pixel. 2) We then treat each pixel in each activation map as a vector with one entry per channel, so that the total number of vectors equals the number of samples times the number of pixels per activation map. These vectors span an activation space with one dimension per channel, in which each axis represents the values of one of the channels. The described steps are visualized in Figure 5 for the specific case of three channels.
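The construction of the activation space amounts to a reshape. The sketch below assumes that the activation maps are stacked in a NumPy array with layout (samples, channels, height, width); the function name is hypothetical.

```python
import numpy as np


def to_activation_space(activation_maps):
    """Turn maps of shape (N, C, H, W) into N*H*W vectors with C entries each."""
    n, c, h, w = activation_maps.shape
    # Move the channel axis last so that every pixel becomes one row vector.
    return activation_maps.transpose(0, 2, 3, 1).reshape(n * h * w, c)
```

Row `(i * H + y) * W + x` of the result holds the channel values of pixel `(y, x)` in sample `i`, so each vector can be traced back to its pixel.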

Figure 5: Activation maps and their representation within the activation space. The left-hand side shows samples of the 19,074 correctly predicted Sentinel-2 images of the training dataset. For each of them, one activation map is predicted with the image-to-image network. Here, each activation map has three channels so that they can be displayed as red-green-blue images in the middle of this figure. Each pixel in each activation map is represented within the three-dimensional activation space (right-hand side), in which each axis represents the values of one of the three channels. [The images on the left-hand side are Sentinel-2 images of the Copernicus Sentinel data 2020. The plot on the right-hand side is created with Plotly Technologies Inc. (2015).]

Activation space sensitivities and occlusions.

Our goal is to map areas in the activation space to sensitivity values, indicating to what extent activation values influence the classification score. To this end, pixels that are close in the activation space and thus semantically similar are occluded in the activation maps. These occlusions lead to deviations in the classification scores from which we derive sensitivities. In detail, this works as follows:

We slide a hypercube, which has as many dimensions as the activation space, through the activation space with a stride equal to the hypercube’s edge length. If the density of vectors within the hypercube exceeds the average density in the activation space by a predefined factor, we perform the following steps: 1) For each vector within the hypercube, we occlude the corresponding pixels in all activation maps. By occluding we mean that we replace the actual values with zero. 2) We pass the occluded activation maps to the classifier and receive a prediction for each activation map. 3) Comparing each prediction with the corresponding non-occluded prediction, we obtain a deviation per pixel by dividing the difference between the two predictions by the number of occluded pixels in the corresponding activation map. In doing so, we obtain one deviation for each of the correctly predicted training samples. 4) We define a threshold for the number of occluded pixels and consider only those deviations for which this number is larger. In this way, we skip very small deviations due to very few occluded pixels. 5) The mean or median value over all remaining deviations for the vectors within the hypercube is calculated, and the negative of that value is a measure of the activation space sensitivity within the hypercube. The obtained sensitivities can be illustrated in the activation space as in Figure 6.
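The five steps can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the authors' code: `classify` is a hypothetical stand-in for the trained classifier that maps a batch of activation maps to scores, the deviation is taken as occluded minus non-occluded prediction divided by the number of occluded pixels, the average density is computed over the full tanh range from -1 to 1, and the default parameter values are placeholders.

```python
import numpy as np


def asos_sensitivities(maps, classify, edge=0.5, density_factor=2.0, min_pixels=1):
    """Map hypercube positions to sensitivities; maps has shape (N, C, H, W)."""
    n, c, h, w = maps.shape
    cells_per_axis = int(np.ceil(2.0 / edge))
    # Hypercube index of every pixel along every channel axis.
    idx = np.floor((maps + 1.0) / edge).astype(int)
    idx = np.clip(idx, 0, cells_per_axis - 1).transpose(0, 2, 3, 1)  # (N, H, W, C)
    mean_count = n * h * w / cells_per_axis ** c  # average vectors per hypercube
    base = classify(maps)                         # non-occluded predictions, (N,)
    cells, counts = np.unique(idx.reshape(-1, c), axis=0, return_counts=True)
    sensitivities = {}
    for cell, count in zip(cells, counts):
        if count < density_factor * mean_count:
            continue  # low-density hypercubes are skipped
        mask = np.all(idx == cell, axis=-1)                  # (N, H, W)
        occluded = np.where(mask[:, None, :, :], 0.0, maps)  # step 1: zero out
        n_occ = mask.sum(axis=(1, 2))
        keep = n_occ > min_pixels                            # step 4: threshold
        if not keep.any():
            continue
        deviations = (classify(occluded)[keep] - base[keep]) / n_occ[keep]
        # Step 5: the negative mean deviation is the sensitivity.
        sensitivities[tuple(int(i) for i in cell)] = -float(deviations.mean())
    return sensitivities
```

The sketch re-classifies every activation map for each hypercube, which keeps the code short; a practical version would only re-classify maps that contain occluded pixels.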

Sensitivity maps.

Having a trained image-to-image model and the determined activation space sensitivities, we are able to predict sensitivity maps for any new sample. First, the activation map is determined using the trained image-to-image network. Second, this activation map is evaluated using the functional relationship between activation values and sensitivity, as derived from the activation space occlusions. Since our image-to-image network is a purely convolutional neural network, images can have any height and width. This allows the investigation of any region, including regions that cannot be clearly assigned to wilderness or anthropogenic characteristics. Samples are shown in Figures 8, 9 and 10.

Figure 6: Sensitivities in the activation space. The greener the vectors, the more sensitive they are to wilderness; the more purple, the more sensitive they are to anthropogenic characteristics; the more beige, the less sensitive they are. For low-density areas, the sensitivity is not determined which is why the corresponding vectors are not displayed here. The minimum and maximum values of the color scale are chosen according to the 2%-percentile of the absolutes of all existing values. [The plot is created with Plotly Technologies Inc. (2015).]

Sensitivities were determined only for high-density regions within the activation space of the training samples. Therefore, activation maps might have pixels that cannot be attributed to a sensitivity value and must be masked. This characteristic is related to out-of-distribution detection and prevents the sensitivity map from being filled where the sensitivity is unlikely to be correct.
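Predicting a sensitivity map is then a lookup per pixel. The following sketch assumes that the sensitivities are stored in a dictionary keyed by hypercube index (a hypothetical data structure) and marks pixels without a sensitivity as NaN so that they can be masked.

```python
import numpy as np


def sensitivity_map(activation_map, sensitivities, edge=0.5):
    """Look up one sensitivity per pixel; activation_map has shape (C, H, W)."""
    c, h, w = activation_map.shape
    cells_per_axis = int(np.ceil(2.0 / edge))
    idx = np.floor((activation_map + 1.0) / edge).astype(int)
    idx = np.clip(idx, 0, cells_per_axis - 1)
    out = np.full((h, w), np.nan)  # NaN marks pixels that must be masked
    for y in range(h):
        for x in range(w):
            cell = tuple(int(i) for i in idx[:, y, x])
            out[y, x] = sensitivities.get(cell, np.nan)
    return out
```

Pixels that fall into hypercubes without a determined sensitivity stay NaN, which corresponds to the grey areas in the sensitivity maps.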

Neutral occlusion value.

Sturmfels et al. (2020) perform baseline experiments and point out the challenge of feature missingness. In terms of our method, the question is which value is most suitable to occlude pixels in the activation maps. We deliberately use tanh to activate the activation maps because it ranges from -1 to 1, which makes the value 0 a good choice to deactivate certain features. However, we want to ensure that the classifier treats zero as neutral and is furthermore accustomed to the edges that occur at the occlusions’ transitions. For this reason, we occlude pixels of the activation maps during training with a chance of 20% for 50% of the training samples.

4.2 Input Image Occlusion Sensitivity (IIOS)

Determining sensitivities by occlusions is based on the idea by Zeiler, Fergus (2014). Therefore, we choose this as a comparison method and produce Input Image Occlusion Sensitivity (IIOS) maps for any sample as follows:

We use the same trained model as for ASOS to ensure comparability. Instead of the activation map, we now occlude the input image. Furthermore, the occlusions are not defined by the activation space, but by a square-shaped patch with a fixed edge length. We slide this patch through the input image with a fixed stride. For each position, we perform the following steps: 1) We replace the values of each pixel covered by the patch with zero. 2) We pass the occluded input image to the model and receive a prediction. 3) Comparing each prediction with the corresponding non-occluded prediction, we obtain a deviation per pixel.

Thus, we receive one deviation for each occlusion. For areas with overlapping occlusions, the mean value of the corresponding deviations is calculated. Overlapping occlusions exist, if the stride is lower than the edge length of the patch. We receive one mean deviation for each pixel and call the negative of that value sensitivity for that pixel. We discuss the main differences and application cases for ASOS and IIOS in Section 6.
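The IIOS procedure can be sketched as a sliding-window loop. In this illustration, `predict` is a hypothetical stand-in for the full trained model that returns one score per image, and dividing the deviation by the patch area is an assumption about how the deviation per pixel is defined.

```python
import numpy as np


def iios_map(image, predict, patch=8, stride=4):
    """Input image occlusion sensitivity for one image of shape (C, H, W)."""
    _, h, w = image.shape
    base = predict(image)  # non-occluded prediction
    dev_sum = np.zeros((h, w))
    hits = np.zeros((h, w))
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            occluded = image.copy()
            occluded[:, top:top + patch, left:left + patch] = 0.0
            deviation = (predict(occluded) - base) / patch ** 2
            dev_sum[top:top + patch, left:left + patch] += deviation
            hits[top:top + patch, left:left + patch] += 1
    # Overlapping occlusions are averaged; the negative mean is the sensitivity.
    return -dev_sum / np.maximum(hits, 1)
```

The model is evaluated once per patch position, which illustrates why IIOS becomes expensive for small strides.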

5 Experiments

5.1 ASOS on AnthroProtect

Training setup.

We build our model using PyTorch by Paszke et al. (2019) and PyTorch Lightning by Falcon, The PyTorch Lightning team (2019). According to the multi-spectral Sentinel-2 data, the number of input channels is set to 10. Further, we choose the number of activation map channels to be 3 and discuss the reasons for that in Section 6. Overall, our modified U-Net has about 1.8 million parameters. The classifier has about 200,000 parameters and is significantly smaller than the U-Net.

The Sentinel-2 Level-2A products have values from 0 to 10,000. We scale all values to be in a range from 0 to 1. In addition to CutMix, we perform random image rotations of 90°, 180° or 270° during training to increase variability. Our model is trained with a batch size of 32. We optimize the model’s parameters with stochastic gradient descent and use the one cycle learning rate policy by Smith, Topin (2019) with a maximum learning rate of 1e-2. Further, we add a weight decay of 1e-4 to the loss function. In total, the model is trained for 5 epochs.
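The scaling and rotation steps can be illustrated with a few lines of NumPy. This is a sketch with a hypothetical function name; whether the unrotated image is also drawn as one of the options is an assumption.

```python
import numpy as np


def preprocess(image, rng):
    """Scale a Sentinel-2 Level-2A image (C, H, W) to [0, 1] and rotate it."""
    scaled = image.astype(np.float32) / 10_000.0
    k = int(rng.integers(0, 4))  # 0, 90, 180 or 270 degrees (0 is an assumption)
    return np.rot90(scaled, k=k, axes=(1, 2)).copy()
```

Rotations by multiples of 90° need no interpolation, so the pixel values are left unchanged.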

Accuracy assessment.

The achieved overall accuracies of the training, validation and test dataset are 99.7%, 99.96% and 99.7%, respectively. Table 2 shows the confusion matrix of the test dataset.

anthropogenic protected
anthropogenic 1,710 2
protected 4 688
Table 2: Confusion matrix of the test dataset which has an accuracy of 99.7%.
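As a quick consistency check, the overall accuracy can be recomputed from the entries of Table 2. The snippet only uses the four numbers of the table; it does not depend on which axis holds the reference labels.

```python
# Entries of Table 2 (rows and columns: anthropogenic, protected).
confusion = [[1710, 2],
             [4, 688]]

correct = sum(confusion[i][i] for i in range(2))
total = sum(sum(row) for row in confusion)
accuracy = correct / total
print(f"{correct} of {total} samples correct, accuracy {accuracy:.4f}")
```

This gives 2,398 correct predictions out of 2,404 test samples, which is about 99.75% and matches the reported 99.7% up to rounding.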

Activation space sensitivities.

With our trained model, we predict activation maps for all correctly predicted training samples and analyze them within the activation space as described in Section 4.1. Because of artifacts at the margins due to padding layers in the U-Net, we define a small frame size of pixels and do not consider vectors at the margins within this frame. Furthermore, we only use a randomly chosen fraction of 1e-3 of all vectors. The activation space, as well as some activation maps, are visualized in Figure 5.

Determining the sensitivities, we decide the edge length of the hypercube to be and choose a minimum density fraction of = 2. Again, we ignore the margins of the activation maps by not occluding pixels within the defined frame of size . We define as threshold for the minimum number of occluded pixels. Figure 7 shows deviations due to occluding pixels of vectors within the same hypercube for three representative positions of the hypercube. We take the mean value of the deviations as a measure of sensitivity. The resulting activation space sensitivities are visualized in Figure 6.

Figure 7: Deviations due to occlusions of pixels for three different positions of the hypercube. From left to right: 1) Occluding activation maps at regions that correspond to vectors within this hypercube’s position mainly causes positive deviations. The sensitivity is which means that the area in the activation space is sensitive to anthropogenic characteristics. 2) The deviations are distributed around zero and so is the sensitivity. The corresponding area in the activation space is neither sensitive to anthropogenic nor wild characteristics. 3) Most deviations are negative and the sensitivity is to wild characteristics.

Sensitivity maps of investigative areas.

With our trained model and the determined activation space sensitivities, we predict sensitivity maps for the investigative areas included in the AnthroProtect dataset. To prevent the GPU from memory overflow, we split large images into tiles not larger than 2,048 × 2,048 pixels and merge them after the prediction of the sensitivity maps. The advantage of large tiles is the reduced number of margins with artifacts. Sensitivity maps of some investigative samples are shown in Figures 8, 9 and 10.
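Splitting and merging can be sketched as follows. The function names are hypothetical, tiles are taken without overlap, and handling of the margin artifacts mentioned above is left out of this illustration.

```python
import numpy as np


def split_into_tiles(image, max_size=2048):
    """Split an image (C, H, W) into tiles of at most max_size x max_size pixels."""
    _, h, w = image.shape
    tiles = []
    for top in range(0, h, max_size):
        for left in range(0, w, max_size):
            tile = image[:, top:top + max_size, left:left + max_size]
            tiles.append((top, left, tile))
    return tiles


def merge_tiles(tile_maps, height, width):
    """Merge per-tile 2D maps, given as (top, left, map), into one map."""
    merged = np.full((height, width), np.nan)
    for top, left, tile_map in tile_maps:
        th, tw = tile_map.shape
        merged[top:top + th, left:left + tw] = tile_map
    return merged
```

Each tile would be passed through the image-to-image network and the sensitivity lookup before merging; the offsets `top` and `left` are carried along to place the result.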

Figure 8: Hydroelectric power plant Letsi in Sweden, located in an area barely inhabited by humans. Shown are a Sentinel-2 image and its sensitivity map covering an area of about 100 km2. The color scale is the same as in Figure 6. Sensitivities not predicted due to a low vector density in the activation space are colored in grey. [Copernicus Sentinel data 2020.]
Figure 9: Municipality Alvdal in Norway, whose villages are located within the Østerdalen valley. Shown are Sentinel-2 images and their sensitivity maps covering an area of about 420 km2 each. Deforestation within the years from 2017 to 2020 causes our model to predict a growing anthropogenic area. The color scale is the same as in Figure 6. Sensitivities not predicted due to a low vector density in the activation space are colored in grey. [Copernicus Sentinel data 2017 and 2020.]
Figure 10: Forestry and unused wetlands meet in North Ostrobothnia, a region of Finland. Shown are a Sentinel-2 image and its sensitivity map covering an area of about 100 km2. The color scale is the same as in Figure 6. Sensitivities not predicted due to a low vector density in the activation space are colored in grey. [Copernicus Sentinel data 2020.]

Land cover classes within the activation space.

The AnthroProtect dataset provides various land cover datasets for all images, which allows us to relate each vector in the activation space to a land cover class. Figure 11 shows a subset of the CORINE land cover classes within the activation space. The separation of these classes is clearly visible.

Figure 11: Land cover classes within the activation space. Shown is a subset of the land cover classes of the Copernicus CORINE Land Cover dataset by European Environment Agency (2018). The land cover names are shortened and the color of glaciers & snow has been changed for a better overview. From top to bottom, the official classes are 211, 312, 512, 322 and 335. We obtain similar results with the other land cover data available in the AnthroProtect dataset. [The plots are created with Plotly Technologies Inc. (2015).]

5.2 IIOS on AnthroProtect

We predict IIOS using a patch edge length of 8 and a stride of 4. Unlike in ASOS, we need the whole model including the classifier to predict IIOS for investigative samples. Since the classifier contains fully connected linear layers, IIOS can only be predicted for images with a height and width of 256 pixels. We therefore split the investigative images into tiles of this size and merge them after the prediction of the IIOS maps. Figure 12 shows the IIOS maps for the Alvdal valley in Norway.

Figure 12: Sensitivity maps of Alvdal predicted with IIOS. Sentinel-2 images and ASOS maps are shown in Figure 9. The color scale is multiplied by 10 compared to the ASOS maps. To predict the IIOS maps, the image is subdivided into tiles of 256 × 256 pixels, which is necessary to perform the IIOS.

5.3 ASOS and IIOS on Places365

We use the same model architecture for the Places365 data and only change the number of input channels to 3 instead of 10, since the Places365 images have red, green and blue channels only. Again, we scale the images from 0 to 1, perform CutMix, rotation augmentation and random occlusions during training and use the same hyperparameters. However, we increase the number of epochs to 20, since we have fewer data samples than in AnthroProtect.

For our experiments we use the binary classes 1) dining room vs. bedroom and 2) windmill vs. lighthouse. For each of the two experiments, we predict the activation space sensitivities using all correctly predicted training samples and apply them to our test data. Furthermore, we predict IIOS maps for the test data. Samples are shown in Figure 13.

Figure 13: Experiments with the Places365 dataset. Upper row: Sensitivity maps resulting from a model trained with the classes dining room and bedroom. Lower row: Sensitivity maps resulting from a model trained with the classes windmill and lighthouse. Sensitivities not predicted due to a low vector density in the activation space are colored in grey in the ASOS maps. [Places365 dataset by Zhou et al. (2018).]

6 Discussion

6.1 Technical discussion

Activation space.

We hypothesize that regions within the satellite images are semantically arranged within the activation space and we present two observations that support this hypothesis. First, we observe that negative and positive sensitivities are separated within the activation space (see Figure 6). In between these maxima, we recognize an approximately even gradient from negative to positive sensitivity values. This shows that our model is able to arrange regions based on their influence on the classification decision. Secondly, we find that regions are arranged within the activation space according to specific land cover classes, although no land cover information has been used for training (see Figure 11). While glaciers & snow and water bodies build their own, distinct clusters, the other classes tend to merge into each other. This is coherent, considering that mixed landscapes are not uncommon, while water and glaciers can usually be clearly separated from their surroundings. Furthermore, there are limitations due to CORINE’s accuracy of about 92% and its resolution of 100m, which is significantly less than the resolution of the Sentinel-2 images of 10m. Both cause faulty assignments of land cover classes, which also supports smooth transitions in the activation space.

Comparing sensitivities and land cover classes within the activation space, we see that some land cover classes are correlated with the sensitivities. Arable lands appear more in regions that are sensitive to anthropogenic regions while moors & heathland appear more in regions that are sensitive to protected regions. Water is mostly neutral while coniferous forests appear in both positively and negatively sensitive regions. While the separation of moors & heathland and arable land is likely due to the data distribution of protected and anthropogenic areas, their separation from coniferous forests is independent of that and can be ascribed to the model.


Figure 7 shows histograms of the deviations resulting from occluding pixels of vectors within the same hypercube. If the mean value is close to zero, the distribution is usually very narrow and has a high peak at zero. This is representatively shown in the middle histogram. The further the mean value moves away from zero, the broader is the distribution. However, we observe that within the same hypercube, nearly all deviations are either positive or negative. Representative histograms are shown on the left- and right-hand sides. We observe similar behavior for all experiments with classes from the Places365 dataset.

The distributions we obtain also support the hypothesis that semantically similar regions are close within the activation space. It is the reason why we can attribute one sensitivity value to the whole activation space’s region within the hypercube.

Sensitivity maps.

We observe sensitivity maps predicted with ASOS to be detailed and in high resolution. Structures such as the lake in Figure 8 are mapped nearly pixel-wise and so are many fields in Figures 9 and 10. Further, many detailed structures can be seen close to the windmill blade in Figure 13.

Comparing the level of detail in ASOS and IIOS maps, we see a significant advantage of our method. The resolution in IIOS maps depends on the patch size and stride, which makes it much coarser than the resolution of the input image. In contrast, occluded pixels in ASOS need not be spatially connected but can cover irregularly shaped and disjoint areas. Therefore, the size of the hypercube has no direct influence on the spatial resolution of ASOS maps. As a result, we do not only observe a lack of resolution in IIOS maps but generally notice that small structures are hardly considered. As an example, the windmill blade in Figure 13 is not detected with IIOS and the villages’ borders in Figure 12 are not that sharply defined.

Sensitivity maps can highlight image areas as well as edges such as in Figure 13, where the edges of the windmill, as well as larger image areas of the dining room and the bedroom, are identified as important for the model.

We observe that the ASOS maps are robust regarding the size of the hypercube and that it does not directly influence the spatial resolution of the ASOS maps. However, the larger the hypercube, the more step-like transitions exist between sensitivity values in the ASOS maps. Apart from that, we do not observe any significant changes in the sensitivity maps.

Another advantage of ASOS maps is the comparability of sensitivity values. With ASOS, we do not determine sensitivities for a specific image but rather for groups of vectors within the whole training dataset. These sensitivities are then used for any other image, which makes them comparable to each other. This is not the case for IIOS maps, where sensitivities are independently determined for each image. The consequence is visible in Figure 12, year 2020. The anthropogenic structures in the valley exhibit lower sensitivity values than the more natural surroundings.

Once model and activation space sensitivities are fixed, the calculations of ASOS maps are computationally inexpensive. The input image passes the U-Net once, then the activation maps are transferred to sensitivity maps. In contrast, the prediction of IIOS maps is much more computationally expensive, since all occluded input images have to pass the neural network separately. The number of input images depends on the defined patch size and stride. On the upside, IIOS can be applied to any model whereas ASOS depends on a specific model architecture. This is where a big difference in terms of objective becomes apparent: IIOS can be used to understand the reasons for any model’s decision. In contrast, the aim of ASOS is to gain new scientific insights into a vaguely defined concept, such as wilderness. For this, it is essential, that ASOS maps are more detailed and robust and that sensitivities are transparent and interpretable due to the activation space. It is sufficient that this works only with certain architectures.

Lastly, we want to mention that one has to be cautious not to overinterpret odd results. For example, we observe that remains of clouds, cloud shadows and shadows of mountains can cause strange behavior in sensitivity maps.

6.2 Experimental Setup

Number of activation map channels.

We present results obtained with three activation map channels, which results in a 3-dimensional activation space. However, one can decide on another number of activation map channels. With two channels, the activation space can be visualized on a plane, and with one channel, it can be visualized as a histogram. We also train a model with two channels and observe that the 2-dimensional activation space is similar to the projection of the 3-dimensional activation space onto a plane. Also, the sensitivity maps look similar. However, a consequence is that the land cover classes are no longer as well-separated in the 2-dimensional activation space.

One could probably also produce interesting results in high-dimensional activation spaces. A visualization must then be realized with a dimensionality reduction technique. However, the number of possible hypercube positions grows exponentially with the number of dimensions, which makes the sensitivity analysis more computationally intensive.
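The growth of the number of hypercube positions can be made concrete with a small calculation. The edge length of 0.5 used below is a hypothetical value chosen for illustration; the value range from -1 to 1 follows from the tanh activation.

```python
import math


def hypercube_positions(edge, dims, lo=-1.0, hi=1.0):
    """Number of non-overlapping hypercube positions covering [lo, hi]^dims."""
    per_axis = math.ceil((hi - lo) / edge)
    return per_axis ** dims


for dims in (1, 2, 3, 8):
    print(dims, hypercube_positions(0.5, dims))
```

With four positions per axis, the count rises from 4 in one dimension to 64 in three and 65,536 in eight dimensions, although only hypercubes with a sufficient vector density require occlusions.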

Multi-class classification and ASOS.

In our research, we perform binary classification because we have two contrary classes and our classifier predicts a single number. However, our methodology is not necessarily limited to binary classification problems and could be applied to classifiers predicting a vector with one entry per class. In that case, a sensitivity value would also be a vector with one entry per class. Visualizing the activation space sensitivities, one would receive one visualization of the activation space for each class. Likewise, one would receive one sensitivity map per class for each image.


The model architecture proposed in Section 4.1 can be designed in various ways. It is only required that the network consists of an image-to-image network and an image classifier. A U-Net is a common architecture for image-to-image applications and we decrease its number of layers and channels since it does not limit the accuracy scores. The classifier should have significantly fewer parameters than the image-to-image network so that the activation maps at the interface already contain high-level information. In regards to the classifier, we obtain distinct sensitivities when we perform convolutions with a kernel size of at least 5, increase the number of channels in the first layer, and do not use pooling layers. Apart from that, we obtain similar results for most architectures.

We observe that an omission of batch normalization results in a narrow and tube-like activation space. Large learning rates (1e-1) can result in high accuracies, but at the same time cause the vectors to be more spread in the activation space and to have lower sensitivities. Low learning rates (1e-3) can cause artifacts in the activation maps. The higher the weight decay, the more dense the vectors. If the weight decay is chosen too high, they can collapse to a very narrow and tube-like area. If it is chosen too low, they are more spread and values might be saturated.

6.3 Analyzing investigative samples

Hydroelectric power plant Letsi in Sweden.

The hydroelectric power plant Letsi in Sweden is located in an area barely inhabited by humans and is shown in Figure 8. Areas close to the power plant (center) are mainly highlighted as anthropogenic. The water reservoir (left of the power plant) and the other lake (bottom left) do not seem to contribute to the classification decision, as this is generally the case for most water bodies. Here, the model is not able to distinguish between the unnatural and natural lake. No sensitivities could be attributed to the river (starting right of the power plant) due to a low density of vectors within the activation space of the training data. Right above the river, we see deforestation areas. In the sensitivity map, these areas are much more extended. One deforestation area above the river is only highlighted in its surroundings. Here, not the area itself but the edges seem to trigger sensitivities due to the model. The power line (going centered from top to bottom) is not detected as anthropogenic. In contrast, the road along the bottom is well highlighted in its surroundings. However, smaller roads barely affect the sensitivity map. We observe similar results in other images as well. Dark violet highlighted areas are usually narrow and close to edges, whereas dark green highlighted areas tend to be larger and often occur centrally within areas sensitive to wilderness. This is also given in most other images.

Municipality Alvdal in Norway.

The villages of the municipality Alvdal in Norway are located within the Østerdalen valley, which is shown in Figure 9. In the year 2017, we see that the model detects villages and agricultural areas as anthropogenic and all forests and rocky landscapes as wild. It is striking that distinct regions in the sensitivity maps correspond to the land cover in the Sentinel-2 image. According to the global forest loss mapping by Hansen et al. (2013), there has been significant tree cover loss within the years 2017 to 2020. Deforestation areas can also be seen in the Sentinel-2 images. These land cover changes cause the model to predict a much larger anthropogenic area in the year 2020. We want to point out that both time points were presented independently to the model and no time series analysis was performed. In both images, we see that dark violet highlights are often textured whereas dark green highlights usually cover large areas.

North Ostrobothnia in Finland.

In North Ostrobothnia, a region in Finland, forestry and unused wetlands (according to LUCAS data by Eurostat, 2018) meet and form an interesting landscape shown in Figure 10. Roads and a power line are not highlighted here, and remaining cloud shadows (bottom right) are grayed out. The huge wetland area (top right), as well as many of the smaller wetland areas, are sensitive to wilderness. On the other hand, most but not all forests are detected as anthropogenic. The brownish fields in the lower part of the image are peat production areas. It is interesting that these areas are wild for the model despite human peat mining. We believe that this misconception occurs because there are too few or no peat production areas within our training data.

6.4 Conclusions in regards to wilderness

Training our model on the AnthroProtect dataset enables us to predict sensitivity maps of any region. In what follows, we discuss some explicit findings with respect to the concept of wilderness that we have obtained in this way.


We have reason to assume that our model is not sensitive to roads but to the surrounding vegetation. This is based on the fact that both small and large roads, and both asphalted and gravel roads, are sometimes recognized and sometimes not. Furthermore, we observe other linear structures like power lines not to be highlighted by the model. Vegetation next to roads could be affected by direct land use or indirect influences such as pollutants due to construction, traffic or de-icing salt carried into surrounding soils. The ecological effects of roads and traffic are well reviewed by Spellerberg (1998). Provided that we do not misinterpret the sensitivity maps at this point, they show that some roads can strongly influence the adjacent vegetation.


We observe that our model is sensitive to deforestation areas. It is able to detect certain types of unnatural disruption even within small regions which in turn leads to fairly dramatic shifts in sensitivities. In doing so, the model is clearly able to distinguish between disrupted and natural bare soils. The model can react very strongly, as it does around Alvdal (Figure 9). It seems that not just the size but also the number of deforested areas has an impact on the sensitivity map: Many small areas become disproportionately sensitive. This may also allow conclusions about a minimum contiguous wild-like area needed to be graded as actually wild.

Sensitivity gradients.

We observe that in most cases dark violet highlighted areas are narrow and close to edges whereas dark green highlighted areas tend to be larger and occur centrally within areas sensitive to wilderness. This suggests that specific edges seem to be important anthropogenic characteristics whereas scale and uniformity are important factors of wilderness.

Measure of wilderness.

Our hypothesis that wilderness can be measured along a continuum as claimed by Dill (2021) and Alterra et al. (2013) is strengthened by the formation of sensitivities within the activation space (Figure 6). Although training the model begins with binary classification, the sensitivities create a more nuanced spectrum and are accordingly arranged within the activation space. However, the system is not categorizing spaces along with a linear comparison but in a more complex way, as the activation space has three dimensions, in our case. Future research could explore how to interpret this multi-dimensional relation between wild and anthropogenic characteristics. In a first step, the formation of vectors within the activation space could be analyzed for different model architectures, hyperparameters and activation space dimensions. There may be recurring patterns from which to derive ideas in regards to the concept of wilderness.

Type of human influence.

Forestry is conducted in North Ostrobothnia (Figure 10) as well as near the power plant Letsi (Figure 8) according to LUCAS data by Eurostat (2018). However, we see that our model reacts very differently to these forests: In North Ostrobothnia, forests are highlighted as anthropogenic whereas around Letsi, forests are highlighted as wild. Human influence seems to be of a different type in regards to the preservation of naturalness. In fact, we observe that trees in North Ostrobothnia are planted in distinct rows, different from trees near Letsi. It is of great importance that our model is able to recognize these differences in landscape management and it strengthens the hypothesis that wilderness and anthropogenic characteristics are not strictly binary but can appear together.


One limitation of our method is that we cannot detect anthropogenic influence on wildlife unless the affected wildlife influences the vegetation to a measurable extent. For example, habitat fragmentation due to roads is a particular problem for many species of wildlife; literature examples are listed by Spellerberg (1998, Table 2). For a complete investigation of wilderness, considering both flora and fauna is important. Furthermore, plants obscured by larger plants cannot be detected using satellites. This occurs especially in forests, where tree canopies hide mosses, grasses and other plants close to the ground. Human influence on such plants can only be detected if it alters the appearance of the tree canopies enough to be measurable by satellites.

7 Conclusion

Because wilderness is a vague concept, utilizing satellite images to study regions with regard to wilderness is challenging. We argue that wilderness can be measured along a continuum, with certain extremes at both ends, and build the AnthroProtect dataset, which contains multispectral Sentinel-2 images of regions associated with wilderness and anthropogenic characteristics in Fennoscandia. Here, we refer less to the absence of humans and more to the aims with which an area is managed, and define two classes: 1) protected areas with the aim of preserving natural characteristics and processes, and 2) anthropogenic areas consisting of artificial and agricultural landscapes. The dataset is extensive and consists of nearly 24,000 samples.

Using AnthroProtect, we train an interpretable-by-design model and provide a novel explainable ML methodology: Activation Space Occlusion Sensitivity (ASOS). ASOS allows predicting sensitivities on a continuous scale, although the model has been trained on data with binary classes. By applying ASOS, we obtain not only sensitivity maps but also an activation space in which vectors are semantically arranged with regard to wild and anthropogenic characteristics and certain land cover classes. Our method is unique and yields outstanding results. The sensitivity maps are detailed, high-resolution and robust with regard to hyperparameters. Sensitivities are comparable even among different data samples. ASOS is computationally cheap once the model has been trained. We observe it to work very well on satellite images, but it is not limited to that type of data.
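To illustrate the general idea of occluding parts of an activation space and measuring the effect on the prediction, the following minimal sketch may help. It is not the ASOS implementation: the toy classifier head `score`, the regular grid used to group similar activation vectors, the choice of zero as the occlusion value, and the synthetic activation map are all assumptions made for this example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def score(flat_activations, weights):
    """Hypothetical classifier head: sigmoid of the mean projected activation."""
    return sigmoid((flat_activations @ weights).mean())


def activation_space_occlusion(activations, weights, bins=4):
    """Occlude groups of similar activation vectors and record the score change.

    activations: (H, W, C) activation map; weights: (C,) toy head.
    Returns an (H, W) map. Positive values mean that occluding the group a
    pixel belongs to lowers the score.
    """
    h, w, c = activations.shape
    flat = activations.reshape(-1, c)
    # Assign each vector to a cell of a regular grid over the activation space.
    lo, hi = flat.min(axis=0), flat.max(axis=0)
    cells = np.floor((flat - lo) / (hi - lo + 1e-12) * bins).astype(int)
    cells = np.clip(cells, 0, bins - 1)
    cell_ids = np.ravel_multi_index(tuple(cells.T), (bins,) * c)

    base = score(flat, weights)
    sensitivity = np.zeros(flat.shape[0])
    for cell in np.unique(cell_ids):
        mask = cell_ids == cell
        occluded = flat.copy()
        occluded[mask] = 0.0  # zero out every vector in this cell
        sensitivity[mask] = base - score(occluded, weights)
    return sensitivity.reshape(h, w)


# Synthetic activation map (assumption): the left half supports the positive
# class along the first channel, the right half opposes it.
rng = np.random.default_rng(0)
act = rng.normal(scale=0.05, size=(8, 8, 3))
act[:, :4, 0] += 1.0
act[:, 4:, 0] -= 1.0

sens = activation_space_occlusion(act, np.array([1.0, 0.0, 0.0]))
print(np.round(sens, 3))
```

In this toy setting, pixels in the left half receive positive values and pixels in the right half negative values, which gives a continuous, signed map even though the head only outputs a single class score.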

With our method, we are capable of detecting deforestation areas and altered vegetation caused by roads. ASOS is further able to distinguish between different types of human influence. With regard to practical applications, our method has the potential to be used for monitoring and evaluating wilderness preservation using present satellite data.

Declaration of Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


We acknowledge funding from the German Federal Ministry for the Environment, Nature Conservation and Nuclear Safety under grant no 67KI2043 (KISTE), the German Federal Ministry of Education and Research (BMBF) in the framework of the international future AI lab “AI4EO – Artificial Intelligence for Earth Observation: Reasoning, Uncertainties, Ethics and Beyond” (Grant number: 01DD20001) and the Alexander von Humboldt Foundation in the framework of the Alexander von Humboldt Professorship endowed by the Federal Ministry of Education and Research. This work has partially been funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy, EXC-2070 - 390732324 - PhenoRob. In addition, we acknowledge funding from DFG as part of the project RO 4839/5-1 / SCHM 3322/4-1 - MapInWild.


ASOS Activation Space Occlusion Sensitivity
IIOS Input Image Occlusion Sensitivity
IUCN International Union for Conservation of Nature
ML machine learning
SDG Sustainable Development Goal
WDPA World Database on Protected Areas


  • Allan et al. (2017) Allan James R., Venter Oscar, Watson James E.M. Temporally inter-comparable maps of terrestrial wilderness and the Last of the Wild // Scientific Data. XII 2017. 4, 1. 170187.
  • Alterra et al. (2013) Alterra, Directorate-General for Environment (European Commission), Eurosite, PAN Parks Foundation. Guidelines on wilderness in Natura 2000: Management of terrestrial wilderness and wild areas within the Natura 2000 network. LU: Publications Office, 2013.
  • Arino et al. (2012) Arino Olivier, Perez Jose Julio Ramos, Kalogirou Vasileios, Bontemps Sophie, Defourny Pierre, Bogaert Eric Van. Global Land Cover Map for 2009 (GlobCover 2009). 2012.
  • Astola et al. (2019) Astola Heikki, Häme Tuomas, Sirro Laura, Molinier Matthieu, Kilpi Jorma. Comparison of Sentinel-2 and Landsat 8 imagery for forest variable prediction in boreal region // Remote Sensing of Environment. III 2019. 223. 257–273.
  • Buchhorn et al. (2020) Buchhorn Marcel, Smets Bruno, Bertels Luc, Roo Bert De, Lesiv Myroslava, Tsendbazar Nandin-Erdene, Herold Martin, Fritz Steffen. Copernicus Global Land Service: Land Cover 100m: collection 3: epoch 2019: Globe. IX 2020. Version Number: V3.0.1.
  • Callicott (1998) Callicott J Baird. The Wilderness Idea Revisited: The Sustainable Development Alternative // The great new wilderness debate. Athens: The University of Georgia Press, 1998. 337–366.
  • Callicott, Nelson (1998) Callicott J Baird, Nelson Michael P. The great new wilderness debate. Athens: The University of Georgia Press, 1998.
  • Cronon (1995) Cronon William. The Trouble with Wilderness; or, Getting Back to the Wrong Nature // Uncommon Ground: Rethinking the Human Place in Nature. New York: W.W. Norton & Co., 1995. 69–90.
  • Dill (2021) Dill Kimberly M. In Defense of Wild Night // Ethics, Policy & Environment. IV 2021. 0. 1–25.
  • Dudley (2008) Dudley Nigel. Guidelines for applying protected area management categories. 2008.
  • Ester et al. (1996) Ester Martin, Kriegel Hans-Peter, Sander Jörg, Xu Xiaowei. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise // Second International Conference on Knowledge Discovery and Data Mining (KDD-96). 96. 1996. 226–231.
  • European Environment Agency (2018) European Environment Agency. Corine Land Cover (CLC) 2018, Version 2020_20u1. 2018.
  • Eurostat (2018) Eurostat. LUCAS 2018 v. 20190611. 2018.
  • Falcon, The PyTorch Lightning team (2019) Falcon William, The PyTorch Lightning team. PyTorch Lightning. III 2019.
  • Fisher et al. (2010) Fisher Mark, Carver Steve, Kun Zoltan, McMorran Rob, Arrell Katherine, Mitchell Gordon. Review of Status and Conservation of Wild Land in Europe. XI 2010. 193.
  • Foley et al. (2005) Foley Jonathan A., DeFries Ruth, Asner Gregory P., Barford Carol, Bonan Gordon, Carpenter Stephen R., Chapin F. Stuart, Coe Michael T., Daily Gretchen C., Gibbs Holly K., Helkowski Joseph H., Holloway Tracey, Howard Erica A., Kucharik Christopher J., Monfreda Chad, Patz Jonathan A., Prentice I. Colin, Ramankutty Navin, Snyder Peter K. Global Consequences of Land Use // Science. VII 2005. 309, 5734. 570–574.
  • Friedl, Sulla-Menashe (2019) Friedl Mark, Sulla-Menashe Damien. MCD12Q1 MODIS/Terra+Aqua Land Cover Type Yearly L3 Global 500m SIN Grid V006. 2019.
  • Gorelick et al. (2017) Gorelick Noel, Hancher Matt, Dixon Mike, Ilyushchenko Simon, Thau David, Moore Rebecca. Google Earth Engine: Planetary-scale geospatial analysis for everyone // Remote Sensing of Environment. XII 2017. 202. 18–27.
  • Hansen et al. (2013) Hansen M. C., Potapov P. V., Moore R., Hancher M., Turubanova S. A., Tyukavina A., Thau D., Stehman S. V., Goetz S. J., Loveland T. R., Kommareddy A., Egorov A., Chini L., Justice C. O., Townshend J. R. G. High-Resolution Global Maps of 21st-Century Forest Cover Change // Science. XI 2013. 342, 6160. 850–853.
  • Iqbal (2018) Iqbal Haris. HarisIqbal88/PlotNeuralNet v1.0.0. XII 2018.
  • Kakogeorgiou, Karantzalos (2021) Kakogeorgiou Ioannis, Karantzalos Konstantinos. Evaluating explainable artificial intelligence methods for multi-label deep learning classification tasks in remote sensing // International Journal of Applied Earth Observation and Geoinformation. XII 2021. 103. 102520.
  • Kirchhoff, Vicenzotti (2014) Kirchhoff Thomas, Vicenzotti Vera. A Historical and Systematic Survey of European Perceptions of Wilderness // Environmental Values. VIII 2014. 23, 4. 443–464.
  • Kouki et al. (2001) Kouki Jari, Löfman Satu, Martikainen Petri, Rouvinen Seppo, Uotila Anneli. Forest Fragmentation in Fennoscandia: Linking Habitat Requirements of Wood-associated Threatened Species to Landscape and Habitat Changes // Scandinavian Journal of Forest Research. I 2001. 16, sup003. 27–37.
  • Levering et al. (2020) Levering Alex, Marcos Diego, Lobry Sylvain, Tuia Devis. Interpretable Scenicness from Sentinel-2 Imagery // IGARSS 2020 - 2020 IEEE International Geoscience and Remote Sensing Symposium. Waikoloa, HI, USA: IEEE, IX 2020. 3983–3986.
  • Lloyd (1982) Lloyd S. Least squares quantization in PCM // IEEE Transactions on Information Theory. III 1982. 28, 2. 129–137.
  • Ma et al. (2019) Ma Lei, Liu Yu, Zhang Xueliang, Ye Yuanxin, Yin Gaofei, Johnson Brian Alan. Deep learning in remote sensing applications: A meta-analysis and review // ISPRS Journal of Photogrammetry and Remote Sensing. VI 2019. 152. 166–177.
  • Marcos et al. (2019) Marcos Diego, Lobry Sylvain, Tuia Devis. Semantically Interpretable Activation Maps: what-where-how explanations within CNNs // 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). Seoul, Korea (South): IEEE, X 2019. 4207–4215.
  • Nelson, Callicott (2008) Nelson Michael P, Callicott J Baird. The wilderness debate rages on: continuing the great new wilderness debate. Athens: University of Georgia Press, 2008.
  • Odena et al. (2016) Odena Augustus, Dumoulin Vincent, Olah Chris. Deconvolution and Checkerboard Artifacts // Distill. 2016.
  • Paszke et al. (2019) Paszke Adam, Gross Sam, Massa Francisco, Lerer Adam, Bradbury James, Chanan Gregory, Killeen Trevor, Lin Zeming, Gimelshein Natalia, Antiga Luca, Desmaison Alban, Kopf Andreas, Yang Edward, DeVito Zachary, Raison Martin, Tejani Alykhan, Chilamkurthy Sasank, Steiner Benoit, Fang Lu, Bai Junjie, Chintala Soumith. PyTorch: An Imperative Style, High-Performance Deep Learning Library // Advances in Neural Information Processing Systems. 32. 2019.
  • Pedregosa et al. (2011) Pedregosa Fabian, Varoquaux Gaël, Gramfort Alexandre, Michel Vincent, Thirion Bertrand, Grisel Olivier, Blondel Mathieu, Prettenhofer Peter, Weiss Ron, Dubourg Vincent, Vanderplas Jake, Passos Alexandre, Cournapeau David, Brucher Matthieu, Perrot Matthieu, Duchesnay Edouard. Scikit-learn: Machine Learning in Python // Journal of Machine Learning Research. 2011. 12, 85. 2825–2830.
  • Petsiuk et al. (2018) Petsiuk Vitali, Das Abir, Saenko Kate. RISE: Randomized Input Sampling for Explanation of Black-box Models // British Machine Vision Conference (BMVC). 2018.
  • Plotly Technologies Inc. (2015) Plotly Technologies Inc. Collaborative data science. Montreal, QC, 2015.
  • Ribeiro et al. (2016) Ribeiro Marco Tulio, Singh Sameer, Guestrin Carlos. "Why Should I Trust You?": Explaining the Predictions of Any Classifier // Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: Association for Computing Machinery, 2016. 1135–1144. (KDD ’16).
  • Ronneberger et al. (2015) Ronneberger Olaf, Fischer Philipp, Brox Thomas. U-Net: Convolutional Networks for Biomedical Image Segmentation // Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Cham: Springer International Publishing, 2015. 234–241.
  • Roscher et al. (2020a) Roscher R., Bohn B., Duarte M. F., Garcke J. Explain it to me – facing remote sensing challenges in the bio- and geosciences with explainable machine learning // ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences. VIII 2020a. V-3-2020. 817–824.
  • Roscher et al. (2020b) Roscher Ribana, Bohn Bastian, Duarte Marco F., Garcke Jochen. Explainable Machine Learning for Scientific Insights and Discoveries // IEEE Access. 2020b. 8. 42200–42216.
  • Samek et al. (2021) Samek Wojciech, Montavon Gregoire, Lapuschkin Sebastian, Anders Christopher J., Muller Klaus-Robert. Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications // Proceedings of the IEEE. III 2021. 109, 3. 247–278.
  • Sanderson et al. (2002) Sanderson Eric W., Jaiteh Malanding, Levy Marc A., Redford Kent H., Wannebo Antoinette V., Woolmer Gillian. The Human Footprint and the Last of the Wild // BioScience. X 2002. 52, 10. 891–904.
  • Selvaraju et al. (2017) Selvaraju Ramprasaath R., Cogswell Michael, Das Abhishek, Vedantam Ramakrishna, Parikh Devi, Batra Dhruv. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization // 2017 IEEE International Conference on Computer Vision (ICCV). Venice: IEEE, X 2017. 618–626.
  • Smith, Topin (2019) Smith Leslie N., Topin Nicholay. Super-convergence: very fast training of neural networks using large learning rates // Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications. 11006. 2019. 369–386.
  • Spellerberg (1998) Spellerberg Ian. Ecological effects of roads and traffic: a literature review // Global Ecology & Biogeography Letters. IX 1998. 7, 5. 317–333.
  • Steffen et al. (2011) Steffen Will, Persson Asa, Deutsch Lisa, Zalasiewicz Jan, Williams Mark, Richardson Katherine, Crumley Carole, Crutzen Paul, Folke Carl, Gordon Line, Molina Mario, Ramanathan Veerabhadran, Rockström Johan, Scheffer Marten, Schellnhuber Hans Joachim, Svedin Uno. The Anthropocene: From Global Change to Planetary Stewardship // AMBIO. XI 2011. 40, 7. 739–761.
  • Stomberg et al. (2021) Stomberg T., Weber I., Schmitt M., Roscher R. Jungle-Net: Using explainable machine learning to gain new insights into the appearance of wilderness in satellite imagery // ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences. VI 2021. V-3-2021. 317–324.
  • Sturmfels et al. (2020) Sturmfels Pascal, Lundberg Scott, Lee Su-In. Visualizing the Impact of Feature Attribution Baselines // Distill. I 2020. 5, 1. e22.
  • UNEP-WCMC, IUCN (2021) UNEP-WCMC, IUCN. Protected Planet: The World Database on Protected Areas (WDPA) [Online], [September 2021]. Cambridge, UK, 2021.
  • United Nations (2021) United Nations. The Sustainable Development Goals Report. 2021.
  • Vogel (2015) Vogel Steven. Thinking like a Mall: Environmental Philosophy after the End of Nature. Cambridge, MA, USA: MIT Press, V 2015.
  • Wang et al. (2020) Wang Yeqiao, Lu Zhong, Sheng Yongwei, Zhou Yuyu. Remote Sensing Applications in Monitoring of Protected Areas // Remote Sensing. IV 2020. 12, 9. 1370.
  • Yun et al. (2019) Yun Sangdoo, Han Dongyoon, Chun Sanghyuk, Oh Seong Joon, Yoo Youngjoon, Choe Junsuk. CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features // 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE, X 2019. 6022–6031.
  • Zeiler, Fergus (2014) Zeiler Matthew D., Fergus Rob. Visualizing and Understanding Convolutional Networks // Computer Vision – ECCV 2014. Cham: Springer International Publishing, 2014. 818–833. (Lecture Notes in Computer Science).
  • Zhou et al. (2018) Zhou Bolei, Lapedriza Agata, Khosla Aditya, Oliva Aude, Torralba Antonio. Places: A 10 Million Image Database for Scene Recognition // IEEE Transactions on Pattern Analysis and Machine Intelligence. VI 2018. 40, 6. 1452–1464.
  • Östlund et al. (1997) Östlund L, Zackrisson O, Axelsson A L. The history and transformation of a Scandinavian boreal forest landscape since the 19th century // Canadian Journal of Forest Research. 1997. 27, 8. 1198–1206.