Mapping Informal Settlements in Developing Countries with Multi-resolution, Multi-spectral Data

11/30/2018 ∙ by Patrick Helber, et al. ∙ DLR, University of Oxford, DFKI GmbH, King's College London, a.s.

Detecting and mapping informal settlements encompasses several of the United Nations sustainable development goals. This is because informal settlements are home to the most socially and economically vulnerable people on the planet. Thus, understanding where these settlements are is of paramount importance to both government and non-government organizations (NGOs), such as the United Nations Children's Fund (UNICEF), who can use this information to deliver effective social and economic aid. We propose two effective methods for detecting and mapping the locations of informal settlements. One uses only low-resolution (LR), freely available, Sentinel-2 multispectral satellite imagery with noisy annotations, whilst the other is a deep learning approach that uses only costly very-high-resolution (VHR) satellite imagery. To our knowledge, we are the first to map informal settlements successfully with low-resolution satellite imagery. We extensively evaluate and compare the proposed methods. Please find additional material at




1 Introduction

The United Nations (UN) defines informal settlements as follows [4, 8]:

  1. Inhabitants have no security of tenure vis-à-vis the land or dwellings they inhabit, with modalities ranging from squatting to informal rental housing.

  2. The neighborhoods usually lack, or are cut off from, basic services and city infrastructure.

  3. The housing may not comply with current planning and building regulations, and is often situated in geographically and environmentally hazardous areas.

Slums are the most deprived and excluded form of informal settlements. They are characterized by poverty and large agglomerations of dilapidated housing, located on the most hazardous urban land, near industries and dump sites, in swamps, on degraded soils and in flood-prone zones [3]. In addition to tenure insecurity, slum dwellers lack basic infrastructure and services, public space and green areas, and are constantly exposed to eviction, disease and violence [7]. The ability to map and locate these settlements would enable UNICEF and related organizations to provide effective social and economic aid [5].

Figure 1: Two images of the same informal settlement in Kibera, Nairobi representing the difference between high and low resolution imagery. Left: The Sentinel-2 10m resolution image. Right: A DigitalGlobe 30cm very-high-resolution image. Also, a detailed view of a part of the very-high-resolution image is shown.

1.1 Data

In this paper, we have annotated low-resolution (LR) and very-high-resolution (VHR) satellite imagery with the locations of informal settlements in parts of Kenya, South Africa, Nigeria, Sudan, Colombia and India (Mumbai).

Sentinel-2 Multi-spectral Satellite Data

The Sentinel-2 mission is part of the Copernicus programme, a global Earth observation service. The Sentinel-2 satellites image the entire global land mass on average every 5 days at resolutions of 10 m to 60 m. Sentinel-2 provides multi-resolution, multi-spectral imagery using a multi-spectral instrument (MSI). The MSI measures top-of-atmosphere radiances in 13 spectral bands covering the visible, near-infrared and shortwave-infrared parts of the electromagnetic spectrum, at a spatial resolution that depends on the band [2, 9]. We use only 10 of these bands because of atmospheric distortions. At the finest resolution of 10 m, each pixel represents a 10 m × 10 m patch of the surface, so a single pixel contains a certain amount of contextual information. By observing the spectral signal, which relates to the chemical composition of the surface within a pixel, we can extract this contextual information.

Very-High-Resolution Satellite Images

In addition to freely available multi-spectral low-resolution satellite images, we use very-high-resolution images with a resolution of up to 30 cm per pixel, kindly provided by DigitalGlobe through the Satellite Applications Catapult. Figure 1 shows the difference in resolution between Sentinel-2 and very-high-resolution imagery.
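
To make the resolution gap concrete, the following minimal sketch computes the ground area covered by one pixel at the two resolutions quoted above (10 m for Sentinel-2, 30 cm for the very-high-resolution imagery). The function and constant names are illustrative and not taken from the paper.

```python
# Ground area covered by one pixel at the resolutions quoted in the text.
SENTINEL2_GSD_M = 10.0   # finest Sentinel-2 ground sampling distance, in metres
VHR_GSD_M = 0.30         # very-high-resolution ground sampling distance, in metres


def pixel_area_m2(gsd_m: float) -> float:
    """Area in square metres of a square pixel with side length gsd_m."""
    return gsd_m * gsd_m


def vhr_pixels_per_lr_pixel(lr_gsd_m: float, vhr_gsd_m: float) -> float:
    """How many VHR pixels fit inside the footprint of one LR pixel."""
    return pixel_area_m2(lr_gsd_m) / pixel_area_m2(vhr_gsd_m)


lr_area = pixel_area_m2(SENTINEL2_GSD_M)
vhr_area = pixel_area_m2(VHR_GSD_M)
ratio = vhr_pixels_per_lr_pixel(SENTINEL2_GSD_M, VHR_GSD_M)

print(f"Sentinel-2 pixel: {lr_area:.0f} m^2")
print(f"VHR pixel:        {vhr_area:.2f} m^2")
print(f"VHR pixels per Sentinel-2 pixel: {ratio:.0f}")
```

One Sentinel-2 pixel therefore covers the same ground as roughly a thousand very-high-resolution pixels, which is why individual rooftops are visible only in the latter.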

2 Methods

Our first method trains a classifier to learn the spectral signal of an informal settlement, using freely available low-resolution spectral data. To do this, we employ pixel-wise classification: the classifier learns whether a 10-band spectrum belongs to an informal settlement or to the environment, which encompasses everything that is not an informal settlement.
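
The pixel-wise setup can be sketched as follows, assuming the image is held as a NumPy array of shape (height, width, bands). The `NearestMeanSpectrum` class is a deliberately simple stand-in used only to show the data flow; it is not a CCF and not the classifier used in the paper, and the scene is synthetic.

```python
import numpy as np


def image_to_pixels(image: np.ndarray) -> np.ndarray:
    """Flatten an (H, W, B) multispectral image into (H*W, B) spectra."""
    h, w, b = image.shape
    return image.reshape(h * w, b)


class NearestMeanSpectrum:
    """Stand-in classifier (NOT a CCF): assigns each spectrum to the class
    whose mean training spectrum is closest in Euclidean distance."""

    def fit(self, spectra: np.ndarray, labels: np.ndarray) -> "NearestMeanSpectrum":
        self.classes_ = np.unique(labels)
        self.means_ = np.stack(
            [spectra[labels == c].mean(axis=0) for c in self.classes_]
        )
        return self

    def predict(self, spectra: np.ndarray) -> np.ndarray:
        # Distance from every spectrum to every class mean: shape (N, n_classes).
        dists = np.linalg.norm(
            spectra[:, None, :] - self.means_[None, :, :], axis=2
        )
        return self.classes_[dists.argmin(axis=1)]


# Synthetic 10-band scene: label 1 = informal settlement, 0 = environment.
rng = np.random.default_rng(0)
height, width, bands = 20, 30, 10
mask = np.zeros((height, width), dtype=int)
mask[5:15, 10:20] = 1
image = rng.normal(0.0, 0.1, size=(height, width, bands)) + mask[..., None] * 1.0

spectra = image_to_pixels(image)      # one row per pixel, one column per band
labels = mask.reshape(-1)             # one label per pixel
model = NearestMeanSpectrum().fit(spectra, labels)

# Predictions are per pixel; reshaping restores the map layout.
# (Fitting and predicting on the same pixels is for shape illustration only.)
predicted_mask = model.predict(spectra).reshape(height, width)
print("pixel agreement:", (predicted_mask == mask).mean())
```

Because every pixel is classified independently from its spectrum, no spatial context enters the decision, which is the key difference from the segmentation approach.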

When finer-grained features are required to determine whether an informal settlement is present, such as roof size or the density of the surrounding settlement, we use our second method, which trains a state-of-the-art semantic segmentation deep neural network on very-high-resolution satellite imagery. This is crucial when the spectra of informal settlements are not distinguishable from those of the environment, as in Al Geneina, Sudan (see Figure 3).


Pixel-wise Classification with Canonical Correlation Forest

Canonical Correlation Forests (CCFs) [6] are a decision tree ensemble method for classification and regression. CCFs are a state-of-the-art random forest technique that has been shown to achieve remarkable results on numerous regression and classification tasks. Individual canonical correlation trees are binary decision trees with hyperplane splits based on local canonical correlation coefficients calculated during training. Like most random-forest-based approaches, CCFs have very few hyper-parameters to tune and typically perform very well out of the box. The only parameter that has to be set is the number of trees. CCFs need fewer trees than a random forest to reach empirically equivalent performance [6], meaning that CCFs have lower computational costs whilst providing better classification. CCFs use canonical correlation analysis (CCA) and projection bootstrapping during the training of each tree, which projects the data into a space that maximally correlates the inputs with the outputs. This is particularly useful for small datasets such as ours, because it reduces the amount of artificial randomness that has to be added during tree training and improves the predictive performance of the ensemble [6].
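
The idea of a CCA-based hyperplane split can be illustrated with a heavily simplified sketch for the two-class case (informal settlement versus environment). With a single binary output, the input-side canonical direction is proportional to the inverse input covariance times the input-output covariance, so plain linear algebra suffices. The Gini criterion, the ridge term, the function names and the synthetic data are assumptions made for illustration; the full algorithm in [6] also handles multiple classes and grows complete trees.

```python
import numpy as np


def cca_direction(X: np.ndarray, y: np.ndarray, ridge: float = 1e-6) -> np.ndarray:
    """First canonical direction of X with respect to a binary label y.

    With a single output variable, the X-side canonical direction is
    proportional to inv(Cxx) @ Cxy, so no eigen-decomposition is needed.
    """
    Xc = X - X.mean(axis=0)
    yc = y.astype(float) - y.mean()
    n = len(y)
    cxx = Xc.T @ Xc / (n - 1) + ridge * np.eye(X.shape[1])  # ridge keeps cxx invertible
    cxy = Xc.T @ yc / (n - 1)
    w = np.linalg.solve(cxx, cxy)
    return w / np.linalg.norm(w)


def gini(y: np.ndarray) -> float:
    """Gini impurity of a binary label vector."""
    if len(y) == 0:
        return 0.0
    p = y.mean()
    return 2.0 * p * (1.0 - p)


def best_threshold(z: np.ndarray, y: np.ndarray) -> float:
    """Threshold on projected values z that minimises weighted Gini impurity."""
    values = np.unique(z)                       # sorted unique projections
    candidates = (values[:-1] + values[1:]) / 2.0
    scores = [
        (gini(y[z <= t]) * (z <= t).sum() + gini(y[z > t]) * (z > t).sum()) / len(y)
        for t in candidates
    ]
    return float(candidates[int(np.argmin(scores))])


def fit_split(X: np.ndarray, y: np.ndarray, rng: np.random.Generator):
    """One oblique split: CCA on a bootstrap sample, threshold on all the data."""
    idx = rng.integers(0, len(y), size=len(y))  # bootstrap sample for the projection
    w = cca_direction(X[idx], y[idx])
    t = best_threshold(X @ w, y)
    return w, t


# Synthetic 10-band spectra for two classes (illustrative data only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 10)), rng.normal(1.5, 1.0, (200, 10))])
y = np.array([0] * 200 + [1] * 200)

w, t = fit_split(X, y, rng)
right = (X @ w > t).astype(int)
split_accuracy = max((right == y).mean(), (right != y).mean())
print("accuracy of a single oblique split:", split_accuracy)
```

A single split of this kind already separates the two synthetic classes well because the hyperplane is aligned with the direction that correlates most with the label, rather than with one coordinate axis.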

The computational efficiency of CCFs and their suitability for both small and large datasets make them ideal for detecting informal settlements, because many of the organisations that we aim to help will not have access to large compute resources. If access to very-high-resolution imagery and computational cost are not restrictions, we can instead employ a deep learning approach using convolutional neural networks (CNNs) to detect informal settlements.

Contextual Classification with CNNs

Since informal settlements can also be characterized by rooftop size and the surrounding building density, we employ a state-of-the-art semantic segmentation neural network on optical (RGB) very-high-resolution satellite imagery to detect these contextual features. Contextual features are important when informal settlements in a region cannot be distinguished from their environment by spectral signal alone. An example of such an informal settlement is shown in Figure 3. The informal settlements in a rural region of Al Geneina, Sudan have a very low building density, and the rooftops of both formal and informal settlements are built out of concrete, meaning they have the same spectral signal. This is in contrast to the dense slums in Nairobi and Mumbai.

Figure 3: Very-high-resolution images provided by DigitalGlobe comparing an informal settlement (left) and a formal settlement (right) in Al Geneina, Sudan. The main distinguishing feature is the wider contextual information, as the material spectra are the same.
Figure 2: Predictions of informal settlements (white pixels) in Kibera, Nairobi. Left: The CCF prediction of informal settlements in Kibera on low-resolution Sentinel-2 spectral imagery. Middle: CNN-based prediction of informal settlements in Kibera, trained on very-high-resolution imagery. Right: The ground truth informal settlement mask for Kibera.

Encoder-Decoder with Atrous Separable Convolution

We use the DeepLabv3+ encoder-decoder architecture. DeepLabv3+ [1] is a deep CNN that extends the earlier DeepLabv3 network with a decoder module to refine the segmentation results, particularly along object boundaries. The DeepLab architecture uses Atrous Spatial Pyramid Pooling (ASPP) with atrous convolutions to explicitly control the resolution at which feature responses are computed within the CNN. The ASPP module is augmented with image-level features to capture longer-range information. We use an Xception-65 network backbone in the encoder-decoder architecture. The benefit of this Xception model, together with applying depthwise separable convolution to the ASPP and decoder modules, has been shown in [1].
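
The two building blocks named above can be illustrated in isolation. The sketch below shows a one-dimensional atrous (dilated) convolution and a weight count comparing a standard convolution with a depthwise separable one. It is a toy illustration, not the DeepLabv3+ implementation; the kernel, the rates and the 256-channel layer size are assumptions chosen for the example.

```python
def atrous_conv1d(signal, kernel, rate):
    """Valid-mode 1-D atrous (dilated) convolution, written as cross-correlation.

    Kernel taps are applied `rate` samples apart, so the kernel covers a wider
    span of the input without adding weights.
    """
    span = (len(kernel) - 1) * rate + 1          # input span covered by the kernel
    return [
        sum(kernel[k] * signal[i + k * rate] for k in range(len(kernel)))
        for i in range(len(signal) - span + 1)
    ]


def effective_kernel_size(kernel_size, rate):
    """Input span seen by one output sample for a given atrous rate."""
    return (kernel_size - 1) * rate + 1


def standard_conv_weights(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_weights(k, c_in, c_out):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    return k * k * c_in + c_in * c_out


signal = list(range(10))            # 0, 1, ..., 9
kernel = [1, 0, -1]                 # simple difference filter
dense = atrous_conv1d(signal, kernel, rate=1)     # taps at i, i+1, i+2
dilated = atrous_conv1d(signal, kernel, rate=3)   # taps at i, i+3, i+6

standard_weights = standard_conv_weights(3, 256, 256)
separable_weights = separable_conv_weights(3, 256, 256)

print("rate 1:", dense, "span", effective_kernel_size(3, 1))
print("rate 3:", dilated, "span", effective_kernel_size(3, 3))
print("weights, standard vs separable:", standard_weights, separable_weights)
```

Raising the rate widens the receptive field with the same three weights, while the separable form needs far fewer weights than the standard convolution for the same channel counts.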

3 Results

Experimental Setup

For each region we have a 10 m low-resolution Sentinel-2 image, the corresponding very-high-resolution image at 30-50 cm resolution, and the ground truth annotations. When training and validating a model on the same region we use an 80-20 split. We ensure that each class contains the same number of points, then randomly sample 80% of each class to generate the training data and use the remaining 20% of each class as the test set, so that the two sets share no points. We then standardize the training data (and the test data accordingly) to have a mean of zero and a standard deviation of one. The only parameter we set for training the CCF is the number of trees. To validate our methods we report both pixel accuracy and mean intersection over union (mIoU).
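
The data preparation described above can be sketched as follows. Two details are assumptions rather than statements from the paper: classes are balanced by subsampling every class to the size of the smallest one, and the test data are scaled with the mean and standard deviation of the training data. Function names and the synthetic data are illustrative.

```python
import numpy as np


def balanced_split(X, y, train_fraction=0.8, seed=0):
    """Subsample every class to the size of the smallest one (assumption),
    then split each class into disjoint train and test parts."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    n_per_class = min(int((y == c).sum()) for c in classes)
    n_train = int(round(train_fraction * n_per_class))
    train_idx, test_idx = [], []
    for c in classes:
        idx = rng.permutation(np.flatnonzero(y == c))[:n_per_class]
        train_idx.extend(idx[:n_train])     # 80% of this class
        test_idx.extend(idx[n_train:])      # remaining 20% of this class
    train_idx, test_idx = np.array(train_idx), np.array(test_idx)
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]


def standardize(X_train, X_test):
    """Scale both sets with the TRAINING mean and standard deviation (assumption)."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0                     # avoid division by zero for constant bands
    return (X_train - mean) / std, (X_test - mean) / std


# Synthetic, imbalanced 10-band pixel spectra (illustrative data only).
rng = np.random.default_rng(1)
X = rng.normal(5.0, 2.0, size=(1000, 10))
y = np.array([0] * 700 + [1] * 300)

X_train, y_train, X_test, y_test = balanced_split(X, y)
X_train_s, X_test_s = standardize(X_train, X_test)

print("train:", X_train_s.shape, "test:", X_test_s.shape)
print("train class counts:", np.bincount(y_train))
```

Reusing the training statistics for the test data keeps information about the test set out of the preprocessing step.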

We compare the pixel-wise classification with CCFs against the contextual classification with CNNs for the detection and mapping of informal settlements; see the left half of Table 1. The CCFs, trained solely on freely available and easily accessible low-resolution data, perform well, although they are unable to match the performance of the CNN trained on very-high-resolution imagery, except in Kibera, where the CCF achieves the higher mean IoU. Figure 2 shows the predictions of both methods and the ground truth annotations. Despite having access to very-high-resolution data, the CNN still misclassifies structural elements of the informal settlements in Kibera, whereas the CCF, although more granular, captures the full structure of the informal settlement using only the spectral information.


To demonstrate the adaptability of our approach, we train a model on one part of the world and use it to make predictions on other, unseen regions across the globe. For this paper we train two models, one on Northern Nairobi, Kenya and another on Medellin, Colombia. The results can be found in the right half of Table 1. Even though we only have a small amount of data, our models generalize moderately well to several unseen regions, even with data that is noisy and partially incomplete, although accuracy drops sharply for the Sudanese regions. This opens a new challenge for transfer learning.

                          Pixel Acc.    Mean IOU      Pixel Acc.    Mean IOU
Region                    CCF    CNN    CCF    CNN    NN     M      NN     M
Kenya, Northern Nairobi   69.4   93.1   62.0   80.8   69.4   55.0   62.0   54.4
Kenya, Kibera             69.0   78.2   73.3   65.5   67.3   63.8   54.1   56.0
South Africa, Capetown*   92.0   -      33.2   -      41.3   71.5   43.1   32.0
Sudan, El Daien           78.0   86.0   61.3   73.4   14.2   1.1    37.9   34.0
Sudan, Al Geneina         83.2   89.2   35.7   76.3   27.1   6.0    34.9   41.0
Nigeria, Makoko*          76.2   87.4   59.9   74.0   59.0   77.0   37.8   34.6
Colombia, Medellin*       84.2   95.3   74.0   83.0   65.0   84.2   46.9   74.0
India, Mumbai*            97.0   -      40.3   -      37.9   63.0   32.4   34.4
Table 1: Left half: pixel accuracy and mean IOU (%) for informal settlement detection using CCF pixel-wise classification and contextual classification with CNNs. CCFs are trained and tested on low-resolution imagery; CNNs are trained and tested on very-high-resolution imagery. Right half: using CCFs, we train one model on Northern Nairobi (NN) and one on Medellin (M), then predict on all regions. *Ground truth annotations are less than 75% complete for the region.
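
For reference, the two metrics reported in Table 1 can be computed from a predicted mask and a ground truth mask as in the sketch below. Averaging the IoU over both classes (informal settlement and environment) is an assumption about how the mean is taken, and the small masks are made up for illustration.

```python
import numpy as np


def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label equals the ground truth."""
    return float((pred == truth).mean())


def mean_iou(pred, truth, classes=(0, 1)):
    """Average over classes of intersection size divided by union size."""
    ious = []
    for c in classes:
        p, t = pred == c, truth == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both masks; skip it
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))


# Tiny example masks: 1 = informal settlement, 0 = environment.
truth = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0]])
pred = np.array([[1, 0, 0, 0],
                 [1, 1, 1, 0]])

acc = pixel_accuracy(pred, truth)
miou = mean_iou(pred, truth)
print("pixel accuracy:", acc)
print("mean IoU:", miou)
```

Mean IoU penalizes both missed and spurious pixels for each class, so it can be much lower than pixel accuracy when one class dominates a region.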


4 Conclusion

In this work we have provided benchmarks and proposed two methods for detecting informal settlements. The first method uses computationally efficient CCFs to learn the spectral signal of informal settlements from multi-spectral low-resolution satellite imagery. The second trains a CNN on very-high-resolution satellite imagery to extract finer-grained features. We extensively evaluated the proposed methods and demonstrated the ability of the computationally efficient CCFs to generalize to informal settlements in other parts of the world. In particular, to the best of our knowledge, we demonstrated for the first time that informal settlements can be detected effectively using only freely and openly accessible multi-spectral low-resolution satellite imagery.


Acknowledgments

This project was executed during the Frontier Development Lab (FDL) Europe program, a partnership between the Phi-Lab at ESA, the Satellite Applications (SA) Catapult, Nvidia Corporation, Oxford University and Kellogg College. We gratefully acknowledge the support of Adrien Muller and Tom Jones of SA Catapult for their useful comments and for providing very-high-resolution imagery and ground truth annotations for Nairobi. We thank UNICEF, in particular Do-Hyung Kim and Clara Palau Montava, for valuable discussions and AIData for access to geo-located Afrobarometer data. We thank Nvidia for computation resources. We thank Yarin Gal for his helpful comments. Patrick Helber was supported by the NVIDIA AI Lab program and the BMBF project DeFuseNN (Grant 01IW17002). Bradley Gram-Hansen was also supported by the UK EPSRC CDT in Autonomous Intelligent Machines and Systems.


References

  • [1] Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation. In ECCV, 2018.
  • [2] European Space Agency, 2018. Accessed: 2018-08-27.
  • [3] Divyani Kohli, Richard Sliuzas, and Alfred Stein. Urban slum detection using texture and spatial metrics derived from satellite imagery. Journal of Spatial Science, 61(2):405–426, 2016.
  • [4] OECD. OECD Glossary of Statistical Terms. OECD Publishing, 2008.
  • [5] Marta Santos Pais. Poverty and exclusion among urban children. 2002.
  • [6] Tom Rainforth and Frank Wood. Canonical correlation forests. arXiv preprint arXiv:1507.05444, 2015.
  • [7] Elliott D. Sclar, Pietro Garau, and Gabriella Carolini. The 21st century health challenge of slums and cities. The Lancet, 365(9462):901–903, Mar 2005.
  • [8] United Nations. State of the World’s Cities 2012-2013: Prosperity of Cities. United Nations Publications, 2012.
  • [9] Tianxiang Zhang, Jinya Su, Cunjia Liu, Wen-Hua Chen, Hui Liu, and Guohai Liu. Band selection in Sentinel-2 satellite for agriculture applications. 2017 23rd International Conference on Automation and Computing (ICAC), pages 1–6, 2017.