Generating Material Maps to Map Informal Settlements

Detecting and mapping informal settlements addresses several of the United Nations sustainable development goals, because informal settlements are home to the most socially and economically vulnerable people on the planet. Understanding where these settlements are is therefore of paramount importance to both government and non-government organizations (NGOs), such as the United Nations Children's Fund (UNICEF), which can use this information to deliver effective social and economic aid. We propose a method that detects and maps the locations of informal settlements using only freely available, low-resolution Sentinel-2 satellite spectral data and socio-economic data. This is in contrast to previous studies that use only costly very-high-resolution (VHR) satellite and aerial imagery. We show how informal settlements can be detected by combining domain knowledge with machine learning techniques to build a classifier that looks for known roofing materials used in informal settlements. Please find additional material at





1 Informal Settlements

The United Nations (UN) and the Organisation for Economic Co-operation and Development (OECD) define informal settlements as follows [7, 10]:

  1. Inhabitants have no security of tenure vis-à-vis the land or dwellings they inhabit, with modalities ranging from squatting to informal rental housing.

  2. The neighborhoods usually lack, or are cut off from, basic services and city infrastructure.

  3. The housing may not comply with current planning and building regulations, and is often situated in geographically and environmentally hazardous areas.

Typically, those who live in informal settlements are the most vulnerable in society, subject to harsh social and economic constraints [12]. Informal settlements are well studied in the humanities and remote sensing communities [10, 4, 3]. In machine learning, however, little has been done to map them using high-resolution satellite imagery [5, 6, 11] and, as far as we know, nothing has been done using low-resolution imagery. Being able to map and locate these settlements would give UNICEF and other UN organizations the ability to provide effective social and economic aid [8].

Data Sources

Figure 1: Two images of the same informal settlement in Kibera, showing the difference between high- and low-resolution imagery. Left: A DigitalGlobe 30 cm resolution VHR image, meaning each pixel represents a 30 cm × 30 cm area. Right: The Sentinel-2 10 m resolution image, that is, each pixel represents a 10 m × 10 m area.

Low-resolution Sentinel-2 Satellite Data

Each pixel in a Sentinel-2 image contains a thirteen-dimensional feature vector. Besides the usual RGB bands, this vector includes ten additional bands acquired at different wavelengths throughout the visible (VIS) and near-infrared (NIR) spectral range. The pixels have a 10 m resolution, which means that each pixel represents a 10 m × 10 m surface. Thus, a lot of contextual information is contained within one pixel. By observing the spectral signal, which reflects the chemical composition of the surface within a pixel, we can extract this contextual information. See Figure 1.
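As a minimal sketch of this per-pixel representation, assume the thirteen bands have already been resampled to a common grid and stacked into a NumPy array of shape (bands, height, width). The array layout and the function name `pixels_to_spectra` are our assumptions for illustration, not part of the pipeline described here. Each pixel's spectral signal then becomes one row of a feature matrix:

```python
import numpy as np

N_BANDS = 13  # Sentinel-2 band count, as stated in the text


def pixels_to_spectra(image: np.ndarray) -> np.ndarray:
    """Flatten a (bands, height, width) cube into (height * width, bands) rows."""
    if image.ndim != 3 or image.shape[0] != N_BANDS:
        raise ValueError(f"expected shape ({N_BANDS}, H, W), got {image.shape}")
    bands, height, width = image.shape
    # Move the band axis last so each pixel's bands sit together, then flatten.
    return np.moveaxis(image, 0, -1).reshape(height * width, bands)


# Synthetic stand-in for a real tile: 13 bands over a 4 x 5 pixel patch.
cube = np.arange(13 * 4 * 5, dtype=np.float32).reshape(13, 4, 5)
spectra = pixels_to_spectra(cube)
```

Row `r * width + c` of `spectra` holds the thirteen band values of the pixel at row `r`, column `c`, which is the form a per-pixel classifier would consume.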

Figure 2: Material prediction of Mumbai with . Left: Sentinel-2 image. The area is densely populated and contains several informal settlements. Middle: Partially complete ground truth for the informal settlements; white represents informal settlements and black represents environment. The southern right-hand corner is not included in this mask. Right: A CCF prediction on the Sentinel-2 L1C spectral data for determining which materials are present. Black is environment, yellow is metal, blue is shingles and red is thatch.

Afrobarometer Data

We take advantage of rounds 1-6 of the open source database Afrobarometer [1], a pan-African non-partisan research network that conducts socio-economic public surveys to inform policy making. In partnership with the US Global Development Lab and AidData, the Afrobarometer surveys are mapped to specific villages and towns in 36 African nations, providing hyperlocal time-series information. We use the Afrobarometer data as ground truth because the dataset contains geo-located survey responses to the question "What was the roof of the respondent's home or shelter made of?" Respondents could choose from the following options: 1) Metal, tin or zinc, 2) Tiles, 3) Shingles, 4) Thatch or grass, 5) Plastic sheets, 6) Asbestos, 7) Multiple materials, 8) Some other material, 9) Could not tell/could not see. It is important to note that, whilst this data should be seen as ground truth, it is itself noisy, for two reasons. 1) Even though the data is geo-located, on multiple occasions the location provided did not align with any type of roof, or with any material other than vegetation. 2) Distortion is added to the geo-located coordinates to protect the privacy of the respondents. We therefore apply several pre-processing steps to remove noisy data points, which reduced the number of classes to four.

2 Method

Our approach uses domain knowledge about the types of materials used to build informal settlements. The freely available Afrobarometer data [1] and low-resolution Sentinel-2 satellite imagery [2] are used to train a classifier to detect those types of materials.

Canonical Correlation Forests

Canonical correlation forests (CCFs) [9] are a decision tree ensemble method for classification and regression. CCFs are a state-of-the-art random forest technique that has been shown to achieve remarkable results on numerous regression and classification tasks. Individual canonical correlation trees are binary decision trees with hyperplane splits based on local canonical correlation coefficients calculated during training. Like most random forest based approaches, CCFs have very few hyper-parameters to tune and typically provide very good performance out of the box. The only parameter that has to be set is the number of trees. For CCFs, a comparatively small number of trees provides performance that is empirically equivalent to that of a random forest with many more trees [9], meaning CCFs have lower computational costs whilst providing better classification. CCFs work by using canonical correlation analysis (CCA) and projection bootstrapping during the training of each tree, which projects the data into a space that maximally correlates the inputs with the outputs. This is particularly useful for small datasets such as ours, as it reduces the amount of artificial randomness that must be added during tree training and improves the predictive performance of the ensemble [9].
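To make the CCA projection step concrete, the following is a minimal NumPy sketch of canonical correlation analysis between a feature matrix and one-hot class labels. It illustrates only the projection idea and is not the CCF implementation of [9]: it builds no trees and does no bootstrapping. The function names and the small ridge term `reg` are our assumptions, and the data is synthetic.

```python
import numpy as np


def inv_sqrt(S: np.ndarray) -> np.ndarray:
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(S)
    return eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T


def cca_directions(X: np.ndarray, Y: np.ndarray, reg: float = 1e-6):
    """Return canonical directions for X and the canonical correlations.

    X: (n_samples, n_features) inputs, e.g. per-pixel spectra.
    Y: (n_samples, n_classes) one-hot class labels.
    reg: small ridge term (an assumption) that keeps the covariances invertible.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the correlations.
    U, corrs, _ = np.linalg.svd(Kx @ Sxy @ Ky, full_matrices=False)
    return Kx @ U, corrs


# Synthetic two-class data: 20 samples per class, 13 features each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 13)), rng.normal(3.0, 1.0, (20, 13))])
labels = np.repeat([0, 1], 20)
Y = np.eye(2)[labels]

A, corrs = cca_directions(X, Y)
projected = (X - X.mean(axis=0)) @ A  # data expressed in the canonical space
```

In a canonical correlation tree, a split would then be searched for along the columns of `projected` rather than along the raw features, which is what yields hyperplane splits in the original feature space.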

The computational efficiency of CCFs and their suitability for both small and large datasets make them ideal for detecting informal settlements, for three reasons. First, many of the organisations that we aim to help will not have access to large amounts of compute resources, so computational efficiency is important. Second, running CCFs for both training and prediction requires only a single function call. This means that the end user does not need to be an expert in ensemble methods, and makes the method akin to plug and play. Finally, our ground truth datasets are relatively small; for example, in the Afrobarometer data we have 11 data points per class for training. This means that we have to use the data as efficiently as possible, which CCFs allow us to do.

3 Results

Figure 3: Material prediction of Cape Town with . Top: Sentinel-2 image of Cape Town. Middle: Ground truth, which is 35% complete. White represents an informal settlement and black represents environment. Bottom: A CCF material prediction on the Sentinel-2 L1C spectral data.

Experimental Setup

The classes that we choose to predict are metal, shingles, thatch and environment. The metal class contains aluminum, zinc and tin signals. The shingles class contains asbestos, some types of metal and wood shingles. The thatch class contains a spectrum that is similar to that of dry vegetation, and the environment class contains everything else.

We train only on the Sentinel-2 spectral signals extracted at the geo-located Afrobarometer data points that record the respondents' roof type. Because the data is inconsistent and the classes are imbalanced (the largest class contains 373 annotations, whereas the smallest contains just 16), we use 11 Level-1C product spectral signals per class for training, which ensures that the training data is balanced. To validate our models we have to rely on domain experts to verify our predictions, as we have no ground truth annotations regarding the spectral material of each pixel in our test images.
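The balancing step can be sketched as drawing the same number of samples from every class. This is a minimal illustration under our own assumptions: the function name `balanced_subsample`, the integer class codes, the random placeholder spectra and two of the four class sizes are hypothetical, and the pre-processing that yields 11 usable signals per class is not reproduced here.

```python
import numpy as np


def balanced_subsample(X, y, n_per_class, seed=0):
    """Draw n_per_class rows without replacement from every class in y."""
    rng = np.random.default_rng(seed)
    keep = []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        if idx.size < n_per_class:
            raise ValueError(f"class {cls!r} has only {idx.size} samples")
        keep.append(rng.choice(idx, size=n_per_class, replace=False))
    keep = np.sort(np.concatenate(keep))
    return X[keep], y[keep]


# Synthetic, imbalanced stand-in. The sizes 373 and 16 echo the text; the
# other two sizes and the assignment of sizes to class codes are made up.
rng = np.random.default_rng(1)
y_all = np.repeat([0, 1, 2, 3], [373, 16, 40, 120])
X_all = rng.random((y_all.size, 13))  # placeholder 13-band spectra

X_train, y_train = balanced_subsample(X_all, y_all, n_per_class=11)
```

With four classes and 11 samples each, the resulting training set has 44 rows, so no class dominates the classifier during training.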

As can be seen in Figures 2 and 3, the material prediction heuristically provides a useful model for detecting informal settlements within Mumbai and Cape Town by looking for known materials used in the construction of informal settlements. There are some clear misclassifications; however, we hope that with larger amounts of training data future models could be made more robust. It should also be noted that the ground truth maps are not entirely complete, especially for Cape Town. To develop a more formal way of comparing results, we are working with multiple partners to build a larger dataset that consists of Sentinel-2 Level-1C and Level-2A spectral signals with the corresponding material annotations. This would enable us to evaluate our predictions much more robustly without the need for an external expert.
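One way to read a material map as a settlement map is to treat every pixel predicted as a roofing material as a candidate settlement pixel, and to compare it with the ground truth only where the mask is annotated. The sketch below illustrates that reading under our own assumptions: the integer class codes, the function names and the simple agreement score are hypothetical and are not a metric reported in this work.

```python
import numpy as np

# Hypothetical integer codes for the four predicted classes.
ENVIRONMENT, METAL, SHINGLES, THATCH = 0, 1, 2, 3
ROOF_CLASSES = (METAL, SHINGLES, THATCH)


def settlement_candidates(material_map: np.ndarray) -> np.ndarray:
    """Mark pixels whose predicted material is one of the roofing classes."""
    return np.isin(material_map, ROOF_CLASSES)


def agreement(candidates, truth, annotated):
    """Fraction of annotated pixels where the candidate mask matches the truth."""
    return float((candidates[annotated] == truth[annotated]).mean())


material_map = np.array([[0, 1, 1],
                         [0, 2, 0],
                         [3, 0, 0]])
truth = np.array([[0, 1, 1],
                  [0, 1, 0],
                  [0, 0, 0]], dtype=bool)
# The bottom row is treated as unannotated, mimicking an incomplete mask.
annotated = np.array([[1, 1, 1],
                      [1, 1, 1],
                      [0, 0, 0]], dtype=bool)

mask = settlement_candidates(material_map)
score = agreement(mask, truth, annotated)
```

Restricting the comparison to annotated pixels avoids penalising predictions in regions where the ground truth is simply missing, which matters when a mask is only partially complete.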

Future Work

Currently, much of the work in this area requires multiple partnerships across varying disciplines and institutions, which makes it difficult to conduct research effectively. We are working on ways to make it easier for the machine learning community to participate in this and related areas, by constructing datasets and highlighting socio-economic problems that need to be solved.


This project was executed during the Frontier Development Lab (FDL), Europe program, a partnership between the Phi-Lab at ESA, the Satellite Applications (SA) Catapult, Nvidia Corporation, Oxford University and Kellogg College. We gratefully acknowledge the support of Adrien Muller and Tom Jones of SA Catapult for their useful comments and for providing VHR imagery and ground truth annotations for Nairobi. We thank UNICEF, in particular Do-Hyung Kim and Clara Palau Montava, for valuable discussions, and AidData for access to geo-located Afrobarometer data. We thank Nvidia for computation resources. We thank Yarin Gal for his helpful comments. Patrick Helber was supported by the NVIDIA AI Lab program and the BMBF project DeFuseNN (Grant 01IW17002). Bradley Gram-Hansen was also supported by the UK EPSRC CDT in Autonomous Intelligent Machines and Systems.


  • [1] Afrobarometer. Round 6 survey manual, 2014. Accessed: 2018-08-21.
  • [2] Copernicus, 2018. Accessed: 2018-08-27.
  • [3] Peter Hofmann, Josef Strobl, Thomas Blaschke, and Hermann Kux. Detecting informal settlements from QuickBird data in Rio de Janeiro using an object based approach. In Object-based image analysis, pages 531–553. Springer, 2008.
  • [4] Marie Huchzermeyer. Informal settlements: A perpetual challenge? Juta and Company Ltd, 2006.
  • [5] Ron Mahabir, Arie Croitoru, Andrew T Crooks, Peggy Agouris, and Anthony Stefanidis. A critical review of high and very high-resolution remote sensing approaches for detecting and mapping slums: Trends, challenges and emerging opportunities. Urban Science, 2(1):8, 2018.
  • [6] Nicholus Mboga, Claudio Persello, John Ray Bergado, and Alfred Stein. Detection of informal settlements from VHR images using convolutional neural networks. Remote Sensing, 9(11):1106, 2017.
  • [7] OECD. OECD Glossary of Statistical Terms. OECD Publishing, 2008.
  • [8] Marta Santos Pais. Poverty and exclusion among urban children. 2002.
  • [9] Tom Rainforth and Frank Wood. Canonical correlation forests. arXiv preprint arXiv:1507.05444, 2015.
  • [10] United Nations. State of the World’s Cities 2012-2013: Prosperity of Cities. United Nations Publications, 2012.
  • [11] Kush R Varshney, George H Chen, Brian Abelson, Kendall Nowocin, Vivek Sakhrani, Ling Xu, and Brian L Spatocco. Targeting villages for rural development using satellite image analysis. Big Data, 3(1):41–53, 2015.
  • [12] B.W. Wekesa, G.S. Steyn, and F.A.O. (Fred) Otieno. A review of physical and socio-economic characteristics and intervention approaches of informal settlements. Habitat International, 35(2):238 – 245, 2011.