1 Informal Settlements
Inhabitants have no security of tenure vis-à-vis the land or dwellings they inhabit, with modalities ranging from squatting to informal rental housing.
The neighborhoods usually lack, or are cut off from, basic services and city infrastructure.
The housing may not comply with current planning and building regulations, and is often situated in geographically and environmentally hazardous areas.
Typically, those who live in informal settlements are the most vulnerable in society, subject to harsh social and economic constraints WEKESA2011238 . Although informal settlements are well studied in the humanities and remote sensing communities united2012state ; huchzermeyer2006informal ; hofmann2008detecting , little has been done in machine learning to map informal settlements using high-resolution satellite imagery mahabir2018critical ; mboga2017detection ; varshney2015targeting and, to the best of our knowledge, nothing has been done using low-resolution imagery. Being able to locate and map these settlements would enable UNICEF and other UN organizations to provide effective social and economic aid pais2002poverty .
Low-resolution Sentinel-2 Satellite Data
Each pixel in a Sentinel-2 image contains a thirteen-dimensional feature vector. Besides the usual RGB bands, this feature vector includes ten additional bands acquired at different wavelengths across the visible (VIS), near-infrared (NIR) and short-wave infrared (SWIR) spectral ranges. These pixels have a 10 m resolution, which means that each pixel represents a 10 m × 10 m surface. A single pixel therefore contains a lot of contextual information. We can extract this information by observing the spectral signal, which is indicative of the material composition of the surface covered by the pixel. See Figure 1.
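To make the pixel-level view concrete, the following minimal sketch reads the thirteen-dimensional spectral vector at a geo-located point from a band stack. It assumes, hypothetically, that all bands have already been resampled onto one common north-up 10 m grid (native Sentinel-2 bands come at 10, 20 and 60 m) and that the stack is a NumPy array of shape (13, rows, cols); the function name `pixel_spectrum`, the band-order list and the coordinates are illustrative, not part of the original pipeline.

```python
import numpy as np

# Assumed band order of a 13-band Level-1C stack (illustrative).
S2_BANDS = ["B1", "B2", "B3", "B4", "B5", "B6", "B7",
            "B8", "B8A", "B9", "B10", "B11", "B12"]

def pixel_spectrum(stack, x, y, origin_x, origin_y, pixel_size=10.0):
    """Return the 13-dimensional spectral vector at map coordinate (x, y).

    stack: array of shape (13, rows, cols), all bands on one grid (assumed).
    origin_x, origin_y: map coordinates of the top-left corner (north-up grid).
    pixel_size: ground sampling distance in metres (assumed 10 m).
    """
    col = int(np.floor((x - origin_x) / pixel_size))
    row = int(np.floor((origin_y - y) / pixel_size))  # northing decreases downwards
    if not (0 <= row < stack.shape[1] and 0 <= col < stack.shape[2]):
        raise ValueError("coordinate falls outside the image")
    return stack[:, row, col]

# Each 10 m pixel covers 10 m x 10 m = 100 m^2 of ground.
PIXEL_AREA_M2 = 10.0 * 10.0

# Usage on a synthetic stack (random values, not real reflectances).
rng = np.random.default_rng(0)
stack = rng.random((13, 100, 100))
vec = pixel_spectrum(stack, x=500_025.0, y=4_199_985.0,
                     origin_x=500_000.0, origin_y=4_200_000.0)
```

The point lies 25 m east and 15 m south of the corner, so it falls in row 1, column 2 of the grid, and the returned vector holds that pixel's value in every band.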
We take advantage of rounds 1-6 of the open-source Afrobarometer database Afro . Afrobarometer is a pan-African, non-partisan research network that conducts public surveys on socio-economic issues to inform policy making. In partnership with the US Global Development Lab and AidData, the Afrobarometer surveys are mapped to specific villages and towns in 36 African nations, providing hyperlocal time-series information. In particular, we use the Afrobarometer data as ground truth, as the dataset contains geo-located responses to the survey question "What was the roof of the respondent's home or shelter made of?" Respondents could choose from the following options: 1) Metal, tin or zinc, 2) Tiles, 3) Shingles, 4) Thatch or grass, 5) Plastic sheets, 6) Asbestos, 7) Multiple materials, 8) Some other material, 9) Could not tell/could not see. It is important to note that, although we treat this data as ground truth, it is itself noisy, for two reasons. 1) Even though the data is geo-located, on multiple occasions the location provided did not align with any type of roof, or with any material other than vegetation. 2) Distortion is added to the geo-located coordinates to protect the privacy of the respondents. We therefore apply several pre-processing steps to remove noisy data points, which reduced the number of classes to four.
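The pre-processing steps are not specified here, so the following is only one plausible filter for the first source of noise (survey locations that land on vegetation rather than a roof). It is a minimal sketch assuming the 13-band Level-1C order, the standard NDVI computed from B4 (red) and B8 (near-infrared), and a hypothetical cut-off of 0.5; the function `drop_vegetated_points` and the threshold value are illustrative assumptions, not the authors' method. Dry thatch typically has a lower NDVI than green vegetation, which is why a fairly high threshold is assumed.

```python
import numpy as np

# Index positions assume the 13-band Level-1C order
# B1, B2, B3, B4, B5, B6, B7, B8, B8A, B9, B10, B11, B12.
RED, NIR = 3, 7  # B4 (red) and B8 (near-infrared)

def drop_vegetated_points(spectra, labels, ndvi_threshold=0.5):
    """Remove survey points whose pixel looks like green vegetation.

    spectra: (n_points, 13) array of reflectances; labels: length-n sequence.
    ndvi_threshold: hypothetical cut-off; points above it are discarded.
    Returns the kept spectra, the kept labels and the NDVI of every point.
    """
    red, nir = spectra[:, RED], spectra[:, NIR]
    ndvi = (nir - red) / (nir + red + 1e-12)  # small constant avoids 0/0
    keep = ndvi <= ndvi_threshold
    return spectra[keep], np.asarray(labels)[keep], ndvi

# Usage on three synthetic points; the second one is strongly vegetated.
spectra = np.full((3, 13), 0.2)
spectra[1, RED], spectra[1, NIR] = 0.05, 0.45
kept_spectra, kept_labels, ndvi = drop_vegetated_points(
    spectra, ["metal", "shingles", "thatch"])
```

Here the second point has an NDVI of about 0.8 and is discarded, while the other two (NDVI 0) are kept.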
Our approach uses domain knowledge about the types of materials used to build informal settlements. The freely available Afrobarometer Afro data and low-resolution Sentinel-2 Copernicus sent2 satellite imagery are used to train a classifier to detect these types of materials.
Canonical Correlation Forests
Canonical correlation forests (CCFs) are a decision tree ensemble method for classification and regression. CCFs are a state-of-the-art random forest technique that has been shown to achieve remarkable results on numerous regression and classification tasks rainforth2015canonical . Individual canonical correlation trees are binary decision trees with hyperplane splits based on local canonical correlation coefficients calculated during training. Like most random forest based approaches, CCFs have very few hyper-parameters to tune and typically perform very well out of the box. The only parameter that has to be set is the number of trees. For CCFs, a much smaller number of trees provides performance that is empirically equivalent to that of a random forest with many more trees rainforth2015canonical , meaning that CCFs can match the accuracy of a random forest at a lower computational cost. CCFs use canonical correlation analysis (CCA) and projection bootstrapping during the training of each tree; CCA projects the data into a space that maximally correlates the inputs with the outputs. This is particularly useful for small datasets such as ours, as it reduces the amount of artificial randomness that must be added during tree training and improves the predictive performance of the ensemble rainforth2015canonical .
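To illustrate what a hyperplane split based on local canonical correlation means, the following is a minimal NumPy sketch of a single node split, assuming one-hot encoded class labels. CCA finds directions a and b that maximise the correlation between the projected inputs X a and the projected labels Y b; the node's features are projected onto the canonical directions for X, and the threshold with the lowest weighted Gini impurity is chosen. The sketch deliberately omits projection bootstrapping, feature subsampling and recursion, so it is not the reference CCF implementation of Rainforth and Wood; the function names and the ridge term `eps` are illustrative assumptions.

```python
import numpy as np

def _inv_sqrt(C, eps):
    # Symmetric inverse square root; the ridge term keeps it finite when C is
    # singular (the covariance of centred one-hot labels always is).
    w, V = np.linalg.eigh(C + eps * np.eye(C.shape[0]))
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def cca_directions(X, Y, eps=1e-6):
    """Canonical directions for X (as columns) and canonical correlations."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx, Cyy, Cxy = Xc.T @ Xc / (n - 1), Yc.T @ Yc / (n - 1), Xc.T @ Yc / (n - 1)
    Kx, Ky = _inv_sqrt(Cxx, eps), _inv_sqrt(Cyy, eps)
    U, s, _ = np.linalg.svd(Kx @ Cxy @ Ky, full_matrices=False)
    return Kx @ U, s

def gini(counts):
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_cca_split(X, y, n_classes):
    """Best threshold on the CCA-projected features, scored by weighted Gini."""
    Y = np.eye(n_classes)[y]                 # one-hot encode the labels
    A, corr = cca_directions(X, Y)
    Z = (X - X.mean(axis=0)) @ A             # project onto canonical directions
    best = (np.inf, None, None)              # (impurity, direction, threshold)
    for j in range(Z.shape[1]):
        order = np.argsort(Z[:, j])
        zs, ys = Z[order, j], y[order]
        for i in range(1, len(ys)):
            if zs[i] == zs[i - 1]:
                continue                     # no threshold between equal values
            left = np.bincount(ys[:i], minlength=n_classes)
            right = np.bincount(ys[i:], minlength=n_classes)
            score = (i * gini(left) + (len(ys) - i) * gini(right)) / len(ys)
            if score < best[0]:
                best = (score, j, 0.5 * (zs[i] + zs[i - 1]))
    return best, A, corr

# Usage on two synthetic, diagonally separated clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-4, 1, (20, 2)), rng.normal(4, 1, (20, 2))])
y = np.repeat([0, 1], 20)
(score, j, t), A, corr = best_cca_split(X, y, n_classes=2)
```

The chosen split is the hyperplane (x - mean(X)) · A[:, j] = t in the original feature space. Because the first canonical direction is strongly correlated with the class labels here, a single threshold along it separates the two clusters perfectly, which an axis-aligned split on one raw feature would only achieve by chance.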
The computational efficiency of CCFs and their suitability for both small and large datasets make them ideal for detecting informal settlements, for three reasons. First, many of the organisations that we aim to help will not have access to large amounts of compute, so computational efficiency is important. Second, running CCFs for both training and prediction requires calling only a single function. This means that the end user does not need to be an expert in ensemble methods, and makes the method essentially plug-and-play. Finally, our ground truth datasets are relatively small; for example, in the Afrobarometer data we have only 11 data points per class for training. We therefore have to use the data as efficiently as possible, which CCFs allow us to do.
The classes that we choose to predict are metal, shingles, thatch and environment. The metal class contains aluminum, zinc and tin signals. The shingles class contains asbestos, some types of metal, and wood shingles. The thatch class contains spectra similar to that of dry vegetation, and the environment class contains everything else.
We train only on the Sentinel-2 spectral signals extracted at the geo-located Afrobarometer data points describing the respondents' roof type. Because of the inconsistency of the data, and to keep the training data balanced (the largest class contains 373 annotations, whereas the smallest contains just 16), we use 11 Level-1C spectral signals per class for training. To validate our models we have to rely on domain experts to verify our predictions, as we have no ground truth annotations of the material of each pixel in our test images.
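The balancing step can be sketched as random undersampling. The text does not say how the 11 signals per class were selected, so this minimal sketch assumes, hypothetically, that each class is reduced to the same number of examples by sampling without replacement; `balanced_subsample` and its seed are illustrative names, and the zero-valued spectra stand in for real data. Only the class sizes 373 and 16 come from the text; the remaining classes are omitted.

```python
import numpy as np

def balanced_subsample(X, y, n_per_class, seed=0):
    """Randomly keep n_per_class examples of every class (without replacement)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    idx = []
    for c in np.unique(y):
        members = np.flatnonzero(y == c)
        if len(members) < n_per_class:
            raise ValueError(f"class {c!r} has only {len(members)} examples")
        idx.extend(rng.choice(members, size=n_per_class, replace=False))
    idx = np.array(idx)
    return X[idx], y[idx]

# Usage with the largest (373) and smallest (16) class sizes from the text.
y = np.array(["metal"] * 373 + ["thatch"] * 16)
X = np.zeros((len(y), 13))                      # placeholder spectra
Xb, yb = balanced_subsample(X, y, n_per_class=11)
classes, counts = np.unique(yb, return_counts=True)
```

Undersampling discards most of the large class, which is the price of a balanced training set when the smallest class is this small.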
The material predictions provide a useful heuristic for detecting informal settlements within Mumbai and Cape Town, by looking for materials known to be used in their construction. There are some clear misclassifications; however, we hope that with larger amounts of training data, future models could be made more robust. It should also be noted that the ground truth maps are not entirely complete, especially for Cape Town. To develop a more formal way of comparing results, we are working with multiple partners to build a larger dataset consisting of Sentinel-2 Level-1C and Level-2A spectral signals with the corresponding material annotations. This would enable us to evaluate our predictions much more robustly, without relying on an external expert.
Currently, much of the work in this area requires multiple partnerships across disciplines and institutions, which makes it difficult to conduct research effectively. We are working on ways to make it easier for the machine learning community to participate in this and related areas, by constructing datasets and highlighting socio-economic problems that need to be solved.
This project was executed during the Frontier Development Lab (FDL) Europe program, a partnership between the Phi-Lab at ESA, the Satellite Applications (SA) Catapult, Nvidia Corporation, Oxford University and Kellogg College. We gratefully acknowledge Adrien Muller and Tom Jones of SA Catapult for their useful comments and for providing VHR imagery and ground truth annotations for Nairobi. We thank UNICEF, in particular Do-Hyung Kim and Clara Palau Montava, for valuable discussions, and AidData for access to geo-located Afrobarometer data. We thank Nvidia for computation resources. We thank Yarin Gal for his helpful comments. Patrick Helber was supported by the NVIDIA AI Lab program and the BMBF project DeFuseNN (Grant 01IW17002). Bradley Gram-Hansen was also supported by the UK EPSRC CDT in Autonomous Intelligent Machines and Systems.
-  Afrobarometer. Round 6 survey manual. http://www.afrobarometer.org/sites/default/files/survey_manuals/ab_r6_survey_manual_en.pdf, 2014. Accessed: 2018-08-21.
-  Copernicus. http://www.copernicus.eu/, 2018. Accessed: 2018-08-27.
-  Peter Hofmann, Josef Strobl, Thomas Blaschke, and Hermann Kux. Detecting informal settlements from QuickBird data in Rio de Janeiro using an object based approach. In Object-based image analysis, pages 531–553. Springer, 2008.
-  Marie Huchzermeyer. Informal settlements: A perpetual challenge? Juta and Company Ltd, 2006.
-  Ron Mahabir, Arie Croitoru, Andrew T Crooks, Peggy Agouris, and Anthony Stefanidis. A critical review of high and very high-resolution remote sensing approaches for detecting and mapping slums: Trends, challenges and emerging opportunities. Urban Science, 2(1):8, 2018.
-  Nicholus Mboga, Claudio Persello, John Ray Bergado, and Alfred Stein. Detection of informal settlements from VHR images using convolutional neural networks. Remote Sensing, 9(11):1106, 2017.
-  OECD. OECD Glossary of Statistical Terms. OECD Publishing, 2008.
-  Marta Santos Pais. Poverty and exclusion among urban children. 2002.
-  Tom Rainforth and Frank Wood. Canonical correlation forests. arXiv preprint arXiv:1507.05444, 2015.
-  United Nations. State of the World’s Cities 2012-2013: Prosperity of Cities. United Nations Publications, 2012.
-  Kush R Varshney, George H Chen, Brian Abelson, Kendall Nowocin, Vivek Sakhrani, Ling Xu, and Brian L Spatocco. Targeting villages for rural development using satellite image analysis. Big Data, 3(1):41–53, 2015.
-  B.W. Wekesa, G.S. Steyn, and F.A.O. (Fred) Otieno. A review of physical and socio-economic characteristics and intervention approaches of informal settlements. Habitat International, 35(2):238–245, 2011.