A Dynamic Network and Representation Learning Approach for Quantifying Economic Growth from Satellite Imagery

12/01/2018
by   Jiqian Dong, et al.
Carnegie Mellon University

Quantifying the improvement in human living standard, as well as the city growth in developing countries, is a challenging problem due to the lack of reliable economic data. Therefore, there is a fundamental need for alternate, largely unsupervised, computational methods that can estimate the economic conditions in the developing regions. To this end, we propose a new network science- and representation learning-based approach that can quantify economic indicators and visualize the growth of various regions. More precisely, we first create a dynamic network drawn out of high-resolution nightlight satellite images. We then demonstrate that using representation learning to mine the resulting network, our proposed approach can accurately predict spatial gross economic expenditures over large regions. Our method, which requires only nightlight images and limited survey data, can capture city-growth, as well as how people's living standard is changing; this can ultimately facilitate the decision makers' understanding of growth without heavily relying on expensive and time-consuming surveys.

1 Introduction

The ongoing big data and machine learning revolution has greatly contributed to quantifying economic development. However, although massive data is available for developed countries, developing countries still suffer from a lack of reliable economic data. This polarization also exists in the quality of data, since the recorded data in developing countries is known to be unstructured and inaccurate baddata15, making useful data even more scarce. All these challenges have greatly hindered the accurate modeling of economic conditions in the developing world.

Knowing how people live and how quickly their living standard improves is fundamental to many decision-making processes such as urban development, policy effectiveness evaluation, and conservation planning. However, the lack of sufficient high-quality data in developing countries can result in poor decisions, which can in turn lead to bad development policies. On the other hand, in contrast to the low-quality economic data obtained from time- and labor-intensive surveys, the massive amount of high-resolution nightlight satellite images forms a more reliable source of data for developing countries. Therefore, new machine learning-based methods are needed that can more effectively exploit the power of nightlight satellite images to predict economic indicators and explicitly quantify the economic growth of different regions.

In this paper, we propose a new network science- and representation learning-based approach to explicitly capture the complex relationship between the economic growth of various regions, and also to predict economic indicators over large regions. To this end, we propose a novel gravity-based model to create, for the very first time, a dynamic economic-growth network from a massive amount of nightlight satellite data. We also demonstrate that mining this network with representation learning techniques like node2vec node2vec-kdd2016 can help us accurately predict economic indicators over large regions.

Overall, we demonstrate that by using our proposed gravity model and publicly available satellite data, we can create large-scale economic-growth networks in a largely unsupervised manner (with the exception of fine-tuning a few hyper-parameters). Mining this network gives us features relevant to economic growth, which can then be used for accurately predicting economic indicators. This way, we can obtain socio-economic features for large regions without relying on expensive and time-consuming surveys. Hence, our method can be used for large-scale economic modeling in the developing world.

2 Previous Work

A prior deep learning approach on nightlight satellite images for poverty prediction was proposed in Jean790. Since using a deep learning model directly with nightlight images requires a large dataset with millions of samples labelled with economic data, Jean et al. address this problem by applying a transfer learning technique with an ImageNet-pretrained deep network. As a result, feature embeddings from the CNN model can explain a large share of the variation in local-level economic outcomes Jean790 ; Xie2016TransferLF. Other prior works extend such ideas to population estimation Robinson2017ADL and crop yield prediction Wang:2018:DTL:3209811.3212707.

Although this prior art predicts poverty via satellite images, this prediction is mostly spatial in nature. Consequently, the temporal aspects of economic growth have been largely ignored in prior works. Moreover, most CNN models have an inherent problem of being non-interpretable. This is especially critical for applications where decisions must be made based on the machine learning model output (for instance, for urban planning, etc.). Specifically, using interpretable machine learning models can help to actually understand the underlying factors behind the model output in order to more accurately target such factors while making decisions.

In our work, we address both limitations above. Specifically, since we build economic-growth networks from the satellite data, we can analyze their properties like community structure newman2006modularity to understand which locations experience growth over time; this will result in a much more interpretable model than, say, the output of a CNN. Also, while previous work lacks temporal analysis and does not predict the future development trends, our dynamic network can be potentially used for spatio-temporal predictions (and not just spatial!). We show some preliminary results for accurate spatiotemporal predictions using our proposed approach.

3 Proposed Approach

Our approach can be divided into three parts: network construction, network mining, and regression.

3.1 Economic Gravity-Based Network Construction Model

Our gravitational model aims to predict the economic indicators. To achieve this, we divide the region of interest into smaller grids and treat each grid as a node in the network. For example, for economic prediction in Tanzania, we divide the entire country into grids of approximately equal area, with side lengths measured in kilometers. For each node (grid), we draw undirected edges to all the other nodes, with each edge weight calculated from a gravitational equation (Eq. 1).

In Eq. (1), the weight on the edge between two nodes depends on the total grid nightlight log-intensity of each node and on the normalized distance between the two nodes. A parameter controls the trade-off between forming links based on intensity versus distance.
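As an illustration only, one gravity-type weight consistent with the description above could be written as follows. The symbols w_{ij}, I_i, d_{ij}, the exponent β, and the range of d_{ij} are all assumed names and choices; the exact functional form used in this work may differ.

```latex
% Assumed gravity form: product of log-intensities divided by a power of distance.
w_{ij} = \frac{\log I_i \cdot \log I_j}{d_{ij}^{\beta}},
\qquad d_{ij} \in (0, 1]
```

Under this assumed form, a larger β penalizes distance more strongly, so links form mainly between nearby grids, while a smaller β lets bright grids connect over longer distances.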

After drawing edges for all nodes, the network is fully connected, with N(N-1)/2 edges for N nodes. Since many of these edges have a low gravity weight, we set a threshold to remove low-gravity links. Finally, to prevent nodes from becoming disconnected from the network, we rewire the nodes that retain fewer than a minimum number of links to their nearest neighbors; this is similar to the method introduced in bhardwaj2018dimensionality.
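The construction steps above (weight every pair, drop low-gravity links, rewire weakly connected nodes) can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function name, the gravity form (product of log-intensities over distance to the power `beta`), and the values of `beta`, `threshold`, and `k_min` are all assumptions made for illustration.

```python
import math

def build_gravity_network(intensity, coords, beta=1.0, threshold=10.0, k_min=1):
    """Build an undirected gravity network as {node: {neighbor: weight}}.

    intensity: total nightlight intensity per grid cell (each > 1, so logs are positive)
    coords:    distinct (x, y) centers of the grid cells
    beta, threshold, k_min: hypothetical hyper-parameters (values are assumptions)
    """
    n = len(intensity)
    log_i = [math.log(v) for v in intensity]
    dist = [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]
    d_max = max(max(row) for row in dist)  # used to normalize distances into (0, 1]

    def weight(i, j):
        # Assumed gravity form: product of log-intensities over distance**beta.
        return log_i[i] * log_i[j] / (dist[i][j] / d_max) ** beta

    adj = {i: {} for i in range(n)}
    # Start from the fully connected graph and keep only high-gravity edges.
    for i in range(n):
        for j in range(i + 1, n):
            w = weight(i, j)
            if w >= threshold:
                adj[i][j] = adj[j][i] = w
    # Rewire nodes left with fewer than k_min links to their nearest neighbors.
    for i in range(n):
        if len(adj[i]) < k_min:
            nearest = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
            for j in nearest[:k_min]:
                adj[i][j] = adj[j][i] = weight(i, j)
    return adj

# Toy example: three bright cells close together and one dim, distant cell.
intensity = [50.0, 40.0, 2.0, 45.0]
coords = [(0, 0), (1, 0), (10, 9), (0, 1)]
network = build_gravity_network(intensity, coords)
```

In this toy example the dim cell (node 2) loses all of its links at the thresholding step and is then reattached to its nearest neighbor, which is the role the rewiring step plays in keeping the network connected.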

3.2 Network Mining

Intuitively, based on the gravity network, if a node (i.e., a location) is more developed, it will have a higher nightlight intensity and, hence, more links to nearby nodes. That is, if a community newman2006modularity of economically wealthy locations in the network grows over time, it is highly likely that this region is experiencing rapid economic growth (see Fig. 1). These community-based features can be captured by existing representation learning frameworks like node2vec node2vec-kdd2016. The algorithm is a biased random walk on a graph, in which the probability of traveling between two nodes is a function of the edge weight between them. Two adjustable parameters, p and q, set the priorities of the walk and thus its trade-off between exploration and exploitation. In our node2vec implementation, we choose these parameters to prefer exploitation over exploration.
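The biased walk described above can be sketched in a few lines. This follows the second-order transition rule of node2vec (return parameter `p`, in-out parameter `q`) on a weighted graph; the function name, the toy graph, and the values `p=0.5`, `q=2.0` are assumptions chosen to keep the walk local, not the settings used in this work.

```python
import random

def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=random):
    """Simulate one node2vec-style walk on a weighted graph {node: {neighbor: weight}}."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = list(adj[cur])
        if not neighbors:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            # First step: probability proportional to the edge weight only.
            probs = [adj[cur][x] for x in neighbors]
        else:
            prev = walk[-2]
            probs = []
            for x in neighbors:
                if x == prev:            # return to the previous node
                    bias = 1.0 / p
                elif x in adj[prev]:     # stay within the previous node's neighborhood
                    bias = 1.0
                else:                    # move further away from the previous node
                    bias = 1.0 / q
                probs.append(adj[cur][x] * bias)
        walk.append(rng.choices(neighbors, weights=probs, k=1)[0])
    return walk

# Toy graph: a triangle (0, 1, 2) with a weakly attached node 3.
adj = {
    0: {1: 1.0, 2: 1.0},
    1: {0: 1.0, 2: 1.0},
    2: {0: 1.0, 1: 1.0, 3: 0.5},
    3: {2: 0.5},
}
walk = node2vec_walk(adj, start=0, length=6, p=0.5, q=2.0, rng=random.Random(0))
```

With a small `p` and a large `q`, steps that return to or stay near the previous node are favored, so the walk mostly samples the local neighborhood of its starting node.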

Figure 1: (a) A random walk starting from a hub node searches both within its neighborhood and between communities. (b) A random walk starting from a non-hub node mainly explores the neighborhood within its community. (c), (d) Dynamic network evolution: Initially, there is a community A of rich locations; nearby locations that are not as rich form a separate community B. In the next time step, locations in community B experience growth and merge into community A. node2vec on various snapshots can take these dynamically changing communities into account and generate more intuitive features.

As illustrated in Fig. 1(c) and (d), if the number of nodes in a community grows, node2vec will generate similar features for the nodes that belong to the same community. Moreover, since the nodes in a network community share similar characteristics, these locations will most likely share a similar consumption index. Hence, the latent development trend, as well as the similarity between nodes, can be captured via network representation learning, resulting in features that are easier to interpret. Visualizing the dynamic network can also inform decision makers of other latent trends (e.g., location X became similar to location Y in terms of economic development).

3.3 Regression Model over Economic Gravity-Based Features

We next convert the walks simulated by the node2vec neighborhood search strategy into walks of light intensities by replacing each grid ID with its grid intensity. Then, for each node, we calculate the expected value of intensity at each step of the walks originating from that node. The feature vector of a node therefore represents the expected intensity at each step of a random walk originating from that node.
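The feature construction above can be sketched as an average over walks. This is a minimal sketch assuming that all walks have the same length and that the expectation is estimated by a simple mean over the simulated walks; the function and variable names are hypothetical.

```python
def intensity_features(walks, intensity):
    """Estimate, for each start node, the expected intensity at every walk step.

    walks:     list of walks (lists of node IDs), all of equal length (assumption)
    intensity: mapping from node ID to grid nightlight intensity
    Returns {start_node: [mean intensity at step 0, step 1, ...]}.
    """
    sums, counts = {}, {}
    for walk in walks:
        start = walk[0]
        totals = sums.setdefault(start, [0.0] * len(walk))
        # Replace each grid ID by its intensity and accumulate per step.
        for step, node in enumerate(walk):
            totals[step] += intensity[node]
        counts[start] = counts.get(start, 0) + 1
    return {s: [t / counts[s] for t in totals] for s, totals in sums.items()}

# Toy example: two walks from node 0 and one walk from node 1.
intensity = {0: 10.0, 1: 20.0, 2: 30.0}
walks = [[0, 1, 2], [0, 2, 1], [1, 0, 2]]
features = intensity_features(walks, intensity)
```

Each node's feature vector has one entry per walk step, so its dimension equals the walk length.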

Finally, the features generated above for each node and the corresponding economic indicators are used to train and test a regression model (e.g., Random Forest, K-Nearest Neighbor, or Bayesian regression). Note that our proposed method does not require labelled economic data to build the economic-gravity network. More precisely, we can derive these community-based features for a large number of locations and use them for economic prediction at locations that do not have large-scale surveys. Therefore, our model can be used as an alternative way to quantify economic conditions and growth trends in developing countries without heavily relying on surveys, which are time-, money-, and labor-intensive.
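As one concrete instance of the regression step, a K-Nearest Neighbor regressor over the walk-based features can be sketched as follows. This is a plain-Python sketch for illustration; the function name, the Euclidean distance, the unweighted average, and the toy numbers are assumptions and do not reflect the configuration used in the experiments.

```python
import math

def knn_predict(train_x, train_y, query, k=3):
    """Predict a target as the mean over the k nearest training feature vectors."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

# Toy feature vectors (expected intensity per walk step) with made-up targets.
train_x = [[1.0, 1.0], [1.2, 0.9], [5.0, 5.1], [5.2, 4.9]]
train_y = [10.0, 12.0, 50.0, 54.0]
low = knn_predict(train_x, train_y, [1.1, 1.0], k=2)   # near the two dim nodes
high = knn_predict(train_x, train_y, [5.1, 5.0], k=2)  # near the two bright nodes
```

Only the nodes with survey labels are needed for training; the features themselves are available for every grid in the network.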

4 Results

4.1 Experimental Setup

We use monthly nightlight images from the National Oceanic and Atmospheric Administration (NOAA) with a 15 arc-second spatial-resolution geographic grid noaa. For economic prediction, we use Living Standards Measurement Study (LSMS) survey data lsms with consumption expenditure and coordinates (longitude and latitude) as our ground truth for training and prediction: the National Panel Survey (NPS) for Tanzania and the Integrated Household Panel Survey (IHPS) for Malawi, for the survey years listed in Table 1. We conduct spatial and spatio-temporal predictions in the current version of this paper. For the gravity model, we set the two hyper-parameters separately for Tanzania and for Malawi, as a function of the country size and the density of light intensity.

4.2 Economic Prediction Results

In this section, we present results for spatially predicting economic indicators at several locations in Tanzania and Malawi. After preprocessing the LSMS survey dataset (cleaning, binning, and mapping), we recover around 500-1000 data points per set per year, which we split into training and testing subsets. Next, we build the economic-gravity network from the average nightlight images available for the corresponding year. After computing the features for all the nodes in the network (which are grids on the map), we associate each node with the nearby houses using a Manhattan distance threshold in degrees of longitude and latitude, which corresponds to a kilometer-scale region around the grid center. We repeat the experiments with different training-testing splits to obtain statistically significant results.
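The node-to-household association can be sketched as a Manhattan-distance filter in degrees. This is a minimal sketch: the field names, the threshold `max_deg=0.05`, and the use of the mean consumption as the node's target are assumptions for illustration, and the coordinates and consumption values are made up.

```python
def households_near_node(node_center, households, max_deg):
    """Return households within a Manhattan distance (in degrees) of the grid center."""
    lon0, lat0 = node_center
    return [h for h in households
            if abs(h["lon"] - lon0) + abs(h["lat"] - lat0) <= max_deg]

def node_target(node_center, households, max_deg):
    """Average consumption of nearby households, or None if the node has none.

    Averaging is an assumed aggregation choice, not taken from the survey protocol.
    """
    nearby = households_near_node(node_center, households, max_deg)
    if not nearby:
        return None
    return sum(h["consumption"] for h in nearby) / len(nearby)

# Toy survey records (made-up values).
households = [
    {"lon": 35.01, "lat": -6.01, "consumption": 100.0},
    {"lon": 34.98, "lat": -6.00, "consumption": 200.0},
    {"lon": 35.30, "lat": -6.00, "consumption": 900.0},
]
target = node_target((35.0, -6.0), households, max_deg=0.05)
empty = node_target((40.0, -10.0), households, max_deg=0.05)
```

Nodes without any nearby household receive no label and can only be used at prediction time.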

Figure 2: Coefficient of determination, R², on test sets for (a) Bayesian-Ridge Regression (BRR), (b) Random-Forest Regression (RFR), (c) K-Nearest-Neighbours Regression (KNNR), and (d) Linear Regression (LR) for Tanzania 2013. Similarly, (e), (f), (g), and (h) represent R² on test sets for BRR, RFR, KNNR, and LR respectively for Malawi 2013. A high R² on the test sets shows that node2vec features for economic-gravity networks indeed capture economic information.

Country   Year  BRR Test R²  RFR Test R²  KNNR Test R²  LR Test R²
Tanzania  2013  0.61718      0.52485      0.61048       0.67821
Tanzania  2015  0.60326      0.41516      0.47981       0.63308
Malawi    2013  0.58167      0.54436      0.51313       0.58330
Malawi    2016  0.44194      0.45513      0.34388       0.43937
Table 1: Spatial Predictions - Test R² by country and year for models BRR, RFR, KNNR, and LR.
Figure 3: Spatio-temporal Predictions - (a) Coefficient of determination, R², for aggregate consumption of Tanzania 2015 from feature weights learned from Tanzania 2013. The K-Nearest-Neighbours Regression model is used for prediction. (b) is an enlarged version of (a).

We evaluate our model by calculating R², which represents the fraction of variance explained by the model. The prediction results (for the median R² value) are shown in Fig. 2. Fig. 2(a)-(d) show the test-set R² for Tanzania 2013 and Fig. 2(e)-(h) for Malawi 2013, using the BRR, RFR, KNNR, and LR models respectively. The R² on the test set is high across the regression models, which indicates that the features generated by the node2vec algorithm on the proposed gravitational network capture relevant economic information. The results for the rest of the experiments are summarized in Table 1.
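For reference, the evaluation metric can be computed as one minus the ratio of the residual sum of squares to the total sum of squares, and the median can then be taken over repeated train-test splits. The sketch below assumes this standard definition; the function name and all numbers are illustrative and are not results from the paper.

```python
from statistics import median

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative values only.
y_true = [1.0, 2.0, 3.0, 4.0]
good = r_squared(y_true, [1.1, 1.9, 3.2, 3.8])   # close predictions
baseline = r_squared(y_true, [2.5] * 4)          # always predicting the mean
median_score = median([0.61, 0.58, 0.65])        # median over hypothetical splits
```

A model that always predicts the mean of the test targets scores zero, so any positive test R² reflects information carried by the features.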

Fig. 3(a, b) shows the spatio-temporal results for Tanzania 2015 when the nightlight imagery data from Tanzania 2013 is used for training the models, indicating that the feature weights learned through random walks are robust and transferable over time. Therefore, our model can be used to quantify large-scale economic conditions in developing countries without relying on expensive survey data. Of note, our test-set R² is close to the value reported in the prior work for Tanzania Jean790.

4.3 Growth Monitoring via Community Detection on Economic-Gravity Networks

For monitoring economic growth-similarity trends, we show our results in Fig. 4(a)-(c) for high resolution community detection on economic-gravity networks for 2012, 2013, and 2014 nightlight data.

Figure 4: Dynamic community structure for Tanzania: (a) 2012 has 23 communities, (b) 2013 has 20, and (c) 2014 has 18 communities. This shows that the communities merge due to similar economic growth trends. Some communities also get divided possibly due to different growth trends.

The number of communities in the network decreased from 23 in 2012, to 20 in 2013, to 18 in 2014. This agrees with our initial hypothesis presented in Fig. 1 that when economic growth becomes similar among different regions, they merge into bigger communities. This is evident in Fig. 4(a,b), where communities merge due to similar growth trends in the Northwestern part of Tanzania (red arrows), while a community breaks into two in the Southeastern part (violet arrow). These dynamic growth trends are therefore captured by our network-based approach, which provides decision makers with more information about which locations are experiencing similar or different growth rates.
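The text does not state which community detection algorithm produced these partitions. As a hedged illustration of the underlying quality measure, the sketch below computes the modularity score of newman2006modularity for a given partition of a weighted undirected network; the function name and the toy graph are assumptions.

```python
def modularity(adj, community):
    """Newman modularity Q of a partition of a weighted undirected graph.

    adj:       {node: {neighbor: weight}} with symmetric weights
    community: {node: community label}
    """
    strength = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    two_m = sum(strength.values())  # twice the total edge weight
    q = 0.0
    for u in adj:
        for v in adj:
            if community[u] == community[v]:
                # Observed weight minus the weight expected at random.
                q += adj[u].get(v, 0.0) - strength[u] * strength[v] / two_m
    return q / two_m

# Toy network: two triangles joined by a single edge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = {n: {} for n in range(6)}
for a, b in edges:
    adj[a][b] = adj[b][a] = 1.0

q_split = modularity(adj, {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"})
q_merged = modularity(adj, {n: "A" for n in range(6)})
```

A partition that separates the two triangles scores higher than one that lumps all nodes together, which is the sense in which a drop in the number of detected communities signals that formerly distinct groups of locations have become densely linked.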

5 Conclusion and Future Work

We have proposed a new dynamic gravity-based network model for quantifying economic growth across a large region. To this end, we have used representation learning on a gravity-network to extract different growth-related features. Our results have demonstrated that our proposed approach can indeed accurately predict the spatial and spatio-temporal gross economic expenditures. We have further used dynamic community structure to monitor the growth of different regions over time.

For future work, we plan to improve upon our current temporal economic predictions by building representation learning methods for dynamically changing networks.

References

  • (1) Justin Sandefur and Amanda Glassman. The political economy of bad data: Evidence from African survey and administrative statistics. The Journal of Development Studies, 51(2):116–132, 2015.
  • (2) Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.
  • (3) Neal Jean, Marshall Burke, Michael Xie, Matthew Davis, David Lobell, and Stefano Ermon. Combining satellite imagery and machine learning to predict poverty. Science, 353(6301):790–794, 2016.
  • (4) Sang Michael Xie, Neal Jean, Marshall Burke, David B. Lobell, and Stefano Ermon. Transfer learning from deep features for remote sensing and poverty mapping. In AAAI, 2016.
  • (5) Caleb Robinson, Fred Hohman, and Bistra N. Dilkina. A deep learning approach for population estimation from satellite imagery. CoRR, abs/1708.09086, 2017.
  • (6) Anna X. Wang, Caelin Tran, Nikhil Desai, David Lobell, and Stefano Ermon. Deep transfer learning for crop yield prediction with remote sensing data. In Proceedings of the 1st ACM SIGCAS Conference on Computing and Sustainable Societies, COMPASS ’18, pages 50:1–50:5, New York, NY, USA, 2018. ACM.
  • (7) Mark EJ Newman. Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23):8577–8582, 2006.
  • (8) Kartikeya Bhardwaj and Radu Marculescu. Dimensionality reduction via community detection in small sample datasets. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 102–114. Springer, 2018.
  • (9) NOAA. Monthly Nightlight Satellite Imagery Dataset. https://ngdc.noaa.gov/eog/viirs/download_dnb_composites.html, 2013. [Online; accessed 7-Sept-2018].
  • (10) The World Bank. Living Standards Measurement Study (LSMS) Dataset. http://microdata.worldbank.org/index.php/catalog/lsms, 2013. [Online; accessed 7-Sept-2018].