Household poverty classification in data-scarce environments: a machine learning approach

11/18/2017
by Varun Kshirsagar, et al.

We describe a method to identify poor households in data-scarce countries by leveraging information contained in nationally representative household surveys. It employs two standard statistical learning techniques, cross-validation and parameter regularization, which together reduce the extent to which the model is over-fitted to the idiosyncrasies of the observed survey data. The automated framework satisfies three important constraints of this development setting: i) the prediction model uses at most ten questions, which limits the cost of data collection; ii) no computation beyond simple arithmetic is needed to calculate the probability that a given household is poor, so the probability can be obtained immediately after data on the ten indicators are collected; and iii) one specification of the model (i.e., one scorecard) is used to predict poverty throughout a country that may be characterized by significant sub-national differences. Applied to survey data from Zambia, the model's out-of-sample predictions distinguish poor households from non-poor households using only the information contained in ten questions.
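Constraint ii) can be illustrated with a minimal sketch of how a scorecard might be applied in the field. Everything in the block below is a hypothetical assumption for illustration: the indicator names, the point values, the score bands, and the probabilities are not taken from the paper, and only three questions are shown instead of ten for brevity. The sketch only shows that, once a model has been fitted, scoring a household reduces to adding points and reading a probability from a lookup table.

```python
# Hypothetical scorecard: indicator names, point values, score bands, and
# probabilities are invented for illustration and do NOT come from the paper.
SCORECARD = {
    "has_electricity": {"yes": 12, "no": 0},
    "roof_material": {"thatch": 0, "iron_sheet": 7, "tile": 10},
    "household_size": {"1-3": 15, "4-6": 8, "7+": 0},
}

# Each entry is (inclusive upper bound of the score band, probability poor).
# Higher scores correspond to a lower probability of being poor.
PROBABILITY_TABLE = [
    (10, 0.90),
    (20, 0.65),
    (30, 0.35),
    (37, 0.10),  # 37 is the maximum attainable score in this toy scorecard
]


def total_score(answers):
    """Add up the points for every question on the scorecard.

    Raises KeyError if a question is unanswered or an answer is not listed.
    """
    return sum(SCORECARD[question][answers[question]] for question in SCORECARD)


def poverty_probability(score):
    """Look up the probability associated with the band containing `score`."""
    for upper_bound, probability in PROBABILITY_TABLE:
        if score <= upper_bound:
            return probability
    raise ValueError(f"score {score} is above the maximum band")


household = {
    "has_electricity": "yes",
    "roof_material": "iron_sheet",
    "household_size": "4-6",
}
score = total_score(household)
print(score, poverty_probability(score))
```

In this sketch the enumerator needs only addition and one table lookup, which is the kind of arithmetic that can be done on paper at the time of the interview.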


