Can Strategic Data Collection Improve the Performance of Poverty Prediction Models?

11/16/2022
by Satej Soman, et al.

Machine learning-based estimates of poverty and wealth are increasingly being used to guide the targeting of humanitarian aid and the allocation of social assistance. However, the ground truth labels used to train these models are typically borrowed from existing surveys that were designed to produce national statistics – not to train machine learning models. Here, we test whether adaptive sampling strategies for ground truth data collection can improve the performance of poverty prediction models. Through simulations, we compare the status quo sampling strategies (uniform at random and stratified random sampling) to alternatives that prioritize acquiring training data based on model uncertainty or model performance on sub-populations. Perhaps surprisingly, we find that none of these active learning methods improve over uniform-at-random sampling. We discuss how these results can help shape future efforts to refine machine learning-based estimates of poverty.
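The kind of simulation described above can be illustrated with a small toy experiment. The sketch below is not the authors' implementation: it assumes synthetic linear data in place of survey labels, a closed-form ridge regression in place of the poverty model, and disagreement across a bootstrap ensemble as a stand-in for model uncertainty. All function names and parameters (`simulate`, `ensemble_variance`, `batch`, `rounds`, and so on) are hypothetical. Its results say nothing about the paper's findings.

```python
import numpy as np


def fit_ridge(X, y, lam=1e-3):
    # Closed-form ridge regression with an appended intercept column.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)


def predict(w, X):
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w


def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def ensemble_variance(X_lab, y_lab, X_pool, rng, n_models=10):
    # Assumed uncertainty proxy: variance of predictions across
    # models fit on bootstrap resamples of the labeled set.
    n = X_lab.shape[0]
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)
        preds.append(predict(fit_ridge(X_lab[idx], y_lab[idx]), X_pool))
    return np.var(np.stack(preds), axis=0)


def simulate(strategy, X, y, X_test, y_test,
             n_init=20, batch=20, rounds=5, seed=0):
    """Return test R^2 measured before each acquisition round."""
    rng = np.random.default_rng(seed)
    is_labeled = np.zeros(len(X), dtype=bool)
    is_labeled[rng.choice(len(X), size=n_init, replace=False)] = True
    scores = []
    for _ in range(rounds):
        w = fit_ridge(X[is_labeled], y[is_labeled])
        scores.append(r2(y_test, predict(w, X_test)))
        pool = np.flatnonzero(~is_labeled)
        if strategy == "uniform":
            # Status quo: label a uniform-at-random batch from the pool.
            chosen = rng.choice(pool, size=batch, replace=False)
        elif strategy == "uncertainty":
            # Adaptive: label the points the ensemble disagrees on most.
            var = ensemble_variance(X[is_labeled], y[is_labeled], X[pool], rng)
            chosen = pool[np.argsort(var)[-batch:]]
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        is_labeled[chosen] = True
    return scores


def make_data(n=1200, d=5, seed=0):
    # Synthetic stand-in for features and wealth labels.
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    w = rng.normal(size=d)
    y = X @ w + 0.5 * rng.normal(size=n)
    return X, y


X, y = make_data()
X_pool, y_pool, X_test, y_test = X[:1000], y[:1000], X[1000:], y[1000:]
for name in ("uniform", "uncertainty"):
    print(name, np.round(simulate(name, X_pool, y_pool, X_test, y_test), 3))
```

Comparing the two printed learning curves at equal labeling budgets mirrors the structure of the comparison in the abstract. A stratified or sub-population-aware strategy could be added as a further branch in `simulate`.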


