Changing Data Sources in the Age of Machine Learning for Official Statistics

06/07/2023
by   Cedric De Boom, et al.

Data science has become increasingly essential for the production of official statistics, as it enables the automated collection, processing, and analysis of large amounts of data. With such data science practices in place, reporting can become more timely, more insightful, and more flexible. However, the quality and integrity of data-science-driven statistics rely on the accuracy and reliability of the data sources and the machine learning techniques that support them. In particular, changes in data sources are inevitable and pose significant risks that must be addressed in the context of machine learning for official statistics. This paper gives an overview of the main risks, liabilities, and uncertainties associated with changing data sources in this context. We provide a checklist of the most prevalent origins and causes of changing data sources, not only on a technical level but also regarding ownership, ethics, regulation, and public perception. Next, we highlight the repercussions of changing data sources on statistical reporting. These include technical effects such as concept drift, bias, availability, validity, accuracy, and completeness, but also the neutrality and potential discontinuation of the statistical offering. We offer a few important precautionary measures, such as enhancing robustness in both data sourcing and statistical techniques, and thorough monitoring. With these measures in place, machine learning-based official statistics can maintain integrity, reliability, consistency, and relevance in policy-making, decision-making, and public discourse.
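As a rough illustration of what monitoring a data source for change could look like in practice, the sketch below computes a population stability index (PSI) between a reference sample and a newly delivered sample of a single numeric feature. This is not a method taken from the paper. The function name, the bin count, the smoothing constant, and the 0.25 alert threshold are all assumptions chosen for illustration, and PSI only detects a shift in the input distribution, not concept drift in the relation between inputs and the target.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    n_bins: int = 10,      # assumption: equal-width bins over the reference range
    eps: float = 1e-6,     # assumption: floor on bin shares to avoid log(0)
) -> float:
    """Return the PSI of `current` relative to `reference` (hypothetical helper)."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample has no spread; cannot build bins")
    width = (hi - lo) / n_bins

    def bin_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range fall into the edge bins.
            idx = min(max(idx, 0), n_bins - 1)
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    ref_shares = bin_shares(reference)
    cur_shares = bin_shares(current)
    # PSI = sum over bins of (current - reference) * ln(current / reference)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_shares, cur_shares))


# Synthetic samples, for illustration only.
reference_sample = [i / 100 for i in range(100)]        # spread over [0, 1)
shifted_sample = [0.5 + i / 200 for i in range(100)]    # concentrated in [0.5, 1)

ALERT_THRESHOLD = 0.25  # assumption: a rule-of-thumb cut-off, not from the paper

psi_same = population_stability_index(reference_sample, reference_sample)
psi_shifted = population_stability_index(reference_sample, shifted_sample)
print(f"PSI, unchanged source: {psi_same:.3f}")
print(f"PSI, shifted source:   {psi_shifted:.3f}")
print("alert" if psi_shifted > ALERT_THRESHOLD else "ok")
```

A check like this would typically run on every new data delivery, per feature, with the threshold tuned to the statistic at hand. It covers only one of the technical effects listed above and says nothing about ownership, regulatory, or ethical changes in a source.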


