balance – a Python package for balancing biased data samples

07/12/2023
by Tal Sarig, et al.

Surveys are an important research tool, providing unique measurements of subjective experiences, such as sentiment and opinions, that cannot be measured by other means. However, because survey data is collected from a self-selected group of participants, directly inferring insights from it about a population of interest, or training ML models on such data, can lead to erroneous estimates or under-performing models. In this paper we present balance, an open-source Python package by Meta, offering a simple workflow for analyzing and adjusting biased data samples with respect to a population of interest. The balance workflow includes three steps: understanding the initial bias in the data relative to the target population we would like to draw inferences about, adjusting the data to correct for the bias by producing weights for each unit in the sample based on propensity scores, and evaluating the final biases and the variance inflation after applying the fitted weights. The package provides a simple API that can be used by researchers and data scientists from a wide range of fields on a variety of data. The paper provides the relevant context and methodological background, and presents the package's API.
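The three-step workflow described in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the balance package's API or its fitting method. The toy data, the single categorical covariate, and the helper names `shares`, `propensity_weights`, and `design_effect` are all assumptions made for illustration. The propensity score here comes from a saturated per-category model, and Kish's design effect stands in as one common measure of variance inflation.

```python
from collections import Counter

# Hypothetical toy data: one categorical covariate (age group) per unit.
sample = ["young"] * 70 + ["old"] * 30   # self-selected respondents
target = ["young"] * 40 + ["old"] * 60   # population of interest


def shares(values, weights=None):
    """Return the (optionally weighted) share of each category."""
    if weights is None:
        weights = [1.0] * len(values)
    totals = Counter()
    for value, weight in zip(values, weights):
        totals[value] += weight
    grand_total = sum(totals.values())
    return {category: t / grand_total for category, t in totals.items()}


def propensity_weights(sample, target):
    """Inverse-propensity weights from a saturated per-category model."""
    n_sample, n_target = Counter(sample), Counter(target)
    weights = []
    for value in sample:
        # Estimated P(unit belongs to the sample | category).
        p = n_sample[value] / (n_sample[value] + n_target[value])
        # Odds of belonging to the target rather than the sample.
        weights.append((1 - p) / p)
    return weights


def design_effect(weights):
    """Kish's design effect: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


# Step 1: understand the initial bias relative to the target.
before = shares(sample)
target_shares = shares(target)

# Step 2: adjust by fitting one weight per sample unit.
weights = propensity_weights(sample, target)

# Step 3: evaluate the remaining bias and the variance inflation.
after = shares(sample, weights)
deff = design_effect(weights)

print({k: round(v, 3) for k, v in before.items()})
print({k: round(v, 3) for k, v in after.items()})
print(round(deff, 3))
```

In this toy case the weighted sample reproduces the target's 40/60 split exactly, at the cost of a design effect of roughly 1.43. With real data and several covariates, a fitted model would replace the per-category counts and the match would generally be approximate.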


