Multicalibrated Partitions for Importance Weights

03/10/2021
by   Parikshit Gopalan, et al.

The ratios between the probabilities that two distributions R and P assign to points x are known as importance weights or propensity scores, and they play a fundamental role in many fields, most notably statistics and machine learning. Among their applications, importance weights are central to domain adaptation, anomaly detection, and the estimation of various divergences such as the KL divergence. We consider the common setting where R and P are only given through samples from each distribution. The vast literature on estimating importance weights is either heuristic or makes strong assumptions about R and P or about the importance weights themselves. In this paper, we explore a computational perspective on the estimation of importance weights, which factors in the limitations and possibilities obtainable with bounded computational resources. We significantly strengthen previous work that uses the MaxEntropy approach, which defines the importance weights based on the distribution Q closest to P that looks the same as R on every set C ∈ 𝒞, where 𝒞 may be a huge collection of sets. We show that the MaxEntropy approach may fail to assign high average scores to sets C ∈ 𝒞, even when the average of the ground-truth weights for the set is evidently large. We similarly show that it may overestimate the average scores of sets C ∈ 𝒞. We therefore formulate Sandwiching bounds as a notion of set-wise accuracy for importance weights. We study these bounds to show that they capture natural completeness and soundness requirements for the weights. We present an efficient algorithm that, under standard learnability assumptions, computes weights which satisfy these bounds. Our techniques rely on a new notion of multicalibrated partitions of the domain of the distributions, which appear to be useful objects in their own right.
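To make the sample-based setting concrete, the sketch below estimates importance weights that are constant on each cell of a fixed partition of the domain, using the ratio of empirical frequencies under R and P. It is only an illustration of partition-based weights, not the paper's algorithm: the partition here is chosen by hand rather than learned to be multicalibrated, and the names `partition_weights` and `cell_of` and the toy samples are assumptions made for this example.

```python
from collections import Counter


def partition_weights(samples_r, samples_p, cell_of):
    """Estimate w(S) = R(S) / P(S) for each cell S of a fixed partition.

    samples_r, samples_p: samples drawn from R and P respectively.
    cell_of: hypothetical helper mapping a point x to the id of its cell.
    Cells that receive no samples from P are skipped (weight undefined).
    """
    count_r = Counter(cell_of(x) for x in samples_r)
    count_p = Counter(cell_of(x) for x in samples_p)
    weights = {}
    for cell, n_p in count_p.items():
        frac_r = count_r.get(cell, 0) / len(samples_r)  # empirical R(S)
        frac_p = n_p / len(samples_p)                   # empirical P(S)
        weights[cell] = frac_r / frac_p
    return weights


# Toy data (assumed): P is spread evenly over {0..7}, R favors small values.
samples_p = [0, 1, 2, 3, 4, 5, 6, 7]
samples_r = [0, 0, 1, 1, 2, 3, 4, 6]


def cell_of(x):
    # Hand-picked partition into two cells: {0..3} and {4..7}.
    return x // 4


weights = partition_weights(samples_r, samples_p, cell_of)
```

Every point in a cell receives that cell's weight, so the average weight under the empirical P is 1 whenever every sample from R lands in a cell that P also hits. How well such weights track the true ratio on a set C depends entirely on the choice of partition, which is the question the abstract's multicalibrated partitions address.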


