Estimating the Density Ratio between Distributions with High Discrepancy using Multinomial Logistic Regression

05/01/2023
by   Akash Srivastava, et al.

Functions of the ratio of the densities p/q are widely used in machine learning to quantify the discrepancy between the two distributions p and q. For high-dimensional distributions, binary classification-based density ratio estimators have shown great promise. However, when densities are well separated, estimating the density ratio with a binary classifier is challenging. In this work, we show that the state-of-the-art density ratio estimators perform poorly on well-separated cases and demonstrate that this is due to distribution shifts between training and evaluation time. We present an alternative method that leverages multi-class classification for density ratio estimation and does not suffer from distribution shift issues. The method uses a set of auxiliary densities {m_k}_{k=1}^{K} and trains a multi-class logistic regression to classify the samples from p, q, and {m_k}_{k=1}^{K} into K+2 classes. We show that if these auxiliary densities are constructed such that they overlap with p and q, then a multi-class logistic regression allows for estimating log(p/q) on the domain of any of the K+2 distributions and resolves the distribution shift problems of the current state-of-the-art methods. We compare our method to state-of-the-art density ratio estimators on both synthetic and real datasets and demonstrate its superior performance on the tasks of density ratio estimation, mutual information estimation, and representation learning. Code: https://www.blackswhan.com/mdre/
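The idea in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation (their code is at the URL above). It assumes a 1-D setting with p = N(-2, 1), q = N(2, 1), and a single auxiliary density m = N(0, 2^2) that overlaps both, so K = 1 and there are K+2 = 3 classes. With equal sample sizes per class, the Bayes-optimal class posterior is proportional to the class density, so the difference between the logits for the p class and the q class estimates log(p/q). All names (`fit_softmax`, `quad_features`, `log_ratio`), the quadratic feature map, and the optimizer settings are illustrative assumptions.

```python
import numpy as np


def fit_softmax(features, labels, n_classes, lr=0.5, steps=20000):
    """Multinomial logistic regression fitted by full-batch gradient descent."""
    n, d = features.shape
    weights = np.zeros((d, n_classes))
    one_hot = np.eye(n_classes)[labels]
    for _ in range(steps):
        logits = features @ weights
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Gradient of the mean cross-entropy loss with respect to the weights.
        weights -= lr * (features.T @ (probs - one_hot)) / n
    return weights


def quad_features(x, scale=3.0):
    """Features [1, z, z^2] with z = x / scale (assumed, keeps values O(1)).

    Log-densities of Gaussians are quadratic in x, so this model is
    well specified for the toy densities below.
    """
    z = np.asarray(x, dtype=float) / scale
    return np.stack([np.ones_like(z), z, z ** 2], axis=1)


rng = np.random.default_rng(0)
n = 1000  # equal sample size per class, so no prior correction is needed

x_p = rng.normal(-2.0, 1.0, n)  # class 0: samples from p
x_q = rng.normal(2.0, 1.0, n)   # class 1: samples from q
x_m = rng.normal(0.0, 2.0, n)   # class 2: auxiliary density overlapping p and q

x = np.concatenate([x_p, x_q, x_m])
y = np.repeat([0, 1, 2], n)

W = fit_softmax(quad_features(x), y, n_classes=3)


def log_ratio(x_eval):
    """Estimate log p(x)/q(x) as the p-class logit minus the q-class logit."""
    logits = quad_features(x_eval) @ W
    return logits[:, 0] - logits[:, 1]


x_eval = np.array([-1.0, 0.0, 1.0])
est = log_ratio(x_eval)
true = -4.0 * x_eval  # closed form for N(-2, 1) versus N(2, 1)
for xi, e, t in zip(x_eval, est, true):
    print(f"x={xi:+.1f}  estimated={e:+.2f}  true={t:+.2f}")
```

For these two Gaussians the true log ratio is -4x, so the printed estimates should land near the closed-form values, up to sampling noise. If the classes had unequal sample sizes, the log of the class-size ratio would have to be subtracted from the logit difference.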

research
12/07/2021

A Unified Framework for Multi-distribution Density Ratio Estimation

Binary density ratio estimation (DRE), the problem of estimating the rat...
research
05/07/2019

F-measure Maximizing Logistic Regression

Logistic regression is a widely used method in several fields. When appl...
research
06/22/2020

Telescoping Density-Ratio Estimation

Density-ratio estimation via classification is a cornerstone of unsuperv...
research
04/23/2021

Establishing phone-pair co-usage by comparing mobility patterns

In forensic investigations it is often of value to establish whether two...
research
12/31/2013

Feature Augmentation via Nonparametrics and Selection (FANS) in High Dimensional Classification

We propose a high dimensional classification method that involves nonpar...
research
10/06/2021

Relative Entropy Gradient Sampler for Unnormalized Distributions

We propose a relative entropy gradient sampler (REGS) for sampling from ...
research
06/11/2019

Discrepancy, Coresets, and Sketches in Machine Learning

This paper defines the notion of class discrepancy for families of funct...
