Quantifying and Reducing Bias in Maximum Likelihood Estimation of Structured Anomalies

07/15/2020
by Uthsav Chitra, et al.

Anomaly estimation, or the problem of finding a subset of a dataset that differs from the rest of the dataset, is a classic problem in machine learning and data mining. In both theoretical work and in applications, the anomaly is assumed to have a specific structure defined by membership in an anomaly family. For example, in temporal data the anomaly family may be time intervals, while in network data the anomaly family may be connected subgraphs. The most prominent approach for anomaly estimation is to compute the Maximum Likelihood Estimator (MLE) of the anomaly. However, it was recently observed that for some anomaly families, the MLE is an asymptotically biased estimator of the anomaly. Here, we demonstrate that the bias of the MLE depends on the size of the anomaly family. We prove that if the number of sets in the anomaly family that contain the anomaly is sub-exponential, then the MLE is asymptotically unbiased. At the same time, we provide empirical evidence that the converse is also true: if the number of such sets is exponential, then the MLE is asymptotically biased. Our analysis unifies a number of earlier results on the bias of the MLE for specific anomaly families, including intervals, submatrices, and connected subgraphs. Next, we derive a new anomaly estimator using a mixture model, and we empirically demonstrate that our estimator is asymptotically unbiased regardless of the size of the anomaly family. We illustrate the benefits of our estimator on both simulated disease outbreak data and a real-world highway traffic dataset.
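To make the setting concrete, the sketch below computes an MLE-style estimate of an interval anomaly. It rests on assumptions the abstract does not state: observations are modeled as unit-variance Gaussians whose mean is shifted upward inside the anomaly, and the MLE is taken to be the interval maximizing the standardized sum sum(x[s:e]) / sqrt(e - s). The names `interval_mle`, `n`, `mu`, `true_start`, and `true_end` are hypothetical and chosen only for illustration.

```python
import numpy as np


def interval_mle(x):
    """Return the half-open interval (s, e) maximizing sum(x[s:e]) / sqrt(e - s).

    Assumed model (not taken from the abstract): x[i] ~ N(mu, 1) inside the
    anomaly and N(0, 1) outside, with mu > 0 unknown.
    """
    n = len(x)
    # prefix[k] holds sum(x[:k]), so any interval sum is one subtraction.
    prefix = np.concatenate(([0.0], np.cumsum(x)))
    best_score, best = -np.inf, (0, 1)
    # The interval family has only O(n^2) members, so brute force is feasible.
    for s in range(n):
        for e in range(s + 1, n + 1):
            score = (prefix[e] - prefix[s]) / np.sqrt(e - s)
            if score > best_score:
                best_score, best = score, (s, e)
    return best


# Simulated example: plant a mean shift on a known interval, then estimate it.
rng = np.random.default_rng(0)
n, mu = 200, 3.0
true_start, true_end = 80, 120
x = rng.normal(size=n)
x[true_start:true_end] += mu

est_start, est_end = interval_mle(x)
print("true interval:", (true_start, true_end))
print("estimated interval:", (est_start, est_end))
```

Intervals form a small anomaly family, which is the regime where the abstract reports the MLE to be asymptotically unbiased. For a family such as connected subgraphs, the same maximization ranges over exponentially many candidate sets, so this brute-force loop would no longer apply.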


