Separating populations with wide data: A spectral analysis

06/25/2007
by Avrim Blum et al.

In this paper, we consider the problem of partitioning a small data sample drawn from a mixture of k product distributions. We are interested in the case where individual features are of low average quality γ, and we want to use as few of them as possible to correctly partition the sample. We analyze a spectral technique that approximately optimizes the total data size (the product of the number of data points n and the number of features K) needed to correctly perform this partitioning, as a function of 1/γ, for K > n. Our goal is motivated by an application in clustering individuals according to their population of origin using markers, when the divergence between any two of the populations is small.
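The abstract does not spell out the algorithm, so the following is only a minimal sketch of generic two-way spectral partitioning, not the authors' method. Everything in it is an assumption for illustration: two equal-size populations, binary markers, γ taken to be the squared per-marker frequency gap between the populations, and the hypothetical helpers sample_two_populations, spectral_partition and agreement. The sketch centers each feature, forms the n × n Gram matrix (the small side when K > n), finds its leading eigenvector by power iteration, and splits individuals by the sign of their coordinate.

```python
import math
import random


def sample_two_populations(n, K, gamma, seed=0):
    """Hypothetical generator: n individuals, K binary markers, two equal-size
    populations. For marker j, population 0 has frequency 0.5 + s_j*d and
    population 1 has 0.5 - s_j*d, where s_j is a random sign and
    d = sqrt(gamma)/2, so the squared frequency gap (2d)^2 equals gamma
    (an assumed stand-in for the paper's feature quality)."""
    rng = random.Random(seed)
    d = math.sqrt(gamma) / 2
    signs = [rng.choice((-1, 1)) for _ in range(K)]
    labels = [i % 2 for i in range(n)]
    data = []
    for lab in labels:
        pop_sign = 1 if lab == 0 else -1
        row = []
        for j in range(K):
            p = 0.5 + pop_sign * signs[j] * d
            row.append(1.0 if rng.random() < p else 0.0)
        data.append(row)
    return data, labels


def spectral_partition(data, iters=100, seed=1):
    """Split the rows of `data` into two groups by the sign of the leading
    eigenvector of the n x n Gram matrix of the column-centered data."""
    n, K = len(data), len(data[0])
    # Center every feature (column) at its empirical mean.
    means = [sum(row[j] for row in data) / n for j in range(K)]
    X = [[row[j] - means[j] for j in range(K)] for row in data]
    # Gram matrix G = X X^T is only n x n, which is small in the wide case K > n.
    G = [[sum(a * b for a, b in zip(X[i], X[k])) for k in range(n)]
         for i in range(n)]
    # Power iteration: G is positive semidefinite, so this converges to the
    # eigenvector of its largest eigenvalue.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(G[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The sign pattern of the eigenvector gives the two clusters.
    return [0 if x >= 0 else 1 for x in v]


def agreement(pred, truth):
    """Fraction of correctly grouped points, up to swapping the two labels."""
    same = sum(p == t for p, t in zip(pred, truth)) / len(truth)
    return max(same, 1.0 - same)


if __name__ == "__main__":
    data, labels = sample_two_populations(n=40, K=1000, gamma=0.04)
    pred = spectral_partition(data)
    print(agreement(pred, labels))
```

Running it prints the fraction of individuals placed in the correct group, up to relabeling. Working with the n × n Gram matrix instead of the K × K feature covariance keeps the eigenproblem small in the wide regime K > n, and centering the columns puts the all-ones vector in the null space of G, so the leading eigenvector cannot be the trivial "everyone together" split.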

01/01/2023

Semidefinite programming on population clustering: a global analysis

In this paper, we consider the problem of partitioning a small data samp...
02/10/2008

Learning Balanced Mixtures of Discrete Distributions with Small Sample

We study the problem of partitioning a small sample of n individuals fro...
09/06/2012

The Sample Complexity of Search over Multiple Populations

This paper studies the sample complexity of searching over multiple popu...
10/16/2012

A Model-Based Approach to Rounding in Spectral Clustering

In spectral clustering, one defines a similarity matrix for a collection...
08/01/2020

Learning from Mixtures of Private and Public Populations

We initiate the study of a new model of supervised learning under privac...
10/26/2021

Machine learning spectral functions in lattice QCD

We study the inverse problem of reconstructing spectral functions from E...