Relational Algorithms for k-means Clustering

08/01/2020
by   Benjamin Moseley, et al.

The majority of learning tasks faced by data scientists involve relational data, yet most standard algorithms for standard learning problems are not designed to accept relational data as input. The standard practice to address this issue is to join the relational data to create the type of geometric input that standard learning algorithms expect. Unfortunately, this standard practice has exponential worst-case time and space complexity. This leads us to consider what we call the Relational Learning Question: “Which standard learning algorithms can be efficiently implemented on relational data, and for those that cannot, is there an alternative algorithm that can be efficiently implemented on relational data and that has similar performance guarantees to the standard algorithm?” In this paper, we address the Relational Learning Question for two well-known algorithms for the standard k-means clustering problem. We first show that the k-means++ algorithm can be efficiently implemented on relational data. In contrast, we show that the adaptive k-means algorithm likely cannot be efficiently implemented on relational data, as this would imply P = #P. However, we show that a slight variation of this adaptive k-means algorithm can be efficiently implemented on relational data, and that this alternative algorithm has the same performance guarantee as the original algorithm, namely that it outputs an O(1)-approximate sketch.
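
For orientation, the "standard practice" the abstract refers to can be sketched in a few lines: join the tables into explicit points, then run ordinary k-means++ seeding (D² sampling) on the result. This is a minimal illustration of that baseline, not the relational implementation from the paper. The helper names `natural_join`, `sq_dist` and `kmeanspp_seed`, the two-column table layout, and the fixed random seed are all assumptions made for the example.

```python
import random


def natural_join(r, s):
    """Join rows (a, b) of r with rows (b, c) of s on b, yielding points (a, b, c).

    Hypothetical two-table schema; a real database would have more tables and columns.
    """
    return [(a, b, c) for (a, b) in r for (b2, c) in s if b == b2]


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Standard k-means++ seeding on explicitly materialized points.

    The first center is uniform at random; each later center is drawn with
    probability proportional to its squared distance to the nearest chosen center.
    """
    rng = rng or random.Random(0)  # fixed seed is an assumption, for repeatability
    centers = [rng.choice(points)]
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point already coincides with a center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
            centers.append(points[idx])
        # Keep, for each point, the squared distance to its nearest center.
        d2 = [min(d, sq_dist(p, centers[-1])) for d, p in zip(d2, points)]
    return centers


if __name__ == "__main__":
    n = 4
    # All rows share the same join key, so the join has n * n rows.
    r = [(float(i), 0.0) for i in range(n)]
    s = [(0.0, float(j)) for j in range(n)]
    points = natural_join(r, s)
    print(len(r) + len(s), "input rows ->", len(points), "joined points")
    print(kmeanspp_seed(points, 3))
```

Even in this toy case the joined point set is quadratic in the size of the input tables, and with more tables the blow-up can compound. That cost is what motivates running the seeding step without materializing the join.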

research
04/25/2013

An implementation of the relational k-means algorithm

A C# implementation of a generalized k-means variant called relational k...
research
10/11/2019

Rk-means: Fast Clustering for Relational Data

Conventional machine learning algorithms cannot be applied until a data ...
research
10/09/2022

Coresets for Relational Data and The Applications

A coreset is a small set that can approximately preserve the structure o...
research
07/25/2021

Relational Boosted Regression Trees

Many tasks use data housed in relational databases to train boosted regr...
research
06/10/2020

Fitted Q-Learning for Relational Domains

We consider the problem of Approximate Dynamic Programming in relational...
research
12/02/2022

A Geometric-Relational Deep Learning Framework for BIM Object Classification

Interoperability issue is a significant problem in Building Information ...
research
12/27/2012

On-line relational SOM for dissimilarity data

In some applications and in order to address real world situations bette...
