Relational Algorithms for k-means Clustering

by Benjamin Moseley, et al.

The majority of learning tasks faced by data scientists involve relational data, yet most standard algorithms for standard learning problems are not designed to accept relational data as input. The standard practice is to join the relational data to create the geometric input that standard learning algorithms expect. Unfortunately, this practice has exponential worst-case time and space complexity. This leads us to consider what we call the Relational Learning Question: “Which standard learning algorithms can be efficiently implemented on relational data, and for those that cannot, is there an alternative algorithm that can be efficiently implemented on relational data and that has similar performance guarantees to the standard algorithm?” In this paper, we address the Relational Learning Question for two well-known algorithms for the standard k-means clustering problem. We first show that the k-means++ algorithm can be efficiently implemented on relational data. In contrast, we show that the adaptive k-means algorithm likely cannot be efficiently implemented on relational data, as this would imply P = #P. However, we show that a slight variation of this adaptive k-means algorithm can be efficiently implemented on relational data, and that this alternative algorithm has the same performance guarantee as the original algorithm, namely that it outputs an O(1)-approximate sketch.
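For orientation, the following is a minimal sketch of standard k-means++ seeding (D² sampling) on an explicit list of points. It is not the relational implementation described in the abstract: it assumes the join has already been materialized into `points`, which is exactly the step the paper aims to avoid. The names `squared_distance` and `kmeans_pp_seed` are illustrative and do not come from the paper.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_seed(points, k, rng=None):
    """Choose k initial centers by D^2 sampling (standard k-means++ seeding).

    Assumption: `points` is the fully joined (materialized) data set.
    """
    rng = rng or random.Random(0)
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # The first center is drawn uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] holds the squared distance from points[i] to its nearest center.
    d2 = [squared_distance(p, centers[0]) for p in points]

    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            # Sample the next center with probability proportional to d2.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
            centers.append(points[idx])
        # Update nearest-center distances with the newly added center.
        d2 = [min(old, squared_distance(p, centers[-1]))
              for old, p in zip(d2, points)]
    return centers


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans_pp_seed(pts, 2, random.Random(1)))
```

Each round scans every point once, so the cost is proportional to the size of the joined table. That size can be exponential in the worst case, which is why an implementation that works directly on the relational input is of interest.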




