MSPP: A Highly Efficient and Scalable Algorithm for Mining Similar Pairs of Points

07/31/2020
by Subrata Saha, et al.

The closest pair of points problem, or closest pair problem (CPP), is an important problem in computational geometry: given a set of points in a metric space, find the pair of points with the smallest distance between them. This problem arises in a number of applications, including but not limited to clustering, graph partitioning, image processing, pattern identification, and intrusion detection. For example, in air-traffic control we must monitor aircraft that come too close together, since this may indicate a possible collision. Numerous algorithms have been presented for solving the CPP, but the algorithms employed in practice have a worst-case quadratic running time. In this article we present an elegant approximation algorithm for the CPP called MSPP: Mining Similar Pairs of Points. It is faster than the best currently known algorithms while maintaining very good accuracy. The proposed algorithm also detects a set of closely similar pairs of points in Euclidean and Pearson metric spaces and can be adapted to numerous real-world applications, such as clustering, dimension reduction, and constructing and analyzing gene/transcript co-expression networks, among others.
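For reference, the sketch below shows the exact brute-force baseline that the quadratic worst case refers to. It is not MSPP, whose approximation procedure is not described in this abstract. It checks all n(n-1)/2 pairs, once for the single closest pair and once for every pair within a distance threshold. The function names (`closest_pair`, `similar_pairs`, `pearson_distance`) are hypothetical. Using 1 - r as the Pearson "distance" is also an assumption: it is a common dissimilarity measure, but it does not satisfy the triangle inequality in general.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]


def euclidean(p: Point, q: Point) -> float:
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def pearson_distance(p: Point, q: Point) -> float:
    """Assumed dissimilarity 1 - r: 0 for perfectly correlated, 2 for anti-correlated."""
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    dp = [a - mp for a in p]
    dq = [b - mq for b in q]
    num = sum(a * b for a, b in zip(dp, dq))
    den = math.sqrt(sum(a * a for a in dp) * sum(b * b for b in dq))
    if den == 0:
        raise ValueError("Pearson correlation is undefined for constant vectors")
    return 1.0 - num / den


def closest_pair(
    points: List[Point], dist: Callable[[Point, Point], float] = euclidean
) -> Tuple[int, int, float]:
    """Exact brute force: evaluates every pair, so O(n^2) distance computations."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    best = (-1, -1, math.inf)
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])
        if d < best[2]:  # strict '<' keeps the first pair found on ties
            best = (i, j, d)
    return best


def similar_pairs(
    points: List[Point],
    threshold: float,
    dist: Callable[[Point, Point], float] = euclidean,
) -> List[Tuple[int, int]]:
    """All index pairs whose distance is at most `threshold` (also quadratic)."""
    return [
        (i, j)
        for i, j in combinations(range(len(points)), 2)
        if dist(points[i], points[j]) <= threshold
    ]
```

For example, with `pts = [(0.0, 0.0), (5.0, 5.0), (1.0, 1.0), (5.5, 5.0)]`, `closest_pair(pts)` returns `(1, 3, 0.5)` and `similar_pairs(pts, 1.5)` returns `[(0, 2), (1, 3)]`. Passing `dist=pearson_distance` switches both functions to the correlation-based setting. Any faster method, approximate or exact, is competing against the number of pairs these loops enumerate.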

research
07/13/2018

Algorithms for metric learning via contrastive embeddings

We study the problem of supervised learning a metric space under discrim...
research
02/04/2021

A Faster Algorithm for Finding Closest Pairs in Hamming Metric

We study the Closest Pair Problem in Hamming metric, which asks to find ...
research
12/03/2018

On Closest Pair in Euclidean Metric: Monochromatic is as Hard as Bichromatic

Given a set of n points in R^d, the (monochromatic) Closest Pair proble...
research
01/04/2020

Computing Euclidean k-Center over Sliding Windows

In the Euclidean k-center problem in sliding window model, input points ...
research
10/14/2022

GriT-DBSCAN: A Spatial Clustering Algorithm for Very Large Databases

DBSCAN is a fundamental spatial clustering algorithm with numerous pract...
research
02/16/2022

Distributed k-Means with Outliers in General Metrics

Center-based clustering is a pivotal primitive for unsupervised learning...
research
03/25/2019

Aligning Vector-spaces with Noisy Supervised Lexicons

The problem of learning to translate between two vector spaces given a s...