Cluster Analysis

What is Cluster Analysis?

Cluster analysis is an unsupervised learning technique that groups a set of unlabeled objects into clusters, so that the objects within a cluster are more similar to each other than to the objects in other clusters. Cluster analysis is often referred to as segmentation or taxonomy analysis.

This is a form of exploratory analysis that makes no distinction between dependent and independent variables and simply identifies similar structures within a dataset. The ultimate goal is to identify groups of similar data cases even when the grouping is not known in advance.
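
As an illustration of grouping unlabeled data, here is a minimal sketch of k-means, one common clustering algorithm. The text above does not name a specific algorithm, so the choice of k-means, the sample points, the `kmeans` function name, and the naive seeding with the first `k` points are all assumptions made for this example.

```python
import math

def kmeans(points, k, iterations=20):
    """Group points into k clusters; returns (centroids, labels)."""
    # Assumption: seed centroids with the first k points.
    # This keeps the sketch deterministic but is not a robust initialization.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, labels

# Hypothetical unlabeled 2D data with two visible groups.
points = [(1.0, 1.1), (5.0, 5.2), (0.9, 1.0), (5.1, 4.9), (1.2, 0.8), (4.8, 5.0)]
centroids, labels = kmeans(points, k=2)
print(labels)  # cluster index assigned to each point
```

No labels are supplied to the function; the grouping emerges only from the distances between points, which is what makes the technique unsupervised.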

How is Cluster Analysis Used?

Cluster analysis is not so much a single algorithm as a process made up of many subordinate functions, such as discriminant analysis. It still requires human intervention to ensure the clusters are meaningful in practice and not just statistical anomalies.

Cluster analysis has applications in any field that requires pattern recognition, segmentation or compression, but the most common uses in machine learning are:

  • Software troubleshooting and anomaly detection - Clustering is used to reduce junk code by restructuring functions that are too dispersed or obsolete.
  • Image segmentation - Clustering divides a digital image into distinct regions for better border and object recognition.
  • Evolutionary algorithms - Clustering identifies different niches within the properties of an evolutionary algorithm so that “reproductive opportunity” is better distributed among subsequent programs.
  • Recommendation engines - Clustering algorithms predict the preferences of a user with no background data based on the preferences of other users in that user’s cluster.
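
The recommendation use case can be sketched in a few lines. This is only an illustration under stated assumptions: the user records, the item names, the `predict_preferences` helper, and the idea that the new user's cluster is already known (for example, from sign-up details) are all hypothetical and not taken from any particular system.

```python
from statistics import mean

# Hypothetical data: each known user carries a cluster label and item ratings.
known_users = [
    {"cluster": 0, "ratings": {"jazz": 5, "metal": 1}},
    {"cluster": 0, "ratings": {"jazz": 4, "metal": 2}},
    {"cluster": 1, "ratings": {"jazz": 1, "metal": 5}},
]

def predict_preferences(cluster_id, users):
    """Average each item's rating over the users in the given cluster."""
    peers = [u["ratings"] for u in users if u["cluster"] == cluster_id]
    if not peers:
        return {}  # no peers in this cluster, so nothing to predict from
    items = set().union(*peers)  # every item rated by at least one peer
    return {
        item: mean(r[item] for r in peers if item in r)
        for item in sorted(items)
    }

# Assumption: the new user has no ratings but was placed in cluster 0.
prediction = predict_preferences(0, known_users)
print(prediction)
```

A user with no rating history simply inherits the average tastes of the cluster; a real engine would typically refine this as the user's own data accumulates.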