Affinity Matrix

What is an Affinity Matrix?

An Affinity Matrix, also called a Similarity Matrix, is a statistical tool that organizes the pairwise similarities between a set of data points. Similarity is related to distance, but it does not satisfy the properties of a metric: two identical points typically have a similarity score of 1, whereas the distance between them is zero. Typical examples of similarity measures are the cosine similarity and the Jaccard similarity. These scores can loosely be interpreted as the probability that two points are related. For example, if two data points lie in nearly the same direction from the origin, their cosine similarity score (or respective “affinity” score) will be much closer to 1 than that of two data points separated by a wide angle.
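As a rough illustration of the definition above, the following sketch builds a cosine-similarity affinity matrix with NumPy. The function name `cosine_affinity` and the three sample points are hypothetical, chosen only for this example, and the code assumes that no data point is the all-zero vector.

```python
import numpy as np

def cosine_affinity(points):
    """Return the pairwise cosine-similarity matrix for the rows of `points`.

    Hypothetical helper for illustration; assumes no row is all zeros.
    """
    points = np.asarray(points, dtype=float)
    # Scale every row to unit length so a dot product equals the cosine.
    norms = np.linalg.norm(points, axis=1, keepdims=True)
    unit = points / norms
    # Entry (i, j) is the cosine of the angle between point i and point j.
    return unit @ unit.T

# Made-up sample data: two points on the x-axis and one on the y-axis.
points = np.array([[1.0, 0.0],
                   [2.0, 0.0],
                   [0.0, 3.0]])
affinity = cosine_affinity(points)
print(affinity)
```

The resulting matrix is symmetric with ones on the diagonal, since every point is perfectly similar to itself. Note that the first two points receive a score of 1 even though they are not at the same location: cosine similarity compares direction, not position.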

Why is this Useful?

By assigning a numerical value to the abstract concept of similarity, the affinity matrix lets machine learning programs mimic human logic by making educated guesses about which pieces of information are related and how similar they are. Just as useful, the similarity matrix allows machine learning systems to work with unlabeled or corrupted datasets with human-like intuition, which leads to practical applications in many fields.

Practical Uses of an Affinity Matrix

  • Smart Information Retrieval – The similarity matrix is the driving force behind smart search engines that surface additional relevant information you didn’t even know you needed.
  • Advanced Genetic Research – Discovering and studying affinities among ostensibly random DNA combinations has delivered huge breakthroughs in medical science and pharmaceutical research.
  • Efficient Data Mining – A similarity matrix allows for fast and accurate recognition of hidden relationship patterns in any database full of unlabeled information. This is particularly useful for business intelligence, law enforcement and all forms of scientific research.
  • Intelligent Unsupervised Machine Learning – Machine learning algorithms that derive structure and meaning from “raw” unorganized data depend on a similarity matrix to ensure a minimum standard of accuracy in what the model is teaching itself. Techniques used in this area, such as k-means or k-nearest neighbors, rely heavily on the choice of either a distance function or an affinity measure.
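To make the role of the affinity measure in neighbor-based techniques more concrete, here is a minimal sketch of a nearest-neighbor lookup driven directly by an affinity matrix. The helper name `most_similar` and the matrix values are assumptions invented for this example; they do not come from any real dataset or library.

```python
import numpy as np

def most_similar(affinity, k=1):
    """For each point, return the indices of its k most similar other points.

    Hypothetical helper for illustration only.
    """
    scores = np.array(affinity, dtype=float)  # copy so the input is untouched
    # A point is always most similar to itself, so rule out the diagonal.
    np.fill_diagonal(scores, -np.inf)
    # Sort each row by descending similarity and keep the first k columns.
    order = np.argsort(-scores, axis=1)
    return order[:, :k]

# Made-up affinity matrix: points 0 and 1 are alike, as are points 2 and 3.
affinity = np.array([
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
])
neighbors = most_similar(affinity, k=1)
print(neighbors.ravel())
```

Swapping in a different affinity measure changes the matrix, and with it which points count as neighbors, which is why the choice of measure matters so much for these methods.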