What is the Jaccard Index?
The Jaccard Index, also known as the Jaccard similarity coefficient, is a statistic used to gauge the similarity between sample sets. It applies to finite sample sets and is formally defined as the size of the intersection divided by the size of the union of the sample sets. For two sets A and B, the mathematical representation of the index is written as:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
Whereas the Jaccard Index measures similarity, the Jaccard distance measures dissimilarity between sample sets. The Jaccard distance is calculated by subtracting the Jaccard Index from 1, or equivalently by dividing the difference between the sizes of the union and the intersection by the size of the union. The formula for the Jaccard distance is represented as:
$$d_J(A, B) = 1 - J(A, B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|}$$
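As a minimal sketch, the two definitions can be written with Python's built-in sets. The function names and the fruit sets are illustrative assumptions, and treating two empty sets as identical is a convention chosen here, since the definition leaves that case undefined.

```python
def jaccard_index(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    if not union:
        # Assumption: two empty sets count as identical (index 1.0),
        # because 0 / 0 is undefined in the formula itself.
        return 1.0
    return len(a & b) / len(union)


def jaccard_distance(a: set, b: set) -> float:
    """Dissimilarity: 1 minus the Jaccard Index."""
    return 1.0 - jaccard_index(a, b)


# Hypothetical sample sets for illustration.
a = {"apple", "banana", "cherry"}
b = {"banana", "cherry", "date", "elderberry"}

index = jaccard_index(a, b)        # 2 shared elements out of 5 distinct
distance = jaccard_distance(a, b)

# Equivalent form of the distance: elements in exactly one set,
# divided by the size of the union.
alt_distance = len(a ^ b) / len(a | b)

print(index, distance, alt_distance)
```

Both ways of computing the distance agree, which is a quick check that the subtraction form and the union-minus-intersection form describe the same quantity.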
How does the Jaccard Index work?
Breaking down the formula, the Jaccard Index is essentially the number of elements in both sets divided by the number of elements in either set. Multiplying the result by 100 expresses it as a percentage measurement of similarity between the two sample sets. Accordingly, to find the Jaccard distance, subtract the similarity, written as a fraction, from 1 (or subtract the percentage from 100%). For example, if the similarity measurement is 35%, then the Jaccard distance (1 - 0.35) is 0.65, or 65%.
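The arithmetic can be traced step by step in a short sketch. The two integer ranges are hypothetical, chosen only so that the result reproduces the 35% figure from the example.

```python
# Hypothetical sample sets: 7 shared elements out of 20 distinct ones.
a = set(range(0, 12))   # 0..11, 12 elements
b = set(range(5, 20))   # 5..19, 15 elements

in_both = len(a & b)     # elements present in both sets
in_either = len(a | b)   # elements present in at least one set

similarity = in_both / in_either      # fraction between 0 and 1
similarity_pct = similarity * 100     # same value as a percentage

distance = 1 - similarity             # subtract the fraction from 1
distance_pct = 100 - similarity_pct   # or the percentage from 100

print(f"similarity: {similarity_pct:.0f}%, distance: {distance_pct:.0f}%")
# prints "similarity: 35%, distance: 65%"
```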
Jaccard Index and Machine Learning
Convolutional Neural Networks, which are commonly tasked with image identification applications, are often evaluated with the Jaccard Index as a way of quantifying the accuracy of object detection. In this setting the measurement is usually called Intersection over Union (IoU). For example, if a computer vision algorithm is tasked with detecting faces in an image, the Jaccard Index quantifies the overlap between the faces the algorithm identifies and the faces labeled in the annotated data.
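For object detection, the sets being compared are the pixels covered by two bounding boxes, so the index reduces to a ratio of areas. The sketch below assumes axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates with x1 < x2 and y1 < y2, and treats coordinates as continuous. The function name and the box values are hypothetical, and some implementations use a different pixel-counting convention.

```python
def box_iou(box_a, box_b):
    """Jaccard Index (Intersection over Union) of two axis-aligned boxes.

    Each box is (x1, y1, x2, y2) with x1 < x2 and y1 < y2 (assumed format).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; zero when the boxes do not overlap.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Union = both areas minus the overlap, which would be counted twice.
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical predicted box and annotated box for one detected face.
predicted = (50, 50, 150, 150)
annotated = (100, 100, 200, 200)

iou = box_iou(predicted, annotated)
print(f"IoU: {iou:.3f}")
```

A value of 1.0 means the predicted box matches the annotation exactly, and 0.0 means the boxes do not overlap at all.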
By Adrian Rosebrock - http://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=57718561