Accuracy (error rate)

Accuracy in Machine Learning

The accuracy of a machine learning classification algorithm is one way to measure how often the algorithm classifies a data point correctly. Accuracy is the number of correctly predicted data points divided by the total number of data points. More formally, it is defined as the number of true positives and true negatives divided by the number of true positives, true negatives, false positives, and false negatives. A true positive or true negative is a data point that the algorithm correctly classified as positive or negative, respectively. A false positive or false negative, on the other hand, is a data point that the algorithm incorrectly classified. For example, if the algorithm labels a negative data point as positive, that is a false positive. Accuracy is often used along with precision and recall, other metrics built from different ratios of true/false positives and negatives. Together, these metrics provide a detailed look at how the algorithm is classifying data points.
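This definition translates directly into code. The following is a minimal Python sketch, assuming binary labels stored as booleans; the names accuracy, y_true, and y_pred are illustrative:

    def accuracy(y_true, y_pred):
        # Count the four outcomes described above.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)          # true positives
        tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)  # true negatives
        fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)      # false positives
        fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)      # false negatives
        # Accuracy = (TP + TN) / (TP + TN + FP + FN)
        return (tp + tn) / (tp + tn + fp + fn)

Precision (TP / (TP + FP)) and recall (TP / (TP + FN)) can be computed from the same four counts.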

Example

Consider a classification algorithm that decides whether an email is spam or not. The algorithm is trained, and we want to see how well it performs on a set of ten emails it has never seen before. Of the ten emails, six are not spam and four are spam. The algorithm classifies three of the messages as spam, of which two actually are spam and one is not. In this example, the true positives are the emails correctly identified as spam, the true negatives are the emails correctly identified as not spam, the false positives are the not-spam emails incorrectly classified as spam, and the false negatives are the spam emails incorrectly identified as not spam. There are two true positives, five true negatives, one false positive, and two false negatives. Using the formula for accuracy, we get:

Accuracy = (true positives + true negatives) / (true positives + true negatives + false positives + false negatives) = (2 + 5) / (2 + 5 + 1 + 2) = 7/10 = 0.7

This algorithm has 70% accuracy when classifying emails as spam or not spam.
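As a quick check of this example in Python, the ten emails can be encoded as boolean labels (True for spam); the particular ordering below is hypothetical and chosen only so that the counts match the example (two true positives, five true negatives, one false positive, two false negatives):

    # Actual labels: four spam emails (True) and six non-spam emails (False).
    actual    = [True, True, True, True, False, False, False, False, False, False]
    # Predicted labels: the classifier flags three emails as spam, two correctly
    # (true positives) and one incorrectly (false positive); it misses two spam
    # emails (false negatives).
    predicted = [True, True, False, False, True, False, False, False, False, False]

    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    print(correct / len(actual))  # 0.7, i.e. 70% accuracy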