What is One-Hot Encoding?
One-hot encoding is used in machine learning as a method to quantify categorical data. In short, this method produces a vector with length equal to the number of categories in the data set. If a data point belongs to theith category then components of this vector are assigned the value 0 except for the ith component, which is assigned a value of 1. In this way one can keep track of the categories in a numerically meaningful way.
Consider the problem of classifying a person into one of four categories: male, female, gender-neutral, and other. We can represent this as an array with four positions. For every person we encounter, we want to be able to represent them as a one-hot encoding with relation to our four categories. Let’s say walking down the street, we encounter 4 people who identify as female, 3 people who identify as male, one person who identifies as gender-neutral, and 2 people who identify as something other than the other three categories. Then, we can represent these people in the following way: