What is a Dummy Variable?
In general, a dummy variable is a placeholder for a variable that will be integrated over, summed over, or marginalized. In machine learning, however, the term usually refers to the individual variables in a one-hot encoding scheme. In that sense, dummy (or Boolean) variables are qualitative variables that can only take the value 0 or 1, indicating the absence or presence of a specified condition. These “truth” variables are used to sort data into mutually exclusive categories or to trigger on/off commands.
How are Dummy Variables Used in Machine Learning?
These variables are most often used in regression, latent class analysis, or one-hot encoding. They’re also used whenever you’re working with categorical variables that have no quantifiable relationship with each other.
For example, a product such as shoes might be categorized by brand name: Nike, Adidas, or Puma. If each brand is simply assigned a number from 1 to 3 for regression analysis, the results won’t make sense, because the model would treat the numbers as an ordered scale (as if Puma were three times Nike) when no such relationship exists. If instead the first variable is called “Nike”, with 0 meaning false and 1 meaning true, and the other brands are handled the same way, the qualitative data can be expressed in a form that is easy for the computer to work with.
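
The shoe-brand example can be sketched in a few lines of plain Python. This is only an illustration, not a reference implementation: the helper name make_dummies and the sample list brands are hypothetical and chosen for this example.

```python
def make_dummies(values):
    """Return one 0/1 column per distinct category found in `values`.

    `make_dummies` is a hypothetical helper written for this example.
    """
    categories = sorted(set(values))
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
    }


# Hypothetical sample data: the brand of four pairs of shoes.
brands = ["Nike", "Adidas", "Puma", "Nike"]
dummies = make_dummies(brands)

for brand, column in dummies.items():
    print(brand, column)
# Adidas [0, 1, 0, 0]
# Nike [1, 0, 0, 1]
# Puma [0, 0, 1, 0]
```

Each row has exactly one 1 across the three columns, which reflects the mutually exclusive categories described above. In practice, data libraries usually provide this step directly (pandas, for instance, has a get_dummies function), so a hand-written helper like this is mainly useful for seeing what the encoding does.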