Dummy Variable

What is a Dummy Variable?

Generally, a dummy variable is a placeholder for a variable that will be integrated over, summed over, or marginalized. In machine learning, however, the term often describes the individual variables in a one-hot encoding scheme. In this sense, dummy (or Boolean) variables are qualitative variables that can only take the value 0 or 1 to indicate the absence or presence of a specified condition. These “truth” variables are used to sort data into mutually exclusive categories or to switch commands on and off.
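The general, mathematical sense mentioned at the start of this definition can be illustrated with a summation index and an integration variable. In the sketch below, the symbols i, k, x, t, n, and f are arbitrary names chosen for illustration. Renaming the dummy variable does not change the value of either expression.

```latex
\sum_{i=1}^{n} i^2 = \sum_{k=1}^{n} k^2,
\qquad
\int_0^1 f(x)\,dx = \int_0^1 f(t)\,dt
```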

How are Dummy Variables Used in Machine Learning?

These variables are most often used in regression, latent class analysis, or one-hot encoding. They’re also used whenever you’re working with categorical variables that have no quantifiable relationship with each other.

For example, a product, say shoes, might be categorized by manufacturer or brand name, such as Nike, Adidas, or Puma. If each category were simply assigned a number from 1 to 3, a regression analysis would treat those numbers as quantities with an order and spacing that the brands do not have, so the results wouldn’t make any sense. If instead the first variable is called “Nike,” with 0 meaning false and 1 meaning true, and similar variables are defined for the other brands, you can work with the qualitative data in a way that is easy for the computer to understand.
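The shoe example above can be sketched in a few lines of plain Python. This is only an illustration: the `brands` list is made-up data, and `one_hot_encode` is a hypothetical helper written for this example, not a function from any library.

```python
# Hypothetical data: the brand label for each of five pairs of shoes.
brands = ["Nike", "Adidas", "Puma", "Nike", "Puma"]


def one_hot_encode(values):
    """Return (categories, rows); each row holds one 0/1 dummy per category."""
    # Sort the distinct labels so the column order is predictable.
    categories = sorted(set(values))
    # For every observation, emit 1 in the column matching its label, else 0.
    rows = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, rows


categories, rows = one_hot_encode(brands)

print(categories)
for brand, row in zip(brands, rows):
    print(brand, row)
```

Each row contains exactly one 1, which reflects the mutually exclusive categories described earlier. In a regression that includes an intercept, one of the dummy columns is usually dropped, because the full set of columns always sums to 1 and is therefore perfectly collinear with the intercept.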

Similarly, in latent class analysis, a set of observed variables, often coded as dummy variables, can indicate the presence of one or more latent (hidden) variables.