Categorical Variable

What is a Categorical Variable?

A categorical variable is a variable that takes one of a limited, fixed number of possible values, so that each data unit can be assigned to a category for classification. Assigning each observed data point to a labeled category is the first step in supervised deep learning. Categorical variables come in two kinds, and the difference is important: nominal values have no relative ordering among each other, such as fork, knife, and spoon, whereas ordinal values indicate the degree of something, such as very sad, sad, neutral, happy, very happy.
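The difference between categories without an order and categories with one can be sketched in a few lines of Python. This is only an illustration: the names `UTENSILS`, `MOOD_SCALE`, `mood_rank`, and `is_happier` are hypothetical, and using list position as the rank is an assumption of the sketch.

```python
# Unordered categories: only membership and equality are meaningful.
UTENSILS = {"fork", "knife", "spoon"}

# Ordered categories: the position in this list is assumed to encode rank.
MOOD_SCALE = ["very sad", "sad", "neutral", "happy", "very happy"]


def mood_rank(label: str) -> int:
    """Return the position of a mood label on the scale (0 is the lowest)."""
    return MOOD_SCALE.index(label)


def is_happier(a: str, b: str) -> bool:
    """Compare two ordered labels by their rank on the scale."""
    return mood_rank(a) > mood_rank(b)


print("fork" in UTENSILS)           # membership test on unordered labels
print(is_happier("happy", "sad"))   # rank comparison on ordered labels
```

A comparison such as `is_happier` has no counterpart for the utensils, because no utensil ranks above another.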

How are Categorical Variables Used?

In broad strokes, all data can be organized as either categorical (qualitative) or quantitative (numerical).
  • Categorical

    : Categorical variables take on values that are names or labels. The color of a car, for example, could be black, white, or cherry red. Likewise, a person could be classified as an adult male, adult female, male child, or female child. These values are often binarized (for example, one-hot encoded) so that an algorithm can process them as numbers.

  • Quantitative

    : Quantitative variables are those that can already be expressed in numerical form. These are objectively measurable quantities, such as the velocity of various atoms in a particle accelerator. Such data is often still normalized or standardized before training, so that features share a comparable scale and processing is faster.
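The two preprocessing steps mentioned in the list above can be sketched with the Python standard library. This is a minimal illustration rather than a production pipeline: the helpers `one_hot` and `standardize` are hypothetical, the speed values are made up for the example, and z-score scaling is assumed as the rescaling method.

```python
from statistics import mean, pstdev


def one_hot(values):
    """Turn category labels into 0/1 vectors, one column per category."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        encoded.append(row)
    return categories, encoded


def standardize(values):
    """Rescale numbers to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


colors = ["black", "white", "cherry red", "black"]
categories, encoded = one_hot(colors)

speeds = [2.0, 4.0, 6.0]  # made-up measurements
scaled = standardize(speeds)

print(categories)
print(encoded)
print(scaled)
```

One-hot encoding gives each category its own column, so the encoding does not imply that one color ranks above another.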

How do Categorical Variables Help the Deep Learning Process?

Categorizing all known variables is the first step in supervised learning and the ultimate goal of semi-supervised and unsupervised learning, where the categories must be discovered from unlabeled data. Categorical labels are used in the training and prediction phases of machine learning, both to fit a model and to test the labels it predicts. They also play a part in feature extraction and in most types of model training.
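How labels move through the training and prediction phases can be sketched as a round trip between label names and integer ids. The example labels, the helper `build_label_index`, and the value of `predicted_id` are all hypothetical; no real model is trained here.

```python
def build_label_index(labels):
    """Map each distinct label to an integer id, in sorted order."""
    classes = sorted(set(labels))
    to_id = {label: i for i, label in enumerate(classes)}
    return classes, to_id


# Training phase: labels are converted to integer targets.
train_labels = ["cat", "dog", "cat", "bird"]
classes, to_id = build_label_index(train_labels)
y_train = [to_id[label] for label in train_labels]

# Prediction phase: a model would output an id (assumed here),
# which is mapped back to its label name.
predicted_id = 2
predicted_label = classes[predicted_id]

print(y_train)
print(predicted_label)
```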