What is Binarization?
Binarization is the process of transforming data features of any entity into vectors of binary numbers to make classifier algorithms more efficient. In a simple example, transforming an image’s gray-scale from the 0-255 spectrum to a 0-1 spectrum is binarization.
How is Binarization used?
In machine learning, even the most complex concepts can be transformed into binary form. For example, to binarize the sentence “The dog ate the cat,” every word is assigned an ID (for example dog-1, ate-2, the-3, cat-4). Then replace each word with the tag to provide a binary vector. In this case the vector: <3,1,2,3,4> can be refined by providing each word with four possible slots, then setting the slot to correspond with a specific word: <0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,1>. This is commonly referred to as the bag-of-words-method.