What is High-dimensional Data?
High-dimensional data is data in which each observation is described by a large number of variables, or dimensions. There can be thousands, if not millions, of dimensions.
A Practical Example of Dimension
In color selection, colors are commonly expressed as a group of three numbers: red, green, and blue values, or RGB. All three values must be specified for a color to be defined. For example, a particularly bright color like lime green has a red value of 50, a green value of 205, and a blue value of 50. If any one of these values were different, the result would no longer be lime green.
Since these three numbers in combination are what ultimately define the color, we say that color space is three-dimensional: there are three “directions” in which a color can vary. An RGB color is not high-dimensional data, but it is a simple example of what it means for data to have dimensions.
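As a small illustration of the idea above, the sketch below stores the lime green value from the text as a tuple of three numbers and counts its dimensions. The variable names are chosen here for illustration only.

```python
# RGB triple for lime green, using the values given in the text.
lime_green = (50, 205, 50)

# The number of dimensions is the number of values needed to describe one color.
num_dimensions = len(lime_green)

# Changing a single component (here, blue) moves to a different point in
# color space, so the result is no longer lime green.
altered = (lime_green[0], lime_green[1], 200)

print(num_dimensions)
print(altered != lime_green)
```

Each position in the tuple is one "direction" in which the color can vary, which is all that "dimension" means in this context.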
Applications in Artificial Intelligence
When teaching AI to recognize faces, even basic facial recognition algorithms use high-dimensional data.
Let’s say we have n images, and each of these images has a resolution of y by z pixels. If we treat every pixel within a given image as a variable, each of the n images is a point in a (y × z)-dimensional space. Using this representation, we can give a computer a set of images to “train” it to recognize new faces or, in some cases, to tell a human face from the face of an animal.
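The pixel-as-variable idea can be sketched with NumPy. The sizes below (n = 5, y = 4, z = 3) and the random pixel values are assumptions made purely for illustration; real face images would be far larger, and color images would carry three values per pixel.

```python
import numpy as np

# Hypothetical sizes chosen only for illustration.
n, y, z = 5, 4, 3

rng = np.random.default_rng(0)
# n grayscale images, each y by z pixels, with intensities in 0..255.
images = rng.integers(0, 256, size=(n, y, z))

# Flatten each image so that every pixel becomes one variable.
# Each row of `vectors` is one image, seen as a point in (y * z)-dimensional space.
vectors = images.reshape(n, y * z)

print(vectors.shape)
```

Even a modest 100 by 100 pixel grayscale image becomes a point in a 10,000-dimensional space under this representation.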
AI algorithms often have difficulty working with high-dimensional data, and some essential algorithms cannot produce usable results even if given months to run. To turn this data into something that is more easily processed by a computer, a different set of algorithms can be used to find a lower-dimensional subspace.
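The text does not name a specific algorithm for finding a lower-dimensional subspace. One common choice is principal component analysis (PCA), and the following is a minimal sketch of it using NumPy's singular value decomposition. The function name `project_to_subspace` and the synthetic data are assumptions introduced here for illustration.

```python
import numpy as np

def project_to_subspace(data, k):
    """Project the rows of `data` onto its top-k principal directions."""
    # Center each variable so the directions describe variation around the mean.
    centered = data - data.mean(axis=0)
    # Rows of vt are orthonormal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]
    return centered @ components.T

rng = np.random.default_rng(0)
# Synthetic data: 100 points in 50 dimensions that really vary along
# only 2 hidden directions, plus a small amount of noise.
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 50))
data = latent @ mixing + 0.01 * rng.normal(size=(100, 50))

reduced = project_to_subspace(data, k=2)
print(reduced.shape)
```

Here each 50-dimensional point is replaced by 2 coordinates. Because the synthetic data was built from 2 underlying directions, very little information is lost; with real images, the number of dimensions to keep is a judgment call.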