In mathematics, the gradient of a function measures how steep the function is at a given point. For example, in the figure on the right, the center part of the graph has a low gradient because the line is almost flat, whereas the sides have a high gradient because the line is very steep. If you trace the line from left to right, it “changes direction” at the center of the graph. At this point the line is effectively flat, which means the gradient of the function is zero. A point like this is known as a critical point. The highest and lowest points of a smooth graph are critical points, although not every critical point is a highest or lowest point.
The gradient descent algorithm uses the gradient of a function to find a critical point by following the line down the graph: it repeatedly takes a small step in the downhill direction, which is the direction opposite to the gradient. One can think of gradient descent as “sliding down” the graph until it comes to rest at a low point where the gradient is zero. (In contrast, gradient ascent “climbs up” the graph in order to find a high point.)
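The “sliding down” idea can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function f(x) = (x - 3)^2, the starting point, the step size `learning_rate`, and the fixed number of steps are all assumptions chosen for the example and do not come from the figure.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        # Move downhill: subtract a small multiple of the gradient.
        x = x - learning_rate * grad(x)
    return x


def f_grad(x):
    # Gradient of the hypothetical example function f(x) = (x - 3)**2,
    # whose lowest point is at x = 3.
    return 2 * (x - 3)


x_min = gradient_descent(f_grad, x0=0.0)
print(x_min)  # a value very close to 3
```

Replacing the minus sign in the update with a plus sign would turn this sketch into gradient ascent. Note that the steps shrink as the gradient approaches zero, which is why the process settles at a critical point.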
True Gradient Descent
In machine learning, the performance of a network is measured by an error function, which quantifies how badly the network performs at its job. The higher the error, the worse the network performs; the lower the error, the better. When a machine is learning to do a task, it tries to make as few mistakes as possible; that is, it tries to minimize its error function. True gradient descent is the application of gradient descent to a machine learning network in order to minimize an error function. The network first evaluates every training example it has. Then it uses the gradient of the error function, computed over that whole set, to adjust the configuration of the network so that the total error over all training examples goes down. It repeats this process until it can no longer reduce the error.
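The loop described above can be sketched as follows. To keep the example short, the “network” here is assumed to be a single linear unit that predicts w * x + b, and the error function is assumed to be the mean squared error; the training data, `learning_rate`, `tolerance`, and `max_epochs` are likewise hypothetical choices made for illustration.

```python
def total_error(w, b, examples):
    """Mean squared error of the model over all training examples."""
    return sum((w * x + b - y) ** 2 for x, y in examples) / len(examples)


def error_gradient(w, b, examples):
    """Gradient of the mean squared error with respect to w and b."""
    n = len(examples)
    dw = sum(2 * (w * x + b - y) * x for x, y in examples) / n
    db = sum(2 * (w * x + b - y) for x, y in examples) / n
    return dw, db


def true_gradient_descent(examples, learning_rate=0.05,
                          tolerance=1e-12, max_epochs=10000):
    w, b = 0.0, 0.0
    error = total_error(w, b, examples)
    for _ in range(max_epochs):
        # Use every training example to compute one gradient.
        dw, db = error_gradient(w, b, examples)
        # Adjust the configuration (w and b) to reduce the total error.
        w -= learning_rate * dw
        b -= learning_rate * db
        new_error = total_error(w, b, examples)
        # Stop once the error no longer decreases meaningfully.
        if error - new_error < tolerance:
            break
        error = new_error
    return w, b


# Hypothetical training data generated from y = 2x + 1.
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = true_gradient_descent(examples)
print(w, b)  # close to 2 and 1
```

Each pass through the loop looks at the whole training set before making a single adjustment, which is the defining feature of the procedure described in this section.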