Ridge regression for neural networks performs regularization during the training phase with the L2 norm, i.e. it adds a term which is the sum of squares of the weights to the objective (loss) function being minimized. Thus, ridge regression minimizes the following during training: Objective = base_loss(weights) + alpha * (sum of squares of the weights) The base_loss will depend on the underling task (e.g. cross-entropy loss for classification) and alpha is generally adjusted during model validation, and is called the regularization parameter. Ridge regression is also called weight decay.