Local Critic Training for Model-Parallel Learning of Deep Neural Networks
This paper proposes a novel approach to train deep neural networks in a parallelized manner by unlocking the layer-wise dependency of backpropagation training. The approach employs additional modules, called local critic networks, alongside the main network model to be trained; these modules estimate the output of the main network in order to obtain error gradients without complete feedforward and backward propagation processes. We propose a cascaded learning strategy for these local networks so that different layer groups can be trained in parallel. Experimental results show the effectiveness of the proposed approach and suggest guidelines for determining appropriate algorithm parameters. In addition, we demonstrate that the approach can also be used for structural optimization of neural networks, computationally efficient progressive inference, and ensemble classification for performance improvement.
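To make the idea concrete, below is a minimal PyTorch-style sketch (not the authors' code) of the general scheme under simplifying assumptions: the main network is split into layer groups, each non-final group is paired with a small local critic that maps the group's hidden output to class scores, each group is updated from its own critic's estimated loss so it does not wait for the full forward/backward pass, and the critics are trained in a cascaded fashion by regressing onto the loss observed one stage later. The layer sizes, critic architectures, and exact update rules here are illustrative assumptions; the paper specifies the actual formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical layer groups of the main network and their local critics.
groups = nn.ModuleList([
    nn.Sequential(nn.Linear(784, 256), nn.ReLU()),
    nn.Sequential(nn.Linear(256, 128), nn.ReLU()),
    nn.Sequential(nn.Linear(128, 10)),               # final group -> class scores
])
critics = nn.ModuleList([
    nn.Linear(256, 10),   # critic estimating the final output from group 1
    nn.Linear(128, 10),   # critic estimating the final output from group 2
])

opt_groups = [torch.optim.SGD(g.parameters(), lr=0.1) for g in groups]
opt_critics = [torch.optim.SGD(c.parameters(), lr=0.1) for c in critics]

def train_step(x, y):
    hs, losses = [], []
    h = x
    # Each group is updated from its local critic's loss estimate, so in a
    # model-parallel setting these updates need not wait for the gradient
    # signal from the top of the network.
    for i, g in enumerate(groups):
        h = g(h.detach())                # block gradients to earlier groups
        hs.append(h)
        if i < len(critics):
            loss_i = F.cross_entropy(critics[i](h), y)   # estimated loss
        else:
            loss_i = F.cross_entropy(h, y)               # true loss at the top
        opt_groups[i].zero_grad()
        loss_i.backward()
        opt_groups[i].step()
        losses.append(loss_i.detach())

    # Cascaded critic training: critic i regresses onto the loss observed one
    # stage later, so the true loss signal trickles down through the cascade.
    for i, c in enumerate(critics):
        target = losses[i + 1]
        est = F.cross_entropy(c(hs[i].detach()), y)
        critic_loss = (est - target).pow(2)
        opt_critics[i].zero_grad()
        critic_loss.backward()
        opt_critics[i].step()

# Example usage on random data (shapes are assumptions for illustration).
x = torch.randn(32, 784)
y = torch.randint(0, 10, (32,))
train_step(x, y)
```

Because each group's update depends only on its own critic, the per-group updates could be pipelined across devices; the cascaded critic losses are what keep the locally computed gradients consistent with the true objective.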