Correlated Initialization for Correlated Data
Spatial data has the property that nearby points are correlated. The same holds for learned representations across layers, but not for commonly used weight initialization methods. Our theoretical analysis shows that, under uncorrelated initialization, (i) the flow through layers decreases much more rapidly and (ii) training of individual parameters is subject to more “zig-zagging”. We propose multiple methods for correlated initialization. For CNNs, they yield accuracy gains of several percent in the absence of regularization. Even with properly tuned L2 regularization, gains are often possible.
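
The abstract does not say how the proposed initializations are built, so the following is only an illustrative sketch of what "correlated initialization" could mean for a convolutional kernel, not the authors' method. It assumes a squared-exponential covariance over kernel positions and a He-style variance scale. The function name `correlated_conv_init` and the parameter `length_scale` are hypothetical.

```python
import numpy as np

def correlated_conv_init(out_ch, in_ch, k, length_scale=1.0, seed=0):
    """Sample conv weights whose spatially nearby entries are correlated.

    Assumption (not from the paper): covariance between two kernel
    positions decays as exp(-d^2 / (2 * length_scale^2)).
    """
    rng = np.random.default_rng(seed)

    # (row, col) coordinates of the k*k kernel positions.
    ys, xs = np.meshgrid(np.arange(k), np.arange(k), indexing="ij")
    pos = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)

    # Pairwise squared distances and the resulting covariance matrix.
    d2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(axis=-1)
    cov = np.exp(-d2 / (2.0 * length_scale ** 2))
    cov += 1e-6 * np.eye(k * k)  # small jitter for numerical stability

    # If z ~ N(0, I), then L z ~ N(0, L L^T) = N(0, cov).
    chol = np.linalg.cholesky(cov)
    z = rng.standard_normal((out_ch, in_ch, k * k))
    w = z @ chol.T

    # He-style scaling so the marginal variance is about 2 / fan_in.
    w *= np.sqrt(2.0 / (in_ch * k * k))
    return w.reshape(out_ch, in_ch, k, k)

w = correlated_conv_init(out_ch=64, in_ch=64, k=3)
flat = w.reshape(-1, 9)  # one row per (out, in) filter slice
adjacent = np.corrcoef(flat[:, 0], flat[:, 1])[0, 1]  # positions (0,0), (0,1)
distant = np.corrcoef(flat[:, 0], flat[:, 8])[0, 1]   # positions (0,0), (2,2)
print(f"adjacent corr: {adjacent:.2f}, distant corr: {distant:.2f}")
```

Setting `length_scale` close to zero makes the covariance approach the identity, which recovers an ordinary uncorrelated Gaussian initialization with the same marginal variance.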