Symmetry critical points for a model shallow neural network
A detailed analysis is given of a family of critical points determining spurious minima for a model student-teacher 2-layer neural network with ReLU activation function and a natural Γ = S_k × S_k symmetry. For a k-neuron shallow network of this type, analytic equations are given which, for example, determine the critical points of the spurious minima described by Safran and Shamir (2018) for 6 ≤ k ≤ 20. These critical points have isotropy (conjugate to) the diagonal subgroup Δ S_{k-1} ⊂ Δ S_k of Γ. It is shown that critical points of this family can be expressed as an infinite series in 1/√k (for sufficiently large k) and, as an application, that the critical values decay like a k^{-1}, where a ≈ 0.3. Other non-trivial families of critical points are also described, with isotropy conjugate to Δ S_{k-1}, Δ S_k, and Δ(S_2 × S_{k-2}) (the latter giving spurious minima for k > 9). The methods used depend on symmetry breaking, bifurcation, and algebraic geometry, notably Artin's implicit function theorem, and are applicable to other families of critical points that occur in this network.
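For concreteness, the following is a minimal LaTeX sketch of the setting the abstract refers to, assuming the standard Gaussian-input population loss for the k-neuron ReLU student-teacher model of Safran and Shamir (2018); the notation W = (w_1, …, w_k) for the student weights, V = (v_1, …, v_k) for the fixed teacher weights, and the coefficients c_n are our own, not taken from the paper:

% Population (expected) squared loss over Gaussian inputs; d denotes the input dimension.
\[
  \mathcal{L}(W) \;=\; \frac{1}{2}\,
  \mathbb{E}_{x \sim \mathcal{N}(0, I_d)}
  \Bigl[\Bigl(\sum_{i=1}^{k} \max(0,\, w_i^{\top} x)
        \;-\; \sum_{i=1}^{k} \max(0,\, v_i^{\top} x)\Bigr)^{2}\Bigr].
\]
% The 1/sqrt(k) expansion in the abstract asserts that, for large enough k, each
% coordinate of a critical point in the family admits a series of the form
\[
  w_{ij}(k) \;=\; \sum_{n=0}^{\infty} c_{n}\, k^{-n/2},
\]
% with the resulting critical value \mathcal{L}(W(k)) decaying like a k^{-1}, a \approx 0.3.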