Symmetry & critical points for a model shallow neural network
A detailed analysis is given of a family of critical points determining spurious minima for a model student-teacher 2-layer neural network, with ReLU activation function, and a natural Γ = S_k × S_k-symmetry. For a k-neuron shallow network of this type, analytic equations are given which, for example, determine the critical points of the spurious minima described by Safran and Shamir (2018) for 6 < k < 20. These critical points have isotropy (conjugate to) the diagonal subgroup ΔS_{k-1} ⊂ ΔS_k of Γ. It is shown that critical points of this family can be expressed as an infinite series in 1/√k (for large enough k) and, as an application, the critical values decay like a k^{-1}, where a ≈ 0.3. Other non-trivial families of critical points are also described with isotropy conjugate to ΔS_{k-1}, ΔS_k and Δ(S_2 × S_{k-2}) (the latter giving spurious minima for k > 9). The methods used depend on symmetry breaking, bifurcation, and algebraic geometry, notably Artin's implicit function theorem, and are applicable to other families of critical points that occur in this network.
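
As an illustration of the Γ = S_k × S_k-symmetry mentioned above, the following Python sketch checks the invariance numerically. It is not taken from the paper and rests on assumptions not stated in the abstract: inputs are standard Gaussian, the loss is the expected squared error, and the teacher weight matrix is the identity I_k. The function names `relu_kernel` and `loss` are hypothetical. The closed form used for E[relu(w·x) relu(v·x)] is the standard one for Gaussian inputs.

```python
import numpy as np

def relu_kernel(w, v):
    """E[relu(w.x) * relu(v.x)] for x ~ N(0, I) (closed form)."""
    nw, nv = np.linalg.norm(w), np.linalg.norm(v)
    if nw == 0.0 or nv == 0.0:
        return 0.0
    cos_t = np.clip(np.dot(w, v) / (nw * nv), -1.0, 1.0)
    theta = np.arccos(cos_t)
    return nw * nv * (np.sin(theta) + (np.pi - theta) * cos_t) / (2.0 * np.pi)

def loss(W, V):
    """0.5 * E[(sum_i relu(w_i.x) - sum_j relu(v_j.x))^2], rows = neurons."""
    ww = sum(relu_kernel(a, b) for a in W for b in W)
    wv = sum(relu_kernel(a, b) for a in W for b in V)
    vv = sum(relu_kernel(a, b) for a in V for b in V)
    return 0.5 * (ww - 2.0 * wv + vv)

k = 6
rng = np.random.default_rng(0)
V = np.eye(k)                      # assumed teacher weights
W = rng.normal(size=(k, k))        # arbitrary student weights

# Gamma = S_k x S_k acts by permuting the rows and the columns of W.
rows, cols = rng.permutation(k), rng.permutation(k)
W_perm = W[rows][:, cols]
gap = abs(loss(W, V) - loss(W_perm, V))

# A point a*I + b*(ones - I) is fixed by the diagonal subgroup
# (the same permutation applied to rows and columns).
D = 1.2 * np.eye(k) + 0.1 * (np.ones((k, k)) - np.eye(k))
p = rng.permutation(k)
diag_fixed = np.allclose(D[p][:, p], D)

print(gap, diag_fixed, loss(V, V))
```

Under these assumptions the loss is unchanged by independent row and column permutations of W, it vanishes at W = V, and matrices with constant diagonal and constant off-diagonal entries are fixed by the diagonal copy of S_k.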