Appendix A Message-passing algorithms for unsupervised learning with prior information
In our current setting, we assume that the statistical inference of synaptic weights from the raw data has access to the correct prior , where , and  denotes the correlation level between the two receptive fields (RFs). According to Bayes’ rule, the posterior probability of the synaptic weights is given by
where , representing the overlap of the two RFs, and is the so-called partition function in statistical physics.
Using the Bethe approximation Hou et al. (2019), we can easily write down the belief propagation equations as follows,
where we define the auxiliary variables , , and . The set  denotes the neighbors of the data node excluding the synaptic weight . In our model, all synaptic weights are used to explain each data sample. Belief propagation is commonly formulated on a factor graph, where each synaptic-weight pair acts as a variable node and each data sample acts as a factor node (a constraint to be satisfied) Mézard and Montanari (2009). Learning can then be interpreted as the inference of synaptic weights from the data constraints. The cavity probability is defined as the probability of the pair without the contribution of the data node ; accordingly,  is a normalization constant for the cavity probability . The cavity probability can be parameterized by the cavity magnetizations and correlation as .  represents the contribution of one data node given the value of the weight pair. By the central limit theorem,  and  can be treated as two correlated Gaussian random variables. We thus define , , , and  as the means and variances of the two variables, respectively. The covariance is given by . Moreover,  is approximated by its cavity mean . As a result, the intractable summation in Eq. (S2b) can be replaced by a jointly correlated Gaussian integral,
where the standard Gaussian measure , and .
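A jointly correlated Gaussian integral of this type can be evaluated numerically by writing the two variables in terms of two independent standard Gaussians and applying Gauss–Hermite quadrature. The sketch below is a generic illustration under our own assumptions, not necessarily how the equations above were evaluated in practice: the function name `correlated_gaussian_average` and the number of quadrature nodes are our choices, and the test integrand cosh(x)cosh(y) is used only because its average has a closed form.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def correlated_gaussian_average(f, mu_x, mu_y, var_x, var_y, cov, n_nodes=40):
    """Approximate E[f(X, Y)] for jointly Gaussian (X, Y).

    (X, Y) is expressed through independent standard Gaussians z1, z2:
        X = mu_x + sqrt(var_x) * z1
        Y = mu_y + rho * sqrt(var_y) * z1 + sqrt(var_y * (1 - rho**2)) * z2
    with rho = cov / sqrt(var_x * var_y), which reproduces the target
    means, variances and covariance.
    """
    nodes, weights = hermegauss(n_nodes)      # weight function exp(-z**2 / 2)
    weights = weights / np.sqrt(2.0 * np.pi)  # normalize to the standard Gaussian measure
    rho = cov / np.sqrt(var_x * var_y)
    z1, z2 = np.meshgrid(nodes, nodes, indexing="ij")
    w = np.outer(weights, weights)            # product rule for the 2D integral
    x = mu_x + np.sqrt(var_x) * z1
    y = mu_y + rho * np.sqrt(var_y) * z1 + np.sqrt(var_y * (1.0 - rho**2)) * z2
    return float(np.sum(w * f(x, y)))


# Closed-form check: cosh(X)cosh(Y) = [cosh(X + Y) + cosh(X - Y)] / 2 and
# E[cosh(Z)] = cosh(mean) * exp(var / 2) for a Gaussian Z.
mx, my, vx, vy, c = 0.3, -0.2, 0.5, 0.8, 0.3
approx = correlated_gaussian_average(lambda x, y: np.cosh(x) * np.cosh(y), mx, my, vx, vy, c)
exact = 0.5 * (np.cosh(mx + my) * np.exp((vx + vy + 2 * c) / 2)
               + np.cosh(mx - my) * np.exp((vx + vy - 2 * c) / 2))
print(approx, exact)
```

Any function of the two variables can be passed as `f`; the parameterization only requires |ρ| ≤ 1, which holds for any valid covariance.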
We further define the cavity bias as
Using Eq. (S2a), the cavity magnetizations , , and the cavity correlation can be computed as follows,
Starting from random initial values of the cavity magnetizations and correlations, the above belief propagation equations are iterated until convergence. To carry out the inference of synaptic weights (so-called learning), one only needs to compute the full magnetizations by replacing  in Eq. (S5) by . The free energy can also be estimated under the Bethe approximation, given by , where the single synaptic-weight-pair contribution and the single data-sample contribution are given as follows,
where , , , , , and .
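The parameterization of a pair probability by two magnetizations and one correlation is exact for a pair of binary variables, because 1, s₁, s₂ and s₁s₂ form a complete basis of functions on the four configurations. Assuming ±1 synaptic weights (our assumption for this sketch), the code below converts between a probability table and its moments (m₁, m₂, q). It also evaluates a hypothetical two-spin Boltzmann weight exp(h₁s₁ + h₂s₂ + Js₁s₂), only to show how magnetizations and a correlation follow from fields and a coupling; the names `h1`, `h2`, `J` are illustrative and are not the expressions derived in this appendix.

```python
import itertools

import numpy as np

# The four configurations of an Ising pair (s1, s2), each s = +1 or -1.
STATES = list(itertools.product((1, -1), repeat=2))


def pair_moments(p):
    """Return (m1, m2, q) = (<s1>, <s2>, <s1 s2>) for a table p[(s1, s2)]."""
    m1 = sum(prob * s1 for (s1, _), prob in p.items())
    m2 = sum(prob * s2 for (_, s2), prob in p.items())
    q = sum(prob * s1 * s2 for (s1, s2), prob in p.items())
    return m1, m2, q


def pair_distribution(m1, m2, q):
    """Rebuild P(s1, s2) = (1 + m1 s1 + m2 s2 + q s1 s2) / 4 from its three moments."""
    return {(s1, s2): (1 + m1 * s1 + m2 * s2 + q * s1 * s2) / 4 for s1, s2 in STATES}


def boltzmann_pair(h1, h2, J):
    """Hypothetical two-spin model P(s1, s2) proportional to exp(h1 s1 + h2 s2 + J s1 s2)."""
    w = {(s1, s2): np.exp(h1 * s1 + h2 * s2 + J * s1 * s2) for s1, s2 in STATES}
    z = sum(w.values())
    return {s: v / z for s, v in w.items()}


# Round trip: any normalized table is recovered exactly from (m1, m2, q).
rng = np.random.default_rng(0)
raw = rng.random(4)
p = dict(zip(STATES, raw / raw.sum()))
rebuilt = pair_distribution(*pair_moments(p))
print(pair_moments(p), max(abs(rebuilt[s] - p[s]) for s in STATES))
```

With J = 0 the two spins decouple, so the moments reduce to m₁ = tanh h₁, m₂ = tanh h₂ and q = m₁m₂; a nonzero J is what makes q differ from m₁m₂.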
Appendix B Replica analysis of the model
For the replica analysis, we need to evaluate the disorder average of an integer power of the partition function, , where  is the disorder average over the true RF distribution, which is factorized over components, and over the corresponding data distribution, given by
where , and  is the replica index. The typical free energy can then be obtained as . To compute  explicitly, we need to specify the following order parameters:
Inserting these definitions in the form of delta functions, together with their integral representations, one can decompose the computation of  into entropic and energetic parts. To simplify the computation further, we adopt a simple Ansatz: the order parameters are invariant under permutations of the replica indices. This is the so-called replica-symmetric (RS) Ansatz, which reads,
for any , and
for any and . Note that are conjugated order parameters introduced when using the integral representation of the delta function.
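The replica trick rests on the identity ⟨ln Z⟩ = lim_{n→0} (1/n) ln⟨Zⁿ⟩, which links the typical free energy to the moments ⟨Zⁿ⟩ introduced above for integer n. The toy check below is only an illustration under our own assumptions: Z is a sum of exp(βE_k) over a hypothetical set of independent Gaussian energies E_k (unrelated to the present model), and the disorder average is replaced by a sample mean.

```python
import numpy as np


def log_partition_samples(n_samples=2000, n_states=64, beta=1.0, seed=0):
    """Toy disorder ensemble: ln Z with Z = sum_k exp(beta * E_k), E_k ~ N(0, 1) i.i.d."""
    rng = np.random.default_rng(seed)
    energies = rng.standard_normal((n_samples, n_states))
    return np.logaddexp.reduce(beta * energies, axis=1)  # one ln Z per disorder sample


def replica_estimate(log_z, n):
    """(1/n) ln <Z^n>, with <.> taken as the sample mean over disorder realizations."""
    return (np.logaddexp.reduce(n * log_z) - np.log(log_z.size)) / n


log_z = log_partition_samples()
quenched = log_z.mean()  # <ln Z>, the typical (quenched) quantity
for n in (1.0, 0.5, 0.1, 1e-3):
    print(n, replica_estimate(log_z, n), quenched)
```

At n = 1 the estimate is the annealed value, which upper-bounds the quenched ⟨ln Z⟩ by Jensen's inequality; the gap shrinks monotonically as n decreases, and only the n → 0 limit recovers the typical value.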
Then we can reorganize as
where and denote, respectively, all non-conjugated and conjugated order parameters. In the large limit, the integral is dominated by an equilibrium action:
where is the entropic term, and is the energetic term.
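The statement that the integral is dominated by the extremum of the action at large system size is Laplace's (saddle-point) method. As a one-dimensional sketch, we take a hypothetical action g(x) = x²/2 − x⁴/4, standing in for the RS action, and check that (1/N) ln ∫ e^{N g(x)} dx approaches max g = 1/4 as N grows, with the leading correction set by the Gaussian fluctuations around the two maxima at x = ±1.

```python
import numpy as np

# Integration grid for the toy integral; its spacing resolves peaks of width ~ 1/sqrt(2N).
GRID = np.linspace(-3.0, 3.0, 200_001)


def toy_action(x):
    """Hypothetical action with two symmetric maxima at x = +/-1, where g = 1/4."""
    return 0.5 * x**2 - 0.25 * x**4


def scaled_log_integral(n_size):
    """(1/N) ln of the integral of exp(N g(x)) dx, via a Riemann sum with max-subtraction."""
    exponent = n_size * toy_action(GRID)
    peak = exponent.max()
    dx = GRID[1] - GRID[0]
    return (peak + np.log(np.sum(np.exp(exponent - peak)) * dx)) / n_size


def laplace_prediction(n_size):
    """Laplace estimate g_max + (1/N) ln[2 sqrt(2 pi / (N |g''(1)|))], with |g''(1)| = 2."""
    return 0.25 + np.log(2.0 * np.sqrt(2.0 * np.pi / (2.0 * n_size))) / n_size


for n_size in (10, 100, 1000, 10_000):
    print(n_size, scaled_log_integral(n_size), laplace_prediction(n_size))
```

The factor 2 inside the logarithm counts the two symmetric maxima; the fluctuation term decays as (ln N)/N, which is why only the extremal value of the action survives in the free energy.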
We first compute the entropic term as follows,
After somewhat lengthy algebraic manipulations using the techniques developed in our previous work Hou et al. (2019), we arrive at the final result for  as
where we define  with three independent standard Gaussian random variables (, , and ), and  denotes the disorder average with respect to the true prior. From this expression, one can extract an effective two-spin interaction Hamiltonian, which determines the effective partition function in the main text. The effective fields and coupling are given as follows,
Next, we compute the energetic term given by
where  denotes the disorder average, , , and , , with  representing a typical data sample. By the central limit theorem, these four quantities are correlated Gaussian random variables. To reproduce their covariance structure, which is determined by the order parameters, the random variables  and  are parameterized by six standard Gaussian variables with zero mean and unit variance () as follows,
where , , and . Therefore, the term can be calculated by a standard Gaussian integration given by
By introducing the auxiliary variables as follows,
we finally arrive at the free energy as
where . The saddle-point analysis in Eq. (S11) requires the order parameters to be a stationary point of the free energy. All conjugated and non-conjugated order parameters therefore satisfy saddle-point equations obtained by setting the derivatives of the free energy with respect to the order parameters to zero. We skip the technical details of this derivation and refer interested readers to our previous work Hou et al. (2019).
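Saddle-point equations of this kind are commonly solved by damped fixed-point iteration, with the Gaussian averages computed by quadrature. Because the equations of the present model are not reproduced in this sketch, we use as a stand-in the well-known RS equation of the Sherrington–Kirkpatrick model at zero field, q = ∫Dz tanh²(β√q z); the damping factor, quadrature order and tolerance are our own choices, not values taken from this work.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Gauss-Hermite rule for the standard Gaussian measure: E[f(z)] ~ sum(WEIGHTS * f(NODES)).
NODES, WEIGHTS = hermegauss(80)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)


def gaussian_average(f):
    """Average of f(z) over z ~ N(0, 1) by quadrature."""
    return float(np.sum(WEIGHTS * f(NODES)))


def rs_update(q, beta):
    """Right-hand side of the stand-in RS equation q = E_z[tanh^2(beta * sqrt(q) * z)]."""
    return gaussian_average(lambda z: np.tanh(beta * np.sqrt(q) * z) ** 2)


def solve_saddle_point(beta, q0=0.5, damping=0.5, tol=1e-12, max_iter=10_000):
    """Damped fixed-point iteration q <- damping * q + (1 - damping) * rs_update(q)."""
    q = q0
    for _ in range(max_iter):
        q_new = damping * q + (1.0 - damping) * rs_update(q, beta)
        if abs(q_new - q) < tol:
            return q_new
        q = q_new
    raise RuntimeError("saddle-point iteration did not converge")


for beta in (0.5, 1.5):
    print(beta, solve_saddle_point(beta))
```

For a set of order parameters, the same loop updates all of them simultaneously from their right-hand sides; damping trades speed for stability, which matters most near transitions, where the undamped map converges slowly.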
The saddle-point equations for non-conjugated order parameters are given by