1 Introduction and Overview
This article documents the verification and validation (V&V) results for the first two stages of code development for free energy minimization within a 2D Cluster Variation Method (CVM) system.
The intention is that this 2D CVM system can have its free energy minimized independent of its use in any other process. Ultimately, the 2D CVM system will be inserted as a single layer into a neural network, creating a new form of computational engine, which I call the CORTECON, standing for a COntentRetentive, TEmporallyCONnected neural network, first described in [1] and [2], both of which presented early results using a 1D CVM.
This work described here focuses on a 2D CVM grid, which can operate as both a hidden layer and as a independent functional unit, with the ability to achieve a free energy minimum when there is no extraneous signal coming into the layer, is shown in Figure 1.
This notion of using a 2D CVM as a computational engine’s hidden layer advances ideas originally proposed in [1] and [2], along with [3], and further incorporates (and makes practical) ideas put forth by Karl Friston, whose notation was adopted for Figure 1. Specifically, this figure illustrates the computational engine using Friston’s notion of a set of computational (representational) units separated from an external system by a Markov blanket. It also allows for the variational Bayes (free energy minimization) approach described by Friston [4], [5], and [6], and earlier by Beal [7].
In brief, Friston (building on work by Beal [7]) proposes a computational system in which a Markov blanket separates the computational (representational) elements of the engine from external events, as shown in Figure 1. The communication between the external system elements (denoted ) with those of the representational system (denoted or ) are mediated by two distinct layers or components of the Markov blanket; the sensing () elements and the action () ones.
This article provides V&V for the first two code development stages:

Computing values for the configuration variables in a 2D system – for various values of the interaction enthalpy parameter h, and

Computing the thermodynamic quantities associated with the 2D system – given the set of configuration variables, it is possible to then compute enthalpy, entropy, and free energy.
Most crucially, the code incorporates a free energy minimization process, so that once an initial (randomlygenerated pattern) has been created, it is adjusted in a twostage process:

Achieve desired specification – this allows us to implicitly enfold a nominal perunit activation energy (where the relationship between this parameter and cannot be explicitly stated at this time), and

Achieve free energy minimization for the given set of , values – typically, the 2D CVM grid needs to have state changes in its various units to achieve a free energy minimum.
2 The Configuration Variables
The first V&V aspect of the task documented here was to ensure that the configuration variables for the 2D grid were counted correctly.

Configuration variable definitions – including how they are counted in the 2D CVM grid,

2D CVM grid specifications – size and wraparounds, and

V&V results – configuration variable counts for select examples.
2.1 Introducing the Configuration Variables
The Cluster Variation Method, introduced by Kikuchi [8], uses an entropy term that includes not only the distribution into simple “on” and “off” states, but also distribution into local patterns, or configurations, as illustrated in the following figures.
A 2D CVM is characterized by a set of configuration variables, which collectively represent single unit, pairwise combination, and triplet values. The configuration variables are denoted as:

 Single units,

 Nearestneighbor pairs,

 Nextnearestneighbor pairs, and

 Triplets.
These configuration variables are illustrated for a single zigzag chain in Figure 2.
For a bistate system (one in which the units can be in either state A or state B), there are six different ways in which the triplet configuration variables () can be constructed, as shown in Figure 3, and also in Table 1.
Notice that within Figure 3, the triplets and have two possible configurations each: AAB and BAA for , and BBA and ABB for . This means that there is a degeneracy factor of 2 for each of the and triplets.
The degeneracy factors and (number of ways of constructing a given configuration variable) are shown in Figure 4; , as and can be constructed as either AB or as BA for , or as B A or as A B for . Similarly, (for the triplets), as there are two ways each for constructing the triplets and . All other degeneracy factors are set to 1.
Name  Variable  Instances 

Unit  2  
Nearestneighbor  3  
Nextnearestneighbor  3  
Triplet  6 
2.2 Counting the Configuration Variables
To experiment with the 2D CVM system, I constructed various grids of 256 (16 x 16) units each, as illustrated in Figure 5.
I decided to use a 16 x 16 grid for several reasons:

Sufficient variety in local patterns – I was able to construct grids that illustrated several distinct kinds of topographies (each corresponding to different hvalues),

Sufficient nodes – so that tripletconfiguration extrema could be explored in some detail, and

Countability – I needed to be able to manually count all the configuration values for a given 2D grid configuration, and match them against the results from the program, as a crucial V&V step.
One final advantage of the 16 x 16 grid layout was that the different grid configurations were both large enough to show diversity, but small enough so that I could create a figure illustrating the activation state (A or B) of each node, thus illustrating the detailed particulars of each configuration design.
I began with manuallydesigned grid configurations, such as the two shown in Figure 5. These two configurations correspond (somewhat) to the notions of “scalefree” and “richclub” topographies, as observed in various neural communities. (For references, please consult [3].)
These two different grid configurations are early attempts to characterize how the hvalues can be identified for grids with different total configuration variable values. The following Section 3 will discuss hvalues in the context of the free energy equation.
Both of these systems were created with the constraint of equiprobable occurrence of units in states (A or B; that is, . This was done to facilitate the next V&V step, which will be discussed in Section 3. Thus, for the configurations shown in Figure 5, both the (a) and (b) grids have 128 nodes each of units in state A and in state B.
The configuration on the left of Figure 5 is an effort to build a “scalefreelike” system. The notion of a “scalefree” system is that the same kind of pattern replicates itself throughout various scales of observation in a system. Thus, for the “scalefreelike” configuration shown in Figure 5 (a), I created a design that was originally intended to be 180degree symmetrical around a central axis (dihedral group2 symmetry). Specifically, the left and right sides were to be identical in a rotated180degree sense.
For ease in design of the “scalefreelike” system, I focused on creating a pattern on one side and duplicating it on the other. I used a paisleylike base pattern in order to create greater dispersion of values across the triplets; that is, I wanted to minimize having tightlyclustered islands that would yield little in the way of ABA and BAB triplets ( and , respectively).
The practical limitation of attempting to fit various “islands” of A nodes (black) into a surrounding “sea” of B nodes (white) meant that there were not quite enough B nodes to act as borders around the more compact sets of A nodes. Thus, the pattern in the right half of grid (a) is a bit more compressed than originally planned.
The original plan was that out of 256 nodes in the grid, half would be on the right, and half on the left; 128 nodes on each side. Of these, for each side, 64 nodes were to be in state A (black). Of these nodes (per side), sixteen (16 nodes) would be used to create a large, paisleyshaped island. The remaining 64  16 = 48 nodes would be used for smallersized islands; two islands of eight nodes each, etc. The plan is shown in Figure 6. The notation of “center” and “offcenter“ refers to the placement of the various islands; the largest (16node) islands were to be placed moreorless in the center of each of their respective (left or right) sides of the grid, and the remaining islands were to be “offcenter”; situated around their primary respective large islands.
The resulting patterns were close to the original plan, although not exactly the same. (Again, for details, see Figure 6.)
Even though some changes had to be made to the original design plan, the original constraint, that the number of units in states A and B would be identical (128 nodes in each), was kept. The details are shown in Figure 6.
The validation step for this stage of code development was to manually count all the configuration variables for several different configuration grids, such as the ones shown in Figure 5.
The counts for the “scalefreelike” grid shown in Figure 6 are shown in Figure 7. It suffices to say that the results from the manual counting (of all configuration variables) and those created by the computer code were identical. These held true across several different grids with different node configurations.
Note: To achieve the fractional variables shown in Figure 3, and also in Table 1, the following relations are used:

,

, for and , accounting for the degeneracy with which occurs,

, for and , accounting for the degeneracy with which occurs, and

, for and , , accounting for the degeneracy with which and occur.
Note: The exact details of the row counts are difficult to read in Figures 7 and 8; the original diagrams are in a corresponding slidedeck that will be available in the associated GitHub repository; see details at the end of this document.
The second configuration, for an “extremerichclublike” configuration, is shown in Figure 8.
As a contrast to the grid configuration used in Figures 6 and 7, I created a second configuration that had only one large compact region of nodes in state A, which was wrappedaround the grid envelope, as shown in Figure 8. This configuration was designed to maximize the number of pairwise and triplet configurations that put “likenearlike.” The previous configuration, shown in Figure 6, was more in the direction of “likenearunlike.”
The purpose of having configurations with such different dispersions among the configuration variable values was that they would putatively yield different hvalues, or correspond to different points on an equilibrium curve for the free energy equation (in the case of equiprobable units in states A and B). As I have analytic results for that free energy minimum curve (the equilibrium point for the free energy at different hvalues, or interaction enthalpy values), it would serve as both a useful experiment and V&V test. These results are discussed in the following Section 3.
The V&V for the initial stage of code development; ascertaining that the configuration variable counts were as they should be, was complete.
There is an accompanying slidedeck that documents the code block structure and provides other important elements of code documentation (other than V&V); this will also be available on GitHub; see the end of this document for details.
By far, the most complex element of the “configuration variable counting” code was in counting the triplets. The V&V step ensured that the counts wrapping around from right to left, and from top to bottom (creating a completelywrapped envelope of the initial 2D grid) performed as desired and expected.
3 Verification and Validation of Computing the Thermodynamic Variables
There were two primary means for obtaining validation that the code computing the thermodynamic variables was correct:

Comparison with analytic for the equiprobable case – for the case equiprobable distribution among the variables (), I have developed an analytic solution, which gives a means for comparing the codegenerated results against the expected (analytic) results, and

Comparison with analytic for the case where the interaction enthalpy is zero – the second means to check the codegenerated results is for the case where the distributuion of values is not equiprobable, however the interaction enthalpy is set to zero (), and thus the exact distribution of other configuration values can be precisely computed, allowing further for exact analytic computation of thermodynamic variables.
The previous section described the patterns generated for the validation of configuration variable counting. It was interesting to see how the thermodynamic variables emerged for the systems described there, however (as will be illustrated here), certain of those system were not at equilibrium, even though they had equiprobable distribution of values. As these results are more in the realm of theory and less V&V, they will be discussed elsewhere.
The realization that manuallygenerated patterns would not necessarily be at equilibrium meant that I needed to have test cases where the patterns would indeed be at equilibrium; this required not only random generation of patterns, but also that they be modified so that their associated free energy values achieved minimum. This generationandmodification process is described more thoroughly in the following Section 4.
3.1 Validation support: analytic solution
The analytic solution for the case where can be found when we are using the full interaction enthalpy term of . This solution is similar to the more limited enthalpy equation, used in [3] as well as in the predecessor work [9], where .
The free energy equation for a 2D CVM system, including configuration variables in the entropy term, is
(1)  
where and are Lagrange multipliers, and we have set .
Note: the full derivation of the 2D CVM free energy is presented in [9], and the preceding equation corresponds to Equations (2)–(14) of that reference. Also, the single enthalpy parameter here is , with the enthalpy parameter for unit activation implicitly set to zero, as the earlier intention was to solve the above equation for an analytic solution, which was possible only in the case where , meaning that the perunit enthalpy activation parameter .
(2) 
The approach that I am using currently is to take the same enthalpy equation as originally advocated by Kikuchi [8], [10], which gives
(3) 
Both of these equations are found using equivalence relations, specifically
(4)  
(5) 
I have previously found analytic solution for this equation, for the condition where , it is
(6) 
When the more complete enthalpy expression is used, viz. , the analytic solution becomes
(7) 
(Note: the full derivation of these results will be published separately.)
The experimentallygenerated results from probabilisticallygenerated data sets correspond to the analytic results in the neighborhood of . The reason that the range is so limited is that the analytic solution makes use of equivalence relations as expressed above. The resulting solution has divergences at and .
The comparison is shown in the following Figure 9. In this figure, the column in the table marked as z3Analyt1 corresponds to results from Eqns. 3 and 7 (the current approach) and in the next column, z3Analyt2 corresponds to results from Eqns. 2 and 6 (the previous approach).
The graph is shown in the following Figure 10.
The divergent behavior in the analytic solution is likely due to the use of equivalence relationships, as identified in Eqn. 4.
3.2 Validation support: basic thermodynamic results
The following Figure 11 shows the results when , which is the case where all of the results should conform with the analytic solution.
4 Verification and Validation of Free Energy Minimization
It is not enough to simply compute the thermodynamic variables for a given 2D grid configuration; it is important to have a mechanism by which the pattern of node activations on the grid can adjust in order to reach a free energy minimum.
I accomplished this by writing the code for two stages:

Bring close to desired value, and

Adjust configuration variables to achieve free energy minimum.
Adjusting total number of nodes to achieve desired :
The code has an initial specification for the desired value (in **main**), and randomly generates a 2D CVM grid according to a probabilistic assignment of “1” (state A) or “0” (state B
) to the units in the grid. However, just because the probability (of the random number generation) is set to a specified value (say, 0.35) does not mean that the resulting total of state A nodes will be precisely 0.35 of the total number of nodes (e.g., 0.35 * 256, or 90 nodes); thus, a few nodes will have to be “flipped” in order to bring the actual number of nodes in state
A closer to the desired value.The code specifies a tolerance value for how close the actual value needs to be to the desired value. It runs a function to randomly select and flip unit values (as needed, going in the right direction), and continues this until the resulting actual is within tolerance of the desired value.
Validation: Printing out the actual values for , ensuring that they are within tolerance of the desired value.
Adjusting configuration variables to achieve free energy minimum:
There is no guarantee (in the current version of the code) that the free energy minimum is actually met; instead, the code will run this entire process (generating a new grid, adjusting for within tolerance, and then adjusting the units so that free energy is progressively decreased) for a specified number of trials. During the debug phase, the number of trials was between 1  3, so that I could closely monitor the process. During actual runs, the trials were typically 10  20; there was not much variability in the results.
The goal of this process is to keep adjusting the grid units so that free energy is decreased. For each run, there is a constant value for . That means, before any nodes are flipped, the program will (randomly) find a node in state A, and another node in state B. It will flip the two (from state A to state B, and vice versa). It will then compute the new free energy; this requires recomputing the entire set of configuration variable values. While is held constant with this process, it is likely that all other configuration variables (, , and ) will change.
The program computes the new free energy value (using the new configuration variable values as well as the hvalue that is being tested for the run). If the free energy is lower, the change in the units is kept. If not, both units are reverted back to their original values.
The trials are strictly probabilistic for this generation of code development; there is no attempt to find nodes whose topographic position (i.e., sets of neighbors, nearestneighbors, and triplets) would be most likely to produce a free energy decrease if the node were to change.
One version of the code is designed less to run multiple trials, and more to collect, print, and plot the thermodynamic variables over a series of attempts to flip nodes and test the resulting free energy.
One validation step is visual observation of the thermodynamic variables over the course of any one of these trials; noting that the free energy does, in fact, decrease.
Another validation step is that when (), there is no interaction energy. In this case, the final configuration variable values should be very close to their probabilistic likelihoods. Thus, for example, when and , we expect that , etc. Thus, it is possible to compare the actual resultant configuration variable values with the probabilistic expectancies.
A final validation step is to compare the resulting behaviors against the theoretical expectations. This is discussed more fully in the following subsection.
4.1 Validation Support: Exemplar Code Run
An example is shown in the following Figure 13.
This data is actually from a perturbation run, where the 2D grid is established as previously described, and then perturbed by a given amount (in this case, a fraction of 0.1 of the existing nodes are flipped), and then taken to free energy minimum a second time.
These results were obtained from the program 2DCVMperturbexpt12b20180112.py, run on Friday, Jan. 12, 2018.
The parameter settings were for and , with a total of twenty trials () for each hvalue. A data table from this run is shown in Figure 14. All reported results for configuration variable and thermodynamic values are averages over runs, where .
4.1.1 Validation support: results
The values observed for conform to expectations. In Figure 13, is shown in green, as (in order to bring the values within the same visual range as other results).
When , , which is the expected result. (The true expected results is ; the observed value of is an average over twenty trials. The deviance from the theoretical expectation is acceptable. )
When , the values are greater, and when , the values are smaller. In fact, ranges from (when ) down to (when ). These again are expected results. A separate document will address the theoretical expectations in more detail. In brief, when , then , meaning that the interaction enthalpy parameter is negative. When is negative, the enthalpy is decreased by increasing , as the interaction enthalpy multiplies the term . Thus, maximizing is expected when .
There is a limit as to how far can be increased; presumably it can approach , however, that would mean that the units were arranged in a strict checkerboard manner; that there were no instances of likenearlike at all. This is rather difficult to achieve; both in creation of highlyordered systems, and in this particular code, which uses a simplistic findandflip strategy.
As previously noted, when , the values are smaller. This is the case where , and system enthalpy is decreased when is made smaller. Thus, the system moves more towards a likewithlike configuration (increasing and ); maximizing the size of the various “islands,” and decreasing the size of their borders (minimizing ).
There is a practical limit as to how far can be decreased; there will always be a border area between the state A islands (or even a single, massive state A continent) and the surrounding sea of state B units. This means that will not get close to zero. The actual practical limit for will actually depend on the total system size (total number of nodes), because the border area will progressively decrease (although not disappear) as more and more islands join to become continents. Thus, the value of , which occurs when , is not surprising.
Once is pushed to a suitably small value, it becomes increasingly difficult for the simple findandflip strategy to (randomly) find nodes where the flip will accomplish a free energy reduction. This is likely why there is general stability in the values beyond ; there are simply not that many nodes where the flip will do much good, keeping in mind that two nodes (each in a different state) have to be flipped in order to maintain the value.
Thus, the preliminary conclusion is that free energy minimization is being accomplished, and that the values are behaving as expected.
4.1.2 Validation support: results
Again referencing Figure 13, we examine the curve for delta (shown in cyan), defined as , which is the actual term that is multiplied by to achieve the interaction enthalpy term. (The delta curve is shown in dark green in this figure.) This curve behaves as expected.
In particular, we note that there is a nearly linear behavior in the region between and . When , delta = 0.2035 (according to the data table shown in Figure 14. When , delta = 0.3730. When , we would expect that there would be purely probabilistic distribution of units into their configurations, and thus expect that , , and (as mentioned earlier). We would have then that . The actual value is delta = 0.0887, which is acceptably close.
Similar arguments hold for the expected and observed values of delta as did for in the preceding discussion.
We again note that the values for delta level out as h increases beyond 1.2; this is because there are not that many units that the simple findandflip strategy can easily find. In particular, we note that the value at is typically around , which is very small.
In particular, we observe that this value indicates that we have pushed the system to its limit for minimizing , which is the AAB configuration. This value indicates a border of a rather large island of state A units in a sea of B units. Specifically, for the 256unit system that is the subject for this investigation, when , then triplets involve border units around islands / continents of state A. This is approximately 1/5th of the total number of units available. This suggests that we have pushed the system about as far as it can go. Of course, a visual inspection of the resulting grid would be enormously useful in confirming these assessments. This will be included in a subsequent document.
4.1.3 Validation support: thermodynamic results
The enthalpy is maximum when , which is to be expected. As we minimize free energy, we minimize enthalpy. As soon as we introduce some nonzero interaction enthalpy, we have an opportunity to adjust the configuration values (specifically the , as just discussed) to lower the enthalpy.
The entropy is similarly at a maximum (negentropy is at a minimum) when . The negative entropy increases for nonzero values of h, as expected.
We particularly note that in the vicinity of , or more generally, in the range of
, the variances in the entropy and enthalpy are approximately on the same scale; one does not appreciably dwarf the other.
When we move beyond , we find that the enthalpy term strongly dominates the entropy, and thus dominates the free energy. It does this because we are increasing the value of the interaction enthalpy coefficient, , and not because we are gaining any appreciable difference in the configuration values. As noted in the previous discussions, these values have moreorless stabilized in this range.
Thus, increasing beyond does not serve any useful value, suggesting a practical bound on hvalues for this kind of system.
Our actual and practical choices for the hvalues should be based on the kind of behavior that we want to see in the configuration values. For modeling brainlike systems, we will most likely want , as that induces likewithlike clustering, which seems to characterize certain neural collectives.
5 Code included in this V&V description

2DCVMperturbexpt12b20180112.py  perturbation analysis with userspecifiable (in **main**) values for , , , and many other parameters.
Code Availability: All code referenced here will be made available on a public GitHub repository (https://github.com/ajmaren/2DClusterVariationMethod) after a short delay from initial publication of this V&V document. This and related code will be supported by extensive documentation, which will also be placed in the GitHub repository. Anyone desiring access to the original code prior to its placement on a public GitHub repository should contact A.J. Maren at: alianna@aliannajmaren.com.
Copyright: All code referenced here has been independently develeoped by A.J. Maren. A.J. Maren holds the copyright to both the code and this V&V document itself. arXiv is granted nonexclusive and irrevocable license to distribute this article.
References
 [1] A. Maren, “Free energy as driving function in neural networks,” in Symposium on Nonlinear Theory and Its Applications, (Hawaii), pp. 89–93, December 510 1993. doi:10.13140/2.1.1621.1529.
 [2] A. Maren, E. Schwartz, and J. Seyfried, “Configurational entropy stabilizes pattern formation in a heteroassociative neural network,” in IEEE Int’l Conf. SMC, (Chicago, IL), pp. 89–93, October 1992. doi:10.1109/ICSMC.1992.271796.
 [3] A. Maren, “The cluster variation method: a primer for neuroscientists,” Brain Sciences, vol. 6, no. 4, p. 44, 2016.
 [4] K. Friston, M. Levin, B. Sengupta, and G. Pezzulo, “Knowing one’s place: a freeenergy approach to pattern regulation,” J. R. Soc. Interface, vol. 12, p. 20141383, 2015. doi:10.1098/rsif.2014.1383; available online at: http://dx.doi.org/10.1098/rsif.2014.1383.
 [5] K. Friston, “The freeenergy principle: a unified brain theory?,” Nat. Rev. Neurosci., vol. 11, p. 127–138, 2010. doi:10.1038/nrn2787.
 [6] K. Friston, “Life as we know it,” Journal of The Royal Society Interface, vol. 10, no. 86, 2013.

[7]
M. J. Beal,
Variational algorithms for approximate Bayesian inference
. PhD thesis, University College London, 2003. PDF: http://www.cse.buffalo.edu/faculty/mbeal/papers/beal03.pdf.  [8] R. Kikuchi, “A theory of cooperative phenomena,” Phys. Rev., vol. 81, no. 81, p. 988, 1951.
 [9] A. Maren, “The Cluster Variation Method II: 2D grid of zigzag chains: Basic theory, analytic solution and free energy variable distributions at midpoint (x1 = x2 = 0.5),” Tech. Rep. THM TR2014003 (ajm), Themasis, 2014. doi:10.13140/2.1.4112.5446.
 [10] R. Kikuchi and S. Brush, “Improvement of the Cluster Variation Method,” J. Chem. Phys., vol. 47, p. 195, 1967.
Comments
There are no comments yet.