Learning about knowledge: A complex network approach

This article describes an approach to modeling knowledge acquisition in terms of walks along complex networks. Each subset of knowledge is represented as a node, and relations between such subsets are expressed as edges. Two types of edges are considered, corresponding to free and conditional transitions. The latter case implies that a node can only be reached after a set of other nodes (the required conditions) has been visited. The process of knowledge acquisition can then be simulated by considering the number of nodes visited as a single agent moves along the network, starting from its lowest layer. It is shown that hierarchical networks, i.e. networks composed of successive interconnected layers, arise naturally as a consequence of compositions of the prerequisite relationships between the nodes. In order to avoid deadlocks, i.e. unreachable nodes, the subnetwork in each layer is assumed to be a connected component. Several configurations of such hierarchical knowledge networks are simulated and the performance of the moving agent is quantified in terms of the percentage of visited nodes after each movement. The Barabási-Albert and random models are considered for the layer and interconnecting subnetworks. Although all subnetworks in each realization have the same number of nodes, several interconnectivities, defined by the average node degree of the interconnection networks, are considered. Two visiting strategies are investigated: random choice among the existing edges and preferential choice of so far untracked edges. Several interesting results are obtained, including the identification of a series of plateaux of knowledge stagnation in the case of the preferential movement strategy in the presence of conditional edges.





I Introduction

Science is the art of building good models of nature, including science itself. This is the subject of the present article, i.e. to revisit the problem of modeling how knowledge is represented and acquired in the new light of complex network research.

Modeling involves representations of the phenomenon of interest as well as the dynamics unfolding in such representations in a way which should be systematically consistent with repetitive confrontation with experimental data. Because of their generality, complex networks Albert and Barabási (2002); Dorogovtsev and Mendes (2002); Newman (2003) provide a natural and powerful resource for representing structures of knowledge, where facts are expressed as nodes and relations between facts are indicated by edges. Such an approach allows the process of knowledge acquisition to be modeled in terms of the number of nodes (or edges) visited during walks through the knowledge network representation. The present work describes a simple approach to knowledge acquisition based on complex networks and single-agent random walks.

The plan of the article is as follows. After reviewing the main related works, focusing on knowledge representation and random walks in scale-free networks, each of the hypotheses adopted in our model is justified and discussed. Among other issues, it is shown that hierarchical networks arise naturally as a consequence of the composition of the prerequisites implied by the conditional links. Deadlocks (in the sense of node unreachability) in conditional transitions are avoided by requiring that the network at each layer corresponds to a connected component (i.e. any node in a layer can be reached through paths from any other node in that same layer). Hierarchical complex networks (e.g. da F. Costa (2004); da F. Costa and Diambra (2005)) include a series of layers, each containing a subnetwork, which are interconnected through subnetworks. In the proposed model, conditional links are restricted to those connections between successive layers. Two types of random walks are considered, involving random transitions as well as transitions favoring new links. The simulations and the respective results are presented and discussed next, followed by the development of an analytical model of plateaux formation. The article concludes by presenting several perspectives for further developments.

II A Brief Review of Related Concepts and Developments

The subject of knowledge representation provides one of the main issues in artificial intelligence (e.g. Russell and Norvig (2002); Jackson (1985); Sowa (2001)). Several discrete structures, including graphs and trees, have been considered for the representation of knowledge. Of particular interest are semantic networks, which code each concept as a node, with several relationships between such elements (e.g. proximity, precedence, relative position, etc.) encoded as edge labels. However, such structures are mainly considered as a reference for inferences during pattern analysis, not as a substrate over which to perform walks or explorations. The possibility to connect nodes through logical expressions associated to nodes has provided one of the main features of random Boolean networks Gershenson (2004); Iguchi et al. (2005). These expressions have been used mainly to combine local states of the nodes, not to control random walks. The possibility to associate control on the flow between nodes in graphs has been adopted in Petri nets (e.g. Reisig (2001)), which have been used mainly for simulating computing and logical circuits. The subject of random walks itself corresponds to a well-developed and important area in statistical mechanics (e.g. ben Avraham and Havlin (2000)). The analysis of random walks in scale-free networks has been addressed by Tadic (2001, 2003) regarding a special type of network aimed at simulating the Web, and by Bollt and ben Avraham (2005) and Noh and Rieger (2004) considering recursive and hierarchical scale-free networks, the latter being concerned with the deterministic type of hierarchical network proposed in Ravasz et al. (2002). Random hierarchical networks similar to those considered in the present work have been introduced in da F. Costa (2004); da F. Costa and Diambra (2005).

III Hypotheses

Representability of Knowledge as a Network: The basic assumption underlying the present work is that knowledge can be represented as a complex network. First, it is understood that knowledge can be partitioned into chunks which are henceforth represented as network nodes, while relations between such subsets are represented as edges. Two types of edge transitions are considered in this work: free and conditional. In the first case, one is allowed to move freely from a node to the neighboring node, and to come back. The latter type of transition requires the moving agent to have first visited a set of nodes which represent the condition for the movement. The process of learning can then be modeled in terms of the number of nodes (or edges) visited during walks proceeding along the respective knowledge network.

Figure 1(a) illustrates a free transition between two subsets and of knowledge, while the example in (b) expresses the simplest conditional case where the moving agent can go from to , but not from to , unless it has already made at least one move from to . It is also possible to have hybrid situations such as those depicted in Figure 1(c), where can be reached from or , but only can be reached from . In order to allow the representation of multiple conditions (i.e. the fact that can only be reached after visiting , being a positive integer) we introduce the concept of token controlled network. This multiple conditional case is illustrated in Figure 1(d). Here, the subset of knowledge can only be reached after visiting and (in any order). In other words, it is as if the agent would be collecting a token from each of the required nodes, which it keeps henceforth as keys allowing them to proceed through the respective conditional nodes. In the present work, it is assumed that all conditional nodes, i.e. those having tokens required for movement to a node , are connected to node through directed edges. It is also possible to have alternative multiple conditions, as illustrated in Figure 1(e), where the labels associated to the edges identify the respective conditional structures. In this case, can be reached if and only if both and were visited before; or after visiting and . The case in which a node can be accessed after visiting or is represented by two undirected edges, without associated labels, from those nodes to . In brief, the free edges are represented by undirected edges, and the conditional by directed arrows with associated identifying labels. Alternative multiple conditions are not considered in the present work in order to limit the complexity and number of parameters in the experimental and analytical characterization of the dynamics of knowledge acquisition.

Figure 1: Types of relationships between knowledge subsets (i.e. nodes): equivalence (a); implication (b); hybrid relationship involving equivalence and implication (c); multiple implication (d) and alternative implications (e). 

Regarding the movement of agents along such networks as they integrate the knowledge available from the nodes, it is natural that a free transition can be tracked in any direction. However, a conditional edge from and to is considered to be direction restrictive only until is reached for the first time (after visiting and ), becoming a free edge henceforth. This type of dynamics is implemented in order to express the fact that once knowledge about , and is achieved, i.e. the conditional transition is mastered, it becomes possible to reach any of the conditions from node .

Hierarchical Knowledge Networks: The indiscriminate incorporation of the multiple conditions into a network can easily lead to deadlocks such as that illustrated in Figure 2. This deadlock is a direct consequence of the fact that there is no path connecting the two required nodes in the network represented in this figure. We henceforth assume that the knowledge network is consistent, in the sense that all nodes should be reachable. In addition, the several prerequisites between the portions of knowledge assigned to nodes naturally define a hierarchy along the networks. For instance, some knowledge at node may require previous visits to nodes and , which we shall represent as . The access to and may demand previous visits to nodes and and and , respectively, represented by the composition of prerequisites , implying two hierarchies. Therefore, a network of consistent knowledge can be generally and naturally organized as a hierarchy of layers, with the first layer corresponding to all nodes which have no prerequisites, while the remainder of the nodes are partitioned into layers by the composed prerequisites. Each layer contains a connected subnetwork (i.e. any node in the subnetwork can be reached through at least one path from any other node of that subnetwork) which is interconnected, via conditional edges, to nodes in the next layer. It is important to note that, although related to previous works such as da F. Costa (2004); da F. Costa and Diambra (2005), the hierarchical organization for knowledge representation is self-contained and follows naturally from knowledge consistency and the composition of prerequisites between nodes. The total number of layers is henceforth expressed as , the number of nodes in layer as , while the number of nodes in the whole hierarchical network is denoted as .
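The way in which composed prerequisites induce layers can be illustrated with a minimal sketch. It assumes a hypothetical input format (a dictionary mapping each node to the set of nodes required before it) and hypothetical node labels; the function name `assign_layers` is likewise illustrative and not part of the original model. Nodes without prerequisites receive layer 1, and every other node is placed one layer above its deepest prerequisite. Cyclic prerequisites, which would leave nodes unreachable, are reported as an error.

```python
def assign_layers(prereqs):
    """Map each node to a layer index (1 = no prerequisites).

    `prereqs` is an assumed input format: a dict mapping every node
    to the set of nodes that must be visited before it.
    """
    layers = {}
    visiting = set()  # nodes on the current recursion path (cycle detection)

    def layer_of(node):
        if node in layers:
            return layers[node]
        if node in visiting:
            raise ValueError(f"cyclic prerequisites involving {node!r}")
        visiting.add(node)
        required = prereqs.get(node, set())
        # One layer above the deepest prerequisite; layer 1 if there is none.
        layers[node] = 1 + max((layer_of(p) for p in required), default=0)
        visiting.discard(node)
        return layers[node]

    for node in prereqs:
        layer_of(node)
    return layers


# Hypothetical example: "a" requires "b" and "c", which in turn
# require "d", "e" and "f", "g", respectively.
example = {
    "a": {"b", "c"},
    "b": {"d", "e"},
    "c": {"f", "g"},
    "d": set(), "e": set(), "f": set(), "g": set(),
}
layers = assign_layers(example)
```

In this example the two compositions of prerequisites place the nodes in three layers, in line with the two hierarchies mentioned in the text.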

Figure 2: The indiscriminate use of multiple conditional implications quickly leads to deadlocks such as that illustrated here. The subset of knowledge in node can never be reached by an agent starting at or , as there is no connection between these two nodes. 

Figure 3 illustrates a simple hierarchical knowledge network containing three layers. For simplicity’s sake, hybrid relationships or alternative implications are not considered henceforth. In addition, all network layers are assumed to be of the same type (e.g. random or Barabási-Albert — BA) and have the same number of nodes and average node degree. The nodes at the highest hierarchy are called assumptions, and are the place where all the walks start. Note that the highest hierarchical levels are found at the lowest portion of Figure 3.

Figure 3: An example of a simple hierarchical network. 

The set of interconnecting networks is also of uniform type and has the same number of nodes and edges. In the current work, these subnetworks can be of random (i.e. Erdős-Rényi) or BA types, defining how the subnetwork in each layer connects to the nodes in the next layer. Figure 4 illustrates how such interconnections are henceforth understood. The layers (Figure 4(a)), and (Figure 4(c)) are to be connected through the interconnection subnetwork in Figure 4(b). Each edge in the interconnection layer implies that node in layer is connected to node in layer and that node in layer is connected to node in layer . Note that although a more flexible interconnecting scheme could be achieved by using directed interconnecting networks, the present study considers all layer and interconnecting networks to be undirected because such a structure favors the analytical model developed in Section VII without loss of generality except for the respectively implied doubled average node degree. The connections implemented by the three subnetworks in Figure 4(a-c) are illustrated in Figure 4(d).
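The assembly of layers and interconnection subnetworks into one hierarchical network can be sketched as follows. This is only a sketch under an assumed reading of the rule above: an undirected interconnection edge between local nodes i and j links node i of one layer to node j of the next layer, and node j of the former to node i of the latter. The function name `build_hierarchy`, the edge-list inputs and the global numbering `h * n + i` are illustrative assumptions.

```python
def build_hierarchy(layer_edges, inter_edges, n):
    """Assemble a hierarchical network from per-layer and interconnection edges.

    layer_edges[h] : undirected edges (i, j) inside layer h, 0 <= i, j < n
    inter_edges[h] : undirected edges (i, j) of the interconnection
                     subnetwork between layer h and layer h + 1
    The global id of local node i in layer h is h * n + i (assumed convention).
    Returns (free, conditional): a set of undirected free edges and a set of
    directed conditional edges (lower-layer node, upper-layer node).
    """
    free, conditional = set(), set()
    for h, edges in enumerate(layer_edges):
        for i, j in edges:
            a, b = h * n + i, h * n + j
            free.add((min(a, b), max(a, b)))
    for h, edges in enumerate(inter_edges):
        for i, j in edges:
            # Assumed reading of the rule: edge (i, j) links node i below to
            # node j above, and node j below to node i above.
            conditional.add((h * n + i, (h + 1) * n + j))
            conditional.add((h * n + j, (h + 1) * n + i))
    return free, conditional


free, cond = build_hierarchy(
    layer_edges=[[(0, 1), (1, 2)], [(0, 2), (1, 2)]],
    inter_edges=[[(0, 1)]],
    n=3,
)
```

Each undirected interconnection edge thus yields two conditional edges, which is consistent with the doubled average node degree mentioned above.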



Figure 4: Two layers (a,c) and one interconnecting (b) simple subnetworks and the section of the hierarchical network respectively implemented (d). 

IV Computational Implementation

Knowledge networks involving the free and conditional edges described above can be conveniently represented in terms of an extended adjacency matrix (the term weight matrix has been deliberately avoided here because the values (labels) in the matrix are more related to the adjacency between nodes than to weights), henceforth represented as . Each node is labeled by consecutive integer values . The equivalence between two nodes and is indicated by making and . The single conditional connection from node to is represented as and . Note that such an assignment implements the adopted strategy that an implication edge can be backtraced unconditionally. The multiple conditional transition from to is represented as and . Figure 5 illustrates an extended adjacency matrix considering BA models for layer and interconnecting networks.

Figure 5: Example of extended adjacency matrix considering BA layers and random interconnections. The conditional connections are represented in white and the equivalence edges in gray. 

The moving agent keeps at all times a vector of visited nodes and an individual adjacency matrix, which are continuously updated after each movement. The agent is assumed to know all feasible connections emanating from the current node, while the feasibility of a given edge is decided by taking into account the agent's list of visited nodes. More specifically, an edge is feasible, and accessible to the agent, if the agent has already visited the required nodes and collected the respective tokens.
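The feasibility test described above can be sketched as follows. The data layout is an assumption made for illustration only: free edges are stored as neighbour sets, and each conditional node is stored together with the set of nodes whose tokens it requires. The function name `feasible_targets` is hypothetical.

```python
def feasible_targets(current, free_adj, prerequisites, visited):
    """Return the nodes the agent may move to from `current`.

    free_adj[u]      : set of neighbours reachable through free edges
    prerequisites[v] : set of nodes whose tokens are required to enter v
                       through a conditional edge (assumed layout)
    visited          : set of nodes already visited (tokens collected)
    """
    targets = set(free_adj.get(current, set()))
    for v, required in prerequisites.items():
        # A conditional edge current -> v exists only if current is one of
        # the required nodes; it is enabled once all tokens are held.
        if current in required and required <= visited:
            targets.add(v)
    return targets


free_adj = {0: {1}, 1: {0}}
prerequisites = {2: {0, 1}}                      # node 2 requires nodes 0 and 1
before = feasible_targets(0, free_adj, prerequisites, visited={0})
after = feasible_targets(1, free_adj, prerequisites, visited={0, 1})
```

The sketch does not model the conversion of a conditional edge into a free edge once its target has been reached; in a full simulation the agent's individual adjacency structure would be updated at that point.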

The movement strategies described in the two following subsections have been considered in the reported simulations.

IV.1 Random choice of edges

In this case, the next edge to be taken from the current node is drawn with the same probability among all the feasible connections between that node and all other nodes. By feasible connection it is meant either a free edge or a conditional edge for which all conditions have already been met.
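A minimal sketch of this strategy is given below. It assumes that the feasible neighbours of each node are already available as lists, so that conditional edges are not modelled, and it records the fraction of visited nodes after each movement. The function name, the seed argument and the ring-shaped test network are illustrative assumptions.

```python
import random


def random_walk_coverage(adj, start, steps, seed=0):
    """Fraction of visited nodes after each step of a uniform random walk.

    adj maps each node to a list of feasible neighbours; every listed edge
    is treated as feasible, so conditional edges are not modelled here.
    """
    rng = random.Random(seed)
    current, visited, coverage = start, {start}, []
    for _ in range(steps):
        current = rng.choice(adj[current])  # uniform over feasible edges
        visited.add(current)
        coverage.append(len(visited) / len(adj))
    return coverage


# Hypothetical test network: a ring of 6 nodes.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
curve = random_walk_coverage(ring, start=0, steps=50)
```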

IV.2 Preferential choice of edges

Unlike the previous case, the free edges which have not yet been tracked are considered first, with uniform probability. In case no such edges exist, the next edge is drawn uniformly among the other allowed movements, i.e. free links which have already been tracked and enabled conditional links which remain untracked. The exclusion of the untracked conditional links, even if enabled, from the preferential movements is considered in order to express the fact that such a kind of knowledge enlargement is more demanding than exploring first the untracked unconditional connections.
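One movement under this strategy can be sketched as below. The sketch assumes that free edges and enabled conditional edges are supplied as separate neighbour lists, and that the caller moves a conditional edge into the free list once it has been traversed; these conventions and the function name `preferential_step` are assumptions made for illustration.

```python
import random


def preferential_step(current, free_adj, enabled_cond, tracked, rng=random):
    """Perform one move under the preferential strategy and return the new node.

    free_adj[u]     : list of neighbours through free edges
    enabled_cond[u] : list of neighbours through conditional edges whose
                      conditions have already been met
    tracked         : set of frozenset({u, v}) edges already traversed;
                      updated in place
    """
    untracked_free = [v for v in free_adj.get(current, [])
                      if frozenset((current, v)) not in tracked]
    if untracked_free:
        nxt = rng.choice(untracked_free)  # first priority: new free edges
    else:
        # Fallback: tracked free edges and untracked enabled conditional edges.
        tracked_free = list(free_adj.get(current, []))
        untracked_cond = [v for v in enabled_cond.get(current, [])
                          if frozenset((current, v)) not in tracked]
        options = tracked_free + untracked_cond
        if not options:
            return current  # no allowed move: stay in place (an assumption)
        nxt = rng.choice(options)
    tracked.add(frozenset((current, nxt)))
    return nxt


tracked = set()
first = preferential_step(0, {0: [1]}, {0: [3]}, tracked)
```

In the usage line the only untracked free edge leads to node 1, so the enabled conditional edge to node 3 is not considered on this first move.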

Note that in neither case does the agent use its knowledge about the status (i.e. already visited or not) of the node to which the emanating edges lead. Although more sophisticated moving strategies which make full use of the information in the partial graph stored with the agent can be devised, including the choice of shortest paths to unvisited edges, they are not pursued further in the present work.

The two strategies above aim to represent, though very naïvely and incompletely, two possible ways to acquire knowledge. In the first case, no distinction is made between a new and an already taken relation. It is as if the researcher (i.e. the agent walking through the network) were equally interested in revising a relationship or seeking new possible connections. In the second visiting scheme, the researcher is more actively interested in exploring new relationships, resorting to already tracked connections or enabled conditional links only in case no untracked free links are available. Intuitively, the second strategy would seem to be more effective in finding new knowledge, by covering the edges more effectively.

V Simulations

For simplicity’s sake, all simulations reported in this work are restricted to hierarchical networks with five network layers, with all layer and interconnecting subnetworks having $N = 20$ nodes, implying a total of 100 nodes for all layer subnetworks (larger networks involve much longer execution times). Random and Barabási-Albert models are considered for layers and interconnections. The latter are defined by the number $m$ of edges of each newly added node, starting with an initial set of nodes. For each such BA network, an ‘equivalent’ random network, in the sense of having the same average degree and number of edges, is obtained. Since the average degree of a BA network with $m$ edges per node is known to be

$$\langle k \rangle = 2m, \tag{1}$$

the Poisson rate $\gamma$ of the equivalent random (Erdős-Rényi) network with the same number of nodes and same average degree can be verified to be given as

$$\gamma = \frac{2m}{N-1}. \tag{2}$$

The above result follows from the fact that in an Erdős-Rényi network we have $\langle k \rangle = (N-1)\gamma$. The number of edges $E$ in any of the BA or random networks can be calculated as

$$E = \frac{N \langle k \rangle}{2} = N m. \tag{3}$$

Therefore, in this work the values of $m$ are used to define the connectivity of the BA models and then of the respective random counterparts.

Three configurations have been chosen for the BA layer models: and , while eight configurations are considered for the interconnecting networks: . Table 1 shows the expected values of number of edges , average degree , and Poisson rate for values of ranging from 1 to 10.

$m$     $E$     $\langle k \rangle$     $\gamma$
1       20      2       0.105
2       40      4       0.210
3       60      6       0.316
4       80      8       0.421
5       100     10      0.526
6       120     12      0.631
7       140     14      0.737
8       160     16      0.842
9       180     18      0.947
10      200     20      1.05
Table 1: Values of $m$ for the BA models and the respective total number of edges $E$, average degree $\langle k \rangle$ and equivalent Poisson rate $\gamma$ expected for each subnetwork (layer or interconnecting) with $N = 20$ nodes.
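The entries of Table 1 can be recomputed with a short sketch. It assumes the relations implied by the table itself, namely 20 nodes per subnetwork, a number of edges equal to the number of nodes times the BA parameter, an average degree of twice that parameter, and a Poisson rate equal to the average degree divided by the number of nodes minus one. The function name `ba_equivalents` is hypothetical.

```python
def ba_equivalents(m, n=20):
    """Edges, average degree and equivalent Erdős-Rényi rate for BA parameter m.

    Assumes E = n * m and <k> = 2 * m, the approximations implied by Table 1,
    and the Erdős-Rényi relation <k> = (n - 1) * rate.
    """
    edges = n * m
    avg_degree = 2 * m
    rate = avg_degree / (n - 1)
    return edges, avg_degree, rate


for m in range(1, 11):
    e, k, g = ba_equivalents(m)
    print(f"{m:2d} {e:4d} {k:3d} {g:.3f}")
```

The last printed digit may differ from the tabulated value in a few rows, depending on whether the rate is rounded or truncated.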

The following configurations were addressed in the reported simulations:

(i) all layers and interconnecting subnetworks are BA;
(ii) all layers subnetworks are random and all interconnecting networks are BA;
(iii) all layers subnetworks are BA and all interconnecting subnetworks are random;
(iv) all layers and interconnecting subnetworks are random.

Each of the above configurations was investigated while considering two visiting strategies: (a) allowed edges are chosen randomly; and (b) if available, untracked allowed edges are selected randomly, otherwise allowed tracked edges are selected randomly. In order to assess the effect of the conditional edges between successive layers, counterparts of each considered configuration interconnected by unconditional networks have also been simulated and had their performance quantified. Although several alternative or complementary performance indices could have been considered, for simplicity’s sake our attention is restricted to the percentage of visited nodes and percentage of visited edges at each time instant. The speed of knowledge acquisition can be estimated by taking the time derivative of this quantity. A total of 100 realizations, each spanning a sequence of time steps (one per movement along the walk), have been performed.
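The averaging over realizations and the estimation of the acquisition speed can be sketched as follows. The forward difference with a unit time step is an assumed discretization of the time derivative, and the two short curves are made-up illustrative values, not simulation results.

```python
def mean_curve(curves):
    """Pointwise average of several equally long coverage curves."""
    return [sum(values) / len(values) for values in zip(*curves)]


def speed(curve):
    """Forward-difference estimate of the time derivative (unit time step)."""
    return [b - a for a, b in zip(curve, curve[1:])]


# Two hypothetical realizations of the fraction of visited nodes.
runs = [[0.2, 0.4, 0.4, 0.6], [0.2, 0.2, 0.4, 0.8]]
avg = mean_curve(runs)
rate = speed(avg)
```

A stretch of steps over which `rate` stays at zero corresponds to a flat portion of the learning curve.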

Figure 6 shows the learning curves obtained for considering layer networks defined by and five interconnecting layers with for the several combinations of types of networks, presence of conditional connections between layers, and consideration of the random choice of movement. The title of each graph is henceforth organized as , where and indicates the model assumed for the layer and interconnecting networks, identifies the moving agent strategy (random or preferential for new edges), and indicates the type of interconnecting edges (conditional or free/unconditional).

Analogous results obtained for random movements / unconditional connections, preferential movements / conditional connections and preferential movements / unconditional connections are given in Figures 7, 8 and 9, respectively. In all the remaining figures in this article, the legend bar indicates the density of the interconnections. The values in these legends correspond to the parameter adopted for the BA model, therefore defining the density of interconnections for this model and also for the equivalent random counterpart (see Equation 2).

Figure 6: The percentage of visited nodes in terms of the time for the configuration involving random choice of edges and conditional interconnections. The legend indicates the density of interconnections in terms of the BA parameter (the number of edges of each newly added node).
Figure 7: The percentage of visited nodes in terms of the time for the configuration involving random choice of edges and unconditional interconnections. 
Figure 8: The percentage of visited nodes in terms of the time for the configuration involving preferential choice of edges and conditional interconnections. 
Figure 9: The percentage of visited nodes in terms of the time for the configuration involving preferential choice of edges and unconditional interconnections. 

Models considering different connectivities for the layer networks, namely and , have also been simulated and investigated. The percentage of visited nodes obtained for the preferential choice / conditional interconnections situation is shown in Figures 10 and 11, respectively.

Figure 10: The percentage of visited nodes in terms of the time for the configuration involving preferential choice of edges and conditional interconnections. All layer networks consider  
Figure 11: The percentage of visited nodes in terms of the time for the configuration involving preferential choice of edges and conditional interconnections. All layer networks consider  

The percentages of visited edges at each time instant are given in Figures 12 (random movements, conditional interconnections),  13 (random movements, unconditional interconnections),  14 (preferential movements, conditional interconnections) and  15 (preferential movements, unconditional interconnections).

Figure 12: The percentage of visited edges in terms of the time for the configuration involving random choice of edges and conditional interconnections. 
Figure 13: The percentage of visited edges in terms of the time for the configuration involving random choice of edges and unconditional interconnections. 
Figure 14: The percentage of visited edges in terms of the time for the configuration involving preferential choice of edges and conditional interconnections. 
Figure 15: The percentage of visited edges in terms of the time for the configuration involving preferential choice of edges and unconditional interconnections. 

VI Discussion

The results presented in the previous section are discussed in the following with respect to the two main performance situations considered in this work: number of visited nodes and edges.

VI.1 Knowledge in terms of Visited Nodes

Effects of conditional interconnections Compared to unconditional interconnections, conditional interconnections tend to substantially reduce the knowledge acquisition speed. This was indeed expected, because the conditional interconnections force the moving agent to wander longer in the previous layers in order to collect the tokens necessary to proceed into new layers.

Effects of the network models As can be easily inferred by comparing the left and right columns of Figures 6 to 9, interconnections through BA subnetworks have about the same effect as random networks on the knowledge acquisition in all cases. This is mainly a consequence of the imposed similar connectivity of the BA and random counterpart models used for the interconnecting subnetworks. Similarly, the use of BA or random models for the layer networks also led to minimal effect on the knowledge acquisition dynamics. In brief, the type of network model, BA or random, had little effect on the overall knowledge acquisition efficiency.

Effects of the density of interconnections Denser interconnecting subnetworks tend to decrease the knowledge acquisition speed in the case of conditional interconnections, having little effect for unconditional interconnections (i.e. the learning curves are nearly identical whatever the interconnecting density in Figures 7 and 9). Such a behavior arises because a larger number of conditional interconnections requires the moving agent to collect more tokens in the previous layers before proceeding to further layers.

Presence of Plateaux The preferential movement strategy, when combined with conditional interconnections, produces a series of plateaux of knowledge acquisition along the learning curves. In the learning curves of Figure 8, each plateau is preceded by a quick acquisition stage, and the plateau width tends to become larger as time goes by. These plateaux indicate a phase of knowledge stagnation, corresponding to the state of dynamics of the system where the walks proceed predominantly over edges in the previous layers, while the conditional links leading to the subsequent layers are not yet feasible. In this respect, it is possible to draw a naïve analogy with a particle moving along a series of chambers limited by successive compartments which are progressively removed. Congruently, the plateaux tend to become larger along time because the walks have each time more alternatives of random movement among the feasible edges. This possibility is corroborated by the fact that the plateaux become more discernible for large interconnectivities (i.e. large values of the BA parameter adopted for the interconnections in BA and random counterparts), which imply more edges between subsequent layers. The (possibly counterintuitive) tendency of the preferential movements to reduce the knowledge acquisition rate when compared to the random strategy can be explained by the fact that in the preferential case the agent is forced to waste time going through untracked edges in both layer and interconnecting networks even in cases where most adjacent nodes have already been visited.

An important issue related to the characterization of such plateaux is whether the several individual trajectories obtained during simulation are well represented in terms of their respective average values. Figure 16(a) shows 100 such different trajectories obtained for 5 layers with and , interconnected through networks with . The average and standard deviations of these trajectories are shown in Figure 16(b). It is clear from these figures that the trajectories do present small dispersion, being therefore properly represented in terms of the respective average. Note also the first short plateau with height 0.2, which was not visible in the previous figures because of the smaller resolution of those pictures.

Figure 16: Visualization of 100 different trajectories obtained for 5 layers with and , interconnected through networks with (a), and respective average and standard deviations (b).  
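The average and dispersion of a set of trajectories, as summarized in Figure 16(b), can be computed along the following lines. The use of the population standard deviation and the two short trajectories are assumptions made for illustration; the source does not state which estimator was used.

```python
from statistics import mean, pstdev


def summarize(trajectories):
    """Pointwise mean and population standard deviation across trajectories."""
    columns = list(zip(*trajectories))
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


# Two hypothetical trajectories of the fraction of visited nodes.
means, stds = summarize([[0.2, 0.4], [0.2, 0.6]])
```

Small values in `stds` relative to `means` indicate that the average curve is a fair representative of the individual trajectories.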

Additional insight about the evolution of the ratio of visited nodes in the presence of conditional edges can be obtained by considering the number of the layer visited by the agent along the random walk steps. Such a curve is illustrated in Figure 17 for specific realizations considering random (a) and preferential (b) agent movements. It is clear from this figure that the preferential random walk leads the agent to explore most of the nodes in the current layer, while seeking free edges, before proceeding to explore the subsequent layers, therefore implying the formation of plateaux.

Figure 17: The number of the layer occupied by the moving agent in terms of the random walk steps for specific realizations considering random (a) and preferential (b) movements. 

Layer Networks with other Connectivities In order to investigate the effect of the connectivity of the layer networks on the overall knowledge acquisition dynamics, the above simulations were performed also for and (all other situations discussed in this subsection refer to layer networks with ). It is clear from Figures 10 and 11 that the larger number of edges in each layer implied by tends to substantially slow down the node coverage and to yield more marked plateaux.

The overall fastest knowledge acquisition was observed for the cases involving free transitions, with some speed up verified for the preferential movement strategy (i.e. Figure 9).

VI.2 Knowledge in terms of Visited Edges

The dynamics of knowledge acquisition can also be quantified in terms of the percentage of visited edges, which provides additional insights about the considered models and strategies. These results are shown in Figures 12 to 15. The curves obtained for random movements (i.e. Figures 12 and 13) are quite similar, indicating that the presence of conditional edges has little effect on the edge coverage under the random movement strategy. The results obtained for preferential movements and conditional transitions (i.e. Figure 14) indicate that the edges are covered less effectively than in the two previous cases, especially for denser interconnections. The fastest coverage of edges was clearly obtained for preferential movements with free transitions (i.e. Figure 15), which is a direct consequence of the preference for new edges imposed by that strategy. Such a fast edge coverage is also accompanied by the fastest node coverage in Figure 9. Also interesting is the fact that, though the edge coverage obtained for random movements (i.e. Figures 12 and 13) turned out quite similar, the node coverage was verified to be much faster in the former situation than in the latter. Actually, the case involving random movements and conditional transitions is characterized by the fact that most nodes are covered after approximately 1500 basic time steps (see Figure 6) even though only a fraction of the respective edges have been covered at that time, as indicated by Figure 12.

In order to better understand the preferential dynamics, let us first consider its initial stages, where the agent starts its exploration of the first and second layers. Because preference is given to free untracked edges, the agent tends to remain in layer 1 until most of its free edges are tracked. At this point, not only do few free edges remain untracked, but most conditional links have also been enabled. Therefore, in the presence of few untracked free edges, the agent more frequently considers movements through tracked edges or enabled conditional edges, the latter leading to higher hierarchies. At subsequent stages, when the agent is exploring a higher hierarchy, it will tend to go through the free untracked edges, which now include the conditional edges leading back to previous layers. It proceeds into higher layers only when the edges within and between the previous layers have been mostly covered. At that point few preferential movements are allowed, and the agent more frequently considers going through already tracked links on the current or previous layers, or through enabled conditional links leading to higher layers.
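The preferential rule described above can be sketched as a single movement step. This is a hedged illustration rather than the simulation code used for the reported results: the data layout (`out_edges` mapping each node to a list of `(target, prerequisites)` pairs, with an empty prerequisite set marking a free edge), the tracking of edges as directed pairs, and the uniform choice within each group are all assumptions introduced here.

```python
import random

FREE = frozenset()  # an edge with no prerequisites is a free edge


def preferential_step(node, out_edges, visited, tracked, rng):
    """One movement preferential to untracked free edges (illustrative rule).

    1. If untracked free edges leave `node`, choose one of them uniformly.
    2. Otherwise choose uniformly among free edges and those conditional
       edges whose prerequisites have all been visited (enabled edges).
    Assumption: every node has at least one free outgoing edge.
    """
    free_new = [v for v, pre in out_edges[node]
                if not pre and (node, v) not in tracked]
    if free_new:
        nxt = rng.choice(free_new)
    else:
        allowed = [v for v, pre in out_edges[node] if pre <= visited]
        nxt = rng.choice(allowed)
    tracked.add((node, nxt))
    visited.add(nxt)
    return nxt


# Hypothetical toy network: layer 1 is the triangle a-b-c; node d (layer 2)
# is reached from c through a conditional edge requiring a and b.
out_edges = {
    "a": [("b", FREE), ("c", FREE)],
    "b": [("a", FREE), ("c", FREE)],
    "c": [("a", FREE), ("b", FREE), ("d", frozenset({"a", "b"}))],
    "d": [("c", FREE)],
}

rng = random.Random(1)
visited, tracked = {"a"}, set()
order = ["a"]  # nodes in order of first visit
node = "a"
for _ in range(200):
    node = preferential_step(node, out_edges, visited, tracked, rng)
    if node not in order:
        order.append(node)
```

In this toy setting the agent cannot enter `d` before both `a` and `b` have been visited, and it only considers the conditional edge once the free edges leaving `c` have been tracked, which mirrors the tendency to exhaust the current layer first.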

The lack of plateaux in traditional random walks in the presence of conditional links can now be more easily understood: treating free and enabled conditional edges with the same priority, irrespective of whether they have already been tracked, allows more frequent exploration of the enabled conditional edges leading to higher hierarchies. Note that even at the earliest stages of the exploration depicted in Figure LABEL:layers(a), the agent manages to get as far as the last layer. In this way, plateaux of stagnation are completely avoided. However, it remains an interesting fact that the barriers of conditional links are overcome with relative ease by the moving agent.

Vii Analytical Model

The several interesting dynamical features so far identified through numerical simulations are investigated further, especially regarding their behavior under scaling of the network sizes and number of layers, in the present section through a simplified analytical model. Though we limit this investigation to preferential random walks in the networks, more specifically the case leading to plateaux, the other situations considered in this article can be treated similarly.

We start by considering the fact that, at step of a random walk preferential to untracked edges, the ratio of visited nodes of a random or BA network with nodes and free links has been verified, through extensive simulations, to be approximated (at least for ) as


We shall make a small modification of the way in which the conditional edges are considered so as to simplify the analytical characterization of the knowledge acquisition dynamics. More specifically, starting at layer , movements to the subsequent layer will only be allowed after a ratio of visited nodes in has been achieved. The difference between such an assumption and the situations so far considered in the present article is that in the latter the moving agent is allowed to explore subsequent layers at any time, provided it holds the respective prerequisites. However, the above simplification holds particularly well when the connectivity between subsequent layers is relatively large with respect to the connectivity between the nodes in each layer, because in such a situation movements in the subsequent layer will be mostly blocked by the prerequisites and by the preference for free untracked edges in the current and previous layers.

Let be the average degree at each layer and be the average degree of the interconnecting layer. At the beginning of the random walk, the exploration is limited to layer , so that is reached at a critical step so that which can be calculated through Equation 4.
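The critical step mentioned above is the first step at which the coverage curve reaches the required ratio. As a rough numerical sketch, it can be found by bisection for any non-decreasing coverage function. The helper name `critical_step` is hypothetical, and the saturating exponential used in the example is a stand-in chosen purely for illustration; it is not Equation 4 of this article, and its time constant of 500 steps is arbitrary.

```python
import math


def critical_step(coverage, target, k_max=10**9):
    """Smallest integer step k with coverage(k) >= target.

    Assumes `coverage` is non-decreasing in k, so bisection applies.
    """
    if coverage(k_max) < target:
        raise ValueError("target ratio is not reached within k_max steps")
    lo, hi = 0, k_max
    while lo < hi:
        mid = (lo + hi) // 2
        if coverage(mid) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


def toy_coverage(k):
    """Hypothetical coverage curve (NOT the article's Equation 4)."""
    return 1.0 - math.exp(-k / 500.0)


k_c = critical_step(toy_coverage, target=0.9)
```

Any other monotone approximation of the ratio of visited nodes can be passed in place of `toy_coverage` without changing the search.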

Afterwards, all conditional links at layer are enabled, so that the exploration of that layer begins. However, as the interconnecting edges are bidirectional, the agent will now alternate between layers and until the ratio is achieved for layer . The respective occupancy of these two layers can be approximated in terms of the Markov chain shown in Figure 18(a).


Figure 18: The movement of the agent between subsequent hierarchical layers can be modeled in terms of Markovian models, as illustrated for two (a) and three (b) layers. Note that the progress to the subsequent layers is blocked until the critical ratio of visited nodes is achieved for the last enabled layer. 

The stochastic matrix associated with this Markovian system is immediately obtained as


where the factor 2 stands for the fact that one undirected edge in the interconnecting layer implies two conditional links between layer and (see Figure 4).
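The long-run occupancy of such a two-layer chain can be checked numerically by power iteration. The sketch below is an illustration under stated assumptions, not a reproduction of the matrix above: it assumes that an agent with `intra` free edges inside its layer and `inter` conditional links to the other layer chooses uniformly among them, uses a row-stochastic convention, and picks the numerical values arbitrarily (one undirected interconnecting edge counted as two conditional links).

```python
def stationary(P, iters=200):
    """Stationary distribution of a row-stochastic matrix by power iteration."""
    n = len(P)
    pi = [1.0] + [0.0] * (n - 1)  # start with all probability on layer 1
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical values: 4 free edges within a layer, and 1 undirected
# interconnecting edge counted as 2 conditional links between the layers.
intra, inter = 4.0, 2.0 * 1.0
stay = intra / (intra + inter)
move = inter / (intra + inter)

# Symmetric two-state chain: state 0 is the lower layer, state 1 the upper.
P = [[stay, move],
     [move, stay]]

pi = stationary(P)
```

Because the two states are treated symmetrically, the iteration converges to equal occupancy of both layers, which agrees with the remark that the agent soon spends the same proportion of time in each of them.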

Although the moving agent will soon be spending the same proportion of time at layers 1 or 2, it is the relative frequency of time steps at which the moving agent remains at or enters into layer that matters for the coverage of the nodes in this layer and respective acquisition of prerequisites. This frequency is immediately given as being equal to the relative number of movements through the two edges leading to node . It follows by symmetry that , implying the exploration of layer to be effectively performed at a renormalized given as


Note that is relative to each exploration stage, always starting when the critical ratio of visited nodes is achieved for the last current layer. After liberating layer for exploration, the Markov model becomes as shown in Figure 18(b), and the respective stochastic matrix now reads


Note that all subsequent stochastic matrices will share the right-hand lower block with the above matrix, from which a generic probabilistic model can be developed. At a generic hierarchical level , these four elements are given as follows


Because of the inherent symmetry of the transition probabilities in the matrix , the occupancy of each state can be calculated as


As the relative frequency with which each of these transitions is performed is immediately given as , we have that the relative frequency of movements into the last currently enabled layer is therefore given as


By substituting Equations 812 and 13 into Equation 14, it follows that


The evolution of the ratio of visited nodes can therefore be estimated by using Equations 4 (for ) and 16 (for ).


This implies the overall evolution to be composed of successive time-scaled versions of Equation 4, given by Equation 16 in terms of the value . Consequently, the length of each stage along the liberation of the layers will be given as


where , and are constants defined by and . It is now clear that this length scales proportionally to . Equation 16 also provides the means for analyzing the scaling of with . Because the coefficient of the exponential in , i.e. , corresponds to a product between and , the scaling of the subnetworks size from to will imply for all , i.e. the length of all stages will be equal to . In other words, the overall shape of the ratio of visited nodes will not change when is scaled while all other parameters are kept fixed.

Figure 19 illustrates the evolution of the ratio of visited nodes in a BA network as estimated by the above model assuming , , , , and five layers. Except for the value , these parameters correspond precisely to those considered in the evolution shown in Figure 16. The specific value of was chosen so as to obtain proper fitting between the experimental data and the theoretical model. A good overall adherence can be observed between the analytical and respective experimental evolutions regarding both the lengths and heights of each plateau. Interestingly, the analytical model also captures the fact that progressively smoother transitions are obtained at higher hierarchies. The main difference between these evolutions is related to the fact that, in the original experiment, the moving agent was allowed to proceed to subsequent layers more freely, i.e. before the critical ratio of visited nodes had been reached. Such dynamics would contribute to smoothing the left-hand side of the evolution curve at each transition, as is the case in the experimental results in Figure 16.

Figure 19: Analytical evolution of the ratio of visited nodes for a random network with , , , , and five layers. 

Viii Concluding Remarks

This article has presented a simple approach to knowledge acquisition based on representation of knowledge as a hierarchical complex network da F. Costa (2004) and the modeling of the process of knowledge acquisition in terms of walks along such networks. Though simple, the considered models incorporate the existence of two types of edges (free and conditional), including multiple conditional transitions where access to specific nodes is granted only after the agent has visited specific nodes. This movement strategy represents a possibly new mechanism for complex network and random walk research.

Two visiting strategies have been considered: at random and preferential to still untracked free edges. Simulations considering several densities of connectivity between 5 hierarchical layers have been evaluated with respect to conditional interconnecting networks and unconditional counterparts, and the knowledge acquisition dynamics quantified in terms of the number of visited nodes and edges as a function of time (i.e. each basic movement of the agent). A Markovian analytical model of the learning dynamics is developed for the case of preferential random walks in presence of conditional links, which reproduces the plateaux heights and lengths. This model has allowed the discussion of the scaling properties of the dynamics with respect to the network size and number of layers. Among other findings, the lengths of the plateaux have been verified to be proportional to the number of already explored layers.

Despite the simplicity of the approach, a series of interesting complex dynamics and effects have been identified from the learning curves, including the fact that the preferential movement strategy was slower than the random counterpart for the case of conditional interconnections, as well as the identification of plateaux of stagnation of learning for the preferential strategy in the presence of conditional interconnections.

The reported work paves the way for several lines of future work, including the consideration of multiple agents Acedo and Yuste (2002), which may or may not share information about their individual adjacency matrices. Another relevant issue to be incorporated into the model is the fact that the transitions from one node to another, i.e. the inference of some subset of knowledge from another, may not always take the same time. It would therefore be interesting to consider diverse distributions of time-weights along the hierarchical knowledge networks. The suggested approach and models also provide a framework for investigating data flow architectures (e.g. Silc et al. (1999)). This type of parallel computing architecture is characterized by a hierarchical processing flow constrained by dependences between intermediate computing stages, which could be conveniently modeled by the hierarchical complex networks with multiple conditional edges. It would also be interesting to consider additional measurements typically used in random walk investigations, such as return time and correlations.

Luciano da F. Costa is grateful to Dietrich Stauffer, Osvaldo Novais de Oliveira Jr. and Gonzalo Travieso for careful reading and commenting on this article, and to FAPESP (process 99/12765-2) and CNPq (308231/03-1) for financial sponsorship.