Network Unfolding Map by Edge Dynamics Modeling

03/03/2016 · Filipe Alves Neto Verri, et al. · Universidade de São Paulo

The emergence of collective dynamics in neural networks is a mechanism of the animal and human brain for information processing. In this paper, we develop a computational technique using distributed processing elements, which are called particles. We observe the collective dynamics of particles in a complex network for transductive inference on semi-supervised learning problems. Three actions govern the particles' dynamics: walking, absorption, and generation. Labeled vertices generate new particles that compete against rival particles for edge domination. Active particles randomly walk in the network until they are absorbed by either a rival vertex or an edge currently dominated by rival particles. The result of the model simulation consists of sets of edges sorted by label dominance. Each set tends to form a connected subnetwork that represents a data class. Although the intrinsic dynamics of the model is stochastic, we prove there exists a deterministic version with largely reduced computational complexity; specifically, with subquadratic growth. Furthermore, the edge domination process corresponds to an unfolding map. Intuitively, edges "stretch" and "shrink" according to the edge dynamics. Consequently, this effect summarizes the relevant relationships between vertices and the uncovered data classes. The proposed model captures important details of connectivity patterns over the edge dynamics evolution, which contrasts with previous approaches focused on vertex dynamics. Computer simulations reveal that our model can identify nonlinear features in both real and artificial data, including boundaries between distinct classes and the overlapping structure of data.


I Introduction

Semi-supervised learning (SSL) is one of the machine learning paradigms, lying between the unsupervised and supervised learning paradigms. In SSL problems, both unlabeled and labeled data are taken into account in class or cluster formation and prediction processes [1, 2]. In real-world applications, we usually have only partial knowledge of a given dataset. For example, we certainly do not know every movie actor except a few famous ones; in a large-scale social network, we know only some friends; in the biological domain, we are far from a complete picture of the functions of all genes, but we know the functions of some of them. Sometimes, although we have complete or almost complete knowledge of a dataset, labeling it by hand is lengthy and expensive, so it is necessary to restrict the labeling scope. For these reasons, partially labeled datasets are often encountered. In this sense, supervised and unsupervised learning can be considered extreme and special cases of semi-supervised learning. Many semi-supervised learning techniques have been developed, including generative models [3], discriminative models [4], clustering and labeling techniques [5], multi-training [6], low-density separation models [7], and graph-based methods [8, 9, 10]. Among the approaches listed above, graph-based SSL has attracted much attention. In this case, each data instance is represented by a vertex and is linked to other vertices according to a predefined affinity rule. The labels are propagated to the whole graph using a particular optimization heuristic [11].

Complex networks are large-scale graphs with nontrivial topology [12]. Such networks provide a powerful tool to describe the interplay of topology, structure, and dynamics of complex systems [12, 13]. Therefore, they offer a groundbreaking mechanism to help us understand the behavior of many real systems. Networks also turn out to be an important mechanism for data representation and analysis [14]. Interpreting datasets as complex networks grants us further access to the inter-relational nature of data items. For this reason, we consider the network-based approach for SSL in this work. However, the above-mentioned network-based approaches focus on optimizing the label propagation result and pay little attention to the detailed dynamics of the learning process itself. On the other hand, it is well known that collective neural dynamics generate rich information, and such redundant processing underlies the adaptability and robustness of the learning process. Moreover, traditional graph-based techniques have high computational complexity, usually of cubic order [15]. A common strategy to overcome this disadvantage is to use a set of sparse prototypes derived from the data [10]. However, such a sampling process usually loses information from the original data.

Taking into account the facts above, we study a new type of dynamical competitive learning mechanism in a complex network, called particle competition. Consider a network where several particles walk and compete to occupy as many vertices as possible while attempting to reject rival particles. Each particle performs a combined random and preferential walk by choosing a neighbor vertex to visit. Finally, it is expected that each particle occupies a subset of vertices, called a community of the network. In this way, community detection is a direct result of the particle competition. The particle competition model was originally proposed in [16] and extended to the data clustering task in [17]. Later, it was applied to semi-supervised learning [18, 19], where the particle competition is formally represented by a nonlinear stochastic dynamical system. All the models mentioned above concern vertex dynamics—how each vertex changes its state (the level of dominance of each particle). Intuitively, vertex dynamics is a coarse modeling of a network because each vertex can have several edges. A problem with the data analysis in this approach is the overlapping nature of the vertices, where a data item (a vertex in the networked form) can belong to more than one class. Therefore, it is interesting to know how each edge changes its state in the competition process to acquire detailed knowledge of the dynamical system.

In this paper, we propose a transductive semi-supervised learning model that employs a vertex–edge dynamical system in complex networks. In this dynamical system, namely the Labeled Component Unfolding system, particles compete for edges in a network. Subnetworks are generated with the edges grouped by class dominance. Here, we call each subnetwork an unfolding. The learning model employs the unfoldings to classify unlabeled data. The proposed model offers satisfactory performance on semi-supervised learning problems, on both artificial and real datasets. It has also been shown to be suitable for detecting overlapping regions of data points by simply counting the edges dominated by each class of particles. Moreover, it has a low order of computational complexity.

In comparison to the original particle competition models and other graph-based semi-supervised learning techniques, the proposed one presents the following salient features:

Particle competition dynamics occurs on nodes as well as on edges

The inclusion of the edge domination model gives us more detailed information to capture the connectivity patterns of the input data. This is because there are many more edges than vertices, even in a sparse network. Consequently, the proposed model has the benefit of providing essential information concerning overlapping vertices. Computer simulations show that the proposed technique achieves good classification accuracy and is suitable for situations with a small number of labeled samples.

In the proposed model, particles are continuously generated and removed from the system

Such a feature contrasts with previous particle competition models, which incorporate a preferential walking mechanism where particles tend to avoid rival particles. As a consequence, the number of active particles in the system varies over time. It is worth noting that the elimination of the preferential walking mechanism largely simplifies the dynamical rules of the particle competition model. Now, the new model is characterized by the competition of only random walking particles, which, in turn, permits us to find an equivalent deterministic version. The original particle competition model is intrinsically stochastic, so each run may generate a different result and the computational cost is high. In this work, we derive a deterministic system with running time independent of the number of particles, and we demonstrate that it is mathematically equivalent to the stochastic model. Moreover, the deterministic model has linear time order and ensures stable learning. In other words, the model generates the same output for each run with the same input. Furthermore, the system is simpler and easier to understand and implement. Thus, the proposed model is more efficient than the original particle competition model.

There is no explicit objective function

In classical graph-based semi-supervised learning techniques, an objective function is usually defined for optimization. Such a function considers not only the label information, but also the semi-supervised assumptions of smoothness, cluster, or manifold. In particle competition models, we do not need to define an objective function. Instead, dynamical rules which govern the time evolution of particles and vertices (or edges) are defined. Those dynamical rules mimic phenomena observed in some natural and social systems, such as resource competition among animals, territory exploration by humans (or animals), election campaigns, etc. In other words, the particle competition technique is typically nature-inspired. In this kind of technique, we focus on behavior modeling instead of objective modeling. Certain objectives can be achieved if the corresponding behavioral rules are properly defined. In this way, we may classify classical graph-based semi-supervised learning techniques as objective-based design and the particle competition technique as behavior-based design.

The remainder of this paper is organized as follows. The proposed particle competition system is studied in Section II. Our transductive semi-supervised learning model is presented in Section III. In Section IV, results of computer simulations are shown to assess the proposed model's performance on both artificial and real-world datasets. Finally, Section V concludes this paper.

II Labeled Component Unfolding System

In this section, we give an introduction to the Labeled Component Unfolding (LCU) system—a particle competition system for edge domination—explaining its basic design. Whenever pertinent, we go into detail for further clarification.

II-A Overview

We consider a complex network expressed by a simple, unweighted, undirected graph G = (V, E), where V is the set of vertices and E is the set of edges. If two vertices are considered similar, an edge connects them. The network contains vertices that can be either labeled or unlabeled data points. The set of labeled vertices comprises the vertices that carry a label, and we use the terms label and class synonymously—if a vertex is labeled with a class, we say this vertex belongs to that class. The remaining vertices form the set of unlabeled vertices; the two sets are disjoint and together comprise V. The network is represented by the adjacency matrix A, where A_{ij} = 1 if v_i is connected to v_j and A_{ij} = 0 otherwise. We denote e_{ij} as the edge between vertices v_i and v_j. For practical reasons, we consider a connected network with at least one labeled vertex of each class.

In this model, particles are objects that flow within the network while carrying a label. Labeled vertices are sources for particles of the same class and sinks for particles of other classes. After a particle is released, it randomly walks the network: every adjacent vertex has an equal probability of being chosen as the next vertex to be visited by the particle. Consider that a particle is in vertex v_i; it decides to move to a neighbor v_j with probability 1/k_i, with k_i denoting the degree of v_i.

In each step, at the moment a particle decides to move to a next vertex, it can be absorbed (removed from the system). If a particle is not absorbed, we say that it has survived; it remains active and continues walking. Otherwise, the particle is absorbed and ceases to affect the system. The absorption depends on the level of subordination and domination of a class against all other classes on the edges.

To determine the level of domination and subordination of each class in an edge, we take into account the active particles in the system. The current directed domination is the number of active particles belonging to class  that decided to move from to at time  and survived. Similarly, the current relative subordination is the fraction of active particles that do not belong to class  and have successfully passed through edge , regardless of direction, at time . Mathematically, we define the latter as

The survival of a particle depends on the current relative subordination of the edge and the destination vertex. If a particle decides to move into a sink, it will be absorbed with probability 1. If the destination vertex is not a sink, its survival probability is

where is a parameter for characterizing the competition level.
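To make the walking and absorption rules concrete, the following Python sketch simulates one step of a single particle. It is only illustrative: the neighbor lists, the sink set, the subordination table, and the survival expression 1 − λ·S (where S is the current relative subordination of the particle's class on the chosen edge) are assumptions standing in for the exact quantities defined above.

import random

def walk_step(v, neighbors, sinks, subordination, lam, label, rng=random):
    # One walking/absorption step for an active particle of class `label` at vertex v.
    # `neighbors[v]`: vertices adjacent to v; `sinks`: labeled vertices of rival classes;
    # `subordination[(v, u)][label]`: current relative subordination of the class on edge (v, u).
    # The survival rule 1 - lam * S is an illustrative assumption, not the paper's exact formula.
    u = rng.choice(neighbors[v])              # uniform choice among adjacent vertices
    if u in sinks:                            # moving into a rival source: absorbed with probability 1
        return None
    s = subordination[(v, u)][label]          # fraction of rival particles that crossed this edge
    if rng.random() < 1.0 - lam * s:          # particle survives and stays active
        return u
    return None                               # particle is absorbed and removed from the system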

A source generates particles according to its degree and the current number of active particles in the system. Let be the number of active particles belonging to class  in the system at time , a source generates new particles if .

Let be the set of sources for particles that belong to class , the number of newly generated particles belonging to class  in at time  follows the distribution

where

and

is a binomial distribution. In other words, if the number of active particles is fewer than the initial number of particles, each source performs trials with a certain probability of generating a new particle.

Therefore, the expected number of new particles belonging to class  in at time  is

We are interested in the total number of visits of particles of each class to each edge. Thus, we introduce the cumulative domination that is the total number of particles belonging to class  that passed through edge  up to time . Mathematically, this is defined as

(1)

Using the cumulative domination, we can group the edges by class domination. For each class , the subset is

We define the subnetwork

(2)

as the unfolding of network according to class  at time . We interpret the unfolding as a subspace with the most relevant relationships for a given class. We use the available information in these subnetworks for the study of overlapping regions and for semi-supervised learning.
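As an illustration of the unfolding step, the sketch below groups the edges by cumulative domination; the rule used here (an edge joins the unfolding of every class attaining the maximum cumulative domination on it) is an assumption about the subset definition above.

def unfold(cumulative, classes):
    # `cumulative[(i, j)][c]` is the cumulative domination of class c on edge (i, j).
    # Returns, for each class, the set of edges it dominates (ties are shared).
    unfoldings = {c: set() for c in classes}
    for edge, domination in cumulative.items():
        best = max(domination[c] for c in classes)
        for c in classes:
            if domination[c] == best:
                unfoldings[c].add(edge)
    return unfoldings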

II-B An Illustrative Example

Fig. 1: Illustration of one iteration of the system’s evolution. The network consists of 7 vertices and 10 edges; each color represents a label of a particle or a source. The first and the third networks depict the cumulative domination before and after the iteration. The cumulative domination is the number of visits of particles to an edge since the initial state. In the second network, particles are depicted in small circles. Active particles at time  are depicted in dashed borders, whereas active particles at time  are in full borders. An arrow indicates a particle movement, while an interrupted line indicates that the particle has been absorbed when trying to move through an edge. Particles without an adjacent arrow are generated by the sources at time .

One iteration of the system’s evolution is illustrated by Figure 1. The considered system contains 22 active particles at time  and 20 at time . In an iteration, each particle moves to a neighbor vertex, without preference. The movement of a particle is indicated by an arrow. An interrupted line indicates an edge in which the coming particle is absorbed. A total of 6 particles are absorbed during this iteration, and the sources have generated 4 new particles.

At time , for example, one of the red particles passing through edge  is absorbed due to a current edge dominance of 0.5 in that edge (one red particle and one green particle). Conversely, all green particles that moved through edge  remain active at time . Since there is no rival particle (red particle) passing through this edge, the updated value of the current edge dominance is 1 and 0 for green and red classes, respectively.

In edge , one red and two green particles chose to pass through. One green particle is absorbed without affecting the new current level of dominance. Since one particle of each class successfully passed through edge , the new current level of dominance on this edge is 0.5. The same occurs for edge  where no particles have passed through and, thus, the current level of dominance is set equally among all classes.

In edges  and , particles have tried to move into a source of rival particles (sinks). These particles are absorbed independently from the current level of dominance.

Our edge-centric system can measure the overlapping nature of the data by counting the edges dominated by each class, while a vertex-centric approach would lose such information.

II-C Mathematical Modeling

Formally, we define the Labeled Component Unfolding system as a dynamical system . The state of the system is

(3)

where

is a vector, and each element

is the number of active particles belonging to class  in at time . Furthermore, is a matrix whose elements are given by Equation 1.

Let and be, respectively, the number of particles generated and absorbed by at time . The evolution function  of the dynamical system is

Intuitively, the number of active particles that are in a vertex is the total number of particles arriving, , minus the number of particles leaving, , or being absorbed, ; additionally for labeled vertices, the number of generated particles, . Moreover, to calculate the total number of visits of particles to an edge, we simply add up the number at each time. Values , , and are obtained stochastically according to the dynamics of walking, absorption, and generation.

The initial state of the system is given by an arbitrary number of initial active particles and

To achieve the desirable network unfolding, it is necessary to average the results of several simulations of the system with a very large number of initial particles . Thus, the computational cost of such a simulation is very high. Conversely, we provide an alternative system  that achieves similar results in a deterministic manner. More details will follow.

II-D Alternative Mathematical Modeling

Consider the dynamical system

(4)

where is a row vector whose elements give the population of particles with label in each vertex at time t. These values are associated with the number of active particles of system . The elements and of the sparse matrices and are related to the current directed domination, , and the cumulative domination, , respectively. In other words, gives the number of particles of class that moved from to at time t, while gives the total number up to time t.
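A minimal sketch of how this deterministic state could be stored in practice, assuming SciPy sparse matrices; the variable names p, N, and E are illustrative placeholders for the quantities just described.

import numpy as np
from scipy.sparse import csr_matrix

n_vertices, classes = 7, [0, 1]
# p[c]: population of class-c particles in each vertex (row vector);
# N[c]: particles of class c that moved through each directed edge at the current step;
# E[c]: cumulative domination, i.e., the totals accumulated over time.
p = {c: np.zeros(n_vertices) for c in classes}
N = {c: csr_matrix((n_vertices, n_vertices)) for c in classes}
E = {c: csr_matrix((n_vertices, n_vertices)) for c in classes}
# After computing N[c] at a given step, the cumulative matrices are updated as:
for c in classes:
    E[c] = E[c] + N[c]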

The system  is a nonlinear Markovian dynamical system with the deterministic evolution function

(5)

where is a square matrix with the elements of vector on the main diagonal and stands for the vector-matrix product.

The function of the system  at time gives a square matrix whose elements are

(6)

where

(7)

Given that we know the initial state of the system, the function of the system  at time returns a row vector where the -th element is

(8)

where is a row vector whose elements are , and stands for the inner product between vectors.

The initial state of the system is given by an arbitrary population size of initial active particles (in this system, the state vector describes the quantity of particles in each vertex; since the system has multiplicative scaling behavior, its values are not necessarily integers and can be a discrete distribution of particles; see Section II-E5 for more details) and

If the initial number of particles in each vertex in system  is proportional to the initial population size in system , we provide evidence that the unfolding result tends to be the same for both systems—represented by Equation 3 and Equation 4, respectively—for all vertices and classes.

II-E Mathematical Analysis

In the previous subsections, we modeled two possibly equivalent systems, and . In this section, we present mathematical results that prove the equivalence of the two systems under certain assumptions.

Theorem 1.

Systems and are asymptotically equivalent if the following conditions hold:

for all , , and , we have

for some constant.

In order to prove Theorem 1, we study the following mechanisms of the particle competition system:

II-E1 Particle motion and absorption

In the proposed system, each particle moves independently from the others. A particle's movement through an edge affects the absorption of rival particles only in the next iteration. Such conditions make it natural to regard the system's evolution in terms of the distribution of particles over the network. Next, we present a formal model for particle movement.

Let

be a discrete random variable that is 1 if particle 

was in at time  and moved into at time ; and it is 0 otherwise. Since each particle in a vertex moves independently, we can write this probability in terms of a particle’s class; that is, for any particle  that belongs to class  and is in at time .

The probability is affected by the movement decision of a particle and whether it was absorbed after the decision. By formulation, in dynamical system  the conditional probability, given that , is

That is, when a particle tries to move into a sink, the survival probability is zero. Otherwise, a particle only reaches if it chooses to move into the vertex and it is not absorbed.

Let

be the probability density function of the random variable

. Hence, the probability is

if or . Otherwise, it is zero.

Furthermore, is convex with fixed values of for all . Thus, by Jensen's inequality [20], we have

(9)

II-E2 Particle generation

In dynamical system  the expected number of particles belonging to class  generated at at time  is

The conditional expectation is, by formulation,

and thus, is

Since is convex for all and according to Jensen’s inequality, we have

(10)

II-E3 Expected edge domination

At the beginning of system  we have

and, for ,

(11)

Given that is known and since each particle in a vertex moves independently, the number of particles that successfully reaches at time  is

where is a particle that belongs to class  and is in . Then, the expected value is

Finally,

(12)

for all , , and .

II-E4 Expected number of particles

We know the number of particles at the beginning of system , so

and, for all , the expected value is

However, the expected number of particles that were absorbed in is the expected number of particles in minus the expected number of particles that survived when moving away. Thus, can be written as

And, finally

(13)

for all , , and .

II-E5 Scale invariance

The unfolding from system  is invariant under real positive multiplication of the row vector . In order to prove this property, consider the following lemma.

Lemma 1.

System has positive multiplicative scaling behavior of order 1. Given an arbitrary initial state of the system , it means that

(14)

for all and .

Proof:

First, we show that the functions are invariant to parameter scaling. Given an arbitrary system state and ,

since the term can be either

or

Now, consider two arbitrary initial states,

for all . We have that,

and

Thus, Relation (14) holds true for .

Assuming that Relation (14) holds true for some time , we show that the relation holds true for :

since

and

So Relation (14) indeed holds true for .

Since both the base case and the inductive step have been shown, by mathematical induction, the lemma is proved for all natural numbers. ∎

Finally, using these studies, we may prove the theorem.

Proof:

By Equations 13, 12 and 11, we have

which is system  assuming that Inequalities (9) and (10) tend to equality when there is a large number of particles and , for any constant (scale invariance property). ∎

Remark 1.

Even if Inequalities (9) and (10) do not converge to equalities, another property that possibly makes the two systems equivalent is compensation over time. At the beginning, both systems are equal; however, in the next iteration both the absorption probability (9) and the number of generated particles (10) are underestimated. Consequently, particles that have survived may compensate for the ones that were not generated. Furthermore, the lower the number of absorbed particles in an iteration, the higher the absorption probability in the next iteration. Likewise, the lower the number of generated particles in an iteration, the higher the expected number of new particles in the next iteration.

III Semi-supervised Learning by Labeled Component Unfolding

Unfoldings generated by the LCU system are incorporated into a semi-supervised learning model. Consider two sets of labeled and unlabeled data points. Each labeled data point is associated with a label. In the semi-supervised learning setting, our goal is to correctly assign existing labels to the unlabeled data.

In short, the proposed learning model has three steps: a) a network is constructed based on a dataset composed of feature vectors, where vertices represent data points and edges represent similarity relationships; b) the LCU system is applied to obtain the unfoldings, that is, a distinct set of edges for each class of the dataset; and c) labels are inferred for every unlabeled data point.

Next, each step of the proposed learning model is presented in detail. In addition to the description of the model's algorithm, its computational complexity analysis is also presented.

Since the proposed dynamical system takes place on a complex network, the original dataset needs to be represented in a network structure. Therefore, the first step of our learning model is to obtain a network representation. Each data point is associated with a single vertex of the network. Moreover, the network must be sparse, undirected, and unweighted. Labeled vertices correspond to the set of labeled points, and unlabeled vertices to the set of unlabeled points. Two vertices are connected by an edge if they have a relationship of similarity, which is determined by some metric or by the particular problem. Any graph construction method that satisfies such conditions may be used in this step. The k-Nearest Neighbor (k-NN) graph construction method is one of them, as sketched below.
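A minimal sketch of this first step, assuming scikit-learn is available for the k-NN search; the helper name build_knn_network is ours, not part of the original algorithm.

import numpy as np
from sklearn.neighbors import kneighbors_graph

def build_knn_network(X, k):
    # Connect each point to its k nearest neighbors (Euclidean distance) and
    # symmetrize the result so the network is simple, unweighted, and undirected.
    A = kneighbors_graph(X, n_neighbors=k, mode="connectivity", include_self=False)
    return A.maximum(A.T)

# Example: a k-NN network over 100 random 2-D points with k = 5.
A = build_knn_network(np.random.rand(100, 2), k=5)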

The second step is to run the system defined by Equation 5 using the constructed complex network as its input. Two conditions are satisfied at system initialization. First, no class should be privileged. Second, during the first iterations, all particles should be able to flow within the network with a small probability of absorption. Thus, the initial conditions of the system, for all , , and , are

(15)

Since there are always particles in the system, the iteration of system  should be stopped if the time limit has been reached. The time limit parameter controls the maximum number of iterations of the system.

At the last step, the unfoldings are used for vertex classification. We assign a label to each unlabeled vertex with the information provided by the unfoldings. The label is assigned based on the density of edges in its neighborhood. Formally, the label for an unlabeled vertex is

(16)

where is the neighborhood of in the unfolding . We denote the number of edges in this neighborhood as .
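A sketch of this labeling rule follows. It assumes one plausible reading of Equation 16: count, for each class, the unfolding edges incident to the unlabeled vertex and pick the class with the largest count; the actual rule may normalize this density differently.

def classify(vertex, neighbors, unfoldings):
    # `unfoldings[c]` is the set of edges dominated by class c,
    # `neighbors[vertex]` the adjacency list of the unlabeled vertex.
    counts = {}
    for c, edges in unfoldings.items():
        counts[c] = sum(1 for u in neighbors[vertex]
                        if (vertex, u) in edges or (u, vertex) in edges)
    return max(counts, key=counts.get)   # class whose unfolding is densest around the vertex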

III-A Algorithm

1:function Classifier(, , , )
2:      BuildNetwork(, )
3:     subnetworks Unfold(, , )
4:     return Classify(, subnetworks)
5:end function
Algorithm 1 Semi-supervised Learning by LCU.
1:function Unfold(, , )
2:     for  do
3:          (, ) Equation 15
4:          (, )
5:          (, )
6:     end for
7:     for  do
8:         for  do
9:               (, , …, , ) Equation 6
10:               (, , ) Equation 8
11:              
12:              
13:              
14:         end for
15:     end for
16:     return Subnetworks(, ) Equation 2
17:end function
Algorithm 2 Labeled Component Unfolding system.

Algorithm 1 summarizes the steps of our learning model. The algorithm accepts the labeled dataset, the unlabeled dataset, and two user-defined parameters—the competition parameter of the system and the time limit parameter. Moreover, it is necessary to choose a network formation technique.

The first step of the learning model is mapping the original vector-formed data to a network using a chosen network formation technique. Afterward, we unfold the network as described in Algorithm 2. This algorithm iterates the LCU system to produce one subnetwork for each class. Steps 2–6 initialize the system state as indicated in Equation 15. Steps 7–15 iterate the system until the time limit is reached, using the evolution function (5). Step 16 calculates and returns the unfoldings for each class. Back in Algorithm 1, by using the produced unfoldings, the unlabeled data are classified as described in Equation 16.

III-B Computational Complexity and Running Time

Here, we provide the computational complexity analysis step by step.

The construction of the complex network from the input dataset depends on the chosen method and on the number of data samples. The k-NN method, for example, has subquadratic complexity when using a multidimensional binary search tree [21].

The second step is running system  defined by Equation 5. Using sparse matrices, the system initialization, steps 2–6 of Algorithm 2, has complexity order of . The system iteration calculates times the evolution function  (5) represented in steps 8–14. The time complexity of each part of the system evolution is presented below.

  • Step 9, computation of the matrix . This matrix has non-zero entries. It is necessary to calculate for each non-zero entry. Hence, this step has complexity order of . However, the denominator of Equation 7 is the same for all values of .

  • Step 10, computation of the vector . This vector has non-zero entries. It is also necessary to calculate the total number of particles in the system. So, this calculation has time complexity order of .

  • Step 11, computation of the matrix . The multiplication between a diagonal matrix and a sparse matrix with non-zero entries has time complexity order of .

  • Step 12, computation of the vector . Suppose that is the average vertex degree of the input network; it follows that this can be performed in .

  • Step 13, computation of the matrix . This sparse matrix summation has complexity order of .

After the system evolution, the unfolding process performs operations. Thus, the total time complexity order of the system simulation is . However, the value of is fixed and the value of is usually very small.

The vertex labeling step is the last step of the learning model. The time complexity of this step depends on the calculation of the number of edges in the neighborhood of each unlabeled vertex in each unfolding. It can be efficiently calculated by using one step of a breadth-first search in . Hence, the order of the average-time complexity is .

Algorithm Time Complexity
Transductive SVM [7]
Local and Global Consistency [22]
Large Scale Transductive SVM [23]
Dynamic Label Propagation [24]
Label Propagation [25]
Original Particle Competition [17]
Labeled Component Unfolding
Minimum Tree Cut [26]
TABLE I: Time Complexity of Common Graph-based Techniques Disregarding the Graph Construction Step

In summary, considering the discussion above, our learning model runs in including the transformation from vector-based dataset to a network. Table I compares the time complexity of common graph-based techniques disregarding the graph construction step. Only the proposed LCU method and Minimum Tree Cut [26] have linear time, though the latter must either receive or construct a spanning tree. Consequently, the Minimum Tree Cut has a performance similar to the scalable version of traditional algorithms, such as those using subsampling practices.

Fig. 4: Running time in seconds of iterations of the system in random networks. (a) The input networks have 400 000 edges and varying numbers of vertices. (b) The input networks have 2 000 vertices and varying numbers of edges.

Figure 4 depicts the running time of a single iteration of the system, varying the number of vertices and edges, respectively. With 10 independent runs, we measure the time for 30 iterations, totaling 300 samples for each network size. We set two classes and 5% of labeled vertices. Experiments were run on an Intel® Core™ i7 CPU 860 @ 2.80GHz with 16 GB of DDR3 RAM @ 1333 MHz. This experiment shows that the system runs in linear time as a function of the number of vertices and edges, which conforms to our theoretical analysis.

IV Computer Simulations

To study the stochastic system and its deterministic version, we present experimental analyses concerning their equivalence. Additionally, we study the meaning of the parameters of our learning model. After that, we evaluate the model performance using both artificial and real-world datasets: we show the unfolding process and the learning model on synthetic data, and then we present the simulation results for a well-known benchmark and for real applications on human activity and handwritten digit recognition.

IV-A Experimental Analysis

In this section, we present an experiment that assesses the equivalence between the unfolding results of both systems with an increasing initial number of particles in system .

The networks used for the analysis are generated by the following model: a complex network is constructed given a labeled vector, a number of edges per vertex, and a weight that controls the preferential attachment between vertices of different classes. The resulting network contains one vertex per entry of the labeled vector. For each vertex, edges are randomly connected, with replacement, with the preferential attachment weight depending on whether the endpoints share the same label. The weight parameter is proportional to the overlap between classes.

Fig. 5: Proportionality simulation. Lines are the correlation measure between the cumulative domination matrices of systems  and , varying the initial number of active particles. Values close to 1 indicate that the cumulative domination matrices of both systems tend to be proportional.

If there exists a positive constant such that

both systems generate the same unfoldings. To assess this proportionality, both systems are simulated in 10 different networks , with vertices arranged in two classes. The system’s parameter is discretized in . Varying the total number of initial particles, we set for all and .

We consider the correlation between the cumulative domination matrices of systems  and . If the two matrices are proportional, then they must be correlated. Values of correlation close to 1 indicate the cumulative domination matrices are proportional. In Figure 5, the correlation is depicted. As the number of initial particles increases, the correlation approaches 1. This result suggests that both systems generate the same unfolding when the number of initial particles grows to infinity.
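The proportionality check described above can be sketched as follows, assuming the cumulative domination matrices are available as NumPy arrays or SciPy sparse matrices.

import numpy as np

def domination_correlation(E_stochastic, E_deterministic):
    # Pearson correlation between the (flattened) cumulative domination matrices.
    # Values close to 1 indicate the two matrices tend to be proportional.
    a = np.asarray(E_stochastic.todense() if hasattr(E_stochastic, "todense") else E_stochastic).ravel()
    b = np.asarray(E_deterministic.todense() if hasattr(E_deterministic, "todense") else E_deterministic).ravel()
    return np.corrcoef(a, b)[0, 1]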

IV-B Parameter Analysis

The LCU model has two parameters apart from the network construction. In this section, we discuss their meaning. To do so, the learning model is applied to synthetic datasets whose data items are sampled from a three-dimensional knot torus with parametric curve

where and .

Fig. 6: Three dimensional knot torus dataset with 500 samples without noise (left-hand side) and with noise (right-hand side). Colors are the classes.

We sampled 500 data items uniformly along the possible values of the curve parameter. We randomly split the data items into 2 to 10 classes so that samples with adjacent parameter values belong to the same class. We also added to each sample random noise in each dimension. Figure 6 depicts an example of the dataset with 4 classes, with and without noise. Since the dataset has a complex form, a small change in parameter value may generate different results. Therefore, it is suitable for studying the sensitivity of the parameters. A sampling sketch is given below.
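The paper's exact parametric curve is not reproduced here, so the sketch below assumes a (2, 3) torus knot and equal-length parameter segments per class, which is only an approximation of the splitting described above.

import numpy as np

def sample_knot_torus(n=500, n_classes=4, noise=0.05, seed=0):
    rng = np.random.default_rng(seed)
    t = np.sort(rng.uniform(0.0, 2.0 * np.pi, n))           # curve parameter
    x = (2.0 + np.cos(3.0 * t)) * np.cos(2.0 * t)            # assumed (2, 3) torus knot
    y = (2.0 + np.cos(3.0 * t)) * np.sin(2.0 * t)
    z = np.sin(3.0 * t)
    X = np.column_stack([x, y, z]) + rng.normal(0.0, noise, (n, 3))
    labels = np.minimum((t / (2.0 * np.pi) * n_classes).astype(int), n_classes - 1)
    return X, labels                                          # adjacent t values share a class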

We run the LCU model with varying parameter values. Finally, 30 unbiased sets of 40 labeled points are employed. The k-NN method is used for the network construction.

Below, we discuss each parameter of the model.

IV-B1 Discussion about the network construction parameter

In our model, the input network must be simple (between any pair of vertices there must exist at most one edge), unweighted, undirected, and connected. Besides these requirements, two vertices must be connected if their data items are considered similar enough for the particular problem. In our experiments, we use the k-NN graph with Euclidean distance since it has been proved to approximate the low-dimensional manifold of a point set [27]. The smaller the value of k, the better the results.

IV-B2 Discussion about the system parameter

Fig. 7: Average error of the proposed model for different numbers of classes in the problem. Colors and shapes indicate the values of parameter .

The LCU system has only one parameter: the competition parameter. This parameter defines the intensity of competition between particles. When it is 0, particles randomly walk the network without competition. As it approaches 1, particles are more likely to compete and, consequently, to be absorbed. Figure 7 depicts the average error of our method for different values of the parameter. Based on the figure, our model is not sensitive to it. In general, we suggest a particular setting of the parameter because it yields better and more consistent classification than other values.

IV-B3 Discussion about the system iteration stopping parameter

The time limit parameter controls when the simulation should stop; it must be at least as large as the diameter of the network. That way, every edge is guaranteed to be visited by a particle. Since the network diameter is usually small, the simulation stops in a few iterations. A sketch for computing this lower bound is given below.
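A minimal sketch, assuming NetworkX and a SciPy sparse adjacency matrix of a connected network:

import networkx as nx

def minimum_time_limit(adjacency):
    # The time limit must be at least the network diameter so that
    # every edge can be reached by some particle.
    G = nx.from_scipy_sparse_array(adjacency)
    return nx.diameter(G)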

IV-C Simulations on Artificial Datasets

Fig. 10: Unfoldings generated by the proposed system on the Highleyman dataset. Edges are colored according to the dominating class at the time. Light gray edges stand for edges present in the original network but not in the unfolding. (a) Vertex positions are imposed by the original data points, and blue squares represent vertices connected in both unfoldings. (b) Vertex positions are not imposed by the original data points, and vertex colors are the result of the classification.

To better understand the details of the LCU system, in this subsection we illustrate it using two synthetic datasets. Each dataset has a different class distribution—banana shape and Highleyman. (The datasets are generated using the PRTools framework [28].) The banana shape dataset is uniformly distributed along a specific shape and then superimposed on a normal distribution with a standard deviation of 1 along the axes. In the Highleyman distribution, two classes are defined by bivariate normal distributions with different parameters. Because the datasets are not in a network representation, we use the k-NN graph construction method to transform them into their respective network forms. In the constructed network, a vertex represents a datum, and it connects to its k nearest neighbors, determined by the Euclidean distance.

Firstly, the technique is tested on the Highleyman dataset. Each class has 300 samples, of which 6 are labeled. We can observe that the labeled data points of the green class form a barrier to samples of the red class. The unfoldings are presented in Figure 10(a). In this figure, blue squares represent vertices that are connected by edges of both unfoldings. Despite the labeled data of the green class forming a barrier, the constructed subnetworks are still connected—there is a single component connecting all the vertices of each subnetwork. This is better visualized in Figure 10(b). In this figure, the same unfoldings are presented, but the positions of the vertices are not imposed by the original data. Furthermore, the colors of the vertices in the figure indicate the result of the classification. The overlapping data can be identified by the vertices that belong to two or more unfoldings. This result reveals that the competition system on edges provides more information than the competition on vertices, since it can identify the overlapping vertices as part of the system, that is, without special treatments or modifications.

Fig. 15: System evolution on a banana-shaped distribution dataset. Red and green colors represent the two classes. Unlabeled points are black; labeled vertices are represented by larger, colored points. Edges are colored according to the dominating class at the current iteration, and light gray stands for not yet dominated. (a) The network representation of the dataset at the beginning of the system. (b) and (c) System iterations at times 4 and 20, respectively. (d) The result of the dataset classification.

The last synthetic dataset has 600 samples equally split into two classes. In Figure 15(a), the initial state of the system is illustrated, where the dataset is represented by a network obtained by the k-NN graph construction. At this stage, the edges are not dominated by any of the classes. Starting from this state, labeled vertices (sources) generate particles that carry the label of the sources. Though the particles are not shown, Figures 15(b) and 15(c) are snapshots of the system evolution—at times 4 and 20—where each edge is colored by its dominating class at that iteration. In these illustrations, a solid red line stands for an edge dominated by the red class, while a dashed green line stands for the opposite; an edge that is not yet dominated is drawn as a solid light gray line. As expected, edges close to sources are dominated initially, and farther edges are progressively dominated. At time 20, Figure 15(c), every edge has been dominated, and the edge domination does not change anymore. Figure 15(d) shows the dataset classification following the system result. In this example, with 1% of points in the labeled set, the technique correctly identifies the pattern formed by each class. Both results are satisfactory, reinforcing the ability of the technique to learn arbitrary class distributions.

IV-D Simulations on Benchmark Datasets

Dataset   10 labeled: Error (Std)   k    Competition   100 labeled: Error (Std)   k    Competition
g241c     42.90 (4.33)              10   0.25          30.03 (2.18)               10   0.875
g241n     46.94 (3.93)               9   0             36.08 (6.32)                9   0
Digit1     4.93 (2.63)               5   0.75           1.51 (0.31)                6   0.625
USPS      15.65 (3.81)               3   1              8.36 (2.92)                3   1
COIL      59.96 (6.13)               3   0.625         13.73 (2.91)                3   0
BCI       47.56 (1.80)               9   1             34.68 (2.26)                3   0.25
Text      29.71 (3.53)               9   0.875         22.41 (1.74)               10   0.75

TABLE II: Test Errors (%) with Standard Deviation and the Best Parameters

Method                          g241c   g241d   Digit1   USPS    COIL    BCI     Text    Avg. Rank
1-NN                            47.88   46.72   13.65    16.66   63.36   49.00   38.12    9.3
SVM                             47.32   46.66   30.60    20.03   68.86   49.85   45.37   13.0
MVU + 1-NN                      47.15   45.56   14.42    23.34   62.62   47.95   45.32    9.3
LEM + 1-NN                      44.05   43.22   23.47    19.82   65.91   48.74   39.44    9.1
QC + CMR                        39.96   46.55    9.80    13.61   59.63   50.36   40.79    6.9
Discrete Reg.                   49.59   49.05   12.64    16.07   63.38   49.51   40.37   10.4
TSVM                            24.71   50.08   17.77    25.20   67.50   49.15   31.21   10.0
Cluster-Kernel                  48.28   42.05   18.73    19.41   67.32   48.31   42.72   10.1
LDS                             28.85   50.63   15.63    17.57   61.90   49.27   27.15    8.0
Laplacian RLS                   43.85   45.68    5.44    18.99   54.54   48.97   33.68    5.9
LGC                             45.82   44.09    9.89     9.03   63.45   47.09   46.83    6.9
LP                              42.61   41.93   11.31    14.83   55.82   46.37   49.53    5.1
LNP                             47.82   46.24    8.58    17.87   55.50   47.65   41.06    7.1
Original Particle Competition   41.17   43.51    8.10    15.69   54.18   48.00   34.84    4.0
Labeled Component Unfolding     42.90   46.94    4.93    15.65   59.96   47.56   29.71    4.9

TABLE III: Test Errors (%) with 10 Labeled Training Points

Method                          g241c   g241d   Digit1   USPS    COIL    BCI     Text    Avg. Rank
1-NN                            43.93   42.45    3.89     5.81   17.35   48.67   30.11   11.4
SVM                             23.11   24.64    5.53     9.75   22.93   34.31   26.45    8.1
MVU + 1-NN                      43.01   38.20    2.83     6.50   28.71   47.89   32.83   10.6
LEM + 1-NN                      40.28   37.49    6.12     7.64   23.27   44.83   30.77   10.9
QC + CMR                        22.05   28.20    3.15     6.36   10.03   46.22   25.71    6.6
Discrete Reg.                   43.65   41.65    2.77     4.68    9.61   47.67   24.00    7.1
TSVM                            18.46   22.42    6.15     9.77   25.80   33.25   24.52    7.7
Cluster-Kernel                  13.49    4.95    3.79     9.68   21.99   35.17   34.28    7.4
LDS                             18.04   23.74    3.46     4.96   13.72   43.97   23.15    5.4
Laplacian RLS                   24.36   26.46    2.92     4.68   11.92   31.36   23.57    4.4
LGC                             41.64   40.08    2.72     3.68   45.55   43.50   56.83    9.3
LP                              30.39   29.22    3.05     6.98   11.14   42.69   40.79    8.3
LNP                             44.13   38.30    3.27    17.22   11.01   46.22   38.45   11.4
Original Particle Competition   21.41   25.85    3.11     4.82   10.94   41.57   27.92    5.3
Labeled Component Unfolding     30.03   36.08    1.51     8.36   13.73   34.68   22.41    6.0

TABLE IV: Test Errors (%) with 100 Labeled Training Points

We compare our model with 14 semi-supervised techniques tested on Chapelle’s benchmark [1]. The benchmark is formed by seven datasets that have 1500 data points, except for BCI that has 400 points. The datasets are described in [1].

For each dataset, 24 distinct, unbiased sets (splits) of labeled points are provided within the benchmark. Half of the splits are formed by 10 labeled points and the other half by 100 labeled points. The author of the benchmark ensured that each split contains at least one data point of each class. The result is the average test error—the proportion of data points incorrectly labeled—over the splits. We compare our results to the ones obtained by the following techniques: 1-Nearest Neighbors (1-NN), Support Vector Machines (SVM), Maximum variance unfolding (MVU + 1-NN), Laplacian eigenmaps (LEM + 1-NN), Quadratic criterion and class mass regularization (QC + CMR), Discrete regularization (Discrete Reg.), Transductive support vector machines (TSVM), Cluster kernels (Cluster-Kernel), Low-density separation (LDS), Laplacian regularized least squares (Laplacian RLS), Local and global consistency (LGC), Label propagation (LP), Linear neighborhood propagation (LNP), and Network-Based Stochastic Semisupervised Learning (Original Particle Competition). The simulation results are collected from [1], except for LGC, LP, LNP, and Original Particle Competition, which are found in [17].

For the simulation of the LCU system, we discretize the interval of the competition parameter. Also, we vary the k-NN parameter k. We tested every combination of the two and fixed the time limit. In Table II, we present the results with the standard deviation over the splits, along with the combination of parameters that produced the best accuracy.

The test error comparison for 10 labeled points are shown in Table III; comparison for 100 labeled points are in Table IV. Apart from each dataset, the last column is the average performance rank of a technique over the datasets. A ranking arranges the methods under comparison by test error rate in ascending order. For a single dataset, we assign rank 1 for the method with the lowest average test error on that dataset, then rank 2 for the method with the second lowest test error, and so on. The average ranking is the average value of the rankings of the method on all the datasets. The smaller the ranking score, the better the method has performed.
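The average rank computation can be sketched as follows, assuming the test errors are arranged in a methods-by-datasets array.

import numpy as np
from scipy.stats import rankdata

def average_ranks(errors):
    # errors: (n_methods, n_datasets) array of test error rates.
    # Rank 1 goes to the lowest error on each dataset; ties share the average rank.
    ranks = np.apply_along_axis(rankdata, 0, errors)
    return ranks.mean(axis=1)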

From the average rank column, the LCU technique is not the best ranked, but it is in the best group of techniques in both 10 labeled and 100 labeled cases.

We statistically compare the results presented in Tables III and IV. For all tests, we set a significance level of 5%. First, we use a test based on the average rank of each method to evaluate the null hypothesis that all the techniques are equivalent. With the Friedman test [29], there is a statistically significant difference between the ranks of the techniques.

Since the Friedman test reports statistical significance, we use the Wilcoxon signed-rank test [29]. In this pairwise test, we test the null hypothesis that the first technique has error results greater than or equal to those of the second. If the null hypothesis is rejected at a 5% significance level, we say the first technique is superior to the second. By analyzing the results for 10 and 100 labeled points together, we conclude that our technique is superior to 1-NN, LEM + 1-NN, and MVU + 1-NN. Examining them separately, for 10 labeled points, our method is also superior to Discrete regularization, Cluster kernels, and SVM. For 100 labeled points, it is also superior to LNP and LGC, whereas Laplacian RLS is superior to ours. Both statistical tests are sketched below.
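A sketch of both tests using SciPy; the error arrays are arranged as in the ranking sketch above.

from scipy.stats import friedmanchisquare, wilcoxon

def compare_methods(all_errors, errors_a, errors_b):
    # all_errors: per-method sequences of errors over the datasets (Friedman test).
    # errors_a, errors_b: per-dataset errors of two methods; the one-sided Wilcoxon
    # test checks the null hypothesis that method a has error >= method b.
    _, p_friedman = friedmanchisquare(*all_errors)
    _, p_wilcoxon = wilcoxon(errors_a, errors_b, alternative="less")
    return p_friedman, p_wilcoxon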

IV-E Simulations on Human Activity Dataset

Activity   LCU 5% labeled (P / R / F)             LCU 10% labeled (P / R / F)            LCU 20% labeled (P / R / F)            SVM 70% labeled [30] (P / R / F)
WK         .984±.013 / .941±.030 / .962±.016      .992±.004 / .985±.011 / .989±.006      .994±.002 / .997±.001 / .995±.001      .957 / .992 / .974
WU         .981±.009 / .935±.026 / .957±.015      .988±.008 / .961±.013 / .974±.008      .991±.003 / .981±.006 / .986±.004      .980 / .958 / .969
WD         .987±.017 / .901±.016 / .942±.011      .994±.008 / .918±.011 / .955±.007      .998±.001 / .945±.008 / .971±.004      .988 / .976 / .982
ST         .864±.034 / .698±.049 / .770±.022      .883±.015 / .743±.039 / .806±.020      .905±.014 / .814±.015 / .857±.006      .969 / .880 / .922
SD         .840±.024 / .842±.053 / .839±.017      .870±.023 / .844±.022 / .856±.006      .896±.013 / .872±.021 / .884±.009      .901 / .974 / .936
LD         .996±.002 / .999±.000 / .998±.001      .997±.001 / .999±.000 / .998±.000      .997±.001 / .999±.000 / .998±.000      1.000 / 1.000 / 1.000

TABLE V: Performance Comparison on the Human Activity Recognition Using Smartphones Dataset (P = Precision, R = Recall, F = F Score; LCU values are mean ± standard deviation)

The Human Activity Recognition Using Smartphones dataset [30] comprises 10299 data samples. Each sample consists of 561 features extracted from motion sensors attached to a person during a time window. Each person performed six activities, which are the target labels in the dataset—walking (WK), walking upstairs (WU), walking downstairs (WD), sitting (ST), standing (SD), and laying down (LD).

We use k-NN for the dataset's network representation, with the smallest k that generates a connected network. The competition and time limit parameters are fixed. We compare our results with the ones published in [30], splitting the problem into six binary classification tasks.

Table V summarizes the results. For our technique, we provide the precision, recall, and F score using 5%, 10%, and 20% of labeled samples. We average the results over 10 independent labeled sets for each configuration. We also provide the original results from [30] using SVM with approximately 70% of labeled samples. Our technique performs as well as SVM while using far fewer labeled samples and the suggested parameter settings. Such a feature is quite attractive because it may represent a large saving of money and effort when manual data labeling is involved in semi-supervised learning. A sketch of the per-activity evaluation is given below.
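For reference, the per-activity scores can be computed with scikit-learn as sketched here; the helper name binary_scores and the label handling are ours.

from sklearn.metrics import precision_recall_fscore_support

def binary_scores(y_true, y_pred, positive_label):
    # Precision, recall, and F score for one activity-vs-rest task (e.g., "WK").
    p, r, f, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=[positive_label], average=None)
    return p[0], r[0], f[0]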

IV-F Simulations on MNIST Dataset

Method 100 labeled 1000 labeled
LCU
TSVM* [23]
Embed NN* [31]
Embed CNN* [31]
  • * The comparison is biased since the results from [23, 31] rely on a single and unique labeled set. See text for more details.

TABLE VI: Test Errors (%) in the MNIST Dataset

The MNIST dataset comprises 70,000 examples of handwritten digits. All digits have been size-normalized and centered in a fixed-size image. In a supervised learning setting, this dataset is split into two sets: 60,000 examples for training and 10,000 for testing.

To adapt the dataset to a semi-supervised learning problem, we use a setting similar to [23, 31]: the labeled input data items are selected from the training set, and the unlabeled ones from the test set. Although we do not use a validation set, [23] and [31] use an additional set of at least 1,000 labeled samples for parameter tuning.

The network representation is obtained from the images without preprocessing. We use the Euclidean distance between items and