Semi-supervised learning (SSL) is a machine learning paradigm that lies between the unsupervised and supervised learning paradigms. In SSL problems, both unlabeled and labeled data are taken into account in class or cluster formation and prediction processes [1, 2].
In real-world applications, we usually have only partial knowledge of a given dataset. For example, we certainly do not know every movie actor, only a few famous ones; in a large-scale social network, we know only some friends; in the biological domain, we are far from having a complete picture of the functions of all genes, but we know the functions of some of them. Sometimes, although we have complete or almost complete knowledge of a dataset, labeling it by hand is lengthy and expensive, so it is necessary to restrict the labeling scope. For these reasons, partially labeled datasets are often encountered. In this sense, supervised and unsupervised learning can be considered extreme and special cases of semi-supervised learning. Many semi-supervised learning techniques have been developed, including generative models, discriminative models, clustering and labeling techniques, multi-training, low-density separation models, and graph-based methods [8, 9, 10].
Among the approaches listed above, graph-based SSL has attracted much attention. In this case, each data instance is represented by a vertex and is linked to other vertices according to a predefined affinity rule. The labels are propagated to the whole graph using a particular optimization heuristic.
Complex networks are large-scale graphs with nontrivial topology. Such networks provide a powerful tool to describe the interplay of topology, structure, and dynamics of complex systems [12, 13]. Therefore, they provide a groundbreaking mechanism to help us understand the behavior of many real systems. Networks also turn out to be an important mechanism for data representation and analysis. Interpreting datasets as complex networks gives us further access to the inter-relational nature of data items. For this reason, we consider the network-based approach for SSL in this work. However, the above-mentioned network-based approaches focus on the optimization of the label propagation result and pay little attention to the detailed dynamics of the learning process itself. On the other hand, it is well known that collective neural dynamics generate rich information, and such redundant processing supports the adaptability and robustness of the learning process. Moreover, traditional graph-based techniques have high computational complexity, usually of cubic order. A common strategy to overcome this disadvantage is to use a set of sparse prototypes derived from the data. However, such a sampling process usually loses information from the original data.
Taking into account the facts above, we study a new type of dynamical competitive learning mechanism in a complex network, called particle competition. Consider a network where several particles walk and compete to occupy as many vertices as possible while attempting to reject rival particles. Each particle performs a combined random and preferential walk by choosing a neighbor vertex to visit. Eventually, each particle is expected to occupy a subset of vertices, called a community of the network. In this way, community detection is a direct result of the particle competition. The particle competition model was originally proposed for this setting and was later extended to the data clustering task. It has since been applied to semi-supervised learning [18, 19], where the particle competition is formally represented by a nonlinear stochastic dynamical system. In all the models mentioned above, the authors are concerned with vertex dynamics, that is, how each vertex changes its state (the level of dominance of each particle). Intuitively, vertex dynamics is a coarse model of a network because each vertex can have several edges. A problem with data analysis in this approach is the overlapping nature of the vertices, where a data item (a vertex in the networked form) can belong to more than one class. Therefore, it is interesting to know how each edge changes its state in the competition process to acquire detailed knowledge of the dynamical system.
In this paper, we propose a transductive semi-supervised learning model that employs a vertex–edge dynamical system in complex networks. In this dynamical system, named the Labeled Component Unfolding system, particles compete for edges in a network. Subnetworks are generated with the edges grouped by class dominance. Here, we call each subnetwork an unfolding. The learning model employs the unfoldings to classify unlabeled data. The proposed model offers satisfactory performance on semi-supervised learning problems, on both artificial and real datasets. It has also been shown to be suitable for detecting overlapping regions of data points by simply counting the edges dominated by each class of particles. Moreover, it has low computational complexity.
In comparison to the original particle competition models and other graph-based semi-supervised learning techniques, the proposed one presents the following salient features:
Particle competition dynamics occurs on nodes as well as on edges
Including the edge domination model gives us more detailed information to capture the connectivity pattern of the input data, because there are many more edges than vertices even in a sparse network. Consequently, the proposed model has the benefit of providing essential information about overlapping vertices. Computer simulations show that the proposed technique achieves good classification accuracy and is suitable for situations with a small number of labeled samples.
In the proposed model, particles are continuously generated and removed from the system
This feature contrasts with previous particle competition models, which incorporate a preferential walking mechanism where particles tend to avoid rival particles. As a consequence, the number of active particles in the system varies over time. It is worth noting that eliminating the preferential walking mechanism greatly simplifies the dynamical rules of the particle competition model. The new model is characterized by the competition of only randomly walking particles, which, in turn, permits us to derive an equivalent deterministic version. The original particle competition model is intrinsically stochastic, so each run may generate a different result. Consequently, it has high computational cost. In this work, we derive a deterministic system whose running time is independent of the number of particles, and we demonstrate that it is mathematically equivalent to the stochastic model. Moreover, the deterministic model runs in linear time and ensures stable learning; in other words, the model generates the same output for each run with the same input. Furthermore, the system is simpler and easier to understand and implement. Thus, the proposed model is more efficient than the original particle competition model.
There is no explicit objective function
In classical graph-based semi-supervised learning techniques, an objective function is usually defined for optimization. Such a function considers not only the label information but also the semi-supervised assumptions of smoothness, cluster, or manifold. In particle competition models, we do not need to define an objective function. Instead, we define dynamical rules that govern the time evolution of particles and vertices (or edges). Those dynamical rules mimic phenomena observed in some natural and social systems, such as resource competition among animals, territory exploration by humans (or animals), and election campaigns. In other words, the particle competition technique is typically inspired by nature. In this kind of technique, the focus is on behavior modeling instead of objective modeling. Certain objectives can be achieved if the corresponding behavioral rules are properly defined. In this way, we may classify classical graph-based semi-supervised learning techniques as objective-based design and the particle competition technique as behavior-based design.
The remainder of this paper is organized as follows. The proposed particle competition system is studied in Section II. Our transductive semi-supervised learning model is presented in Section III. In Section IV, results of computer simulations are shown to assess the performance of the proposed model on both artificial and real-world datasets. Finally, Section V concludes this paper.
II Labeled Component Unfolding System
In this section, we give an introduction to the Labeled Component Unfolding (LCU) system—a particle competition system for edge domination—explaining its basic design. Whenever pertinent, we go into detail for further clarification.
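We consider a complex network expressed by a simple, unweighted, undirected graph $G = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ is the set of vertices and $\mathcal{E}$ is the set of edges. If two vertices are considered similar, an edge connects them. The network contains $|\mathcal{V}|$ vertices that can be either labeled or unlabeled data points. The set $\mathcal{L} \subseteq \mathcal{V}$ contains the labeled vertices, where a vertex $v_i \in \mathcal{L}$ has a label $y_i \in \{1, \dots, C\}$. We also use the terms label and class synonymously: if a vertex is labeled with $c$, we say this vertex belongs to class $c$. The set $\mathcal{U} \subseteq \mathcal{V}$ contains the unlabeled vertices. We suppose that the labeled and unlabeled sets are disjoint; thus, together they partition the vertex set, that is, $\mathcal{V} = \mathcal{L} \cup \mathcal{U}$ and $\mathcal{L} \cap \mathcal{U} = \emptyset$. The network is represented by the adjacency matrix $A = (a_{ij})$, where $a_{ij} = a_{ji} = 1$ if $v_i$ is connected to $v_j$ and $a_{ij} = 0$ otherwise. We denote by $e_{ij}$ the edge between vertices $v_i$ and $v_j$. For practical reasons, we consider a connected network with at least one labeled vertex of each class.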
In this model, particles are objects that flow within the network while carrying a label. Labeled vertices are sources for particles of the same class and sinks for particles of other classes. After a particle is released, it walks the network at random, choosing the next vertex to visit uniformly among the adjacent vertices. That is, a particle located at vertex $v_i$ decides to move to $v_j$ with probability
$$\frac{a_{ij}}{\deg v_i},$$
where $a_{ij} = 1$ if $v_i$ and $v_j$ are adjacent and $a_{ij} = 0$ otherwise, and $\deg v_i$ denotes the degree of $v_i$.
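The uniform choice among adjacent vertices can be sketched in a few lines. This is only an illustration, assuming the network is given as a 0/1 adjacency matrix stored as a list of lists; the function name `transition_probabilities` is hypothetical and not part of the original work.

```python
def transition_probabilities(adjacency):
    """Return P with P[i][j] = a_ij / deg(v_i) for a 0/1 adjacency matrix."""
    probabilities = []
    for row in adjacency:
        degree = sum(row)
        if degree == 0:
            # The model assumes a connected network, so no vertex is isolated.
            raise ValueError("isolated vertex: the network must be connected")
        probabilities.append([a / degree for a in row])
    return probabilities


# Path graph v0 - v1 - v2: the middle vertex moves left or right with
# probability 1/2, while each end vertex has a single possible destination.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
P = transition_probabilities(A)
```

Each row of `P` sums to one, so before absorption is taken into account the walk is an ordinary random walk on the graph.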
In each step, at the moment a particle decides to move to the next vertex, it can be absorbed (removed from the system). If the particle is not absorbed, we say that it has survived: it remains active and continues walking. Otherwise, the particle is absorbed and ceases to affect the system. The absorption depends on the level of subordination and domination of a class against all other classes in the edges.
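To determine the level of domination and subordination of each class in an edge, we take into account the active particles in the system. The current directed domination $n_{ij}^{c}(t)$ is the number of active particles belonging to class $c$ that decided to move from $v_i$ to $v_j$ at time $t$ and survived. Similarly, the current relative subordination $\sigma_{ij}^{c}(t)$ is the fraction of active particles that do not belong to class $c$ and have successfully passed through edge $e_{ij}$, regardless of direction, at time $t$. Mathematically, we define the latter as
$$\sigma_{ij}^{c}(t) = \begin{cases} \dfrac{\sum_{q \neq c} \left( n_{ij}^{q}(t) + n_{ji}^{q}(t) \right)}{\sum_{q=1}^{C} \left( n_{ij}^{q}(t) + n_{ji}^{q}(t) \right)} & \text{if } \sum_{q=1}^{C} \left( n_{ij}^{q}(t) + n_{ji}^{q}(t) \right) > 0, \\[2ex] 1 - \dfrac{1}{C} & \text{otherwise,} \end{cases}$$
where $C$ is the number of classes.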
The survival of a particle depends on the current relative subordination of the edge and on the destination vertex. If a particle decides to move into a sink, it is absorbed with probability 1. If the destination vertex is not a sink, its survival probability is
$$1 - \lambda \, \sigma_{ij}^{c}(t),$$
where $\sigma_{ij}^{c}(t)$ is the current relative subordination of the particle's class $c$ in the edge and $\lambda \in [0, 1]$ is a parameter characterizing the competition level.
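The absorption rule can be made concrete with a small sketch. The relative subordination below follows the verbal definition (fraction of surviving rival particles on the edge, with dominance shared equally when no particle crossed). The linear form `1 - lam * sigma` for the survival probability is an assumption made for illustration, chosen so that a competition parameter of zero disables competition; the function and argument names are hypothetical.

```python
def relative_subordination(passed, c):
    """Fraction of particles on an edge that are rivals of class c.

    passed[q] is the number of class-q particles that survived crossing
    the edge in the current step, with both directions added together.
    """
    total = sum(passed)
    if total == 0:
        # No particle crossed the edge: dominance is shared equally.
        return 1.0 - 1.0 / len(passed)
    return (total - passed[c]) / total


def survival_probability(passed, c, lam, destination_is_sink=False):
    """Probability that a class-c particle survives a move along the edge."""
    if destination_is_sink:
        # A particle entering a source of a rival class is always absorbed.
        return 0.0
    # ASSUMPTION: survival decreases linearly with the relative subordination.
    return 1.0 - lam * relative_subordination(passed, c)


# One red (class 0) and one green (class 1) particle crossed the edge,
# so each class sees a relative subordination of 0.5.
print(survival_probability([1, 1], c=0, lam=1.0))
```

With `lam=0.0` every move into a non-sink vertex survives, which matches the description of a plain random walk without competition.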
A source generates particles according to its degree and the current number of active particles in the system. Let be the number of active particles belonging to class in the system at time , a source generates new particles if .
Let be the set of sources for particles that belong to class , the number of newly generated particles belonging to class in at time follows the distribution
is a binomial distribution. In other words, if the number of active particles is fewer than the initial number of particles,, each source performs trials with probability of generating a new particle.
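Therefore, by the mean of the binomial distribution, the expected number of new particles of a given class generated at a source at a given time is the number of trials multiplied by the per-trial generation probability.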
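A minimal sketch of the generation step follows. It assumes that the number of trials equals the deficit between the initial and the current number of active particles of the class, and it treats the per-source generation probability `gen_prob` as a given input; both the names and the choice of trial count are illustrative assumptions rather than the authors' exact formulation.

```python
import random


def generate_particles(initial_count, active_count, gen_prob, rng):
    """Sample the number of new particles at one source.

    ASSUMPTION: the source performs one Bernoulli trial per missing
    particle, i.e. the draw is Binomial(deficit, gen_prob).
    """
    deficit = max(0, initial_count - active_count)
    return sum(1 for _ in range(deficit) if rng.random() < gen_prob)


def expected_new_particles(initial_count, active_count, gen_prob):
    """Mean of the binomial draw above: trials times success probability."""
    return max(0, initial_count - active_count) * gen_prob


rng = random.Random(0)
new_particles = generate_particles(22, 20, 0.25, rng)   # between 0 and 2
mean_particles = expected_new_particles(22, 20, 0.25)   # 2 * 0.25
```

No particle is generated while the class still has at least its initial number of active particles, because the deficit is then zero.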
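We are interested in the total number of visits of particles of each class to each edge. Thus, we introduce the cumulative domination $\delta_{ij}^{c}(t)$, the total number of particles belonging to class $c$ that passed through edge $e_{ij}$ up to time $t$. Mathematically, this is defined as
$$\delta_{ij}^{c}(t) = \sum_{\tau=1}^{t} n_{ij}^{c}(\tau), \tag{1}$$
where $n_{ij}^{c}(\tau)$ is the current directed domination at time $\tau$.
Using the cumulative domination, we can group the edges by class domination. For each class $c$, the subset $\mathcal{E}_c(t) \subseteq \mathcal{E}$ is
$$\mathcal{E}_c(t) = \left\{ e_{ij} \in \mathcal{E} \;:\; \arg\max_{q} \left( \delta_{ij}^{q}(t) + \delta_{ji}^{q}(t) \right) = c \right\}.$$
We define the subnetwork
$$G_c(t) = \left( \mathcal{V}, \mathcal{E}_c(t) \right)$$
as the unfolding of network $G$ according to class $c$ at time $t$. We interpret the unfolding as a subspace with the most relevant relationships for a given class. We use the available information in these subnetworks for the study of overlapping regions and for semi-supervised learning.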
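Grouping edges by class domination amounts to an arg-max over classes for every edge. The sketch below assumes the cumulative domination has already been summed over both directions of each undirected edge, and it assigns an edge to every class that attains the maximum; the tie rule and the name `unfold_edges` are assumptions made for illustration.

```python
def unfold_edges(cumulative):
    """Group undirected edges by the class with the largest cumulative domination.

    cumulative maps an edge (i, j) to a list whose q-th entry is the total
    number of class-q particles that crossed the edge in either direction.
    Returns one set of edges per class. The mapping must not be empty.
    """
    n_classes = len(next(iter(cumulative.values())))
    unfoldings = [set() for _ in range(n_classes)]
    for edge, counts in cumulative.items():
        best = max(counts)
        for q, value in enumerate(counts):
            # ASSUMPTION: on a tie, the edge is kept in every tied unfolding.
            if value == best:
                unfoldings[q].add(edge)
    return unfoldings


cumulative = {(0, 1): [5.0, 1.0], (1, 2): [0.5, 2.0], (2, 3): [1.0, 1.0]}
red_edges, green_edges = unfold_edges(cumulative)
```

A vertex incident to edges of more than one unfolding is a candidate overlapping vertex, which is the information a purely vertex-based state cannot provide.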
II-B An Illustrative Example
One iteration of the system’s evolution is illustrated in Figure 1. The considered system contains 22 active particles at time $t$ and 20 at time $t + 1$. In an iteration, each particle moves to a neighbor vertex, without preference. The movement of a particle is indicated by an arrow. An interrupted line indicates an edge in which the incoming particle is absorbed. A total of 6 particles are absorbed during this iteration, and the sources generate 4 new particles, which accounts for the change in the number of active particles (22 − 6 + 4 = 20).
At time , for example, one of the red particles passing through edge is absorbed due to a current edge dominance of 0.5 in that edge (one red particle and one green particle). Conversely, all green particles that moved through edge remain active at time . Since there is no rival particle (red particle) passing through this edge, the updated value of the current edge dominance is 1 and 0 for green and red classes, respectively.
In edge , one red and two green particles chose to pass through. One green particle is absorbed without affecting the new current level of dominance. Since one particle of each class successfully passed through edge , the new current level of dominance on this edge is 0.5. The same occurs for edge where no particles have passed through and, thus, the current level of dominance is set equally among all classes.
In edges and , particles have tried to move into a source of rival particles (sinks). These particles are absorbed independently from the current level of dominance.
Our edge-centric system can measure the overlapping nature of the vertices by counting the edges dominated by each class, whereas a vertex-centric approach would lose such information.
II-C Mathematical Modeling
Formally, we define the Labeled Component Unfolding system as a dynamical system . The state of the system is
is a vector, and each element is the number of active particles belonging to class in at time . Furthermore, is a matrix whose elements are given by Equation 1.
Let and be, respectively, the number of particles generated and absorbed by at time . The evolution function of the dynamical system is
Intuitively, the number of active particles that are in a vertex is the total number of particles arriving, , minus the number of particles leaving, , or being absorbed, ; additionally for labeled vertices, the number of generated particles, . Moreover, to calculate the total number of visits of particles to an edge, we simply add up the number at each time. Values , , and are obtained stochastically according to the dynamics of walking, absorption, and generation.
The initial state of the system is given by an arbitrary number of initial active particles and
To achieve the desirable network unfolding, it is necessary to average the results of several simulations of the system with a very large number of initial particles . Thus, the computational cost of such a simulation is very high. Conversely, we provide an alternative system that achieves similar results in a deterministic manner. More details will follow.
II-D Alternative Mathematical Modeling
Consider the dynamical system
where is a row vector whose elements give the population of particles with label in each vertex at time . These values are associated to the number of active particles of system . The elements and of the sparse matrices and are related to the current directed domination, , and the cumulative domination, , respectively. In other words, gives the number of particles of class that moved from to at time , while gives the total number up to time .
The system is a nonlinear Markovian dynamical system with the deterministic evolution function
where is a square matrix with the elements of vector on the main diagonal and stands for the vector-matrix product.
The function of the system at time gives a square matrix whose elements are
Given that we know the initial state of the system, the function of the system at time returns a row vector where the -th element is
where is a row vector whose elements are , and stands for the inner product between vectors.
The initial state of the system is given by an arbitrary population size of initial active particles and
(Footnote 1: In system , vector describes the quantity of particles in each vertex. Since has multiplicative scaling behavior, is not necessarily composed only of integer values; values can be a discrete distribution of particles. See Section II-E5 for more details.)
II-E Mathematical Analysis
In the previous subsections, we modeled two possibly equivalent systems, and . In this section, we present mathematical results that prove the equivalence of the two systems under certain assumptions.
Theorem 1. Systems and are asymptotically equivalent if the following conditions hold:
for all , , and , we have
for some constant.
In order to prove Theorem 1, we study the following mechanisms of the particle competition system:
II-E1 Particle motion and absorption
In the proposed system, each particle moves independently of the others. A particle’s movement through an edge affects the absorption of rival particles only in the next iteration. These conditions make it natural to regard the system’s evolution in terms of the distribution of particles over the network. Next, we present a formal model for particle movement.
Let be a discrete random variable that is 1 if particle was in at time and moved into at time ; and it is 0 otherwise. Since each particle in a vertex moves independently, we can write this probability in terms of a particle’s class; that is, for any particle that belongs to class and is in at time .
The probability is affected by the movement decision of a particle and whether it was absorbed after the decision. By formulation, in dynamical system the conditional probability, given that , is
That is, when a particle tries to move into a sink, the survival probability is zero. Otherwise, a particle only reaches if it chooses to move into the vertex and it is not absorbed.
Let be the probability density function of the random variable . Hence, the probability is
if or . Otherwise, it is zero.
Furthermore, is convex with fixed values of for all . Thus, by Jensen’s inequality, we have
II-E2 Particle generation
In dynamical system the expected number of particles belonging to class generated at at time is
The conditional expectation is, by formulation,
and thus, is
Since is convex for all and according to Jensen’s inequality, we have
II-E3 Expected edge domination
At the beginning of system we have
and, for ,
Given that is known and since each particle in a vertex moves independently, the number of particles that successfully reaches at time is
where is a particle that belongs to class and is in . Then, the expected value is
for all , , and .
II-E4 Expected number of particles
We know the number of particles at the beginning of system , so
and, for all , the expected value is
However, the expected number of particles that were absorbed in is the expected number of particles in minus the expected number of particles that survived when moving away. Thus, can be written as
for all , , and .
II-E5 Scale invariance
The unfolding from system is invariant under real positive multiplication of the row vector . In order to prove this property, consider the following lemma.
System has positive multiplicative scaling behavior of order 1. Given an arbitrary initial state of the system , it means that
for all and .
First, we show that the functions are invariant to parameter scaling. Given an arbitrary system state and ,
since the term can be either
Now, consider two arbitrary initial states,
for all . We have that,
Thus, Relation (14) holds true for .
Assuming that Relation (14) holds true for some time , we show that the relation holds true for :
So Relation (14) indeed holds true for .
Since both the basis and the inductive step have been performed, by mathematical induction, the lemma is proved for all natural. ∎
Finally, using these studies, we may prove the theorem.
Even if the convergence of Inequalities (9) and (10) does not hold, another property that possibly makes the two systems equivalent is compensation over time. At the beginning, both systems are equal; however, in the next iteration both the absorption probability (9) and the number of generated particles (10) are underestimated. Consequently, particles that have survived may compensate for those that were not generated. Furthermore, the lower the number of absorbed particles in an iteration, the higher the absorption probability in the next iteration. Likewise, the lower the number of generated particles in an iteration, the higher the expected number of new particles in the next iteration.
III Semi-supervised Learning by Labeled Component Unfolding
Unfoldings generated by the LCU system are incorporated into a semi-supervised learning model. Consider two sets and such that for all . Each data point is associated with a label . In the semi-supervised learning setting, our goal is to correctly assign existing labels to the unlabeled data .
In short, the proposed learning model has three steps: a) a network is constructed based on a dataset composed of feature vectors, where vertices represent data points, and edges represent similarity relationship; b) LCU system is applied to obtain the unfoldings, that is, a distinct set of edges for each class of the dataset; and c) infer labels for every data point in .
Next, each step of the proposed learning model is presented in detail. In addition to the description of the model’s algorithm, its computational complexity analysis is also presented.
Since the proposed dynamical system takes place on a complex network, the original dataset needs to be represented in a network structure. Therefore, the first step of our learning model is to obtain a network representation. Each data point is associated to a single vertex of the network. Moreover, the network must be sparse, undirected, and unweighted. Labeled vertices correspond to the set of points in , and unlabeled vertices to the set of points in . Two vertices are connected by an edge if they have a relationship of similarity, which is determined by some metric or by the particular problem. Any graph construction method that satisfies such conditions may be used in this step. The k-Nearest Neighbor (k-NN) graph construction method is one of them.
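As an illustration of the network construction step, the following sketch builds a symmetric, unweighted k-NN graph with Euclidean distance. It is a brute-force version written for clarity, not the tree-based construction discussed in the complexity analysis, and the name `knn_graph` is hypothetical. Note that a k-NN graph is not guaranteed to be connected, so connectivity would still have to be checked before running the system.

```python
import math


def knn_graph(points, k):
    """Return the undirected edges (i, j), i < j, of a symmetric k-NN graph.

    Two vertices are connected if either one is among the k nearest
    neighbors of the other, so the resulting graph is simple and undirected.
    """
    edges = set()
    for i, p in enumerate(points):
        # Sort the other points by distance; ties are broken by index.
        others = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        for _, j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
edges = knn_graph(points, k=1)
```

Storing each edge once as an ordered pair keeps the graph unweighted and free of duplicate edges, which are the requirements stated for the input network.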
The second step is to run system defined by Equation 5 using the constructed complex network as its input. Two conditions are satisfied on the system initialization. First, no class should be privileged. Second, during the first iterations, all particles should be able to flow within the network with a small probability of absorption. Thus, the initial conditions of the system, for all , , and , are
Since there are always particles in the system, the iteration of system should be stopped if the time limit has been reached. The time limit parameter controls the maximum number of iterations of the system.
At the last step, the networks are used for vertex classification. We assign a label for each unlabeled vertex , with the information provided by the networks . Label is assigned based on the density of edges in its neighborhood. Formally, the label for is
where is the neighborhood of in the unfolding . We denote the number of edges in this neighborhood as .
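The labeling rule can be sketched as follows. The sketch assumes that the neighborhood of a vertex in an unfolding is the vertex together with its adjacent vertices in that unfolding, and that the edge count is taken over the edges of the unfolding with both endpoints in this set; ties are resolved in favor of the lowest class index. These choices and the function names are assumptions made for illustration.

```python
def neighborhood_edge_count(edges, v):
    """Count the edges of one unfolding inside the closed neighborhood of v."""
    neighbors = {j for i, j in edges if i == v} | {i for i, j in edges if j == v}
    region = neighbors | {v}
    return sum(1 for i, j in edges if i in region and j in region)


def classify(unfoldings, v):
    """Return the index of the class whose unfolding is densest around v."""
    counts = [neighborhood_edge_count(edges, v) for edges in unfoldings]
    # ASSUMPTION: ties go to the lowest class index.
    return max(range(len(counts)), key=counts.__getitem__)


# Class 0 owns a triangle around vertex 2, class 1 owns a single edge.
unfoldings = [{(0, 1), (1, 2), (0, 2)}, {(2, 3)}]
label_of_2 = classify(unfoldings, 2)
label_of_3 = classify(unfoldings, 3)
```

Because only the immediate neighborhood of each unlabeled vertex is inspected, the cost of labeling one vertex depends on its degree rather than on the size of the whole network.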
Algorithm 1 summarizes the steps of our learning model. The algorithm accepts the labeled dataset , the unlabeled dataset , and two user-defined parameters: the competition parameter of the system and the time limit parameter. Moreover, it is necessary to choose a network formation technique.
The first step of the learning model is mapping the original vector-formed data to a network using a chosen network formation technique. Afterward, we unfold the network as described in Algorithm 2. This algorithm iterates the LCU system to produce one subnetwork for each class. Steps 2–6 initialize the system state as indicated in Equation 15. Steps 7–15 iterate the system until using the evolution function (5). Step 16 calculates and returns the unfoldings for each class. Back to Algorithm 1, by using the produced unfoldings, the unlabeled data are classified as described in Equation 16.
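The unfolding loop can be sketched as a deterministic population flow. The update rules below are reconstructed from the prose description of the deterministic system and should be read as assumptions, not as the authors' exact equations: each class starts with a unit population split among its sources in proportion to their degree, the survival probability is taken as `1 - lam * sigma`, and each source regenerates its degree-proportional share of the population deficit. All names are hypothetical.

```python
def lcu_unfold(adjacency, labels, n_classes, lam, tau):
    """Iterate a deterministic particle flow and return cumulative domination.

    adjacency: dict mapping each vertex to the set of its neighbors.
    labels: dict mapping labeled vertices to a class index.
    Returns cumulative[c][(i, j)], the directed flow of class c from i to j.
    """
    arcs = [(i, j) for i in adjacency for j in adjacency[i]]
    deg = {i: len(adjacency[i]) for i in adjacency}
    # ASSUMPTION: sources share generation in proportion to their degree.
    rho = []
    for c in range(n_classes):
        sources = [v for v, y in labels.items() if y == c]
        total_deg = sum(deg[v] for v in sources)
        rho.append({v: deg[v] / total_deg for v in sources})
    pop = [dict(rho[c]) for c in range(n_classes)]
    initial_total = [sum(p.values()) for p in pop]
    current = [{a: 0.0 for a in arcs} for _ in range(n_classes)]
    cumulative = [{a: 0.0 for a in arcs} for _ in range(n_classes)]
    for _ in range(tau):
        new_current, new_pop = [], []
        for c in range(n_classes):
            deficit = max(0.0, initial_total[c] - sum(pop[c].values()))
            arriving = {v: rho[c].get(v, 0.0) * deficit for v in adjacency}
            moved = {}
            for (i, j) in arcs:
                if j in labels and labels[j] != c:
                    flow = 0.0  # sinks absorb every rival particle
                else:
                    total = sum(current[q][(i, j)] + current[q][(j, i)]
                                for q in range(n_classes))
                    if total > 0:
                        own = current[c][(i, j)] + current[c][(j, i)]
                        sigma = 1.0 - own / total
                    else:
                        sigma = 1.0 - 1.0 / n_classes
                    flow = pop[c].get(i, 0.0) * (1.0 - lam * sigma) / deg[i]
                moved[(i, j)] = flow
                arriving[j] += flow
                cumulative[c][(i, j)] += flow
            new_current.append(moved)
            new_pop.append(arriving)
        current, pop = new_current, new_pop
    return cumulative


# Path v0 - v1 - v2 - v3 with one labeled vertex per class at the two ends.
adjacency = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
cumulative = lcu_unfold(adjacency, {0: 0, 3: 1}, n_classes=2, lam=1.0, tau=20)
```

The cost of one iteration is proportional to the number of classes times the number of edges, since every directed edge is visited once per class.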
III-B Computational Complexity and Running Time
Here, we provide the computational complexity analysis step by step.
The construction of the complex network from the input dataset depends on the chosen method, and its cost is expressed in terms of the number of data samples. The k-NN method, for example, has average complexity of order $n \log n$ for $n$ data samples when a multidimensional binary search tree (k-d tree) is used.
The second step is running system defined by Equation 5. Using sparse matrices, the system initialization, steps 2–6 of Algorithm 2, has complexity order of . The system iteration calculates times the evolution function (5) represented in steps 8–14. The time complexity of each part of the system evolution is presented below.
Step 9, computation of the matrix . This matrix has non-zero entries. It is necessary to calculate for each non-zero entry. Hence, this step has complexity order of . However, the denominator of Equation 7 is the same for all values of .
Step 10, computation of the vector . This vector has non-zero entries. It is also necessary to calculate the total number of particles in the system. So, this calculation has time complexity order of .
Step 11, computation of the matrix . The multiplication between a diagonal matrix and a sparse matrix with non-zero entries has time complexity order of .
Step 12, computation of the vector . Suppose that is the average vertex degree of the input network; it follows that this can be performed in .
Step 13, computation of the matrix . This sparse matrix summation has complexity order of .
After the system evolution, the unfolding process performs operations. Thus, the total time complexity order of the system simulation is . However, the value of is fixed and the value of is usually very small.
The vertex labeling step is the last step of the learning model. The time complexity of this step depends on the calculation of the number of edges in the neighborhood of each unlabeled vertex in each unfolding. It can be efficiently calculated by using one step of a breadth-first search in . Hence, the order of the average-time complexity is .
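TABLE I: Time complexity of common graph-based techniques, disregarding the graph construction step
| Algorithm |
| Transductive SVM |
| Local and Global Consistency |
| Large Scale Transductive SVM |
| Dynamic Label Propagation |
| Label Propagation |
| Original Particle Competition |
| Minimum Tree Cut |
| Labeled Component Unfolding |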
In summary, considering the discussion above, our learning model runs in including the transformation from vector-based dataset to a network. Table I compares the time complexity of common graph-based techniques disregarding the graph construction step. Only the proposed LCU method and Minimum Tree Cut  have linear time, though the latter must either receive or construct a spanning tree. Consequently, the Minimum Tree Cut has a performance similar to the scalable version of traditional algorithms, such as those using subsampling practices.
Figure 4 depicts the running time of a single iteration of the system as the number of vertices and the number of edges vary. With 10 independent runs, we measure the time for 30 iterations, totaling 300 samples for each network size. We set , two classes, and 5% of labeled vertices. Experiments were run on an Intel® Core™ i7 CPU 860 @ 2.80GHz with 16 GB of DDR3 RAM @ 1333 MHz. This experiment shows that the system runs in linear time as a function of the number of vertices and edges, which agrees with our theoretical analysis.
IV Computer Simulations
To study the stochastic system and the deterministic version , we present experimental analyses that concern their equivalence. Additionally, we study the meaning of the parameters of our learning model. After that, we evaluate the model performance using both artificial and synthetic datasets. Then, we show the unfolding process and the learning model on synthetic data. Finally, we present the simulation results for a well-known benchmark dataset and for a real application on human activity and handwritten digits recognition.
IV-A Experimental Analysis
In this section, we present an experiment that assesses the equivalence between the unfolding results of both systems with an increasing initial number of particles in system .
The networks used for the analysis are generated by the following model: a complex network is constructed given a labeled vector , a number of edges by vertex, and a weight that controls the preferential attachment between vertices of different classes. The resulting network contains vertices. For each , edges are randomly connected, with replacement. If , the preferential attachment weight is ; otherwise, the weight is . The parameter is proportional to the overlap between classes.
If there exists a positive constant such that
both systems generate the same unfoldings. To assess this proportionality, both systems are simulated in 10 different networks , with vertices arranged in two classes. The system’s parameter is discretized in . Varying the total number of initial particles, we set for all and .
We consider the correlation between the cumulative domination matrices of systems and . If the two matrices are proportional, then they must be correlated. Values of correlation close to 1 indicate the cumulative domination matrices are proportional. In Figure 5, the correlation is depicted. As the number of initial particles increases, the correlation approaches 1. This result suggests that both systems generate the same unfolding when the number of initial particles grows to infinity.
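The proportionality check can be illustrated with the Pearson correlation between the two cumulative domination matrices after flattening them into lists of matching entries. This is a generic sketch: the flattening order, the variable names, and the use of the Pearson coefficient specifically are assumptions, since the text only states that a correlation is measured.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Hypothetical flattened cumulative dominations of the two systems: the
# second list is the first multiplied by a positive constant.
stochastic = [12.0, 3.0, 7.0, 0.0, 9.0]
deterministic = [0.5 * value for value in stochastic]
correlation = pearson(stochastic, deterministic)
```

Two matrices that differ only by a positive multiplicative constant have correlation 1, so values close to 1 are consistent with both systems producing the same unfoldings.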
IV-B Parameter Analysis
The LCU model has two parameters apart from the network construction. In this section, we discuss their meaning. To do so, the learning model is applied to synthetic datasets whose data items are sampled from a three-dimensional knot torus with parametric curve
where and .
We sampled 500 data items uniformly along the possible values of . We randomly split the data items into 2 to 10 classes so that samples with adjacent belong to the same class. We also added to each sample random noise in each dimension with distribution with and . Figure 6 depicts an example of the dataset with 4 classes with and without noise. Since the dataset has a complex form, a small change in a parameter value may generate different results. Therefore, it is suitable for studying the sensitivity of the parameters.
We run the LCU model with parameters and . Finally, 30 unbiased sets of 40 labeled points are employed. The k-NN method is used for the network construction with .
Below, we discuss each parameter of the model.
IV-B1 Discussion about the network construction parameter
In our model, the input network must be simple (between any pair of vertices there must exist at most one edge), unweighted, undirected, and connected. Besides these requirements, two vertices must be connected if their data items are considered similar enough for the particular problem. In our experiments, we use the k-NN graph with Euclidean distance since it has been proved to approximate the low-dimensional manifold of a point set. The smaller the value of , the better the results.
IV-B2 Discussion about the system parameter
The LCU system has only one parameter: the competition parameter $\lambda$. This parameter defines the intensity of competition between particles. When $\lambda = 0$, particles randomly walk the network without competition. As $\lambda$ approaches 1, particles are more likely to compete and, consequently, to be absorbed. Figure 7 depicts the average error of our method with different values of $\lambda$. Based on the figure, our model is not sensitive to $\lambda$. In general, we suggest setting because it gives better and more consistent classification than other values.
IV-B3 Discussion about the system iteration stopping parameter
The time limit parameter controls when the simulation should stop; it must be at least as large as the diameter of the network. That way, every edge is guaranteed to be visited by a particle. Since the network diameter is usually small, the simulation stops after a few iterations.
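A lower bound for the time limit can be obtained by computing the network diameter. The following is a minimal sketch using breadth-first search from every vertex, assuming a connected, unweighted network stored as an adjacency dictionary; the function names and the example path graph are hypothetical.

```python
from collections import deque

def eccentricity(adj, source):
    """Largest shortest-path distance from source, found by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return max(dist.values())

def diameter(adj):
    """Diameter of a connected unweighted graph: the maximum eccentricity."""
    return max(eccentricity(adj, v) for v in adj)

# Hypothetical example: a path with four vertices.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
min_time_limit = diameter(path)
```

This exhaustive approach costs one search per vertex, which is acceptable for a sketch; larger networks may call for an approximate diameter estimate.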
IV-C Simulations on Artificial Datasets
To better understand the details of the LCU system, in this subsection we illustrate it using two synthetic datasets. Each dataset has a different class distribution: banana shape and Highleyman. (The datasets are generated using the PRTools framework.) The banana shape dataset is uniformly distributed along a specific shape and then superimposed on a normal distribution with a standard deviation of 1 along the axes. In the Highleyman distribution, two classes are defined by bivariate normal distributions with different parameters. Because the datasets are not in a network representation, we use the k-NN graph construction method to transform them into their respective network forms. In the constructed network, a vertex represents a datum, and it connects to its k nearest neighbors, determined by the Euclidean distance. We set for the simulation.
First, the technique is tested on the Highleyman dataset. Each class has 300 samples, of which 6 are labeled. (We set for the k-NN algorithm.) We can observe that the labeled data points of the green class form a barrier to samples of the red class. The unfoldings and are presented in subfigure (a). In this figure, blue squares represent vertices that are connected by edges of both unfoldings. Although the labeled data of the green class form a barrier, the constructed subnetworks are still connected: there is a single component connecting all the vertices of the subnetwork. This is better visualized in subfigure (b). In this figure, the same unfoldings are presented, but the positions of the vertices are not imposed by the original data. Furthermore, the colors of the vertices in the figure indicate the result of the classification. The overlapping data can be identified by the vertices that belong to two or more unfoldings. This result reveals that the competition system in edges provides more information than the competition in vertices, since it can identify the overlapping vertices as part of the system, that is, without special treatment or modifications.
The last synthetic dataset has 600 samples equally split into two classes. In subfigure (a), the initial state of the system is illustrated, where the dataset is represented by a network. (The network representation is obtained by setting for the k-NN graph construction.) At this stage, the edges are not dominated by any of the classes. Starting from this state, labeled vertices (sources) generate particles that carry the label of the sources. Though the particles are not shown, subfigures (b) and (c) are snapshots of the system evolution, at times 4 and 20, where each edge is colored by its dominating class at that iteration. In these illustrations, a solid red line stands for an edge that , while a dashed green line stands for the opposite. When , an edge is drawn as a solid light gray line. As expected, edges close to sources are dominated initially, and farther edges are progressively dominated. At time 20, subfigure (c), every edge has been dominated, and the edge domination does not change anymore. Subfigure (d) shows the dataset classification following the system result. In this example, with 1% of points in the labeled set, the technique can correctly identify the pattern formed by each class. Both results are satisfactory, reinforcing the ability of the technique to learn arbitrary class distributions.
IV-D Simulations on Benchmark Datasets
We compare our model with 14 semi-supervised techniques tested on Chapelle’s benchmark . The benchmark is formed by seven datasets that have 1500 data points, except for BCI that has 400 points. The datasets are described in .
For each dataset, 24 distinct, unbiased sets (splits) of labeled points are provided within the benchmark. Half of the splits are formed by 10 labeled points and the other half by 100 labeled points. The author of the benchmark ensured that each split contains at least one data point of each class. The result is the average test error (the proportion of data points incorrectly labeled) over the splits. We compare our results to the ones obtained by the following techniques: 1-Nearest Neighbors (1-NN), Support Vector Machines (SVM), Maximum variance unfolding (MVU + 1-NN), Laplacian eigenmaps (LEM + 1-NN), Quadratic criterion and class mass regularization (QC + CMR), Discrete regularization (Discrete reg.), Transductive support vector machines (TSVM), Cluster kernels (Cluster-Kernel), Low-density separation (LDS), Laplacian regularized least squares (Laplacian RLS), Local and global consistency (LGC), Label propagation (LP), Linear neighborhood propagation (LNP), and Network-Based Stochastic Semisupervised Learning (Vertex Domination). The simulation results are collected from , except for LGC, LP, LNP, and Original Particle Competition, which are found in .
For the simulation of the LCU system, we discretize the interval of the parameter in . Also, we vary the k-NN parameter . We tested every combination of and . Moreover, we fix . In Table II, we present the results with the standard deviation over the splits along with the best combination of parameters that generated the best accuracy result.
The test error comparison for 10 labeled points is shown in Table III; the comparison for 100 labeled points is in Table IV. Besides the columns for each dataset, the last column gives the average performance rank of a technique over the datasets. A ranking arranges the methods under comparison by test error rate in ascending order. For a single dataset, we assign rank 1 to the method with the lowest average test error on that dataset, rank 2 to the method with the second lowest test error, and so on. The average ranking is the average value of the rankings of the method on all the datasets. The smaller the ranking score, the better the method has performed.
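The average rank described above can be sketched as follows. The error values are hypothetical and serve only to illustrate the computation; giving tied methods the mean of their ranks is an assumption, since the text does not state how ties are handled.

```python
def ranks(errors):
    """Rank values in ascending order (1 = lowest error).

    Ties share the mean of the ranks they span (an assumption, not from the paper).
    """
    order = sorted(range(len(errors)), key=lambda i: errors[i])
    result = [0.0] * len(errors)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value equals the current one.
        while end + 1 < len(order) and errors[order[end + 1]] == errors[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for i in order[pos:end + 1]:
            result[i] = mean_rank
        pos = end + 1
    return result

def average_ranks(table):
    """table[d][m] is the test error of method m on dataset d."""
    per_dataset = [ranks(row) for row in table]
    n = len(table)
    return [sum(r[m] for r in per_dataset) / n for m in range(len(table[0]))]

# Hypothetical errors: two datasets (rows), three methods (columns).
table = [[0.10, 0.20, 0.30],
         [0.25, 0.15, 0.35]]
avg = average_ranks(table)
```

In this toy table, the first two methods each win on one dataset and therefore share the same average rank, while the third is last on both.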
From the average rank column, the LCU technique is not the best ranked, but it is in the best group of techniques in both 10 labeled and 100 labeled cases.
First, we use a test based on the average rank of each method to evaluate the null hypothesis that all the techniques are equivalent. With the Friedman test, there is a statistically significant difference between the ranks of the techniques.
Since the Friedman test result reports statistical significance, we use the Wilcoxon signed-rank test . In this pairwise difference test, we test for the null hypothesis that the first technique has greater or equal error results than the second. If rejected at a 5% significance level, then we say the first technique is superior to the second. By analyzing results for 10 and 100 labeled points together, we conclude that our technique is superior to 1-NN, LEM + 1-NN, and MVU + 1-NN. Examining separately, for 10 labeled points, our method is also superior to discrete regularization, cluster kernel, and SVM. For 100 labeled points, it is also superior to LNP and LGC; whereas Laplacian RLS is superior to ours.
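The Friedman statistic used in the first step can be sketched from the per-dataset ranks. This is a simplified illustration that assumes no ties within a dataset and omits the p-value, which would be obtained by comparing the statistic with a chi-square distribution with one fewer degree of freedom than the number of methods; the function name and the error values are hypothetical.

```python
def friedman_statistic(table):
    """Friedman chi-square statistic.

    table[d][m] is the error of method m on dataset d.
    Assumes no ties within a row (no tie correction is applied).
    """
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        # Rank methods within this dataset: 1 = lowest error.
        order = sorted(range(k), key=lambda m: row[m])
        for rank, m in enumerate(order, start=1):
            rank_sums[m] += rank
    mean_ranks = [s / n for s in rank_sums]
    expected = (k + 1) / 2  # mean rank under the null hypothesis
    return 12 * n / (k * (k + 1)) * sum((r - expected) ** 2 for r in mean_ranks)

# Hypothetical errors: three datasets, three methods, identical ordering in every row.
table_f = [[0.10, 0.20, 0.30],
           [0.12, 0.18, 0.40],
           [0.05, 0.25, 0.35]]
q = friedman_statistic(table_f)
```

When every dataset orders the methods identically, the statistic reaches its maximum of n(k - 1), where n is the number of datasets and k the number of methods, which is the strongest possible evidence against equivalence.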
IV-E Simulations on Human Activity Dataset
The Human Activity Recognition Using Smartphones dataset comprises 10299 data samples. Each sample has 561 features extracted from motion sensors attached to a person during a time window. Each person performed six activities, which are the target labels in the dataset: walking (WK), walking upstairs (WU), walking downstairs (WD), sitting (ST), standing (SD), and laying down (LD).
We use k-NN with for the dataset network representation since it is the smallest value that generates a connected network. The parameters are fixed in and . We compare our results with the ones published in , splitting the problem into six binary classification tasks.
Table V summarises the results. For our technique, we provide the precision, recall and F Score using 5%, 10%, and 20% of labeled samples. We average the results of 10 independent labeled sets for each configuration. We also provide the original results from  using SVM with approximately 70% of labeled samples. Our technique performs as well as SVM while using far fewer labeled samples and the suggested parameter set. Such a feature is quite attractive because it may represent a big saving in money or effort when manual data labeling is involved in semi-supervised learning.
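The per-activity metrics can be sketched by treating each activity in turn as the positive class of a binary task. This is a minimal illustration that assumes the F Score is the balanced F1 measure; the function name and the label sequences are hypothetical and not taken from the dataset.

```python
def binary_scores(y_true, y_pred, positive):
    """Precision, recall, and F1 for one activity treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Assumed definition: F1, the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical true and predicted activity labels.
y_true = ["WK", "WK", "ST", "SD", "WK", "ST"]
y_pred = ["WK", "ST", "ST", "SD", "WK", "WK"]
wk_scores = binary_scores(y_true, y_pred, "WK")
```

Repeating the call for each of the six activity codes gives the six binary tasks; averaging over independent labeled sets would then follow the protocol described above.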
IV-F Simulations on MNIST Dataset
[Table: results on MNIST by method (including Embed NN* and Embed CNN*) with 100 and 1000 labeled samples.]
The MNIST dataset comprises 70,000 examples of handwritten digits. All digits have been size-normalized and centered in a fixed-size image. In a supervised learning setting, this dataset is split into two sets: 60,000 examples for training and 10,000 for testing.
To adapt the dataset to a semi-supervised learning problem, we use a setting similar to [23, 31]: the labeled input data items are selected from the training set, and the unlabeled ones from the test set. Although we do not use a validation set,  and  use an additional set of at least 1,000 labeled samples for parameter tuning.
The network representation is obtained from the images without preprocessing. We use the Euclidean distance between items and