Deep learning has become a hot topic in the fields of machine learning and artificial intelligence. For specific tasks such as speech recognition, natural language processing, and image processing, many algorithms, hypotheses, and large-scale training frameworks for deep learning have been developed and widely implemented. Nevertheless, to our knowledge, the use of deep learning in clustering has not yet been thoroughly studied. The goal of this work is to carry out preliminary research in that direction.
Graphs are structures formed by a set of nodes (also called vertices) and a set of edges, which are relationships between pairs of nodes. Graph clustering is the process of grouping the nodes of a graph into clusters based on its edge structure, so that there are many edges within each cluster and very few between clusters. Graph clustering aims to partition the nodes of the graph into disjoint groups, and it has been a long-standing subject of research. Early methods used various shallow approaches: centrality indices were used to define community divisions and social communities, and belief propagation was applied to community detection to identify the likely group structure. The topic of this paper is the grouping of the nodes of a single input graph into clusters, which is not to be mistaken for the clustering of sets of graphs based on structural similarity.
Very recently, Deepak and Huaming proposed feature selection and extraction for Graph Neural Networks, in which they select the features that dominate node classification. Their method uses a differentiable relaxation of the concrete distribution together with the reparameterization trick. The reparameterization trick makes it possible to differentiate through the loss function and to select the input features that minimize the loss. Applying their method, they selected 225 of the 1433 features of the Cora citation dataset and were able to classify with good accuracy using only those 225 features and a two-layer graph convolution network.
In this paper, we extend the Gumbel-Softmax approach that Deepak and Huaming use to select and extract features for Graph Neural Networks, and apply it to clustering graph datasets with a deep learning based approach. We conduct a series of experiments on a variety of datasets: Zachary Karate Club, Highland Tribes, Train Bombing, Italian Gangs, Jazz Musicians, American Revolution, Dolphins, Zebra, Windsurfers, Les Misérables, and Political Books. The results are promising: on the labeled datasets, the clustering accuracy exceeds 97%.
II Background and Related Work
Deepak and Huaming select Graph Neural Network (GNN) features in the paper Feature Selection and Extraction for Graph Neural Networks, using citation network datasets. (1) They apply a feature selection algorithm to GNNs using Gumbel-Softmax and conduct a series of experiments on several benchmark datasets: Cora, Pubmed, and Citeseer. (2) They develop a method for ranking the selected features. To show the usefulness of the algorithms for both feature selection and feature ranking, they present an illustrative example on the Cora dataset, where they pick 225 features out of 1433 and rank them by prominence. The deep learning model works well with the reduced feature set, which is a decrease of around 80-85 percent from the number of features the dataset had initially. The experimental results show that classification accuracy decreases slowly as the features used move from the rank range 1-50 to the ranges 51-100, 100-150, and 151-200.
Consider a graph with $n$ nodes and $f$ features per node. Feature selection reduces the number of features from $f$ to $k$, where $k$ is the number of features picked. The model is first trained on the dataset and its features with the Gumbel-Softmax selection matrix applied (the matrix that holds the features chosen by the Gumbel-Softmax algorithm).
In general, let $X \in \mathbb{R}^{n \times f}$ be the input feature matrix, where $n$ is the total number of nodes and $f$ is the total number of features per node in the graph dataset. Consider a selection matrix $W_g \in \mathbb{R}^{f \times k}$, where $f$ is the total number of features in the graph dataset and $k$ denotes the number of features selected from those $f$ features.
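The shapes involved in feature selection can be illustrated with a minimal NumPy sketch. The toy sizes, the chosen feature indices, and the name `S` for a hard (one-hot) selection matrix are assumptions made for illustration only; in the method described above the selection matrix is learned with Gumbel-Softmax rather than fixed by hand.

```python
import numpy as np

n, f, k = 4, 6, 2                       # nodes, total features, selected features (toy sizes)
rng = np.random.default_rng(0)
X = rng.random((n, f))                  # input feature matrix, shape (n, f)

selected = [1, 4]                       # hypothetical indices of the k chosen features
S = np.zeros((f, k))                    # selection matrix, shape (f, k)
S[selected, np.arange(k)] = 1.0         # one one-hot column per selected feature

X_reduced = X @ S                       # reduced feature matrix, shape (n, k)
```

Multiplying by a one-hot selection matrix simply keeps the chosen columns of `X`, which is why the reduced matrix has $k$ columns instead of $f$.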
Using the Gumbel-Softmax function and the proposed method, Deepak and Huaming select the features in a graph citation dataset. The Gumbel-Softmax distribution is “a continuous distribution over the simplex which can approximate samples from a categorical distribution”. Setting the highest probability of a categorical distribution to one and all other probabilities to zero yields a one-hot vector.
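Let $z$ be a categorical random variable with class probabilities $\pi_1, \pi_2, \ldots, \pi_m$. The Gumbel-Max trick (Gumbel, 1954) provides an easy and efficient way to draw samples $z$ from a categorical distribution with these class probabilities:

$$ z = \mathrm{one\_hot}\left(\arg\max_{i}\left[g_i + \log \pi_i\right]\right) \tag{1} $$

where $g_1, \ldots, g_m$ are i.i.d. samples drawn from Gumbel(0, 1).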
Training a neural network by gradient descent requires every operation in the network to be differentiable. In Equation 1, where $g_i \sim$ Gumbel(0, 1) and $\pi_i$ are the class probabilities, the argmax function and the stochastic sampling process are not differentiable. A natural remedy is to approximate argmax with a softmax function, using a temperature $\tau$ to control how closely the softmax approximates argmax:

$$ y_i = \frac{\exp\left((\log \pi_i + g_i)/\tau\right)}{\sum_{j} \exp\left((\log \pi_j + g_j)/\tau\right)} \tag{2} $$
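The Gumbel-Max trick and its softmax relaxation can be sketched in a few lines of NumPy. This is an illustrative sketch, not the implementation used in the experiments; the function names, the example probabilities, and the temperature value are assumptions.

```python
import numpy as np

def sample_gumbel(shape, rng, eps=1e-20):
    """Draw i.i.d. Gumbel(0, 1) noise via the inverse-CDF transform."""
    u = rng.uniform(0.0, 1.0, size=shape)
    return -np.log(-np.log(u + eps) + eps)

def gumbel_max(log_pi, rng):
    """Gumbel-Max trick: return a one-hot sample from the categorical distribution."""
    g = sample_gumbel(log_pi.shape, rng)
    one_hot = np.zeros_like(log_pi)
    one_hot[np.argmax(log_pi + g)] = 1.0
    return one_hot

def gumbel_softmax(log_pi, tau, rng):
    """Relaxed sample: softmax of the perturbed log-probabilities at temperature tau."""
    y = (log_pi + sample_gumbel(log_pi.shape, rng)) / tau
    y = y - y.max()                      # subtract the max for numerical stability
    e = np.exp(y)
    return e / e.sum()

rng = np.random.default_rng(0)
log_pi = np.log(np.array([0.1, 0.6, 0.3]))   # hypothetical class probabilities

hard = gumbel_max(log_pi, rng)               # exactly one entry equals 1
soft = gumbel_softmax(log_pi, tau=0.5, rng=rng)  # entries are positive and sum to 1
```

As the temperature decreases, the relaxed sample moves closer to a one-hot vector; as it increases, the sample moves closer to a uniform vector.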
The two-layer Graph Convolution Network (GCN) used in the experiment is defined as
To verify the selected features and compute the classification accuracy, they use the two-layer Graph Convolution Network defined below
$A$: Adjacency matrix of the undirected graph $G$.
$X$: Input feature matrix.
$W_g$: Gumbel-Softmax feature selection / feature extraction matrix.
$W_s$: Feature selection / feature extraction matrix obtained from the result of Equation 3.
$W$: Layer-specific trainable weight matrix.
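To make the roles of these matrices concrete, the following NumPy sketch runs a forward pass of a two-layer GCN on selected features. It assumes the commonly used symmetric normalization of the adjacency matrix with self-loops, a ReLU hidden layer, and a row-wise softmax output; none of these details are spelled out in the text above, so this is one plausible form rather than the exact network used. All names and toy sizes are hypothetical.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} (an assumed choice)."""
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def row_softmax(M):
    """Softmax applied independently to each row."""
    M = M - M.max(axis=1, keepdims=True)
    E = np.exp(M)
    return E / E.sum(axis=1, keepdims=True)

def gcn_forward(A, X, S, W0, W1):
    """Two-layer GCN on the selected features X @ S."""
    A_norm = normalize_adjacency(A)
    H = np.maximum(A_norm @ (X @ S) @ W0, 0.0)   # first layer with ReLU
    return row_softmax(A_norm @ H @ W1)          # second layer with class probabilities

rng = np.random.default_rng(0)
n, f, k, hidden, classes = 4, 5, 2, 3, 2         # toy sizes
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)         # path graph on 4 nodes
X = rng.random((n, f))
S = np.zeros((f, k))
S[[0, 3], [0, 1]] = 1.0                          # hypothetical selected features 0 and 3
W0 = rng.normal(size=(k, hidden))
W1 = rng.normal(size=(hidden, classes))

Z = gcn_forward(A, X, S, W0, W1)                 # shape (n, classes), rows sum to 1
```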
III Proposed Method
Consider the graph of ’n’ nodes and the adjacency matrix ’A’. An adjacency matrix is a square matrix used to describe a finite graph. Matrix elements signify whether or not the pairs of vertices in the graph are adjacent. Then, applying our idea, we cluster the graph into k clusters.
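As a small illustration of the adjacency matrix described above, the sketch below builds one from an edge list for an undirected graph. The helper name and the toy graph (two triangles joined by a single edge) are assumptions for illustration.

```python
import numpy as np

def adjacency_from_edges(n, edges):
    """Build the symmetric 0/1 adjacency matrix of an undirected graph."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = 1.0
        A[v, u] = 1.0                    # undirected: mirror every edge
    return A

# Toy graph: triangle {0, 1, 2}, triangle {3, 4, 5}, and a bridge between 2 and 3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = adjacency_from_edges(6, edges)
```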
The method used in the experiment is defined as:

$$ Z = \mathrm{softmax}\left(W_c^{\top} A W_c\right) \tag{5} $$

In Equation 5, $A$ indicates the adjacency matrix of the undirected graph $G$ and $W_c$ indicates the Gumbel cluster weight matrix.
In general, let $A \in \mathbb{R}^{n \times n}$ be the adjacency matrix, where $n$ is the total number of nodes in the graph dataset. The Gumbel cluster weight matrix $W_c$ has size $n \times k$, where $k$ indicates the number of clusters. The product $W_c^{\top} A W_c$ is a matrix of size $k \times k$, which we call $C$. The matrix $C$ shows the strength of the clusters: the main diagonal gives the strength of the connections among data points within each cluster group, and the off-diagonal elements give the strength of the connections between different cluster groups. We then apply the softmax function to $C$ to express its entries as a discrete probability distribution. Mathematically, for a vector $x$ the softmax is defined as follows:

$$ \mathrm{softmax}(x)_i = \frac{\exp(x_i)}{\sum_{j} \exp(x_j)} \tag{6} $$
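The cluster strength matrix can be checked by hand on a small graph. The sketch below uses a toy graph of two triangles joined by one edge and a hard (one-hot) cluster weight matrix `W`; both are assumptions for illustration, and applying the softmax row by row is also an assumed choice.

```python
import numpy as np

def row_softmax(M):
    """Softmax applied independently to each row."""
    M = M - M.max(axis=1, keepdims=True)
    E = np.exp(M)
    return E / E.sum(axis=1, keepdims=True)

# Toy graph: triangle {0, 1, 2}, triangle {3, 4, 5}, and a bridge between 2 and 3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

# Hard assignment: nodes 0-2 in cluster 0, nodes 3-5 in cluster 1 (shape n x k).
W = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)

C = W.T @ A @ W          # k x k cluster strength matrix: [[6, 1], [1, 6]]
Z = row_softmax(C)       # each row becomes a probability distribution
```

Each within-cluster edge is counted twice on the diagonal because the adjacency matrix is symmetric, so the three edges of each triangle give a diagonal value of 6, while the single bridge gives an off-diagonal value of 1. After the softmax, the diagonal dominates each row, which is the pattern a good clustering should produce.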
Now, consider the trained Gumbel cluster weight matrix ($W_c$), which has size $n \times k$. Each row corresponds to a graph node and sums to 1, and the index of the maximum value in a row is the cluster to which that node belongs. For example, let $k = 2$, i.e., we cluster the dataset into 2 cluster groups. Suppose the first row is [0.9 0.1], where 0.9 is at index 0 and 0.1 is at index 1. The maximum value is at index 0, so the node belongs to cluster 0. Suppose the second row is [0.29 0.71], where 0.29 is at index 0 and 0.71 is at index 1. The maximum value is at index 1, so the node belongs to cluster 1. Likewise, we inspect every row and assign every node of the graph dataset to a cluster group. The values in a row indicate how strongly the node is associated with each cluster group.
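The row-wise assignment in this example amounts to an argmax over each row, as the short NumPy sketch below shows. The variable names are illustrative.

```python
import numpy as np

# The two example rows from the text, one row per graph node.
W = np.array([[0.90, 0.10],
              [0.29, 0.71]])

clusters = W.argmax(axis=1)   # index of the largest value in each row
print(clusters.tolist())      # [0, 1]
```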
The result obtained from Equation 5 is compared with a target through the loss function. The target used in our experiment is the identity matrix ($I$), where each diagonal element represents a cluster group.
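The comparison with the identity matrix can be sketched as follows. The text does not state which distance is used, so the mean squared error below is an assumption, as are the function names and the toy graph (two triangles joined by one edge). The sketch only evaluates the loss for two fixed assignments; it does not perform training.

```python
import numpy as np

def row_softmax(M):
    """Softmax applied independently to each row."""
    M = M - M.max(axis=1, keepdims=True)
    E = np.exp(M)
    return E / E.sum(axis=1, keepdims=True)

def cluster_loss(A, W):
    """Mean squared error between softmax(W^T A W) and the identity (assumed loss)."""
    k = W.shape[1]
    Z = row_softmax(W.T @ A @ W)
    return float(np.mean((Z - np.eye(k)) ** 2))

# Toy graph: triangle {0, 1, 2}, triangle {3, 4, 5}, and a bridge between 2 and 3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

# A partition that follows the triangles, and one that alternates nodes.
W_good = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
W_bad = np.array([[1, 0], [0, 1], [1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)

good_loss = cluster_loss(A, W_good)   # small: most edges fall inside clusters
bad_loss = cluster_loss(A, W_bad)     # large: most edges cross clusters
```

The partition that keeps densely connected nodes together yields a result close to the identity matrix and therefore a lower loss, which is the behavior that minimizing this kind of loss is meant to encourage.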
Using the Gumbel-Softmax function and the proposed method, we cluster the nodes of the graph dataset.
IV Experiment Results
We cluster datasets from several categories: affiliation networks, animal networks, human contact networks, human social networks, and miscellaneous networks.
IV-A1 Human Social Networks
Human social networks are real-world social networks among human beings. The links represent offline relationships, not ties from an online social network.
Zachary Karate Club (Fig 1 and Fig 2): The network data were collected from members of the University Karate Club by Wayne Zachary in 1977. Every node represents a member of the club, and an edge represents a tie between two members of the club. The network is undirected. The problem most often studied with this dataset is to find the two groups into which the karate club split after a dispute between two teachers.
Highland Tribes (Fig 3 and Fig 4): The network is the Gahuku-Gama alliance system of the Eastern Central Highlands of New Guinea, a signed social network of tribes from Kenneth Read (1954). The network comprises seventeen tribes connected by friendship (“Rova”) and enmity (“Hina”).
Train Bombing (Fig 5 and Fig 6): The network is undirected, was compiled from newspaper reports, and contains contacts between suspected terrorists involved in the Madrid train bombing of 11 March 2004. A node represents a terrorist, and an edge between two terrorists shows that they had a connection. The edge weights indicate how strong the relationship was. Relationships include friendship and co-participation in training camps or previous attacks.
IV-A2 Affiliation Networks
Affiliation networks are the networks denoting the membership of actors in groups.
American Revolution (Fig 11 and Fig 12): The network includes membership records of 136 people in 5 organizations from the era before the American Revolution. The graph contains well-known persons such as the American activist Paul Revere. An edge between a person and an organization indicates that the person was a member of that organization, as shown in Figure 11.
IV-A3 Animal Networks
Animal networks are networks of animal communications. They are the animal equivalent to human social networks.
Dolphins (Fig 13 and Fig 14): The network is a social network of bottlenose dolphins. The nodes are the bottlenose dolphins (genus Tursiops) of a community living off Doubtful Sound, a fjord in New Zealand (spelled fiord in New Zealand). An edge indicates regular interaction between two dolphins. The dolphins were observed from 1994 through 2001.
Zebra (Fig 15 and Fig 16): The network captures the interactions among the animals of a group. Note that datasets from websites such as Dogster are excluded from this category and placed under social networks, because humans generate those networks.
IV-A4 Human Contact Networks
Human contact networks are real networks of interaction between people, i.e., talking to each other, spending time together, or at least being physically close. More often than not, these datasets are collected by giving individuals RFID tags with chips that record whether other individuals are nearby.
IV-A5 Miscellaneous Networks
Miscellaneous networks are any networks that do not fit into one of the other categories.
Political Books (Fig 19 and Fig 20): The network consists of books about US politics published around the time of the 2004 presidential election and sold by the online bookseller Amazon.com. Edges between books represent frequent co-purchases of the books by the same buyers.
Les Misérables (Fig 21 and Fig 22): This undirected network contains the co-occurrences of characters in the novel ’Les Misérables’ by Victor Hugo. A node represents a character, and an edge between two nodes indicates that the two characters appeared in the same chapter of the book. The weight of each edge indicates how often such a co-appearance occurred.
Figures 1, 3, 5, 7, 9, 11, 13, 15, 17, 19 and 21 show the dataset graphs before clustering. Figures 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 and 22 show the graphs after clustering with the proposed method. The method can cluster both labeled and unlabeled graph datasets, and the accuracy on the labeled datasets is more than 97%.
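The text does not say how accuracy is computed for the labeled datasets. One common convention, sketched below as an assumption, is to score the predicted clusters against the ground-truth labels under the best relabeling of cluster indices, since cluster numbers are arbitrary. The function name and the toy labels are hypothetical.

```python
import numpy as np
from itertools import permutations

def clustering_accuracy(true_labels, predicted, k):
    """Fraction of nodes labeled correctly under the best mapping of cluster ids."""
    true_labels = np.asarray(true_labels)
    predicted = np.asarray(predicted)
    best = 0.0
    for perm in permutations(range(k)):          # try every relabeling of clusters
        mapped = np.array([perm[c] for c in predicted])
        best = max(best, float(np.mean(mapped == true_labels)))
    return best

true_labels = [0, 0, 0, 1, 1, 1]
predicted = [1, 1, 1, 0, 0, 1]                   # cluster ids swapped, one node wrong
acc = clustering_accuracy(true_labels, predicted, k=2)   # 5 of 6 nodes match
```

Trying every permutation is only practical for a small number of clusters, which holds for two-group datasets such as the karate club.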
We introduced a clustering strategy for graph network datasets using a deep learning based approach. The experimental findings demonstrate its effectiveness when appropriate parameter values are chosen and the consistency of the resulting clustering is evaluated. The methods currently available are as diverse as the applications of graph clustering. The work completed so far, however, is on datasets of unweighted and undirected graphs. We are currently experimenting with applying the same clustering principle to weighted and directed graph datasets.
- D. B. Acharya and H. Zhang (2020) Feature Selection and Extraction for Graph Neural Networks. In Proceedings of the 2020 ACM Southeast Conference, ACM SE ’20, New York, NY, USA, pp. 252–255.
- M. Girvan and M. E. J. Newman (2002) Community structure in social and biological networks. Proceedings of the National Academy of Sciences 99 (12), pp. 7821–7826.
- E. J. Gumbel (1954) Statistical Theory of Extreme Values and some Practical Applications. NBS Applied Mathematics Series 33.
- M. B. Hastings (2006) Community detection as an inference problem. Physical Review E 74 (3), pp. 035102.
- E. Jang, S. Gu, and B. Poole (2017) Categorical Reparameterization with Gumbel-Softmax. In ICLR, Toulon, France.
- D. P. Kingma and M. Welling (2013) Auto-encoding Variational Bayes. In ICLR, Scottsdale, USA.
- N. Mishra, R. Schreiber, I. Stanton, and R. E. Tarjan (2007) Clustering social networks. In Algorithms and Models for the Web-Graph, A. Bonato and F. R. K. Chung (Eds.), Berlin, Heidelberg, pp. 56–67.