1 Introduction
In machine learning, data are usually described as points in a vector space ( ). Nowadays, structured data are ubiquitous, and the capability to capture the structural relationships among the points can be particularly useful to improve the effectiveness of the models learned on them. To this aim, graphs are widely employed to represent this kind of information in terms of nodes/vertices and edges, including the local and spatial information arising from the data. Given a dimensional dataset , a graph is extracted from it by considering each point as a node and by computing the edge weights by means of a function. We obtain a new data representation , where is the set of vertices and is a set of weighted pairs of vertices (edges).
Applications to a graph domain can usually be divided into two main categories, called vertex-focused and graph-focused applications. For simplicity of exposition, we just consider the classification problem (notice that the proposed formulation can be trivially rewritten for the regression problem). Under this setting, the vertex-focused applications are characterized by a set of labels , a dataset , and the related graph ; we assume that the first points (where ) are labeled and the remaining (where ) are unlabeled. The goal is to classify the unlabeled nodes by exploiting the combination of their features and the graph structure by means of a semi-supervised learning approach. Instead, graph-focused applications are related to the goal of learning a function that maps different graphs to integer values by taking into account the features of the nodes of each graph: . This task can usually be solved using a supervised classification approach on the graph structures. A number of research works are devoted to classifying structured data for both vertex-focused and graph-focused applications [9, 19, 21, 23]. Nevertheless, existing studies share a major limitation: most of them focus on static graphs. However, many real-world structured data are dynamic, and nodes/edges in the graphs may change over time. In such a dynamic scenario, temporal information can also play an important role.
In the last decade, (deep) neural networks have shown great power and flexibility by learning to represent the world as a nested hierarchy of concepts, achieving outstanding results in many different fields of application. It is important to underline that only a few research works have been devoted to encoding the graph structure directly into a neural network model [1, 3, 4, 12, 15, 20]. Among them, to the best of our knowledge, none is able to manage dynamic graphs.
To exploit both structured data and temporal information through the use of a neural network model, we propose two novel approaches that combine Long Short-Term Memory networks (LSTMs, [8]) and Graph Convolutional Networks (GCNs, [12]). These techniques are respectively able to capture temporal information and to properly manage structured data, and both of them can deal with vertex-focused applications. Furthermore, we have also extended our approaches to deal with graph-focused applications.
LSTMs are a special kind of Recurrent Neural Network (RNNs, [10]), able to improve the learning of long short-term dependencies. All RNNs have the form of a chain of repeating modules of neural networks. Precisely, RNNs are artificial neural networks where connections among units form a directed cycle. This creates an internal state of the network, which allows it to exhibit dynamic temporal behavior. In standard RNNs, the repeating module is based on a simple structure, such as a single (hyperbolic tangent) unit. LSTMs extend the repeating module by combining four interacting units. GCN is a neural network model that directly encodes the graph structure; it is trained on a supervised target loss for all the nodes with labels. This approach is able to distribute the gradient information from the supervised loss and to learn representations exploiting both labeled and unlabeled nodes, thus achieving state-of-the-art results.
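The four interacting units of the LSTM repeating module are the input, forget and output gates plus the candidate cell update. A minimal NumPy sketch of one cell step (variable names and the stacked-weight layout are our own convention):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell step. W: (4H, d), U: (4H, H), b: (4H,),
    stacked in the order [input gate, forget gate, output gate,
    candidate update]."""
    z = W @ x + U @ h + b
    H = h.shape[0]
    i = sigmoid(z[0:H])        # input gate
    f = sigmoid(z[H:2*H])      # forget gate
    o = sigmoid(z[2*H:3*H])    # output gate
    g = np.tanh(z[3*H:4*H])    # candidate cell state
    c_new = f * c + i * g      # internal state carries long-term info
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```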
2 Related Work
Many important real-world datasets come in graph form; among others, consider knowledge graphs, social networks, protein-interaction networks, and the World Wide Web.
To achieve good classification results on this kind of data, the traditional approaches proposed in the literature mainly follow two different directions: identifying structural properties to be used as features within traditional learning methods, or propagating the labels to obtain a direct classification.
Zhu et al. [24] propose a semi-supervised learning algorithm based on a Gaussian random field model (also known as Label Propagation). The learning problem is formulated in terms of Gaussian random fields on graphs, where a field is described in terms of harmonic functions, and it is efficiently solved using matrix methods or belief propagation. Xu et al. [21] present a semi-supervised factor graph model that is able to exploit the relationships among nodes. In this approach, each vertex is modeled as a variable node and the various relationships are modeled as factor nodes. Grover and Leskovec [6] present an efficient and scalable algorithm for feature learning in networks that optimizes a novel network-aware, neighborhood-preserving objective function using Stochastic Gradient Descent. Perozzi et al. [18] propose an approach called DeepWalk. This technique uses truncated random walks to efficiently learn representations for vertices in graphs. These latent representations, which encode graph relations in a vector space, can be easily exploited by statistical models, thus producing state-of-the-art results. Unfortunately, the described techniques are not able to deal with graphs that change dynamically over time (nodes/edges in the graphs may change over time). Only a small number of methodologies have been designed to classify nodes in dynamic networks [14, 22]. Li et al. [14] propose an approach that is able to learn the latent feature representation and to capture the dynamic patterns. Yao et al. [22] present a Support Vector Machines-based approach that combines the support vectors of the previous temporal instant with the current training data to exploit temporal relationships. Pei et al. [17] define an approach called dynamic Factor Graph Model for node classification in dynamic social networks. More precisely, this approach organizes the dynamic graph data in a sequence of graphs. Three types of factors, called node factor, correlation factor, and dynamic factor, are designed to capture node features, node correlations, and temporal correlations, respectively. The node factor and the correlation factor capture the global and local properties of the graph structures, while the dynamic factor exploits the temporal information. It is important to underline that very little attention has been devoted to the generalization of neural network models to structured datasets. In the last couple of years, a number of research works have revisited the problem of generalizing neural networks to work on arbitrarily structured graphs [1, 3, 4, 12, 15, 20], some of them achieving promising results in domains that have previously been dominated by other techniques. Scarselli et al. [20] formalize a novel neural network model, called Graph Neural Network (GNN). This model extends neural network methods with the purpose of processing data in the form of graph structures. The GNN model can process different types of graphs (e.g., acyclic, cyclic, directed, and undirected) and it maps a graph and its nodes into a dimensional Euclidean space to learn the final classification/regression model. Li et al. [15] extend the GNN
model by relaxing the contractivity requirement of the propagation step through the use of Gated Recurrent Units [2], and by predicting sequences of outputs from a single input graph. Bruna et al. [1] describe two generalizations of Convolutional Neural Networks (CNNs, [5]). Precisely, the authors propose two variants: one based on a hierarchical clustering of the domain, and another based on the spectrum of the graph Laplacian. Duvenaud et al. [4] present another variant of CNNs working on graph structures. This model allows end-to-end learning on graphs of arbitrary size and shape. Defferrard et al. [3] introduce a formulation of CNNs in the context of spectral graph theory. The model provides efficient numerical schemes to design fast localized convolutional filters on graphs and, notably, reaches the same computational complexity as classical CNNs while working on any graph structure. Kipf and Welling [12] propose an approach for semi-supervised learning on graph-structured data (GCNs) based on CNNs. In their work, they exploit a localized first-order approximation of the spectral graph convolutions framework [7]. Their model scales linearly in the number of graph edges and learns hidden-layer representations encoding both local and structural graph features. Notice that none of these neural network architectures is able to properly deal with temporal information.
3 Our Approaches
In this section, we introduce two novel network architectures to deal with vertex-focused and graph-focused applications. Both of them rely on the following intuitions:


GCNs can effectively deal with graph-structured information, but they lack the ability to handle data structures that change over time. This limitation is (at least) twofold:

inability to manage dynamic vertex features,

inability to manage dynamic edge connections.

LSTMs excel at finding long short-term dependencies, but they lack the ability to explicitly exploit graph-structured information.
Due to the dynamic nature of the tasks we are interested in solving, the new network architectures proposed in this paper work on ordered sequences of graphs and ordered sequences of vertex features. Notice that, for sequences of length one, this reduces to the vertex-focused and graph-focused applications described in Section 1.
Our contributions are based on the idea of combining an extension of the Graph Convolution (GC, the fundamental layer of GCNs) with a modified version of the LSTM, so as to learn the downstream recurrent units by exploiting both graph-structured data and vertex features.
We propose two GC-like layers that take as input a graph sequence and the corresponding ordered sequence of vertex features, and output an ordered sequence of new vertex representations. These layers are:


the Waterfall DynamicGC layer, which performs, at each step of the sequence, a graph convolution on the vertex input sequence. An important feature of this layer is that the trainable parameters of each graph convolution are shared among the various steps of the sequence;

the Concatenate DynamicGC layer, which performs, at each step of the sequence, a graph convolution on the vertex input features and concatenates the result to the input. Again, the trainable parameters are shared among the steps of the sequence.
Each of the two layers can be jointly used with a modified version of the LSTM to perform a semi-supervised classification of sequences of vertices or a supervised classification of sequences of graphs. The difference between the two tasks lies only in how the last processing of the data is performed (for further details, see Equation (1) and Equation (2)).
In the following section, we provide the mathematical definitions of the two modified GC layers and of the modified version of the LSTM, as well as some other handy definitions that will be useful when describing the final network architectures.
3.1 Definitions
Let with be a finite sequence of undirected graphs , with , i.e. all the graphs in the sequence share the same vertices. Considering the graph , for each vertex let be the corresponding feature vector. Each step in the sequence can be completely defined by its graph (modeled by the adjacency matrix , which can be either weighted or unweighted) and by the vertex-features matrix (the matrix whose row vectors are the ).
We denote with the th-row, th-column element of the matrix , and with the transpose of ; is the identity matrix of ; and are the softmax and rectified linear unit functions [5]. The matrix is a projector on if it is a symmetric, positive semi-definite matrix with . In particular, it is a diagonal projector if it is a diagonal matrix (with possibly some zero entries on the main diagonal). In other words, a diagonal projector on is a diagonal matrix with some s on the main diagonal that, when right-multiplied by a dimensional column vector , zeroes out all the entries of corresponding to the zeros on its main diagonal:
We recall here the mathematics of the GC layer [12] and of the LSTM [8], since they are the basic building blocks of our contribution. Given a graph with adjacency matrix and vertex-feature matrix , the GC layer with output nodes is defined as the function , such that , where is a weight matrix and is the renormalized adjacency matrix, i.e. with and .
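The GC layer and its renormalized adjacency can be sketched in a few lines of NumPy, following the propagation rule of Kipf and Welling [12] (function names are our own):

```python
import numpy as np

def renormalize(A):
    """Renormalized adjacency: D~^{-1/2} (A + I) D~^{-1/2},
    where D~ is the degree matrix of A + I [12]."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gc_layer(A, X, W):
    """GC layer: ReLU(A_hat X W), mapping (n, d) features to (n, m)."""
    return np.maximum(renormalize(A) @ X @ W, 0.0)
```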
Given the sequence with dimensional row vectors for each , a returning-sequence LSTM with output nodes is the function , with and
where is the Hadamard product, , , are weight matrices, and are bias vectors, with .
Definition 1 (wdGC layer)
Let and be, respectively, the sequence of adjacency matrices and the sequence of vertex-feature matrices for the considered graph sequence , with and . The Waterfall DynamicGC layer with output nodes is the function with weight matrix , defined as follows:
where , and all the are the renormalized adjacency matrices of the graph sequence .
The wdGC layer can be seen as multiple copies of a standard GC layer, all of them sharing the same training weights. The resulting number of trainable parameters is therefore , independently of the length of the sequence.
In order to introduce the Concatenate DynamicGC layer, we recall the definition of the graph of a function: considering a function from to , , . Namely, the operator transforms into a function returning the concatenation of and .
Definition 2 (cdGC layer)
Let and be, respectively, the sequence of adjacency matrices and the sequence of vertex-feature matrices for the considered graph sequence , with and . A Concatenate DynamicGC layer with output nodes is the function with weight matrix , defined as follows:
where , and all the are the renormalized adjacency matrices of the graph sequence .
Intuitively, cdGC is a layer made of copies of GC layers, each copy acting on a specific instant of the sequence. The output of each copy is then concatenated with its input, thus resulting in a sequence of graph-convolved features together with the vertex-features matrix. Note that the weights are shared among the copies. The number of learnable parameters of this layer is , independently of the number of steps in the sequence .
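The two layers differ only in whether the convolved features replace or augment the input. A shape-level NumPy sketch of both, under the assumption of a standard GC propagation rule with shared weights (function names are our own):

```python
import numpy as np

def renormalize(A):
    """Renormalized adjacency D~^{-1/2} (A + I) D~^{-1/2} [12]."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gc(A, X, W):
    return np.maximum(renormalize(A) @ X @ W, 0.0)

def wd_gc(A_seq, X_seq, W):
    """Waterfall DynamicGC: one graph convolution per step, with the
    weight matrix W shared across all steps of the sequence."""
    return [gc(A_t, X_t, W) for A_t, X_t in zip(A_seq, X_seq)]

def cd_gc(A_seq, X_seq, W):
    """Concatenate DynamicGC: as above, but each convolved output is
    concatenated with its own input features (the graph-of-a-function
    construction), so the output width is m + d."""
    return [np.concatenate([gc(A_t, X_t, W), X_t], axis=1)
            for A_t, X_t in zip(A_seq, X_seq)]
```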
Notice that both the input and the output of wdGC and cdGC are sequences of matrices (loosely speaking, third-order tensors).
We now define three additional layers. These will reduce the notational clutter when we introduce, in Section 3.2 and Section 3.3, the network architectures we have used to solve the semi-supervised classification of sequences of vertices and the supervised classification of sequences of graphs. Precisely, they are: the recurrent layer used to process the convoluted vertex features in a parallel fashion, and the two final layers (one per task) used to map the previous layers' outputs into class probability vectors.
Definition 3 (vLSTM layer)
Consider with ; the Vertex LSTM layer with output nodes is given by the function :
Definition 4 (vsFC layer)
Consider with ; the Vertex Sequential Fully Connected layer with output nodes is given by the function , parameterized by the weight matrix and the bias matrix :
with .
Definition 5 (gsFC layer)
Consider with ; the Graph Sequential Fully Connected layer with output nodes is given by the function , parameterized by the weight matrices and , and the bias matrices and :
with .
Informally: the vLSTM layer acts as copies of an LSTM, each one evaluating the sequence of one row of the input tensor ; the vsFC layer acts as copies of a Fully Connected layer (FC, [5]) with softmax activation, all the copies sharing the same parameters, and outputs class probability vectors for each step in the input sequence; the gsFC layer acts as copies of two FC layers with ReLU and softmax activations, all the copies sharing the same parameters, and outputs one class probability vector for each step in the input sequence. Note that both the input and the output of vsFC and vLSTM are sequences of matrices, while for gsFC the input is a sequence of matrices and the output is a sequence of vectors. We now have all the elements to describe our network architectures addressing both the semi-supervised classification of sequences of vertices and the supervised classification of sequences of graphs.
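The weight-sharing idea behind the vsFC layer can be sketched as follows: the same FC + softmax is applied to every vertex at every step of the sequence, producing one class probability vector per vertex (a minimal NumPy sketch; names are our own):

```python
import numpy as np

def softmax(z):
    """Numerically stable row-wise softmax."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def vs_fc(H_seq, W, b):
    """vsFC sketch: apply the shared FC + softmax to each (n, d)
    matrix of the input sequence, yielding (n, classes) probability
    matrices whose rows sum to one."""
    return [softmax(H @ W + b) for H in H_seq]
```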
3.2 Semi-Supervised Classification of Sequences of Vertices
Definition 6 (Semi-Supervised Classification of Sequences of Vertices)
Let be a sequence of graphs, each one made of vertices, and the related sequence of vertex-features matrices.
Let be a sequence of diagonal projectors on the vector space . Define the sequence by means of , ; i.e. and identify the labeled and unlabeled vertices of , respectively. Moreover, let be a sequence of matrices with rows and columns, satisfying the property , where the th row of the th matrix represents the one-hot encoding of the class label of the th vertex of the th graph in the sequence, the th vertex being a labeled one. Then, the semi-supervised classification of sequences of vertices consists in learning a function such that and is the right labeling for the unlabeled vertices for each .
To address the above task, we propose the networks defined by the following functions:
(1a)  
(1b) 
where denotes the function composition. Both architectures take as input, and produce a sequence of matrices whose row vectors are the class probabilities of each vertex of the graph: with . For the sake of clarity, in the rest of the paper, we will refer to the networks defined by Equation (1a) and Equation (1b) as Waterfall DynamicGCN (WDGCN) and Concatenate DynamicGCN (CDGCN, see Figure 0(b)), respectively.
Since all the functions involved in the composition are differentiable, the weights of the architectures can be learned using gradient descent methods, employing as loss function the cross-entropy evaluated only on the labeled vertices:
with the convention that .
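A sketch of this masked loss, where the mask is the diagonal of the projector selecting the labeled vertices at each step (names and the averaging convention are our own; a small epsilon stands in for the 0 log 0 := 0 convention):

```python
import numpy as np

def masked_cross_entropy(P_seq, Y_seq, mask_seq, eps=1e-12):
    """Cross-entropy over labeled vertices only.

    P_seq:    sequence of (n, classes) predicted probability matrices
    Y_seq:    sequence of (n, classes) one-hot label matrices
    mask_seq: sequence of length-n 0/1 vectors marking labeled vertices
    """
    loss, count = 0.0, 0.0
    for P, Y, m in zip(P_seq, Y_seq, mask_seq):
        ce = -(Y * np.log(P + eps)).sum(axis=1)  # per-vertex cross-entropy
        loss += (m * ce).sum()                   # unlabeled vertices are zeroed out
        count += m.sum()
    return loss / max(count, 1.0)
```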
3.3 Supervised Classification of Sequences of Graphs
Definition 7 (Supervised Classification of Sequence of Graphs)
Let be a sequence of graphs, each one made of vertices, and the related sequence of vertex-features matrices. Moreover, let be a sequence of one-hot encoded class labels, i.e. . Then, the graph-sequence classification task consists in learning a predictive function such that .
The proposed architectures are defined by the following functions:
(2a)  
(2b) 
The two architectures take as input . The output of wdGC and cdGC is processed by a vLSTM, resulting in a matrix for each step in the sequence. It is then the duty of the gsFC layer to transform this vertex-based prediction into a graph-based prediction, i.e. to output a sequence of class probability vectors . Again, we will use WDGCN (see Figure 0(a)) and CDGCN to refer to the networks defined by Equation (2a) and Equation (2b), respectively.
Also under this setting, the training can be performed by means of gradient descent methods, with the cross-entropy as loss function:
with the convention that .
4 Experimental Results
In this section we describe the employed datasets, the experimental settings, and the results achieved by our approaches compared with those obtained by baseline methods.
4.1 Datasets
We now present the employed datasets. The first one is used to evaluate our approaches in the context of vertex-focused applications, while the second one is used to assess our architectures in the context of graph-focused applications.
Our first set of data is a subset of the DBLP dataset (http://dblp.uni-trier.de/xml/) described in [17]. Conferences from six research communities have been considered: artificial intelligence and machine learning, algorithms and theory, databases, data mining, computer vision, and information retrieval. Precisely, the co-author relationships from to are considered, and the data of each year are organized in graph form. Each author represents a node in the network, and an edge between two nodes exists if the two authors have collaborated on a paper in the considered year. Note that the resulting adjacency matrix is unweighted. The node features are extracted from each temporal instant using DeepWalk [18] and are composed of 64 values. Furthermore, we have augmented the node features by adding the number of articles published by the author in each of the six communities, obtaining a feature vector composed of values. This specific task belongs to the vertex-focused applications.
The original dataset is made of authors across the ten years under analysis. Each year, authors appear on average; authors appear in all the years, with an average of authors appearing in two consecutive years.
We have considered the authors with the highest number of connections during the analyzed years, i.e. the vertices, among the total , with the highest , where is the adjacency matrix of the th year. If one of the selected authors does not appear in the th year, its feature vector is set to zero.
The final dataset is composed of vertex-features matrices in and adjacency matrices belonging to , and each vertex belongs to one of the classes.
CAD-120 (http://pr.cs.cornell.edu/humanactivities/data.php) is a dataset composed of RGB-D videos corresponding to high-level human activities [13]. Each video is annotated with sub-activity labels, object affordance labels, tracked human skeleton joints, and tracked object bounding boxes. The sub-activity labels are: reaching, moving, pouring, eating, drinking, opening, placing, closing, scrubbing, null. Our second dataset is composed of all the data related to the detection of sub-activities, i.e. no object affordance data have been considered. Notice that detecting the sub-activities is a challenging problem, as it involves complex interactions: humans can interact with multiple objects during a single activity. This specific task belongs to the graph-focused applications.
Each one of the high-level activities is characterized by one person, whose joints are tracked (in position and orientation) in the D space for each frame of the sequence. Moreover, in each high-level activity a variable number of objects appears; for each of them, the bounding boxes in the video frames are registered, together with the transformation matrix matching the SIFT features [16] extracted from the frame to those of the previous frame. Overall, objects are involved in the videos.
We have built a graph for each video frame: the vertices are the skeleton joints plus the objects, while the weighted adjacency matrix has been derived by employing the Euclidean distance. Precisely, between two skeleton joints the edge weight is given by the Euclidean distance between their D positions; between two objects, it is the D distance between the centroids of their bounding boxes; between an object and a skeleton joint, it is the D distance between the centroid of the object bounding box and the projection of the skeleton joint into the D video frame. All the distances have been scaled between zero and one. When an object does not appear in a frame, its related row and column in the adjacency matrix are set to zero.
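The per-frame construction above can be sketched as follows. This is an illustrative NumPy version that treats all vertices as points in a common coordinate space, whereas the paper mixes skeleton-space and image-space distances; the function and argument names are hypothetical:

```python
import numpy as np

def frame_adjacency(positions, present):
    """Per-frame weighted adjacency sketch.

    positions: (n, k) coordinates of the n vertices (joints + objects)
    present:   length-n boolean mask; absent objects get zeroed
               rows/columns, as described in the text.
    Entries are pairwise Euclidean distances rescaled to [0, 1].
    """
    diff = positions[:, None, :] - positions[None, :, :]
    D = np.sqrt((diff ** 2).sum(-1))
    if D.max() > 0:
        D = D / D.max()                      # scale between zero and one
    D = D * present[:, None] * present[None, :]  # zero out absent vertices
    np.fill_diagonal(D, 0.0)
    return D
```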
Since the videos have different lengths, we have padded all the sequences to match the longest one, which has frames. Finally, the feature columns have been standardized. The resulting dataset is composed of vertex-feature matrices belonging to and adjacency matrices (in ), and each graph belongs to one of the classes.
4.2 Experimental Settings
In our experiments, we have compared the results achieved by the proposed architectures with those obtained by other baseline networks (see Section 4.3 for a full description of the chosen baselines).
For the baselines that are not able to explicitly exploit the sequentiality in the data, we have flattened the temporal dimension of all the sequences, thus considering the same point in two different time instants as two different training samples.
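Concretely, the flattening amounts to a reshape of the sequence tensor (the shapes below are hypothetical, chosen only for illustration):

```python
import numpy as np

# Hypothetical shapes: T time steps, n vertices, d features.
T, n, d = 4, 10, 3
X_seq = np.random.rand(T, n, d)

# Baselines without temporal modeling see each (vertex, time) pair
# as an independent training sample:
X_flat = X_seq.reshape(T * n, d)
```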
The hyperparameters of all the networks (in terms of the number of nodes of each layer and the dropout rate) have been tuned by means of a grid approach. The performances are assessed employing iterations of Monte Carlo cross-validation, preserving the percentage of samples for each class. (This approach randomly selects, without replacement, some fraction of the data to build the training set, and assigns the rest of the samples to the test set; the process is repeated multiple times, generating at random new training and test partitions each time. In our experiments, the training set is further split into training and validation sets.) It is important to underline that the train/test sets are generated once and are used to evaluate all the architectures, to keep the experiments as fair as possible. To assess the performances of all the considered architectures, we have employed the Accuracy and the Unweighted F1 Measure. (The Unweighted F1 Measure evaluates the F1 score of each label class and computes their unweighted mean: , where and are the precision and the recall of the class .) Moreover, the training phase has been performed using Adam [11] for a maximum of epochs; for each network (independently for Accuracy and F1 Measure), we have selected the epoch where the learned model achieved the best performance on the validation set, then using that model to assess the performance on the test set.
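The evaluation protocol can be sketched as follows. Note that, for simplicity, this sketch draws unstratified splits, whereas the text specifies class-preserving (stratified) splits; function names are our own:

```python
import numpy as np

def monte_carlo_splits(y, n_iter, test_frac, seed=0):
    """Monte Carlo cross-validation sketch: repeatedly draw a random
    train/test partition (sampling without replacement in each draw)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = int(round(test_frac * n))
    for _ in range(n_iter):
        perm = rng.permutation(n)
        yield perm[n_test:], perm[:n_test]  # train indices, test indices

def unweighted_f1(y_true, y_pred, classes):
    """Unweighted (macro) F1: mean of per-class F1 scores."""
    f1s = []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return float(np.mean(f1s))
```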
4.3 Results
4.3.1 DBLP
We have compared the approaches proposed in Section 3.2 (WDGCN and CDGCN) against the following baseline methodologies: a GCN composed of two layers, a network made of two FC layers, a network composed of LSTM+FC, and a deeper architecture made of FC+LSTM+FC. Note that the FC is a Fully Connected layer; when it appears as the first layer of a network it employs a activation, while a activation is used when it is the last layer of a network. The test set contains of the vertices. Moreover, of the remaining (training) vertices have been used for validation purposes. It is important to underline that an unlabeled vertex remains unlabeled for all the years in the sequence, i.e., considering Definition 6, , . Table 1 presents the best hyperparameter configurations, together with the test results, of all the evaluated architectures; the best configurations (selected independently for Accuracy and for the Unweighted F1 Measure) are:

Network      | Best Config. (Accuracy)            | Best Config. (Unweighted F1)
FC+FC        | FC: 250, dropout 50%               | FC: 250, dropout 40%
GC+GC        | GC: 350, dropout 50%               | GC: 350, dropout 10%
LSTM+FC      | LSTM: 100, dropout 0%              | LSTM: 100, dropout 0%
FC+LSTM+FC   | FC: 300, LSTM: 300, dropout 50%    | FC: 300, LSTM: 300, dropout 50%
WDGCN        | wdGC: 300, vLSTM: 300, dropout 50% | wdGC: 400, vLSTM: 300, dropout 0%
CDGCN        | cdGC: 200, vLSTM: 100, dropout 50% | cdGC: 200, vLSTM: 100, dropout 50%

Employing the best configuration of each of the architectures in Table 1, we have further assessed the quality of the tested approaches by evaluating them while varying the ratio of labeled vertices. To obtain robust estimations, we have averaged the performances by means of
iterations of Monte Carlo cross-validation. Figure 2 reports the results of this experiment. Finally, WDGCN and CDGCN have shown little sensitivity to the labeling ratio, further demonstrating the robustness of our methods.
4.3.2 CAD-120
We have compared the approaches proposed in Section 3.3 against a GC+gsFC network, a vsFC+gsFC architecture, a vLSTM+gsFC network, and a deeper architecture made of vsFC+vLSTM+gsFC. Notice that, for these architectures, the vsFCs are used with a activation instead of a .
of the videos have been selected for testing the performances of the model, and of the remaining videos have been employed for validation.
The best configurations (selected independently for Accuracy and for the Unweighted F1 Measure) of all the evaluated architectures are:

Network           | Best Config. (Accuracy)            | Best Config. (Unweighted F1)
vsFC+gsFC         | vsFC: 100, dropout 20%             | vsFC: 200, dropout 20%
GC+gsFC           | GC: 250, dropout 30%               | GC: 250, dropout 50%
vLSTM+gsFC        | LSTM: 150, dropout 0%              | LSTM: 150, dropout 0%
vsFC+vLSTM+gsFC   | vsFC: 200, vLSTM: 150, dropout 20% | vsFC: 200, vLSTM: 150, dropout 20%
WDGCN             | wdGC: 250, vLSTM: 150, dropout 30% | wdGC: 250, vLSTM: 150, dropout 30%
CDGCN             | cdGC: 250, vLSTM: 150, dropout 30% | cdGC: 250, vLSTM: 150, dropout 30%
Table 2 shows the results of this experiment. The obtained results show that only CDGCN has outperformed the baselines, while WDGCN has reached performances similar to those obtained by the baseline architectures. This difference may be due to the low number of vertices in the sequence of graphs. Under this setting, the predictive power of the graph-convolutional features is less effective, and the CDGCN approach, which augments the plain vertex features with the graph-convolutional ones, provides an advantage. Hence, we can further suppose that, while both WDGCN and CDGCN are suitable to effectively exploit the structure of graphs with high vertex cardinality, only the latter can deal with datasets with a limited number of nodes. It is worth noting that, although all the experiments have shown a high variance in their performances, the Wilcoxon test indicates that CDGCN is statistically better than the baselines, with a p-value of for the Unweighted F1 Measure and of for the Accuracy. This reveals that in almost every iteration of the Monte Carlo cross-validation, CDGCN has performed better than the baselines. Finally, the same considerations presented for the DBLP dataset regarding the depth and the number of parameters also hold for this set of data.
5 Conclusions and Future Works
We have introduced, for the first time, two neural network approaches that are able to deal with the semi-supervised classification of sequences of vertices and the supervised classification of sequences of graphs. Our models are based on modified GC layers connected with a modified version of the LSTM. We have assessed their performances on two datasets against several baselines, showing the superiority of both of them for the semi-supervised classification of sequences of vertices, and the superiority of CDGCN for the supervised classification of sequences of graphs.
We can hypothesize that the difference between the WDGCN and CDGCN performances when the graph size is small is due to the feature augmentation approach employed by CDGCN. This conjecture should be addressed in future works.
In our opinion, interesting extensions of our work may consist in: replacing the LSTM with alternative recurrent units; proposing further extensions of the GC unit; and exploring the performance of deeper architectures that combine the layers proposed in this work.
References
 [1] Bruna, J., Zaremba, W., Szlam, A., LeCun, Y.: Spectral networks and locally connected networks on graphs. In: ICLR (2013)
 [2] Cho, K., van Merriënboer, B., Gülçehre, Ç., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder–decoder for statistical machine translation. In: EMNLP. pp. 1724–1734 (2014)
 [3] Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. In: NIPS (2016)
 [4] Duvenaud, D.K., Maclaurin, D., Iparraguirre, J., Bombarell, R., Hirzel, T., AspuruGuzik, A., Adams, R.P.: Convolutional networks on graphs for learning molecular fingerprints. In: NIPS (2015)

 [5] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016)
 [6] Grover, A., Leskovec, J.: Node2vec: Scalable feature learning for networks. In: ACM SIGKDD. pp. 855–864. ACM (2016)
 [7] Hammond, D.K., Vandergheynst, P., Gribonval, R.: Wavelets on graphs via spectral graph theory. Applied and Comput. Harmonic Analysis 30 (2), 129–150 (2011)
 [8] Hochreiter, S., Schmidhuber, J.: Long shortterm memory. Neural Comput. 9(8), 1735–1780 (Nov 1997)
 [9] Jain, A., Zamir, A.R., Savarese, S., Saxena, A.: StructuralRNN: Deep learning on spatiotemporal graphs. In: CVPR. pp. 5308–5317. IEEE (2016)
 [10] Jain, L.C., Medsker, L.R.: Recurrent Neural Networks: Design and Applications. CRC Press, Inc., 1st edn. (1999)
 [11] Kingma, D., Ba, J.: Adam: A method for stochastic optimization. In: ICLR (2015)
 [12] Kipf, T.N., Welling, M.: Semisupervised classification with graph convolutional networks. In: ICLR (2017)
 [13] Koppula, H.S., Gupta, R., Saxena, A.: Learning human activities and object affordances from RGBD videos. Int. J. Rob. Res. 32(8), 951–970 (Jul 2013)
 [14] Li, K., Guo, S., Du, N., Gao, J., Zhang, A.: Learning, analyzing and predicting object roles on dynamic networks. In: IEEE ICDM. pp. 428–437 (2013)
 [15] Li, Y., Tarlow, D., Brockschmidt, M., Zemel, R.S.: Gated graph sequence neural networks. In: ICLR (2016)
 [16] Lowe, D.G.: Object recognition from local scaleinvariant features. In: ICCV. pp. 1150–1157. IEEE (1999)
 [17] Pei, Y., Zhang, J., Fletcher, G.H., Pechenizkiy, M.: Node classification in dynamic social networks. In: AALTD 2016: 2nd ECMLPKDD International Workshop on Advanced Analytics and Learning on Temporal Data. pp. 54–93 (2016)
 [18] Perozzi, B., AlRfou, R., Skiena, S.: Deepwalk: Online learning of social representations. In: ACM SIGKDD. pp. 701–710. ACM (2014)
 [19] Rozza, A., Manzo, M., Petrosino, A.: A novel graphbased fisher kernel method for semisupervised learning. In: ICPR. pp. 3786–3791. IEEE (2014)
 [20] Scarselli, F., Gori, M., Tsoi, A.C., Hagenbuchner, M., Monfardini, G.: The graph neural network model. IEEE Trans. Neural Networks 20(1), 61–80 (2009)
 [21] Xu, H., Yang, Y., Wang, L., Liu, W.: Node classification in social network via a factor graph model. In: PAKDD. pp. 213–224 (2013)
 [22] Yao, Y., Holder, L.: Scalable SVMbased classification in dynamic graphs. In: IEEE ICDM. pp. 650–659 (2014)
 [23] Zhao, Y., Wang, G., Yu, P.S., Liu, S., Zhang, S.: Inferring social roles and statuses in social networks. In: ACM SIGKDD. pp. 695–703. ACM (2013)
 [24] Zhu, X., Ghahramani, Z., Lafferty, J., et al.: Semisupervised learning using gaussian fields and harmonic functions. In: ICML. vol. 3, pp. 912–919 (2003)