Many structured prediction problems arise in the process of “translating” input data (e.g., images, texts) into corresponding output data, i.e., learning a translation mapping from an input domain to a target domain. For example, many problems in image processing and computer vision can be seen as “translating” an input image into a corresponding output image. Similar applications can be found in language translation [35, 36, 37], where sentences (sequences of words) in one language are translated into sentences in another language. This generic translation problem, which is important yet inherently difficult, has attracted rapidly increasing attention in recent years. Conventional data translation typically considers data with a special topology. For example, an image is a grid in which each pixel is a node connected to its spatial neighbors, and a text is typically a sequence in which each node is a word and an edge links two contextual words. Both grids and sequences are special types of graphs. Many practical applications require working on data with more flexible structures than grids and sequences, and hence require more powerful translation techniques that can handle generic graph-structured data. Such techniques have been applied in many applications, e.g., predicting the future states of a physical system based on fixed relations (e.g., gravitational forces) among nodes, and forecasting traffic speed on road networks [19, 39]. Although these methods work on generic graph-structured data, they assume that the graphs in the input and target domains share the same topology, and thus cannot model or predict changes in graph topology.
To address cases where the topology can change during translation, the deep learning-based graph translation problem has emerged in recent years. This problem is promising and critical in domains where variations in graph topology are possible and frequent, such as social networks and cyber-networks. For example, in a social network where people are nodes and their contacts are edges, the contact graph can vary dramatically across situations: when people are organizing a riot, the contact graph is expected to become denser, and several special “hubs” (e.g., key players) may appear. Hence, accurately predicting the contact network in a target situation is highly beneficial to situational awareness and resource allocation. Existing topology translation models [11, 33] predict the graph topology (i.e., edges) in a target domain based on that in an input domain. They focus on predicting the graph topology but assume that node attribute values are fixed or do not exist.
Therefore, existing works either predict node attributes upon a fixed topology or predict edge attributes upon fixed node attributes. However, in many applications, both node attributes and edge attributes can change. In this paper, this generic problem is named multi-attributed graph translation, with important real-world applications ranging from biological structural-to-functional network translation to network intervention research. For example, malware confinement over IoT (Internet of Things) is typically a graph translation problem, as shown in Fig. 1. (A device infected in an IoT network can propagate malware to other nodes connected to it, eventually contaminating the whole network, as in the MiraiBot attack. It is therefore important to confine the malware to limit the infection, and equally important to maintain overall network connectivity and performance.) Malware confinement takes the initial status of the IoT network as input and predicts the target graph, which is ideally the optimal status of the network with modified connections (i.e., edges) and device states (i.e., nodes) that limits malware propagation while maintaining network throughput. Epidemic control can also be considered a multi-attributed graph translation problem: it estimates how the initial disease contact network (i.e., multi-attributed edges) and the health stages of individuals (i.e., multi-attributed nodes) jointly change after specific interventions. Since the multi-attributed graph translation problem is highly sophisticated, there is as yet no generic framework, only ad-hoc methods for a few specific domains, which rely heavily on intensive hand-crafting and domain-specific mechanistic models that can be extremely time- and resource-consuming to run at large scale. Hence, a generic, efficient, and end-to-end framework for multi-attributed graph translation is in high demand. Such a framework needs to comprehensively learn the translation mapping, reduce human bias by leveraging large amounts of historical data, and achieve efficient prediction.
In this paper, we focus on the generic problem of multi-attributed graph translation, which existing methods cannot handle because of the following challenges: 1) Mutual dependence between the translations of node and edge attributes. The translation of edge attributes should consider not only edges but also node attributes. For example, in Fig. 1, two links are cut because the device they connect to, Device 1, is compromised, which exemplifies the interplay between nodes and edges. Similarly, node translation also needs to jointly consider nodes and edges, e.g., Device 4 is infected due to its link to Device 1. These issues need to be considered jointly, but no existing work can handle them. 2) Asynchronous and iterative changes of node and edge attributes during graph translation. The translation process may involve a series of iterative changes in both edge and node attributes. For example, in Fig. 1, the translation could take several steps since malware propagation is an iterative process from one device to others: the links to a device may be cut (i.e., edge changes) right after it is compromised (i.e., node attribute change). The order and dependencies of these changes are very important, yet difficult to learn. 3) Difficulty in discovering and enforcing the correct consistency between node attributes and graph spectra. Although the predicted node and edge attributes are two different outputs, they should be highly dependent on each other rather than unrelated. For example, as shown in Fig. 1, Devices 2 and 3 in the right graph are not compromised because they no longer have links to the compromised Device 1. Learning and maintaining this consistency between node and edge attributes is highly challenging because it follows sophisticated, domain-specific patterns.
To the best of our knowledge, this is the first work that addresses all the above challenges and provides a generic framework for the multi-attributed graph translation problem. This paper proposes a Node-Edge Co-evolving Deep Graph Translator (NEC-DGT) with a novel architecture and components for joint node and edge translation. A multi-block network with novel interactive node and edge translation paths is developed to translate both node and edge attributes, while skip connections among blocks accommodate the asynchrony of changes in node and edge attributes. A novel spectral graph regularization is designed to ensure the consistency of nodes and edges in the generated graphs. The contributions of this work are summarized as follows:
The development of a new framework for multi-attributed graph translation. We formulate, for the first time, a multi-attributed graph translation problem and propose the NEC-DGT to tackle this problem. The proposed framework is generic for different applications where both node and edge attributes can change after translation.
The proposal of novel and generic edge translation layers and blocks. A new edge translation path is proposed to translate the edge attributes from the input domain to the output domain. Existing edge convolution methods are proven to be special cases of this path, which can handle a broad range of multi-attributed edges and nodes.
The proposal of a spectral-based regularization that ensures consistency of the predicted nodes and edges. In order to discover and maintain the inherent relationships between predicted nodes and edges, a new non-parametric graph Laplacian regularization with a graph frequency regularization is proposed and leveraged.
The conduct of extensive experiments to validate the effectiveness and efficiency of the proposed model. Extensive experiments on four synthetic and four real-world datasets demonstrate that NEC-DGT is capable of generating graphs close to the ground-truth target graphs and significantly outperforms other generative models.
II Related Works
Graph neural network learning. In recent years, there has been a surge of research on graph neural networks, which are generally divided into two categories: Graph Recurrent Networks [10, 30, 20] and Graph Convolutional Networks [24, 23, 7, 16, 25, 6, 17, 34]. Graph Recurrent Networks originate from the early graph neural networks proposed by Gori et al. and Scarselli et al., based on recursive neural networks. Another line of research generalizes convolutional neural networks from grids (e.g., images) to generic graphs. Bruna et al. first introduced spectral graph convolutional neural networks, which were then extended by Defferrard et al. with fast localized convolutions and further approximated into an efficient architecture for semi-supervised learning.
Graph generation. Most existing GNN-based generation methods for general graphs were proposed in the last two years and are based on VAEs [31, 27], generative adversarial nets (GANs), or other models [21, 38]. Most of these approaches generate nodes and edges sequentially to form a whole graph, which makes them sensitive to the generation order and very time-consuming for large graphs. GraphRNN, in contrast, builds an autoregressive generative model on these sequences with an LSTM and has demonstrated good scalability.
Graph structured data translation. Existing graph-structured data translation methods either predict node attributes or translate the graph topology. Node attribute prediction aims to predict node attributes given a fixed graph topology [3, 19, 39, 9]. Li et al. propose a Diffusion Convolutional Recurrent Neural Network (DCRNN) for traffic forecasting, which incorporates both spatial and temporal dependencies in the traffic flow. Yu et al. formulated node attribute prediction on graphs with complete convolutional structures. Graph topology translation considers the change of graph topology from one domain distribution to another. Guo et al. formulated the graph topology translation problem and tackled it with a generative model consisting of a graph translator with graph convolution and deconvolution layers and a new conditional graph discriminator. Sun et al. proposed a GraphRNN-based model that generates a graph's topology based on another graph.
III Problem Formulation
This paper focuses on predicting a target multi-attributed graph based on an input multi-attributed graph by learning the graph translation mapping between them. The following provides the notations and mathematical problem formulation.
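| Notation | Description |
|---|---|
| $G(\mathcal{V},\mathcal{E},E,F)$ | Input graph with node set $\mathcal{V}$, edge set $\mathcal{E}$, edge attributes tensor $E$ and node attributes matrix $F$ |
| $G'(\mathcal{V}',\mathcal{E}',E',F')$ | Target graph with node set $\mathcal{V}'$, edge set $\mathcal{E}'$, edge attributes tensor $E'$ and node attributes matrix $F'$ |
| $C$ | Contextual information vector |
| $N$ | Number of nodes |
| $M$ | Number of edges |
| $D$ | Dimension of node attributes |
| $K$ | Dimension of edge attributes |
| $L$ | Dimension of contextual information vector |
| $S$ | Number of translation blocks |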
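Define an input graph as $G(\mathcal{V},\mathcal{E},E,F)$, where $\mathcal{V}$ is the set of $N$ nodes and $\mathcal{E}$ is the set of edges. $e_{i,j}\in\mathcal{E}$ is an edge connecting nodes $v_i$ and $v_j$. $\mathcal{E}$ contains all pairs of nodes, while the existence of $e_{i,j}$ is reflected by its attributes. $E\in\mathbb{R}^{N\times N\times K}$ is the edge attributes tensor, where $E_{i,j}\in\mathbb{R}^{K}$ denotes the attributes of edge $e_{i,j}$ and $K$ is the dimension of edge attributes. $F\in\mathbb{R}^{N\times D}$ is the node attribute matrix, where $F_i\in\mathbb{R}^{D}$ contains the attributes of node $v_i$ and $D$ is the dimension of node attributes. Similarly, we define the target graph as $G'(\mathcal{V}',\mathcal{E}',E',F')$. Note that the target and input graphs differ in both their node attributes and their edge attributes. Moreover, a vector $C\in\mathbb{R}^{L}$ provides contextual information on the translation process. Therefore, multi-attributed graph translation is defined as learning a mapping $\mathcal{T}: G(\mathcal{V},\mathcal{E},E,F)\times C \rightarrow G'(\mathcal{V}',\mathcal{E}',E',F')$.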
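To make the data layout concrete, here is a minimal standard-library sketch of how a multi-attributed graph could be stored under the formulation above: a dense N x N x K edge tensor that keeps every node pair, where an all-zero attribute vector stands for a missing edge, plus an N x D node matrix. The helper names `make_graph` and `edge_set` and the random attribute values are hypothetical, chosen only for illustration.

```python
import random

def make_graph(n_nodes, node_dim, edge_dim, edge_prob, seed=0):
    """Build a toy undirected multi-attributed graph (E, F) as nested lists.

    E has shape N x N x K and stores every node pair; an all-zero vector
    means "no edge". F has shape N x D.
    """
    rng = random.Random(seed)
    F = [[rng.uniform(-1.0, 1.0) for _ in range(node_dim)] for _ in range(n_nodes)]
    E = [[[0.0] * edge_dim for _ in range(n_nodes)] for _ in range(n_nodes)]
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if rng.random() < edge_prob:
                # strictly positive attributes, so an existing edge is never all-zero
                attrs = [rng.uniform(0.1, 1.0) for _ in range(edge_dim)]
                E[i][j] = attrs
                E[j][i] = list(attrs)  # undirected graph: keep E symmetric
    return E, F

def edge_set(E):
    """Recover the edge set: (i, j) is an edge iff E[i][j] has a non-zero entry."""
    n = len(E)
    return {(i, j) for i in range(n) for j in range(n)
            if any(v != 0.0 for v in E[i][j])}

E, F = make_graph(n_nodes=5, node_dim=3, edge_dim=2, edge_prob=0.5)
print(len(E), len(E[0]), len(E[0][0]), len(F), len(F[0]))  # 5 5 2 5 3
```

Storing all pairs keeps the tensor shape fixed for a given N, which matches the statement that edge existence is encoded in the attributes rather than in a separate edge list.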
Consider, for example, the malware confinement case, where the nodes are IoT devices and the edges are communication links between devices. The node attributes include the malware-infection status and the properties of the device (e.g., specifications and anti-virus software features). A single compromised IoT device (i.e., node) can spread malware across the network, eventually compromising the network or even disabling its functionality. To prevent malware from spreading while maintaining network performance, malware confinement modifies the network connectivity (i.e., graph topology), which in turn changes the device status (i.e., node attributes). Hence, malware confinement can be considered as predicting the optimal topology, together with the corresponding node and edge attributes, of the target graph in which both malware prevention and device performance are maximized.
The multi-attributed graph translation problem involves several unique considerations, as depicted in Fig. 2: 1) Edges-to-edges interaction: In the target domain, the attributes of an edge can be influenced by the attributes of its incident edges in the input domain. For example, in Fig. 2 (a), if Devices 1 and 3 must be prevented from infection, then the edges between the compromised Device 1 and Device 2 need to be cut, due to the paths among them in the input domain. 2) Nodes-to-edges interaction: In the target domain, the attributes of an edge can be influenced by the attributes of its incident nodes in the input domain. As shown in Fig. 2 (b), if Device 2 is compromised in the input domain, then in the target domain only its connections to Devices 1 and 3 need to be removed, while the connection between Devices 1 and 3 can be retained because they are not compromised. 3) Nodes-to-nodes interaction: The attributes of a node in the input domain may directly influence its attributes in the target domain. As shown in Fig. 2 (c), Device 3, which has effective anti-virus protection (e.g., a firewall), may not be easily compromised in the target domain. 4) Edges-to-nodes interaction: The attributes of the edges related to a node in the input domain may affect the node's attributes in the target domain. As shown in Fig. 2 (d), Device 1, which has more connections with compromised devices in the input domain, is more likely to be infected in the target domain. 5) Spectral graph property: The relationships between nodes and edges within a graph are reflected by its graph spectrum. These relationships are expected to follow persistent, consistent patterns across the input and target domains, which has been verified in many real-world applications such as brain networks. For example, as shown in Fig. 2 (e), devices that are densely connected as a sub-community tend to share the same node status, a pattern of node-edge relationships shared across domains.
Multi-attributed graph translation should consider all the above properties, which existing methods cannot comprehensively handle because of: 1) the lack of a generic framework to simultaneously characterize and automatically infer all of the above node-edge interactions during the translation process; 2) the difficulty of automatically discovering and characterizing the inherent spectral relationship between the nodes and edges in each graph, and of ensuring consistent spectral patterns in graphs across input and target domains; and 3) the fact that all the above interactions can be imposed repeatedly, alternately, and asynchronously during the translation process, making such an important yet sophisticated process difficult to discover and characterize.
IV The Proposed Method: NEC-DGT
In this section, we propose the Node-Edge Co-evolving Deep Graph Translator (NEC-DGT) to model the multi-attributed graph translation process. We first introduce the overall architecture and the loss functions, and then elaborate on the three modules: edge translation, node translation, and graph spectral regularization.
IV-A Overall architecture
Multi-block asynchronous translation architecture. The proposed NEC-DGT learns the distribution of graphs in the target domain conditioned on the input graphs and contextual information. However, the translation from the input graph to the final target graph may involve a series of interactions of different types among edges and nodes. Moreover, this sophisticated process is hidden and must be learned by a sufficiently flexible and powerful model. To address this, we propose NEC-DGT, shown in Fig. 3. Specifically, the node and edge attributes of the input graph are fed into the model, which outputs the node and edge attributes of the generated target graph after several blocks. The skip connections across blocks (black dotted lines in Fig. 3) address the asynchrony among blocks, ensuring that the final translated result fully utilizes various combinations of the blocks' information. To train the deep neural network to generate the target graph $G'$ conditioned on the input graph $G$ and contextual information $C$, we minimize the loss function
$$\mathcal{L}_T = \mathcal{L}\big(\mathcal{T}(G(\mathcal{V},\mathcal{E},E,F),\,C),\; G'(\mathcal{V}',\mathcal{E}',E',F')\big), \tag{1}$$
where the node sets $\mathcal{V}$ and $\mathcal{V}'$ as well as the edge sets $\mathcal{E}$ and $\mathcal{E}'$ are reflected in $F$ and $F'$ as well as in $E$ and $E'$.
Node and edge translation paths. To jointly tackle the various interactions among nodes and edges, each block contains a node translation path and an edge translation path. In the node translation path (upper part of the detailed structure in Fig. 3), node attributes are generated considering the “nodes-to-nodes” and “edges-to-nodes” interactions. In the edge translation path (lower part of the detailed structure in Fig. 3), edge attributes are generated following the “edges-to-edges” and “nodes-to-edges” interactions.
Spectral graph regularization. To discover and characterize the inherent relationship between the nodes and edges of each graph, the frequency-domain properties of the graph are learned, based on which the interactions between node and edge attributes are jointly regularized by a non-parametric graph Laplacian. Moreover, to maintain consistent spectral properties throughout the translation process, we enforce shared patterns among the generated nodes and edges in different blocks by regularizing their relevant parameters in the frequency domain. The regularization of the graphs is formalized as
$$\mathcal{R}(\theta) = \sum_{s=1}^{S} \mathcal{R}\big(E^{(s)}, F^{(s)}; \theta\big), \tag{2}$$
where $S$ is the number of blocks and $\theta$ denotes all parameters of the spectral graph regularization. $E^{(s)}$ and $F^{(s)}$ are the generated edge attributes tensor and node attributes matrix in the $s$th block; thus $E^{(S)}$ and $F^{(S)}$ constitute the generated target graph. The final loss function can then be summarized as
$$\mathcal{L} = \mathcal{L}_T + \beta\,\mathcal{R}(\theta), \tag{3}$$
where $\beta$ is the trade-off between $\mathcal{L}_T$ and the spectral graph regularization. The model is trained by minimizing the mean squared error between $E^{(S)}$ and $E'$ and between $F^{(S)}$ and $F'$, together with the spectral regularization. Back-propagation-based optimizers such as stochastic gradient descent (SGD) and Adam can be used to optimize the whole model.
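The training objective above (mean squared error on the generated edge and node attributes plus a weighted regularization term) can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the released code: it assumes the two MSE terms are weighted equally and that the regularization value has already been computed, and `mse` and `total_loss` are hypothetical names.

```python
def _flatten(x):
    """Yield the scalars of an arbitrarily nested list/tuple."""
    if isinstance(x, (list, tuple)):
        for item in x:
            yield from _flatten(item)
    else:
        yield x

def mse(pred, target):
    """Mean squared error between two equally shaped nested lists."""
    p, t = list(_flatten(pred)), list(_flatten(target))
    if len(p) != len(t):
        raise ValueError("shape mismatch")
    return sum((a - b) ** 2 for a, b in zip(p, t)) / len(p)

def total_loss(E_pred, F_pred, E_true, F_true, reg_value, beta):
    """MSE(E^(S), E') + MSE(F^(S), F') + beta * R.

    Assumes the two MSE terms are weighted equally; reg_value is the
    already-computed spectral regularization R.
    """
    return mse(E_pred, E_true) + mse(F_pred, F_true) + beta * reg_value

print(total_loss([[1.0, 2.0]], [[0.0]], [[1.0, 0.0]], [[1.0]], reg_value=3.0, beta=0.5))  # 4.5
```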
IV-B Edge Translation Path
The edge translation path models the nodes-to-edges and edges-to-edges interactions, where edge attributes in the target domain can be influenced by both nodes and edges in the input domain. We therefore first jointly embed node and edge information into influence vectors and then decode them to generate edge attributes. Specifically, the edge translation path of each block contains two functions: the influence-on-edge function, which encodes each pair of edge and node attributes into an influence for generating edges, and the edge updating function, which aggregates all influences related to each edge into an integrated influence and decodes it to generate the edge's attributes. Fig. 4 shows the operation of the two functions in a single block, translating the block's current input graph into its output graph.
Influence-on-edge layers. As shown in Fig. 4, the input graph is first organized into pairs of node and edge attributes. For each pair of nodes $v_i$ and $v_j$, we concatenate their edge attributes $E_{i,j}$ and their node attributes $F_i$ and $F_j$ as $X_{i,j} = [F_i, E_{i,j}, F_j] \in \mathbb{R}^{2D+K}$ (circled in black rectangles in Fig. 4). Then $X_{i,j}$ is fed into the influence-on-edge function $f_e: \mathbb{R}^{2D+K} \rightarrow \mathbb{R}^{Q_e}$, a constrained MLP (multilayer perceptron) that computes the influence from the pair of nodes $v_i$ and $v_j$, where $Q_e$ is the dimension of the final influence on edges. For the edge translation path, $f_e$ is expressed as
$$f_e(X_{i,j}) = \sigma_n\Big(\cdots\sigma_2\big(\sigma_1(X_{i,j}W_1 + b_1)W_2 + b_2\big)\cdots W_n + b_n\Big), \tag{4}$$
where $W_l$ and $b_l$ are the weights and bias of the $l$th layer of $f_e$, $n$ is the number of layers of $f_e$, and $\sigma_l$ are the activation functions. For undirected graphs, we add a weight constraint to ensure that the influence of $X_{i,j}$ is the same as that of $X_{j,i}$: the first $D$ rows of $W_1$ (related to the attributes of node $v_i$) and its last $D$ rows (related to the attributes of node $v_j$) are shared. The influence on edges of every pair is computed by the same function with the same weights, so NEC-DGT can handle graphs of various sizes.
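The weight-sharing constraint for undirected graphs can be checked numerically. Below is a minimal sketch of only the first layer of the constrained MLP, assuming a ReLU activation and hypothetical hand-picked weights (the activation and all names are illustrative assumptions). Because the weight block applied to $F_i$ and the block applied to $F_j$ are the same matrix, swapping the endpoints leaves the output unchanged up to floating-point rounding, so every later layer also receives identical input.

```python
def relu(v):
    return [max(0.0, a) for a in v]

def linear(x, W, b):
    """Row vector x (length r) times W (r x q) plus bias b (length q)."""
    return [sum(x[r] * W[r][c] for r in range(len(x))) + b[c] for c in range(len(b))]

def influence_first_layer(F_i, E_ij, F_j, W_node, W_edge, b):
    """sigma_1(X_ij W_1 + b_1) with X_ij = [F_i, E_ij, F_j].

    W_1 is assembled as [W_node; W_edge; W_node]: its first D rows (for F_i)
    and last D rows (for F_j) are the same shared block.
    """
    W1 = W_node + W_edge + W_node          # list concatenation stacks rows: (2D + K) x Q
    x = list(F_i) + list(E_ij) + list(F_j)
    return relu(linear(x, W1, b))

W_node = [[0.2, -0.1, 0.4], [0.3, 0.5, -0.2]]   # D = 2, Q = 3
W_edge = [[1.0, -1.0, 0.5]]                     # K = 1
b = [0.1, 0.0, -0.3]
h_ij = influence_first_layer([1.0, -2.0], [0.7], [0.5, 3.0], W_node, W_edge, b)
h_ji = influence_first_layer([0.5, 3.0], [0.7], [1.0, -2.0], W_node, W_edge, b)
print([round(a, 6) for a in h_ij], [round(a, 6) for a in h_ji])  # [1.4, 0.0, 0.45] [1.4, 0.0, 0.45]
```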
Edge updating layers. After calculating the influence of each pair of nodes and edge, the next step is to assign each pair's influence to its related edges to obtain the integrated influence on each edge (the assignment operation in Fig. 4). This is because each edge is generated depending on both of its related nodes and its incident edges (e.g., the pairs circled in the orange and purple rectangles, related to nodes $v_i$ and $v_j$ respectively, in Fig. 4). We define the integrated influence on edge $e_{i,j}$ as $Y_{i,j}\in\mathbb{R}^{Q_e}$, computed as
$$Y_{i,j} = \sum_{k\in\mathcal{N}(i)} f_e(X_{i,k}) + \sum_{k\in\mathcal{N}(j)} f_e(X_{k,j}),$$
where $\mathcal{N}(i)$ denotes the neighbor nodes of node $v_i$ and $f_e(X_{i,k})$ is the influence of the pair $(v_i, v_k)$. The edge attributes are then generated as $E'_{i,j} = f_u(Y_{i,j}, E_{i,j}, C)$, where $E_{i,j}$ is the input attributes of edge $e_{i,j}$ and $C$ is the contextual information for the translation. The function $f_u$ is implemented by an MLP.
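The assignment and updating steps described above can be sketched as follows, assuming the pair influences have already been computed into an N x N x Q nested list `H` and the neighborhood is given by a boolean adjacency matrix. The names `integrated_edge_influence` and `update_edges`, and the use of a plain callable in place of the updating MLP, are illustrative assumptions. Precomputing one row sum and one column sum per node is one way to avoid re-summing the neighbors for every edge.

```python
def integrated_edge_influence(H, adj):
    """Y[i][j] = sum_{k in N(i)} H[i][k] + sum_{k in N(j)} H[k][j].

    H[i][j] is the Q-dim influence vector of pair (v_i, v_j) and adj[i][k] is
    True when v_k is a neighbour of v_i. Per-node row and column sums are
    computed once, so building all N x N integrated influences costs O(N^2 Q)
    rather than re-summing neighbours for every edge.
    """
    n, q = len(H), len(H[0][0])
    row = [[sum(H[i][k][c] for k in range(n) if adj[i][k]) for c in range(q)]
           for i in range(n)]
    col = [[sum(H[k][j][c] for k in range(n) if adj[j][k]) for c in range(q)]
           for j in range(n)]
    return [[[row[i][c] + col[j][c] for c in range(q)] for j in range(n)]
            for i in range(n)]

def update_edges(Y, E, C, f_u):
    """E'_ij = f_u([Y_ij, E_ij, C]); f_u is any callable standing in for the update MLP."""
    n = len(E)
    return [[f_u(list(Y[i][j]) + list(E[i][j]) + list(C)) for j in range(n)]
            for i in range(n)]

# Path graph v0 - v1 - v2 with scalar (Q = 1) pair influences.
adj = [[False, True, False], [True, False, True], [False, True, False]]
H = [[[float(10 * i + j)] for j in range(3)] for i in range(3)]
Y = integrated_edge_influence(H, adj)
print(Y[0][1])  # [23.0]
```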
Relationship with other edge convolution networks. Edge convolution networks are the most typical method for handling edge embedding in graphs; they were first introduced as BrainNetCNN and later explored in many studies [11, 18, 32]. Our edge translation path is a highly flexible and generic mechanism for handling multi-attributed nodes and edges. Several existing edge convolution layers and their variants can be considered special cases of our method, as stated in the following theorem (the proof is available at https://github.com/xguo7/NEC-DGT):
The influence-on-edge function in the edge translation path of NEC-DGT is a generalization of conventional edge convolution networks.
IV-C Node Translation Path
The node translation path learns the “nodes-to-nodes” and “edges-to-nodes” interactions, where the translation of a node's attributes depends on its own attributes and the attributes of its related edges. The node translation path of each block contains two functions: the influence-on-node function, which learns the influence from each pair of nodes, and the node updating function, which generates the new node attributes by aggregating all influences from pairs containing the node. Fig. 5 shows how a node is translated in a single block.
Influence-on-node layers. As shown in Fig. 5, the input graph is first organized into pairs of nodes, where each pair is represented by $X_{i,j} = [F_i, E_{i,j}, F_j]$ as in the edge translation path (circled in the black rectangle in Fig. 5). Then $X_{i,j}$ is fed into the influence-on-node function $f_n$, implemented by a constrained MLP as in Equation (4), to compute the influence on nodes $f_n(X_{i,j})\in\mathbb{R}^{Q_n}$ (the grey bar in Fig. 5), where $Q_n$ is the dimension of the influence on nodes.
Node updating layers. After computing the influences of each node pair, the next step is to generate the node attributes. For node $v_i$, an assignment step aggregates all influences from pairs containing $v_i$ (the assignment operation in Fig. 5). The aggregated influence is then fed into the node updating function $f_v$, implemented by an MLP, to compute the attributes of node $v_i$ as $F'_i = f_v\big(\sum_{j\in\mathcal{N}(i)} f_n(X_{i,j})\big)$.
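Following the description above, where only the aggregated pair influences are listed as input to the node updating function, the node updating step can be sketched as below. The name `update_nodes`, the boolean adjacency input, and the use of a plain callable in place of the MLP are illustrative assumptions; an isolated node simply receives a zero aggregate.

```python
def update_nodes(Hn, adj, f_v):
    """F'_i = f_v(sum_{j in N(i)} Hn[i][j]).

    Hn[i][j] is the Q_n-dim influence of pair (v_i, v_j) on nodes; adj[i][j]
    is True when v_j is a neighbour of v_i; f_v stands in for the update MLP.
    """
    n, q = len(Hn), len(Hn[0][0])
    aggregated = [[sum(Hn[i][j][c] for j in range(n) if adj[i][j]) for c in range(q)]
                  for i in range(n)]
    return [f_v(a) for a in aggregated]

adj = [[False, True, False], [True, False, True], [False, True, False]]
Hn = [[[1.0, float(j)] for j in range(3)] for i in range(3)]
print(update_nodes(Hn, adj, f_v=lambda v: v))  # [[1.0, 1.0], [2.0, 2.0], [1.0, 1.0]]
```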
IV-D Graph spectral-based regularization
Based on the edge and node translation paths introduced above, we can generate node and edge attributes. However, since these node and edge attributes are predicted separately by different paths, their patterns may not be mutually consistent. To ensure the consistency of the edge and node patterns discussed in Section III, we propose a novel adaptive regularization based on a non-parametric graph Laplacian, together with a graph frequency regularization.
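Non-parametric Graph Laplacian Regularization. First, recall that in multi-attributed graphs, node information can be smoothed over the graph via an explicit graph-based regularization, namely the well-known graph Laplacian regularization term
$$\sum_{k=1}^{K}\sum_{d=1}^{D} \big(f^{(s)}_d\big)^{\top} L^{(s)}_k\, f^{(s)}_d,$$
where $f^{(s)}_d = F^{(s)}_{:,d}\in\mathbb{R}^{N}$ is the node attribute vector for the $d$th node attribute and $A^{(s)}_k = E^{(s)}_{:,:,k}\in\mathbb{R}^{N\times N}$ is the edge attribute matrix for the $k$th edge attribute, both generated in the $s$th block. $L^{(s)}_k = \Delta^{(s)}_k - A^{(s)}_k$ denotes the graph Laplacian of the $k$th edge attribute matrix, where the diagonal degree matrix is computed as $(\Delta^{(s)}_k)_{i,i} = \sum_{j} (A^{(s)}_k)_{i,j}$.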
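To see why the graph Laplacian regularization term enforces smoothness, recall the standard identity $f^\top(\Delta - A)f = \frac{1}{2}\sum_{i,j}A_{i,j}(f_i - f_j)^2$ for a symmetric, non-negative $A$: large attribute differences across heavy edges are penalized. The standard-library sketch below evaluates the double sum over edge attributes $k$ and node attributes $d$ for a single block; the function names are illustrative, not part of the original implementation.

```python
def laplacian(A):
    """Combinatorial Laplacian Delta - A for a symmetric weighted adjacency A."""
    n = len(A)
    deg = [sum(row) for row in A]
    return [[(deg[i] if i == j else 0.0) - A[i][j] for j in range(n)] for i in range(n)]

def quad_form(f, M):
    """f^T M f for a vector f and a square matrix M."""
    n = len(f)
    return sum(f[i] * M[i][j] * f[j] for i in range(n) for j in range(n))

def laplacian_regularizer(F, E):
    """sum_k sum_d f_d^T L_k f_d for node matrix F (N x D) and edge tensor E (N x N x K)."""
    n, D, K = len(F), len(F[0]), len(E[0][0])
    total = 0.0
    for k in range(K):
        L_k = laplacian([[E[i][j][k] for j in range(n)] for i in range(n)])
        for d in range(D):
            total += quad_form([F[i][d] for i in range(n)], L_k)
    return total

# Path v0 - v1 - v2 with unit weights and one node attribute:
# f^T L f = (1 - 2)^2 + (2 - 4)^2 = 5, the sum of squared differences across edges.
E = [[[0.0], [1.0], [0.0]], [[1.0], [0.0], [1.0]], [[0.0], [1.0], [0.0]]]
F = [[1.0], [2.0], [4.0]]
print(laplacian_regularizer(F, E))  # 5.0
```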
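However, the traditional graph Laplacian can only impose an absolute smoothness regularization over all nodes by forcing neighboring nodes to have similar attribute values, which is often over-restrictive, e.g., in signed networks and teleconnections. In the real world, correlations among nodes are much more complicated than pure “smoothness” and form a mixed pattern of different types of relations. To address this, we propose an end-to-end framework of non-parametric graph Laplacian that automatically learns such node correlation patterns inherent in specific types of graphs, with rigorous foundations in spectral graph theory. Specifically, let $\tilde{L}^{(s)}_k = I - (\Delta^{(s)}_k)^{-1/2} A^{(s)}_k (\Delta^{(s)}_k)^{-1/2}$ be the normalized Laplacian of the $k$th edge attribute matrix $A^{(s)}_k$ with degree matrix $\Delta^{(s)}_k$. It can be diagonalized by the Fourier basis $U^{(s)}_k$ such that $\tilde{L}^{(s)}_k = U^{(s)}_k \Lambda^{(s)}_k (U^{(s)}_k)^{\top}$, where $\Lambda^{(s)}_k = \mathrm{diag}(\lambda_1,\dots,\lambda_N)$ stores the graph frequencies; for example, $\lambda_1$ is the frequency of the first Fourier basis vector $u_1$. We define the non-parametric graph Laplacian by replacing these frequencies with learnable ones, $L^{(s)}_{k,\theta} = U^{(s)}_k\, g_{\theta}(\Lambda^{(s)}_k)\, (U^{(s)}_k)^{\top}$, which yields the regularization
$$\mathcal{R}\big(E^{(s)}, F^{(s)};\theta\big) = \sum_{k=1}^{K}\sum_{d=1}^{D} \big(f^{(s)}_d\big)^{\top} U^{(s)}_k\, g_{\theta}(\Lambda^{(s)}_k)\,(U^{(s)}_k)^{\top} f^{(s)}_d,$$
where $f^{(s)}_d$ is the $d$th node attribute vector of block $s$ and $g_{\theta}(\Lambda^{(s)}_k)$ contains the non-parametric Laplacian eigenvalues introduced next.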
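The graph frequencies come from the normalized Laplacian, whose eigenvalues lie in $[0, 2]$. The standard-library sketch below builds the normalized Laplacian and checks that the vector $\Delta^{1/2}\mathbf{1}$ is mapped to zero, i.e., corresponds to the lowest (zero) frequency. Treating zero-degree nodes with $\Delta^{-1/2} = 0$ is an assumed convention, the names are hypothetical, and a full eigendecomposition $U\Lambda U^\top$ is left to a linear-algebra library.

```python
import math

def normalized_laplacian(A):
    """I - Delta^{-1/2} A Delta^{-1/2} for a symmetric non-negative weighted adjacency A.

    Isolated nodes (degree 0) get Delta^{-1/2} = 0 by convention, so their row is e_i.
    """
    n = len(A)
    inv_sqrt = [1.0 / math.sqrt(sum(row)) if sum(row) > 0 else 0.0 for row in A]
    return [[(1.0 if i == j else 0.0) - inv_sqrt[i] * A[i][j] * inv_sqrt[j]
             for j in range(n)] for i in range(n)]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

A = [[0.0, 2.0, 1.0], [2.0, 0.0, 0.5], [1.0, 0.5, 0.0]]
L_norm = normalized_laplacian(A)
# Delta^{1/2} 1 is an eigenvector with eigenvalue 0 (the lowest graph frequency):
null_vec = [math.sqrt(sum(row)) for row in A]
print(max(abs(x) for x in matvec(L_norm, null_vec)) < 1e-12)  # True
```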
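Scalable approximation. $g_{\theta}(\Lambda)$ is a non-parametric vector whose parameters are all free; it can be defined as $g_{\theta}(\Lambda) = \mathrm{diag}(\theta)$, where $\theta\in\mathbb{R}^{N}$ is a vector of Fourier coefficients for a graph. However, optimizing these eigenvalues directly has a learning complexity of $O(N)$, the dimensionality of the graph, which does not scale to large graphs. To reduce the learning complexity from $O(N)$ to $O(P)$, we approximate $g_{\theta}(\Lambda)$ by a normalized truncated expansion in terms of Chebyshev polynomials. The Chebyshev polynomial $T_p(x)$ of order $p$ can be computed by the stable recurrence $T_p(x) = 2xT_{p-1}(x) - T_{p-2}(x)$ with $T_0 = 1$ and $T_1 = x$. The eigenvalues of the approximated Laplacian filter can thus be parameterized as the truncated expansion
$$g_{\theta}(\Lambda^{(s)}_k) \approx \sum_{p=0}^{P-1} \tilde{\theta}_{s,k,p}\, T_p(\tilde{\Lambda}^{(s)}_k), \qquad \tilde{\theta}_{s,k,p} = \frac{\Theta_{s,k,p}}{\sum_{p'=0}^{P-1}\Theta_{s,k,p'}},$$
with $P$ terms, where $T_p(\tilde{\Lambda}^{(s)}_k)$ is the Chebyshev polynomial of order $p$ evaluated at $\tilde{\Lambda}^{(s)}_k = 2\Lambda^{(s)}_k/\lambda_{\max} - I$, a diagonal matrix of scaled eigenvalues that lie in $[-1, 1]$, and $\lambda_{\max}$ is the largest element of $\Lambda^{(s)}_k$. $\Theta\in\mathbb{R}^{S\times K\times P}$ denotes the parameter tensor for all blocks, and $\Theta_{s,k,p}$ is the $p$th Chebyshev coefficient for the $k$th edge attribute in block $s$. Each coefficient is normalized by the sum of all coefficients of its expansion to avoid the trivial solution in which the coefficients are trained to zero. The Laplacian computation can then be written as $U^{(s)}_k g_{\theta}(\Lambda^{(s)}_k)(U^{(s)}_k)^{\top} \approx \sum_{p=0}^{P-1}\tilde{\theta}_{s,k,p}\,T_p(\hat{L}^{(s)}_k)$, where $T_p(\hat{L}^{(s)}_k)$ is the Chebyshev polynomial of order $p$ evaluated at the scaled Laplacian $\hat{L}^{(s)}_k = 2\tilde{L}^{(s)}_k/\lambda_{\max} - I$. For efficient computation, we further approximate $\lambda_{\max}\approx 2$, expecting the neural network parameters to adapt to this change in scale during training.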
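The practical benefit of the Chebyshev expansion is that each term $T_p(\hat{L})f$ can be built from repeated matrix-vector products via the recurrence, so no eigendecomposition is needed. The sketch below assumes the approximation $\lambda_{\max}\approx 2$ described above (so the scaled Laplacian is $\tilde{L} - I$), normalizes the coefficients by their sum, and assumes that sum is non-zero; the function names are hypothetical.

```python
import math

def chebyshev(p, x):
    """Scalar T_p(x) via T_p = 2x T_{p-1} - T_{p-2}, with T_0 = 1 and T_1 = x."""
    t_prev, t = 1.0, x
    if p == 0:
        return t_prev
    for _ in range(p - 1):
        t_prev, t = t, 2.0 * x * t - t_prev
    return t

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def chebyshev_filter(L_norm, f, theta):
    """sum_p theta~_p T_p(L^) f with L^ = L_norm - I (lambda_max approximated by 2).

    theta~ = theta / sum(theta); assumes sum(theta) != 0.
    """
    n = len(f)
    L_hat = [[L_norm[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    total = sum(theta)
    coeffs = [c / total for c in theta]
    t_prev, t_cur = list(f), matvec(L_hat, f)          # T_0(L^) f and T_1(L^) f
    out = [coeffs[0] * a for a in t_prev]
    if len(coeffs) > 1:
        out = [o + coeffs[1] * b for o, b in zip(out, t_cur)]
    for p in range(2, len(coeffs)):
        t_next = [2.0 * a - b for a, b in zip(matvec(L_hat, t_cur), t_prev)]
        out = [o + coeffs[p] * c for o, c in zip(out, t_next)]
        t_prev, t_cur = t_cur, t_next
    return out

def spectral_regularizer(L_norm, f, theta):
    """f^T (sum_p theta~_p T_p(L^)) f for one node-attribute vector f."""
    return sum(a * b for a, b in zip(f, chebyshev_filter(L_norm, f, theta)))

print(abs(chebyshev(3, math.cos(0.7)) - math.cos(2.1)) < 1e-12)  # True: T_p(cos t) = cos(p t)
L2 = [[1.0, -1.0], [-1.0, 1.0]]   # normalized Laplacian of a single edge
print(round(spectral_regularizer(L2, [1.0, 2.0], [1.0, 1.0, 1.0]), 9))  # 2.0
```

Because the coefficients are divided by their sum, rescaling all of them by a constant leaves the regularizer unchanged, which is what prevents the trivial all-zero solution.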
Graph frequency regularization. To keep the spectral graph patterns consistent across blocks throughout the translation process, we use a graph frequency regularization that maintains their similarity while still allowing each block's patterns to retain some exclusive properties. Specifically, among all the frequency pattern bases of the form $T_p(\hat{L}^{(s)}_k)$, some are important in modeling the relationships between nodes and edges while others are not, resulting in a sparsity pattern in $\Theta$. Thus, inspired by multi-task learning, we learn a consistent sparsity pattern of $\Theta\in\mathbb{R}^{S\times K\times P}$, the Chebyshev coefficient tensor of all $S$ blocks, by using the $\ell_{2,1}$ norm as regularization:
$$\|\Theta\|_{2,1} = \sum_{k=1}^{K}\sum_{p=0}^{P-1}\Big(\sum_{s=1}^{S}\Theta_{s,k,p}^{2}\Big)^{1/2}.$$
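Assuming the sparsity-inducing norm is the $\ell_{2,1}$ norm commonly used for shared sparsity in multi-task learning, with each group collecting one Chebyshev coefficient $(k, p)$ across all $S$ blocks, the penalty can be computed as below. The grouping and the name `l21_norm` are assumptions for illustration. The toy numbers show why the penalty favors a consistent pattern: the same total energy costs less when every block uses the same term than when blocks use different terms.

```python
import math

def l21_norm(theta):
    """||Theta||_{2,1} for Theta of shape S x K x P (blocks x edge attributes x orders).

    Each group is one coefficient (k, p) taken across all S blocks, so the
    penalty pushes a Chebyshev term to be used by every block or by none.
    """
    S, K, P = len(theta), len(theta[0]), len(theta[0][0])
    return sum(math.sqrt(sum(theta[s][k][p] ** 2 for s in range(S)))
               for k in range(K) for p in range(P))

shared = [[[3.0, 0.0]], [[4.0, 0.0]]]      # both blocks use order 0
scattered = [[[3.0, 0.0]], [[0.0, 4.0]]]   # blocks use different orders
print(l21_norm(shared), l21_norm(scattered))  # 5.0 7.0
```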
IV-E Complexity Analysis
The proposed NEC-DGT requires  operations in both time and space complexity in terms of the number of nodes in the graph. It is more scalable than most graph generation methods. For example, GraphVAE requires  operations in the worst case, and Li et al. use graph neural networks to perform a form of “message passing” with  operations to generate a graph.
In this section, we present quantitative and qualitative experimental results for NEC-DGT and the comparison models. All experiments are conducted on a 64-bit machine with an Nvidia GPU (GTX 1070, 1683 MHz, 8 GB GDDR5). The model is trained with the Adam optimization algorithm. (The code of the model and additional experimental results are available at https://github.com/xguo7/NEC-DGT.)
V-A Experimental Setup
We performed experiments on four synthetic datasets and four real-world datasets with different graph sizes and characteristics. All datasets consist of input-target graph pairs.
Synthetic dataset: Four datasets are generated based on different types of graphs and translation rules. The input graphs of the first three datasets (named Syn-I, Syn-II, and Syn-III) are Erdős-Rényi (E-R) graphs generated by the Erdős-Rényi model with an edge probability of 0.2 and graph sizes of 20, 40, and 60, respectively. The target graph topology is the 2-hop connection of the input graph: each edge in the target graph indicates 2-hop reachability in the input graph (e.g., if node $v_j$ is 2-hop reachable from node $v_i$ in the input graph, then $v_i$ and $v_j$ are connected in the target graph). The input graphs of the fourth dataset (named Syn-IV) are Barabási-Albert (B-A) graphs generated by the Barabási-Albert model with 20 nodes, where each new node is connected to 1 existing node. In Syn-IV, the topology of the target graph is the 3-hop connection of the input graph. For all four datasets, the edge attribute denotes the existence of the edge. For both input and target graphs, the node attributes are continuous values computed by a polynomial function of the node degree. Each dataset is divided into two subsets of 250 graph pairs each. Validation is conducted by using one subset for training and the other for testing, and then swapping them; the average result of the two validations is reported as the final result.
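A minimal standard-library sketch of how input/target topology pairs in the style of Syn-I to Syn-III could be generated. It reads “2-hop reachable” as “connected through some intermediate node” (a path $v_i - v_k - v_j$ with $j \neq i$); whether direct neighbors are also kept in the target is not stated, so that reading is an assumption, as are the function names. Node attributes are omitted because the polynomial is not given here.

```python
import random

def erdos_renyi(n, p, rng):
    """Undirected G(n, p) graph as a list of neighbour sets."""
    nbrs = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                nbrs[i].add(j)
                nbrs[j].add(i)
    return nbrs

def two_hop_target(nbrs):
    """Target topology: v_i ~ v_j iff some v_k gives a path i - k - j in the input (j != i)."""
    target = [set() for _ in nbrs]
    for i, first_hop in enumerate(nbrs):
        for k in first_hop:
            target[i].update(j for j in nbrs[k] if j != i)
    return target

rng = random.Random(0)
inp = erdos_renyi(20, 0.2, rng)   # Syn-I-sized input graph
tgt = two_hop_target(inp)
print(two_hop_target([{1}, {0, 2}, {1, 3}, {2}]))  # [{2}, {3}, {0}, {1}]
```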
Malware confinement dataset: The malware datasets are used to measure the performance of NEC-DGT on malware confinement prediction. There are three sets of IoT nodes of different sizes (20, 40, and 60), encompassing temperature sensors connected to Intel ATLASEDGE Boards and Beagle Boards (BeagleBone Blue) that communicate via the Bluetooth protocol. Benign and malware activities are executed on these devices to generate the initially attacked networks as the input graphs. Benign activities include the MiBench and SPEC2006 benchmarks, Linux system programs, and a word processor. The nodes represent devices, and the node attribute is a binary value indicating whether the device is compromised. An edge represents the connection between two devices, and its attribute is a continuous value reflecting the distance between them. The ground-truth target graphs are generated by classical malware confinement methods: stochastic controlling with malware detection [28, 26, 29]. We collected 334 pairs of input and target graphs with different contextual parameters (infection rate, recovery rate, and decay rate) for each of the three datasets. Each dataset is divided into two subsets of 200 and 134 pairs. Validation is conducted in the same way as for the synthetic datasets.
Molecule reaction dataset: We apply NEC-DGT to one of the fundamental problems in organic chemistry, namely predicting the product (target graph) of a chemical reaction given the reactants (input graph). Each molecular graph has atoms as nodes and bonds as edges. The input molecule graph has multiple connected components because the reactants comprise multiple molecules. The reactions used for training are atom-mapped so that each atom in the product graph has a unique corresponding atom in the reactants. We used reactions from USPTO granted patents, collected by Lowe. We obtained a set of 5,000 reactions (reactant-product pairs) and divided them into 2,500 for training and 2,500 for testing. Atom (node) features include elemental identity, degree of connectivity, number of attached hydrogen atoms, implicit valence, and aromaticity. Bond (edge) features include bond type (single, double, triple, or aromatic) and whether it is connected.
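The atom and bond features listed above could be flattened into vectors roughly as follows. This is a hypothetical sketch, not the authors' featurizer: the element vocabulary, the "other" fallback, and the choice to keep degree, hydrogen count, and implicit valence as raw numbers rather than one-hot codes are all assumptions, and in practice the atom fields would come from a cheminformatics toolkit rather than being typed in by hand.

```python
from dataclasses import dataclass
from typing import List

# HYPOTHETICAL element vocabulary; a real pipeline would derive it from the reactions.
ELEMENTS = ["C", "N", "O", "S", "F", "Cl", "Br", "I", "other"]
BOND_TYPES = ["single", "double", "triple", "aromatic"]


@dataclass
class Atom:
    symbol: str            # elemental identity
    degree: int            # degree of connectivity
    num_hs: int            # number of attached hydrogen atoms
    implicit_valence: int
    aromatic: bool


def one_hot(value: str, vocab: List[str]) -> List[float]:
    """One-hot encode value; unseen values map to 'other' when the vocab has it."""
    if value not in vocab:
        if "other" not in vocab:
            raise ValueError(f"unknown value {value!r}")
        value = "other"
    return [1.0 if v == value else 0.0 for v in vocab]


def atom_features(atom: Atom) -> List[float]:
    """Node feature vector: element one-hot followed by raw numeric/boolean fields."""
    return one_hot(atom.symbol, ELEMENTS) + [
        float(atom.degree), float(atom.num_hs),
        float(atom.implicit_valence), float(atom.aromatic),
    ]


def bond_features(bond_type: str, connected: bool) -> List[float]:
    """Edge feature vector: bond-type one-hot plus a connected flag."""
    return one_hot(bond_type, BOND_TYPES) + [float(connected)]


benzene_carbon = Atom("C", degree=2, num_hs=1, implicit_valence=1, aromatic=True)
node_vec = atom_features(benzene_carbon)
edge_vec = bond_features("aromatic", connected=True)
```

Unknown bond types raise an error rather than being silently mapped, since the four listed types are meant to be exhaustive.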
V-A2 Comparison methods
Since no existing method handles the multi-attributed graph translation problem, NEC-DGT is compared with two categories of methods: 1) graph topology generation methods and 2) node attribute prediction methods.
Graph topology generation methods: 1) GraphRNN is a recent graph generation method based on sequential generation with recurrent neural networks; 2) Graph Variational Auto-encoder (GraphVAE) is a VAE-based graph generation method for small graphs; 3) Graph Translation-Generative Adversarial Networks (GT-GAN) is a new graph topology translation method based on graph generative adversarial networks.
Node attribute prediction methods: 1) Interaction Network (IN) is a node state updating network that considers the interactions between neighboring nodes; 2) Diffusion Convolutional Recurrent Neural Network (DCRNN) is a node attribute prediction network for traffic flow prediction; 3) Spatio-Temporal Graph Convolutional Networks (STGCN) is a node attribute prediction model for traffic speed forecasting.
Furthermore, to validate the effectiveness of the graph spectral-based regularization, we construct a comparison model (named NR-DGT) that has the same architecture as NEC-DGT but omits the graph spectral-based regularization.
V-A3 Evaluation metrics
A set of metrics is used to measure the similarity between the generated and real target graphs in terms of node and edge attributes. For Boolean attributes, accuracy (Acc) is the ratio of nodes or edges that are correctly predicted among all nodes or all possible node pairs. For continuous attributes, MSE (mean squared error), R2 (coefficient of determination), and the Pearson (P) and Spearman (SP) correlations are computed between the attributes of the generated and real target graphs. The prefix N- denotes metrics evaluated on node attributes (e.g., N-MSE), and the prefix E- denotes metrics evaluated on edge attributes (e.g., E-Acc).
V-B1 Metric-based evaluation for synthetic graphs
For synthetic datasets, we compare the generated and real target graphs on various metrics and visualize the patterns captured in the generated graphs. Table II summarizes the effectiveness comparison on the four synthetic datasets. The node attributes are continuous values evaluated by N-MSE, N-R2, N-P, and N-SP. The edge attributes are binary values evaluated by E-Acc, the accuracy of the predicted edges. The results in Table II demonstrate that NEC-DGT outperforms the other methods in both node and edge attribute prediction and is the only method that handles both. Specifically, in terms of node attributes, NEC-DGT achieves an N-MSE that is on average 85%, 71%, 95%, and 95% smaller than that of the node attribute prediction methods on the four datasets, respectively. NEC-DGT also outperforms the other methods on N-R2, N-P, and N-SP by 46%, 36%, 44%, and 58% on average on the four datasets, respectively. This is because the node attribute prediction methods all assume a fixed graph topology, while NEC-DGT allows the edges to vary. In terms of edges, NEC-DGT achieves a higher E-Acc than all of the graph generation methods. Its E-Acc is also 7% higher on average than that of the graph topology translation method GT-GAN, since NEC-DGT considers both edge and node attributes when learning the translation mapping, while GT-GAN only considers edges. NEC-DGT outperforms NR-DGT by around 3% on average across all metrics, which demonstrates the effectiveness of the graph spectral-based regularization.
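The text does not state how the average percentages are computed. One plausible reading, given here only as an assumption, averages the relative change over the set $\mathcal{B}$ of baseline methods and, for the second figure, also over the metrics $\mathcal{M}=\{\text{N-R2},\text{N-P},\text{N-SP}\}$:

$$
\Delta_{\text{N-MSE}} = \frac{1}{|\mathcal{B}|}\sum_{b\in\mathcal{B}}\frac{\text{N-MSE}_b-\text{N-MSE}_{\text{NEC-DGT}}}{\text{N-MSE}_b},
\qquad
\Delta_{\mathcal{M}} = \frac{1}{|\mathcal{B}|\,|\mathcal{M}|}\sum_{b\in\mathcal{B}}\sum_{m\in\mathcal{M}}\frac{m_{\text{NEC-DGT}}-m_b}{|m_b|}
$$

Under this reading, an 85% reduction means that the NEC-DGT error is, on average, 15% of a baseline's error.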
V-B2 Evaluation of the learned translation mapping for synthetic graphs
To evaluate whether NEC-DGT learns and maintains the inherent relationship between node attributes and edge attributes (reflected by node degree), we plot the node attribute against the node degree for every node in the generated graphs. For comparison, a ground-truth curve is drawn according to the predefined rule used to generate the dataset, namely, the polynomial function that maps each node's degree to its attribute. Fig. 6 shows four example distributions of nodes in terms of node attribute and degree, with the black line as the ground truth. As shown in Fig. 6, the nodes lie close to the ground-truth curve, especially for Syn-I and Syn-IV, where around 85% of nodes are correctly located. This is largely because the proposed graph spectral-based regularization successfully discovers the pattern that densely connected nodes tend to have large node attributes, and vice versa.
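The share of correctly located nodes can be estimated along the following lines. The sketch is hypothetical: the paper does not say what counts as "correctly located", so a relative tolerance `rel_tol` around the ground-truth curve is assumed, and the curve `f` is passed in because the dataset's polynomial is not reproduced here.

```python
import numpy as np
from typing import Callable


def fraction_on_curve(adj: np.ndarray, attrs, f: Callable[[np.ndarray], np.ndarray],
                      rel_tol: float = 0.1) -> float:
    """Share of nodes whose attribute lies within rel_tol (relative) of f(degree)."""
    degree = np.asarray(adj).sum(axis=1).astype(float)
    expected = f(degree)
    attrs = np.asarray(attrs, float)
    # Floor the tolerance so it never collapses to exactly 0 when f(d) == 0
    # (such a node only counts if its attribute is essentially 0).
    tol = rel_tol * np.maximum(np.abs(expected), 1e-12)
    return float(np.mean(np.abs(attrs - expected) <= tol))


path = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
share = fraction_on_curve(path, [2.1, 4.0, 5.0, 2.0], lambda d: 2 * d)  # hypothetical f
# share == 0.75: the third node (degree 2, attribute 5.0) is off the curve 2 * d
```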
V-B3 Metric-based evaluation for malware datasets
Table III shows the evaluation of NEC-DGT by comparing the generated and real target graphs. For malware graphs, the node attributes are evaluated by N-Acc, the percentage of nodes whose attributes are correctly predicted. The edge attributes are continuous values evaluated by E-MSE, E-R2, and E-P. We also use E-Acc to evaluate whether the existence of edges is correctly predicted among all pairs of nodes. The results in Table III demonstrate that NEC-DGT performs the best on all three datasets. In terms of E-Acc, the graph generation methods (GraphRNN and GraphVAE) cannot handle the graph translation task and obtain low E-Acc values of around 0.6 on Mal-I and Mal-II and 0.8 on Mal-III. GT-GAN achieves a high E-Acc, but its E-MSE is on average about twice that of NEC-DGT. NEC-DGT successfully handles the translation tasks, with E-Acc above 0.9 and the smallest E-MSE. In terms of N-Acc, NEC-DGT outperforms the other methods by around 5% on the first two datasets. In summary, NEC-DGT not only jointly predicts node and edge attributes but also performs the best on most metrics. The superiority of NEC-DGT over NR-DGT in terms of E-MSE demonstrates that the graph spectral-based regularization indeed improves the modeling of the translation mapping.
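For malware graphs, whose edge attributes are continuous, E-Acc and E-MSE over all node pairs could be computed from weighted adjacency matrices as sketched below. The following are assumptions, not taken from the paper: a missing edge is stored as weight 0, an edge "exists" when its weight exceeds `exist_threshold`, graphs are undirected so only the upper triangle is used, and E-MSE is averaged over all node pairs rather than only over existing edges.

```python
import numpy as np


def pairwise_values(adj: np.ndarray) -> np.ndarray:
    """One value per unordered node pair: the upper triangle without the diagonal."""
    rows, cols = np.triu_indices(adj.shape[0], k=1)
    return adj[rows, cols]


def edge_metrics(pred_adj, true_adj, exist_threshold: float = 0.0) -> dict:
    """E-Acc on edge existence and E-MSE on edge weights, over all node pairs."""
    p = pairwise_values(np.asarray(pred_adj, float))
    t = pairwise_values(np.asarray(true_adj, float))
    e_acc = float(np.mean((p > exist_threshold) == (t > exist_threshold)))
    e_mse = float(np.mean((p - t) ** 2))
    return {"E-Acc": e_acc, "E-MSE": e_mse}


true_adj = np.array([[0.0, 0.5, 0.0], [0.5, 0.0, 1.0], [0.0, 1.0, 0.0]])
pred_adj = np.array([[0.0, 0.7, 0.2], [0.7, 0.0, 0.0], [0.2, 0.0, 0.0]])
scores = edge_metrics(pred_adj, true_adj)
```

Restricting E-MSE to edges present in the real target graph is an equally plausible choice; it would only change the mask applied to `p` and `t`.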
V-B4 Case study for malware dataset
Fig. 7 presents three cases, each showing the input graph, the real target graph, and the target graph generated by NEC-DGT. Green nodes denote uncompromised devices, while red nodes denote compromised devices. The width of each edge reflects the distance between the two devices. In the first case, in both the generated and real target graphs, Devices 4 and 6 are restored to normal, while Device 19 is attacked and isolated from the other devices. This validates that NEC-DGT successfully learns the rules for translating nodes and behaves like the true confinement process. In the second case, Device 8 propagates the malware to Device 38, which NEC-DGT also reproduces in the generated graph. In addition, NEC-DGT not only correctly predicts node attributes but also captures changes in edge attributes; e.g., in the third case, most of the connections of the compromised Device 10 are cut in both the generated and real target graphs.
V-B5 Metric-based evaluation for molecule reaction datasets
In this task, NEC-DGT is also compared with the Weisfeiler-Lehman Difference Network (WLDN), a graph learning model designed specifically for reaction prediction. Table IV shows the performance of NEC-DGT on the reaction dataset in terms of five metrics, the same as those used for the synthetic datasets. NEC-DGT outperforms both the translation model GT-GAN and WLDN by 5% on average. Although atoms do not change during a reaction, we still evaluate the ability of NEC-DGT to copy the input node features. As shown in Table IV, NEC-DGT achieves the smallest N-MSE and an N-R2 around 18% higher than those of the other comparison methods. This shows that NEC-DGT can handle a wide range of real-world applications, whether the nodes and edges need to change or remain stable.
VI Conclusion and Future Work
This paper focuses on a new problem: multi-attributed graph translation. To address it, we propose NEC-DGT, a novel model consisting of several blocks that translates a multi-attributed input graph into a target graph. To jointly tackle the different types of interactions among nodes and edges, each block contains node and edge translation paths, and a graph spectral-based regularization is proposed to preserve the consistent spectral property of graphs. Extensive experiments have been conducted on synthetic and real-world datasets. The results show that NEC-DGT can discover the ground-truth translation rules and significantly outperforms the comparison methods in terms of effectiveness. This paper takes a further step toward solving graph translation problems in more general scenarios.
This work was supported by National Science Foundation grants #1755850, #1841520, and #1907805, the Jeffress Trust Award, and the NVIDIA GPU Grant.
-  (2018) Functional brain connectivity is predictable from anatomic network’s Laplacian eigen-structure. NeuroImage 172, pp. 728–739. Cited by: §I, §III.
-  (1999) Emergence of scaling in random networks. Science 286 (5439), pp. 509–512. Cited by: §V-A1.
-  (2016) Interaction networks for learning about objects, relations and physics. In Advances in Neural Information Processing Systems, pp. 4502–4510. Cited by: §I, §II, §V-A2.
-  (2018) NetGAN: generating graphs via random walks. arXiv preprint arXiv:1803.00816. Cited by: §II.
-  (2013) Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203. Cited by: §II.
-  (2016) Deep neural networks for learning graph representations. In Thirtieth AAAI Conference on Artificial Intelligence. Cited by: §II.
-  (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pp. 3844–3852. Cited by: §II.
-  (1960) On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci 5 (1), pp. 17–60. Cited by: §V-A1.
-  (2018) Local event forecasting and synthesis using unpaired deep graph translations. In Proceedings of the 2nd ACM SIGSPATIAL Workshop on Analytics for Local Events and News, pp. 5. Cited by: §II.
-  (2005) A new model for learning in graph domains. In Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005., Vol. 2, pp. 729–734. Cited by: §II.
-  (2018) Deep graph translation. arXiv preprint arXiv:1805.09980. Cited by: §I, §II, §IV-B, §V-A2.
-  (2001) MiBench: a free, commercially representative embedded benchmark suite. In Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop on, pp. 3–14. Cited by: §V-A1.
-  (2011) Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis 30 (2), pp. 129–150. Cited by: §IV-D.
-  (2006) SPEC CPU2006 benchmark descriptions. ACM SIGARCH Computer Architecture News 34 (4), pp. 1–17. Cited by: §V-A1.
-  (2017) Predicting organic reaction outcomes with Weisfeiler-Lehman network. In NeurIPS, pp. 2607–2616. Cited by: §V-B5.
-  (2017) BrainNetCNN: convolutional neural networks for brain networks; towards predicting neurodevelopment. NeuroImage 146, pp. 1038–1049. Cited by: §II, §IV-B.
-  (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §II, §IV-D.
-  (2018) Modeling brain networks with artificial neural networks. In Graphs in Biomedical Image Analysis and Integrating Medical Imaging and Non-Imaging Modalities, pp. 43–53. Cited by: §IV-B.
-  (2017) Diffusion convolutional recurrent neural network: data-driven traffic forecasting. arXiv preprint arXiv:1707.01926. Cited by: §I, §II, §V-A2.
-  (2015) Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493. Cited by: §II.
-  (2018) Learning deep generative models of graphs. arXiv preprint arXiv:1803.03324. Cited by: §II, §IV-E.
-  (2014) Patent reaction extraction: downloads. Cited by: §V-A1.
-  (2017) Hierarchical graph embedding in vector space by graph pyramid. Pattern Recognition 61, pp. 245–254. Cited by: §II.
-  (2016) Learning convolutional neural networks for graphs. In ICML, pp. 2014–2023. Cited by: §II.
-  (2017) Kernel graph convolutional neural networks. arXiv preprint arXiv:1710.10689. Cited by: §II.
-  (2019) Lightweight node-level malware detection and network-level malware confinement in IoT networks. In ACM/EDAA/IEEE Design Automation and Test in Europe (DATE), Cited by: §V-A1.
-  (2018) Designing random graph models using variational autoencoders with applications to chemical design. arXiv preprint arXiv:1802.05283. Cited by: §II.
-  (2018) Ensemble learning for hardware-based malware detection: a comprehensive analysis and classification. In ACM/EDAA/IEEE Design Automation Conference, Cited by: §I, §V-A1.
-  2SMaRT: a two-stage machine learning-based approach for run-time specialized hardware-assisted malware detection. In ACM/EDAA/IEEE Design Automation and Test in Europe (DATE). Cited by: §V-A1.
-  (2008) The graph neural network model. IEEE Transactions on Neural Networks 20 (1), pp. 61–80. Cited by: §II.
-  (2018) GraphVAE: towards generation of small graphs using variational autoencoders. In International Conference on Artificial Neural Networks, pp. 412–422. Cited by: §II, §IV-E, §V-A2.
-  (2018) A domain guided CNN architecture for predicting age from structural brain images. arXiv preprint arXiv:1808.04362. Cited by: §IV-B.
-  (2019) Graph to graph: a topology aware approach for graph structures learning and generation. In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 2946–2955. Cited by: §I, §II.
-  (2019) Scalable global alignment graph kernel using random features: from node embedding to graph embedding. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1418–1428. Cited by: §II.
-  (2018) Graph2Seq: graph to sequence learning with attention-based neural networks. arXiv preprint arXiv:1804.00823. Cited by: §I.
-  (2018) Exploiting rich syntactic information for semantic parsing with graph-to-sequence model. arXiv preprint arXiv:1808.07624. Cited by: §I.
-  (2018) SQL-to-text generation with graph-to-sequence model. arXiv preprint arXiv:1809.05255. Cited by: §I.
-  (2018) GraphRNN: generating realistic graphs with deep auto-regressive models. arXiv preprint arXiv:1802.08773. Cited by: §II, §V-A2.
-  (2017) Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. arXiv preprint arXiv:1709.04875. Cited by: §I, §II, §V-A2.