I. Introduction
Many problems regarding structured prediction are encountered in the process of "translating" input data (e.g., images, texts) into corresponding output data, which amounts to learning a translation mapping from the input domain to the target domain. For example, many problems in image processing and computer vision can be seen as a "translation" of an input image into a corresponding output image. Similar applications can also be found in language translation
[35, 36, 37], where sentences (sequences of words) in one language are translated into corresponding sentences in another language. This generic translation problem, which is important yet extremely difficult in nature, has attracted rapidly increasing attention in recent years. The conventional data translation problem typically considers data with a special topology. For example, an image is a type of grid where each pixel is a node and each node is connected to its spatial neighbors. Texts are typically considered as sequences where each node is a word and an edge exists between two contextual words. Both grids and sequences are special types of graphs. Many practical applications, however, require working on data with more flexible structures than grids and sequences, and hence more powerful translation techniques are needed to handle generic graph-structured data. Such techniques have been widely applied, e.g., to predicting future states of a physical system based on fixed relations (e.g., gravitational forces) among nodes [3] and to traffic speed forecasting on road networks [19, 39]. Though these methods can work on generic graph-structured data, they assume that the graphs in the input and target domains share the same topology, and they cannot model or predict changes of the graph topology.

To address this issue, where the topology can change during translation, the deep learning-based graph translation problem has debuted in very recent years. This problem is promising and critical to domains where variations of the graph topology are possible and frequent, such as social networks and cyber networks. For example, in social networks where people are the nodes and their contacts are the edges, the contact graph varies dramatically across different situations.
For example, when people are organizing a riot, the contact graph is expected to become denser and several special "hubs" (e.g., key players) may appear. Hence, accurately predicting the contact network in a target situation is highly beneficial to situational awareness and resource allocation. Existing topology translation models
[11, 33] predict the graph topology (i.e., edges) in a target domain based on that in an input domain. They focus on predicting the graph topology but assume that the node attribute values are fixed or do not exist.

Therefore, existing works either predict node attributes upon a fixed topology or predict edge attributes upon fixed node attributes. However, in many applications, both node attributes and edge attributes can change. In this paper, such a generic problem is named multi-attributed graph translation, with important real-world applications ranging from biological structural-to-functional network translation [1] to network intervention research [28]. For example, the process of malware confinement over IoT (Internet of Things) is typically a graph translation problem, as shown in Fig. 1. (A device infected in an IoT network can propagate malware to the nodes connected to it, eventually contaminating the whole network, as in the MiraiBot attack. As such, it is nontrivial to confine the malware to limit the infection while also maintaining overall network connectivity and performance.)
It takes the initial status of the IoT network as input and predicts the target graph, which is ideally the optimal status of the network with modified connections (i.e., edges) and device (i.e., node) states that limit malware propagation and maintain network throughput. Epidemic control can also be considered a multi-attributed graph translation problem: estimating how the initial disease contact network (i.e., multi-attributed edges) and the human health stages (i.e., multi-attributed nodes) jointly change after specific interventions. Since the multi-attributed graph translation problem is highly sophisticated, there is no generic framework yet, but only ad-hoc methods for a few specific domains, which heavily rely on intensive hand-crafting and domain-specific mechanistic models that can be extremely time- and resource-consuming to run at large scale. Hence, a generic, efficient, and end-to-end framework for general multi-attributed graph translation problems is in high demand. Such a framework needs to comprehensively learn the translation mapping, remedy human bias by leveraging large historical datasets, and achieve efficient prediction.
In this paper, we focus on the generic problem of multi-attributed graph translation, which cannot be handled by existing methods because of the following challenges: 1) Translation of node and edge attributes is mutually dependent. The translation of edge attributes should consider not only the edges but also the node attributes. For example, in Fig. 1, two links are cut because their linked Device 1 is compromised, which exemplifies the interplay between nodes and edges. Similarly, node translation also needs to jointly consider both nodes and edges, e.g., Device 4 is infected due to its link to Device 1. All the above issues need to be considered jointly, but no existing work can handle them. 2) Asynchronous and iterative changes of node and edge attributes during graph translation. The multi-attributed graph translation process may involve a series of iterative changes in both edge and node attributes. For example, in Fig. 1, the translation could take several steps since malware propagation is an iterative process from one device to the others. The links to a device may be cut (i.e., edge changes) right after it is compromised (i.e., a node attribute change). These orders and dependencies of how node and edge attributes change during the translation are very important, yet difficult to learn. 3) Difficulty in discovering and enforcing the correct consistency between node attributes and graph spectra. Although the predicted node and edge attributes are two different outputs, they should be highly dependent on each other instead of being irrelevant. For example, as shown in Fig. 1, the reason why Devices 2 and 3 in the right graph are not compromised is that they no longer have links to the compromised Device 1. It is highly challenging to learn and maintain the consistency of node and edge attributes, which follows very sophisticated and domain-specific patterns.
To the best of our knowledge, this is the first work that addresses all the above challenges and provides a generic framework for the multi-attributed graph translation problem. This paper proposes a Node-Edge Co-evolving Deep Graph Translator (NEC-DGT) with a novel architecture and components for joint node and edge translation. A multi-block network with novel interactive node and edge translation paths is developed to translate both node and edge attributes, while skip-connections are utilized among different blocks to allow for the asynchronicity of changes in node and edge attributes. A novel spectral graph regularization is designed to ensure the consistency of nodes and edges in the generated graphs. The contributions of this work are summarized as follows:

The development of a new framework for multi-attributed graph translation. We formulate, for the first time, the multi-attributed graph translation problem and propose the NEC-DGT to tackle it. The proposed framework is generic for different applications where both node and edge attributes can change after translation.

The proposal of novel and generic edge translation layers and blocks. A new edge translation path is proposed to translate the edge attributes from the input domain to the output domain. Existing edge translation methods are proven to be special cases of ours, which can handle broad multi-attributed edges and nodes.

The proposal of a spectral-based regularization that ensures consistency of the predicted nodes and edges. In order to discover and maintain the inherent relationships between predicted nodes and edges, a new non-parametric graph Laplacian regularization together with a graph frequency regularization is proposed and leveraged.

The conduct of extensive experiments to validate the effectiveness and efficiency of the proposed model. Extensive experiments on four synthetic and four real-world datasets demonstrate that NEC-DGT is capable of generating graphs close to the ground-truth target graphs and significantly outperforms other generative models.
II. Related Works
Graph neural network learning. In recent years, there has been a surge of research focusing on graph neural networks, which are generally divided into two categories: Graph Recurrent Networks [10, 30, 20] and Graph Convolutional Networks [24, 23, 7, 16, 25, 6, 17, 34]. Graph Recurrent Networks originate from the early works on graph neural networks proposed by Gori et al. [10] and Scarselli et al. [30] based on recursive neural networks. Another line of research generalizes convolutional neural networks from grids (e.g., images) to generic graphs. Bruna et al. [5] first introduced spectral graph convolutional neural networks, which were then extended by Defferrard et al. [7] using fast localized convolutions and further approximated into an efficient architecture for a semi-supervised setting [17].

Graph generation. Most of the existing GNN-based graph generation methods for general graphs have been proposed in the last two years and are based on VAEs [31, 27] and generative adversarial nets (GANs) [4], among others [21, 38]. Most of these approaches generate nodes and edges sequentially to form a whole graph, making them sensitive to the generation order and very time-consuming for large graphs. Differently, GraphRNN [38] builds an autoregressive generative model on these sequences with an LSTM model and has demonstrated good scalability.
Graph-structured data translation. Existing graph-structured data translation methods either deal with node attribute prediction or translate the graph topology. Node attribute prediction aims at predicting the node attributes given a fixed graph topology [3, 19, 39, 9]. Li et al. [19] proposed a Diffusion Convolutional Recurrent Neural Network (DCRNN) for traffic forecasting which incorporates both spatial and temporal dependency in the traffic flow. Yu et al. [39] formulated the node attribute prediction problem of graphs based on complete convolutional structures. Graph topology translation considers the change of graph topology from one domain distribution to another. Guo et al. [11] proposed and tackled the graph topology translation problem with a generative model consisting of a graph translator with graph convolution and deconvolution layers and a new conditional graph discriminator. Sun et al. [33] proposed a graph-RNN-based model which generates a graph's topology conditioned on another graph.

III. Problem Formulation
This paper focuses on predicting a target multi-attributed graph from an input multi-attributed graph by learning the graph translation mapping between them. The following provides the notations and the mathematical problem formulation.
Notations | Descriptions
$G(\mathcal{V}, \mathcal{E}, E, F)$ | Input graph with node set $\mathcal{V}$, edge set $\mathcal{E}$, edge attribute tensor $E$, and node attribute matrix $F$
$G'(\mathcal{V}', \mathcal{E}', E', F')$ | Target graph with node set $\mathcal{V}'$, edge set $\mathcal{E}'$, edge attribute tensor $E'$, and node attribute matrix $F'$
$C$ | Contextual information vector
$N$ | Number of nodes
$M$ | Number of edges
$D$ | Dimension of node attributes
$L$ | Dimension of edge attributes
$K$ | Dimension of contextual information vector
$S$ | Number of translation blocks
Define an input graph as $G(\mathcal{V}, \mathcal{E}, E, F)$, where $\mathcal{V}$ is the set of nodes and $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is the set of edges. $e_{i,j} \in \mathcal{E}$ is an edge connecting nodes $v_i$ and $v_j$. $\mathcal{E}$ contains all pairs of nodes, while the existence of an edge is reflected by its attributes. $E \in \mathbb{R}^{N \times N \times L}$ is the edge attribute tensor, where $E_{i,j} \in \mathbb{R}^{L}$ denotes the attributes of edge $e_{i,j}$ and $L$ is the dimension of the edge attributes. $F \in \mathbb{R}^{N \times D}$ refers to the node attribute matrix, where $F_i \in \mathbb{R}^{D}$ is the attribute vector of node $v_i$ and $D$ is the dimension of the node attributes. Similarly, we define the target graph as $G'(\mathcal{V}', \mathcal{E}', E', F')$. Note that the target and input graphs differ both in their node attributes and in their edge attributes. Moreover, a vector $C$ provides contextual information on the translation process. Therefore, multi-attributed graph translation is defined as learning a mapping $T: G(\mathcal{V}, \mathcal{E}, E, F), C \to G'(\mathcal{V}', \mathcal{E}', E', F')$.
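To make the notation concrete, here is a minimal NumPy sketch (names and shapes are our own illustrative choices, not the paper's implementation) of how a multi-attributed graph and the translation interface can be represented:

```python
import numpy as np

# Illustrative sizes: N nodes, L edge-attribute dims, D node-attribute dims.
N, L, D = 4, 2, 3

# Edge attribute tensor: E[i, j] holds the L attributes of edge (i, j);
# a nonexistent edge is encoded by all-zero attributes.
E = np.zeros((N, N, L))
E[0, 1] = E[1, 0] = [1.0, 0.5]     # one undirected edge between nodes 0 and 1

# Node attribute matrix: row i holds the D attributes of node i.
F = np.random.rand(N, D)

# Contextual information vector.
C = np.zeros(5)

# A translation mapping T takes (E, F, C) and returns (E', F') with the same
# shapes; an identity placeholder here, just to fix the interface.
def T(E, F, C):
    return E.copy(), F.copy()

E_out, F_out = T(E, F, C)
```

The key point is that the mapping preserves the tensor shapes while both node and edge attribute values (and hence the implied topology) may change.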
For example, consider the malware confinement case, where the nodes refer to IoT devices and the edges reflect the communication links between two devices. The node attributes include the malware-infection status and the properties of that device (i.e., specification and anti-virus software features). A single IoT device (i.e., node) that is compromised has the potential to spread malware infection across the network, eventually compromising the network or even ceasing its functionality. In contrast, in order to avoid malware spreading while maintaining the performance of the network, the network connectivity (i.e., graph topology) should be modified through malware confinement, which changes the device status (i.e., node attributes) accordingly. Hence, malware confinement can be considered as predicting the optimal topology as well as the corresponding node and edge attributes of the target graph, where both malware prevention and device performance are maximized.
The multi-attributed graph translation problem highlights several unique considerations, as depicted in Fig. 2: 1) Edges-to-edges interaction: In the target domain, the attributes of an edge can be influenced by the attributes of its incident edges in the input domain. For example, in Fig. 2 (a), if Devices 1 and 3 must be prevented from infection, then the edges between the compromised Device 1 and Device 2 need to be cut, due to the paths among them in the input domain. 2) Nodes-to-edges interaction: In the target domain, the attributes of an edge can be influenced by the attributes of its incident nodes in the input domain. As shown in Fig. 2 (b), if Device 2 is compromised in the input domain, then in the target domain only its connections to Devices 1 and 3 need to be removed, while the connection between Devices 1 and 3 can be retained because they are not compromised. 3) Nodes-to-nodes interaction: For a given node, its attributes in the input domain may directly influence its attributes in the target domain. As shown in Fig. 2 (c), Device 3 with effective anti-virus protection (e.g., a firewall) may not be easily compromised in the target domain. 4) Edges-to-nodes interaction: For a given node, its related edge attributes in the input domain may affect its attributes in the target domain. As shown in Fig. 2 (d), Device 1, which has more connections with compromised devices in the input domain, is more likely to be infected in the target domain. 5) Spectral graph property: There exist relationships between nodes and edges in a graph, as reflected by the graph spectrum. These relationships are claimed to have persistent or consistent patterns across the input and target domains, which has been verified in many real-world applications such as brain networks [1]. For example, as shown in Fig. 2 (e), the devices that are densely connected as a sub-community tend to share the same node status, which is a shared pattern for the relationships between nodes and edges in different domains.
Multi-attributed graph translation should consider all the above properties, which cannot be comprehensively handled by existing methods because: 1) there is no generic framework to simultaneously characterize and automatically infer all of the above node-edge interactions during the translation process; 2) it is difficult to automatically discover and characterize the inherent spectral relationship between the nodes and edges in each graph, and to ensure consistent spectral patterns in graphs across the input and target domains; 3) all the above interactions can be imposed repeatedly, alternately, and asynchronously during the translation process, and it is difficult to discover and characterize such an important yet sophisticated process.
IV. The Proposed Method: NEC-DGT
In this section, we propose the Node-Edge Co-evolving Deep Graph Translator (NEC-DGT) to model the multi-attributed graph translation process. First, an introduction to the overall architecture and the loss functions is given. Then, the three modules on edge translation, node translation, and graph spectral regularization are elaborated.
IV-A Overall Architecture
Multi-block asynchronous translation architecture. The proposed NEC-DGT learns the distribution of graphs in the target domain conditioned on the input graphs and contextual information. However, the translation process from the input graph to the final target graph may involve a series of interactions of different types among edges and nodes. Moreover, such a sophisticated process is hidden and needs to be learned by a sufficiently flexible and powerful model. To address this, we propose the NEC-DGT shown in Fig. 3. Specifically, the node and edge attributes of the input graph are fed into the model, and the model outputs the generated target graph's node and edge attributes after several blocks. The skip-connection architecture (black dotted lines in Fig. 3) implemented across different blocks deals with the asynchrony of the different blocks, ensuring that the final translated results fully utilize various combinations of the blocks' information. To train the deep neural network to generate the target graph $G'$ conditioned on the input graph $G$ and contextual information $C$, we minimize the following loss function:
$\mathcal{L} = \|\tilde{E} - E'\|_2^2 + \|\tilde{F} - F'\|_2^2, \quad (\tilde{E}, \tilde{F}) = T(E, F, C)$  (1)

where the node sets $\mathcal{V}$ and $\mathcal{V}'$ as well as the edge sets $\mathcal{E}$ and $\mathcal{E}'$ are reflected in $E$ and $F$, and in $E'$ and $F'$, respectively.
Node and edge translation paths. To jointly tackle the various interactions among nodes and edges, respective translation paths are proposed for each block. In the node translation path (the upper part of the detailed structure in Fig. 3), node attributes are generated considering the "nodes-to-nodes" and "edges-to-nodes" interactions. In the edge translation path (the lower part of the detailed structure in Fig. 3), edge attributes are generated following the "edges-to-edges" and "nodes-to-edges" interactions.
Spectral graph regularization. To discover and characterize the inherent relationship between the nodes and edges of each graph, the frequency-domain properties of the graph are learned, based on which the interactions between node and edge attributes are jointly regularized via a non-parametric graph Laplacian. Moreover, to maintain consistent spectral properties throughout the translation process, we enforce shared patterns among the generated nodes and edges in different blocks by regularizing their relevant parameters in the frequency domain. The regularization of the graphs is formalized as follows:
$\mathcal{R}(\mathcal{G}) = \sum_{s=1}^{S} \mathcal{R}(E^{(s)}, F^{(s)}; W)$  (2)

where $S$ refers to the number of blocks and $W$ refers to the overall parameters of the spectral graph regularization. $E^{(s)}$ and $F^{(s)}$ refer to the generated edge attribute tensor and node attribute matrix in the $s$-th block; thus $(E^{(S)}, F^{(S)})$ is the generated target graph. Then the final loss function can be summarized as follows:
$\mathcal{L}_{total} = \mathcal{L} + \beta \mathcal{R}(\mathcal{G})$  (3)

where $\beta$ is the trade-off weight between $\mathcal{L}$ and the spectral graph regularization. The model is trained by minimizing the mean squared error of $\tilde{E}$ with respect to $E'$ and of $\tilde{F}$ with respect to $F'$, enforced by the regularization. Optimization methods based on backpropagation (e.g., stochastic gradient descent (SGD) and Adam) can be utilized to optimize the whole model.
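As a sanity check of the objective above, here is a minimal sketch (our own, not the paper's implementation) of the combined loss of Eq. (3), with the spectral regularizer passed in as a precomputed scalar:

```python
import numpy as np

def translation_loss(E_gen, F_gen, E_tgt, F_tgt, reg=0.0, beta=0.1):
    """Mean squared error on edge and node attributes plus a weighted
    regularization term (the spectral term of Eq. 2 is a stand-in scalar here)."""
    mse_edges = np.mean((E_gen - E_tgt) ** 2)
    mse_nodes = np.mean((F_gen - F_tgt) ** 2)
    return mse_edges + mse_nodes + beta * reg

# Toy check: edge MSE = 1.0, node MSE = 0.0, no regularization -> loss = 1.0.
E_gen = np.ones((3, 3, 1)); E_tgt = np.zeros((3, 3, 1))
F_gen = np.zeros((3, 2));   F_tgt = np.zeros((3, 2))
loss = translation_loss(E_gen, F_gen, E_tgt, F_tgt)
```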
IV-B Edge Translation Path
The edge translation path aims to model the nodes-to-edges and edges-to-edges interactions, where edge attributes in the target domain can be influenced by both nodes and edges in the input domain. Therefore, we propose to first jointly embed both node and edge information into influence vectors and then decode them to generate the edge attributes. Specifically, the edge translation path of each block contains two functions: the influence-on-edge function, which encodes each pair of edge and node attributes into an influence for generating edges, and the edge updating function, which aggregates all the influences related to each edge into an integrated influence and decodes this integrated influence to generate each edge's attributes. Fig. 4 shows the operation of the two functions in a single block, translating the current input graph into the output graph.
Influence-on-edge layers. As shown in Fig. 4, the input graph is first organized into pairs of node and edge attributes. For each pair of nodes $v_i$ and $v_j$, we concatenate their edge attributes $E_{i,j}$ and their node attributes $F_i$ and $F_j$ as $V_{i,j} = [F_i; E_{i,j}; F_j]$ (as circled in black rectangles in Fig. 4). Then $V_{i,j}$ is input into the influence-on-edge function $f_e(\cdot)$: a constrained MLP (multi-layer perceptron) which computes the influence $I_{i,j}$ from the pair of nodes $v_i$ and $v_j$, with $d_e$ denoting the dimension of the final influence on edges. $f_e(\cdot)$ for the edge translation path is expressed as follows:

$H^{(l)} = \sigma(W^{(l)} H^{(l-1)} + b^{(l)}), \quad H^{(0)} = V_{i,j}, \quad I_{i,j} = H^{(L_e)}$  (4)

where $W^{(l)}$ and $b^{(l)}$ are the weights and bias of the $l$-th layer in the edge translation path, $L_e$ refers to the number of layers of $f_e(\cdot)$, and $\sigma(\cdot)$ refers to the activation function. For an undirected graph, we add a weight constraint to ensure that the influence of $V_{i,j}$ is the same as the influence of $V_{j,i}$, which means that the first rows (related to the attributes of node $v_i$) and the last rows (related to the attributes of node $v_j$) of the first-layer weights are shared. The influence on edges of each pair is computed by the same function with the same weights; thus NEC-DGT can handle graphs of various sizes.

Edge updating layers. After calculating the influence of each pair of nodes and edge, the next step is to assign each pair's influence to its related edges to obtain the integrated influence for each edge (the aggregation operation in Fig. 4). This is because each edge is generated depending on both its two related nodes and its incident edges (like the pairs circled in the orange and purple rectangles, related to nodes $v_i$ and $v_j$ respectively, in Fig. 4). We define the integrated influence on the attributes of edge $e_{i,j}$ as $S_{i,j}$, which is computed as follows:
$S_{i,j} = \sum_{k \in \mathcal{N}(i)} I_{i,k} + \sum_{k \in \mathcal{N}(j)} I_{j,k}$  (5)

where $\mathcal{N}(i)$ refers to the neighbor nodes of node $v_i$. Then the edge attributes are generated by $\tilde{E}_{i,j} = f_d([S_{i,j}; E_{i,j}; C])$, where $E_{i,j}$ refers to the input edge attributes of edge $e_{i,j}$ and $C$ refers to the contextual information for the translation. The function $f_d(\cdot)$ is implemented by an MLP.
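A naive, loop-based sketch may clarify the two stages of one edge-translation block. The single-layer tanh MLPs, the variable names, and the exact aggregation over pairs incident to the two endpoints are our assumptions for illustration, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, W, b):
    # One-layer stand-in for the influence-on-edge / edge-updating MLPs.
    return np.tanh(x @ W + b)

def edge_translation_block(E, F, C, W_inf, b_inf, W_upd, b_upd):
    """One edge-translation block: influence-on-edge, then edge updating.
    E: (N, N, L) edge attributes, F: (N, D) node attributes, C: (K,) context."""
    N = F.shape[0]
    d_inf = b_inf.shape[0]
    # Influence-on-edge: encode each pair [F_i ; E_ij ; F_j] into an influence.
    I = np.zeros((N, N, d_inf))
    for i in range(N):
        for j in range(N):
            pair = np.concatenate([F[i], E[i, j], F[j]])
            I[i, j] = mlp(pair, W_inf, b_inf)
    # Edge updating: integrate influences of pairs incident to i or j, then
    # decode them together with the input edge attributes and the context.
    E_new = np.zeros_like(E)
    for i in range(N):
        for j in range(N):
            integrated = I[i].sum(axis=0) + I[j].sum(axis=0)
            E_new[i, j] = mlp(np.concatenate([integrated, E[i, j], C]),
                              W_upd, b_upd)
    return E_new

# Toy usage with random weights (shapes follow the concatenations above).
N, L, D, K, d_inf = 4, 2, 3, 2, 5
E = rng.random((N, N, L)); F = rng.random((N, D)); C = rng.random(K)
W_inf = rng.random((2 * D + L, d_inf)); b_inf = np.zeros(d_inf)
W_upd = rng.random((d_inf + L + K, L)); b_upd = np.zeros(L)
E_new = edge_translation_block(E, F, C, W_inf, b_inf, W_upd, b_upd)
```

A real implementation would vectorize the double loop and stack several such layers per block; the sketch only fixes the data flow.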
Relationship with other edge convolution networks. Edge convolution networks are the most typical methods for handling edge embeddings in graphs; they were first introduced as BrainNetCNN [16] and later explored in many studies [11, 18, 32]. Our edge translation path is a highly flexible and generic mechanism that handles multi-attributed nodes and edges. Several existing edge convolution layers and their variants can be considered special cases of our method, as demonstrated in the following theorem (the proof is available at https://github.com/xguo7/NECDGT):
Theorem 1.
The influence-on-edge function in the edge translation path of NEC-DGT is a generalization of conventional edge convolution networks.
IV-C Node Translation Path
Node translation aims to learn the "nodes-to-nodes" and "edges-to-nodes" interactions, where the translation of one node's attributes depends on the edge attributes related to this node and on its own attributes. The node translation path of each block contains two functions: the influence-on-node function, which learns the influence from each pair of nodes, and the node updating function, which generates the new node attributes by aggregating all the influences from pairs containing this node. Fig. 5 shows how a node is translated in a single block.
Influence-on-node layers. As shown in Fig. 5, the input graph is first organized into pairs of nodes, where each pair is represented as $V_{i,j} = [F_i; E_{i,j}; F_j]$, similar to the edge translation path (as circled in the black rectangle in Fig. 5). Then $V_{i,j}$ is input into the influence-on-node function, which is implemented by a constrained MLP as in Equation (4), to compute the influence $I'_{i,j}$ on nodes (shown as the grey bar in Fig. 5), where $d_n$ is the dimension of the influence on nodes.
Node updating layers. After computing the influence of each node pair, the next step is to generate the node attributes. For node $v_i$, an assignment step aggregates all the influences from pairs containing $v_i$ (the aggregation operation in Fig. 5). All the influences for node $v_i$ are then aggregated and input into the updating function, which is implemented by an MLP to calculate the attributes of node $v_i$ as $\tilde{F}_i = f_u([\sum_{j \in \mathcal{N}(i)} I'_{i,j}; F_i; C])$.
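The node path mirrors the edge path; a loop-based sketch (again with single-layer tanh MLPs and names of our own choosing, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

def node_translation_block(E, F, C, W_inf, b_inf, W_upd, b_upd):
    """One node-translation block: influence-on-node, then node updating.
    E: (N, N, L) edge attributes, F: (N, D) node attributes, C: (K,) context."""
    N, D = F.shape
    d_inf = b_inf.shape[0]
    F_new = np.zeros_like(F)
    for i in range(N):
        # Influence-on-node: one influence per pair (i, j) from [F_i ; E_ij ; F_j],
        # aggregated over all pairs containing node i.
        agg = np.zeros(d_inf)
        for j in range(N):
            pair = np.concatenate([F[i], E[i, j], F[j]])
            agg += np.tanh(pair @ W_inf + b_inf)
        # Node updating: decode the aggregated influences together with the
        # node's own attributes and the context into the new node attributes.
        F_new[i] = np.tanh(np.concatenate([agg, F[i], C]) @ W_upd + b_upd)
    return F_new

# Toy usage with random weights.
N, L, D, K, d_inf = 4, 2, 3, 2, 5
E = rng.random((N, N, L)); F = rng.random((N, D)); C = rng.random(K)
W_inf = rng.random((2 * D + L, d_inf)) * 0.1; b_inf = np.zeros(d_inf)
W_upd = rng.random((d_inf + D + K, D)) * 0.1; b_upd = np.zeros(D)
F_new = node_translation_block(E, F, C, W_inf, b_inf, W_upd, b_upd)
```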
IV-D Graph Spectral-based Regularization
Based on the edge and node translation paths introduced above, we can generate node and edge attributes, respectively. However, since these node and edge attributes are predicted separately in different paths, their patterns may not be consistent and harmonious. To ensure the consistency of the edge and node patterns mentioned in Section III, we propose a novel adaptive regularization based on a non-parametric graph Laplacian, together with a graph frequency regularization.
Non-parametric graph Laplacian regularization. First, we recall the property of multi-attributed graphs that node information can be smoothed over the graph via some form of explicit graph-based regularization, namely the well-known graph Laplacian regularization term [17]: $f^\top \mathcal{L} f = \frac{1}{2} \sum_{i,j} A_{i,j} (f_i - f_j)^2$, where $f$ is the node attribute vector for one node attribute and $A$ is the edge attribute matrix for one edge attribute generated in the $s$-th block. $\mathcal{L} = \mathcal{D} - A$ denotes the graph Laplacian for this edge attribute matrix, where the degree matrix is computed as $\mathcal{D}_{i,i} = \sum_j A_{i,j}$.
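The Laplacian smoothness term can be checked numerically; this small sketch verifies the identity $f^\top \mathcal{L} f = \frac{1}{2}\sum_{i,j} A_{i,j}(f_i - f_j)^2$ on a 3-node path graph (the example graph and values are ours):

```python
import numpy as np

# 3-node path graph: 0 -- 1 -- 2.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
Dg = np.diag(A.sum(axis=1))        # degree matrix
Lap = Dg - A                       # combinatorial graph Laplacian

f = np.array([1.0, 1.0, 3.0])      # one node-attribute vector

quad = f @ Lap @ f                 # quadratic form f^T L f
pairwise = 0.5 * sum(A[i, j] * (f[i] - f[j]) ** 2
                     for i in range(3) for j in range(3))
# Only the edge (1, 2) with attribute difference 2 contributes: both equal 4.0.
```

The term is small exactly when neighboring nodes carry similar attribute values, which is the "smoothness" assumption criticized and generalized in the next paragraph.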
However, the above traditional graph Laplacian can only impose an absolute smoothness regularization over all the nodes by forcing neighboring nodes to have similar attribute values, which is often over-restrictive in many situations, such as in signed networks and teleconnections. In the real world, the correlation among nodes is much more complicated than pure "smoothness" and should be a mixed pattern of different types of relations. To address this, we propose an end-to-end framework of non-parametric graph Laplacian which can automatically learn the node correlation patterns inherent in specific types of graphs, with rigorous foundations in spectral graph theory. In essence, we propose the non-parametric graph Laplacian as $\hat{\mathcal{L}} = U \Lambda_w U^\top$. The normalized Laplacian is computed as $\mathcal{L}_{norm} = I - \mathcal{D}^{-1/2} A \mathcal{D}^{-1/2}$ and can be diagonalized by the Fourier basis $U$, such that $\mathcal{L}_{norm} = U \Lambda U^\top$, where $\Lambda$ is a diagonal matrix storing the graph frequencies; for example, $\Lambda_{1,1}$ is the frequency value of the first Fourier basis vector. Therefore, we have the regularization as follows:

$f^\top \hat{\mathcal{L}} f = f^\top U \Lambda_w U^\top f$

where $\Lambda_w$ is the diagonal matrix of non-parametric Laplacian eigenvalues that will be introduced subsequently.
Scalable approximation. $\Lambda_w$ is a non-parametric diagonal matrix whose parameters are all free; it can be defined as $\Lambda_w = \mathrm{diag}(w)$, where the parameter $w$ is a vector of Fourier coefficients for a graph. However, optimizing the non-parametric eigenvalues has a learning complexity of $O(N)$, the dimensionality of the graphs, which is not scalable for large graphs. To reduce the learning complexity from $O(N)$ to $O(P)$, we approximate $\Lambda_w$ by a normalized truncated expansion in terms of Chebyshev polynomials [13]. The Chebyshev polynomial $T_p(x)$ of order $p$ may be computed by the stable recurrence relation $T_p(x) = 2x T_{p-1}(x) - T_{p-2}(x)$ with $T_0(x) = 1$ and $T_1(x) = x$. The eigenvalues of the approximated Laplacian filter can thus be parameterized as the truncated expansion:

$\Lambda_w \approx \sum_{p=0}^{P-1} \theta_p T_p(\tilde{\Lambda})$  (6)

for $P$ orders, where $T_p(\tilde{\Lambda})$ is the Chebyshev polynomial of order $p$ evaluated at $\tilde{\Lambda} = 2\Lambda / \lambda_{max} - I$, a diagonal matrix of scaled eigenvalues that lie in $[-1, 1]$; $\lambda_{max}$ refers to the largest element of $\Lambda$. $\Theta$ denotes the parameter tensor for all blocks, and $\theta_p$ is the $p$-th element of the Chebyshev coefficient vector for one edge attribute. Each coefficient vector is normalized by dividing by the sum of all its coefficients to avoid the situation where it is trained to zero. The Laplacian computation can then be written as $\hat{\mathcal{L}} = \sum_{p=0}^{P-1} \theta_p T_p(\tilde{\mathcal{L}})$, where $T_p(\tilde{\mathcal{L}})$ is the Chebyshev polynomial of order $p$ evaluated at the scaled Laplacian $\tilde{\mathcal{L}} = 2\mathcal{L}_{norm} / \lambda_{max} - I$. For efficient computation, we further approximate $\lambda_{max} \approx 2$, as we can expect that the neural network parameters will adapt to this change in scale during training.
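The truncated Chebyshev filter can be sketched directly from the recurrence; this simplification (ours, operating on one Laplacian rather than per edge attribute and per block) shows the stable three-term computation:

```python
import numpy as np

def cheb_filter(Lap, theta, lam_max=2.0):
    """Learned Laplacian filter sum_p theta_p * T_p(L_scaled), built with the
    Chebyshev recurrence T_p(x) = 2x T_{p-1}(x) - T_{p-2}(x)."""
    n = Lap.shape[0]
    L_scaled = 2.0 * Lap / lam_max - np.eye(n)   # eigenvalues mapped into [-1, 1]
    T_prev, T_curr = np.eye(n), L_scaled         # T_0 = I, T_1 = L_scaled
    out = theta[0] * T_prev + theta[1] * T_curr
    for p in range(2, len(theta)):
        T_next = 2.0 * L_scaled @ T_curr - T_prev
        out += theta[p] * T_next
        T_prev, T_curr = T_curr, T_next
    return out

# Toy usage: 2-node graph, 3 (hypothetical, pre-normalized) coefficients.
A = np.array([[0., 1.], [1., 0.]])
Lap = np.diag(A.sum(1)) - A
theta = np.array([0.5, 0.3, 0.2])
H = cheb_filter(Lap, theta)
```

Only $P$ coefficients are learned regardless of graph size, which is the source of the $O(P)$ complexity claimed above.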
Graph frequency regularization. To ensure that the spectral graph patterns are consistent throughout the translation process across different blocks, we utilize a graph frequency regularization to not only maintain the similarity but also allow the exclusive properties of each block's patterns to be preserved to some degree. Specifically, among all the frequency pattern bases of the form $T_p(\tilde{\Lambda})$, some are important in modeling the relationships between nodes and graphs while others are not, resulting in a sparsity pattern of the coefficients $\Theta$. Thus, inspired by multi-task learning, we learn the consistent sparsity pattern of $\Theta$ by using the $\ell_{2,1}$ norm as regularization:

$\|\Theta\|_{2,1} = \sum_{p} \sqrt{\sum_{s} \Theta_{s,p}^2}$  (7)
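The $\ell_{2,1}$ norm groups each coefficient across blocks ($\ell_2$ within a group, $\ell_1$ across groups), which drives entire coefficient columns to zero jointly; a minimal sketch with a hypothetical coefficient matrix:

```python
import numpy as np

def l21_norm(Theta):
    """l2,1 norm: l2 over blocks (rows) for each coefficient, then l1 over
    coefficients (columns). Theta[s, p] is Chebyshev coefficient p in block s."""
    return np.sum(np.sqrt(np.sum(Theta ** 2, axis=0)))

# Two blocks, two coefficients; the second coefficient is unused in both blocks.
Theta = np.array([[3.0, 0.0],
                  [4.0, 0.0]])
# Column 0 contributes sqrt(9 + 16) = 5, column 1 contributes 0 -> norm = 5.0.
```

Penalizing this quantity favors a sparsity pattern that is shared by all blocks, matching the consistency goal stated above.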
IV-E Complexity Analysis
The proposed NEC-DGT requires $O(N^2)$ operations in time and space complexity in terms of the number of nodes $N$ in the graph. It is more scalable than most graph generation methods. For example, GraphVAE [31] requires $O(N^4)$ operations in the worst case, and Li et al. [21] use graph neural networks to perform a form of "message passing" to generate a graph, which is also expensive for large graphs.
V. Experiments

In this section, we present both quantitative and qualitative experimental results on NEC-DGT as well as the comparison models. All experiments are conducted on a 64-bit machine with an Nvidia GPU (GTX 1070, 1683 MHz, 8 GB GDDR5). The model is trained by the Adam optimization algorithm (the code and additional experimental results are available at https://github.com/xguo7/NECDGT).
V-A Experimental Setup
V-A1 Datasets
We performed experiments on four synthetic datasets and four real-world datasets with different graph sizes and characteristics. All the datasets consist of input-target graph pairs.
Synthetic datasets: Four datasets are generated based on different types of graphs and translation rules. The input graphs of the first three datasets (named Syn-I, Syn-II, and Syn-III) are Erdős–Rényi (ER) graphs generated by the Erdős–Rényi model [8] with an edge probability of 0.2 and graph sizes of 20, 40, and 60, respectively. The target graph topology is the 2-hop connectivity of the input graph, where each edge in the target graph refers to 2-hop reachability in the input graph (e.g., if a node is 2-hop reachable from another node in the input graph, then they are connected in the target graph). The input graphs of the fourth dataset (named Syn-IV) are Barabási–Albert (BA) graphs generated by the Barabási–Albert model [2] with 20 nodes, where each node is connected to 1 existing node. In Syn-IV, the topology of the target graph is the 3-hop connectivity of the input graph. For all four datasets, the edge attribute denotes the existence of the edge. For both input and target graphs, the node attributes are continuous values computed by a polynomial function of the node degree. Each dataset is divided into two subsets, each of which has 250 pairs of graphs. Validation is conducted by using one subset for training and the other for testing, then exchanging them for another round of validation. The average result of the two rounds is regarded as the final result.

Malware confinement datasets: Malware datasets are used for measuring the performance of NEC-DGT on malware confinement prediction. There are three sets of IoT nodes of different sizes (20, 40, and 60) encompassing temperature sensors connected with Intel ATLASEDGE Boards and Beagle Boards (BeagleBone Blue), communicating via the Bluetooth protocol. Benign and malware activities are executed on these devices to generate the initial attacked networks as the input graphs. Benign activities include MiBench [12] and SPEC2006 [14], Linux system programs, and a word processor. The nodes represent devices, and the node attribute is a binary value referring to whether the device is compromised or not. An edge represents the connection of two devices, and the edge attribute is a continuous value reflecting the distance between the two devices. The real target graphs are generated by classical malware confinement methods: stochastic controlling with malware detection [28, 26, 29].
We collected 334 pairs of input and target graphs with different contextual parameters (infection rate, recovery rate, and decay rate) for each of the three datasets. Each dataset is divided into two subsets: one has 200 pairs and the other has 134 pairs. The validation is conducted in the same way as for the synthetic datasets.
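The two-subset validation scheme used throughout (train on one subset, test on the other, then swap, and average) can be sketched as follows; `train` and `evaluate` are hypothetical placeholders for the actual model fitting and scoring:

```python
def swap_validation(subset_a, subset_b, train, evaluate):
    """Two-fold validation: each subset serves once for training, once for testing."""
    score_ab = evaluate(train(subset_a), subset_b)
    score_ba = evaluate(train(subset_b), subset_a)
    return (score_ab + score_ba) / 2.0
```

With a toy "model" (e.g., the mean of the training values scored by squared error), this returns the average of the two held-out errors.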
Molecule reaction dataset: We apply our NEC-DGT to one of the fundamental problems in organic chemistry, namely predicting the product (target graph) of a chemical reaction given the reactant (input graph). Each molecular graph consists of atoms as nodes and bonds as edges. The input molecule graph has multiple connected components, since multiple molecules comprise the reactants. The reactions used for training are atom-mapped so that each atom in the product graph has a unique corresponding atom in the reactants. We used reactions from USPTO granted patents, collected by Lowe [22]. We obtained a set of 5,000 reactions (reactant-product pairs) and divided them into 2,500 for training and 2,500 for testing. Atom (node) features include the elemental identity, degree of connectivity, number of attached hydrogen atoms, implicit valence, and aromaticity. Bond (edge) features include the bond type (single, double, triple, or aromatic) and whether the bond is connected.
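A minimal sketch of the kind of node/edge featurization described above is given below. The element and bond-type vocabularies are illustrative assumptions (real pipelines typically derive these descriptors with a cheminformatics toolkit such as RDKit), and the feature dimensions may differ from the paper's.

```python
# Illustrative vocabularies; the actual feature dimensions may differ.
ELEMENTS = ["C", "N", "O", "S", "Cl"]
BOND_TYPES = ["single", "double", "triple", "aromatic"]

def atom_features(symbol, degree, num_h, implicit_valence, aromatic):
    """One-hot element identity plus scalar/Boolean chemical descriptors."""
    onehot = [1.0 if symbol == e else 0.0 for e in ELEMENTS]
    return onehot + [float(degree), float(num_h), float(implicit_valence),
                     1.0 if aromatic else 0.0]

def bond_features(bond_type, connected):
    """One-hot bond type plus a connectivity flag."""
    onehot = [1.0 if bond_type == b else 0.0 for b in BOND_TYPES]
    return onehot + [1.0 if connected else 0.0]
```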
V-A2 Comparison methods
Since there is no existing method handling the multi-attributed graph translation problem, NEC-DGT is compared with two categories of methods: 1) graph topology generation methods, and 2) graph node attribute prediction methods.
Graph topology generation methods: 1) GraphRNN [38] is a recent graph generation method based on sequential generation with an LSTM model; 2) Graph Variational Autoencoder (GraphVAE) [31] is a VAE-based graph generation method for small graphs; 3) Graph Translation-Generative Adversarial Networks (GT-GAN) [11] is a new graph topology translation method based on a graph generative adversarial network.
Node attribute prediction methods: 1) Interaction Network (IN) [3] is a node state updating network considering the interactions of neighboring nodes; 2) DCRNN [19] is a node attribute prediction network for traffic flow prediction; 3) Spatio-Temporal Graph Convolutional Networks (STGCN) [39] is a node attribute prediction model for traffic speed forecasting.
Furthermore, to validate the effectiveness of the graph spectral-based regularization, we construct a comparison model (named NR-DGT) which has the same architecture as NEC-DGT but without the graph regularization.
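The spectral property that this regularization targets can be inspected directly: the normalized graph Laplacian's eigenvalues summarize a graph's connectivity structure, and comparing the spectra of two graphs quantifies how similar that structure is. The NumPy sketch below is illustrative only, not the paper's exact loss term.

```python
import numpy as np

def normalized_laplacian(adj):
    """L = I - D^{-1/2} A D^{-1/2}; isolated nodes get a zero scaling factor."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    d_inv_sqrt[nz] = deg[nz] ** -0.5
    return np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

def spectral_distance(adj_a, adj_b):
    """L2 distance between the sorted Laplacian eigenvalue vectors."""
    ev_a = np.sort(np.linalg.eigvalsh(normalized_laplacian(adj_a)))
    ev_b = np.sort(np.linalg.eigvalsh(normalized_laplacian(adj_b)))
    return float(np.linalg.norm(ev_a - ev_b))
```

For example, a triangle and a 3-node path graph have different spectra, so their `spectral_distance` is strictly positive.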
V-A3 Evaluation metrics
A set of metrics is used to measure the similarity between the generated and real target graphs in terms of node and edge attributes. For attributes that are Boolean values, Acc (accuracy) is utilized to evaluate the ratio of nodes or edges that are correctly predicted among all the nodes or possible node pairs. For attributes that are continuous values, MSE (mean squared error), R2 (coefficient of determination), and the Pearson and Spearman correlations are computed between the attributes of the generated and real target graphs. The prefix N denotes metrics evaluated on node attributes and the prefix E denotes metrics evaluated on edge attributes.
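For concreteness, these metrics can be computed as below; this is a plain-NumPy sketch, and the Spearman variant shown ranks without tie correction.

```python
import numpy as np

def mse(y, yhat):
    return float(np.mean((np.asarray(y, float) - np.asarray(yhat, float)) ** 2))

def r2(y, yhat):
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def pearson(y, yhat):
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    yc, hc = y - y.mean(), yhat - yhat.mean()
    return float((yc @ hc) / np.sqrt((yc @ yc) * (hc @ hc)))

def spearman(y, yhat):
    # Rank-transform, then correlate; no correction for tied values.
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return pearson(rank(np.asarray(y)), rank(np.asarray(yhat)))

def accuracy(y, yhat):
    """For Boolean attributes, e.g. edge existence among all node pairs."""
    return float(np.mean(np.asarray(y) == np.asarray(yhat)))
```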
V-B Performance
V-B1 Metric-based evaluation for synthetic graphs
For the synthetic datasets, we compare the generated and real target graphs on various metrics and visualize the patterns captured in the generated graphs. Table II summarizes the effectiveness comparison for the four synthetic datasets. The node attributes are continuous values evaluated by NMSE, NR2, NP, and NSp. The edge attributes are binary values evaluated by the accuracy of the correctly predicted edges. The results in Table II demonstrate that the proposed NEC-DGT outperforms the other methods in both node and edge attribute prediction and is the only method able to handle both. Specifically, in terms of node attributes, the proposed NEC-DGT achieves smaller NMSE values than all the node attribute prediction methods, by 85%, 71%, 95%, and 95% on average for the four datasets, respectively. Also, NEC-DGT outperforms the other methods by 46%, 36%, 44%, and 58% on average for the four datasets, respectively, on NR2, NP, and NSp. This is because all the node prediction methods only consider a fixed graph topology, while NEC-DGT allows the edges to vary. In terms of edges, the proposed NEC-DGT achieves a higher EAcc than all the other graph generation methods. It also has a higher EAcc than the graph topology translation method GT-GAN, by 7% on average, since NEC-DGT considers both edge and node attributes in learning the translation mapping while GT-GAN only considers edges. The proposed NEC-DGT outperforms NR-DGT by around 3% on average across all metrics, which demonstrates the effectiveness of the graph spectral-based regularization.
Dataset  | Method   | NMSE    | NR2   | NP    | NSp   || Method    | EAcc
Syn-I    | IN       | 5.97    | 0.06  | 0.48  | 0.44  || GraphRNN  | 0.6212
         | DCRNN    | 51.36   | 0.12  | 0.44  | 0.45  || GraphVAE  | 0.6591
         | STGCN    | 15.44   | 0.19  | 0.42  | 0.56  || GT-GAN    | 0.7039
         | NR-DGT   | 2.13    | 0.87  | 0.90  | 0.89  || NR-DGT    | 0.7017
         | NEC-DGT  | 1.98    | 0.76  | 0.93  | 0.91  || NEC-DGT   | 0.7129
Syn-II   | IN       | 1.36    | 0.85  | 0.77  | 0.87  || GraphRNN  | 0.5621
         | DCRNN    | 71.07   | 0.11  | 0.39  | 0.37  || GraphVAE  | 0.4639
         | STGCN    | 33.11   | 0.21  | 0.15  | 0.15  || GT-GAN    | 0.7005
         | NR-DGT   | 1.43    | 0.91  | 0.94  | 0.97  || NR-DGT    | 0.7016
         | NEC-DGT  | 1.91    | 0.93  | 0.97  | 0.97  || NEC-DGT   | 0.7203
Syn-III  | IN       | 35.46   | 0.31  | 0.59  | 0.56  || GraphRNN  | 0.4528
         | DCRNN    | 263.23  | 0.09  | 0.41  | 0.39  || GraphVAE  | 0.3702
         | STGCN    | 43.34   | 0.22  | 0.48  | 0.47  || GT-GAN    | 0.5770
         | NR-DGT   | 5.90    | 0.90  | 0.94  | 0.92  || NR-DGT    | 0.6259
         | NEC-DGT  | 4.56    | 0.93  | 0.97  | 0.96  || NEC-DGT   | 0.6588
Syn-IV   | IN       | 4.63    | 0.10  | 0.53  | 0.51  || GraphRNN  | 0.5172
         | DCRNN    | 63.03   | 0.12  | 0.22  | 0.16  || GraphVAE  | 0.3001
         | STGCN    | 6.52    | 0.08  | 0.11  | 0.10  || GT-GAN    | 0.8052
         | NR-DGT   | 4.49    | 0.12  | 0.55  | 0.54  || NR-DGT    | 0.6704
         | NEC-DGT  | 1.86    | 0.73  | 0.93  | 0.89  || NEC-DGT   | 0.8437
V-B2 Evaluation of the learned translation mapping for synthetic graphs
To evaluate whether the inherent relationship between node attributes and edges (reflected by node degree) is learned and maintained by NEC-DGT, we draw the distribution of node attribute versus node degree for each node in the generated graphs to visualize their relationship. For comparison, a ground-truth correlation is drawn according to the predefined rule for generating the dataset, namely the polynomial function relating each node's degree to its attribute. Fig. 6 shows four example distributions of nodes in terms of node attribute and degree, with the black line as the ground truth. As shown in Fig. 6, the nodes are located closely along the ground truth, especially for Syn-I and Syn-IV, where around 85% of the nodes are correctly located. This is largely because the proposed graph spectral-based regularization successfully discovers the pattern that densely connected nodes tend to have large node attributes, and vice versa.
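A "correctly located" ratio of this kind can be computed by thresholding each node's distance to the ground-truth curve; in the sketch below, both the polynomial `f` and the tolerance are hypothetical placeholders.

```python
def on_curve_ratio(degrees, attrs, f, tol=0.05):
    """Fraction of nodes whose attribute lies within tol of f(degree)."""
    hits = sum(1 for d, x in zip(degrees, attrs) if abs(x - f(d)) <= tol)
    return hits / len(degrees)

# Hypothetical ground-truth polynomial, for illustration only.
f = lambda d: 0.1 * d ** 2
```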
V-B3 Metric-based evaluation for malware datasets
Table III shows the evaluation of NEC-DGT by comparing the generated and real target graphs. For malware graphs, the node attributes are evaluated by NAcc, which calculates the percentage of nodes whose attributes are correctly predicted among all nodes. The edge attributes are continuous values evaluated by EMSE, ER2, and EP. We also use EAcc to evaluate the correct existence of edges among all pairs of nodes. The results in Table III demonstrate that NEC-DGT performs the best on all three datasets. In terms of EAcc, the graph generation methods (GraphRNN and GraphVAE) cannot handle the graph translation task, obtaining a low EAcc of around 0.6 on Mal-I and Mal-II and 0.8 on Mal-III. GT-GAN achieves a high EAcc, but its EMSE is about two times larger than that of the proposed NEC-DGT on average. NEC-DGT successfully handles the translation tasks, with EAcc above 0.9 and the smallest EMSE. In terms of NAcc, NEC-DGT outperforms the other methods by around 5% on the first two datasets. In summary, the proposed NEC-DGT can not only jointly predict node and edge attributes, but also performs the best on most of the metrics. The superiority of NEC-DGT over NR-DGT in terms of EMSE demonstrates that the graph spectral-based regularization indeed improves the modeling of the translation mapping.
Malware-I
Method    | EAcc    | EMSE     | ER2   | EP    || Method   | NAcc
GraphRNN  | 0.6107  | 1831.43  | 0.52  | 0.00  || IN       | 0.8786
GraphVAE  | 0.5064  | 2453.61  | 0.00  | 0.04  || DCRNN    | 0.8786
GT-GAN    | 0.6300  | 1718.02  | 0.42  | 0.11  || STGCN    | 0.9232
NR-DGT    | 0.9107  | 668.57   | 0.82  | 0.91  || NR-DGT   | 0.9108
NEC-DGT   | 0.9218  | 239.79   | 0.78  | 0.91  || NEC-DGT  | 0.9295

Malware-II
Method    | EAcc    | EMSE     | ER2   | EP    || Method   | NAcc
GraphRNN  | 0.7054  | 1950.46  | 0.44  | 0.29  || IN       | 0.8828
GraphVAE  | 0.6060  | 2410.57  | 0.73  | 0.16  || DCRNN    | 0.8790
GT-GAN    | 0.9033  | 462.73   | 0.13  | 0.81  || STGCN    | 0.9330
NR-DGT    | 0.9117  | 448.48   | 0.68  | 0.83  || NR-DGT   | 0.8853
NEC-DGT   | 0.9380  | 244.40   | 0.81  | 0.91  || NEC-DGT  | 0.9340

Malware-III
Method    | EAcc    | EMSE     | ER2   | EP    || Method   | NAcc
GraphRNN  | 0.8397  | 1775.58  | 0.16  | 0.23  || IN       | 0.8738
GraphVAE  | 0.8119  | 2109.64  | 0.39  | 0.32  || DCRNN    | 0.8738
GT-GAN    | 0.9453  | 550.30   | 0.63  | 0.80  || STGCN    | 0.9375
NR-DGT    | 0.9543  | 341.10   | 0.76  | 0.88  || NR-DGT   | 0.8773
NEC-DGT   | 0.9604  | 273.67   | 0.81  | 0.90  || NEC-DGT  | 0.9002
V-B4 Case study for malware dataset
Fig. 7 shows three cases of input, real target, and generated target graphs produced by NEC-DGT. The green nodes refer to uncompromised devices while the red nodes refer to compromised devices. The width of each edge reflects the distance between two devices. In the first case, in both the generated and real target graphs, Devices 4 and 6 are restored to normal, while Device 19 gets attacked and is isolated from the other devices. This validates that our NEC-DGT successfully finds the rules for translating nodes and performs like the true confinement process. In the second case, Device 8 propagates the malware to Device 38, which is also modeled by NEC-DGT in the generated graph. In addition, NEC-DGT not only correctly predicts the node attributes, but also discovers the changes in edge attributes; e.g., in the third case, most of the connections of compromised Device 10 are cut in both the generated and real target graphs.
V-B5 Metric-based evaluation for molecule reaction datasets
In this task, NEC-DGT is compared to the Weisfeiler-Lehman Difference Network (WLDN) [15], a graph learning model designed specifically for reaction prediction. Table IV shows the performance of our NEC-DGT on the reaction dataset on five metrics, the same as those used for the synthetic datasets. The proposed NEC-DGT outperforms both the translation model GT-GAN and WLDN by 5% on average. Though the atoms do not change during a reaction, we evaluate the capacity of our NEC-DGT to copy the input node features. As shown in Table IV, NEC-DGT achieves the smallest NMSE and a higher NR2 than the other comparison methods by around 18%. This shows that our NEC-DGT can deal with a wide range of real-world applications, whether the edges and nodes need to change or remain stable.
Method   | NMSE    | NR2   | NP    | NSp   || Method   | EAcc
IN       | 0.0805  | 0.46  | 0.13  | 0.12  || GT-GAN   | 0.8687
STGCN    | 0.0006  | 0.98  | 0.99  | 0.97  || WLDN     | 0.9667
NR-DGT   | 0.0008  | 0.97  | 0.99  | 0.99  || NR-DGT   | 0.9918
NEC-DGT  | 0.0004  | 0.99  | 0.99  | 0.99  || NEC-DGT  | 0.9925
VI Conclusion and Future Work
This paper focuses on a new problem: multi-attributed graph translation. To address it, we propose a novel NEC-DGT consisting of several blocks that translate a multi-attributed input graph into a target graph. To jointly tackle the different types of interactions among nodes and edges, node and edge translation paths are proposed in each block, and a graph spectral-based regularization is proposed to preserve the consistent spectral properties of the graphs. Extensive experiments have been conducted on synthetic and real-world datasets. The results show that our NEC-DGT can discover the ground-truth translation rules and significantly outperforms the comparison methods in terms of effectiveness. This paper provides a further step for research on graph translation problems in more general scenarios.
Acknowledgement
This work was supported by National Science Foundation grants #1755850, #1841520, and #1907805, a Jeffress Trust Award, and an NVIDIA GPU Grant.
References
 [1] (2018) Functional brain connectivity is predictable from anatomic network's laplacian eigenstructure. NeuroImage 172, pp. 728–739. Cited by: §I, §III.
 [2] (1999) Emergence of scaling in random networks. Science 286 (5439), pp. 509–512. Cited by: §V-A1.
 [3] (2016) Interaction networks for learning about objects, relations and physics. In Advances in Neural Information Processing Systems, pp. 4502–4510. Cited by: §I, §II, §V-A2.
 [4] (2018) NetGAN: generating graphs via random walks. arXiv preprint arXiv:1803.00816. Cited by: §II.
 [5] (2013) Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203. Cited by: §II.
 [6] (2016) Deep neural networks for learning graph representations. In Thirtieth AAAI Conference on Artificial Intelligence. Cited by: §II.
 [7] (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pp. 3844–3852. Cited by: §II.
 [8] (1960) On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci 5 (1), pp. 17–60. Cited by: §V-A1.
 [9] (2018) Local event forecasting and synthesis using unpaired deep graph translations. In Proceedings of the 2nd ACM SIGSPATIAL Workshop on Analytics for Local Events and News, pp. 5. Cited by: §II.
 [10] (2005) A new model for learning in graph domains. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, Vol. 2, pp. 729–734. Cited by: §II.
 [11] (2018) Deep graph translation. arXiv preprint arXiv:1805.09980. Cited by: §I, §II, §IV-B, §V-A2.
 [12] (2001) MiBench: a free, commercially representative embedded benchmark suite. In 2001 IEEE International Workshop on Workload Characterization (WWC-4), pp. 3–14. Cited by: §V-A1.
 [13] (2011) Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis 30 (2), pp. 129–150. Cited by: §IV-D.
 [14] (2006) SPEC CPU2006 benchmark descriptions. ACM SIGARCH Computer Architecture News 34 (4), pp. 1–17. Cited by: §V-A1.
 [15] (2017) Predicting organic reaction outcomes with Weisfeiler-Lehman network. In NeurIPS, pp. 2607–2616. Cited by: §V-B5.
 [16] (2017) BrainNetCNN: convolutional neural networks for brain networks; towards predicting neurodevelopment. NeuroImage 146, pp. 1038–1049. Cited by: §II, §IV-B.
 [17] (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §II, §IV-D.
 [18] (2018) Modeling brain networks with artificial neural networks. In Graphs in Biomedical Image Analysis and Integrating Medical Imaging and Non-Imaging Modalities, pp. 43–53. Cited by: §IV-B.
 [19] (2017) Diffusion convolutional recurrent neural network: data-driven traffic forecasting. arXiv preprint arXiv:1707.01926. Cited by: §I, §II, §V-A2.
 [20] (2015) Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493. Cited by: §II.
 [21] (2018) Learning deep generative models of graphs. arXiv preprint arXiv:1803.03324. Cited by: §II, §IV-E.
 [22] (2014) Patent reaction extraction: downloads. Cited by: §V-A1.
 [23] (2017) Hierarchical graph embedding in vector space by graph pyramid. Pattern Recognition 61, pp. 245–254. Cited by: §II.
 [24] (2016) Learning convolutional neural networks for graphs. In ICML, pp. 2014–2023. Cited by: §II.
 [25] (2017) Kernel graph convolutional neural networks. arXiv preprint arXiv:1710.10689. Cited by: §II.
 [26] (2019) Lightweight node-level malware detection and network-level malware confinement in IoT networks. In ACM/EDAA/IEEE Design Automation and Test in Europe (DATE). Cited by: §V-A1.
 [27] (2018) Designing random graph models using variational autoencoders with applications to chemical design. arXiv preprint arXiv:1802.05283. Cited by: §II.
 [28] (2018) Ensemble learning for hardware-based malware detection: a comprehensive analysis and classification. In ACM/EDAA/IEEE Design Automation Conference. Cited by: §I, §V-A1.
 [29] (2019) 2SMaRT: a two-stage machine learning-based approach for run-time specialized hardware-assisted malware detection. In ACM/EDAA/IEEE Design Automation and Test in Europe (DATE). Cited by: §V-A1.
 [30] (2008) The graph neural network model. IEEE Transactions on Neural Networks 20 (1), pp. 61–80. Cited by: §II.
 [31] (2018) GraphVAE: towards generation of small graphs using variational autoencoders. In International Conference on Artificial Neural Networks, pp. 412–422. Cited by: §II, §IV-E, §V-A2.
 [32] (2018) A domain guided CNN architecture for predicting age from structural brain images. arXiv preprint arXiv:1808.04362. Cited by: §IV-B.
 [33] (2019) Graph to graph: a topology aware approach for graph structures learning and generation. In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 2946–2955. Cited by: §I, §II.
 [34] (2019) Scalable global alignment graph kernel using random features: from node embedding to graph embedding. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1418–1428. Cited by: §II.
 [35] (2018) Graph2Seq: graph to sequence learning with attention-based neural networks. arXiv preprint arXiv:1804.00823. Cited by: §I.
 [36] (2018) Exploiting rich syntactic information for semantic parsing with graph-to-sequence model. arXiv preprint arXiv:1808.07624. Cited by: §I.
 [37] (2018) SQL-to-text generation with graph-to-sequence model. arXiv preprint arXiv:1809.05255. Cited by: §I.
 [38] (2018) GraphRNN: generating realistic graphs with deep autoregressive models. arXiv preprint arXiv:1802.08773. Cited by: §II, §V-A2.
 [39] (2017) Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. arXiv preprint arXiv:1709.04875. Cited by: §I, §II, §V-A2.