Graph neural networks (GNNs) have shown state-of-the-art performance across a number of tasks with graph-structured data, such as social networks, molecule networks, and webpage graphs Kipf and Welling (2016); Hamilton et al. (2017); Ying et al. (2018); Xu et al. (2019); Duvenaud et al. (2015). GNNs use a recursive neighborhood aggregation scheme — in a GNN layer, each node aggregates its neighbors’ activations from the previous GNN layer and uses the aggregated value to update its own activations. The activations of the final GNN layer are used for prediction tasks, such as node classification, graph classification, or link prediction.
Due to the clustering nature of real-world graphs, different nodes in a graph may share a number of common neighbors. For example, in webpage graphs, different websites under the same domain generally have a number of common links (i.e., neighbors). As another example, in recommender systems, users in the same group may be interested in the same items.
However, existing GNN representations do not capture these common neighbors in real-world graphs, leading to redundant and unnecessary computation in both GNN training and inference. In particular, existing GNN representations define computation in each GNN layer with a GNN computation graph (referred to as a GNN-graph). For each node $v$ in the input graph, the GNN-graph includes an individual tree structure describing how to compute $v$'s activations by aggregating the previous-layer activations of $v$'s neighbors. Figure 1b shows the GNN-graph of the input graph in Figure 1a; for example, for the top node, its neighbors' activations from the $(k{-}1)$-th layer are aggregated to compute its new activations for the $k$-th layer (see the top portion of Figure 1b). The new activations of the other nodes are computed similarly using the previous-layer activations of their neighbors. Notice that this representation results in redundant computation and data transfers: in this small example, two neighbor pairs are each aggregated twice. In wider and multi-layer GNNs, the redundancies in existing GNN representations account for a significant fraction of all computation.
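To make the redundancy concrete, here is a minimal Python sketch (the graph and node names are hypothetical, not the paper's Figure 1): counting one binary aggregation per additional neighbor shows how reusing shared partial results removes repeated work.

```python
# Hypothetical in-neighbor lists (directed): A and B share in-neighbors {C, D},
# and C and D share in-neighbors {A, B}.
neighbors = {
    "A": {"B", "C", "D"},
    "B": {"C", "D"},
    "C": {"A", "B"},
    "D": {"A", "B"},
}

# A GNN-graph performs (degree - 1) binary aggregations per node:
# 2 + 1 + 1 + 1 = 5 in total.
gnn_graph_aggs = sum(len(ns) - 1 for ns in neighbors.values())

# Reusing the shared partial results agg(C, D) and agg(A, B) requires only
# 3 binary aggregations: agg(C, D), agg(A, B), and agg(agg(C, D), B) for A.
print(gnn_graph_aggs)  # 5 without reuse, versus 3 with reuse
```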
In this paper, we propose a new GNN representation called Hierarchically Aggregated computation Graphs (HAGs). Figure 1c shows one possible HAG for the input graph in Figure 1a. HAGs are functionally equivalent to standard GNN-graphs (produce the same output), but represent common neighbors across different nodes using aggregation hierarchies, which eliminates redundant computation and unnecessary data transfers in both GNN training and inference. In addition, a HAG is agnostic to any particular GNN model, and can be used to eliminate redundancy for arbitrary GNNs.
For a GNN-graph, there exist numerous equivalent HAGs with different aggregation hierarchies and runtime performance. Finding HAGs with optimized performance is challenging since the number of possible HAGs is exponential in the input graph size. We introduce an accurate cost function to quantitatively estimate the performance of different HAGs and develop a novel HAG search algorithm to automatically find optimized HAGs.
Theoretically, we prove that the search algorithm finds HAGs with strong performance guarantees: (1) for GNNs whose neighborhood aggregations require a specific ordering on a node's neighbors, the algorithm finds a globally optimal HAG under the cost function; and (2) for other GNNs, the algorithm finds HAGs whose runtime performance is at least a $(1 - 1/e)$-approximation of globally optimal HAGs, using the submodularity property Mossel and Roch (2007). Empirically, the algorithm finds highly optimized HAGs for real-world graphs, reducing the number of aggregations by up to 6.3×.
Our HAG abstraction maintains the predictive performance of GNNs but leads to much faster training and inference. We evaluate the performance of HAGs on five real-world datasets and along three dimensions: (a) end-to-end training and inference performance; (b) number of aggregations; and (c) size of data transfers. Experiments show that HAGs increase end-to-end training and inference performance by up to 2.8× and 2.9×, respectively. In addition, compared to GNN-graphs, HAGs reduce the number of aggregations and the size of data transfers by up to 6.3× and 5.6×, respectively.
To summarize, our contributions are:
We propose HAG, a new GNN graph representation to eliminate redundant computation and data transfers in GNNs.
We define a cost model to quantitatively evaluate the runtime performance of different HAGs and develop a HAG search algorithm to automatically find optimized HAGs. Theoretically, we prove that the HAG search algorithm finds at least a $(1 - 1/e)$-approximation of globally optimal HAGs under the cost model.
We show that HAGs significantly outperform GNN-graphs by increasing GNN training and inference performance by up to 2.8× and 2.9×, respectively, and reducing the aggregations and data transfers in GNN-graphs by up to 6.3× and 5.6×, respectively.
2 Related Work
Graph neural networks have been used to solve various real-world tasks with relational structures Kipf and Welling (2016); Hamilton et al. (2017); Ying et al. (2018); Xu et al. (2019); Duvenaud et al. (2015). FastGCN Chen et al. (2018) and SGC Wu et al. (2019) accelerate GNN training via importance sampling and by removing nonlinearities, respectively. This paper solves an orthogonal problem: how to optimize GNN efficiency while maintaining network accuracy. HAG is agnostic to any particular GNN model and provides a general approach that can be automatically applied to eliminate redundancy for arbitrary GNN models.
Join-trees are a tree decomposition technique that maps a graph into a corresponding tree structure to solve optimization problems on the graph, such as query optimization Flum et al. (2002). Although a join-tree provides a possible way to find optimal HAGs for a GNN-graph, its time complexity is exponential in the treewidth of the GNN-graph Arnborg et al. (1987), and real graphs tend to have very large treewidths. For example, Adcock et al. (2016) show that the treewidth of real-world social networks grows linearly with network size, making it infeasible to use join-trees to find optimal HAGs.
Computation reduction in neural networks.
Several techniques have been proposed to reduce computation in neural networks, including weight pruning Han et al. (2015) and quantization Han et al. (2016). These techniques reduce computation at the cost of modifying the network, resulting in decreased accuracy (as reported in these papers). By contrast, we propose a new GNN representation that accelerates GNN training by eliminating redundancy in GNN-graphs while maintaining the original network accuracy.
3 Hierarchically Aggregated Computation Graphs (HAGs)
Table 1: GNN models expressed in our abstraction: GCN Kipf and Welling (2016); GraphSAGE-P and GraphSAGE-LSTM Hamilton et al. (2017); and N-ary Tree-LSTM Tai et al. (2015). (The per-model Aggregate and Update definitions are omitted here.)
A GNN takes an input graph $\mathcal{G}$ and node features as inputs and iteratively learns representations for individual nodes over the entire graph through a number of GNN layers. Algorithm 1 shows an abstraction for GNNs: $h_v^{(k)}$ is the learned activations of node $v$ at layer $k$, and we initialize $h_v^{(0)}$ with the input node features $x_v$. At the $k$-th layer, $a_v^{(k)}$ denotes the aggregated activations of $v$'s neighbors, which is combined with $h_v^{(k-1)}$ to compute an updated activation $h_v^{(k)}$. The learned node activations of the final layer (i.e., $h_v^{(K)}$) are used for predictions, and a GNN model generally minimizes a loss function $\mathcal{L}$ that takes the final node activations as inputs (line 6).
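The abstraction in Algorithm 1 can be sketched in a few lines of Python; the `aggregate` and `update` callables and the dictionary-based graph encoding below are illustrative assumptions, not the paper's implementation.

```python
from typing import Any, Callable, Dict, Iterable, List

def gnn_forward(
    neighbors: Dict[Any, List[Any]],            # N(v): neighbors of each node
    features: Dict[Any, Any],                   # x_v: input node features
    aggregate: Callable[[Iterable[Any]], Any],  # neighborhood aggregation
    update: Callable[[Any, Any], Any],          # combines a_v^(k) and h_v^(k-1)
    num_layers: int,
) -> Dict[Any, Any]:
    """Sketch of Algorithm 1: initialize h_v^(0) = x_v; at layer k, aggregate
    the previous-layer activations of v's neighbors into a_v^(k), then update
    h_v^(k) from a_v^(k) and h_v^(k-1)."""
    h = dict(features)  # h_v^(0) = x_v
    for _ in range(num_layers):
        a = {v: aggregate([h[u] for u in ns]) for v, ns in neighbors.items()}
        h = {v: update(a[v], h[v]) for v in neighbors}
    return h  # final-layer activations h_v^(K), fed to the loss
```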
Existing GNN models use a GNN computation graph (GNN-graph) to describe the computation in each GNN layer, as shown in Figure 1b. For each node $v$ in the input graph, the GNN-graph includes an individual tree structure defining how to compute the activations of node $v$ by aggregating the previous-layer activations of $v$'s neighbors (i.e., $\mathcal{N}(v)$). GNN-graphs are efficient at expressing direct neighborhood relations between nodes, but are not capable of capturing common neighbors across multiple nodes, leading to redundant computation in GNN training and inference.
3.1 HAG Definition
We propose Hierarchically Aggregated computation Graphs (HAGs) for GNNs, which eliminate redundancy in GNN-graphs by hierarchically managing and reusing intermediate aggregation results. Compared to a GNN-graph, a HAG includes a new set of aggregation nodes, each of which represents the intermediate aggregation result for a subset of nodes (i.e., the aggregation over a subset of $\mathcal{V}$). Similar to edges in GNN-graphs, an edge $(u, v)$ in a HAG denotes an aggregation relation — computing $v$'s activations requires aggregating $u$'s activations.
Our HAG abstraction is general and applicable to many existing GNN models. Table 1 shows how to use our abstraction to define existing GNNs, which can be further divided into two categories.
Set Aggregate. Most GNNs assume the neighbors of a node have no ordering, and the aggregations are associative and commutative operations that are invariant to the order in which the aggregations are performed. Examples include GCN with summation aggregations and GraphSAGE-P with element-wise pooling aggregations (Table 1). Note that set aggregations in GNNs are designed to be order invariant and thus can be performed in a hierarchical fashion as we do in HAGs.
Sequential Aggregate. Another class of GNNs requires a specific ordering of a node's neighbors, and the aggregations are not commutative. Examples include N-ary Tree-LSTM Tai et al. (2015) and the LSTM variant of GraphSAGE Hamilton et al. (2017). However, HAGs can be applied in the case of sequential aggregations as well. Rather than identifying common subsets of neighbors, we identify common prefixes of the sequence of aggregated nodes, which can then be reused among nodes.
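The difference between the two categories determines what can be reused. The NumPy sketch below (toy aggregation functions, not from the paper) contrasts reusing a common neighbor subset under a set Aggregate with reusing a common neighbor prefix under a sequential Aggregate.

```python
import numpy as np

def set_agg(xs):
    """Set Aggregate: an elementwise sum (order-invariant, as in GCN)."""
    return np.sum(xs, axis=0)

def seq_agg(state, xs):
    """Toy sequential Aggregate: an order-sensitive fold standing in for an
    LSTM scan; only common prefixes of the neighbor ordering can be reused."""
    for x in xs:
        state = np.tanh(state + 2.0 * x)
    return state

h = {v: np.random.rand(4) for v in "BCDE"}

# Set Aggregate: the common subset {C, D} is aggregated once and reused.
shared = set_agg([h["C"], h["D"]])
a_v1 = set_agg([shared, h["B"]])   # neighbors {B, C, D}
a_v2 = shared                      # neighbors {C, D}

# Sequential Aggregate: the common prefix (C, D) is computed once and reused.
prefix = seq_agg(np.zeros(4), [h["C"], h["D"]])
a_v3 = seq_agg(prefix, [h["E"]])   # ordered neighbors (C, D, E)
a_v4 = prefix                      # ordered neighbors (C, D)
```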
We use $\mathcal{V}$ to denote the nodes in the input graph and $\mathcal{V}_A$ to denote the aggregation nodes added in a HAG. The standard GNN-graph representation can be considered a special case of the HAG representation with no intermediate aggregation nodes (i.e., $\mathcal{V}_A = \emptyset$). We further define two additional functions for each node:
First, $a_v$ is the aggregation result of node $v$:
$$a_v = \text{Aggregate}\big(\{a_u \mid u \in \mathcal{I}(v)\}\big), \quad \text{with } a_u = h_u \text{ for } u \in \mathcal{V},$$
where $\mathcal{I}(v)$ denotes the in-neighbors of node $v$ in a HAG. Note that $a_v$ is recursively defined, and there exists a sequential ordering to evaluate $a_v$ for all nodes since each HAG is acyclic.
Second, we use $\text{cover}(v)$ to describe how to compute $a_v$ using the input activations from the previous layer; $\text{cover}(v)$ defines the coverage of node $v$ in a HAG. For the HAG example in Figure 1c, the cover of a node consists of exactly the input-graph nodes whose activations are used as inputs to compute its aggregation result.
For a set Aggregate, $\text{cover}(v)$ is an unordered set:
$$\text{cover}(v) = \bigcup_{u \in \mathcal{I}(v)} \text{cover}(u), \quad \text{with } \text{cover}(u) = \{u\} \text{ for } u \in \mathcal{V}.$$
For a sequential Aggregate, $\text{cover}(v)$ is an ordered list:
$$\text{cover}(v) = \text{cover}(u_1) \oplus \text{cover}(u_2) \oplus \cdots \oplus \text{cover}(u_m),$$
where $u_1, \dots, u_m$ are the ordered in-neighbors of $v$ and $\oplus$ denotes list concatenation.
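As a sketch of these definitions (the in-neighbor-list encoding of a HAG is an assumption for illustration), $\text{cover}$ can be computed recursively over the acyclic HAG:

```python
from functools import lru_cache
from typing import Dict, List, Tuple, Union

def make_cover(in_neighbors: Dict[str, List[str]],
               graph_nodes: frozenset,
               sequential: bool):
    """Return cover(v): the input-graph nodes whose previous-layer activations
    are used to compute a_v -- an unordered set for a set Aggregate, an
    ordered tuple for a sequential Aggregate."""
    @lru_cache(maxsize=None)
    def cover(v: str) -> Union[frozenset, Tuple[str, ...]]:
        if v in graph_nodes:                  # base case: an input node covers itself
            return (v,) if sequential else frozenset({v})
        parts = [cover(u) for u in in_neighbors[v]]
        if sequential:                        # concatenate ordered in-neighbor covers
            return tuple(x for p in parts for x in p)
        return frozenset().union(*parts)      # union of in-neighbor covers
    return cover

# Usage: cover = make_cover(hag_in_neighbors, frozenset("ABCD"), sequential=False)
```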
3.2 GNNs with HAGs
Existing GNNs are defined with GNN-graphs, as shown in Algorithm 1. We extend the GNN abstraction in Algorithm 2 to make it also applicable to HAGs. The extension does not require any modification to a GNN model; the only difference is how the neighborhood aggregations (i.e., $a_v^{(k)}$) are computed in each GNN layer. In Algorithm 2, we first compute the results of the intermediate aggregation nodes and save them in $a_v$ (lines 5–6). We then compute the neighborhood aggregations (i.e., $a_v^{(k)}$) for the nodes in the input graph using these intermediate aggregation results.
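A sketch of the aggregation step of Algorithm 2 follows (the topological-order and dictionary encodings are assumptions for illustration): intermediate aggregation nodes are evaluated before the nodes that consume them.

```python
def hag_layer_aggregate(topo_order, in_neighbors, graph_nodes, h, aggregate):
    """Compute a_v for every node with in-neighbors, visiting nodes in a
    topological order of the (acyclic) HAG so that each intermediate
    aggregation node is evaluated exactly once before it is consumed."""
    a = {}
    for v in topo_order:
        ins = in_neighbors.get(v, [])
        if not ins:
            continue                          # leaf: only supplies h[v] as input
        # Graph nodes contribute previous-layer activations h[u]; aggregation
        # nodes contribute their cached partial results a[u].
        a[v] = aggregate([h[u] if u in graph_nodes else a[u] for u in ins])
    return {v: a[v] for v in graph_nodes if v in a}   # a_v^(k) for graph nodes
```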
Although Algorithm 2 introduces new intermediate variables $a_v$ for the aggregation nodes, the memory overhead of storing them is negligible: $a_v$ is not used for back propagation and can be kept in a constant amount of memory shared across all GNN layers. In the experiments, we show HAGs can increase training throughput by 2.8× at the cost of 0.1% memory overhead.
We define a GNN-graph and a HAG to be equivalent for a GNN model if (1) the GNN model outputs the same activations (i.e., $h_v^{(k)}$) at each GNN layer, and (2) the GNN model computes the same gradients for all trainable parameters in back propagation. We can use equivalent graphs interchangeably for both inference and training, since equivalent graphs produce the same outputs and gradients by definition. Theorem 1 provides a necessary and sufficient condition for graph equivalence: a GNN-graph and a HAG are equivalent if and only if $\text{cover}(v) = \mathcal{N}(v)$ for all $v \in \mathcal{V}$. We prove the theorem in the Appendix.
Equivalent graphs achieve the same model accuracy but have different runtime performance. Theorem 1 provides an efficient way to check equivalence between GNN-graphs and HAGs, and can be used as an oracle to search for optimized HAGs for any GNN-graph.
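As a sketch of such an oracle (reusing the hypothetical `make_cover` helper from the sketch in Section 3.1), the check compares each node's cover against its neighbor list:

```python
def is_equivalent(gnn_neighbors, hag_in_neighbors, sequential=False):
    """Equivalence check per Theorem 1 (sketch): the HAG is equivalent to the
    GNN-graph iff cover(v) equals N(v) for every input-graph node v."""
    graph_nodes = frozenset(gnn_neighbors)
    cover = make_cover(hag_in_neighbors, graph_nodes, sequential)
    for v, ns in gnn_neighbors.items():
        expected = tuple(ns) if sequential else frozenset(ns)
        if cover(v) != expected:
            return False
    return True
```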
4 HAG Search Algorithm
For an arbitrary GNN model and an input GNN-graph, our goal is to find an equivalent HAG with optimized runtime performance. We define a realistic cost function to quantitatively evaluate the runtime performance of arbitrary HAGs, and introduce a HAG search algorithm that automatically finds an optimized HAG with the following theoretical guarantees:
For GNNs with sequential Aggregate, the HAG search algorithm can find globally optimal HAGs under the cost function.
For GNNs with set Aggregate, finding an optimal HAG is NP-hard, by a reduction from the maximum coverage problem (see the Appendix for the proof). The search algorithm finds at least a $(1 - 1/e)$-approximation of globally optimal HAGs based on the submodularity property Mossel and Roch (2007).
4.1 Cost Function
We introduce a realistic cost function that quantitatively evaluates the runtime performance of a HAG by measuring the computation cost of performing one epoch of GNN training on the HAG.
The computation cost of a GNN model includes aggregating the neighbors of each node by calling Aggregate and updating the activations of each node via Update, as shown in Algorithm 2. For a GNN model $\mathcal{M}$, we assume the cost of performing an Aggregate on two elements is $\alpha$ and the cost of computing an Update is $\beta$. In Algorithm 2, computing $a_v$ for a node with $|\mathcal{I}(v)|$ in-neighbors requires performing $|\mathcal{I}(v)| - 1$ binary aggregations, whose cost is $(|\mathcal{I}(v)| - 1)\,\alpha$. Therefore, the total computation cost of training a GNN model $\mathcal{M}$ on a HAG $\hat{\mathcal{G}} = (\mathcal{V} \cup \mathcal{V}_A, \hat{\mathcal{E}})$ is
$$\mathrm{cost}(\mathcal{M}, \hat{\mathcal{G}}) = \alpha \sum_{v \,:\, \mathcal{I}(v) \neq \emptyset} \big(|\mathcal{I}(v)| - 1\big) \;+\; \beta\,|\mathcal{V}|.$$
Since the update cost $\beta\,|\mathcal{V}|$ is determined by the input graph, our goal is to minimize the aggregation cost $\alpha \sum_{v} (|\mathcal{I}(v)| - 1)$ as much as possible.
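A minimal sketch of this cost computation, assuming the in-neighbor-list encoding from the earlier sketches (with $\alpha$ and $\beta$ as configurable constants):

```python
def hag_cost(in_neighbors, num_graph_nodes, alpha=1.0, beta=1.0):
    """Cost-model sketch: a node with d >= 1 in-neighbors performs d - 1
    binary aggregations (alpha each); every graph node performs one Update
    (beta). Minimizing cost therefore means minimizing the total in-degree
    minus the number of aggregating nodes."""
    num_aggs = sum(len(ins) - 1 for ins in in_neighbors.values() if ins)
    return alpha * num_aggs + beta * num_graph_nodes
```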
4.2 Search Algorithm
We present a HAG search algorithm that finds a globally optimal HAG for GNNs with sequential Aggregate and a $(1 - 1/e)$-approximation of globally optimal HAGs for GNNs with set Aggregate. In addition to an input GNN-graph and a GNN model, the algorithm takes a hyperparameter capacity, which defines an upper limit on the number of intermediate aggregation nodes (i.e., $|\mathcal{V}_A| \le \textit{capacity}$).
Algorithm 3 shows the pseudocode of the HAG search algorithm. We start with an input GNN-graph, and iteratively insert aggregation nodes into the current HAG to merge highly redundant aggregations and remove unnecessary computation and data transfers.
In each iteration, we find the binary aggregation with the highest redundancy and insert a new aggregation node $w$ into $\mathcal{V}_A$ to represent the result of this binary aggregation (lines 12–15). All nodes containing this binary aggregation can then directly use the output of $w$ without recomputing the aggregation (lines 16–18). The HAG search algorithm thus iteratively reduces the computation cost of the HAG by eliminating the most redundant binary aggregation in each iteration.
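A compact Python sketch of this greedy loop for set Aggregates follows; for clarity it recounts pair redundancy from scratch in every iteration, whereas Algorithm 3 maintains a heap (analyzed in the Appendix). The node-name scheme and encoding are assumptions.

```python
from collections import defaultdict
from itertools import combinations

def hag_search(in_neighbors, capacity):
    """Greedy sketch of the HAG search for set Aggregates: repeatedly pick the
    most redundant binary aggregation (u1, u2), materialize it as a new
    aggregation node, and reroute every consumer through that node."""
    in_neighbors = {v: set(ins) for v, ins in in_neighbors.items()}
    for i in range(capacity):
        counts = defaultdict(int)
        for ins in in_neighbors.values():      # count how often each pair co-occurs
            for pair in combinations(sorted(ins), 2):
                counts[pair] += 1
        if not counts:
            break
        (u1, u2), redundancy = max(counts.items(), key=lambda kv: kv[1])
        if redundancy < 2:
            break                               # no binary aggregation is reused
        w = f"agg{i}"                           # new intermediate aggregation node
        in_neighbors[w] = {u1, u2}
        for v, ins in in_neighbors.items():
            if v != w and u1 in ins and u2 in ins:
                ins -= {u1, u2}                 # consume w's cached result instead
                ins.add(w)
    return in_neighbors
```

Each merge with redundancy at least 2 saves at least one binary aggregation under the cost model above, so every iteration strictly reduces the HAG's cost.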
Theorem 2. For any GNN-graph and any GNN model with a sequential Aggregate, Algorithm 3 returns an equivalent HAG with globally minimal cost, provided the capacity is sufficiently large.
The overall time complexity of Algorithm 3 is $O\big((|\mathcal{E}| + \textit{capacity}) \log|\hat{\mathcal{V}}| + \textit{capacity} \cdot |\hat{\mathcal{V}}|\big)$ (see the Appendix for the analysis).
5 Experiments

Our HAG abstraction maintains the predictive performance of GNNs while delivering much faster training and inference. This section evaluates the runtime performance of HAGs on five real-world graph datasets along three dimensions: (a) end-to-end training and inference performance; (b) number of aggregations; and (c) size of data transfers.
5.1 Implementation

Existing frameworks such as TensorFlow Abadi et al. (2016) and PyTorch Pyt (2017) are designed for spatial data structures (e.g., images and text) and have limited support for irregular data structures such as graphs. As a result, GNN models in existing frameworks translate graph structures into sparse adjacency matrices and use matrix operations to perform GNN training.
We implemented the following operations in TensorFlow r1.13 to support GNN training with HAGs. First, graph_to_hag automatically transforms an input GNN-graph to an equivalent HAG with optimized performance. Second, hag_aggregate takes a HAG and nodes’ activations as inputs, and computes the aggregated activations of all nodes. Finally, hag_aggregate_grad computes the gradients of hag_aggregate for back propagation.
Our implementation minimizes changes to existing GNN programs: a GNN application can directly use all HAG optimizations by only modifying a few lines of code.
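To illustrate how little a GNN program changes, here is a hedged before/after sketch built around the three ops named above; the signatures are hypothetical (the text does not specify the actual interfaces), and the placeholder stands in for the real custom kernel.

```python
import tensorflow as tf  # the custom ops below target TF r1.13

def hag_aggregate(hag, h):
    """Placeholder for the custom op described above; the real op computes all
    neighborhood aggregations over the HAG, with hag_aggregate_grad registered
    as its gradient for back propagation."""
    raise NotImplementedError("provided by the custom TF kernel")

def gcn_layer_gnn_graph(adj, h, w):
    # Before: aggregate neighbors via a sparse adjacency matmul (GNN-graph).
    a = tf.sparse_tensor_dense_matmul(adj, h)
    return tf.nn.relu(tf.matmul(a, w))

def gcn_layer_hag(hag, h, w):
    # After: only the aggregation call changes; the rest of the GNN program,
    # including the training loop, is untouched. The HAG itself is built once
    # offline via graph_to_hag.
    a = hag_aggregate(hag, h)
    return tf.nn.relu(tf.matmul(a, w))
```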
5.2 Experimental Setup
| Name | # Nodes | # Edges |
| --- | --- | --- |
| BZR Kriege and Mutzel (2012) | 6,519 | 137,734 |
| PPI Zitnik and Leskovec (2017) | 56,944 | 1,612,348 |
| REDDIT Hamilton et al. (2017) | 232,965 | 57,307,946 |
| IMDB Yanardag and Vishwanathan (2015) | 19,502 | 197,806 |
| COLLAB Yanardag and Vishwanathan (2015) | 372,474 | 12,288,900 |
Table 2 summarizes the public datasets used in our experiments. BZR is a chemical compound dataset, where each node is an atom and each edge is a chemical bond between two atoms Kriege and Mutzel (2012). PPI contains a number of protein-protein interaction graphs, each of which corresponds to a different human tissue Zitnik and Leskovec (2017). REDDIT is an online discussion forum dataset, with each node being a Reddit post and each edge a commenting relation between posts. For both PPI and REDDIT, we directly use preprocessed data from Hamilton et al. (2017). IMDB and COLLAB are two collaboration datasets for graph classification Yanardag and Vishwanathan (2015). IMDB is a movie collaboration dataset, with each node representing an actor/actress, while COLLAB is a scientific collaboration dataset, with each node representing a researcher.
In all experiments, each GNN model has two GNN layers and one SoftMax layer. For graph classification datasets, each GNN model also includes a mean-pooling layer to gather graph-level activations. For all experiments, we set the maximum capacity of $\mathcal{V}_A$ in a HAG to a fixed value that achieves high performance on real-world graphs (Section 5.5 studies the impact of capacity).
5.3 End-to-End Performance
We first measure the per-epoch training time and inference latency to run a 2-layer GCN model on different graph datasets. We follow previous work Hamilton et al. (2017); Kriege and Mutzel (2012); Yanardag and Vishwanathan (2015) to split the datasets into training/validation/testing sets, and use the testing sets to measure the inference latency.
Figure 2 compares the per-epoch training time and inference latency between GNN-graphs and HAGs. Compared to GNN-graphs, HAGs improve training and inference performance by up to 2.8× and 2.9×, respectively, while maintaining the same network accuracy. We note that this improvement is achieved completely automatically, and computing a HAG is inexpensive. Because the improvement is essentially free, we believe there is no reason not to prefer HAGs over GNN-graphs.
5.4 Aggregation Performance
We further compare the aggregation performance of GNN-graphs and HAGs on the following two metrics: (1) the number of binary aggregations performed in each GNN layer; and (2) the size of data transfers between GPU threads to perform the aggregations. Note that aggregating a neighbor’s activations requires transferring the activations from GPU global memory to a thread’s local memory.
Figure 3 shows the comparison results. For GNNs with set aggregations, HAGs reduce the number of aggregations by 1.5–6.3× and the size of data transfers by 1.3–5.6×. For GNNs with sequential aggregations, HAGs reduce aggregations and data transfers by up to 1.8× and 1.9×, respectively.
Although the search algorithm finds a globally optimal HAG for sequential aggregations (Theorem 2) but only a $(1 - 1/e)$-approximation of globally optimal HAGs for set aggregations (Theorem 3), we observe that the improvement is more significant for set aggregations. Because set aggregations are permutation invariant, they expose more potential redundancy than sequential aggregations, so HAGs achieve larger speedups for set aggregations even though optimal solutions are harder to compute.
It is also worth noting that the HAG search algorithm can find highly optimized HAGs even on very sparse graphs. For example, on the COLLAB dataset, with a graph density of 0.01%, our algorithm reduces the number of aggregations and data transfers by 3.3× and 2.2×, respectively.
5.5 Impact of Capacity

We study how different values of capacity affect the runtime performance of the generated HAGs. Recall that capacity is an upper bound on the number of aggregation nodes in a HAG. In our HAG search algorithm, a larger capacity allows the algorithm to eliminate more redundant aggregations and therefore achieve a lower cost.
Figure 4 shows that a larger capacity consistently improves end-to-end training performance, which indicates that the cost function is an appropriate metric for evaluating and comparing the performance of different HAGs.
By gradually increasing the capacity, the search algorithm eventually finds a HAG with 150K aggregation nodes, which consumes 6MB of memory (0.1% memory overhead) while improving training performance by 2.8×.
6 Conclusion

We have introduced HAG, a new GNN graph representation that eliminates redundant computation and data transfers in GNNs. We propose a cost function to quantitatively evaluate the runtime performance of different HAGs and use a HAG search algorithm to find optimized HAGs. Our experiments show that HAGs significantly outperform existing GNN-graphs by improving end-to-end training performance and reducing the aggregations and data transfers in GNN training.
- Kipf and Welling  Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
- Hamilton et al.  Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, 2017.
- Ying et al.  Zhitao Ying, Jiaxuan You, Christopher Morris, Xiang Ren, Will Hamilton, and Jure Leskovec. Hierarchical graph representation learning with differentiable pooling. In Advances in Neural Information Processing Systems, 2018.
- Xu et al.  Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In International Conference on Learning Representations, 2019.
- Duvenaud et al.  David Duvenaud, Dougal Maclaurin, Jorge Aguilera-Iparraguirre, Rafael Gómez-Bombarelli, Timothy Hirzel, Alán Aspuru-Guzik, and Ryan P. Adams. Convolutional networks on graphs for learning molecular fingerprints. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2, NIPS’15, 2015.
- Mossel and Roch  Elchanan Mossel and Sebastien Roch. On the submodularity of influence in social networks. In Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, pages 128–134. ACM, 2007.
- Chen et al.  Jie Chen, Tengfei Ma, and Cao Xiao. FastGCN: Fast learning with graph convolutional networks via importance sampling. CoRR, abs/1801.10247, 2018. URL http://arxiv.org/abs/1801.10247.
- Wu et al.  Felix Wu, Tianyi Zhang, Amauri H. Souza Jr., Christopher Fifty, Tao Yu, and Kilian Q. Weinberger. Simplifying graph convolutional networks. CoRR, abs/1902.07153, 2019. URL http://arxiv.org/abs/1902.07153.
- Flum et al.  Jörg Flum, Markus Frick, and Martin Grohe. Query evaluation via tree-decompositions. J. ACM, 2002.
- Arnborg et al.  Stefan Arnborg, Derek G. Corneil, and Andrzej Proskurowski. Complexity of finding embeddings in a k-tree. SIAM J. Algebraic Discrete Methods, 1987.
- Adcock et al.  Aaron B Adcock, Blair D Sullivan, and Michael W Mahoney. Tree decompositions and social graphs. Internet Mathematics, 12(5), 2016.
- Han et al.  Song Han, Jeff Pool, John Tran, and William J. Dally. Learning both weights and connections for efficient neural networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems, NIPS, 2015.
- Han et al.  Song Han, Huizi Mao, and William J. Dally. Deep compression: Compressing deep neural network with pruning, trained quantization and huffman coding. CoRR, 2016.
- Tai et al.  Kai Sheng Tai, Richard Socher, and Christopher D Manning. Improved semantic representations from tree-structured long short-term memory networks. arXiv preprint arXiv:1503.00075, 2015.
- Abadi et al.  Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, OSDI, 2016.
- Pyt  PyTorch: Tensors and dynamic neural networks in Python with strong GPU acceleration. https://pytorch.org, 2017.
- Kriege and Mutzel  Nils Kriege and Petra Mutzel. Subgraph matching kernels for attributed graphs. arXiv preprint arXiv:1206.6483, 2012.
- Zitnik and Leskovec  Marinka Zitnik and Jure Leskovec. Predicting multicellular function through multi-layer tissue networks. CoRR, abs/1707.04638, 2017.
- Yanardag and Vishwanathan  Pinar Yanardag and S.V.N. Vishwanathan. Deep graph kernels. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’15, 2015.
- Cormen et al.  Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Third Edition. The MIT Press, 3rd edition, 2009.
Appendix A Proof of Theorem 1
It is sufficient to prove that if $\text{cover}(v) = \mathcal{N}(v)$ for all $v \in \mathcal{V}$, then the GNN-graph and the HAG generate the same outputs (i.e., the same $h_v^{(k)}$) for every GNN layer.
We prove this by induction. Assume the GNN-graph and the HAG generate the same outputs for the $(k{-}1)$-th layer; we prove that the two graphs produce the same outputs for the $k$-th GNN layer.
In Algorithm 2, $a_v^{(k)}$ is the aggregation result of node $v$, which is defined as
$$a_v^{(k)} = \text{Aggregate}\big(\{h_u^{(k-1)} \mid u \in \text{cover}(v)\}\big) = \text{Aggregate}\big(\{h_u^{(k-1)} \mid u \in \mathcal{N}(v)\}\big),$$
where the second equality uses $\text{cover}(v) = \mathcal{N}(v)$ together with the induction hypothesis that both graphs compute the same $h_u^{(k-1)}$. This proves that Algorithm 1 and Algorithm 2 compute the same $a_v^{(k)}$. In addition, both algorithms use the same Update function, which takes $a_v^{(k)}$ and $h_v^{(k-1)}$ as inputs and computes $h_v^{(k)}$; this implies that the two algorithms compute the same $h_v^{(k)}$. ∎
Appendix B Proof of Theorem 2
Sequential aggregations require a specific ordering of a node's neighbors. Let $\mathcal{N}(v) = (u_1, u_2, \dots, u_{d_v})$ denote the ordered list of node $v$'s neighbors, and let $P_i(v) = (u_1, \dots, u_i)$ denote the list of the first $i$ elements of $\mathcal{N}(v)$, where $u_j$ is the $j$-th neighbor of node $v$.
Every $P_i(v)$ with $i \ge 2$ represents a necessary intermediate aggregation step for computing $a_v$ (since sequential aggregations are not commutative), and therefore any equivalent HAG must compute the aggregation of $P_i(v)$ as an intermediate result. Counting the number of distinct $P_i(v)$ (where $v \in \mathcal{V}$ and $2 \le i \le d_v$) thus provides a lower bound on the number of aggregations any equivalent HAG must perform. Assuming $\hat{\mathcal{G}}_o$ is a globally optimal HAG under the cost model, we have
$$\mathrm{cost}(\mathcal{M}, \hat{\mathcal{G}}_o) \ge \alpha \cdot \mathrm{lb} + \beta\,|\mathcal{V}|,$$
where lb is the number of distinct $P_i(v)$ that must be computed by any equivalent HAG.
Assuming $\hat{\mathcal{G}}$ is the output HAG of Algorithm 3, we prove that $\mathrm{cost}(\mathcal{M}, \hat{\mathcal{G}}) = \alpha \cdot \mathrm{lb} + \beta\,|\mathcal{V}|$ by contradiction. If $\mathrm{cost}(\mathcal{M}, \hat{\mathcal{G}}) > \alpha \cdot \mathrm{lb} + \beta\,|\mathcal{V}|$, then $\hat{\mathcal{G}}$ must perform more than lb aggregations.
Case 1. One possible case is that $\hat{\mathcal{G}}$ computes at least one aggregation that is not a prefix of any $\mathcal{N}(v)$, indicating that $\hat{\mathcal{G}}$ performs some useless aggregations. This contradicts the fact that all intermediate aggregations added to $\hat{\mathcal{G}}$ must be used at least once.
Case 2. The other possible case is that $\hat{\mathcal{G}}$ computes the aggregation of some $P_i(v)$ multiple times. However, in Algorithm 3, each iteration reduces the number of aggregations by at least 1, and the number of aggregations in the initial GNN-graph is finite. This implies there cannot be redundant aggregations left when the algorithm terminates, which contradicts the precondition of Case 2. ∎
Appendix C Proof of Theorem 3
The idea of the proof is to build a monotone submodular function Cormen et al.  based on the cost model.
For any GNN-graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ and an equivalent HAG $\hat{\mathcal{G}} = (\mathcal{V} \cup \mathcal{V}_A, \hat{\mathcal{E}})$, we define
$$f(\hat{\mathcal{G}}) = |\mathcal{E}| - |\hat{\mathcal{E}}| + |\mathcal{V}_A|,$$
where $\mathcal{V}_A$ is the set of aggregation nodes in $\hat{\mathcal{G}}$, and $\mathcal{E}$ and $\hat{\mathcal{E}}$ are the sets of edges in $\mathcal{G}$ and $\hat{\mathcal{G}}$, respectively. $f(\hat{\mathcal{G}})$ measures the number of aggregations that can be saved by using $\hat{\mathcal{G}}$ for GNN training.
We begin by defining the subset relation between different HAGs. For two HAGs $\hat{\mathcal{G}}_1$ and $\hat{\mathcal{G}}_2$, we define $\hat{\mathcal{G}}_1 \preceq \hat{\mathcal{G}}_2$ iff $\mathcal{V}_A^{(1)}$ is a subset of $\mathcal{V}_A^{(2)}$, where $\mathcal{V}_A^{(1)}$ and $\mathcal{V}_A^{(2)}$ are the aggregation nodes in $\hat{\mathcal{G}}_1$ and $\hat{\mathcal{G}}_2$, respectively.
Prove that $f$ is monotone. We show that $f(\hat{\mathcal{G}}_1) \le f(\hat{\mathcal{G}}_2)$ for all $\hat{\mathcal{G}}_1 \preceq \hat{\mathcal{G}}_2$. This is true since $\hat{\mathcal{G}}_1 \preceq \hat{\mathcal{G}}_2$ indicates that $\hat{\mathcal{G}}_2$ contains all aggregation nodes in $\hat{\mathcal{G}}_1$, which implies that $\hat{\mathcal{G}}_2$ can save at least as many aggregations as $\hat{\mathcal{G}}_1$.
Prove that $f$ is submodular. We show that $f(\hat{\mathcal{G}}_1 + w) - f(\hat{\mathcal{G}}_1) \ge f(\hat{\mathcal{G}}_2 + w) - f(\hat{\mathcal{G}}_2)$ for all $\hat{\mathcal{G}}_1 \preceq \hat{\mathcal{G}}_2$ and any aggregation node $w$. This inequality holds because the marginal gain measures the number of aggregations we can further save by adding the aggregation node $w$ to the existing HAG, which monotonically decreases as we add more aggregation nodes to the HAG.
Let $\hat{\mathcal{G}}_i$ denote the result HAG after the $i$-th iteration of Algorithm 3; $\hat{\mathcal{G}}_i$ includes exactly $i$ aggregation nodes. Let $\hat{\mathcal{G}}_o$ denote the optimal HAG under the cost model with $k = \textit{capacity}$ aggregation nodes. We claim via induction that for $0 \le i \le k$,
$$f(\hat{\mathcal{G}}_o) - f(\hat{\mathcal{G}}_i) \le \Big(1 - \frac{1}{k}\Big)^{i} f(\hat{\mathcal{G}}_o).$$
The base case ($i = 0$) is trivially true. In the $i$-th step, Algorithm 3 selects an aggregation node $w_i$ by maximizing the marginal gain $f(\hat{\mathcal{G}}_{i-1} + w) - f(\hat{\mathcal{G}}_{i-1})$. Observe that the aggregation nodes of $\hat{\mathcal{G}}_o$ not yet in $\hat{\mathcal{G}}_{i-1}$ form a set of at most $k$ elements. Submodularity implies that
$$f(\hat{\mathcal{G}}_o) - f(\hat{\mathcal{G}}_{i-1}) \le \sum_{w \in \hat{\mathcal{G}}_o \setminus \hat{\mathcal{G}}_{i-1}} \big(f(\hat{\mathcal{G}}_{i-1} + w) - f(\hat{\mathcal{G}}_{i-1})\big),$$
and this implies that the selected aggregation node $w_i$ has marginal value
$$f(\hat{\mathcal{G}}_{i-1} + w_i) - f(\hat{\mathcal{G}}_{i-1}) \ge \frac{1}{k}\big(f(\hat{\mathcal{G}}_o) - f(\hat{\mathcal{G}}_{i-1})\big).$$
Rearranging gives $f(\hat{\mathcal{G}}_o) - f(\hat{\mathcal{G}}_i) \le (1 - 1/k)\big(f(\hat{\mathcal{G}}_o) - f(\hat{\mathcal{G}}_{i-1})\big)$, which completes the induction. Taking $i = k$ yields $f(\hat{\mathcal{G}}_k) \ge \big(1 - (1 - 1/k)^k\big)\,f(\hat{\mathcal{G}}_o) \ge (1 - 1/e)\,f(\hat{\mathcal{G}}_o)$. ∎
Appendix D Time Complexity of Algorithm 3
The overall time complexity of Algorithm 3 is $O\big((|\mathcal{E}| + \textit{capacity}) \log|\hat{\mathcal{V}}| + \textit{capacity} \cdot |\hat{\mathcal{V}}|\big)$.
We use a heap to maintain the redundancy score of each potential node pair and only update the heap when we add or remove edges in $\hat{\mathcal{E}}$. Since there can be at most $|\hat{\mathcal{V}}|^2$ node pairs, the depth of the heap is $O(\log|\hat{\mathcal{V}}|)$, so querying the most redundant binary aggregation and applying each update takes $O(\log|\hat{\mathcal{V}}|)$ time.
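A sketch of such a heap with lazy invalidation follows (the bookkeeping scheme is an assumption; the text only states that the heap is updated on edge changes):

```python
import heapq

class RedundancyHeap:
    """Max-heap over node pairs keyed by redundancy score, with lazy
    invalidation: edge insertions/removals push fresh entries in
    O(log #pairs) time, and stale entries are discarded when popped."""
    def __init__(self):
        self.count = {}   # current redundancy score of each node pair
        self.heap = []    # entries (-score, pair); may contain stale scores

    def update(self, pair, delta):
        self.count[pair] = self.count.get(pair, 0) + delta
        heapq.heappush(self.heap, (-self.count[pair], pair))

    def pop_max(self):
        while self.heap:
            neg_score, pair = heapq.heappop(self.heap)
            if self.count.get(pair, 0) == -neg_score:   # skip stale entries
                return pair, -neg_score
        return None
```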
First, we calculate the number of queries and updates to the heap structure:
The algorithm iteratively pulls the most redundant binary aggregation from the heap and adds a corresponding node to $\mathcal{V}_A$. Since the number of aggregation nodes in $\mathcal{V}_A$ is bounded by capacity, the total number of queries is $O(\textit{capacity})$.
The algorithm inserts two new edges into $\hat{\mathcal{E}}$ in line 16 and removes one edge from $\hat{\mathcal{E}}$ in line 19. Since line 16 can be invoked at most capacity times, at most $2\,\textit{capacity}$ edges are ever inserted, so the total number of invocations of line 19 is at most $|\mathcal{E}| + 2\,\textit{capacity}$. Therefore, the overall number of heap updates is $O(|\mathcal{E}| + \textit{capacity})$.
Second, the enumeration over all vertices in $\hat{\mathcal{V}}$ (line 17) takes $O(|\hat{\mathcal{V}}|)$ time per iteration, or $O(\textit{capacity} \cdot |\hat{\mathcal{V}}|)$ in total. Therefore, the overall time complexity of Algorithm 3 is
$$O\big((|\mathcal{E}| + \textit{capacity}) \log|\hat{\mathcal{V}}| + \textit{capacity} \cdot |\hat{\mathcal{V}}|\big).$$