Parallel Peeling of Bipartite Networks for Hierarchical Dense Subgraph Discovery

10/24/2021 ∙ by Kartik Lakhotia, et al. ∙ University of Southern California

Wing and tip decomposition construct a hierarchy of butterfly-dense edge- and vertex-induced bipartite subgraphs, respectively. They have applications in several domains including e-commerce, recommendation systems and document analysis. Existing decomposition algorithms use a bottom-up approach that constructs the hierarchy in an increasing order of subgraph density. They iteratively peel the entities with minimum butterfly count, i.e., remove them from the graph and update the butterfly count of other entities. However, the number of butterflies in large bipartite graphs makes bottom-up peeling computationally demanding. Furthermore, the strict order of peeling entities results in numerous sequentially dependent iterations, which makes parallelization challenging. In this paper, we propose a novel Parallel Bipartite Network peelinG (PBNG) framework which adopts a two-phased peeling approach to relax the order of peeling and, in turn, reduce synchronization. The first phase divides the decomposition hierarchy into several partitions using very few peeling iterations. The second phase concurrently processes these partitions to generate the final hierarchy, and requires no global synchronization. The two-phased peeling further enables batching optimizations that dramatically improve computational efficiency. We empirically evaluate PBNG using several real-world bipartite graphs. Compared to state-of-the-art frameworks and decomposition algorithms, PBNG achieves up to four orders of magnitude reduction in synchronization and two orders of magnitude speedup, respectively. We also present the first decomposition results of some of the largest public real-world datasets, which PBNG can peel in a few minutes but existing algorithms fail to process even in several days.




1. Introduction

A bipartite graph is a special graph whose vertices can be partitioned into two disjoint sets such that every edge connects a vertex from one set with a vertex from the other. Several real-world systems naturally exhibit bipartite relationships, such as the consumer-product purchase network of an e-commerce website (consumerProduct), user-ratings data in a recommendation system (he2016ups; lim2010detecting), the author-paper network of a scientific field (authorPaper), group memberships in a social network (orkut), etc. Due to the rapid growth of data produced in these domains, efficient mining of dense structures in bipartite graphs has become a popular research topic (wangButterfly; wangBitruss; zouBitruss; sariyucePeeling; shiParbutterfly; lakhotia2020receipt).

Nucleus decomposition is commonly used to mine hierarchical dense subgraphs, where the minimum clique participation of an edge in a subgraph determines its level in the hierarchy (sariyuce2015finding). Truss decomposition is arguably the most popular case of nucleus decomposition, which uses triangles (3-cliques) to measure subgraph density (spamDet; graphChallenge; trussVLDB; sariyuce2016fast; bonchi2019distance; wen2018efficient). However, truss decomposition is not directly applicable to bipartite graphs as they do not have triangles. One way to circumvent this issue is to compute a unipartite projection of the bipartite graph, which contains an edge between each pair of vertices on one side that have common neighbor(s) on the other side. But this approach suffers from (a) information loss which can impact the quality of results, and (b) explosion in dataset size which can restrict its scalability (sariyucePeeling).
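To make the projection pitfall concrete, the following minimal Python sketch (our own illustration; the adjacency-dict representation and function name are not from the paper) builds a unipartite projection of one vertex side:

```python
from itertools import combinations

def unipartite_projection(adj_u):
    """Project a bipartite graph onto one side: connect two vertices of
    that side iff they share at least one neighbor on the other side.
    adj_u maps each vertex to its set of opposite-side neighbors.
    Illustrative sketch, not the paper's code."""
    proj = set()
    for u1, u2 in combinations(sorted(adj_u), 2):
        if adj_u[u1] & adj_u[u2]:       # common neighbor(s) exist
            proj.add((u1, u2))
    return proj
```

Note that a single opposite-side vertex of degree d alone induces d(d-1)/2 projected edges, which illustrates the dataset-size explosion mentioned above.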

A butterfly (2,2-biclique/quadrangle) is the smallest cohesive motif in bipartite graphs. Butterflies can be used to directly analyze bipartite graphs and have drawn significant research interest in recent years (wangRectangle; sanei2018butterfly; sanei2019fleet; shiParbutterfly; wangButterfly; he2021exploring; sariyucePeeling; wang2018efficient). Sariyuce and Pinar (sariyucePeeling) use butterflies as a density indicator to define the notions of k-wings and k-tips, as maximal bipartite subgraphs where each edge and vertex, respectively, is involved in at least k butterflies. For example, the graph shown in fig.1a is a 1-wing since each edge participates in at least one butterfly. Analogous to trusses (cohen2008trusses), wings (tips) represent hierarchical dense structures in the sense that a (k+1)-wing ((k+1)-tip) is a subgraph of a k-wing (k-tip).

In this paper, we explore parallel algorithms for wing and tip decomposition (k-wings and wing decomposition are also known as k-bitrusses and bitruss decomposition, respectively), which construct the entire hierarchy of wings and tips in a bipartite graph. For space-efficient representation of the hierarchy, these analytics output the wing number of each edge or the tip number of each vertex, which represents the densest level of the hierarchy that contains that edge or vertex. Wing and tip decomposition have several real-world applications, such as:


  • Link prediction in recommendation systems or e-commerce websites that contain communities of users with common preferences or purchase history (he2021exploring; leicht2006vertex; navlakha2008graph; communityDet).

  • Mining nested communities in social networks or discussion forums, where users affiliate with broad groups and more specific sub-groups based on their interests (he2021exploring).

  • Detecting spam reviewers that collectively rate selected items in rating networks (mukherjee2012spotting; fei2013exploiting; lim2010detecting).

  • Document clustering by mining co-occurring keywords and groups of documents containing them (dhillon2001co).

  • Finding nested groups of researchers from author-paper networks (sariyucePeeling) with varying degree of collaboration.

Figure 1. (a) Bipartite graph which is also a 1-wing. (b) Wing decomposition hierarchy of the graph – edges colored blue, red, green and black have wing numbers of 1, 2, 3 and 4, respectively.

Existing algorithms for decomposing bipartite graphs typically employ an iterative bottom-up peeling approach (sariyucePeeling; shiParbutterfly), wherein entities (edges and vertices for wing and tip decomposition, respectively) with the minimum support (butterfly count) are peeled in each iteration. Peeling an entity involves deleting it from the graph and updating the support of other entities that share butterflies with it. However, the huge number of butterflies in bipartite graphs makes bottom-up peeling computationally demanding and renders large graphs intractable for sequential decomposition algorithms. For example, trackers – a bipartite network of internet domains and the trackers contained in them – has over a hundred million edges but more than a trillion butterflies.

Parallel computing is widely used to scale such high-complexity analytics to large datasets (Park_2016; smith2017truss; 10.1145/3299869.3319877). However, the bottom-up peeling approach used in existing parallel frameworks (shiParbutterfly) severely restricts parallelism by peeling entities in a strictly increasing order of their entity numbers (wing or tip numbers). Consequently, it takes a very large number of iterations to peel an entire graph; for example, it takes more than a million iterations to peel all edges of the trackers dataset using bottom-up peeling. Moreover, each peeling iteration is sequentially dependent on support updates in all prior iterations, thus mandating synchronization of parallel threads before each iteration. Hence, the conventional approach of parallelizing workload within each iteration (shiParbutterfly) suffers from heavy thread synchronization and poor parallel scalability.

In this paper, we propose a novel two-phased peeling approach for generalized bipartite graph decomposition. Both phases of the proposed approach exploit parallelism across multiple levels of the decomposition hierarchy to drastically reduce the number of parallel peeling iterations and, in turn, the amount of thread synchronization. The first phase creates a coarse hierarchy which divides the set of entity numbers into a few non-overlapping ranges. It accordingly partitions the entities by iteratively peeling the ones with support in the lowest range. A major implication of range-based partitioning is that each iteration peels a large number of entities corresponding to a wide range of entity numbers. This results in large parallel workload per iteration and little synchronization.

The second phase concurrently processes multiple partitions to compute the exact entity numbers. The absence of overlap between the corresponding entity number ranges enables every partition to be peeled independently of the others. By assigning each partition exclusively to a single thread, this phase achieves parallelism with no global synchronization.

We implement the two-phased peeling as a part of our Parallel Bipartite Network peelinG (PBNG) framework which adapts this approach for both wing and tip decomposition. PBNG further encapsulates novel workload optimizations that exploit batched peeling of numerous entities in the first phase to dramatically improve computational efficiency of decomposition. Overall, our contributions can be summarized as follows:

  1. We propose a novel two-phased peeling approach for bipartite graph decomposition that parallelizes workload across different levels of the decomposition hierarchy. The proposed methodology is implemented in our PBNG framework, which generalizes it for both vertex and edge peeling. To the best of our knowledge, this is the first approach to utilize parallelism across the levels of both wing and tip decomposition hierarchies.

  2. Using the proposed two-phased peeling, we achieve a dramatic reduction in the number of parallel peeling iterations and, in turn, the thread synchronization. As an example, wing decomposition of the trackers dataset in PBNG requires four orders of magnitude fewer parallel peeling iterations than existing parallel algorithms.

  3. We develop novel optimizations that are highly effective for the two-phased peeling approach and dramatically reduce the work done by PBNG. As a result, PBNG traverses only a small fraction of the wedges traversed by the state-of-the-art during tip decomposition of internet domains in the trackers dataset.

We empirically evaluate PBNG on several real-world bipartite graphs and demonstrate its superior scalability compared to the state-of-the-art. We show that PBNG significantly expands the limits of current practice by decomposing, in a few minutes to hours, some of the largest publicly available datasets that existing algorithms cannot decompose in multiple days.

In a previous work (lakhotia2020receipt), we developed a two-phased algorithm for tip decomposition (vertex peeling). This paper generalizes the two-phased approach for peeling any set of entities within a bipartite graph. We further present non-trivial techniques to adapt the two-phased peeling for wing decomposition (edge peeling), which is known to reveal better-quality dense subgraphs than tip decomposition (sariyucePeeling).

2. Background

In this section, we formally define the problem statement and review existing methods for butterfly counting and bipartite graph decomposition. Note that counting is used to initialize support (running count of butterflies) of each vertex or edge before peeling, and also inspires some optimizations to improve efficiency of decomposition.

Table 1 lists some notations used in this paper. For description of a general approach, we use the term entity to denote a vertex (for tip decomposition) or an edge (for wing decomposition), and entity number to denote the tip or wing number (sec.2.2), respectively. The support and entity number notations apply correspondingly to an entity of either type.

bipartite graph with disjoint vertex sets and , and edges
no. of vertices in i.e. / no. of edges in i.e.
arboricity of (chibaArboricity)
neighbors of vertex / degree of vertex
/ no. of butterflies in that contain vertex / edge
/ support (running count of butterflies) of vertex / edge

support vector of all vertices in set

/ all edges in
/ tip number of vertex / wing number of edge
/ maximum tip number of vertices in / maximum wing number of edges in
number of vertex/edge partitions created by PBNG
number of threads
Table 1. Notations and their definition

2.1. Butterfly counting

A butterfly (2,2-biclique/quadrangle) can be viewed as a combination of two wedges with common endpoints. For example, in fig.1a, the two wedges shown have common endpoints and together form a butterfly. A simple way to count butterflies is to explore all wedges and combine the ones with common endpoints. However, this is computationally inefficient, with complexity quadratic in vertex degrees (if we use vertices of one set as endpoints).
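The wedge-combination idea can be sketched as follows; this is an illustrative baseline of ours, not the paper's algorithm:

```python
from itertools import combinations

def count_butterflies_naive(adj_u):
    """Naive butterfly counting by combining wedges with common endpoints:
    a pair of same-side vertices with c common neighbors spans c*(c-1)/2
    butterflies. adj_u maps each vertex of one side to its neighbor set."""
    total = 0
    for u1, u2 in combinations(adj_u, 2):
        c = len(adj_u[u1] & adj_u[u2])  # wedges with endpoints u1, u2
        total += c * (c - 1) // 2       # any 2 such wedges form a butterfly
    return total
```

Every pair of vertices is examined, which is exactly the inefficiency that the vertex-priority algorithm below avoids.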

Chiba and Nishizeki (chibaArboricity) developed an efficient vertex-priority quadrangle counting algorithm in which, starting from each vertex, only those wedges are expanded where the start vertex has the highest degree. Its theoretical complexity is state-of-the-art for butterfly counting. Wang et al.(wangButterfly) further propose a cache-efficient version of this algorithm that traverses wedges such that the degree of the last vertex is greater than that of the start and middle vertices (alg.1, line 10). Thus, wedge explorations frequently end at a small set of high-degree vertices that can be cached.

The vertex-priority algorithm can be easily parallelized by concurrently processing multiple start vertices (shiParbutterfly; wangButterfly). In PBNG, we use the per-vertex and per-edge counting variants of the parallel algorithm (shiParbutterfly; wangButterfly), as shown in alg.1. To avoid conflicts, each thread is provided an individual -element array (alg.1, line 5) for wedge aggregation, and butterfly counts of entities are incremented using atomic operations.

1:Input: Bipartite Graph
2:Output: Butterfly counts – for each , and for each
3: for each ;    for each Initialization
4:Relabel vertices in decreasing order of degree Priority assignment
5:for each  do in parallel
6:     Sort in increasing order of new labels
7:for each vertex  do in parallel
8:     Initialize hashmap to all zeros
9:     Initialize an empty wedge set
10:     for each vertex
11:         for each vertex
12:              if  or  then break              
15:     for each  such that Per-vertex counting
17:         ;  
18:         for each 
20:     for each  such that Per-edge counting
21:         Let and denote edges and , respectively
22:         ;        
Algorithm 1 Counting per-vertex and per-edge butterflies (pveBcnt)
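As a reference point for the output of alg.1, here is a simple (unprioritized) per-edge counting sketch; the adjacency-dict representation is our own, and the degree-ordering and caching optimizations of the actual algorithm are deliberately omitted:

```python
def per_edge_butterflies(adj_u):
    """Per-edge butterfly counts: a butterfly containing edge (u, v) pairs
    u with another same-side vertex u2 adjacent to v, plus one further
    common neighbor of u and u2. Unprioritized illustrative sketch."""
    support = {}
    for u, nbrs in adj_u.items():
        for v in nbrs:
            cnt = 0
            for u2, nbrs2 in adj_u.items():
                if u2 != u and v in nbrs2:
                    cnt += len(nbrs & nbrs2) - 1  # exclude v itself
            support[(u, v)] = cnt
    return support
```

These per-edge counts are exactly the supports used to initialize peeling in sec.2.2.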

2.2. Bipartite Graph Decomposition

Sariyuce et al.(sariyucePeeling) introduced tips and wings as butterfly-dense vertex- and edge-induced subgraphs, respectively. They are formally defined as follows:

Definition 1.

A bipartite subgraph, induced on a subset of edges, is a k-wing iff

  • each of its edges is contained in at least k butterflies,

  • any two of its edges are connected by a series of butterflies,

  • it is maximal, i.e., no other k-wing subsumes it.

Definition 2.

A bipartite subgraph, induced on one full vertex set and a subset of the other, is a k-tip iff

  • each vertex in the subset is contained in at least k butterflies,

  • any two vertices in the subset are connected by a series of butterflies,

  • it is maximal, i.e., no other k-tip subsumes it.

Both wings and tips are hierarchical: a k-wing (k-tip) is completely contained in a k′-wing (k′-tip) for every k′ ≤ k. Therefore, instead of storing all wings, the wing number of an edge is defined as the maximum k for which the edge is present in a k-wing. Similarly, the tip number of a vertex is the maximum k for which the vertex is present in a k-tip. Wing and tip numbers act as a space-efficient indexing from which any level of the wing and tip hierarchy, respectively, can be quickly retrieved (sariyucePeeling). In this paper, we study the problem of finding wing and tip numbers, also known as wing and tip decomposition, respectively.

Bottom-Up Peeling (BUP) is a commonly employed technique to compute wing decomposition (alg.2). It initializes the support of each edge using per-edge butterfly counting (alg.2, line 1), and then iteratively peels the edges with minimum support until no edge remains. When an edge is peeled, its support in that iteration is recorded as its wing number (alg.2, line 4). Further, for every edge that shares butterflies with the peeled edge, the support is decreased to account for the removal of those butterflies. Thus, edges are peeled in a non-decreasing order of wing numbers.

Bottom-up peeling for tip decomposition utilizes a similar procedure for peeling vertices. A crucial distinction here is that in tip decomposition, vertices in only one of the two vertex sets are peeled, as a tip consists of all vertices from the other set (defn.2). For clarity of description, we fix one of the sets as the vertex set to peel. As we will see later in sec.3.2, this distinction renders the two-phased approach of PBNG highly suitable for tip decomposition.

Runtime of bottom-up peeling is dominated by wedge traversal required to find butterflies that contain the entities being peeled (alg.2, lines 7-9). The overall complexity for wing decomposition is . Relatively, tip decomposition has a lower complexity of , which is still quadratic in vertex degrees and very high in absolute terms.

1:Input: Bipartite graph
2:Output: Wing numbers
3: pveBcnt() Counting for support initialization (alg.1)
4:while  do Peeling
7:     update()
9:function update()
10:     for each  Find butterflies
11:         Let denote edge
12:         for each  such that
13:              Let and denote edges and , respectively
14:              ;  ;   Update support               
Algorithm 2 Wing decomposition using bottom-up peeling (BUP)
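A compact runnable sketch of bottom-up wing peeling, in the spirit of alg.2 but with naive support recomputation standing in for the incremental updates (the graph representation and all names are ours):

```python
def wing_decomposition_bup(adj_u):
    """Bottom-up peeling (BUP) sketch: repeatedly peel a minimum-support
    edge, record its wing number, and refresh supports on the residual
    graph. Illustrative only; the paper's alg.2 updates supports
    incrementally instead of recomputing them."""
    adj = {u: set(vs) for u, vs in adj_u.items()}  # mutable copy

    def edge_support(u, v):
        # butterflies containing (u, v) in the current residual graph
        return sum(len(adj[u] & adj[u2]) - 1
                   for u2 in adj if u2 != u and v in adj[u2])

    support = {(u, v): edge_support(u, v) for u in adj for v in adj[u]}
    wing, k = {}, 0
    while support:
        e = min(support, key=support.get)
        k = max(k, support[e])          # wing numbers are non-decreasing
        wing[e] = k
        adj[e[0]].discard(e[1])         # peel e from the graph
        del support[e]
        for e2 in support:              # refresh remaining supports
            support[e2] = edge_support(*e2)
    return wing
```

On a small example, an edge in no butterfly gets wing number 0, while every edge of a single butterfly gets wing number 1.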

2.3. Bloom-Edge-Index

Chiba and Nishizeki (chibaArboricity) proposed storing wedges derived from the computational patterns of their butterfly counting algorithm, as a space-efficient representation of all butterflies. Wang et al.(wangBitruss) used a similar representation termed Bloom-Edge-Index (BE-Index) for quick retrieval of butterflies containing peeled edges during wing decomposition. We extensively utilize BE-Index not just for computational efficiency, but also for enabling parallelism in wing decomposition. In this subsection, we give a brief overview of some key concepts in this regard.

The butterfly counting algorithm assigns priorities (labels) to all vertices in a decreasing order of their degree (alg.1, line 2). Based on these priorities, a structure called maximal priority bloom, which is the basic building block of BE-Index, is defined as follows (wangBitruss):

Definition 3.

A maximal priority bloom is a biclique in which one of the two vertex sets has exactly two vertices, each connected to all vertices of the other set, that satisfies the following conditions:

  1. The highest priority vertex of the biclique belongs to the set which has exactly two vertices, and

  2. it is maximal, i.e., there exists no larger biclique that subsumes it and satisfies condition 1.

Maximal Priority Bloom Notations:

The vertex set ( or ) containing the highest priority vertex is called the dominant set of . Note that each vertex in the non-dominant set has exactly two incident edges in , that are said to be twins of each other in bloom . For example, in the graph shown in fig.2, the subgraph induced on is a maximal priority bloom with as the highest priority vertex and twin edge pairs and . The twin of an edge in bloom is denoted by . The cardinality of the non-dominant vertex set of bloom is called the bloom number of . Wang et al.(wangBitruss) further prove the following properties of maximal priority blooms:

Property 1 ().

A bloom with bloom number k consists of exactly k(k-1)/2 butterflies. Each edge of the bloom is contained in exactly k-1 butterflies within it. Further, an edge shares all of these butterflies with its twin, and one butterfly each with every other edge of the bloom.
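Property 1 can be sanity-checked by brute-force enumeration on a bloom modeled as the biclique K_{2,k}; the vertex names below are a toy model of ours:

```python
from itertools import combinations

def bloom_butterflies(k):
    """Enumerate butterflies in a bloom modeled as K_{2,k}: dominant
    vertices x, y and non-dominant vertices 0..k-1. Every pair of
    non-dominant vertices spans exactly one butterfly. Returns the total
    count and the count containing the fixed edge (x, 0)."""
    total = per_edge = 0
    for w1, w2 in combinations(range(k), 2):
        total += 1                      # butterfly {x, y, w1, w2}
        if 0 in (w1, w2):
            per_edge += 1               # it contains edge (x, 0)
    return total, per_edge
```

The enumeration confirms k(k-1)/2 butterflies in total and k-1 per edge.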

Property 2 ().

Every butterfly in the graph is contained in exactly one maximal priority bloom.

Note that the butterflies containing an edge , and the other edges in those butterflies, can be obtained by exploring all blooms that contain . For quick access to blooms of an edge and vice-versa, BE-Index is defined as follows:

Definition 4.

BE-Index of a graph is a bipartite graph that links all maximal priority blooms in to the respective edges within the blooms.

  • W(I) – The two vertex sets of the BE-Index uniquely represent all maximal priority blooms and all edges of the graph, respectively. Each bloom vertex also stores the bloom number of the corresponding bloom.

  • E(I) – The BE-Index contains a link between a bloom and an edge if and only if the bloom contains that edge. Each link also records the twin of the edge within the bloom.

BE-Index Notations:

For ease of explanation, we refer to a maximal priority bloom as simply a bloom. We use the same notation to denote both a bloom (or an edge) and its representative vertex in the BE-Index, and denote neighborhoods and bloom numbers in the BE-Index accordingly. Note that the bloom number of a bloom is half of its degree in the BE-Index, since each non-dominant vertex contributes exactly two twin edges to the bloom.

Figure 2. (a) Bipartite graph (b) BE-Index of with two maximal priority blooms.

Fig.2 depicts a graph (subgraph of from fig.1) and its BE-Index. consists of two maximal priority blooms: (a) with dominant set and , and (b) with dominant vertex set and . As an example, edge is a part of butterfly in shared with twin , and butterflies in shared with twin . With all other edges in and , it shares one butterfly each.

Construction of BE-Index:

Index construction can be easily embedded within the counting procedure (alg.1). Each pair of endpoint vertices of the wedges explored during counting represents the dominant set of a bloom (with the start vertex as the highest priority vertex), containing the edges from both endpoints to every midpoint. Lastly, for a given midpoint, the two edges connecting it to the endpoints are twins of each other. Thus, the space and computational complexities of BE-Index construction are bounded by the wedges explored during counting.
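A simplified construction sketch follows; purely for illustration, it assumes that one vertex side uniformly outranks the other, so blooms reduce to pairs of same-side endpoints with at least two common neighbors (the dict layout is our own):

```python
from itertools import combinations

def build_be_index(adj_u):
    """BE-Index construction sketch under the simplifying assumption that
    every vertex in adj_u outranks every opposite-side vertex: each pair
    of adj_u vertices with c >= 2 common neighbors forms one bloom with
    dominant set {u1, u2} and bloom number c, linked to its 2*c edges.
    The two edges meeting at a non-dominant vertex are twins."""
    blooms = {}
    for u1, u2 in combinations(sorted(adj_u), 2):
        common = adj_u[u1] & adj_u[u2]
        if len(common) >= 2:
            edges = [(u, w) for w in sorted(common) for u in (u1, u2)]
            blooms[(u1, u2)] = {"k": len(common), "edges": edges}
    return blooms
```

The real construction uses the degree-based priorities of alg.1, but the bloom/edge linkage it produces has the same shape.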

Wing Decomposition with BE-Index:

Alg.3 depicts the procedure to peel an edge using the BE-Index. Instead of traversing wedges in the graph to find butterflies of the peeled edge, the edges that share butterflies with it are found by exploring its 2-hop neighborhood in the BE-Index (alg.3, line 7). The number of butterflies shared with these edges in each bloom is obtained analytically using property 1 (alg.3, lines 4 and 8). Remarkably, peeling an edge this way requires traversal proportional only to its BE-Index neighborhood (wangBitruss). This reduces the computational complexity of wing decomposition, but the cost remains proportional to the number of butterflies, which can be enormous for large graphs.

1:function update()
2:     for each bloom
3:         , bloom number of in
6:          Update bloom number
7:         for each 
8:               Update support               
Algorithm 3 Support update during edge peeling, using BE-Index
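The per-bloom support update of alg.3 can be sketched as below, applying Property 1 directly; the dictionary layout for a bloom is our own assumption, not the paper's data structure:

```python
def peel_from_bloom(bloom, support, e):
    """Support update when peeling edge e from one bloom, per Property 1:
    the twin of e loses the k-1 butterflies it shared with e, every other
    bloom edge loses the single butterfly it shared with e, and the bloom
    shrinks (e, its twin, and one non-dominant vertex leave, so the bloom
    number drops by 1). bloom = {"k": int, "edges": set, "twin": dict}."""
    k, t = bloom["k"], bloom["twin"][e]
    support[t] -= k - 1                 # twin shared all of e's butterflies
    for e2 in bloom["edges"]:
        if e2 not in (e, t):
            support[e2] -= 1            # one shared butterfly each
    bloom["edges"] -= {e, t}
    del bloom["twin"][e], bloom["twin"][t]
    bloom["k"] = k - 1
```

Applying this once per bloom containing the peeled edge reproduces the aggregate update that alg.3 performs over the 2-hop BE-Index neighborhood.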

2.4. Challenges

Bipartite graph decomposition is computationally very expensive, and parallel computing is widely used to accelerate such workloads. However, the state-of-the-art parallel framework ParButterfly (shiParbutterfly; julienne) is based on bottom-up peeling and only utilizes parallelism within each peeling iteration. This restricted parallelism is due to the following sequential dependency between iterations: support updates in an iteration guide the choice of entities to peel in subsequent iterations. Hence, even though ParButterfly is work-efficient (shiParbutterfly), its scalability is limited because:

  1. It incurs a large number of iterations with low parallel workload per iteration. Due to the resulting synchronization and load imbalance, intra-iteration parallelism is insufficient for substantial acceleration.
    Objective 1, therefore, is to design a parallelism-aware peeling methodology for bipartite graphs that reduces synchronization and exposes a large amount of parallel workload.

  2. It traverses an enormous number of wedges (or bloom-edge links in the BE-Index) to retrieve butterflies removed by peeling. This is computationally expensive and can be infeasible on large datasets, even for a parallel algorithm.
    Objective 2, therefore, is to reduce the amount of traversal in practice.

3. Parallel Bipartite Network peelinG (PBNG)

In this section, we describe a generic, parallelism-friendly two-phased peeling approach for bipartite graph decomposition (targeting objective 1, sec.2.4). We further demonstrate how this approach is adapted individually for tip and wing decomposition in our Parallel Bipartite Network peelinG (PBNG) framework.

3.1. Two-phased Peeling

Figure 3. Graphical illustration of PBNG’s two-phased peeling for wing decomposition of the graph from fig.1. The coarse-grained decomposition divides the edges into partitions using very few parallel peeling iterations. The fine-grained decomposition peels each partition using a single thread but concurrently processes multiple partitions.

The fundamental observation underlying our approach is that the entity number of an entity depends only on the number of butterflies it shares with entities whose entity numbers are no smaller than its own. Therefore, given a graph and per-entity butterfly counts (obtained from counting), only the cumulative effect of peeling all entities with strictly smaller entity numbers is relevant for computing a given level (tip or wing) in the decomposition hierarchy. Due to commutativity of addition, the order of peeling these entities has no impact on that level.

This insight allows us to eliminate the constraint of deleting only minimum-support entities in each iteration, which bottlenecks the available parallelism. To compute a given level, all entities with smaller entity numbers can be peeled concurrently, providing sufficient parallel workload. However, repeating this for every possible level would be computationally very inefficient. To avoid this inefficiency, we develop a novel two-phased approach.

3.1.1. Coarse-grained Decomposition

The first phase divides the spectrum of all possible entity numbers into a small number of non-overlapping ranges, where the number of ranges is a user-specified parameter. Each range represents a contiguous set of entity numbers; the ranges are computed using a heuristic described in sec.3.1.3. Corresponding to each range, PBNG also computes the partition comprising all entities whose entity numbers lie in that range. Thus, instead of finding the exact entity number of an entity, the first phase of PBNG computes bounds on it. We therefore refer to this phase as Coarse-grained Decomposition (PBNG CD). The absence of overlap between the ranges allows each partition to be peeled independently of the others in the second phase, for exact entity number computation.

Entity partitions are computed by iteratively peeling entities whose support lies in the minimum remaining range (alg.4, lines 5-13). For each partition, the first peeling iteration in PBNG CD scans all entities to find the peeling set (alg.4, line 9). In subsequent iterations, the peeling set is computed jointly with support updates. Thus, unlike bottom-up peeling, PBNG CD does not require a priority queue data structure, which makes support updates relatively cheaper.

PBNG CD can be visualized as a generalization of bottom-up peeling (alg.2). In each iteration, the latter peels only the entities with minimum support, whereas PBNG CD peels all entities with support in a broad custom range. For example, in fig.3, PBNG CD divides the edges into two partitions corresponding to two ranges, whereas bottom-up peeling would create a partition for every individual level in the decomposition hierarchy. Keeping the number of partitions much smaller than the number of levels ensures a large number of entities peeled per iteration (sufficient parallel workload) and significantly fewer iterations (dramatically less synchronization) compared to bottom-up peeling.

In addition to the ranges and partitions, PBNG CD also computes a support initialization vector. For each entity, this vector records the number of butterflies that the entity shares only with entities in its own or higher-ranged partitions. In other words, it represents the aggregate effect of peeling all entities with entity numbers in lower ranges. During iterative peeling in PBNG CD, this number is inherently generated after the last peeling iteration of the preceding partition and copied into the vector (alg.4, lines 6-7). For example, in fig.3, the support of an edge after peeling the first partition is recorded in the vector.

1:Input: Bipartite graph , # partitions
2:Output: Ranges , Edge Partitions , Support initialization vector
4:Initial support pveBcnt() Ref: alg.1
5: BE-Index of
6:, , target butterflies (workload) per partition
7:while  and  do
8:     for each  do in parallel Support Initialization Vector
10:      Upper Bound
11:      all edges such that
12:     while  do Peel edges
13:         ,
14:         parallel_update()
15:          all edges such that      
18:function find_range()
19:     Initialize hashmap to all zeros
20:     for each 
22:      such that
23:     return
25:function parallel_update()
26:     Initialize hashmap to all zeros
27:     for each edge  do in parallel
28:         for each bloom Update support atomically
29:              , bloom number of in
30:              if  or  then
34:                  for each  such that
36:     for each  such that  do in parallel
37:          Update bloom number      
Algorithm 4 PBNG Coarse-grained Decomposition (PBNG CD) for wing decomposition
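The range-based peeling loop of PBNG CD can be sketched on a toy instance as follows; the fixed-width ranges and the unit-decrement `shared` map are simplifications of ours, standing in for the workload heuristic of sec.3.1.3 and the butterfly-based updates of alg.4:

```python
def coarse_grained_peel(support, shared, num_parts):
    """Skeleton of coarse-grained decomposition: each outer round peels
    every entity whose support falls in the current range, iterating
    until the range is drained, then moves to the next range. shared[e]
    lists entities that each lose one unit of support when e is peeled
    (a stand-in for shared-butterfly support updates)."""
    support = dict(support)
    width = max(max(support.values()) // num_parts, 1)
    partitions, ub = [], width
    while support:
        part = []
        while True:
            peel = [e for e, s in support.items() if s <= ub]
            if not peel:
                break                   # current range is drained
            for e in peel:
                del support[e]
            for e in peel:
                part.append(e)
                for n in shared.get(e, []):
                    if n in support:
                        support[n] -= 1
            # updates may pull more entities into the current range
        partitions.append(part)
        ub += width                     # move to the next range
    return partitions
```

Note how an entity whose support drops into the current range after an update is still peeled into the current partition, mirroring the cascading inner loop of alg.4.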

3.1.2. Fine-grained Decomposition

The second phase computes exact entity numbers and is called Fine-grained Decomposition (PBNG FD). The key idea behind PBNG FD is that if we know all the butterflies that each entity in a partition shares only with entities in its own or higher-ranged partitions, that partition can be peeled independently of all other partitions. The support initialization vector computed in PBNG CD precisely indicates the count of such butterflies (sec.3.1.1) and hence is used to initialize support values in PBNG FD. PBNG FD exploits the resulting independence among partitions to concurrently process multiple partitions using sequential bottom-up peeling. Creating at least as many partitions as threads ensures that PBNG FD can be efficiently parallelized across partitions. Overall, both phases in PBNG circumvent strict sequential dependencies between different levels of the decomposition hierarchy to efficiently parallelize the peeling process.

The two-phased approach can potentially double the computation required for peeling. However, we note that since partitions are peeled independently in PBNG FD, support updates are not communicated across partitions. Therefore, to improve computational efficiency, PBNG FD operates on a smaller representative subgraph for each partition. Specifically, a partition's representative subgraph preserves a butterfly iff the butterfly satisfies both of the following conditions:

  1. The butterfly contains multiple entities within the partition.

  2. The butterfly only contains entities from the partition's own or higher ranges. If the butterfly contains an entity from a lower-ranged partition, then it does not exist at the lowest level of the decomposition hierarchy covered by the partition. Moreover, the impact of removing such a butterfly on the support of entities in the partition is already accounted for in the support initialization vector (sec.3.1.1).

For example, in fig.3, the representative subgraph of the lower-ranged partition contains a butterfly of the graph because (a) the butterfly contains multiple edges of that partition, satisfying condition 1, and (b) all of its edges are from that partition or the higher-ranged one, satisfying condition 2. However, the representative subgraph of the higher-ranged partition does not contain this butterfly because two of its edges are in the lower-ranged partition and hence it does not satisfy condition 2 for that partition.
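The two conditions can be expressed as a small membership test; this is a sketch of ours, and the edge-to-partition mapping is an assumed input:

```python
def keep_butterfly(butterfly_edges, part_of, j):
    """Test whether a butterfly belongs in partition j's representative
    subgraph: it must contain at least two edges of partition j
    (condition 1), and no edge from a lower-ranged partition
    (condition 2). part_of maps each edge to its partition index."""
    parts = [part_of[e] for e in butterfly_edges]
    return parts.count(j) >= 2 and min(parts) >= j
```

A butterfly straddling two partitions is thus kept only by the lower-ranged one, which is exactly the fig.3 scenario described above.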

3.1.3. Range Partitioning

In PBNG CD, the first step for computing a partition is to find its range (alg.4, line 8). For load balancing, the upper bound of the range should be computed such that all partitions pose uniform workload in PBNG FD (the lower endpoint of a range is directly obtained from the upper bound of the previous range). However, the representative subgraphs and the corresponding workloads are not known prior to actual partitioning. Furthermore, exact entity numbers are not known either, and hence we cannot determine beforehand exactly which entities will lie in the partition for different choices of the upper bound. Considering these challenges, PBNG uses two proxies for range determination:

  1. Proxy 1: the current support of an entity is used as a proxy for its entity number.

  2. Proxy 2: the complexity of peeling individual entities is used as a proxy to estimate the peeling workload in the representative subgraphs.

Now, the problem is to compute the upper bound such that the estimated workload of the partition, as per these proxies, is close to the average workload per partition. To this end, PBNG CD creates a bin for each support value and computes the aggregate workload of the entities in that bin. For a given upper bound, the estimated workload of peeling the partition is the sum of the workloads of all bins corresponding to lower support values (all entities with entity numbers below the range are already peeled before PBNG CD computes it). Thus, the workload of the partition as a function of the upper bound can be computed by a prefix scan of individual bin workloads (alg.4, lines 17-18). Using this function, the upper bound is chosen such that the estimated workload of the partition is close to but no less than the target (alg.4, line 19).
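A sketch of this bin-and-scan range selection (the names, and the use of a running sum in place of a parallel prefix scan, are assumptions):

```python
def choose_upper_bound(support, workload, lo, target):
    """Pick the smallest upper bound hi such that the aggregate estimated
    workload of unpeeled entities with support in [lo, hi] reaches the
    per-partition target."""
    max_s = max(support.values())
    bins = [0] * (max_s + 1)
    for e, s in support.items():
        if s >= lo:                 # entities below lo were already peeled
            bins[s] += workload[e]
    total = 0
    for hi in range(lo, max_s + 1):
        total += bins[hi]           # running sum stands in for the prefix scan
        if total >= target:         # close to, but not less than, the target
            return hi
    return max_s                    # last partition absorbs the remainder
```

With unit workloads, supports {1, 2, 2, 5}, lower bound 0, and target 3, the chosen upper bound is 2.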

Adaptive Range Computation:

Since range determination uses current support as a proxy for entity numbers, the target workload for each partition is covered by the entities added to it in its very first peeling iteration in PBNG CD. After the support updates in this iteration, more entities may be added to the partition, and its final workload estimate may significantly exceed the target. This can cause significant load imbalance among the partitions; potentially, PBNG CD could finish with far fewer partitions than intended. To avoid this scenario, we implement the following two-way adaptive range determination:

  1. Instead of statically computing an average target, we dynamically update the target for every partition based on the remaining workload and the number of partitions still to be created. If a partition gets too much workload, the target for subsequent partitions is automatically reduced, thereby preventing a situation where all entities get peeled into too few partitions.

  2. A partition likely covers many more entities than the initial estimate based on proxy 1. The second adaptation scales down the dynamic target for a partition in an attempt to bring its actual workload close to the intended value. It assumes predictive local behavior, i.e., a partition will overshoot its target in a similar proportion to its predecessor. Therefore, the scaling factor is computed as the ratio of the previous partition's initial workload estimate during range computation to its final estimate based on all entities it received.
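The two adaptations combine into a single target-update rule, sketched below (the names and the exact combination are assumptions based on the description above):

```python
def next_target(remaining_work, remaining_parts, prev_init_est, prev_final_est):
    """Adaptation 1: the target is the dynamic average over remaining work.
    Adaptation 2: scale it down by the previous partition's overshoot ratio
    (initial estimate at range computation vs. final estimate)."""
    base = remaining_work / remaining_parts
    scale = prev_init_est / prev_final_est if prev_final_est else 1.0
    return base * scale
```

For example, if 90 units of work remain across 3 partitions and the previous partition's final estimate was twice its initial estimate, the next target becomes 15 rather than 30.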

3.1.4. Partition scheduling in PBNG FD

While adaptive range determination (sec.3.1.3) tries to create partitions with uniform estimated workload, the actual workload per partition in PBNG FD depends on the representative subgraphs and can still exhibit significant variance. Therefore, to improve load balance across threads, we use scheduling strategies inspired by the Longest Processing Time (LPT) rule, a well-known approximation algorithm for multiprocessor scheduling (graham1969bounds). We use the workload of each partition as an indicator of its execution time in the following runtime scheduling mechanism:

  • Dynamic task allocation: all partition IDs are inserted into a task queue. When a thread becomes idle, it pops a unique ID from the queue and processes the corresponding partition. Thus, all threads stay busy until every partition has been scheduled.

  • Workload-aware scheduling: partition IDs in the task queue are sorted in decreasing order of their workload. Thus, the partitions with the highest workload are scheduled first, and the threads processing them naturally receive fewer tasks later on. Fig.4 shows how workload-aware scheduling can improve the efficiency of dynamic allocation.

Figure 4. Benefits of Workload-aware Scheduling (WaS) in a multi-threaded system. The top row shows entity partitions with the estimated time to peel them in PBNG FD. Dynamic allocation without WaS finishes in more units of time than with WaS.
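Together, the two mechanisms amount to greedy LPT scheduling. The simulation below (with made-up workloads, not the figure's values) contrasts plain arrival-order dynamic allocation with its workload-aware variant:

```python
import heapq

def schedule(workloads, threads, workload_aware=True):
    """Simulate dynamic task allocation: the least-loaded (first idle) thread
    takes the next task from the queue. With workload_aware=True the queue is
    sorted in decreasing workload order (greedy LPT). Returns the makespan."""
    queue = sorted(workloads, reverse=True) if workload_aware else list(workloads)
    loads = [0.0] * threads          # min-heap of per-thread finish times
    heapq.heapify(loads)
    for w in queue:
        least = heapq.heappop(loads)  # thread that becomes idle first
        heapq.heappush(loads, least + w)
    return max(loads)
```

On workloads [2, 2, 2, 3, 3, 4] with 2 threads, workload-aware ordering achieves a makespan of 8 versus 9 for arrival order.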

3.2. Tip Decomposition

In this section, we give a brief overview of PBNG's two-phased peeling (sec.3.1) applied to tip decomposition. A detailed description is provided in our previous work (lakhotia2020receipt).

For tip decomposition, PBNG CD divides the vertex set being decomposed into partitions. Peeling a vertex requires traversal of all wedges with that vertex as an endpoint. Therefore, range determination in PBNG CD uses the wedge counts of vertices as a proxy to estimate the workload of peeling each partition. Moreover, since only one of the vertex sets of the bipartite graph is peeled, at most two vertices of a butterfly can be part of the peeling set. Hence, support updates to a vertex from different vertices in the peeled set correspond to disjoint butterflies, and the net update to its support can simply be computed by atomically aggregating the updates from individual vertices.

PBNG FD also utilizes the fact that any butterfly contains at most two vertices from the vertex set being peeled (sec.2.2). For such a pair of vertices, the two conditions for preserving a butterfly in a representative subgraph (sec.3.1.2) are satisfied only when both vertices lie in the same partition. Based on this insight, we construct each representative subgraph as the subgraph induced on the partition's vertices together with the other vertex set. Clearly, this subgraph preserves every butterfly whose two peelable vertices belong to the partition. For task scheduling in PBNG FD (sec.3.1.4), we use the total number of wedges with both endpoints in a partition as an indicator of the workload of peeling it.
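A sketch of this construction, assuming an adjacency-list bipartite graph where only one side's vertices are partitioned (all names hypothetical):

```python
from collections import defaultdict

def induced_subgraphs(adj, partition_of):
    """Assign every edge (u, v) to the subgraph of u's partition, so each
    edge lives in exactly one subgraph and the collective size of all
    subgraphs is bounded by the size of the original graph."""
    sub = defaultdict(dict)
    for u, nbrs in adj.items():
        sub[partition_of[u]][u] = list(nbrs)
    return dict(sub)
```

Because every edge is assigned by its peelable endpoint's partition, only wedges with both endpoints in the same partition remain traversable within a subgraph.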

Given the bipartite nature of the graph, any edge exists in exactly one of the induced subgraphs, and thus the collective space requirement of all induced subgraphs is bounded by the size of the original graph. Moreover, by the design of the representative (induced) subgraphs, PBNG FD for tip decomposition traverses only those wedges for which both endpoints are in the same partition. This dramatically reduces the amount of work done in PBNG FD compared to bottom-up peeling and PBNG CD. Note that we do not use the BE-Index for tip decomposition, for the following reasons:

  • The number of butterflies between two vertices is quadratic in the number of wedges between them, and wedge traversal (not butterfly count) determines the work done in tip decomposition. Since the BE-Index facilitates per-edge butterfly retrieval, peeling a vertex using the BE-Index requires processing each of its edges individually and can result in increased computation (sec.2.3).

  • The BE-Index has a high space complexity compared to the space needed to store the graph and all induced subgraphs. This can make BE-Index based peeling infeasible even on machines with a large amount of main memory. For example, the BE-Index of a user-group Orkut dataset with hundreds of millions of edges has billions of blooms and bloom-edge links, and consumes terabytes of memory.

3.3. Wing Decomposition

3.3.1. Challenges

Each butterfly consists of four edges, and the edge set is the entity set to decompose in wing decomposition. This is unlike tip decomposition, where each butterfly has at most two vertices from the decomposition set, and it results in the following issues:

  1. When a butterfly is removed due to peeling, the support of each unpeeled edge in it should be reduced exactly once for this removal. However, when multiple (but not all) edges of a butterfly are concurrently peeled in the same iteration of PBNG CD, multiple updates may be generated for its unpeeled edges.

  2. It is possible that a butterfly contains multiple, but not all, of its edges in a partition. Thus, the butterfly may need to be preserved in the representative subgraph of that partition, even though it will not be present in the partition's edge-induced subgraph.

For these reasons, a trivial extension of the tip decomposition algorithm (sec.3.2) is not suitable for wing decomposition. In this section, we explore novel BE-Index based strategies to enable two-phased peeling for wing decomposition.

3.3.2. Pbng Cd

This phase divides the edge set into partitions, as shown in alg.4. Not only do we utilize the BE-Index for computationally efficient support-update computation in PBNG CD, we also utilize it to avoid conflicts in parallel peeling iterations. Since a butterfly is contained in exactly one maximal priority bloom (sec.2.3, property 2), correctness of support updates within each bloom implies overall correctness of support updates in an iteration. To avoid conflicts, we therefore employ the following resolution mechanism for each bloom:

  1. If an edge and its twin are both peeled in the same iteration, then only the edge with the higher index among the two updates (a) the support of other edges in the bloom, and (b) the bloom number (alg.4, lines 26-31). This is because all butterflies in the bloom that contain one of the twins also contain the other (sec.2.3, property 1).

  2. If in an iteration an edge is peeled but its twin is not, then the twin's support is decreased by exactly the number of butterflies it shares with the peeled edge in the bloom. Other edges of the bloom do not propagate any updates to the twin via this bloom (alg.4, lines 26-30). This is because the twin is contained in exactly that many butterflies within the bloom, all of which are removed when the edge is peeled. To ensure that the bloom number correctly represents the butterflies shared between twin edges, support updates from all peeled edges are computed prior to updating it.
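The two rules can be sketched as a per-bloom filter that selects which peeled edges are allowed to propagate support updates (all names are hypothetical; actual PBNG performs this through the BE-Index):

```python
def resolve_bloom_updates(bloom_edges, twin, peeled, index):
    """Return the peeled edges of one bloom that propagate updates:
    rule 1 - if both twins are peeled, only the higher-index one updates;
    rule 2 - if only one twin is peeled, it alone updates its twin."""
    updaters = []
    for e in bloom_edges:
        if e not in peeled:
            continue
        t = twin[e]
        if t in peeled:
            if index[e] > index[t]:   # rule 1: higher index wins
                updaters.append(e)
        else:
            updaters.append(e)        # rule 2: sole peeled twin updates
    return updaters
```

With twin pairs (a, b) and (c, d), peeled set {a, b, c}, and increasing edge indices, only b (rule 1) and c (rule 2) propagate updates.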

Peeling an edge requires traversal in the BE-Index. Therefore, range determination in PBNG CD uses edge support as a proxy to estimate the workload of peeling each partition .

1:Input: Graph , BE-Index , edge partitions
2:          Support initialization vector
3:Output: Wing number for all
4: for all Initialize Support
5: partition_BE_Index() BE-Indices for all partitions
6:Insert integers in queue
7:Compute for all , and sort by LPT Scheduling
8:for  do in parallel
9:     while  is not empty do
10:         Atomically pop integer from Dynamic Task Allocation
11:         while  do Peel partition
13:              ,
14:              update() Ref: alg.3               
16:function partition_BE_Index() Compute partitions’ BE-Indices
17:     Let denote the partition index of an edge i.e.
19:     for each  do in parallel
20:         Initialize hash map to all zeros
21:         for each 
23:              if  then
24:                   Add bloom-edge links
25:                  if  or  then Initialize bloom numbers
27:         for each partition index such that
28:               Compute bloom numbers (prefix scan)               
29:     return
Algorithm 5 PBNG Fine-grained Decomposition (PBNG FD) for wing decomposition

3.3.3. Pbng Fd

The first step in peeling a partition in PBNG FD is to construct the BE-Index for its representative subgraph. One way to do so is to compute the representative subgraph explicitly and then apply the index construction algorithm (sec.2.3). However, this approach has the following drawbacks:

  • Computing a representative subgraph requires mining all edges in the graph that share butterflies with the partition's edges, which can be computationally expensive. Additionally, the overhead of index construction, even for a few hundred partitions, can be significantly large.

  • An edge can potentially exist in the representative subgraphs of all lower-ranked partitions. Therefore, creating and storing all subgraphs explicitly can require a large amount of memory.

To avoid these drawbacks, we directly compute the BE-Index of each partition by partitioning the BE-Index of the original graph (alg.5, lines 12-25). Our partitioning mechanism ensures that all butterflies satisfying the two preservation conditions (sec.3.1.2) for a partition are represented in its BE-Index.

Firstly, for an edge, its link with a bloom is preserved in its partition's BE-Index if and only if its twin lies in a partition of equal or higher rank (alg.5, lines 19-20). Since all butterflies in the bloom that contain the edge also contain its twin (sec.2.3, property 1), none of them need to be preserved in the index otherwise. Moreover, if both twins lie in the same partition, their contribution to the bloom number is counted only once (alg.5, lines 21-22).

Secondly, for a space-efficient representation, a partition's BE-Index does not store the links of edges belonging to lower-ranked partitions. While such an edge does not participate in the peeling of the partition, it may be part of a butterfly that satisfies both preservation conditions for it (sec.3.1.2). For example, fig.2 shows the representative subgraph and BE-Index for a partition generated by PBNG CD in fig.3. For space efficiency, we do not store the links of lower-partition edges in this index. However, some butterflies that satisfy both preservation conditions involve such edges and may be needed when peeling the partition's edges. In order to account for such butterflies, we adjust the bloom number to correctly represent the number of butterflies in the bloom that contain edges only from the partition and higher-ranked ones (alg.5, lines 23-24). For example, in fig.2b, we initialize the bloom number to reflect these butterflies even though the corresponding links are not stored. This is unlike the BE-Index of the original graph, where the bloom number reflects all butterflies in the bloom (sec.2.3).
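The bloom-number adjustment can be sketched as follows: per bloom, count twin-edge pairs by the lower-ranked of their two partitions, then accumulate so that entry i counts the pairs lying entirely in partitions of rank i or higher (names and the scan direction are assumptions):

```python
def partition_bloom_numbers(blooms, twin, part, num_parts):
    """For each bloom, compute a per-partition bloom number: entry i counts
    the twin-edge pairs whose edges all lie in partitions >= i (sketch)."""
    numbers = {}
    for b, edges in blooms.items():
        counts = [0] * num_parts
        seen = set()
        for e in edges:
            if e in seen:
                continue
            seen.update((e, twin[e]))
            counts[min(part[e], part[twin[e]])] += 1  # pair's lowest-ranked partition
        for i in range(num_parts - 2, -1, -1):        # accumulate ranks >= i
            counts[i] += counts[i + 1]
        numbers[b] = counts
    return numbers
```

For a bloom with one twin pair in partition 0 and one in partition 1, the adjusted numbers are [2, 1]: partition 1's index sees only the pair it fully owns.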

After the BE-Indices for partitions are computed, PBNG FD dynamically sche