1. Introduction
A bipartite graph is a special graph whose vertices can be partitioned into two disjoint sets $U$ and $V$ such that every edge connects a vertex from $U$ with a vertex from $V$. Several real-world systems naturally exhibit bipartite relationships, such as the consumer-product purchase network of an e-commerce website (consumerProduct), user-rating data in a recommendation system (he2016ups; lim2010detecting), the author-paper network of a scientific field (authorPaper), group memberships in a social network (orkut), etc. Due to the rapid growth of data produced in these domains, efficient mining of dense structures in bipartite graphs has become a popular research topic (wangButterfly; wangBitruss; zouBitruss; sariyucePeeling; shiParbutterfly; lakhotia2020receipt).
Nucleus decomposition is commonly used to mine hierarchical dense subgraphs, where the minimum clique participation of an edge in a subgraph determines its level in the hierarchy (sariyuce2015finding). Truss decomposition is arguably the most popular case of nucleus decomposition; it uses triangles ($3$-cliques) to measure subgraph density (spamDet; graphChallenge; trussVLDB; sariyuce2016fast; bonchi2019distance; wen2018efficient). However, truss decomposition is not directly applicable to bipartite graphs as they do not have triangles. One way to circumvent this issue is to compute a unipartite projection of the bipartite graph, which contains an edge between each pair of vertices in $U$ that have common neighbor(s) in $V$. But this approach suffers from (a) information loss, which can impact the quality of results, and (b) an explosion in dataset size, which can restrict its scalability (sariyucePeeling).
A butterfly ($2,2$-biclique/quadrangle) is the smallest cohesive motif in bipartite graphs. Butterflies can be used to directly analyze bipartite graphs and have drawn significant research interest in recent years (wangRectangle; sanei2018butterfly; sanei2019fleet; shiParbutterfly; wangButterfly; he2021exploring; sariyucePeeling; wang2018efficient). Sariyuce and Pinar (sariyucePeeling) use butterflies as a density indicator to define the notions of $k$-wings and $k$-tips: maximal bipartite subgraphs in which each edge and vertex, respectively, is involved in at least $k$ butterflies. For example, the graph shown in fig.1a is a $1$-wing since each edge participates in at least one butterfly. Analogous to trusses (cohen2008trusses), wings (tips) represent hierarchical dense structures in the sense that a $k$-wing ($k$-tip) is a subgraph of a $k'$-wing ($k'$-tip) for every $k' \leq k$.
In this paper, we explore parallel algorithms for wing and tip decomposition analytics ($k$-wing and wing decomposition are also known as $k$-bitruss and bitruss decomposition, respectively), which construct the entire hierarchy of wings and tips in a bipartite graph. For a space-efficient representation of the hierarchy, these analytics output the wing number of each edge $e$ or the tip number of each vertex $v$, which represents the densest level of the hierarchy that contains $e$ or $v$, respectively. Wing and tip decomposition have several real-world applications, such as:


Link prediction in recommendation systems or e-commerce websites that contain communities of users with common preferences or purchase history (he2021exploring; leicht2006vertex; navlakha2008graph; communityDet).

Mining nested communities in social networks or discussion forums, where users affiliate with broad groups and more specific subgroups based on their interests (he2021exploring).

Detecting spam reviewers that collectively rate selected items in rating networks (mukherjee2012spotting; fei2013exploiting; lim2010detecting).

Document clustering by mining co-occurring keywords and the groups of documents containing them (dhillon2001co).

Finding nested groups of researchers with varying degrees of collaboration from author-paper networks (sariyucePeeling).
Existing algorithms for decomposing bipartite graphs typically employ an iterative bottom-up peeling approach (sariyucePeeling; shiParbutterfly), wherein entities (edges and vertices for wing and tip decomposition, respectively) with the minimum support (butterfly count) are peeled in each iteration. Peeling an entity involves deleting it from the graph and updating the support of other entities that share butterflies with it. However, the huge number of butterflies in bipartite graphs makes bottom-up peeling computationally demanding and renders large graphs intractable for sequential decomposition algorithms. For example, trackers, a bipartite network of internet domains and the trackers contained in them, has millions of edges but trillions of butterflies.
Parallel computing is widely used to scale such high-complexity analytics to large datasets (Park_2016; smith2017truss; 10.1145/3299869.3319877). However, the bottom-up peeling approach used in existing parallel frameworks (shiParbutterfly) severely restricts parallelism by peeling entities in a strictly increasing order of their entity numbers (wing or tip numbers). Consequently, it takes a very large number of iterations to peel an entire graph; for example, millions of iterations are needed to peel all edges of the trackers dataset using bottom-up peeling. Moreover, each peeling iteration is sequentially dependent on the support updates of all prior iterations, thus mandating synchronization of parallel threads before each iteration. Hence, the conventional approach of parallelizing workload within each iteration (shiParbutterfly) suffers from heavy thread synchronization and poor parallel scalability.
In this paper, we propose a novel two-phased peeling approach for generalized bipartite graph decomposition. Both phases in the proposed approach exploit parallelism across multiple levels of the decomposition hierarchy to drastically reduce the number of parallel peeling iterations and, in turn, the amount of thread synchronization. The first phase creates a coarse hierarchy that divides the set of entity numbers into a few non-overlapping ranges. It accordingly partitions the entities by iteratively peeling the ones with support in the lowest range. A major implication of range-based partitioning is that each iteration peels a large number of entities corresponding to a wide range of entity numbers. This results in a large parallel workload per iteration and little synchronization.
The second phase concurrently processes multiple partitions to compute the exact entity numbers. The absence of overlap between the corresponding entity number ranges enables every partition to be peeled independently of the others. By assigning each partition exclusively to a single thread, this phase achieves parallelism with no global synchronization.
We implement the two-phased peeling as part of our Parallel Bipartite Network peelinG (PBNG) framework, which adapts this approach for both wing and tip decomposition. PBNG further encapsulates novel workload optimizations that exploit the batched peeling of numerous entities in the first phase to dramatically improve the computational efficiency of decomposition. Overall, our contributions can be summarized as follows:


We propose a novel two-phased peeling approach for bipartite graph decomposition that parallelizes workload across different levels of the decomposition hierarchy. The proposed methodology is implemented in our PBNG framework, which generalizes it for both vertex and edge peeling. To the best of our knowledge, this is the first approach to utilize parallelism across the levels of both wing and tip decomposition hierarchies.

Using the proposed two-phased peeling, we achieve a dramatic reduction in the number of parallel peeling iterations and, in turn, the thread synchronization. For example, wing decomposition of the trackers dataset in PBNG requires four orders of magnitude fewer parallel peeling iterations than existing parallel algorithms.

We develop novel optimizations that are highly effective for the two-phased peeling approach and substantially reduce the work done by PBNG. As a result, PBNG traverses only a fraction of the wedges traversed by the state-of-the-art during tip decomposition of internet domains in the trackers dataset.
We empirically evaluate PBNG on several real-world bipartite graphs and demonstrate its superior scalability compared to the state-of-the-art. We show that PBNG significantly expands the limits of current practice by decomposing, in a few minutes or hours, some of the largest publicly available datasets that existing algorithms cannot decompose in multiple days.
In previous work (lakhotia2020receipt), we developed a two-phased algorithm for tip decomposition (vertex peeling). This paper generalizes the two-phased approach for peeling any set of entities within a bipartite graph. We further present non-trivial techniques to adapt the two-phased peeling for wing decomposition (edge peeling), which is known to reveal better-quality dense subgraphs than tip decomposition (sariyucePeeling).
2. Background
In this section, we formally define the problem statement and review existing methods for butterfly counting and bipartite graph decomposition. Note that counting is used to initialize the support (running count of butterflies) of each vertex or edge before peeling, and also inspires some optimizations that improve the efficiency of decomposition.
Table 1 lists some notations used in this paper. For the description of a general approach, we use the term entity to denote a vertex (for tip decomposition) or an edge (for wing decomposition), and entity number to denote the tip or wing number (sec.2.2), respectively. Correspondingly, the notations $B(x)$ and $\theta(x)$ denote the support and entity number of entity $x$.
$G = (U, V, E)$ : bipartite graph with disjoint vertex sets $U$ and $V$, and edge set $E$
$n$ / $m$ : no. of vertices in $G$, i.e., $|U| + |V|$ / no. of edges in $G$, i.e., $|E|$
$\alpha$ : arboricity of $G$ (chibaArboricity)
$N_v$ / $d_v$ : neighbors of vertex $v$ / degree of vertex $v$
$\bowtie_v$ / $\bowtie_e$ : no. of butterflies in $G$ that contain vertex $v$ / edge $e$
$B(v)$ / $B(e)$ : support (running count of butterflies) of vertex $v$ / edge $e$
$B(U)$ / $B(E)$ : support vector of all vertices in set $U$ / all edges in $E$
$\theta(v)$ / $\theta(e)$ : tip number of vertex $v$ / wing number of edge $e$
$\theta^{U}_{\max}$ / $\theta^{E}_{\max}$ : maximum tip number of vertices in $U$ / maximum wing number of edges in $E$
$P$ : number of vertex/edge partitions created by PBNG
$T$ : number of threads
2.1. Butterfly counting
A butterfly ($2,2$-biclique/quadrangle) can be viewed as a combination of two wedges with common endpoints. For example, in fig.1a, two wedges that share both of their endpoint vertices together form a butterfly. A simple way to count butterflies is to explore all wedges and combine the ones with common endpoints. However, this is computationally inefficient, with complexity $O\left(\sum_{u \in U} d_u^2\right)$ if we use vertices in $U$ as endpoints.
Chiba and Nishizeki (chibaArboricity) developed an efficient vertex-priority quadrangle counting algorithm in which, starting from each vertex, only those wedges are expanded where the start vertex has the highest degree. It has a theoretical complexity of $O(\alpha \cdot m)$, which is state-of-the-art for butterfly counting. Wang et al. (wangButterfly) further propose a cache-efficient version of this algorithm that traverses wedges such that the degree of the last vertex is greater than that of the start and middle vertices (alg.1, line 10). Thus, wedge explorations frequently end at a small set of high-degree vertices that can be cached.
The vertex-priority algorithm can be easily parallelized by concurrently processing multiple start vertices (shiParbutterfly; wangButterfly). In PBNG, we use the per-vertex and per-edge counting variants of the parallel algorithm (shiParbutterfly; wangButterfly), as shown in alg.1. To avoid conflicts, each thread is provided an individual aggregation array (alg.1, line 5) for wedge aggregation, and butterfly counts of entities are incremented using atomic operations.
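To make the wedge-aggregation idea concrete, the following minimal sequential sketch (with hypothetical names, and not the cache-optimized vertex-priority algorithm) counts butterflies by grouping wedges on their endpoint pair: an endpoint pair with $c$ common neighbors contributes $\binom{c}{2}$ butterflies.

```python
from collections import defaultdict
from itertools import combinations

def count_butterflies(adj_V):
    """adj_V maps each vertex of V to its set of neighbors in U.
    A wedge u1 - v - u2 is keyed by its endpoint pair (u1, u2);
    an endpoint pair with c common neighbors yields C(c, 2) butterflies."""
    wedge_count = defaultdict(int)
    for v, nbrs in adj_V.items():
        for u1, u2 in combinations(sorted(nbrs), 2):
            wedge_count[(u1, u2)] += 1
    return sum(c * (c - 1) // 2 for c in wedge_count.values())
```

On a $(2,3)$-biclique this yields $\binom{3}{2} = 3$ butterflies; the loop over all neighbor pairs of each vertex reflects the quadratic-in-degree cost noted above, which the vertex-priority algorithm avoids.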
2.2. Bipartite Graph Decomposition
Sariyuce et al. (sariyucePeeling) introduced tips and wings as butterfly-dense vertex- and edge-induced subgraphs, respectively. They are formally defined as follows:
Definition 1.
A bipartite subgraph $H$ of $G$, induced on a set of edges $E' \subseteq E$, is a $k$-wing iff

(1) each edge in $H$ is contained in at least $k$ butterflies,

(2) any two edges in $H$ are connected by a series of butterflies,

(3) $H$ is maximal, i.e., no other $k$-wing in $G$ subsumes $H$.
Definition 2.
A bipartite subgraph $H$ of $G$, induced on vertex sets $U' \subseteq U$ and $V$, is a $k$-tip iff

(1) each vertex in $U'$ is contained in at least $k$ butterflies,

(2) any two vertices in $U'$ are connected by a series of butterflies,

(3) $H$ is maximal, i.e., no other $k$-tip in $G$ subsumes $H$.
Both wings and tips are hierarchical: a $k$-wing ($k$-tip) is completely contained in a $k'$-wing ($k'$-tip) for all $k' \leq k$. Therefore, instead of storing all wings, the wing number of an edge is defined as the maximum $k$ for which the edge is present in a $k$-wing. Similarly, the tip number of a vertex is the maximum $k$ for which the vertex is present in a $k$-tip. Wing and tip numbers act as a space-efficient index from which any level of the wing and tip hierarchy, respectively, can be quickly retrieved (sariyucePeeling). In this paper, we study the problem of finding wing and tip numbers, also known as wing and tip decomposition, respectively.
Bottom-Up Peeling (BUP) is the commonly employed technique for computing wing decomposition (alg.2). It initializes the support of each edge using per-edge butterfly counting (alg.2, line 1), and then iteratively peels the edges with minimum support until no edge remains. When an edge is peeled, its support in that iteration is recorded as its wing number (alg.2, line 4). Further, for every edge that shares butterflies with the peeled edge, the support is decreased in accordance with the removal of those butterflies. Thus, edges are peeled in a non-decreasing order of wing numbers.
Bottom-up peeling for tip decomposition utilizes a similar procedure for peeling vertices. A crucial distinction here is that in tip decomposition, vertices in only one of the sets $U$ or $V$ are peeled, as a $k$-tip consists of all vertices from the other set (defn.2). For clarity of description, we assume that $U$ is the vertex set to peel. As we will see later in sec.3.2, this distinction renders the two-phased approach of PBNG highly suitable for tip decomposition.
The runtime of bottom-up peeling is dominated by the wedge traversal required to find the butterflies that contain the entities being peeled (alg.2, lines 7-9). The overall complexity is highest for wing decomposition. Tip decomposition has a relatively lower complexity of $O\left(\sum_{u \in U} \sum_{v \in N_u} d_v\right)$, which is still quadratic in vertex degrees and very high in absolute terms.
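As an illustration of bottom-up peeling, the sketch below (an unoptimized sequential version with hypothetical names) runs tip decomposition on the peel-side adjacency: since the non-peeled side is never deleted, the butterflies shared by two peel-side vertices can always be recomputed as $\binom{c}{2}$ from their $c$ common neighbors.

```python
def tip_decompose(adj_U):
    """adj_U maps each vertex of the peel set U to its neighbor set in V.
    Repeatedly peels a minimum-support vertex and records tip numbers."""
    def shared(u, x):                   # butterflies shared by u and x
        c = len(adj_U[u] & adj_U[x])
        return c * (c - 1) // 2
    active = set(adj_U)
    support = {u: sum(shared(u, x) for x in active if x != u) for u in active}
    tip, level = {}, 0
    while active:
        u = min(active, key=lambda x: support[x])   # minimum-support vertex
        level = max(level, support[u])              # tip numbers never decrease
        tip[u] = level
        active.remove(u)
        for x in active:                            # remove u's butterflies
            support[x] -= shared(u, x)
    return tip
```

For instance, if two vertices share two common neighbors while a third vertex has only one neighbor, the third is peeled first with tip number 0, and the other two both obtain tip number 1.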
2.3. Bloom-Edge-Index
Chiba and Nishizeki (chibaArboricity) proposed storing the wedges arising from the computational patterns of their butterfly counting algorithm as a space-efficient representation of all butterflies. Wang et al. (wangBitruss) used a similar representation, termed the Bloom-Edge-Index (BE-Index), for quick retrieval of the butterflies containing peeled edges during wing decomposition. We extensively utilize the BE-Index not just for computational efficiency, but also for enabling parallelism in wing decomposition. In this subsection, we give a brief overview of some key concepts in this regard.
The butterfly counting algorithm assigns priorities (labels) to all vertices in decreasing order of their degree (alg.1, line 2). Based on these priorities, a structure called a maximal priority bloom, the basic building block of the BE-Index, is defined as follows (wangBitruss):
Definition 3.
A maximal priority bloom $B(U_B, V_B)$ is a biclique (either $U_B$ or $V_B$ has exactly two vertices, each connected to all vertices in $V_B$ or $U_B$, respectively) that satisfies the following conditions:

(1) the highest priority vertex in $B$ belongs to the set ($U_B$ or $V_B$) that has exactly two vertices, and

(2) $B$ is maximal, i.e., there exists no biclique $B'$ such that $B \subset B'$ and $B'$ satisfies condition (1).
Maximal Priority Bloom Notations:
The vertex set ($U_B$ or $V_B$) containing the highest priority vertex is called the dominant set of $B$. Note that each vertex in the non-dominant set has exactly two incident edges in $B$, which are said to be twins of each other in bloom $B$. For example, in the graph shown in fig.2, each maximal priority bloom has its highest priority vertex in its dominant set, and the two edges incident on each non-dominant vertex form a twin pair. The twin of an edge $e$ in bloom $B$ is denoted by $tw(B, e)$. The cardinality of the non-dominant vertex set of bloom $B$ is called the bloom number of $B$. Wang et al. (wangBitruss) further prove the following properties of maximal priority blooms:
Property 1.
A bloom $B$ with bloom number $k_B$ consists of exactly $\binom{k_B}{2}$ butterflies. Each edge $e \in B$ is contained in exactly $k_B - 1$ butterflies in $B$. Further, edge $e$ shares all of these butterflies with its twin $tw(B, e)$, and one butterfly each with every other edge in $B$.
Property 2.
A butterfly in $G$ is contained in exactly one maximal priority bloom.
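Property 1 can be checked by brute-force enumeration. The sketch below (hypothetical labels) models a bloom with bloom number $k$ as a biclique on dominant set {a, b} and lists its butterflies.

```python
from itertools import combinations

def bloom_butterflies(k):
    """Model a maximal priority bloom with bloom number k as a (2, k)-biclique:
    dominant set {'a', 'b'}, non-dominant vertices 0..k-1. Returns the bloom's
    edges and its butterflies (each butterfly as a frozenset of 4 edges)."""
    edges = [(d, v) for d in ('a', 'b') for v in range(k)]
    flies = [frozenset({('a', v1), ('b', v1), ('a', v2), ('b', v2)})
             for v1, v2 in combinations(range(k), 2)]
    return edges, flies
```

For $k = 4$ this gives $\binom{4}{2} = 6$ butterflies; each edge lies in $k - 1 = 3$ of them, an edge and its twin lie in exactly the same butterflies, and any non-twin pair of edges shares exactly one butterfly.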
Note that the butterflies containing an edge $e$, and the other edges in those butterflies, can be obtained by exploring all blooms that contain $e$. For quick access from an edge to its blooms and vice versa, the BE-Index is defined as follows:
Definition 4.
The BE-Index of a graph $G$ is a bipartite graph $I$ that links all maximal priority blooms in $G$ to the respective edges within those blooms.

$W(I)$ – Vertices in $I$ uniquely represent all maximal priority blooms in $G$ and all edges in $E$, respectively. Each bloom vertex $B$ also stores the bloom number $k_B$ of the corresponding bloom.

$E(I)$ – There exists a link $(B, e)$ in $I$ if and only if the corresponding bloom $B$ contains the edge $e$. Each link $(B, e)$ is labeled with the twin $tw(B, e)$.
BE-Index Notations:
For ease of explanation, we refer to a maximal priority bloom as simply a bloom. We use the same notation ($B$ or $e$) to denote both a bloom (or edge) and its representative vertex in the BE-Index. The neighborhoods of a bloom vertex $B$ and an edge vertex $e$ in the BE-Index are denoted by $N_B$ and $N_e$, respectively. The bloom number of $B$ is denoted by $k_B$. Note that $k_B = |N_B| / 2$.
Fig.2 depicts a graph (a subgraph of the graph in fig.1) and its BE-Index. The graph consists of two maximal priority blooms, each with a two-vertex dominant set. As an example, an edge contained in both blooms shares all of its butterflies in each bloom with its twin in that bloom, and one butterfly each with all other edges of the two blooms.
Construction of BE-Index:
Index construction can be easily embedded within the counting procedure (alg.1). Each pair of endpoint vertices $(u, w)$ of the wedges explored during counting represents the dominant set of a bloom (with the higher-priority endpoint as the highest priority vertex) containing the edges $(u, v)$ and $(w, v)$ for all wedge midpoints $v$. Lastly, for a given midpoint $v$, the edges $(u, v)$ and $(w, v)$ are twins of each other. Thus, the space and computational complexities of BE-Index construction are bounded by the wedges explored during counting, which is $O(\alpha \cdot m)$.
Wing Decomposition with BE-Index:
Alg.3 depicts the procedure to peel an edge $e$ using the BE-Index. Instead of traversing wedges in $G$ to find the butterflies of $e$, the edges that share butterflies with $e$ are found by exploring the 2-hop neighborhood of $e$ in the BE-Index (alg.3, line 7). The number of butterflies shared with these edges in each bloom is obtained analytically using property 1 (alg.3, lines 4 and 8). Remarkably, peeling an edge using alg.3 requires traversal in the BE-Index proportional to at most the number of butterflies containing that edge (wangBitruss). Thus, it reduces the computational complexity of wing decomposition to $O\left(\sum_{e \in E} \bowtie_e\right)$. However, this is still proportional to the number of butterflies, which can be enormous for large graphs.
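The per-bloom update rule follows directly from property 1. The sketch below (a simplified sequential rendition with hypothetical data structures, not Wang et al.'s exact implementation) peels one edge: in each bloom containing it, the twin loses $k_B - 1$ butterflies, every other member edge loses one, and the twin pair then leaves the bloom, implicitly decrementing its bloom number.

```python
def peel_edge(e, blooms, bloom_of, twin, support):
    """blooms: bloom id -> set of member edges (bloom number = len / 2);
    bloom_of: edge -> set of bloom ids; twin: (bloom id, edge) -> twin edge;
    support: edge -> current butterfly count. Peels edge e in place."""
    for b in list(bloom_of[e]):
        members = blooms[b]
        k = len(members) // 2          # current bloom number k_B
        t = twin[(b, e)]
        support[t] -= k - 1            # twin shared all of e's butterflies in b
        for e2 in members:
            if e2 != e and e2 != t:
                support[e2] -= 1       # every other edge shared one butterfly
        members.discard(e)             # the twin pair leaves the bloom, so its
        members.discard(t)             # bloom number drops by one
        bloom_of[e].discard(b)
        bloom_of[t].discard(b)
    support[e] = 0
```

On a single bloom of number 3, peeling one edge zeroes its twin's support and leaves every other member edge with one remaining butterfly, matching the residual $2 \times 2$ biclique.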
2.4. Challenges
Bipartite graph decomposition is computationally very expensive, and parallel computing is widely used to accelerate such workloads. However, the state-of-the-art parallel framework ParButterfly (shiParbutterfly; julienne) is based on bottom-up peeling and only utilizes parallelism within each peeling iteration. This restricted parallelism is due to the following sequential dependency between iterations: support updates in an iteration guide the choice of entities to peel in subsequent iterations. Hence, even though ParButterfly is work-efficient (shiParbutterfly), its scalability is limited because:


It incurs a large number of iterations and a low parallel workload per iteration. Due to the resulting synchronization and load imbalance, intra-iteration parallelism is insufficient for substantial acceleration.
Objective 1 is, therefore, to design a parallelism-aware peeling methodology for bipartite graphs that reduces synchronization and exposes a large amount of parallel workload.
It traverses an enormous number of wedges (or bloom-edge links in the BE-Index) to retrieve the butterflies removed by peeling. This is computationally expensive and can be infeasible on large datasets, even for a parallel algorithm.
Objective 2 is, therefore, to reduce the amount of traversal in practice.
3. Parallel Bipartite Network peelinG (PBNG)
In this section, we describe a generic, parallelism-friendly two-phased peeling approach for bipartite graph decomposition (targeting objective 1, sec.2.4). We further demonstrate how this approach is adapted individually for tip and wing decomposition in our Parallel Bipartite Network peelinG (PBNG) framework.
3.1. Two-phased Peeling
The fundamental observation underlying our approach is that the entity number $\theta(x)$ of an entity $x$ depends only on the number of butterflies shared between $x$ and other entities with entity numbers no less than $\theta(x)$. Therefore, given a graph $G$ and per-entity butterfly counts in $G$ (obtained from counting), only the cumulative effect of peeling all entities with entity number strictly smaller than $k$ is relevant for computing level $k$ (tip or wing) of the decomposition hierarchy. Due to the commutativity of addition, the order of peeling these entities has no impact on level $k$.
This insight allows us to eliminate the constraint of deleting only minimum-support entities in each iteration, which bottlenecks the available parallelism. To find level $k$, all entities with entity number less than $k$ can be peeled concurrently, providing sufficient parallel workload. However, peeling all entities with smaller entity numbers for every possible $k$ would be computationally very inefficient. To avoid this inefficiency, we develop a novel two-phased approach.
3.1.1. Coarse-grained Decomposition
The first phase divides the spectrum of all possible entity numbers into $P$ smaller non-overlapping ranges $R_1, R_2, \dots, R_P$, where $P$ is a user-specified parameter and the highest range is bounded by the maximum entity number in $G$. A range $R_i$ represents a contiguous set of entity numbers such that every entity number in $R_i$ is smaller than every entity number in $R_j$ for all $j > i$. These ranges are computed using a heuristic described in sec.3.1.3. Corresponding to each range $R_i$, PBNG also computes the partition $P_i$ comprising all entities whose entity numbers lie in $R_i$. Thus, instead of finding the exact entity number of an entity, the first phase of PBNG computes bounds on it. Therefore, we refer to this phase as Coarse-grained Decomposition (PBNG CD). The absence of overlap between the ranges allows each partition to be peeled independently of the others in the second phase, for exact entity number computation.

Entity partitions are computed by iteratively peeling the entities whose support lies in the minimum remaining range (alg.4, lines 5-13). For each partition, the first peeling iteration in PBNG CD scans all entities to find the peeling set (alg.4, line 9). In subsequent iterations, the peeling set is computed jointly with the support updates. Thus, unlike bottom-up peeling, PBNG CD does not require a priority queue data structure, which makes support updates relatively cheaper.
PBNG CD can be visualized as a generalization of bottom-up peeling (alg.2). In each iteration, the latter peels only the entities with minimum support, whereas PBNG CD peels all entities with support in a broad custom range. For example, in fig.3, PBNG CD divides the edges into two partitions corresponding to two ranges, whereas bottom-up peeling would create a partition corresponding to every individual level of the decomposition hierarchy. Choosing far fewer ranges than hierarchy levels ensures a large number of entities peeled per iteration (sufficient parallel workload) and significantly fewer iterations (dramatically less synchronization) compared to bottom-up peeling.
In addition to the ranges and partitions, PBNG CD also computes a support initialization vector. For an entity, this vector records the number of butterflies that it shares only with entities in its own or higher-ranged partitions. In other words, it represents the aggregate effect of peeling all entities with entity numbers in lower ranges. During iterative peeling in PBNG CD, this number is inherently generated after the last peeling iteration of the preceding partition and copied into the vector (alg.4, lines 6-7). For example, in fig.3, the support of an edge in the higher-ranged partition after the lower-ranged partition is peeled is recorded as its initialization value.
3.1.2. Fine-grained Decomposition
The second phase computes exact entity numbers and is called Fine-grained Decomposition (PBNG FD). The key idea behind PBNG FD is that if we know all the butterflies that each entity in a partition shares only with entities in the same or higher-ranged partitions, that partition can be peeled independently of all other partitions. The support initialization vector computed in PBNG CD precisely indicates the count of such butterflies (sec.3.1.1) and is hence used to initialize support values in PBNG FD. PBNG FD exploits the resulting independence among partitions to concurrently process multiple partitions using sequential bottom-up peeling. Choosing the number of partitions to be no smaller than the number of threads ensures that PBNG FD can be efficiently parallelized across partitions. Overall, both phases in PBNG circumvent strict sequential dependencies between different levels of the decomposition hierarchy to efficiently parallelize the peeling process.
The two-phased approach can potentially double the computation required for peeling. However, we note that since partitions are peeled independently in PBNG FD, support updates are not communicated across partitions. Therefore, to improve computational efficiency, PBNG FD operates on a smaller representative subgraph for each partition. Specifically, the representative subgraph of a partition preserves a butterfly iff it satisfies both of the following conditions:


The butterfly contains multiple entities within the partition.

The butterfly only contains entities from the same or higher-ranged partitions. If the butterfly contains an entity from a lower-ranged partition, then it does not exist at the levels of the decomposition hierarchy covered by the partition's range. Moreover, the impact of removing such a butterfly on the support of entities in the partition is already accounted for in the support initialization vector (sec.3.1.1).
For example, in fig.3, the representative subgraph of the lower-ranged partition contains a butterfly if (a) the butterfly contains multiple edges of that partition, satisfying condition (1), and (b) all of its edges are from that partition or the higher-ranged one, satisfying condition (2). The representative subgraph of the higher-ranged partition, however, does not contain such a butterfly if two of its edges are in the lower-ranged partition, as it then violates condition (2).
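To illustrate how the two phases fit together, the following self-contained sketch (hypothetical names) applies them to tip decomposition; the representative-subgraph optimization is omitted for brevity, and the ranges are supplied directly instead of being computed by the heuristic of sec.3.1.3. The coarse phase batch-peels by support range and snapshots the support initialization values; the fine phase then peels each partition independently via sequential bottom-up peeling.

```python
def shared_bfc(adj, u, x):
    """Butterflies shared by peel-side vertices u and x: C(c, 2) for c common neighbors."""
    c = len(adj[u] & adj[x])
    return c * (c - 1) // 2

def two_phase_tip(adj, ranges):
    """adj: peel-side adjacency; ranges: disjoint increasing (lo, hi) tuples
    covering all tip numbers. Returns exact tip numbers."""
    active = set(adj)
    support = {u: sum(shared_bfc(adj, u, x) for x in active if x != u)
               for u in active}
    partitions, init = [], {}
    for lo, hi in ranges:                  # ---- coarse-grained phase ----
        snapshot = dict(support)           # support after all lower ranges peeled
        part = set()
        while True:                        # lo is implied: lower ranges are gone
            peel = {u for u in active if support[u] <= hi}
            if not peel:
                break
            active -= peel                 # batch-peel the whole range
            part |= peel
            for u in peel:
                for x in active:
                    support[x] -= shared_bfc(adj, u, x)
        partitions.append(part)
        for u in part:
            init[u] = snapshot[u]          # support initialization value
    tip = {}
    for part in partitions:                # ---- fine-grained phase ----
        live = set(part)
        sup = {u: init[u] for u in live}
        level = 0
        while live:                        # sequential bottom-up peeling
            u = min(live, key=lambda x: sup[x])
            level = max(level, sup[u])
            tip[u] = level
            live.remove(u)
            for x in live:
                sup[x] -= shared_bfc(adj, u, x)
    return tip
```

Note that butterflies shared with higher-ranged partitions stay in the initialized supports and are never subtracted during a partition's fine-grained peeling, exactly as the two preservation conditions require; running with a single all-covering range degenerates to plain bottom-up peeling.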
3.1.3. Range Partitioning
In PBNG CD, the first step for computing a partition is to find its range (alg.4, line 8). Since the lower bound of a range is directly obtained from the upper bound of the previous range, this amounts to computing the upper bound. For load balancing, the upper bound should be chosen such that all partitions pose uniform workload in PBNG FD. However, the representative subgraphs and the corresponding workloads are not known prior to the actual partitioning. Furthermore, exact entity numbers are not known either, so we cannot determine beforehand exactly which entities will lie in a partition for different values of the upper bound. Considering these challenges, PBNG uses two proxies for range determination:


Proxy 1: the current support of an entity is used as a proxy for its entity number.

Proxy 2: the complexity of peeling individual entities is used as a proxy to estimate the peeling workload in the representative subgraphs.
Now, the problem is to compute the upper bound such that the estimated workload of the partition, as per the proxies, is close to the average workload per partition. To this purpose, PBNG CD creates a bin for each support value and computes the aggregate workload of the entities in that bin. For a given upper bound, the estimated workload of peeling the partition is the sum of the workloads of all bins with supports up to that bound (all entities with entity numbers below the current range have already been peeled before PBNG CD computes it). Thus, the workload of the partition as a function of the upper bound can be computed by a prefix scan of individual bin workloads (alg.4, lines 17-18). Using this function, the upper bound is chosen such that the estimated workload of the partition is close to but no less than the average target (alg.4, line 19).
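The prefix-scan selection of the upper bound can be sketched as follows (hypothetical names; the per-bin workloads would come from proxy 2):

```python
def choose_upper_bound(bin_workload, target):
    """bin_workload: support value -> aggregate estimated peeling workload of
    entities currently holding that support. Returns the smallest support value
    whose cumulative (prefix-scanned) workload first reaches the target."""
    cumulative = 0
    for s in sorted(bin_workload):
        cumulative += bin_workload[s]
        if cumulative >= target:       # close to, but not less than, target
            return s
    return max(bin_workload)           # last range absorbs the remainder
```

For instance, with bin workloads {1: 5, 2: 5, 3: 10} and a target of 8, the bound is 2 (cumulative workload 10, the first value at or above the target).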
Adaptive Range Computation:
Since range determination uses the current support as a proxy for entity numbers, the target workload for each partition is covered by the entities added to it in its very first peeling iteration in PBNG CD. After the support updates in this iteration, more entities may be added to the partition, and its final workload estimate may significantly exceed the target. This can result in significant load imbalance among the partitions and, potentially, PBNG CD could finish with much fewer than the intended number of partitions. To avoid this scenario, we implement the following two-way adaptive range determination:


Instead of statically computing an average target, we dynamically update the target for every partition based on the remaining workload and the number of partitions left to create. If a partition gets too much workload, the target for subsequent partitions is automatically reduced, thereby preventing a situation where all entities get peeled in fewer than the intended number of partitions.

A partition likely covers many more entities than the initial estimate based on proxy 1. The second adaptation therefore scales down the dynamic target for the next partition in an attempt to bring its actual workload close to the intended value. It assumes predictive local behavior, i.e., the next partition will overshoot its target similarly to the current one. Accordingly, the scaling factor is computed as the ratio of the initial workload estimate of the current partition during range computation to the final estimate based on all entities in it.
3.1.4. Partition scheduling in PBNG FD
While adaptive range determination (sec.3.1.3) tries to create partitions with uniform estimated workload, the actual workload per partition in PBNG FD depends on the representative subgraphs and can still vary significantly. Therefore, to improve load balance across threads, we use scheduling strategies inspired by the Longest Processing Time (LPT) rule, a well-known approximation algorithm for makespan minimization (graham1969bounds). We use the workload of each partition as an indicator of its execution time in the following runtime scheduling mechanism:

Dynamic task allocation: All partition IDs are inserted into a task queue. When a thread becomes idle, it pops a unique ID from the queue and processes the corresponding partition. Thus, all threads stay busy until every partition is scheduled.

Workload-aware scheduling: Partition IDs in the task queue are sorted in decreasing order of their workload. Thus, the partitions with the highest workload get scheduled first, and the threads processing them naturally receive fewer tasks in the future. Fig.4 shows how workload-aware scheduling can improve the efficiency of dynamic allocation.
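Offline, the combination of the two strategies behaves like Graham's LPT rule: partitions are taken in decreasing workload order, and each goes to the thread that becomes idle first. A small simulation sketch (hypothetical names):

```python
import heapq

def lpt_schedule(workloads, num_threads):
    """workloads: partition id -> estimated workload. Simulates sorted dynamic
    task allocation: the next-heaviest partition goes to the least-loaded
    (i.e., first idle) thread. Returns the assignment and the makespan."""
    loads = [(0, t) for t in range(num_threads)]   # (current load, thread id)
    heapq.heapify(loads)
    assignment = {}
    for part, w in sorted(workloads.items(), key=lambda kv: -kv[1]):
        load, t = heapq.heappop(loads)             # least-loaded thread
        assignment[part] = t
        heapq.heappush(loads, (load + w, t))
    return assignment, max(load for load, _ in loads)
```

With workloads {7, 5, 4, 4} on two threads, LPT yields a makespan of 11 (7+4 vs. 5+4), which is optimal here; scheduling the heaviest partitions first is what limits the imbalance of the final assignments.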
3.2. Tip Decomposition
In this section, we give a brief overview of PBNG's two-phased peeling (sec.3.1) applied to tip decomposition. A detailed description is provided in our previous work (lakhotia2020receipt).
For tip decomposition, PBNG CD divides the vertex set $U$ into partitions. Peeling a vertex requires the traversal of all wedges with that vertex as one of the endpoints. Therefore, range determination in PBNG CD uses the wedge counts of vertices as a proxy to estimate the workload of peeling each partition. Moreover, since only one of the vertex sets of $G$ is peeled, at most two vertices of a butterfly can be part of the peeling set. Hence, support updates to a vertex from different peeled vertices correspond to disjoint butterflies, and the net update can simply be computed by atomically aggregating the updates from the individual peeled vertices.
PBNG FD also utilizes the fact that any butterfly contains at most two vertices from the vertex set being peeled (sec.2.2). If these two vertices lie in different partitions, the two conditions for preserving the butterfly cannot both be satisfied in either representative subgraph; they hold only when the two vertices lie in the same partition (sec.3.1.2). Based on this insight, we construct the representative subgraph of each partition as the subgraph induced on the partition's vertices together with the other vertex set $V$. Clearly, this subgraph preserves every butterfly whose two peel-side vertices belong to the partition. For task scheduling in PBNG FD (sec.3.1.4), we use the total number of wedges with both endpoints in the partition as an indicator of the workload of peeling it.
Given the bipartite nature of $G$, any edge exists in exactly one of the induced subgraphs, and thus the collective space requirement of all induced subgraphs is bounded by the size of $G$. Moreover, by the design of the representative (induced) subgraphs, PBNG FD for tip decomposition traverses only those wedges for which both endpoints are in the same partition. This dramatically reduces the amount of work done in PBNG FD compared to bottom-up peeling and PBNG CD. Note that we do not use the BE-Index for tip decomposition for the following reasons:


The number of butterflies between two vertices is quadratic in the number of wedges between them, and wedge traversal (not butterfly count) determines the work done in tip decomposition. Since the BE-Index facilitates per-edge butterfly retrieval, peeling a vertex using the BE-Index requires processing each of its edges individually and can result in increased computation (sec.2.3).

The BE-Index has a high space complexity compared to the space needed to store the graph and all induced subgraphs. This can make BE-Index based peeling infeasible even on machines with a large amount of main memory. For example, the BE-Index of the user-group dataset Orkut contains billions of blooms and bloom-edge links, and consumes terabytes of memory.
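The partition-induced subgraphs used instead of the BE-Index (sec.3.2) can be sketched as below; `partition_of` is a hypothetical map from each peeling-set vertex to its partition ID, and the adjacency-dictionary representation is illustrative.

```python
def partition_induced_subgraphs(adj_U, partition_of):
    """Split the peeling-side adjacency lists by partition. Since each
    edge has exactly one endpoint on the peeling side, every edge lands
    in exactly one subgraph, keeping total space linear in |E|."""
    subs = {}
    for u, nbrs in adj_U.items():
        subs.setdefault(partition_of[u], {})[u] = list(nbrs)
    return subs
```

Because the subgraphs partition the edge set, their collective size matches the original graph, which is the space bound stated above.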
3.3. Wing Decomposition
3.3.1. Challenges
Each butterfly consists of four edges from the edge set, which is the entity set to decompose in wing decomposition. This is unlike tip decomposition, where each butterfly has only two vertices from the decomposition set, and it results in the following issues:


When a butterfly is removed due to peeling, the support of each unpeeled edge in that butterfly should be reduced by exactly one for this removal. However, when multiple (but not all) edges of a butterfly are concurrently peeled in the same iteration of PBNG CD, multiple updates with an aggregate value greater than one may be generated for its unpeeled edges.

It is possible that a butterfly contains some but not all of its edges in a partition. Such a butterfly may need to be preserved in the representative subgraph of that partition, but it will not be present in the partition's edge-induced subgraph.
Due to these issues, a trivial extension of the tip decomposition algorithm (sec.3.2) is not suitable for wing decomposition. In this section, we explore novel BE-Index based strategies to enable two-phased peeling for wing decomposition.
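The first issue can be seen on a single butterfly. In the following sketch (hypothetical bookkeeping, not alg.4), two of the butterfly's four edges are peeled in the same iteration and each naively decrements the support of every other edge, so the lone removed butterfly is effectively counted twice at the unpeeled edges.

```python
def naive_updates(peeled):
    """One butterfly on edges e0..e3, each with support 1 (it lies in
    exactly one butterfly). Every peeled edge independently decrements
    all other edges, with no coordination between peeled edges."""
    support = {"e0": 1, "e1": 1, "e2": 1, "e3": 1}
    for e in peeled:
        for f in support:
            if f != e:
                support[f] -= 1
    return support
```

Peeling e0 and e1 concurrently drives the supports of e2 and e3 to -1, even though only one butterfly was removed and the correct new support is 0.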
3.3.2. PBNG CD
This phase divides the edge set into partitions, as shown in alg.4. We utilize the BE-Index not only for computationally efficient support update computation in PBNG CD, but also to avoid conflicts in its parallel peeling iterations. Since a butterfly is contained in exactly one maximal priority bloom (sec.2.3, property 2), correctness of support updates within each bloom implies overall correctness of support updates in an iteration. To avoid conflicts, we therefore employ the following resolution mechanism for each bloom:


If, in an iteration, an edge is peeled but its twin is not, then the twin's support is decreased in a single update when that edge is peeled. Other peeled edges in the bloom do not propagate any updates to the twin via this bloom (alg.4, lines 26-30). This is because the peeled edge is contained in a fixed number of butterflies within the bloom, all of which are removed when it is peeled. To ensure that the bloom number correctly represents the butterflies shared between twin edges, support updates from all peeled edges are computed prior to updating it.
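The counting behind this mechanism can be illustrated under a simplifying assumption (an illustrative model, not the BE-Index bookkeeping of sec.2.3): if a bloom holds k twin edge pairs and any two pairs form one butterfly, the bloom contains C(k, 2) butterflies and each pair participates in k - 1 of them.

```python
from math import comb

def bloom_butterflies(num_twin_pairs):
    """Butterflies inside a bloom under the model that every pair of
    twin edge pairs forms exactly one butterfly: C(k, 2) in total."""
    return comb(num_twin_pairs, 2)

def butterflies_lost_by_twin(num_twin_pairs):
    """Butterflies removed from a twin's count when its partner edge is
    peeled: one per remaining twin pair in the bloom."""
    return num_twin_pairs - 1
```

This is why a single aggregate update per bloom suffices for the unpeeled twin, instead of one update per removed butterfly.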
Peeling an edge requires traversal in the BE-Index. Therefore, range determination in PBNG CD uses edge support as a proxy to estimate the workload of peeling each partition.
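A simplified sketch of support-based range determination is shown below (an illustrative balancing heuristic, not alg.4 itself): edges are ordered by support and split into contiguous groups whose summed support, used as the workload proxy, is roughly equal.

```python
def partition_by_support(supports, num_parts):
    """Group edge IDs (in increasing order of support) into contiguous
    ranges with approximately balanced total support per range."""
    order = sorted(range(len(supports)), key=lambda e: supports[e])
    target = sum(supports) / num_parts
    parts, cur, acc = [], [], 0
    for e in order:
        cur.append(e)
        acc += supports[e]
        if acc >= target and len(parts) < num_parts - 1:
            parts.append(cur)
            cur, acc = [], 0
    parts.append(cur)
    return parts
```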
3.3.3. PBNG FD
The first step for peeling a partition in PBNG FD is to construct the BE-Index of its representative subgraph. One way to do so is to compute the representative subgraph and then apply the index construction algorithm (sec.2.3) to it. However, this approach has the following drawbacks:


Computing a representative subgraph requires mining all edges in the graph that share butterflies with the partition's edges, which can be computationally expensive. Additionally, the overhead of index construction, even for a few hundred partitions, can be significant.

An edge can potentially exist in the representative subgraphs of many partitions. Therefore, creating and storing all subgraphs can require a large amount of memory.
To avoid these drawbacks, we directly compute each partition's BE-Index by partitioning the BE-Index of the original graph (alg.5, lines 12-25). Our partitioning mechanism ensures that all butterflies satisfying the two preservation conditions (sec.3.1.2) for a partition are represented in its BE-Index.
Firstly, the link between an edge and a bloom is preserved in a partition's BE-Index only if the edge's twin also meets the preservation criterion (alg.5, lines 19-20). Since all butterflies in a bloom that contain an edge also contain its twin (sec.2.3, property 1), none of them need to be preserved otherwise. Moreover, when both twin edges are preserved, their contribution to the bloom number is counted only once (alg.5, line 22).
Secondly, for a space-efficient representation, a partition's BE-Index does not store links for edges that belong to other partitions. While such an edge will not participate in the peeling of the partition, it may be part of a butterfly that satisfies both preservation conditions for the partition (sec.3.1.2). For example, fig.2 shows the representative subgraph and BE-Index of a partition generated by PBNG CD in fig.3. For space efficiency, the links of edges outside the partition are not stored, yet two butterflies in the example satisfy both preservation conditions and may be needed when peeling the partition's edges. To account for such butterflies, we adjust the bloom number so that it correctly represents the number of butterflies containing edges only from partitions that have not yet been fully peeled (alg.5, lines 23-24). For example, in fig.2b, we initialize the bloom number to reflect these butterflies even though the corresponding links are not stored. This is unlike the BE-Index of the original graph, where the bloom number accounts for all butterflies in the bloom (sec.2.3).
After the BE-Indices of all partitions are computed, PBNG FD dynamically schedules the partitions on threads for peeling.