 # An Efficient Algorithm to Test Potentially Bipartiteness of Graphical Degree Sequences

As a partial answer to a question of Rao, a deterministic and customizable efficient algorithm is presented to test whether an arbitrary graphical degree sequence has a bipartite realization. The algorithm can be configured to run in polynomial time, at the expense of possibly producing an erroneous output on some "yes" instances, though with a very low observed error rate.


## 1 Introduction

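Given an arbitrary graphical degree sequence $\mathbf{d}$, let $\mathcal{R}(\mathbf{d})$ denote the set of all of its non-isomorphic realizations. As usual, let $\chi(G)$ and $\omega(G)$ denote the chromatic number and clique number of a finite simple undirected graph $G$ respectively. It is known from Punnim Punnim2002A that for any given $\mathbf{d}$ the set $\{\chi(G) : G \in \mathcal{R}(\mathbf{d})\}$ is exactly a set of integers in some interval. Define $\chi(\mathbf{d})$ to be $\min\{\chi(G) : G \in \mathcal{R}(\mathbf{d})\}$ and $X(\mathbf{d})$ to be $\max\{\chi(G) : G \in \mathcal{R}(\mathbf{d})\}$. These two quantities can be interesting for the structural properties of all the graphs in $\mathcal{R}(\mathbf{d})$.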

Good lower and upper bounds on are known from Dvořák and Mohar Dvorak2013 in terms of , which can be easily computed for any given using the algorithm from Yin Yin2012 . For example, , and .

It appears computationally intractable to compute the minimum chromatic number over all realizations of an arbitrary zero-free graphical degree sequence $\mathbf{d}$. In this paper we are concerned with the related, somewhat easier, decision problem of whether this minimum equals 2. Clearly, this is equivalent to deciding whether $\mathbf{d}$ has a bipartite realization, which is actually the first listed unsolved problem in Rao Rao1981 to characterize potentially bipartite graphical degree sequences and which remains unsolved to our knowledge. Note that the input is a single sequence of vertex degrees. A related problem is to decide, given two sequences of positive integers $\mathbf{a} = (a_1, \ldots, a_p)$ and $\mathbf{b} = (b_1, \ldots, b_q)$, where $a_1 \ge \cdots \ge a_p$ and $b_1 \ge \cdots \ge b_q$ and $\sum_{i=1}^{p} a_i = \sum_{j=1}^{q} b_j$, whether there is a bipartite graph whose two partite sets have $\mathbf{a}$ and $\mathbf{b}$ as their respective degree sequences. This problem can be easily solved by applying the Gale-Ryser theorem Gale1957 ; Ryser1957 , which states that the answer is “yes” if and only if the conjugate of $\mathbf{a}$ dominates $\mathbf{b}$ (or, equivalently, the conjugate of $\mathbf{b}$ dominates $\mathbf{a}$). Here we use the common definition of domination between two partitions of the same integer: a partition $\lambda$ dominates a partition $\mu$ if $\sum_{i=1}^{j} \lambda_i \ge \sum_{i=1}^{j} \mu_i$ for each $j \ge 1$. By convention, $\lambda_i = 0$ for $i > \ell(\lambda)$, where $\ell(\lambda)$ denotes the number of parts in the partition $\lambda$. We also use $|\lambda|$ to denote the weight of the partition $\lambda$, that is, the sum of all the parts of $\lambda$.
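The Gale-Ryser test described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the experiments; the function names `conjugate`, `dominates` and `gale_ryser` are chosen here for illustration only.

```python
def conjugate(partition):
    """Conjugate of a partition given as a list of positive integers.

    The k-th part of the conjugate (k = 1, 2, ...) counts the parts >= k.
    """
    if not partition:
        return []
    return [sum(1 for part in partition if part > k)
            for k in range(max(partition))]


def dominates(lam, mu):
    """True if every prefix sum of lam is at least that of mu.

    Both inputs are sorted in non-increasing order and padded with zeros,
    following the convention that missing parts are 0.
    """
    length = max(len(lam), len(mu))
    lam = sorted(lam, reverse=True) + [0] * (length - len(lam))
    mu = sorted(mu, reverse=True) + [0] * (length - len(mu))
    prefix_lam = prefix_mu = 0
    for x, y in zip(lam, mu):
        prefix_lam += x
        prefix_mu += y
        if prefix_lam < prefix_mu:
            return False
    return True


def gale_ryser(a, b):
    """True if some bipartite graph has side degree sequences a and b."""
    return sum(a) == sum(b) and dominates(conjugate(a), b)
```

For instance, `gale_ryser([3], [1, 1, 1])` accepts the star on four vertices, while `gale_ryser([3, 1], [2, 2])` rejects because a vertex of degree 3 would need three neighbors on a side that has only two vertices.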

The rest of the paper is organized as follows. Section 2 describes the algorithm to decide whether a given has a bipartite realization. Section 3 gives a time complexity analysis of the algorithm. Section 4 presents some experimental results. Section 5 discusses alternative designs of the algorithm and comments on the complexity of the decision problem. Section 6 concludes with further research directions.

## 2 Description of the Algorithm

Clearly, to decide whether any zero-free graphical degree sequence $\mathbf{d}$ has a bipartite realization, we first need to determine whether it has a bipartition into a left side and a right side of equal weights, each half of the weight of $\mathbf{d}$ (for convenience, we call such bipartitions of $\mathbf{d}$ candidate bipartitions). Finding a candidate bipartition of $\mathbf{d}$ may look challenging at first, because the task is exactly an instance of the well-known subset sum problem, which is NP-complete GareyJohnson1979 . Fortunately, since every term in $\mathbf{d}$ of length $n$ is less than $n$, this restricted subset sum problem can be solved through dynamic programming in polynomial time GareyJohnson1979 ; Koiliaris2019 . In fact, many inputs admit a large number of candidate bipartitions. The decision problem therefore boils down to checking whether $\mathbf{d}$ has at least one candidate bipartition and, if this is the case, whether any of those candidate bipartitions satisfies the Gale-Ryser condition.
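The existence of a candidate bipartition can be checked with a standard subset sum dynamic program. The sketch below is one possible formulation, assuming the input is a list of positive integers; it stores the set of reachable sums in the bits of a Python integer, and the name `has_candidate_bipartition` is hypothetical.

```python
def has_candidate_bipartition(degrees):
    """True if the degrees can be split into two parts of equal sum."""
    total = sum(degrees)
    if total % 2:
        return False  # cannot happen for a graphical sequence, kept for safety
    target = total // 2
    mask = (1 << (target + 1)) - 1   # only sums 0..target matter
    reachable = 1                    # bit s is set iff some sub-multiset sums to s
    for d in degrees:
        reachable |= (reachable << d) & mask
    return bool((reachable >> target) & 1)
```

Since every term is less than the length $n$ of the sequence, the target is below $n^2/2$, so the table has polynomially many entries. For example, `has_candidate_bipartition([2, 2, 2])` is false (the triangle), while `has_candidate_bipartition([3, 3, 3, 3])` is true even though that sequence has no bipartite realization, which is why the Gale-Ryser test is still needed afterwards.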

A naive algorithm can simply enumerate all candidate bipartitions of and check each of them against the Gale-Ryser condition. Such an algorithm necessarily runs in exponential time in the worst case. Our algorithm is more sophisticated than that. It has two phases. The first phase utilizes up to seven rules that can all be easily checked. As a matter of fact, in Section 4 we will show that most of the inputs can be resolved by this phase alone. The second is the enumeration phase, in which we do “brute-force” search in a clever way.
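For reference, the naive exponential-time algorithm can be sketched as follows. It is only a baseline for checking small inputs, not the two-phase algorithm of this paper; the names are hypothetical, and the Gale-Ryser test is inlined in compact form so that the block is self-contained.

```python
from itertools import combinations


def gale_ryser_ok(left, right):
    """Gale-Ryser test: the conjugate of `left` must dominate `right`."""
    if sum(left) != sum(right):
        return False
    conj = [sum(1 for x in left if x > k) for k in range(max(left))]
    right = sorted(right, reverse=True)
    length = max(len(conj), len(right))
    conj += [0] * (length - len(conj))
    right += [0] * (length - len(right))
    surplus = 0
    for c, r in zip(conj, right):
        surplus += c - r
        if surplus < 0:
            return False
    return True


def is_potentially_bipartite_naive(degrees):
    """Try every candidate bipartition of a zero-free degree sequence."""
    n, total = len(degrees), sum(degrees)
    if total % 2:
        return False
    for size in range(1, n):
        for chosen in combinations(range(n), size):
            left = [degrees[i] for i in chosen]
            if 2 * sum(left) != total:
                continue  # not a candidate bipartition
            picked = set(chosen)
            right = [degrees[i] for i in range(n) if i not in picked]
            if gale_ryser_ok(left, right):
                return True
    return False
```

On the degree sequence of the 4-cycle, `is_potentially_bipartite_naive([2, 2, 2, 2])` returns true, while the degree sequence `[3, 3, 3, 3]` of the complete graph on four vertices is rejected: its only candidate bipartitions fail the Gale-Ryser test.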

In describing and justifying the seven rules in the first phase, we seek a candidate bipartition of into the left side and the right side in such a way that at least half of the largest terms in appear in , without loss of generality. For example, for any input of length 50 with the largest term 34 whose multiplicity is 5 (i.e. there are exactly 5 copies of 34 in ), we will seek a candidate bipartition such that the left side contains at least 3 copies of 34.

###### Rule 1.

If does not have a candidate bipartition, then it is not potentially bipartite.

###### Proof.

This rule is obvious. As mentioned above, this rule can be easily implemented through dynamic programming for the subset sum problem. ∎

###### Rule 2.

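If the degree sum of $\mathbf{d}$ exceeds $n^2/2$, where $n$ is the length of $\mathbf{d}$, then $\mathbf{d}$ is not potentially bipartite.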

###### Proof.

Based on Mantel’s theorem Mantel1907 , any simple undirected bipartite graph on $n$ vertices has at most $n^2/4$ edges. So the degree sum cannot exceed $n^2/2$ for any $\mathbf{d}$ of length $n$ that is potentially bipartite. ∎

###### Rule 3.

If $d_{n-d_1+1} > n - d_1$, where $\mathbf{d} = (d_1 \ge d_2 \ge \cdots \ge d_n)$, then $\mathbf{d}$ is not potentially bipartite.

###### Proof.

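Suppose $\mathbf{d} = (d_1 \ge d_2 \ge \cdots \ge d_n)$ is potentially bipartite. The left partite set contains a vertex of degree $d_1$, so the right partite set contains at least $d_1$ vertices (its $d_1$ neighbors), each of which has a degree at most $n - d_1$ since the left partite set has at most $n - d_1$ vertices. Consequently, $\mathbf{d}$ must contain at least $d_1$ degrees that are at most $n - d_1$. Therefore, $d_{n-d_1+1}$ must be at most $n - d_1$ for $\mathbf{d}$ to be potentially bipartite. ∎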

###### Rule 4.

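If $\sum_{i=1}^{n-d_1} d_i < \frac{1}{2}\sum_{i=1}^{n} d_i$, where $\mathbf{d} = (d_1 \ge d_2 \ge \cdots \ge d_n)$, then $\mathbf{d}$ is not potentially bipartite.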

###### Proof.

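As mentioned in the proof of Rule 3, the left partite set has at most $n - d_1$ vertices. Clearly, the degree sum of the left side cannot exceed $\sum_{i=1}^{n-d_1} d_i$, the sum of the $n - d_1$ largest degrees. Therefore, this sum must be at least half of the degree sum of $\mathbf{d}$ for $\mathbf{d}$ to be potentially bipartite. ∎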

###### Rule 5.

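If the sum of the degrees in $\mathbf{d}$ that are larger than $n - d_1$ exceeds half of the degree sum of $\mathbf{d}$, where $d_1$ is the largest term and $n$ is the length of $\mathbf{d}$, then $\mathbf{d}$ is not potentially bipartite.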

###### Proof.

As shown in the proof of Rule 3, each of the right side degrees is at most $n - d_1$. Therefore, every degree larger than $n - d_1$ must be in the left side, and the sum of such degrees should not exceed half of the degree sum for any $\mathbf{d}$ that is potentially bipartite. ∎
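Rules 2 to 5 only need the sorted sequence and a few partial sums. The following sketch applies them in order; the thresholds are read off from the proofs above and should be treated as assumptions about the exact rule statements, and the function name `first_failed_rule` is hypothetical. The input is assumed to be a zero-free graphical degree sequence.

```python
def first_failed_rule(degrees):
    """Return 2, 3, 4 or 5 for the first rule that rejects, else None."""
    d = sorted(degrees, reverse=True)   # d[0] is the largest degree d_1
    n, total = len(d), sum(d)
    cap = n - d[0]   # max size of the left side = max degree on the right side

    # Rule 2 (Mantel): the degree sum is at most n^2 / 2.
    if 2 * total > n * n:
        return 2
    # Rule 3: the d_1 smallest degrees must all be at most n - d_1.
    # d[n - d[0]] is d_{n - d_1 + 1} in 1-based indexing.
    if d[n - d[0]] > cap:
        return 3
    # Rule 4: the n - d_1 largest degrees must reach half of the degree sum.
    if 2 * sum(d[:cap]) < total:
        return 4
    # Rule 5: degrees above n - d_1 are forced into the left side.
    if 2 * sum(x for x in d if x > cap) > total:
        return 5
    return None
```

For example, `first_failed_rule([4, 3, 3, 2, 2, 2])` returns 3 (three degrees exceed $n - d_1 = 2$, but the left side can hold only two vertices), and `first_failed_rule([4, 4, 2, 2, 1, 1])` returns 5 (the two forced degrees sum to 8, more than half of 14).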

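For the following rule, we will need the concept of residue of a finite simple undirected graph $G$ or a graphical degree sequence $\mathbf{d}$ introduced in Favaron et al. Favaron1991 and we use $R(G)$ and $R(\mathbf{d})$ as notations. We also use $\bar{\mathbf{d}}$ to denote the complementary graphical degree sequence of $\mathbf{d} = (d_1 \ge d_2 \ge \cdots \ge d_n)$: $\bar{\mathbf{d}} = (n-1-d_n \ge n-1-d_{n-1} \ge \cdots \ge n-1-d_1)$, which is the degree sequence of the complementary graph of any realization of $\mathbf{d}$.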

###### Rule 6.

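If the residue $R(\bar{\mathbf{d}})$ of the complementary degree sequence $\bar{\mathbf{d}}$ is at least 3, then $\mathbf{d}$ is not potentially bipartite.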

###### Proof.

As proved in Favaron1991 , the residue of a graphical degree sequence $\mathbf{d}$ is a lower bound on the independence number of any realization of $\mathbf{d}$. Then clearly the residue $R(\bar{\mathbf{d}})$ of the complementary sequence is a lower bound on the clique number of any realization of $\mathbf{d}$. The result follows because any graph with a clique of size at least 3 is not bipartite. ∎
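The residue is the number of zeros left after repeatedly applying the Havel-Hakimi reduction (delete the largest term $k$ and subtract 1 from each of the next $k$ largest terms). A minimal sketch of Rule 6 under that definition follows; the input is assumed to be graphical, and the function names are chosen here for illustration.

```python
def residue(degrees):
    """Number of zeros remaining after exhaustive Havel-Hakimi reduction."""
    d = sorted(degrees, reverse=True)
    while d and d[0] > 0:
        k = d.pop(0)            # remove the largest remaining degree
        for i in range(k):      # lower the next k largest degrees by one
            d[i] -= 1
        d.sort(reverse=True)
    return len(d)


def complement_sequence(degrees):
    """Degree sequence of the complement of any realization."""
    n = len(degrees)
    return sorted((n - 1 - x for x in degrees), reverse=True)


def rule6_rejects(degrees):
    """True if the residue of the complementary sequence is at least 3."""
    return residue(complement_sequence(degrees)) >= 3
```

For the triangle, `complement_sequence([2, 2, 2])` is `[0, 0, 0]` with residue 3, so the rule rejects; for the 4-cycle the complementary sequence `[1, 1, 1, 1]` has residue 2 and the rule does not apply.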

The following is a similar rule that uses the concept of Murphy’s bound introduced in Murphy MURPHY1991 , denoted or here, which is also a lower bound on the independence number of any realization of .

###### Rule 7.

If Murphy’s bound of the complementary degree sequence $\bar{\mathbf{d}}$ is at least 3, then $\mathbf{d}$ is not potentially bipartite.

If the input passes the tests of all of the above seven rules and cannot be resolved as a “no” instance, then our algorithm will enter the enumeration phase. From Rule 5 we know that by now we must have . In the special case that equality holds, which means the left side must contain exactly those degrees that are larger than should be potentially bipartite, our algorithm can immediately stop based on the result of the Gale-Ryser test on this candidate bipartition of . Otherwise, our algorithm continues with , which is the sum of the additional degrees that need to be in the left side besides those that are larger than . For convenience, we use to denote the subsequence of consisting of those degrees that are larger than . Note that is an empty sequence when .

The second phase will then enumerate candidate bipartitions of into by specifying which degrees will be in the left side , which also automatically specifies . As we already know, we need to choose a subsequence of (i.e. from those degrees in that are at most ) with sum and concatenate with this subsequence of degrees to form based on the above discussion. Several restrictions regarding can be put on the left side for the candidate bipartitions to possibly satisfy the Gale-Ryser test so that our algorithm will enumerate as few candidate bipartitions as possible.

###### Restriction 1.

The number of degrees in the left side cannot exceed . This is because the right side contains at least degrees.

###### Restriction 2.

Let be the maximum number of degrees in with sum at most . Then the number of degrees in the left side cannot exceed . This is because the degrees in the left side must have sum .

###### Restriction 3.

Let be the minimum largest degree in any subsequence of with sum at least . Then the number of degrees in the left side must be at least . This is because the largest degree in the right side must be at least and the conjugate of should dominate .

###### Restriction 4.

Let be the minimum number of degrees in with sum at least . Then the number of degrees in the left side must be at least . The reason is similar to that for Restriction 2.

It is not hard to see that , and can all be easily calculated with greedy algorithms. The above discussion shows we can enumerate all subsequences of that satisfy the following three requirements:

1. it includes all degrees in (i.e. those degrees in that are greater than ).

2. it has sum .

3. its number of degrees should satisfy .

In order to find a successful (i.e. satisfying the Gale-Ryser condition) candidate bipartition of , our intuition is to include a suitable number of large degrees from and as many small degrees of as possible into without violating requirement 3 mentioned above. In this way will not include many of the largest degrees in while will still include enough number of degrees, which makes it more likely for the conjugate of to dominate .

Following this intuition we calculate a maximum index such that cannot include all in order for its conjugate to dominate . This index can be easily calculated as follows. Starting from , if for some , when we include all in and include from as many smallest degrees as possible into while still maintaining the correct sum , and when the number of degrees in starts to fall below , then can be chosen to be .

After has been calculated, we will try to find out if we can include a subsequence of into together with some degrees in such that the conjugate of dominates . Without loss of generality, this subsequence can be chosen to be the largest terms of . Or, equivalently, we can remove the smallest terms from one at a time to get these subsequences. For each such subsequence , where since necessarily includes all degrees in according to the above discussion, we perform the following two enumerative steps to fully construct :

1. starting from the largest possible, choose some degree from and include some copies of into . We also stipulate that no degree larger than from will be included into . Here is defined as follows. If includes all copies of from , then includes together with all copies of the degree from which is immediately smaller than . If does not include all copies of from , then includes together with all the remaining copies of from . The motivation for such a definition is that we don’t want to equal a degree we have just excluded from a previous consideration of when is being reduced starting from .

2. include some small terms that are all less than from into , where is the subsequence of consisting of all copies of . We can generate a number of possible combinations of small terms with each combination summing to a suitable value based on the choice of and the choice in the enumerative step (1) and having a suitable number of terms so that satisfies the inequality in the above requirement 3. An appropriate procedure can be designed for this purpose such that combinations with more smaller terms are generated first and each combination can be generated in time.

Note that both of these steps are enumerative steps. Step (1) must be exhaustive by trying each possible distinct from and each of the possible number of copies up to its multiplicity in . Step (2) can be non-exhaustive, which means we can impose a limit on the number of possible combinations of small terms to be included into . This parameter is the place where our algorithm is customizable and in reality we can choose to be a constant or a low degree polynomial of . This non-exhaustive enumeration step does open the possibility of our algorithm making an error on some “yes” input instances if the specified limit will cause our algorithm to skip some of the possible combinations. However, this step will not introduce any error on “no” input instances. We also note that some of the choices in these two steps can be pruned during the enumerative process to speed up the enumeration phase when they will cause to fail to satisfy the inequality in the above requirement 3. In fact, the lower bound on can be improved during the process as is being reduced so that the minimum largest degree in increases.

The reader may have noticed that these enumerative steps are more sophisticated and complicated than the simple naive scheme of enumerating all possible subsequences of with sum . We will discuss several alternative enumeration schemes later in Section 5. The presented enumeration scheme here is the fastest we found through experiments.

During the enumeration phase, the algorithm will stop and output “yes” if a successful candidate bipartition is found. Otherwise, it will stop enumeration and output “no” when the subsequence becomes shorter than , or, in the case that is empty, when includes less than half of the largest degrees from .

We note that the enumeration phase can be easily parallelized with respect to the different choices of . However, it may not be worth it given the good run time performance of the serial version unless the input is long and hard (say ). See the following sections for run time complexity analysis and experimental evaluations.

## 3 Analysis of Run Time Complexity

The seven rules in the first phase can all be checked in polynomial time. It can be easily verified that the total running time of these rules is .

In the second phase, the three quantities , and can all be computed in time. The maximum index can be calculated in time. The number of choices for is . For each choice of , the number of choices for and its number of copies to be included in in the enumerative step (1) is . The maximum number of combinations of the remaining small terms to be included in in the enumerative step (2) can be chosen to be , , etc. Each combination can be generated in time. Whenever a full left side has been constructed, the Gale-Ryser test on the candidate bipartition can be performed in time. Overall, we can see that the second phase runs in time when is . Note this run time is achieved at the expense of the algorithm possibly producing an erroneous output on some “yes” instances. However, the observed error rate is so low that we consider the limit on worthwhile. On the other hand, if no limit is placed on , then our algorithm will always produce a correct output, at the expense of possibly running in exponential time in the worst case.

In summary, our algorithm can be customized to run in polynomial time with satisfactory low error rates (see Section 4 for some evidence of error rates). Also note that it is a deterministic instead of a randomized algorithm.

## 4 Experiments

We mainly tested our implementation of the decision algorithm with the parameter customized as . We first show the low error rates of the algorithm and then show the good run time performance.

### 4.1 Error Rates

We first demonstrate the somewhat surprising power of the seven rules in the first phase. In Table 1 we show the number of all zero-free graphical degree sequences of length that can be resolved by one of these rules and their proportion among all zero-free graphical degree sequences of length . Based on the description of the rules, these instances are all “no” instances. The function values are obtained through a program that incorporates our decision algorithm into the algorithm to enumerate all degree sequences of a certain length from Ruskey et al. Ruskey1994 . Let be the number of zero-free potentially bipartite graphical degree sequences of length . Clearly since some of the “no” instances are resolved in the second phase. It looks safe to conclude from this table that tends to 1 as grows towards infinity and so tends to 0. Note that these are just empirical observations. Rigorous proofs of the asymptotic orders of these functions or their relative orders might require advanced techniques Wang2019 .

In fact, those instances that can be resolved by one of the seven rules are not the only ones that can avoid the enumeration phase of our algorithm. For example, those instances that have can also be resolved immediately following the tests of the rules according to our description in Section 2.

Next we demonstrate the low error rates of our algorithm. In Table 2 we show the number of all zero-free potentially bipartite graphical degree sequences of length that will be incorrectly reported as a “no” instance if we set and their proportion among all zero-free potentially bipartite graphical degree sequences of length . Even with the smallest possible , our algorithm makes very few errors on the “yes” instances. In fact, if we set , then our algorithm makes no errors on any zero-free graphical degree sequence of length . However, the observed trend is that the limit needs to grow with for our algorithm to always make no errors. We are unable to prove whether there is any polynomial of to bound such that our algorithm can always give correct outputs or the error rate is always below some constant. If grows faster than a polynomial of , then our algorithm could take more than polynomial time in the worst case. In our experiments we did not find any “yes” instance of length that will be misclassified by our algorithm under the setting of .

We note that the error rates reported in Table 2 are with respect to the “yes” instances. The error rates will be much lower if they are computed with respect to all instances of length because, as we know from Table 1, by far the majority of the “no” instances have already been correctly detected by the seven rules. For example, at the setting of . Moreover, increasing from to further reduces the error rate. For example, at the setting of .

### 4.2 Run Time Performance

We now demonstrate the run time performance of our algorithm with the setting of . Here the reported run times were obtained through a C++ implementation tested on typical Linux workstations. We have already shown in Section 3 that our algorithm runs in polynomial time if is bounded by a polynomial of . We generated random graphical degree sequences of specified length , largest term and smallest term . For a wide range of , we found that the hardest instances for our algorithm are approximately in the range of and . The instances in these ranges are the most likely to cause our algorithm to enter the enumeration phase. However, even the hardest instances we tested for can be finished in about a couple of minutes; these are necessarily “no” instances that go through the entire enumeration phase without any successful candidate bipartition being found. All the tested instances that are decided in the first phase finish almost instantly. All of the tested “yes” instances detected in the enumeration phase can be decided in at most tens of seconds, due to the empirical fact that most of the “yes” instances have a successful candidate bipartition that can be found even when is set to 1.

## 5 Discussions

We mentioned in Section 2 that our algorithm is customizable through the limit in the enumerative step (2). In this section we describe several alternatives to the enumeration phase.

In the enumerative step (1) we have chosen from largest to smallest. Instead, we can choose from smallest to largest. On average, we found that the former has better run time performance.

In the enumerative step (2) we prefer to enumerate the combinations of smallest terms first. Instead, we can choose to enumerate those of largest terms first. On average, we still found that the former has better run time performance.

The enumerative steps (1) and (2) can even be combined into one step to make the enumeration phase simpler. That is, we can exhaustively enumerate all possible combinations of terms from with an appropriate sum subject to the requirement 3 about the number of terms in . (Or, to make it more naive, we could exhaustively enumerate all possible combinations of terms from with the sum .) With these schemes we still face the choice of enumerating largest terms first or smallest terms first. On average, the choice of “smallest terms first” still enjoys better run time performance. However, in order to achieve similarly low error rates in these alternative schemes with this choice of “smallest terms first,” the limit on the number of combinations to be generated will usually have to be much larger than the chosen limit in our design in Section 2, causing these alternatives to have much worse run time performance on those instances that require the second phase to decide. If no limit is placed on the number of combinations to be generated, these alternatives will always produce correct outputs. Nevertheless, the run time performance could become terrible. For example, for some hard instances with length from 100 to 300, it could take days to detect a successful candidate bipartition for “yes” instances and tens of days to decide for “no” instances when unlimited is chosen, clear evidence of exponential run time behavior. For longer hard instances in the range , these more naive enumeration phases with unlimited might take years or longer to finish.

As mentioned before, our algorithm always gives the correct conclusion for “no” instances. But it could give an incorrect output for some “yes” instances depending on the limit set in the enumeration phase. This kind of behavior can be contrasted with some randomized algorithms. The error our algorithm might make is fixed and it comes from the fact that not all potentially bipartite graphical degree sequences exhibit the kind of pattern that can be captured by the particular “limited” search process of our algorithm. Simply put, our algorithm is deterministic. If it makes an error on an input under a particular setting of , it always makes an error on that input with that setting. If a randomized algorithm makes an error on an input, then it could produce a correct output the next time it runs.

Now we comment on the complexity of the decision problem of potentially bipartiteness of graphical degree sequences. It is obviously in NP. We don’t know whether it is in co-NP or in P, nor do we know whether it is NP-complete. Whenever our algorithm reports an input as a “yes” instance, it can also output a successful candidate bipartition. We are not sure if this is necessary for this decision problem. For example, the well-known decision problem of primality of integers can be decided in polynomial time AKS2004 . However, a “composite” output does not come with a prime factor. It is known from the prime number theorem Hadamard1896 ; Poussin1896 that almost all integers are composite. In this sense, the polynomial solvability of the primality testing problem seems intuitive. We would also like to compare this problem with the decision problem of whether a given graph is of class 1 or class 2, i.e. whether its edge chromatic number is equal to $\Delta$ or $\Delta + 1$, where $\Delta$ is the maximum degree of the given graph. It is known from ERDOS1977 that almost all graphs on $n$ vertices are of class 1 as $n$ grows towards infinity. However, it is NP-complete to decide whether a graph is of class 1 or class 2 Holyer1981 . These facts sound more unintuitive. It is almost certain from our experimental results that the proportion of zero-free graphical degree sequences of length $n$ that are not potentially bipartite approaches 1 as $n$ grows towards infinity. Is it possible that the decision problem is actually in P? Or could it be that some hidden classes of hard instances are overlooked by our experiments and the decision problem is actually NP-complete or NP-intermediate, should P ≠ NP hold?

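In this paper we dealt with the decision problem of whether a graphical degree sequence $\mathbf{d}$ has a 2-colorable realization. In the case that $\mathbf{d}$ is not potentially bipartite and it is desired to compute the minimum chromatic number over all of its realizations, we can decide, for each successive fixed $k \ge 3$, whether there is a $k$-colorable realization of $\mathbf{d}$, until the answer becomes “yes.” We conjecture that each of these decision problems is NP-complete.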

## 6 Summary and directions for future research

We presented a fast algorithm to test whether a graphical degree sequence is potentially bipartite. The algorithm works very well in practice. It remains open whether the decision problem can be solved in polynomial time. The complexity of the decision problem of whether a graphical degree sequence has a $k$-colorable realization for a fixed $k \ge 3$ is also to be resolved.

## 7 Acknowledgements

This research has been supported by a research seed grant of Georgia Southern University. The computational experiments have been supported by the Talon cluster of Georgia Southern University.

## References

•  Manindra Agrawal, Neeraj Kayal, and Nitin Saxena. Primes is in P. Annals of Mathematics, 160(2):781–793, 2004.
•  C. J. de la Vallée Poussin. Recherches analytiques sur la théorie des nombres premiers. Ann. Soc. scient. Bruxelles, 20:183–256, 1896.
•  Zdeněk Dvořák and Bojan Mohar. Chromatic number and complete graph substructures for degree sequences. Combinatorica, 33(5):513–529, 2013.
•  Paul Erdős and Robin J. Wilson. On the chromatic index of almost all graphs. Journal of Combinatorial Theory, Series B, 23(2):255–257, 1977.
•  O. Favaron, M. Mahéo, and J.-F. Saclé. On the residue of a graph. Journal of Graph Theory, 15(1):39–64, 1991.
•  D. Gale. A theorem on flows in networks. Pacific J. Math, 7(2):1073–1082, 1957.
•  Michael R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA, 1979.
•  J. Hadamard. Sur la distribution des zéros de la fonction zeta(s) et ses conséquences arithmétiques. Bull. Soc. math. France, 24:199–220, 1896.
•  I. Holyer. The NP-completeness of edge-coloring. SIAM Journal on Computing, 10(4):718–720, 1981.
•  Konstantinos Koiliaris and Chao Xu. Faster pseudopolynomial time algorithms for subset sum. ACM Trans. Algorithms, 15(3):40:1–40:20, 2019.
•  W. Mantel. Problem 28 (solution by H. Gouwentak, W. Mantel, J. Teixeira de Mattes, F. Schuh and W. A. Wythoff). Wiskundige Opgaven, 10:60–61, 1907.
•  Owen Murphy. Lower bounds on the stability number of graphs computed in terms of degrees. Discrete Mathematics, 90(2):207–211, 1991.
•  Narong Punnim. Degree sequences and chromatic numbers of graphs. Graphs and Combinatorics, 18(3):597–603, 2002.
•  S. B. Rao. A survey of the theory of potentially P-graphic and forcibly P-graphic degree sequences. In Siddani Bhaskara Rao, editor, Combinatorics and Graph Theory: Lecture Notes in Mathematics, vol 885, pages 417–440. Springer Berlin Heidelberg, 1981.
•  Frank Ruskey, Robert Cohen, Peter Eades, and Aaron Scott. Alley CATs in search of good homes. In 25th S.E. Conference on Combinatorics, Graph Theory, and Computing, volume 102, pages 97–110. Congressus Numerantium, 1994.
•  H. J. Ryser. Combinatorial properties of matrices of zeros and ones. Canadian Journal of Mathematics, 9:371–377, 1957.
•  Kai Wang. Efficient counting of degree sequences. Discrete Mathematics, 342(3):888–897, 2019.
•  Jian-Hua Yin. A short constructive proof of A.R. Rao’s characterization of potentially $K_{r+1}$-graphic sequences. Discrete Applied Mathematics, 160(3):352–354, 2012.