Bipartite Graphs of Small Readability

05/12/2018 ∙ by Rayan Chikhi, et al. ∙ University Lille 1: Sciences and Technologies; Boston University; University of Primorska; Humboldt-Universität zu Berlin

We study a parameter of bipartite graphs called readability, introduced by Chikhi et al. (Discrete Applied Mathematics, 2016) and motivated by applications of overlap graphs in bioinformatics. The behavior of the parameter is poorly understood. The complexity of computing it is open and it is not known whether the decision version of the problem is in NP. The only known upper bound on the readability of a bipartite graph (following from a work of Braga and Meidanis, LATIN 2002) is exponential in the maximum degree of the graph. Graphs that arise in bioinformatic applications have low readability. In this paper, we focus on graph families with readability o(n), where n is the number of vertices. We show that the readability of n-vertex bipartite chain graphs is between Ω(log n) and O(√n). We give an efficiently testable characterization of bipartite graphs of readability at most 2 and completely determine the readability of grids, showing in particular that their readability never exceeds 3. As a consequence, we obtain a polynomial time algorithm to determine the readability of induced subgraphs of grids. One of the highlights of our techniques is the appearance of Euler's totient function in the analysis of the readability of bipartite chain graphs. We also develop a new technique for proving lower bounds on readability, which is applicable to dense graphs with a large number of distinct degrees.

1 Introduction

In this work we further the study of readability of bipartite graphs initiated by Chikhi et al. [6]. Given a bipartite graph G = (V_s, V_p, E), an overlap labeling of G is a mapping from vertices to strings, called labels, such that for all u ∈ V_s and v ∈ V_p there is an edge between u and v if and only if the label of u overlaps with the label of v (i.e., a non-empty suffix of u's label is equal to a prefix of v's label). The length of an overlap labeling of G is the maximum length (i.e., number of characters) of a label. The readability of G, denoted r(G), is the smallest nonnegative integer r such that there is an overlap labeling of G of length r. We emphasize that in this definition, no restriction is placed on the alphabet. One could also consider variants of readability parameterized by the size of the alphabet. A result of Braga and Meidanis [5] implies that these variants are within constant factors of each other, where the constants are logarithmic in the alphabet sizes.
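The definition above can be checked mechanically for a given labeling. The following is a minimal sketch, not taken from the paper: the function names and the four-vertex example (a path with left vertices a, b and right vertices x, y) are illustrative assumptions.

```python
from itertools import product

def overlaps(x: str, y: str) -> bool:
    """True if some non-empty suffix of x equals a prefix of y."""
    return any(x[-k:] == y[:k] for k in range(1, min(len(x), len(y)) + 1))

def is_overlap_labeling(left, right, edges, label) -> bool:
    """Check the 'if and only if' condition for every (left, right) pair."""
    edge_set = set(edges)
    return all(
        overlaps(label[u], label[v]) == ((u, v) in edge_set)
        for u, v in product(left, right)
    )

def labeling_length(label) -> int:
    """Length of a labeling: the maximum label length."""
    return max((len(s) for s in label.values()), default=0)

# Hypothetical example: the path a - x - b - y with a, b on the left side.
left, right = ["a", "b"], ["x", "y"]
edges = [("a", "x"), ("b", "x"), ("b", "y")]
label = {"a": "31", "x": "14", "b": "21", "y": "21"}
print(is_overlap_labeling(left, right, edges, label), labeling_length(label))
```

The example labeling has length 2, which shows only that this particular path has readability at most 2; the checker says nothing about optimality.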

The notion of readability arises in the study of overlap digraphs. Overlap digraphs constructed from DNA strings have various applications in bioinformatics. (In the context of genome assembly, variants of overlap digraphs appear as either de Bruijn graphs [11] or string graphs [18, 21] and are the foundation of most modern assemblers; see [17, 19] for a survey.) Several graph-theoretic parameters of overlap digraphs have been studied [3, 2, 4, 9, 15, 16, 20, 23], with a nice survey in [14]. Most of the graphs that occur as the overlap graphs of genomes have low readability. Chikhi et al. [6] show that the readability of overlap digraphs is asymptotically equivalent to that of balanced bipartite graphs: there is a bijection between overlap digraphs and balanced bipartite graphs that preserves readability up to (roughly) a factor of 2. This motivates the study of bipartite graphs with low readability. In this work we derive several results about bipartite graphs with readability sublinear in the number of vertices.

For general bipartite graphs, the only known upper bound on readability is implicit in a paper on overlap digraphs by Braga and Meidanis [5]. As observed by Chikhi et al. [6], it follows from [5] that the readability of a bipartite graph is well defined and at most exponential in Δ, where Δ is the maximum degree of the graph. Chikhi et al. [6] showed that almost all bipartite graphs with n vertices in each part have readability Ω(n/log n). They also constructed an explicit graph family (called Hadamard graphs) with readability Ω(n).

For trees, readability can be defined in terms of an integer function on the edges, without any reference to strings or their overlaps [6]. In this work, we reveal another connection to number theory, through Euler’s totient function, and use it to prove an upper bound on the readability of bipartite chain graphs.

So far, our understanding of readability has been hindered by the difficulty of proving lower bounds. Chikhi et al. [6] developed a lower bound technique for graphs where the overlap between the neighborhoods of any two vertices is limited. In this work, we add another technique to the toolbox. Our technique is applicable to dense graphs with a large number of distinct degrees. We apply this technique to obtain a lower bound on readability of bipartite chain graphs.

We give a characterization of bipartite graphs of readability at most 2 and use this characterization to obtain a polynomial time algorithm for checking if a graph has readability at most 2. This is the first nontrivial result of this kind: graphs of readability at most 1 are extremely simple (disjoint unions of complete bipartite graphs, see [6]), whereas the problem of recognizing graphs of readability 3 is open.

We also give a formula for the readability of grids, showing in particular that their readability never exceeds 3. As a corollary, we obtain a polynomial time algorithm to determine the readability of induced subgraphs of grids.

1.1 Our Results and Structure of the Paper

Preliminaries are summarized in Section 2; here we only state some of the most important technical facts. In the study of readability, it suffices to consider bipartite graphs that are connected and twin-free. A bipartite graph is twin-free if no two vertices in the same part have the same sets of neighbors [6]. Since connected bipartite graphs have a unique bipartition up to swapping the two parts, some of our results are stated without specifying the bipartition.

Bounds on the readability of bipartite chain graphs (Section 3).

Bipartite chain graphs are the bipartite analogue of a family of digraphs that occur naturally as subgraphs of overlap graphs of genomes. A bipartite chain graph is a bipartite graph G = (V_s, V_p, E) such that the vertices in V_s (or V_p) can be linearly ordered with respect to inclusion of their neighborhoods. That is, we can write V_s = {v_1, …, v_k} so that N(v_1) ⊆ N(v_2) ⊆ … ⊆ N(v_k) (where N(u) denotes the set of u's neighbors). A twin-free connected bipartite chain graph must have the same number of vertices on either side. For each n ≥ 1, there is, up to isomorphism, a unique connected twin-free bipartite chain graph with n vertices in each part, denoted C_{n,n}. The graph C_{n,n} is (V_s, V_p, E) where V_s = {s_1, …, s_n}, V_p = {p_1, …, p_n}, and E = {(s_i, p_j) : 1 ≤ i ≤ j ≤ n}. An instance of this family is shown in Figure 1. We prove an upper and a lower bound on the readability of C_{n,n}.
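As an illustration of the chain property, the sketch below builds a graph on 2n vertices and checks that neighborhoods are nested and that no two vertices in a part are twins. The edge rule used here (s_i adjacent to p_j exactly when i ≤ j) and all function names are assumptions made for this example; any other orientation yields an isomorphic graph.

```python
def chain_graph(n: int):
    """Left part s1..sn, right part p1..pn, with s_i ~ p_j iff i <= j (assumed rule)."""
    left = [f"s{i}" for i in range(1, n + 1)]
    right = [f"p{j}" for j in range(1, n + 1)]
    edges = {(f"s{i}", f"p{j}") for i in range(1, n + 1) for j in range(i, n + 1)}
    return left, right, edges

def neighborhoods(left, right, edges):
    """Map every vertex to its set of neighbors."""
    nbrs = {v: set() for v in left + right}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return nbrs

def is_nested(part, nbrs) -> bool:
    """True if the neighborhoods of the vertices in `part` form a chain under inclusion."""
    ordered = sorted((nbrs[v] for v in part), key=len)
    return all(a <= b for a, b in zip(ordered, ordered[1:]))

def is_twin_free(part, nbrs) -> bool:
    """True if no two vertices of `part` have the same neighborhood."""
    return len({frozenset(nbrs[v]) for v in part}) == len(part)

left, right, edges = chain_graph(5)
nbrs = neighborhoods(left, right, edges)
print(is_nested(left, nbrs), is_nested(right, nbrs), len(edges))
```

In this graph every vertex in a part has a distinct degree, and the number of edges is n(n + 1)/2, so the graph is dense.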

Figure 1: An instance of the graph C_{n,n}.

Theorem 1.

For all n ≥ 1, the graph C_{n,n} has readability O(√n), with labels over an alphabet of size 3.

We prove Theorem 1 by giving an efficient algorithm that constructs an overlap labeling of C_{n,n} of length O(√n) using strings over an alphabet of size 3.

Theorem 2.

For all n ≥ 1, the graph C_{n,n} has readability Ω(log n).

Characterization of bipartite graphs with readability at most 2 (Section 4).

Let C_n for n ≥ 3 denote the simple cycle with n vertices. The domino is the graph obtained from the cycle C_6 by adding an edge between two diametrically opposite vertices. For a graph G and a set U ⊆ V(G), let G[U] denote the subgraph of G induced by U.

Chikhi et al. [6] proved that every bipartite graph with readability at most 1 is a disjoint union of complete bipartite graphs (also called bicliques). The characterization in the following theorem extends our understanding to graphs of readability at most 2. Recall that a matching in a graph is a set of pairwise disjoint edges.

Theorem 3.

A twin-free bipartite graph G has readability at most 2 if and only if G has a matching M such that the graph G′ = G − M satisfies the following properties:

  1. G′ is a disjoint union of complete bipartite graphs.

  2. For U ⊆ V(G), if G[U] is a C_6, then G′[U] is the disjoint union of three edges.

  3. For U ⊆ V(G), if G[U] is a domino, then G′[U] is the disjoint union of a C_4 and an edge.

Note that Theorem 3 expresses a condition on vertex labels of a bipartite graph in purely graph theoretic terms. This reduces the problem of deciding if a graph has readability at most 2 to checking the existence of a matching with a specific property.

An efficient algorithm for readability 2 (Section 5).

It is unknown whether computing the readability of a given bipartite graph is NP-hard. In fact, it is not even known whether the decision version of the problem is in NP, as the only upper bound on the readability of a bipartite graph with n vertices in each part is exponential in n [5]. We make progress on this front by showing that for readability 2, the decision version is polynomial time solvable.

Theorem 4.

There exists an algorithm that, given a bipartite graph G, decides in polynomial time whether G has readability at most 2.

Moreover, if the answer is “yes”, the algorithm can also produce an overlap labeling of length at most 2.

Readability of grids and grid graphs (Section 6).

We give a full characterization of the readability of grids. A (two-dimensional) grid is a graph G_{m,n} with vertex set {0, …, m − 1} × {0, …, n − 1} such that there is an edge between two vertices if and only if the L1-distance between them is 1. An example is shown in Figure 2. The following theorem fully settles the question of readability of grids.

Figure 2: A grid and a toroidal grid.

Theorem 5.

For any two positive integers with , we have

Theorem 5 has an algorithmic implication for the readability of grid graphs, where a grid graph is an induced subgraph of a grid. Several problems are known to be NP-hard on the class of grid graphs, including Hamiltonicity problems [12], various layout problems [8], and others (see, e.g., [7]). We show that unless P = NP, this is not the case for the readability problem.

Corollary 1.

The readability of a given grid graph can be computed in polynomial time.

1.2 Technical Overview

We now give a brief description of our techniques. The key to proving the upper bound on the readability of bipartite chain graphs is understanding the combinatorics of the following process. We start with the sequence (1, 2). The process consists of a series of rounds, and as a convention, we start at round 3: we write 3 (= 1 + 2) between 1 and 2 and obtain the sequence (1, 3, 2). More generally, in round k, we insert k between all the consecutive pairs of numbers in the current sequence that sum up to k. Thus, we obtain (1, 4, 3, 2) in round 4, then (1, 5, 4, 3, 5, 2) in round 5, and so on. The question is to determine the length of the sequence formed in round k as a function of k. We prove that this length is 1 + (1/2) Σ_{j=1}^{k} φ(j), where φ is the famous Euler's totient function: φ(j) denotes the number of integers in {1, …, j} that are coprime to j.
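The insertion process is easy to simulate. The sketch below assumes the starting sequence (1, 2) and the rule "in round k, insert k between every consecutive pair summing to k"; it then compares the resulting length with 1 plus half the sum of Euler's totient function up to k. The function names are illustrative, and the totient is computed naively for clarity.

```python
from math import gcd

def totient(m: int) -> int:
    """Euler's totient: the number of integers in 1..m coprime to m."""
    return sum(1 for a in range(1, m + 1) if gcd(a, m) == 1)

def run_rounds(last_round: int):
    """Start from (1, 2) and apply rounds 3, 4, ..., last_round."""
    seq = [1, 2]
    for k in range(3, last_round + 1):
        nxt = [seq[0]]
        for a, b in zip(seq, seq[1:]):
            if a + b == k:      # consecutive pair summing to k: insert k between them
                nxt.append(k)
            nxt.append(b)
        seq = nxt
    return seq

for k in range(2, 8):
    seq = run_rounds(k)
    print(k, seq, 1 + sum(totient(j) for j in range(1, k + 1)) // 2)
```

The simulation only confirms the length formula for the rounds it runs; the general statement rests on the coprimality argument given in Section 3.1.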

To prove our lower bound on the readability of bipartite chain graphs, we define a special sequence of subgraphs of the bipartite chain graph such that the number of graphs in the sequence is a lower bound on the readability. The sequence that we define has the additional property that if two vertices in the same part have the same set of neighbors in one of the graphs, then they have the same set of neighbors in all of the preceding graphs in the sequence. If the readability is very small, then we cannot simultaneously cover all the edges incident with two large-degree nodes as well as have their degrees distinct. The only properties of the connected twin-free bipartite chain graph that our proof uses are that it is dense and all vertices in the same part have distinct degrees. Hence, this technique is more broadly applicable to any graph class satisfying these properties.

Our characterization of graphs of readability at most 2, roughly speaking, states that a twin-free bipartite graph has readability at most 2 if and only if the graph can be decomposed into two subgraphs G_1 and G_2 such that G_1 is a disjoint union of bicliques and G_2 is a matching satisfying some additional properties. For i ∈ {1, 2}, the edges in G_i model overlaps of length exactly i. The heart of the proof lies in observing that for each pair of bicliques in the first subgraph, there can be at most one matching edge in the second subgraph that has its left endpoint in the first biclique and the right endpoint in the second biclique.

To derive a polynomial time algorithm for recognizing graphs of readability two, we first reduce the problem to connected twin-free graphs of maximum degree at least three. For such graphs, we show that the constraints from our characterization of graphs of readability at most 2 can be expressed with a 2SAT formula having variables on edges and modeling the selection of edges forming a matching to form the graph G_2 of the decomposition.

In order to determine the readability of grids, we establish upper and lower bounds and in both cases use the fact that readability is monotone under induced subgraphs (that is, the readability of a graph is at least the readability of each of its induced subgraphs). The upper bound is derived by observing that every grid is an induced subgraph of some toroidal grid (see Figure 2) and exploiting the symmetric structure of such toroidal grids to show that their readability is at most 3. This is the most interesting part of our proof and involves partitioning the edges of a toroidal grid into three sets and coming up with labels of length at most 3 for each vertex based on the containment of the four edges incident with the vertex in each of these three parts. Our characterization of graphs of readability at most 2 is a helpful ingredient in proving the lower bound on the readability of grids, where we construct a small subgraph of the grid for which our characterization easily implies that its readability is at least 3.

2 Preliminaries

For a string , let (respectively, ) denote the prefix (respectively, suffix) of of length . A string overlaps another string if there exists an with such that . If , we say that properly overlaps with . For a positive integer , we denote by the set . Let be a (finite, simple, undirected) graph. If is a connected bipartite graph, then it has a unique bipartition (up to the order of the parts). In this paper, we consider bipartite graphs . If the bipartition is specified, we denote such graphs by . Edges of a bipartite graph are denoted by or by (which implicitly implies that and ). We respect bipartitions when we perform graph operations such as taking an induced subgraph and disjoint union. For example, we say that a bipartite graph is an induced subgraph of a bipartite graph if , , and . The disjoint union of two vertex-disjoint bipartite graphs and is the bipartite graph .

The path on n vertices is denoted by P_n. Given two graphs F and G, graph G is said to be F-free if no induced subgraph of G is isomorphic to F. Two vertices u and v in a bipartite graph are called twins if they belong to the same part of the bipartition and have the same neighbors (that is, if N(u) = N(v)). Given a bipartite graph G we can define its twin-free reduction as the graph with vertices being the equivalence classes of the twin relation on V(G) (that is, u ∼ v if and only if u and v are twins in G), and two classes X and Y are adjacent if and only if (x, y) ∈ E(G) for some x ∈ X and y ∈ Y. For graph theoretic terms not defined here, we refer to [24].
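The twin-free reduction described above can be computed by grouping the vertices of each part by neighborhood. The following is a small sketch under the assumption that the graph is given as two vertex lists and a collection of (left, right) edge pairs; the function name is illustrative.

```python
def twin_free_reduction(left, right, edges):
    """Collapse every class of twins into a single vertex (a frozenset of originals)."""
    nbrs = {v: set() for v in list(left) + list(right)}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)

    def classes(part):
        # Vertices of the same part with equal neighborhoods are twins.
        groups = {}
        for v in part:
            groups.setdefault(frozenset(nbrs[v]), []).append(v)
        return [frozenset(g) for g in groups.values()]

    left_cls, right_cls = classes(left), classes(right)
    cls_of = {v: c for c in left_cls + right_cls for v in c}
    # Two classes are adjacent iff some pair of representatives is adjacent.
    red_edges = {(cls_of[u], cls_of[v]) for u, v in edges}
    return left_cls, right_cls, red_edges

# Hypothetical example: the biclique K_{2,3} collapses to a single edge.
L, R = ["a1", "a2"], ["b1", "b2", "b3"]
E = [(a, b) for a in L for b in R]
print(twin_free_reduction(L, R, E))
```

Because twins have identical neighborhoods, adjacency between classes does not depend on the chosen representatives.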

We now state some basic results for later use.

Lemma 1.

Let G and G′ be two bipartite graphs.

  (a) If G′ is an induced subgraph of G, then r(G′) ≤ r(G).

  (b) If G is the disjoint union of G_1 and G_2, then r(G) = max{r(G_1), r(G_2)}.

  (c) The readability of G is the same for all bipartitions of G.

  (d) The readability of G equals the readability of its twin-free reduction.

Proof.

(a) If ℓ is any overlap labeling for G then the restriction of ℓ to V(G′) yields an overlap labeling for G′. Thus, r(G′) ≤ r(G).

(b) Part (a) implies that r(G) ≥ r(G_1) and r(G) ≥ r(G_2); thus r(G) ≥ max{r(G_1), r(G_2)}. On the other hand, let ℓ_1 and ℓ_2 be optimal labelings of G_1 and G_2, over alphabets Σ_1 and Σ_2, respectively. By introducing new characters if necessary, we may assume that Σ_1 ∩ Σ_2 = ∅. Thus, the combined labeling ℓ of G over Σ_1 ∪ Σ_2, defined as

ℓ(v) = ℓ_1(v) if v ∈ V(G_1), and ℓ(v) = ℓ_2(v) if v ∈ V(G_2),

for all v ∈ V(G), is an overlap labeling of G, showing that r(G) ≤ max{r(G_1), r(G_2)}.

(c) By part (b), the readability of G is the maximum readability of a connected component of G. Therefore, it is sufficient to prove the lemma for the case when G is connected. Every connected graph has a unique bipartition, up to switching the roles of the two parts. Switching the roles of the two parts in a graph does not affect its readability, because an overlap labeling of the new graph can be obtained by reversing all the labels in the overlap labeling of the original graph. Thus, the readability of G is not affected by the choice of bipartition of G.

(d) It suffices to prove that for a pair of twins u and u′, r(G) = r(G − u′). By part (a), we have r(G − u′) ≤ r(G). Conversely, an optimal overlap labeling ℓ′ of G − u′ can be extended to an overlap labeling ℓ of G of the same maximum length as ℓ′ by setting, for all v ∈ V(G),

ℓ(v) = ℓ′(u) if v = u′, and ℓ(v) = ℓ′(v) otherwise.

Thus, r(G) ≤ r(G − u′). ∎

Lemma 1(b) shows that the study of readability reduces to the case of connected bipartite graphs. By Lemma 1(c), the readability of a bipartite graph is well defined even if a bipartition is not given in advance. We state our results without specifying a bipartition in Sections 4-5. Lemma 1(d) further shows that to understand the readability of connected bipartite graphs, it suffices to study the readability of connected twin-free bipartite graphs.

3 Readability of bipartite chain graphs

In this section, we prove an upper (Section 3.1) and a lower (Section 3.2) bound on the readability of twin-free bipartite chain graphs, C_{n,n}. Recall that the graph C_{n,n} is (V_s, V_p, E) where V_s = {s_1, …, s_n}, V_p = {p_1, …, p_n}, and E = {(s_i, p_j) : 1 ≤ i ≤ j ≤ n}.

3.1 Upper bound

To prove Theorem 1, we construct a labeling of length for that satisfies (1) for all , and (2) properly overlaps if and only if . It is easy to see that such an will be a valid overlap labeling of . As the labels on either side of the bipartition are equal, we will just come up with a sequence of strings to be assigned to one of the sides of such that the strings satisfy condition (2) above.

Definition 1.

A sequence of strings (x_1, …, x_k) is forward-matching if, for all i and j in {1, …, k},

  • string x_i does not have a proper overlap with itself and

  • string x_i overlaps string x_j if and only if j ≥ i.
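Definition 1 can be tested by brute force on small sequences. The sketch below reads the second condition as "the i-th string overlaps the j-th string exactly when j ≥ i" and treats a proper self-overlap as one shorter than the whole string; both readings, the function names, and the example strings are assumptions of this illustration.

```python
def overlap_lengths(x: str, y: str):
    """All k >= 1 such that the length-k suffix of x equals the length-k prefix of y."""
    return [k for k in range(1, min(len(x), len(y)) + 1) if x[-k:] == y[:k]]

def has_proper_self_overlap(x: str) -> bool:
    """True if x overlaps itself by fewer than len(x) characters."""
    return any(k < len(x) for k in overlap_lengths(x, x))

def is_forward_matching(strings) -> bool:
    for i, x in enumerate(strings):
        if has_proper_self_overlap(x):
            return False
        for j, y in enumerate(strings):
            # x must overlap y exactly when y does not come before x.
            if bool(overlap_lengths(x, y)) != (j >= i):
                return False
    return True

print(is_forward_matching(["10", "0", "02"]))   # hypothetical example sequence
```

A forward-matching sequence x_1, …, x_n yields an overlap labeling of the chain graph on 2n vertices by giving the i-th vertex on each side the label x_i.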

Given an integer n, we will show how to construct a forward-matching sequence with n strings, each of length at most O(√n), over an alphabet of size 3. This will imply an overlap labeling of length O(√n) for C_{n,n}, proving Theorem 1. The following lemma is crucial for this construction.

Lemma 2.

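For all integers k ≥ 2 and all i with 1 ≤ i ≤ k − 1, if (x_1, …, x_k) is forward-matching, so is (x_1, …, x_i, x_i x_{i+1}, x_{i+1}, …, x_k), where x_i x_{i+1} denotes the concatenation of x_i and x_{i+1}.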

Proof.

For the purposes of notation, let a be an arbitrary string from x_1, …, x_{i−1} (if it exists), let b = x_i, c = x_{i+1}, and let d be an arbitrary string from x_{i+2}, …, x_k (if it exists). The reader can easily verify that a and b overlap with the new string bc, and bc overlaps with c and d, as desired. What remains to show is that there are no undesired overlaps. Suppose for the sake of contradiction that bc overlaps b, and let j be the length of any such overlap. If the overlap only includes characters from c, then c overlaps b; if it includes characters from b (and the entire c) then b has a proper overlap with itself (see Figure 3(a)). In either case, we reach a contradiction. So, bc does not overlap b. By a symmetric argument, c does not overlap bc.

(a) bc does not overlap b.
(b) bc has no proper overlap with itself.
Figure 3: Overlaps in the proof of Lemma 2

Next, suppose for the sake of contradiction that bc overlaps a, and let j be the length of any such overlap. If the overlap only includes characters from c, then c overlaps a; if it includes characters from b (and the entire c) then b overlaps a. In either case, we reach a contradiction. So, bc does not overlap a. By a symmetric argument, d does not overlap bc.

Finally, suppose for the sake of contradiction that bc has a proper overlap with itself, and let j be the length of any such overlap. Since c does not overlap bc, it follows that the overlap must include characters from b and the entire c. But then a proper suffix of b equals a prefix of b, that is, b has a proper overlap with itself, a contradiction (see Figure 3(b)). So, bc does not have a proper overlap with itself, completing the proof. ∎
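The insertion step of Lemma 2 and its repeated application can be sketched as follows. This is an illustration under stated assumptions: the operation is read as "insert the concatenation of two consecutive strings between them", and the base sequence ("10", "0", "02") is a hypothetical choice over a three-letter alphabet, not necessarily the one used in the paper. The brute-force checker confirms the forward-matching property for the sequences built here.

```python
def overlaps(x: str, y: str) -> bool:
    return any(x[-k:] == y[:k] for k in range(1, min(len(x), len(y)) + 1))

def has_proper_self_overlap(x: str) -> bool:
    return any(x[-k:] == x[:k] for k in range(1, len(x)))

def is_forward_matching(seq) -> bool:
    return all(
        not has_proper_self_overlap(x)
        and all(overlaps(x, y) == (j >= i) for j, y in enumerate(seq))
        for i, x in enumerate(seq)
    )

def insert_concatenation(seq, i: int):
    """Insert seq[i] + seq[i + 1] between positions i and i + 1 (0-based)."""
    return seq[: i + 1] + [seq[i] + seq[i + 1]] + seq[i + 1 :]

def build_sequence(max_len: int, base=("10", "0", "02")):
    """In round k, add every concatenation of consecutive strings that has length k."""
    seq = list(base)
    for k in range(3, max_len + 1):
        out = [seq[0]]
        for x, y in zip(seq, seq[1:]):
            if len(x) + len(y) == k:
                out.append(x + y)
            out.append(y)
        seq = out
    return seq

seq = build_sequence(6)
print(len(seq), max(map(len, seq)), is_forward_matching(seq))
```

Labels grow by at most one character per round while the number of strings grows quadratically in the number of rounds, which is the source of the O(√n) bound.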

Now, we show how to construct a forward-matching sequence S_k. For the base case, we let S_2 be a sequence of three strings of lengths 2, 1, and 2 over a three-letter alphabet. It can be easily verified that S_2 is forward-matching. Inductively, let S_k for k ≥ 3 denote the sequence obtained from S_{k−1} by applying the operation in Lemma 2 to all indices i such that x_i x_{i+1} is of length k, that is, add all obtainable strings of length k. Let A_k, for all integers k ≥ 2, be the sequence of lengths of strings in S_k. We can obtain A_k directly from A_{k−1} by performing the following operation: for each consecutive pair of numbers a, b in A_{k−1}, if a + b = k then insert k between a and b. Note that there is a mirror symmetry to the sequences with respect to the middle element, 1. The right sides of the first 6 sequences, starting from the middle element, are as follows:

  A_2: 1, 2
  A_3: 1, 3, 2
  A_4: 1, 4, 3, 2
  A_5: 1, 5, 4, 3, 5, 2
  A_6: 1, 6, 5, 4, 3, 5, 2
  A_7: 1, 7, 6, 5, 4, 7, 3, 5, 7, 2

It turns out that |A_k|, and, by extension, |S_k|, is closely related to the totient summatory function [22], also called the partial sums of Euler's totient function. This is the function Φ(k) = Σ_{j=1}^{k} φ(j), where φ(j) is the number of integers in {1, …, j} that are coprime to j. The asymptotic behavior of Φ is well known: Φ(k) = 3k²/π² + O(k log k) [10, p. 268]. The following lemma therefore implies |S_k| = Θ(k²), so that n strings are obtained with labels of length k = O(√n), completing the proof of Theorem 1.
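To make the O(√n) bound concrete, the sketch below computes the totient summatory function naively and finds the smallest round k whose sequence has at least n strings. It assumes that the full (mirror-symmetric) sequence in round k has Φ(k) + 1 elements; the function names are illustrative.

```python
from math import gcd, pi, sqrt

def phi(m: int) -> int:
    """Euler's totient, computed naively."""
    return sum(1 for a in range(1, m + 1) if gcd(a, m) == 1)

def summatory(k: int) -> int:
    """Totient summatory function: phi(1) + ... + phi(k)."""
    return sum(phi(j) for j in range(1, k + 1))

def smallest_length_for(n: int) -> int:
    """Smallest k with summatory(k) + 1 >= n (assumed size of the round-k sequence)."""
    k, total = 1, 1          # summatory(1) = 1
    while total + 1 < n:
        k += 1
        total += phi(k)
    return k

for n in (10, 100, 1000):
    k = smallest_length_for(n)
    # Since summatory(k) is about 3 k^2 / pi^2, k is close to pi * sqrt(n / 3).
    print(n, k, round(pi * sqrt(n / 3), 2))
```

The printed estimate π√(n/3) comes directly from the leading term 3k²/π² and ignores the lower-order error term.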

Lemma 3.

For all integers k ≥ 2, the length of the sequence A_k is Φ(k) + 1.

Proof.

For the base case, observe that |A_2| = 3 = Φ(2) + 1. In general, consider the case of k ≥ 3.

Definition 2.

Two elements of A_k are called neighbors in A_k if they appear in two consecutive positions in A_k.

We will show that any two neighbors are coprime (Claim 1) and any pair of coprime positive integers that sum up to k appears exactly once as a pair of ordered neighbors in A_{k−1} (Claim 2). Together, these claims show that the neighbor pairs in A_{k−1} that sum up to k are exactly the pairs of coprime positive integers that sum up to k.

Fact 1.

If a and b are coprime then each of them is coprime with a + b and with a − b.

By this fact, there is a bijection between ordered pairs of coprime positive integers that sum up to k and integers in {1, …, k} that are coprime to k. Hence, the number of neighbor pairs in A_{k−1} that sum up to k is φ(k). Therefore, A_k contains φ(k) occurrences of k. By induction, it follows that |A_k| = Φ(k) + 1, proving the Lemma. ∎

We now prove the necessary claims.

Claim 1.

For all k ≥ 2, if two numbers are neighbors in A_k, they are coprime.

Proof.

We prove the claim by induction. For the base case of k = 2, the claim follows from the fact that 1 and 2 are coprime. For the general case of k ≥ 3, recall that A_k was obtained from A_{k−1} by inserting an element a + b between all neighbors a and b in A_{k−1} that summed to k. By the induction hypothesis, gcd(a, b) = 1, and, hence, by Fact 1, gcd(a, a + b) = 1 and gcd(a + b, b) = 1. Therefore, any two neighbors in A_k must be coprime. ∎

Claim 2.

For all k ≥ 2, every ordered pair (i, j) of coprime positive integers that sum to k + 1 occurs exactly once as neighbors in A_k.

Proof.

We prove the claim by strong induction. The reader can verify the base case (when ). For the inductive step, suppose the claim holds for all for some . Consider an ordered pair of coprime positive integers that sum to . Assume that ; we know that , and the case of is symmetric. Since , we have that . In the recursive construction of the sequences , the elements are added to the sequence when is created from . Since , all the elements are already present in . By Fact 1, since , we get that . By the inductive hypothesis, pair appears exactly once as an ordered pair of neighbors in . Consequently, must appear exactly once as an ordered pair of neighbors in . No new elements are added to the sequence in later stages, when . Also, no new elements are inserted between and when . Therefore, the ordered neighbor pair appears exactly once in . ∎

3.2 Lower bound

In this section, we prove Theorem 2, namely that the readability of is . First, we will need the notion of a HUB decomposition from [6]. Given and a function , we define , for , as the graph with the same vertex set as and edges given by . Observe that the edge sets of form a partition of . We say that is a hierarchical-union-of-bicliques decomposition, abbreviated as HUB decomposition, if the following conditions hold: i) for all , is a disjoint union of bicliques, and ii) if two distinct vertices and are non-isolated twins in for some then, for all , and are (possibly isolated) twins in . The parameter is called the size of the decomposition . Now, consider a HUB decomposition of of size .

Lemma 4.

For each , graph has maximum degree at most .

Proof.

We prove the lemma by strong induction on . The base case is when . Observe that if has non-isolated twins, then those must be twins in for each , and, as a result, in . Since has no twins, has no non-isolated twins. By the first property of the HUB decomposition, must have maximum degree at most 1.

For general , let denote the graph . By the inductive hypothesis, has maximum degree at most . Consider a group of vertices in the same part of that have the same degree in the graph . Since no two vertices in the same part of have the same degree, no two vertices in have the same degree in . Combining this with the fact that the degree of any vertex in is at most , we infer that .

By the second property of the HUB decomposition, if two vertices are non-isolated twins in , they are twins in . Consequently, each group of twins in has size at most . By the first property of the HUB decomposition, is a disjoint union of bicliques. It follows that each of these bicliques is a subgraph of the complete bipartite graph implying the required bound on the maximum degree. ∎

Proof of Theorem 2.

By Lemma 4, graph has at most  edges. Since the edge sets of form a partition of the edge set of , the number of edges in is We get that . It was shown in [6] that the readability of every bipartite graph is bounded from below by the minimum size of a HUB decomposition of . This completes the proof. ∎

4 A characterization of graphs with readability at most 2

In this section, we characterize bipartite graphs with readability at most 2 by proving Theorem 3. Due to Lemma 1, it is enough to obtain such a characterization for connected twin-free bipartite graphs. We use this characterization in Section 5 to develop a polynomial time algorithm for recognizing graphs of readability at most 2 and also in Section 6 to prove a lower bound on the readability of general grids. Recall that a domino is the graph obtained from C_6 by adding an edge between two vertices at distance 3. We first define the notion of a feasible matching, which is implicitly used in the statement of Theorem 3.

Definition 3.

A matching M in a bipartite graph G is feasible if the following conditions are satisfied:

  1. The graph G′ = G − M is a disjoint union of bicliques (equivalently: P_4-free).

  2. For U ⊆ V(G), if G[U] is a C_6, then G′[U] is the disjoint union of three edges.

  3. For U ⊆ V(G), if G[U] is a domino, then G′[U] is the disjoint union of a C_4 and an edge.

We prove Theorem 3 by showing that a twin-free bipartite graph G has readability at most 2 iff G has a feasible matching.
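For small graphs, the three conditions of a feasible matching can be verified by brute force. The sketch below is illustrative only: it enumerates all 6-vertex subsets and tests isomorphism by trying all 720 vertex permutations, which is far from the efficient approach of Section 5. It assumes the input graph is bipartite, with edges given as unordered pairs; all names are hypothetical.

```python
from itertools import combinations, permutations

C6 = {frozenset((i, (i + 1) % 6)) for i in range(6)}
DOMINO = C6 | {frozenset((0, 3))}
THREE_EDGES = {frozenset(e) for e in [(0, 1), (2, 3), (4, 5)]}
C4_PLUS_EDGE = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5)]}

def isomorphic(edges, U, template) -> bool:
    """Brute-force isomorphism between the graph (U, edges) and a 6-vertex template."""
    if len(edges) != len(template):
        return False
    U = list(U)
    return any(
        {frozenset(m[v] for v in e) for e in edges} == template
        for m in (dict(zip(U, p)) for p in permutations(range(6)))
    )

def is_feasible(vertices, edges, matching) -> bool:
    E = {frozenset(e) for e in edges}
    M = {frozenset(e) for e in matching}
    ends = [v for e in M for v in e]
    if not M <= E or len(ends) != len(set(ends)):
        return False                      # not a matching of G
    rest = E - M                          # edge set of G' = G - M
    nbrs = {v: set() for v in vertices}
    for e in rest:
        u, v = tuple(e)
        nbrs[u].add(v)
        nbrs[v].add(u)
    # Condition 1: G' is P4-free, i.e. around every edge uv the neighbors
    # of v and the neighbors of u are completely joined.
    for e in rest:
        u, v = tuple(e)
        if any(frozenset((a, b)) not in rest for a in nbrs[v] for b in nbrs[u]):
            return False
    # Conditions 2 and 3: inspect every induced C6 and every induced domino.
    for U in map(frozenset, combinations(vertices, 6)):
        g = {e for e in E if e <= U}
        r = {e for e in rest if e <= U}
        if isomorphic(g, U, C6) and not isomorphic(r, U, THREE_EDGES):
            return False
        if isomorphic(g, U, DOMINO) and not isomorphic(r, U, C4_PLUS_EDGE):
            return False
    return True

cycle = [(i, (i + 1) % 6) for i in range(6)]
print(is_feasible(range(6), cycle, [(0, 1), (2, 3), (4, 5)]))
```

On the 6-cycle, removing two opposite edges leaves two paths on three vertices, which satisfies condition 1 but violates condition 2; this is exactly the case excluded in the necessity part of the proof.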

Proof of Theorem 3.

We show that if and only if has a feasible matching.

Necessity. Suppose that is a twin-free bipartite graph of readability at most . Let be an overlap labeling of of length at most . Since is an overlap labeling of , we can partition the edge set of into two sets, and , by setting and . Then for all , we have , that is, . Note that due to the definition of the overlap function, for every edge , the labels of and must not have an overlap of length one.

We claim that is a feasible matching. If is not a matching, we can assume by symmetry that there exists a vertex and a pair of distinct vertices in such that . But then , which implies that and are twins in , a contradiction. Thus, is a matching.

Let denote the graph . Next, we show that is -free. If forms an induced in (with edge set ), then , implying that , a contradiction. Therefore, is -free.

Now let us verify the remaining two properties in the definition of a feasible matching. Let be a subset of vertices in . If is isomorphic to , we would like to show that is a union of three disjoint edges. Suppose for the sake of contradiction that it is not. Consider an edge labeling of as in Figure 4. Since is a matching, the only other way for to be -free, i.e., if it was not a union of three disjoint edges, is for to contain two diametrically opposite edges of , say and . Let for all (addition modulo ). Let, without loss of generality, .Then . Since by our assumption, we have , say . We have and . Since , we get . Therefore , which is a contradiction, since and is an overlap labeling of .

Figure 4: The , the domino and the fork

Finally, suppose that is isomorphic to the domino, and assume an edge labeling as in Figure 4. Since is -free, is also -free and hence can only be isomorphic to either (1) a disjoint union of a and an edge (which is what we want to show), or (2) a disjoint union of two ’s. Suppose we are in case (2). Then we have . Let for all (addition modulo ). We may assume without loss of generality that . Since and , we can follow the same reasoning as above, and conclude that the labels of and are equal, which is a contradiction, since and is an overlap labeling of . This establishes the necessity of the condition.

Sufficiency. Suppose now that G = (V_s, V_p, E) is a twin-free bipartite graph with a feasible matching M. We will show that G has readability at most 2 by constructing an overlap labeling ℓ of G of length at most 2. Since M is a feasible matching, the graph G′ = G − M is P_4-free, that is, a disjoint union of bicliques. Let (X_1, Y_1), …, (X_t, Y_t) be the bipartitions of the vertex sets of the connected components (bicliques) of G′ (so that X_i ⊆ V_s and Y_i ⊆ V_p for all i; some of the X_i's or Y_i's may be empty). Then E(G′) = ⋃_i (X_i × Y_i). Assign a partial labeling ℓ over the alphabet {1, …, t} to vertices of G by setting ℓ(v) = i if and only if v ∈ X_i ∪ Y_i. For each edge (u, v) ∈ M, extend the labels of u and v as follows. Let u ∈ X_i and v ∈ Y_j. Then i ≠ j because edges of bicliques in G′ cannot be in M. Replace ℓ(u) with ji, and ℓ(v) with ji. Since M is a matching, every vertex will have a label of length 1 or 2 at the end of this procedure. Extend the labels of length 1 by unique new characters to make them of length 2 (prepending the new character for vertices in V_s and appending it for vertices in V_p, so that the overlaps of length 1 are preserved). By construction, the overlaps of the obtained labeling create all edges of G.

Let us verify that no new edges were created by . Suppose that is a pair of vertices with with and and . If and have an overlap of length , then by construction. Suppose that and do not have an overlap of length but have an overlap of length . Then for two distinct . By construction, vertex is adjacent to a unique vertex via a matching edge in , moreover and . If , then the edge is in and hence in . So we may assume that . Similarly, vertex is adjacent to a unique vertex in , and and . If , then again the edge is in and hence in . So we may assume that . Since , there exists a vertex . Similarly, since , there exists a vertex . Notice that since is of degree in , and since and belong to distinct connected components of . Therefore, , and, similarly, . But now, the subset induces a subgraph of isomorphic to either a (if ) or a domino (otherwise). In either case, one of the conditions for the and for the domino in Definition 3 is violated, contrary to the fact that is a feasible matching.

This shows that ℓ is an overlap labeling of G and implies that the readability of G is at most 2. ∎
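The labeling built in the sufficiency part can be sketched in code. This is an illustration under assumptions: labels are tuples of symbols rather than strings, component identifiers play the role of the characters 1, …, t, a matched pair (u, v) with u in component i and v in component j both receive the label (j, i), and labels of length 1 are padded with fresh symbols (prepended on the left side, appended on the right side). The function names are hypothetical.

```python
def label_from_matching(left, right, edges, matching):
    M = set(matching)
    # Components of G - M via union-find; roots serve as alphabet symbols.
    parent = {("L", u): ("L", u) for u in left}
    parent.update({("R", v): ("R", v) for v in right})

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        if (u, v) not in M:
            parent[find(("L", u))] = find(("R", v))
    comp_l = {u: find(("L", u)) for u in left}
    comp_r = {v: find(("R", v)) for v in right}
    label_l = {u: (comp_l[u],) for u in left}
    label_r = {v: (comp_r[v],) for v in right}
    for u, v in M:                        # overlaps of length exactly 2
        label_l[u] = label_r[v] = (comp_r[v], comp_l[u])
    for u in left:                        # pad so that the last symbol is unchanged
        if len(label_l[u]) == 1:
            label_l[u] = (("fresh", "L", u),) + label_l[u]
    for v in right:                       # pad so that the first symbol is unchanged
        if len(label_r[v]) == 1:
            label_r[v] = label_r[v] + (("fresh", "R", v),)
    return label_l, label_r

def overlaps(x, y) -> bool:
    return any(x[-k:] == y[:k] for k in range(1, min(len(x), len(y)) + 1))

def is_valid(left, right, edges, label_l, label_r) -> bool:
    E = set(edges)
    return all(overlaps(label_l[u], label_r[v]) == ((u, v) in E) for u in left for v in right)

# Hypothetical example: the 6-cycle with left part {0,1,2} and right part {0,1,2}.
left = right = [0, 1, 2]
edges = [(i, i) for i in range(3)] + [(i, (i + 1) % 3) for i in range(3)]
ll, lr = label_from_matching(left, right, edges, [(i, (i + 1) % 3) for i in range(3)])
print(is_valid(left, right, edges, ll, lr))
```

The procedure produces a correct labeling only when the matching is feasible; with two opposite edges of the 6-cycle as the matching, two distinct matched pairs receive the same label and a spurious overlap appears.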

Corollary 2.

Every bipartite graph of maximum degree at most 2 has readability at most 2.

Proof.

By Lemma 1, it suffices to consider connected twin-free graphs. If G is a connected twin-free bipartite graph of maximum degree at most 2, then G is a path or an (even) cycle. In this case, the edge set of G can be decomposed into two matchings M_1 and M_2. Both M_1 and M_2 are feasible matchings. Thus, by Theorem 3, G has readability at most 2. ∎

5 An efficient algorithm for readability

In this section, we prove Theorem 4 by developing a polynomial time algorithm for the following problem.

Readability 2. Instance: A bipartite graph G. Question: Is r(G) ≤ 2?

First, we use Lemma 1 and Corollary 2 to reduce the problem to connected twin-free bipartite graphs of maximum degree at least 3. We then apply Theorem 3 and reduce the problem to checking for the existence of a feasible matching (Definition 3). Finally, we show how to reduce this problem to the 2SAT problem (Lemma 5), which is well known to be solvable in linear time (see, e.g., [1]).

Proof of Theorem 4.

Given a bipartite graph G, we first reduce the problem to its connected components. That is, if G is not connected, then, by Lemma 1(b), r(G) ≤ 2 if and only if all components G′ of G satisfy r(G′) ≤ 2. Second, assuming G is connected, we compute the twin-free reduction of G, which, by Lemma 1(d), does not change the readability. We test whether the reduced graph is of maximum degree at most 2. If this is the case, then, by Corollary 2, we assert that G has readability at most 2.

Consider a connected twin-free bipartite graph of maximum degree at least . Let denote the set of all edges in such that either (1) has a vertex of degree at least , or (2) is contained in some induced . The definition of and the fact that is connected and of maximum degree at least imply that if an induced subgraph of is isomorphic to a , a fork, a , or a domino (see Figure 4), then .

Let be a set of variables. We now define a 2SAT formula over such that has a feasible matching (and hence, readability at most ) if and only if is satisfiable. The formula contains the following five types of clauses.

  1. For each pair of distinct edges that share an endpoint, add the clause to .

  2. For each induced subgraph of isomorphic to and each matching in , add the clauses and (equivalent to ) to .

  3. For each induced subgraph of isomorphic to , with edges labeled as in Figure 4, add the clause , the clauses corresponding to and , and the clauses corresponding to and to .

  4. For each induced subgraph of isomorphic to the domino, with edges labeled as in Figure 4, add the clauses and to .

  5. For each induced subgraph of isomorphic to the fork, with edges labeled as in Figure 4, add the clause to .

The following lemma shows that if the formula is satisfiable, then $r(G) \le 2$; otherwise, $r(G) > 2$.
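Once the clauses are collected, satisfiability can be decided with any linear-time 2SAT procedure. The sketch below uses one standard approach, the implication graph together with strongly connected components computed by Kosaraju's algorithm; this particular choice, the function name `solve_2sat`, and the clause encoding are assumptions for illustration and are not taken from the paper or from reference [1]. Variables are numbered `0..n-1`, and a literal is a pair `(index, sign)` with `sign=True` for the positive literal.

```python
def solve_2sat(n, clauses):
    """Return a satisfying list of n booleans, or None if unsatisfiable.

    `clauses` is a list of ((i, sign_i), (j, sign_j)) pairs, each meaning
    (literal_i OR literal_j).
    """
    def lit(i, sign):
        return 2 * i + (0 if sign else 1)  # node 2i is x_i, node 2i+1 is NOT x_i

    g = [[] for _ in range(2 * n)]
    rg = [[] for _ in range(2 * n)]
    for (i, si), (j, sj) in clauses:
        a, b = lit(i, si), lit(j, sj)
        # (a OR b) yields the implications NOT a -> b and NOT b -> a.
        for u, v in ((a ^ 1, b), (b ^ 1, a)):
            g[u].append(v)
            rg[v].append(u)

    # First pass: record nodes in order of DFS completion (iterative DFS).
    order, seen = [], [False] * (2 * n)
    for s in range(2 * n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(g[s]))]
        while stack:
            v, it = stack[-1]
            w = next((w for w in it if not seen[w]), None)
            if w is None:
                order.append(v)
                stack.pop()
            else:
                seen[w] = True
                stack.append((w, iter(g[w])))

    # Second pass on the reversed graph: components are numbered in
    # topological order of the implication graph.
    comp, c = [-1] * (2 * n), 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = c
        stack = [s]
        while stack:
            v = stack.pop()
            for w in rg[v]:
                if comp[w] == -1:
                    comp[w] = c
                    stack.append(w)
        c += 1

    if any(comp[2 * i] == comp[2 * i + 1] for i in range(n)):
        return None  # some variable is equivalent to its own negation
    return [comp[2 * i] > comp[2 * i + 1] for i in range(n)]
```

Under this encoding, a constraint that two variables cannot both be true, such as the one a matching condition on two edges sharing an endpoint would impose, is the clause `((e, False), (f, False))`, and an equality between two variables is a pair of clauses `((e, True), (f, False))` and `((e, False), (f, True))`.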

Lemma 5.

Graph $G$ has a feasible matching if and only if the formula defined above is satisfiable.

Proof.

Suppose first that has a feasible matching, say . Let be an assignment of Boolean values to the variables in such that for every , variable is true if and only if . We will prove that is a satisfying assignment for . It is easy to see that clauses of type (1) in are satisfied as is a matching.

Consider a pair of clauses and of type (2) in . These correspond to an induced subgraph of isomorphic to a and a matching in . Since is a feasible matching, the graph is -free, and so we have if and only if . Hence satisfies both the clauses.

Clauses of type (3) deal with induced 6-cycles and those of type (4) deal with induced dominos. Both types of clauses are satisfied by the assignment because the feasible matching satisfies conditions 2 and 3 in Definition 3.

Finally, clauses in of type (5) are satisfied only if for each induced subgraph of isomorphic to the fork (with edges labeled as in Figure 4), we have . Suppose for the sake of contradiction that there exists an induced fork for which this is not the case. Since is -free, so is and hence and are both in , which is a contradiction. This shows that formula is satisfiable.

For the converse direction, suppose that formula is satisfiable and let be a satisfying assignment. Let be the set of edges such that is set to true in . Extend greedily to a set of edges by setting and then iteratively adding the middle edge of any induced subgraph of isomorphic to that contains no edge of . We claim that the so obtained set is a feasible matching of . This will be easy to show once we prove the following claim.
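The greedy extension step can be sketched as below. The sketch reads the stripped symbol in this paragraph as the path on four vertices (the phrase "middle edge" suggests a path with three edges), which is an assumption; the function names, the dictionary-of-sets representation, and the order in which paths are examined are likewise illustrative choices. The code only performs the extension and does not by itself establish that the result is a matching.

```python
def induced_p4s(adj):
    """Yield (a, u, v, b) for every induced path a-u-v-b in a bipartite graph.

    In a bipartite graph the only possible chord is the edge a-b, so the path
    is induced exactly when a and b are non-adjacent. Each path is yielded
    once per direction.
    """
    for u in adj:
        for v in adj[u]:
            for a in adj[u]:
                if a == v:
                    continue
                for b in adj[v]:
                    if b == u or b in adj[a]:
                        continue
                    yield (a, u, v, b)


def extend_greedily(adj, m_prime):
    """Add middle edges of induced four-vertex paths that contain no chosen edge."""
    m = {frozenset(e) for e in m_prime}
    changed = True
    while changed:
        changed = False
        for a, u, v, b in induced_p4s(adj):
            path_edges = {frozenset((a, u)), frozenset((u, v)), frozenset((v, b))}
            if not (path_edges & m):
                m.add(frozenset((u, v)))  # take the middle edge
                changed = True
    return m
```

The loop stops when every induced four-vertex path contains at least one chosen edge, which matches the stopping condition described in the paragraph above.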

Claim 3.

is a matching in with .

Proof.

The claim is true if , since is a matching by virtue of type (1) clauses. Henceforth, assume that . We will first show that . For this, it is enough to prove that for each .

Consider an edge . By our construction of , the edge is the middle edge of an induced subgraph of isomorphic to such that contains no other edge of . In particular, has no edge of . Let and be the endpoints of and let and be the remaining two vertices in such that and are the other two edges in . Assume for the sake of contradiction that . Then, either (a) has a vertex of degree at least , or (b) is contained in some induced .

Suppose first that (b) holds. Then, by virtue of the type (3) clauses, either or both and are in . Both cases contradict our premise that contains no edge of .

Suppose now that (a) holds. Assume that the degree of is at least . Let be a neighbor of such that . We will show that the set induces a fork in . Since is a bipartite graph, it has no ’s and hence . If , then the set induces a . Since is of degree at least , we have and hence, by virtue of clauses of type (2), either and are in , or both and are in . Both of these contradict our premise that contains no edge of . Therefore, the set induces a fork in , and by virtue of its associated type (5) clause, either or