Efficient Logspace Classes for Enumeration, Counting, and Uniform Generation

06/21/2019 · Marcelo Arenas et al. · Pontificia Universidad Católica de Chile · Carnegie Mellon University

In this work, we study two simple yet general complexity classes, based on logspace Turing machines, which provide a unifying framework for efficient query evaluation in areas like information extraction and graph databases, among others. We investigate the complexity of three fundamental algorithmic problems for these classes: enumeration, counting and uniform generation of solutions, and show that they have several desirable properties in this respect. Both complexity classes are defined in terms of non-deterministic logspace transducers (NL transducers). For the first class, we consider the case of unambiguous NL transducers, and we prove constant delay enumeration, and both counting and uniform generation of solutions in polynomial time. For the second class, we consider unrestricted NL transducers, and we obtain polynomial delay enumeration, approximate counting in polynomial time, and polynomial-time randomized algorithms for uniform generation. More specifically, we show that each problem in this second class admits a fully polynomial-time randomized approximation scheme (FPRAS) and a polynomial-time Las Vegas algorithm for uniform generation. Interestingly, the key idea to prove these results is to show that the fundamental problem #NFA admits an FPRAS, where #NFA is the problem of counting the number of strings of length n (given in unary) accepted by a non-deterministic finite automaton (NFA). While this problem is known to be #P-complete and, more precisely, SpanL-complete, it was open whether this problem admits an FPRAS. In this work, we solve this open problem, and obtain as a welcome corollary that every function in SpanL admits an FPRAS.


1 Introduction

Arguably, query answering is the most fundamental problem in databases. In this respect, developing efficient query answering algorithms, as well as understanding when this cannot be done, is of paramount importance in the area. In the most classical view of this problem, one is interested in computing all the answers, or solutions, to a query. However, as the quantity of data becomes enormously large, the number of answers to a query could also be enormous, so computing the complete set of solutions can be prohibitively expensive. In order to overcome this limitation, the idea of enumerating the answers to a query with a small delay has been recently studied in the database area [Seg13]. More specifically, the idea is to divide the computation of the answers to a query into two phases. In a preprocessing phase, some data structures are constructed to accelerate the process of computing answers. Then in an enumeration phase, the answers are enumerated with a small delay between them. In particular, in the case of constant delay enumeration algorithms, the preprocessing phase should take polynomial time, while the time between consecutive answers should be constant.

Constant delay enumeration algorithms allow users to retrieve a fixed number of answers very efficiently, which can give them a lot of information about the solutions to a query. In fact, the same holds if users need a linear or a polynomial number of answers. However, because of the data structures used in the preprocessing phase, these algorithms usually return answers that are very similar to each other [BDG07, Seg13, FRU18]; for example, tuples with elements where only the first few coordinates are changed in the first answers that are returned. In this respect, other approaches can be used to return some solutions efficiently but with improved variety. Most notably, the possibility of generating an answer uniformly, at random, is a desirable condition if it can be done efficiently. Notice that returning varied solutions has been identified as an important property not only in databases, but also for algorithms that retrieve information in a broader sense [AMSW16].

Efficient algorithms for either enumerating or uniformly generating the answers to a query are powerful tools to help in the process of understanding the answers to a query. But how can we know how long these algorithms should run, and how complete the set of computed answers is? A third tool that is needed then is an efficient algorithm for computing, or estimating, the number of solutions to a query. Then, taken together, enumeration, counting and uniform generation techniques form a powerful attacking trident when confronting the problem of answering a query.

In this paper, we follow a more principled approach to study the problems of enumerating, counting and uniformly generating the answers to a query. More specifically, we begin by following the guidance of [JVV86], which urges the use of relations to formalize the notion of solution to a given input of a problem (for instance, to formalize the notion of answer to an input query over an input database). While there are many ways of formalizing this notion, most such formalizations only make sense for a specific kind of query, e.g. a subset of the integers is well-suited as the solution set for counting problems, but not for sampling problems. Thus, if Σ denotes a finite alphabet, then by following [JVV86], we represent a problem as a relation R ⊆ Σ* × Σ*, and we say that y is a solution for an input x if (x, y) ∈ R. Note that the problem of enumerating the solutions to a given input x corresponds to the problem of enumerating the elements of the set {y : (x, y) ∈ R}, while the counting and uniform generation problems correspond to the problems of computing the cardinality of this set and uniformly generating, at random, a string in this set, respectively.

Second, we study two simple yet general complexity classes for relations, based on non-deterministic logspace transducers (NL transducers), which provide a unifying framework for studying enumeration, counting and uniform generation. More specifically, given a finite alphabet Σ, an NL-transducer M is a nondeterministic Turing Machine with input and output alphabet Σ, a read-only input tape, a write-only output tape and a work-tape of which, on input x, only the first O(log |x|) cells can be used. Moreover, a string y is said to be an output of M on input x, if there exists a run of M on input x that halts in an accepting state with y as the string in the output tape. Finally, assuming that the set of all outputs of M on input x is denoted by M(x), a relation R over Σ is said to be accepted by M if for every input x, it holds that {y : (x, y) ∈ R} = M(x).

The first complexity class of relations studied in this paper consists of the relations accepted by unambiguous NL-transducers. More precisely, an NL-transducer M is said to be unambiguous if for every input x and y ∈ M(x), there exists exactly one run of M on input x that halts in an accepting state with y as the string in the output tape. For this class, we are able to achieve constant delay enumeration, and both counting and uniform generation of solutions in polynomial time. For the second class, we consider (unrestricted) NL-transducers, and we obtain polynomial delay enumeration, approximate counting in polynomial time, and polynomial-time randomized algorithms for uniform generation. More specifically, we show that each problem in this second class admits a fully polynomial-time randomized approximation scheme (FPRAS) [JVV86] and a polynomial-time Las Vegas algorithm for uniform generation. It is important to mention that the key idea to prove these results is to show that the fundamental problem #NFA admits an FPRAS, where #NFA is the problem of counting the number of strings of length n (given in unary) accepted by a non-deterministic finite automaton (NFA). While this problem is known to be #P-complete and, more precisely, SpanL-complete [ÁJ93], it was open whether it admits an FPRAS, and only quasi-polynomial time randomized approximation schemes (QPRAS) were known for it [KSM95, GJK97]. In this work, we solve this open problem, and obtain as a welcome corollary that every function in SpanL admits an FPRAS. Thus, to the best of our knowledge, we obtain the first complexity class with a simple and robust definition based on Turing Machines, that contains #P-complete problems and where each problem admits an FPRAS.

Organization of the paper. The main terminology used in the paper is given in Section 2. In Section 3, we define the two classes studied in this paper and state our main results. In Section 4, we show how these classes can be used to obtain positive results on query evaluation in information extraction, graph databases, and binary decision diagrams. The complete proofs of our results are presented in Sections 5 and 6, and Appendix A. In particular, we explain the algorithmic techniques used to obtain an FPRAS for the #NFA problem in Section 6, where we also provide a detailed proof of this result. Finally, some concluding remarks are given in Section 7.

2 Preliminaries

2.1 Relations and problems

Let Σ be a finite alphabet with at least two symbols. As usual, we represent inputs as words x ∈ Σ*, and the length of x is denoted by |x|. A problem is represented as a relation R ⊆ Σ* × Σ*. For every pair (x, y) ∈ R, we interpret x as being the encoding of an input to some problem, and y as being the encoding of a solution or witness to that input. For each x ∈ Σ*, we define the set W_R(x) = {y ∈ Σ* : (x, y) ∈ R}, and call it the witness set for x. Also, if y ∈ W_R(x), we call y a witness or a solution to x.

This is a very general framework, so mostly we work with relations that meet two additional properties. First, we only work with relations where both the input and the witnesses have a finite encoding. Second, we work with p-relations [JVV86], namely, R satisfies that (1) there exists a polynomial q such that (x, y) ∈ R implies |y| ≤ q(|x|), and (2) there exists a deterministic Turing Machine that receives as input (x, y), runs in polynomial time and accepts if, and only if, (x, y) ∈ R. Without loss of generality, from now on we assume that for a p-relation R, there exists a polynomial q such that |y| = q(|x|) for every (x, y) ∈ R. This is not a strong requirement, since all witnesses can be made to have the same length through padding.

2.2 Enumeration, counting and uniform generation

Given a p-relation R, we are interested in the following problems:

Problem: ENUM(R)
Input: A word x ∈ Σ*
Output: Enumerate all words in W_R(x), without repetitions

Problem: COUNT(R)
Input: A word x ∈ Σ*
Output: The size |W_R(x)|

Problem: GEN(R)
Input: A word x ∈ Σ*
Output: Generate uniformly, at random, a word in W_R(x)

Given that R is a p-relation, for every x ∈ Σ* we have that W_R(x) is finite, so these three problems are well defined. Notice that in the case of ENUM(R), we do not assume a specific order on words, so that the elements of W_R(x) can be enumerated in any order (but without repetitions). Moreover, in the case of COUNT(R), we assume that |W_R(x)| is encoded in binary and, therefore, the size of the output is logarithmic in |W_R(x)|. Finally, in the case of GEN(R), we generate a word y ∈ W_R(x) with probability 1/|W_R(x)| if the set W_R(x) is not empty; otherwise, we return a special symbol ⊥ to indicate that W_R(x) = ∅.

2.3 Enumeration with polynomial and constant delay

An enumeration algorithm for ENUM(R) is a procedure that receives an input x ∈ Σ* and, during the computation, outputs each word in W_R(x), one by one and without repetitions. The time between two consecutive outputs is called the delay of the enumeration. In this paper, we consider two restrictions on the delay: polynomial delay and constant delay. Polynomial-delay enumeration is the standard notion of polynomial time efficiency in enumeration algorithms [JYP88] and is defined as follows. An enumeration algorithm is of polynomial delay if there exists a polynomial q such that for every input x, the time between the beginning of the algorithm and the initial output, between any two consecutive outputs, and between the last output and the end of the algorithm, is bounded by q(|x|).

Constant-delay enumeration is another notion of efficiency for enumeration algorithms that has attracted a lot of attention in recent years [Bag06, Cou09, Seg13]. This notion has stronger guarantees compared to polynomial delay: the enumeration is done in a second phase after the processing of the input, taking constant time between two consecutive outputs in a very precise sense. Several notions of constant-delay enumeration have been given, most of them in database theory, where it is important to separate the analysis between query and data. In this paper, we want a definition of constant delay that is agnostic of the distinction between query and data (i.e. combined complexity) and, for this reason, we use a more general notion of constant-delay enumeration than the one in [Bag06, Cou09, Seg13].

As is standard in the literature [Seg13], for the notion of constant-delay enumeration we consider enumeration algorithms on Random Access Machines (RAM) with addition and uniform cost measure [AH74]. Given a relation R, an enumeration algorithm E for ENUM(R) has constant delay if E runs in two phases over the input x.

  1. The first phase (precomputation), which does not produce output.

  2. The second phase (enumeration), which occurs immediately after the precomputation phase, where all words in W_R(x) are enumerated without repetitions, satisfying the following conditions for a fixed constant c:

    1. the time it takes to generate the first output y is bounded by c · |y|;

    2. the time between two consecutive outputs y and y' is bounded by c · |y'| and does not depend on x; and

    3. the time between the final element y that is returned and the end of the enumeration phase is bounded by c · |y|.

We say that E is a constant-delay algorithm for ENUM(R) with precomputation phase f, if E has constant delay and the precomputation phase takes time O(f(|x|)). Moreover, we say that ENUM(R) can be solved with constant delay if there exists a constant-delay algorithm for ENUM(R) with precomputation phase q, for some polynomial q.

Our notion of constant-delay algorithm differs from the definitions in [Seg13] in two aspects. First, as previously mentioned, we relax the distinction between query and data in the preprocessing phase, allowing our algorithm to take polynomial time in the input (i.e. combined complexity). Second, our definition of constant delay is what in [Cou09, Bag06] is called linear delay in the size of the output, namely, writing the next output takes time linear in its size and does not depend on the size of the input. This is a natural assumption, since each output must at least be written down to return it to the user. Notice that, given an input x and an output y, the notion of polynomial delay above means polynomial in |x| whereas the notion of linear delay from [Cou09, Bag06] means linear in |y|, i.e., constant in the size of x. Thus, we have decided to call the two-phase enumeration from above "constant delay", as it does not depend on the size of the input x, and the delay is just what is needed to write the output (which is the minimum requirement for such an enumeration algorithm).

2.4 Approximate counting and Las Vegas uniform generation

Given a relation R, the problem COUNT(R) can be solved efficiently if there exists a polynomial-time algorithm that, given x ∈ Σ*, computes |W_R(x)|. In other words, if we think of COUNT(R) as a function that maps x to the value |W_R(x)|, then COUNT(R) can be computed efficiently if COUNT(R) ∈ FP, the class of functions that can be computed in polynomial time. As such a condition does not hold for many fundamental problems, we also consider the possibility of efficiently approximating the value of the function COUNT(R). More precisely, COUNT(R) is said to admit a fully polynomial-time randomized approximation scheme (FPRAS) [JVV86] if there exists a randomized algorithm A : Σ* × (0, 1) → ℕ and a polynomial q(u, v) such that for every x ∈ Σ* and ε ∈ (0, 1), it holds that:

Pr( |A(x, ε) − |W_R(x)|| ≤ ε · |W_R(x)| ) ≥ 3/4,

and the number of steps needed to compute A(x, ε) is at most q(|x|, 1/ε). Thus, A(x, ε) approximates the value |W_R(x)| with a relative error of ε, and it can be computed in polynomial time in the size of x and the value 1/ε.

The problem GEN(R) can be solved efficiently if there exists a polynomial-time randomized algorithm that, given x ∈ Σ*, generates an element of W_R(x) with uniform probability distribution (if W_R(x) = ∅, then it returns the special symbol ⊥). However, as in the case of COUNT(R), the existence of such a generator is not guaranteed for many fundamental problems, so we also consider a relaxed notion of generation that has a probability of failing in returning a solution. More precisely, GEN(R) is said to admit a polynomial-time Las Vegas uniform generator (PLVUG) if there exists a randomized algorithm G, a polynomial q and a function prob : Σ* → [0, 1] such that for every x ∈ Σ*:

  1. Pr(G(x) = fail) ≤ 1/2;

  2. if G(x) ≠ fail and W_R(x) ≠ ∅, then G(x) ∈ W_R(x);

  3. for every y ∈ Σ* ∪ {⊥}:

    1. if W_R(x) = ∅ and y ≠ ⊥, then Pr(G(x) = y) = 0;

    2. if y ∈ W_R(x), then Pr(G(x) = y) = prob(x);

  4. the number of steps needed to compute G(x) is at most q(|x|).

The invocation G(x) can fail in generating an element of W_R(x), in which case it returns fail. By condition (1), we know that this probability of failing is at most 1/2, so that by invoking G(x) several times we can make this probability arbitrarily small (for example, the probability that G(x) returns fail in 100 consecutive independent invocations is at most 2^(-100)). Assume that the invocation G(x) does not fail. If W_R(x) = ∅, then we have by condition 3 (a) that G(x) = ⊥, so the randomized algorithm indicates that there is no witness for x in this case. If W_R(x) ≠ ∅, then we have by conditions (2) and (3) that G(x) returns an element y ∈ W_R(x). Moreover, we know by condition 3 (b) that the probability of returning such an element is prob(x). Thus, we have a uniform generator in this case, as the probability of returning each element is the same. Finally, we have that G(x) can be computed in polynomial time in the size of x.
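As an illustration of this amplification argument, the following minimal Python sketch wraps an arbitrary generator satisfying condition (1) (toy_generator below is a made-up stand-in, not part of the paper) and retries it up to k times, so that the overall failure probability is at most 2^(-k).

import random

FAIL = "fail"

def amplified_generate(G, x, k=100):
    """Invoke a Las Vegas generator G up to k times on input x. If each call
    fails with probability at most 1/2 (condition (1)), then this wrapper
    fails with probability at most 2 ** (-k); otherwise it returns whatever
    G returns: a witness, or None when the witness set is empty."""
    for _ in range(k):
        result = G(x)
        if result != FAIL:
            return result
    return FAIL

# Made-up stand-in for G: fails half of the time, otherwise returns a
# uniformly chosen witness from a list (or None if the list is empty).
def toy_generator(witnesses):
    if random.random() < 0.5:
        return FAIL
    return random.choice(witnesses) if witnesses else None

print(amplified_generate(toy_generator, ["y1", "y2", "y3"]))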

It is important to notice that the notion of polynomial-time Las Vegas uniform generator corresponds to the notion of uniform generator used in [JVV86]. However, we have decided to use the term "Las Vegas" to emphasize the fact that there is a probability of failing in returning a solution. Moreover, the notion of polynomial-time Las Vegas uniform generator imposes stronger requirements than the notion of fully polynomial-time almost uniform generator introduced in [JVV86]. In particular, the latter not only has a probability of failing, but also considers the possibility of generating a solution with a probability distribution that is almost uniform, that is, an algorithm that generates a string y ∈ W_R(x) with a probability in the interval [prob(x)/(1 + ε), prob(x) · (1 + ε)] for a given error ε, where prob(x) is defined as in the notion of PLVUG.

3 NLOGSPACE transducers: definitions and our main results

The goal of this section is to provide simple yet general definitions of classes of relations with good properties in terms of enumeration, counting and uniform generation. More precisely, we are first aiming at providing a class C of relations that has a simple definition in terms of Turing Machines and such that for every relation R ∈ C, it holds that ENUM(R) can be solved with constant delay, and both COUNT(R) and GEN(R) can be solved in polynomial time. Moreover, as it is well known that such good conditions cannot always be achieved, we are then aiming at extending the definition of C to obtain a simple class, also defined in terms of Turing Machines and with good approximation properties. It is important to mention that we are not looking for an exact characterization in terms of Turing Machines of the class of relations that admit constant delay enumeration algorithms, as this may result in an overly complicated model. Instead, we are looking for simple yet general classes of relations with good properties in terms of enumeration, counting and uniform generation, and which can serve as a starting point for the systematic study of these three fundamental properties.

A key notion that is used in our definitions of classes of relations is that of a transducer. Given a finite alphabet Σ, an NL-transducer M is a nondeterministic Turing Machine with input and output alphabet Σ, a read-only input tape, a write-only output tape where the head is always moved to the right once a symbol is written in it (so that the output cannot be read by M), and a work-tape of which, on input x, only the first f(|x|) cells can be used, where f(|x|) ∈ O(log |x|). A string y ∈ Σ* is said to be an output of M on input x if there exists a run of M on input x that halts in an accepting state with y as the string in the output tape. The set of all outputs of M on input x is denoted by M(x) (notice that M(x) can be empty). Finally, the relation accepted by M, denoted by R(M), is defined as {(x, y) ∈ Σ* × Σ* : y ∈ M(x)}.

Definition 1.

A relation R is in RelationNL if, and only if, there exists an NL-transducer M such that R = R(M).

The class RelationNL should be general enough to contain some natural and well-studied problems. A first such problem is the satisfiability of a propositional formula in DNF. As a relation, this problem can be represented as follows:

SAT-DNF = {(φ, σ) : φ is a propositional formula in DNF and σ is a truth assignment satisfying φ}

Thus, we have that ENUM(SAT-DNF) corresponds to the problem of enumerating the truth assignments satisfying a propositional formula in DNF, while COUNT(SAT-DNF) and GEN(SAT-DNF) correspond to the problems of counting and uniformly generating such truth assignments, respectively. It is not difficult to see that SAT-DNF ∈ RelationNL. In fact, assume that we are given a propositional formula φ of the form C_1 ∨ ⋯ ∨ C_m, where each C_i is a conjunction of literals, that is, a conjunction of propositional variables and negations of propositional variables. Moreover, assume that each propositional variable in φ is of the form x_i, where i is a binary number, and that x_1, …, x_n are the variables occurring in φ. Notice that with such a representation, we have that φ is a string over the alphabet {x, 0, 1, ∧, ∨, ¬}. We define as follows an NL-transducer M such that M(φ) is the set of truth assignments satisfying φ. On input φ, the NL-transducer M non-deterministically chooses a disjunct C_i, which is represented by two indexes indicating the starting and ending symbols of C_i in the string φ. Then it checks whether C_i is satisfiable, that is, whether C_i does not contain complementary literals. Notice that this can be done in logarithmic space by checking for every j ∈ {1, …, n}, whether x_j and ¬x_j are both literals in C_i. If C_i is not satisfiable, then M halts in a non-accepting state. Otherwise, M returns a satisfying truth assignment of φ as follows. A truth assignment for φ is represented by a string of length n over the alphabet {0, 1}, where the j-th symbol of this string is the truth value assigned to variable x_j. Then for every j ∈ {1, …, n}, if x_j is a conjunct in C_i, then M writes the symbol 1 in the output tape, and if ¬x_j is a conjunct in C_i, then M writes the symbol 0 in the output tape. Finally, if neither x_j nor ¬x_j is a conjunct in C_i, then M non-deterministically chooses a symbol b ∈ {0, 1}, and it writes b in the output tape.
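To illustrate the construction, the following minimal Python sketch mirrors it outside the logspace setting (with a hypothetical encoding of the DNF as lists of signed variable indices, not the string encoding used above): for each consistent disjunct it completes the unconstrained variables in every possible way, collecting the set of satisfying assignments that the transducer would output.

from itertools import product

def satisfying_assignments(dnf, n):
    """dnf is a list of clauses; each clause is a list of literals, where the
    literal +j stands for variable x_j and -j for its negation (1 <= j <= n).
    Returns the set of satisfying assignments, each encoded as a string of
    n symbols over {0, 1}, the j-th symbol being the value of x_j."""
    results = set()
    for clause in dnf:
        forced = {}
        consistent = True
        for lit in clause:
            j, value = abs(lit), '1' if lit > 0 else '0'
            if forced.get(j, value) != value:   # x_j and its negation both occur
                consistent = False
                break
            forced[j] = value
        if not consistent:
            continue                            # the transducer would reject this disjunct
        free = [j for j in range(1, n + 1) if j not in forced]
        for choice in product('01', repeat=len(free)):
            sigma = dict(forced)
            sigma.update(zip(free, choice))     # the non-deterministic choices
            results.add(''.join(sigma[j] for j in range(1, n + 1)))
    return results

# (x1 and not x2) or (x2 and x3): the satisfying assignments over x1, x2, x3
print(sorted(satisfying_assignments([[1, -2], [2, 3]], 3)))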

Given that COUNT(SAT-DNF) is a #P-complete problem, we cannot expect COUNT(R) to be solvable in polynomial time for every R ∈ RelationNL. However, COUNT(SAT-DNF) admits an FPRAS [KL83], so we can still hope for COUNT(R) to admit an FPRAS for every R ∈ RelationNL. It turns out that proving such a result involves providing an FPRAS for another natural and fundamental problem: #NFA. More specifically, #NFA is the problem of counting the number of words of length n accepted by a non-deterministic finite automaton without epsilon transitions (NFA), where n is given in unary (that is, n is given as a string 0^n). It is known that #NFA is #P-complete [ÁJ93], but it is open whether it admits an FPRAS; in fact, the best randomized approximation scheme known for #NFA runs in quasi-polynomial time [KSM95]. In our notation, this problem is represented by the following relation:

MEM-NFA = {((A, 0^n), w) : A is an NFA and w is a word of length n accepted by A}

that is, we have that #NFA = COUNT(MEM-NFA). It is easy to see that MEM-NFA ∈ RelationNL. Hence, we give a positive answer to the open question of whether #NFA admits an FPRAS by proving the following general result about RelationNL.

Theorem 2.

If R ∈ RelationNL, then ENUM(R) can be solved with polynomial delay, COUNT(R) admits an FPRAS, and GEN(R) admits a PLVUG.

It is worth mentioning a fundamental consequence of this result in computational complexity. The class of functions SpanL was introduced in [ÁJ93] to provide a characterization of some functions that are hard to compute. More specifically, given a finite alphabet Σ, a function f : Σ* → ℕ is in SpanL if there exists an NL-transducer M with input alphabet Σ such that f(x) = |M(x)| for every x ∈ Σ*. The class SpanL is contained in #P, and it has been instrumental in proving that some functions are difficult to compute [ÁJ93, HV95, ACP12, LM13], as if a function is complete for SpanL and can be computed in polynomial time, then P = NP [ÁJ93]. Given that #NFA is SpanL-complete under parsimonious reductions [ÁJ93], and parsimonious reductions preserve the existence of an FPRAS, we obtain the following corollary from Theorem 2.

Corollary 3.

Every function in SpanL admits an FPRAS.

Although some classes containing #P-complete functions and for which every function in the class admits an FPRAS have been identified before [SST95, AMR17], to the best of our knowledge this is the first such class with a simple and robust definition based on Turing Machines.

A tight relationship between the existence of an FPRAS and the existence of a scheme for almost uniform generation was proved in [JVV86], for the class of relations that are self-reducible. Thus, one might wonder whether the existence of a PLVUG for GEN(R) in Theorem 2 is just a corollary of our FPRAS for COUNT(R) along with the result in [JVV86]. Interestingly, the answer to this question is no, as the notion of PLVUG asks for a uniform generator without any distributional error ε, whose existence cannot be inferred from the results in [JVV86]. Thus, we prove in Section 6 that COUNT(R) admits an FPRAS and GEN(R) admits a PLVUG, for a relation R ∈ RelationNL, without utilizing the aforementioned result from [JVV86].

A natural question at this point is whether a simple syntactic restriction on the definition of RelationNL gives rise to a class of relations with better properties in terms of enumeration, counting and uniform generation. Fortunately, the answer to this question comes by imposing a natural and well-studied restriction on Turing Machines, which allows us to define a class that contains many natural problems. More precisely, we consider the notion of UL-transducer, where the letter "U" stands for "unambiguous". Formally, M is a UL-transducer if M is an NL-transducer such that for every input x and y ∈ M(x), there exists exactly one run of M on input x that halts in an accepting state with y as the string in the output tape. Notice that this notion of transducer is based on well-known classes of decision problems (e.g. UP [Val76] and UL [RA00]) adapted to our case, namely, adapted to problems defined as relations.

Definition 4.

A relation R is in RelationUL if, and only if, there exists a UL-transducer M such that R = R(M).

For the class RelationUL, we obtain the following result.

Theorem 5.

If R ∈ RelationUL, then ENUM(R) can be solved with constant delay, there exists a polynomial-time algorithm for COUNT(R), and there exists a polynomial-time randomized algorithm for GEN(R).

In particular, it should be noticed that given R ∈ RelationUL and an input x, the solutions for x can be enumerated, counted and uniformly generated efficiently.

Classes of problems definable by machine models and that can be enumerated with constant delay have been proposed before. In [ABJM17], it is shown that if a problem is definable by a d-DNNF circuit, then the solutions of an instance can be listed with linear preprocessing and constant delay enumeration. Still, to the best of our knowledge, this is the first such class with a simple and robust definition based on Turing Machines.

4 Applications of the Main Results

Before providing the proofs of Theorems 2 and 5, we give some implications of these results. In particular, we show how NL and UL transducers can be used to obtain positive results on query evaluation in areas like information extraction, graph databases, and binary decision diagrams.

4.1 Information extraction

In [FKRV15], the framework of document spanners was proposed as a formalization of rule-based information extraction. In this framework, the main data objects are documents and spans. Formally, given a finite alphabet Σ, a document is a string d = a_1 a_2 ⋯ a_n ∈ Σ* and a span is a pair [i, j⟩ with 1 ≤ i ≤ j ≤ n + 1. A span [i, j⟩ represents a continuous region of the document d, whose content is the substring of d from position i to position j − 1. Given a finite set of variables X, a mapping is a function from X to the spans of d.

Variable-set automata (VA) are one of the main formalisms to specify sets of mappings over a document. Here, we use the notion of extended VA (eVA) from [FRU18] to state our main results. We only recall the main definitions, and we refer the reader to [FRU18, FKRV15] for more intuition and further details. An eVA is a tuple A = (Q, q_0, F, δ) such that Q is a finite set of states, q_0 ∈ Q is the initial state, and F ⊆ Q is the set of final states. Further, δ is the transition relation, consisting of letter transitions (q, a, q') with q, q' ∈ Q and a ∈ Σ, and variable-set transitions (q, S, q') with q, q' ∈ Q and S a non-empty set of markers. The markers are the symbols x⊢ and ⊣x for x ∈ X, and they are used to denote that variable x is opened or closed by A, respectively. A run ρ over a document d = a_1 a_2 ⋯ a_n alternates variable-set transitions and letter transitions: for each position i ∈ {1, …, n + 1}, the run first reads a (possibly empty) set S_i of markers (taking a variable-set transition when S_i ≠ ∅, and staying in the same state when S_i = ∅) and then, if i ≤ n, the letter a_i. We say that a run ρ is valid if for every x ∈ X there exists exactly one pair (i, j) with i ≤ j such that x⊢ ∈ S_i and ⊣x ∈ S_j. A valid run ρ naturally defines a mapping μ^ρ that maps each x ∈ X to the only span [i, j⟩ such that x⊢ ∈ S_i and ⊣x ∈ S_j. We say that ρ is accepting if its last state belongs to F. Finally, the semantics of A over d, denoted by ⟦A⟧(d), is defined as the set of all mappings μ^ρ where ρ is a valid and accepting run of A over d.

In [Fre17, MRV18], it was shown that the decision problem related to query evaluation, namely, given an eVA A and a document d, deciding whether ⟦A⟧(d) ≠ ∅, is NP-hard. For this reason, in [FRU18] a subclass of eVA is considered in order to recover polynomial-time evaluation. An eVA is called functional if every accepting run is valid. Intuitively, a functional eVA does not need to check the validity of a run, given that it is already known that every run that reaches a final state will be valid.

For the query evaluation problem of functional eVA (i.e. to compute ⟦A⟧(d)), one can naturally associate the following relation:

EVAL-eVA = {((A, d), μ) : A is a functional eVA, d is a document, and μ ∈ ⟦A⟧(d)}

It is not difficult to show that EVAL-eVA is in RelationNL. Hence, by Theorem 2 we get the following results.

Corollary 6.

ENUM(EVAL-eVA) can be solved with polynomial delay, COUNT(EVAL-eVA) admits an FPRAS, and GEN(EVAL-eVA) admits a PLVUG.

In [FRU18], it was shown that every functional RGX or functional VA (not necessarily extended) can be converted in polynomial time into a functional eVA. Therefore, Corollary 6 also holds for these more general classes. Notice that in [FKP18], a polynomial-delay enumeration algorithm was given for ENUM(EVAL-eVA). Thus, only the results about COUNT(EVAL-eVA) and GEN(EVAL-eVA) are new.

Regarding efficient enumeration and exact counting, a constant-delay algorithm with polynomial preprocessing was given in [FRU18] for the class of deterministic functional eVA. Here, we can easily extend these results to a more general class, which we call unambiguous functional eVA. Formally, we say that an eVA is unambiguous if for every two distinct valid and accepting runs ρ_1 and ρ_2, it holds that μ^{ρ_1} ≠ μ^{ρ_2}. In other words, each output of an unambiguous eVA is witnessed by exactly one run. As in the case of EVAL-eVA, we can define the relation EVAL-UeVA by restricting the input to unambiguous functional eVA. By using UL-transducers and Theorem 5, we can then extend the results in [FRU18] to the unambiguous case.

Corollary 7.

ENUM(EVAL-UeVA) can be solved with constant delay, there exists a polynomial-time algorithm for COUNT(EVAL-UeVA), and there exists a polynomial-time randomized algorithm for GEN(EVAL-UeVA).

Notice that this result gives a constant-delay algorithm with polynomial preprocessing for the class of unambiguous functional eVA. Instead, the algorithm in [FRU18] has linear preprocessing over documents, restricted to the case of deterministic eVA. This leaves open whether there exists a constant-delay algorithm with linear preprocessing over documents for the unambiguous case.

4.2 Query evaluation in graph databases

Enumerating, counting, and generating paths are relevant tasks for query evaluation in graph databases [AAB17]. Given a finite set Σ of labels, a graph database is a pair G = (V, E) where V is a finite set of vertices and E ⊆ V × Σ × V is a finite set of labeled edges. Here, nodes represent pieces of data and edges specify relations between them [AAB17]. One of the core query languages for posing queries on graph databases are regular path queries (RPQ). An RPQ is a triple (x, r, y) where x, y are variables and r is a regular expression over Σ. As usual, we denote by L(r) all the strings over Σ that conform to r. Given an RPQ (x, r, y), a graph database G, and nodes u, v of G, one would like to retrieve, count, or uniformly generate all paths in G going from u to v that satisfy r (notice that the standard semantics for RPQs is to retrieve pairs of nodes; here we consider a less standard semantics based on paths, which is also relevant for graph databases [ACP12, LM13, AAB17]). Formally, a path from u to v in G is a sequence of vertices and labels of the form π = v_0 a_1 v_1 a_2 ⋯ a_k v_k, such that v_0 = u, v_k = v, and (v_{i−1}, a_i, v_i) ∈ E for every i ∈ {1, …, k}. A path π is said to satisfy r if the string a_1 a_2 ⋯ a_k ∈ L(r). The length of π is defined as k. Clearly, between u and v there can be an infinite number of paths that satisfy r. For this reason, one usually wants to retrieve all paths between u and v of at most a certain length n, namely, one usually considers the set of all paths π from u to v in G such that π satisfies r and the length of π is at most n. This naturally defines the following relation representing the problem of evaluating an RPQ over a graph database:

EVAL-RPQ = {((G, u, v, r, 0^n), π) : π is a path from u to v in G of length at most n that satisfies r}

Using this relation, fundamental problems for RPQs such as enumerating, counting, or uniformly generating paths can be naturally represented. It is not difficult to show that EVAL-RPQ is in RelationNL, from which the following corollary can be obtained by using Theorem 2.

Corollary 8.

COUNT(EVAL-RPQ) admits an FPRAS, and GEN(EVAL-RPQ) admits a PLVUG.

It is important to mention that giving a polynomial-delay enumeration algorithm for EVAL-RPQ is straightforward, but the existence of an FPRAS and a PLVUG for EVAL-RPQ was not known before when queries are part of the input (that is, in combined complexity [Var82]).
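To make the connection with automata concrete, the following Python sketch (the input encodings are hypothetical) counts the witnesses of EVAL-RPQ by a dynamic program over the product of the graph database with an automaton for r; for simplicity it assumes a deterministic automaton, so that paths and product runs are in one-to-one correspondence. This is only an illustration of the logspace-transducer structure of the problem, not the paper's algorithm.

from collections import defaultdict

def count_rpq_paths(edges, dfa_delta, q0, finals, u, v, n):
    """edges: a set of labeled edges (source, label, target) of a graph database.
    dfa_delta: dict mapping (state, label) to a state; a deterministic automaton
    for the regular expression r is assumed, so that graph paths and runs of the
    product automaton are in one-to-one correspondence. Counts the paths from u
    to v of length at most n whose label string conforms to r."""
    dp = defaultdict(int)        # dp[(node, state)] = number of paths of the current length
    dp[(u, q0)] = 1
    total = 1 if u == v and q0 in finals else 0     # the path of length 0
    for _ in range(n):
        next_dp = defaultdict(int)
        for (node, state), ways in dp.items():
            for (src, label, dst) in edges:
                if src == node and (state, label) in dfa_delta:
                    next_dp[(dst, dfa_delta[(state, label)])] += ways
        dp = next_dp
        total += sum(ways for (node, state), ways in dp.items()
                     if node == v and state in finals)
    return total

# Cycle a -> b -> c -> a with every edge labeled 'e', and a DFA for the expression e*.
edges = {('a', 'e', 'b'), ('b', 'e', 'c'), ('c', 'e', 'a')}
dfa = {(0, 'e'): 0}
print(count_rpq_paths(edges, dfa, 0, {0}, 'a', 'a', 6))   # 3 paths: lengths 0, 3 and 6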

4.3 Binary decision diagrams

Binary decision diagrams are an abstract representation of boolean functions which are widely used in computer science and have found many applications in areas like formal verification [Bry92]. A binary decision diagram (BDD) D is a directed acyclic graph where each internal node u is labeled with a variable x_u and has at most two edges going to children u_0 and u_1. Intuitively, u_0 and u_1 represent the next nodes when x_u takes values 0 and 1, respectively. D contains only two terminal, or sink, nodes, labeled by 0 or 1, and one initial node called root. We assume that every path from root to a terminal node does not repeat variables. Then given an assignment σ from the variables in D to {0, 1}, we have that σ naturally defines a path from root to a terminal node 0 or 1. In this way, D defines a boolean function that gives a value D(σ) ∈ {0, 1} to each assignment σ; in particular, D(σ) corresponds to the sink node reached by starting from root and following the values in σ. For Ordered BDDs (OBDDs), we also have a linear order < over the variables in D such that, for every node u with a child v, it holds that x_u < x_v. Notice that not necessarily all variables appear in a path from the initial node root to a terminal node 0 or 1. Nevertheless, the promise in an OBDD is that variables will appear following the order <.

An OBDD D defines the set of assignments σ such that D(σ) = 1. Then D can be considered as a succinct representation of this set, and one would like to enumerate, count and uniformly generate assignments given D. This motivates the relation:

EVAL-OBDD = {(D, σ) : D is an OBDD and σ is an assignment of the variables of D such that D(σ) = 1}

Given (D, σ) in EVAL-OBDD, there is exactly one path in D that witnesses D(σ) = 1. Therefore, one can easily show that EVAL-OBDD is in RelationUL, from which we obtain that:

Corollary 9.

ENUM(EVAL-OBDD) can be solved with constant delay, there exists a polynomial-time algorithm for COUNT(EVAL-OBDD), and there exists a polynomial-time randomized algorithm for GEN(EVAL-OBDD).

The above results are well known. Nevertheless, they show how easy and direct it is to use UL-transducers to establish the good algorithmic properties of a data structure like OBDDs.
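For readers who want to see the classical counting argument spelled out directly, here is a minimal Python sketch (with a hypothetical adjacency-list encoding of the OBDD) that counts satisfying assignments by a single bottom-up pass; the same table also supports uniform sampling by walking down from root with probabilities proportional to these counts. This is the classical direct algorithm, not the UL-transducer route used in the paper.

from functools import lru_cache

def count_obdd(nodes, root, n):
    """nodes maps an internal node id to a triple (var, lo, hi), where var is the
    index (1..n) of the tested variable and lo/hi are the children followed when
    the variable takes value 0/1. The terminals are the special ids 0 and 1.
    Counts the assignments sigma of all n variables with D(sigma) = 1, taking the
    variables skipped along edges into account."""
    def var(u):
        return n + 1 if u in (0, 1) else nodes[u][0]

    @lru_cache(maxsize=None)
    def count(u):
        # assignments of the variables var(u), ..., n that lead from u to terminal 1
        if u in (0, 1):
            return u
        v, lo, hi = nodes[u]
        return (count(lo) * 2 ** (var(lo) - v - 1) +
                count(hi) * 2 ** (var(hi) - v - 1))

    return count(root) * 2 ** (var(root) - 1)

# OBDD for x1 or x2 with order x1 < x2: node 'a' tests x1, node 'b' tests x2.
nodes = {'a': (1, 'b', 1), 'b': (2, 0, 1)}
print(count_obdd(nodes, 'a', 2))   # 3 satisfying assignments: 01, 10, 11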

Some non-deterministic variants of BDDs have been studied in the literature [ACMS18]. In particular, an nOBDD extends an OBDD with vertices that are not labeled by a variable and whose outgoing edges carry no labels. Thus, an nOBDD is non-deterministic in the sense that, given an assignment σ, there can be several paths that bring σ from the initial node root to a terminal node labeled 0 or 1. Without loss of generality, nOBDDs are assumed to be consistent in the sense that, for each σ, the paths of σ in the nOBDD can reach 0 or 1, but not both.

As in the case of OBDDs, we can define a relation EVAL-nOBDD that pairs an nOBDD D with an assignment σ that evaluates to 1 (i.e. D(σ) = 1). Contrary to OBDDs, an nOBDD loses the single-witness property, and now an assignment σ can have several paths from the initial node to the terminal node 1. Thus, it is not clear whether EVAL-nOBDD is in RelationUL. Still, one can easily show that EVAL-nOBDD ∈ RelationNL, from which the following results follow.

Corollary 10.

ENUM(EVAL-nOBDD) can be solved with polynomial delay, COUNT(EVAL-nOBDD) admits an FPRAS, and GEN(EVAL-nOBDD) admits a PLVUG.

It is important to stress that the existence of an FPRAS and a PLVUG for EVAL-nOBDD was not known before, and one can easily show this by using NL-transducers and then applying Theorem 2.

5 Completeness, Self-reducibility, and their Implications to the Class RelationUL

The goal of this section is to define a simple notion of reduction for the classes RelationNL and RelationUL, and then to show how it can be used to prove Theorem 5. In Section 6, we use this notion again when proving Theorem 2.

A natural question to ask is which notions of "completeness" and "reduction" are appropriate for our framework. Notions of reductions for relations have been proposed before, in particular in the context of search problems [DGP09]. However, we do not intend to discuss them here; instead, we use an idea of completeness that is very restricted, but that turns out to be useful for the classes we defined. Let C be a complexity class of relations and R_1, R_2 ⊆ Σ* × Σ*, and recall that W_R(x) is defined as the set of witnesses for input x, that is, W_R(x) = {y : (x, y) ∈ R}. We say R_1 is reducible to R_2 if there exists a function f : Σ* → Σ*, computable in polynomial time, such that for every x ∈ Σ*: W_{R_1}(x) = W_{R_2}(f(x)). Also, if R is reducible to R_2 for every R ∈ C, we say R_2 is complete for C. Notice that this definition is very restricted, since the notion of reduction requires the witness set to be exactly the same for both relations (it is not sufficient that they have the same size, for example). The benefit behind this kind of reduction is that it preserves all the properties of efficient enumeration, counting and uniform generation that we introduced in Sections 2 and 3, as stated in the following result.

Proposition 11.

If a relation R_1 can be reduced to a relation R_2, then:

  • If ENUM(R_2) can be solved with constant (resp. polynomial) delay, then ENUM(R_1) can be solved with constant (resp. polynomial) delay.

  • If there exists a polynomial-time algorithm (resp. an FPRAS) for COUNT(R_2), then there exists a polynomial-time algorithm (resp. an FPRAS) for COUNT(R_1).

  • If there exists a polynomial-time randomized algorithm (resp. a PLVUG) for GEN(R_2), then there exists a polynomial-time randomized algorithm (resp. a PLVUG) for GEN(R_1).

Proof.

Since R_1 can be reduced to R_2, there exist a polynomial p and a function f such that W_{R_1}(x) = W_{R_2}(f(x)) for every input string x, and f(x) can be computed in time p(|x|).

First, suppose ENUM(R_2) can be solved with constant (resp. polynomial) delay, so there is an algorithm E that enumerates W_{R_2}(y) with constant (resp. polynomial) delay and with a precomputation phase of time q(|y|) for some polynomial q. Now, consider the following procedure for ENUM(R_1) on input x. First, we compute f(x) in time p(|x|). Then, we run E on f(x), which enumerates all witnesses in W_{R_2}(f(x)), that is, it enumerates all witnesses in W_{R_1}(x). So, the precomputation phase of the procedure takes time p(|x|) + q(|f(x)|), which is polynomial in |x|. The enumeration phase is the same as for E, so it has constant (resp. polynomial) delay. We conclude that ENUM(R_1) can be solved with constant (resp. polynomial) delay.

Now, suppose there exists a polynomial-time algorithm A for COUNT(R_2), and let q be the polynomial that characterizes its complexity. Consider the following procedure for COUNT(R_1) on input x. First, we construct f(x) in time p(|x|). Next, we run A on f(x), which computes |W_{R_2}(f(x))|, that is, it computes |W_{R_1}(x)|. So, the procedure calculates |W_{R_1}(x)| and takes time p(|x|) + q(|f(x)|), which is polynomial in |x|. We conclude that COUNT(R_1) has a polynomial-time algorithm. The proof for the case of an FPRAS is completely analogous.

Finally, suppose there exists a polynomial-time randomized algorithm G for GEN(R_2), and let q be the polynomial that characterizes its complexity. Consider the following procedure for GEN(R_1) on input x. First, we construct f(x) in time p(|x|). Next, we run G on f(x), which outputs a witness from W_{R_2}(f(x)), that is, a witness from W_{R_1}(x), uniformly at random. So, the procedure generates an element from W_{R_1}(x) uniformly at random and takes time p(|x|) + q(|f(x)|), which is polynomial in |x|. We conclude that GEN(R_1) has a polynomial-time randomized algorithm. The proof for the case of a PLVUG is completely analogous.
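The three constructions above are simply the composition of the reduction with the given algorithm. A minimal Python sketch of the enumeration case (the names below are illustrative, not from the paper) is:

def compose_with_reduction(f, enum_R2):
    """Given a reduction f with W_R1(x) = W_R2(f(x)) and an enumeration procedure
    enum_R2 for R2, return an enumeration procedure for R1. The delay of enum_R2
    is preserved; only the precomputation grows by the time needed to compute f."""
    def enum_R1(x):
        yield from enum_R2(f(x))   # witnesses of R1 on x are exactly those of R2 on f(x)
    return enum_R1

# Toy usage: R2 enumerates, on input n, the binary strings of length n,
# and the reduction maps an input string to its length.
from itertools import product

def enum_R2(n):
    for bits in product('01', repeat=n):
        yield ''.join(bits)

enum_R1 = compose_with_reduction(len, enum_R2)
print(list(enum_R1('abc')))   # the 8 binary strings of length 3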

Therefore, by finding a complete relation R for a class C under the notion of reduction just defined, we can study the aforementioned problems for R, knowing that the obtained results will extend to every relation in the class C. In what follows, we identify complete problems for the classes RelationNL and RelationUL, and use them first to establish the good algorithmic properties of RelationUL. Moreover, we prove that the identified problems are self-reducible, which will be useful for establishing some of the results of this section, as well as some of the results proved in Section 6 for the class RelationNL.

5.1 Complete problems for RelationNL and RelationUL

The notion of reduction just defined is useful for us as RelationNL and RelationUL admit complete problems under this notion. These complete relations are defined in terms of NFAs, and the idea behind them is the following. Take a relation R in RelationNL (the case for RelationUL is very similar). We know there is an NL-transducer M that characterizes it. Consider now some input x. Since M is a non-deterministic logspace Turing Machine, there is only a polynomial number of different configurations that M can be in (polynomial in |x|). So we can consider the set of possible configurations as the states of an NFA A, which has polynomial size, and whose transitions are determined by the transitions between the configurations of M. Moreover, whenever a symbol is output by the transducer M, that symbol is read by the automaton A. In this way, A accepts exactly the language W_R(x). We formalize this idea in the following result, where

MEM-UFA = {((A, 0^n), w) : A is an unambiguous NFA and w is a word of length n accepted by A},

and an NFA is said to be unambiguous if there exists exactly one accepting run for every string accepted by it.

Proposition 12.

MEM-NFA is complete for RelationNL and MEM-UFA is complete for RelationUL.

We will prove the result only for the case of RelationUL and MEM-UFA, as the other case is completely analogous. The following lemma is the key ingredient in our argument. The proof of this lemma is given in Appendix A.1.

Lemma 13.

Let R be a relation in RelationUL defined over an alphabet Σ. Then there exists a polynomial-time algorithm that, given x ∈ Σ*, produces an unambiguous NFA A_x over Σ such that L(A_x) = W_R(x).

Proof of Proposition 12.

Let R be a relation in RelationUL and x be a string in Σ*. We know by Lemma 13 that we can construct in polynomial time an unambiguous NFA A_x such that L(A_x) = W_R(x). Now, since R is a p-relation, there exists a polynomial q such that |y| = q(|x|) for all (x, y) ∈ R. Thus, given that L(A_x) = W_R(x), we have that all words accepted by A_x have the same length n = q(|x|). We conclude that W_R(x) = W_MEM-UFA((A_x, 0^n)). Since this works for every R ∈ RelationUL and every input x, by definition of completeness we deduce that MEM-UFA is complete for RelationUL. ∎
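Completeness of MEM-UFA also explains the polynomial-time counting claimed in Theorem 5: in an unambiguous NFA, every accepted word has exactly one accepting run, so counting accepting runs of a given length (a simple dynamic program) counts accepted words. The following Python sketch, with a hypothetical dictionary encoding of the automaton, illustrates this standard argument; it is not the paper's formal proof.

def count_ufa_words(states, delta, start, final, alphabet, n):
    """Count the words of length n accepted by an unambiguous NFA, where delta maps
    (state, symbol) to a set of successor states. Unambiguity guarantees that every
    accepted word has exactly one accepting run, so counting runs counts words."""
    runs = {q: 0 for q in states}        # runs[q] = runs of the current length ending in q
    runs[start] = 1
    for _ in range(n):
        next_runs = {q: 0 for q in states}
        for p in states:
            if runs[p]:
                for a in alphabet:
                    for q in delta.get((p, a), ()):
                        next_runs[q] += runs[p]
        runs = next_runs
    return runs[final]

# Unambiguous NFA over {a, b} accepting the words whose last letter is a:
# the only non-determinism is guessing the last position, used exactly once per word.
delta = {(0, 'a'): {0, 1}, (0, 'b'): {0}}
print(count_ufa_words({0, 1}, delta, 0, 1, 'ab', 3))   # aaa, aba, baa, bba -> 4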

5.2 MEM-NFA and MEM-UFA are self-reducible

We focus on the case of MEM-NFA (the argument extends easily to MEM-UFA). To show this result, we need to include a little more detail in our definition of MEM-NFA, to consider some corner cases. First of all, we have to consider the case where the string in unary is empty, that is, the case of an input (A, 0^n) with n = 0. This just amounts to the following: if the starting state is a final state, we consider that the automaton does accept the empty string. So, if n = 0, and A is an NFA that has all the properties stated in the definition of MEM-NFA, plus its starting state is the accepting state, then the empty string is a witness for (A, 0^0). Also, we need to consider the cases where A does not have all the properties stated in the definition of MEM-NFA (for example, when it has more than one final state). In those cases, we consider that the input (A, 0^n), for any n, does not have any witnesses. Also, and this gets more technical, we consider that any input that has an invalid encoding does not have any witnesses either. We will not be completely precise about which encoding should be used (although during the proof we will mention some important points regarding it). But we will ask that the correctness of the encoding can be checked in polynomial time (this is a mild requirement, as any reasonable encoding will allow for it). And it is important to have in mind that for some technical concepts like self-reducibility, the encoding of the problem is critical.

We use the notion of self-reducibility stated in [Sch09], because we want to utilize a result from that article which is proved under that specific notion of self-reducibility. We include the definition here, adapted to our situation, since [Sch09] uses a slightly different framework to define an enumeration problem. We say a relation R is self-reducible if there exist polynomial-time computable functions (among them a length function ℓ and a function g used below) such that for every x ∈ Σ*:

  1. if , then ,

  2. if , it can be tested in polynomial time in , whether the empty string is a witness for .

  3. ,

  4. if and only if ,

  5. ,

  6. , and

  7. .

The last condition can be equivalently stated in the following way, which is how we will use it:

  1. if , it holds that .

As we already stated, the empty string is a witness only when the input is correctly encoded and the initial and final states of the automaton coincide. So condition (2) from the previous definition is satisfied regardless of how the remaining functions are defined. From now on, we will focus on the other six conditions. Let x = (A, 0^n). Following the previous notation, we define the functions ℓ and g (together with the auxiliary function) that characterize self-reducibility. The only interesting cases, of course, are those where the automaton A in the input has all the properties stated in the definition of MEM-NFA (and the input is correctly encoded). In all others, the input is not correct, so the witness set is empty, and we do not need to worry about self-reducibility. That said, we define

Both functions are clearly computable in polynomial time. The definition of ℓ is just saying that, on input (A, 0^n), any witness will have length n, which comes directly from the definition of MEM-NFA. The definition of the second function indicates that, for any input, as long as its witnesses have positive length, we can create another input that has the same witnesses but with the first character removed. Notice that with these definitions, conditions (3) and (4) for self-reducibility are trivially met. Condition (1) is also met, which is easy to see from the definitions of MEM-NFA and ℓ. The only task left is to define g and prove conditions (5), (6) and (8). We now proceed in that direction.

Let A be an automaton that has all the properties stated in the definition of MEM-NFA. Notice we are making the assumption that A has a unique final state, since it makes the idea clearer and the proof only has to be modified slightly for the general case. We will mention some points about the exact encoding soon (which is key for condition (5) to hold). But first, consider an input x which is incorrectly encoded or whose automaton does not have the required properties. Then, it has no witnesses and it is enough to set g(x, w) = x for all w (which is clearly computable in polynomial time). In that case, notice that condition (5) is trivially true. Also, notice that since the automaton does not have the required properties (or is encoded in an incorrect format), we have W_MEM-NFA(x) = ∅, and condition (6) follows for any w. And given that g(x, w) = x, condition (8) amounts to checking that for every y, it holds that wy ∈ W_MEM-NFA(x) if and only if y ∈ W_MEM-NFA(x), which is obviously true, as both sides are false. Now, consider the case of an input x that is correctly encoded and whose automaton A has the required properties. There are two main cases to consider.

First, the case where n = 0. This case is also simple, because we can set g(x, w) = x for all w (which is computable in polynomial time and means that condition (5) is trivially true), and since ℓ(x) = 0, it is possible to prove that conditions (6) and (8) hold as before. Second, we need to consider the case where n > 0. Then we have ℓ(x) = n > 0, so g only needs to be defined when w is a single symbol. Then, for every a ∈ Σ, we set g(x, a) = (A_a, 0^{n−1}), where A_a is defined as follows. Let S_a be the set of all states that can be reached (with one transition) from the initial state of A by reading the symbol a. Now, we define A_a, where q_new is a new state not contained in the states of A, and:

Notice that this construction takes only polynomial time. What we are doing, basically, is the following. Imagine S_a as a first "layer" of states reachable from the initial state in one step. We want to merge all of S_a into a single new initial state q_new, while ensuring that from q_new we can reach the same states as were previously reachable from S_a. The definitions are a little complicated because we have to account for some special cases. For example, we would maybe want to remove the old initial state (since now we have a new initial state), but there is the possibility that it is part of the accepting runs of some strings, and not only as an initial state. The same goes for the states in S_a, and that is why we have many different cases to consider in the definition of A_a. We have to make sure not to lose any accepting runs with the removal of such states.
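The following Python sketch shows a simplified version of this construction: it adds the new initial state without removing any state and without the corner-case analysis discussed above (so, by itself, it does not give the size bound needed for condition (5)), but it produces an automaton accepting exactly the words y such that a·y is accepted by the original automaton.

def residual_nfa(states, delta, start, final, symbol):
    """Build an NFA for the language { y : symbol . y is accepted by the NFA
    (states, delta, start, final) }, where delta maps (state, letter) to a set of
    successor states and state names are integers. Simplified version: a fresh
    initial state q_new simulates, in one step, every state of the first layer
    reachable from start by reading `symbol`; no state is removed, so the
    size-accounting corner cases handled in the paper are ignored here."""
    layer = delta.get((start, symbol), set())
    q_new = max(states) + 1
    new_states = states | {q_new}
    new_delta = {key: set(targets) for key, targets in delta.items()}
    for (p, letter), targets in delta.items():
        if p in layer:
            new_delta.setdefault((q_new, letter), set()).update(targets)
    finals = {final} | ({q_new} if final in layer else set())
    return new_states, new_delta, q_new, finals

# NFA accepting the words that start with a and end with b; its residual by 'a'
# accepts exactly the non-empty words that end with b.
delta = {(0, 'a'): {1}, (1, 'a'): {1}, (1, 'b'): {1, 2}}
print(residual_nfa({0, 1, 2}, delta, 0, 2, 'a'))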

Notice something about A_a. To construct its set of states, we are removing at least one state from the states of A, but we are adding at most one new state, q_new. That means that A_a has at most as many states as A. Similarly for the construction of its transition relation: each transition we add to construct A_a (besides the ones that come directly from A) corresponds to a transition that already existed and involved at least one state from S_a. So, all in all, we have not really added any new transitions, just simulated the ones where states in S_a appeared, which means that A_a has at most as many transitions as A. So, as a whole, A_a contains at most as many states and transitions as A, and maybe fewer. Does that mean that the encoding of A_a is at most as long as the encoding of A? It will depend on the type of encoding used, of course. So we will consider that the NFA in the input is encoded in the following (natural) way. First, a list of all states, followed by the list of all tuples in the transition relation, and at the end the initial and final states. Also, we assume that all states have an encoding of the same size (which is easy to achieve through padding). And the same goes for all transitions. With that encoding, since A_a has fewer (or equally many) states and transitions than A, it is clear that the encoding of A_a is at most as long as that of A. Of course, it is also true that |0^{n−1}| ≤ |0^n|. We can then conclude that |g(x, a)| ≤ |x|, that is, condition (5) is satisfied. We also have by definition that ℓ(g(x, a)) = n − 1 and ℓ(x) = n. Since n > 0, condition (6) is also true.

Finally, we turn to condition (8). Let a ∈ Σ and y ∈ Σ*. Since n > 0, condition (8) amounts to checking that

ay ∈ W_MEM-NFA(x) if and only if y ∈ W_MEM-NFA(g(x, a)),

where g(x, a) is constructed by considering a single symbol a, that is, g(x, a) = (A_a, 0^{n−1}). Notice that if |y| ≠ n − 1, then both sides of the equivalence above are immediately false (and thus the equivalence is true), so we need only consider the case where |y| = n − 1. We will now prove both directions of the equivalence. First, suppose ay ∈ W_MEM-NFA(x), that is, the word ay is accepted by A. Then, by definition, we know there is an accepting run of A on input ay such that

where , and for all . Now, we will show that , that is, is accepted by . To do that, we first show by induction the following property: for all there is a valid run of on input (although the run is not necessarily accepting) that looks like this:

where and for all , we have that and

To prove this fact by induction, consider first the case of . By definition, we know that and . There are now two different possibilities. First, if , then by definition of , we know that . Second, if , then by definition of , we know that