Haydi: Rapid Prototyping and Combinatorial Objects

09/27/2019
by   Stanislav Böhm, et al.

Haydi (http://haydi.readthedocs.io) is a framework for generating discrete structures. It provides a way to define a structure from basic building blocks and then enumerate all elements, all non-isomorphic elements, or generate random elements in the structure. Haydi is designed as a tool for rapid prototyping. It is implemented as a pure Python package and supports execution in distributed environments. The goal of this paper is to give the overall picture of Haydi together with a formal definition for the case of generating canonical forms.


1 Introduction

The concept of rapid prototyping helps in verifying the feasibility of an initial idea and in rejecting bad ideas quickly. In the mathematical world, there are tools like Matlab, SageMath, or R that allow one to build a working prototype and evaluate an idea quickly. This paper focuses on the field of combinatorial objects and provides a prototyping tool that allows one to check claims on small instances by searching over relevant objects. haydi (Haystack Diver) is an open-source Python package that provides an easy way of describing such structures by composing basic building blocks (e.g. Cartesian product, mappings) and then enumerating all elements, all non-isomorphic elements, or generating random elements.

The main design goal is to build a tool that is simple to use, since building prototypes has to be cheap and fast. The second goal is flexibility: the tool should describe a wide variety of structures and impose few limitations on the user. Reasonable performance of the solution is also important, but it has a lower priority than the first two goals.

To fulfill these goals, haydi has been built as a Python package. Python is a well-known programming language and is commonly used as a prototyping language that provides a high degree of flexibility. Since haydi is written purely in Python, it is compatible with PyPy (https://pypy.org/), a fast Python implementation with a JIT compiler. Moreover, haydi is designed to transparently utilize a cluster of computers to provide better performance without sacrificing simplicity or flexibility. The distributed execution is built over Dask/distributed (https://github.com/dask/distributed) and it was tested on the Salomon cluster (https://docs.it4i.cz/salomon/introduction/).

The goal of this paper is to give the overall picture of haydi together with a formal definition for the case of generating canonical forms. A more detailed and programmer-oriented text can be found in the user guide (http://haydi.readthedocs.io/). haydi is released as an open source project at https://github.com/spirali/haydi under the MIT license.

The original motivation for the tool was to investigate hard instances for equivalence of deterministic push-down automata (DPDAs). We have released a data set containing non-equivalent normed DPDAs [ndpda].

The paper starts with two motivating examples in Section 2, followed by a survey of related work in Section 3. Section 4 introduces the architecture of haydi. Section 5 presents a theoretical framework for generating canonical forms. Section 6 covers the basic usage of distributed computations and the optimizations used. The last section shows performance measurements.

2 Examples

To give an impression of how haydi works, basic usage of haydi is demonstrated on two examples. The first one is a generator for directed graphs and the second one is a generator of finite state automata for the reset word problem.

2.1 Example: Directed graphs

In this example, our goal is to generate directed graphs with $n$ nodes. Our first task is to describe the structure itself: we represent a graph as a set of edges, where an edge is a pair of two (possibly the same) nodes. To keep the outputs simple, we are going to generate graphs on two nodes. However, this can be changed by editing a single constant, namely the number 2 on the second line of the following code:

    >>> import haydi as hd
    >>> nodes = hd.USet(2, "n")  # A two-element set with elements n0, n1
    >>> graphs = hd.Subsets(nodes * nodes)  # Subsets of a Cartesian product

The first line just imports the haydi package. The second one creates a set of nodes, namely a set of two “unlabeled” elements. The first argument is the number of elements, the second one is the prefix of each element name. The exact meaning of USet will be discussed further in the paper. For now, it just creates a set of elements without any additional quality; the elements of this set can be freely relabeled. In this example, it provides us with the standard graph isomorphism. The third line constructs a collection of all graphs on two nodes; in mathematical notation it could be written as “$\mathcal{P}(nodes \times nodes)$”.

With this definition, we can now iterate all graphs:

    >>> list(graphs.iterate())
    [{}, {(n0, n0)}, {(n0, n0), (n0, n1)}, {(n0, n0), (n0, n1), (n1, n0)},
     # ... 3 lines removed ...
     n1), (n1, n0)}, {(n1, n0), (n1, n1)}, {(n1, n1)}]

or iterate in a way in which we can see only one graph per isomorphic class:

    >>> list(graphs.cnfs())  # cnfs = canonical forms
    [{}, {(n0, n0)}, {(n0, n0), (n1, n1)}, {(n0, n0), (n0, n1)},
     {(n0, n0), (n0, n1), (n1, n1)}, {(n0, n0), (n0, n1), (n1, n0)},
     {(n0, n0), (n0, n1), (n1, n0), (n1, n1)}, {(n0, n0), (n1, n0)},
     {(n0, n1)}, {(n0, n1), (n1, n0)}]

or generate random instances (3 instances in this case):

    >>> list(graphs.generate(3))
    [(n1, n0), (n1, n1), (n0, n0), (n0, n1), (n1, n0)]

haydi supports standard operations such as map, filter, and reduce. The following example shows how to define graphs without loops, i.e. graphs such that $a \neq b$ holds for every edge $(a, b)$:

    >>> no_loops = graphs.filter(lambda g: all(a != b for (a, b) in g.to_set()))
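The counts behind this example can be cross-checked without haydi. The following is only a brute-force sketch in plain Python, not haydi's implementation: the helper names (`all_graphs`, `relabel`, `canonical`) and the choice of the sorted edge list as ordering are assumptions made for illustration.

```python
from itertools import combinations, permutations, product

def all_graphs(n):
    """Yield every directed graph on nodes 0..n-1 as a frozenset of edges."""
    edges = list(product(range(n), repeat=2))  # loops are allowed
    for k in range(len(edges) + 1):
        for chosen in combinations(edges, k):
            yield frozenset(chosen)

def relabel(graph, perm):
    """Rename node i to perm[i] in every edge."""
    return frozenset((perm[a], perm[b]) for (a, b) in graph)

def canonical(graph, n):
    """Smallest relabelling of the graph, comparing sorted edge lists."""
    return min(tuple(sorted(relabel(graph, p))) for p in permutations(range(n)))

graphs = list(all_graphs(2))
classes = {canonical(g, 2) for g in graphs}
no_loops = [g for g in graphs if all(a != b for (a, b) in g)]

print(len(graphs), len(classes), len(no_loops))  # 16 10 4
```

On two nodes there are 16 graphs, 10 isomorphism classes (matching the ten canonical forms listed above), and 4 loop-free graphs. This brute force visits every graph and every permutation, which is exactly the cost that the algorithm in Section 5.2 tries to avoid.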

All these constructions can be transparently evaluated as a pipeline distributed across a cluster. haydi uses Dask/distributed for distributing tasks; the following code assumes that a dask/distributed server runs at hostname:1234:

    # Initialization
    >>> from haydi import DistributedContext
    >>> context = DistributedContext("hostname", 1234)

    # Run a pipeline
    >>> graphs.iterate().run(ctx=context)

2.2 Example: Reset words

A reset word is a word that sends all states of a given finite automaton to a unique state. The following example generates automata and computes the length of a minimal reset word. It can be used for verifying the Černý conjecture on bounded instances. The conjecture states that the length of a minimal reset word is bounded by $(n-1)^2$, where $n$ is the number of states of the automaton [cerny1964, Volkov2008].

First, we describe deterministic automata by their transition functions (a mapping from a pair of state and symbol to a new state). In the following code, n_states is the number of states and n_symbols is the size of the alphabet. We use USet even for the alphabet, since we do not care about the meaning of particular symbols, we just need to distinguish them.

    # set of states q0, q1, ..., q_{n_states-1}
    >>> states = hd.USet(n_states, "q")
    # set of symbols a0, ..., a_{n_symbols-1}
    >>> alphabet = hd.USet(n_symbols, "a")

    # All mappings (states * alphabet) -> states
    >>> delta = hd.Mappings(states * alphabet, states)

Now we can create a pipeline that goes through all the automata of the given size (one per isomorphism class) and finds the maximal length among minimal reset words:

    >>> pipeline = delta.cnfs().map(check_automaton).max(size=1)
    >>> result = pipeline.run()

    >>> print("The maximal length of a minimal reset word for an "
    ...       "automaton with {} states and {} symbols is {}."
    ...       .format(n_states, n_symbols, result[0]))

The function check_automaton takes an automaton (as a transition function) and returns the length of the minimal reset word, or 0 when there is no such word. It is just a simple breadth-first search on sets of states. The function is listed in Appendix 0.A.
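Since the appendix listing is not reproduced here, the breadth-first search on sets of states can be sketched as follows. This is a stand-alone approximation, not the authors' listing: it takes the transition function as a plain dict and receives the states and the alphabet as extra arguments, both of which are assumptions of the sketch.

```python
from collections import deque

def check_automaton(delta, states, alphabet):
    """Length of a shortest reset word, or 0 if the automaton has none.

    delta maps (state, symbol) -> state. Passing states and alphabet
    explicitly is an assumption of this sketch.
    """
    start = frozenset(states)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        subset, length = queue.popleft()
        if len(subset) == 1:          # all states were merged into one
            return length
        for symbol in alphabet:
            image = frozenset(delta[(q, symbol)] for q in subset)
            if image not in seen:
                seen.add(image)
                queue.append((image, length + 1))
    return 0

# Cerny automaton with 3 states: "a" is a cyclic shift, "b" merges 0 into 1.
cerny3 = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 0,
          (0, "b"): 1, (1, "b"): 1, (2, "b"): 2}
print(check_automaton(cerny3, [0, 1, 2], "ab"))  # 4
```

The result 4 equals $(3-1)^2$, so this automaton attains the conjectured bound. Because the search is breadth-first, the first singleton subset that is reached gives the minimal length.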

3 Related works

Many complex software frameworks are designed for rapidly checking mathematical ideas, for example Maple, Matlab, and SageMath. Most of them also contain a package for combinatorial structures, e.g. Combinatorics in SageMath (http://doc.sagemath.org/html/en/reference/combinat/sage/combinat/tutorial.html) and combstruct in Maple (https://www.maplesoft.com/support/help/Maple/view.aspx?path=combstruct).

From the perspective of the mentioned tools, haydi is a small single-purpose package. But as far as we know, there is no other tool that allows building structures by composition, searching only one structure of each isomorphism class, and offering simple execution in a distributed environment.

Tools focused on the generation of specific structures are on the other side of the spectrum. One example is Nauty [McKayGeng], which contains Geng for generating graphs; other examples are the generators for parity games in PGSolver [PGSolver] and the automata generator for SageMath [ASM]. These tools provide highly optimized generators for a given structure.

4 Architecture

haydi is a Python package for rapid prototyping of generators for discrete structures. The main two components are domains and pipelines. The former is dedicated to defining structures and the latter executes an operation over domains. In this section, both domains and pipelines are introduced. Parts that are related to generating canonical forms are omitted. This is covered separately in Section 5.

4.1 Domains

The basic structure in haydi is a domain that represents an unordered collection of (Python) objects. On an abstract level, domains can be viewed as countable sets with some implementation details. The basic operations on domains are iterating through their elements and generating a random element. Domains are composable, i.e., more complex domains can be created from simpler ones.

There are six elementary domains: Range (a range of integers), Values (a domain of explicitly listed Python objects), Boolean (a domain containing True and False), and NoneDomain (a domain containing only one element: None). Examples are shown in Figure 1. There are also the domains USet and CnfsValues; their description is postponed to Section 5, since it is necessary to develop a theory to explain their purpose.

    >>> import haydi as hd

    >>> hd.Range(4)  # Domain of four integers
    <Range size=4 {0, 1, 2, 3}>

    >>> hd.Values(["Haystack", "diver"])
    <Values size=2 {'Haystack', 'diver'}>

Figure 1: Examples of elementary domains

New domains can be created by composing existing ones or applying a transformation. There are the following compositions: Cartesian product, sequences, subsets, mappings, and join. Examples are shown in Figure 2, more details can be found in the user guide. There are two transformations map and filter with the standard meaning. Examples are shown in Figure 3.

    >>> import haydi as hd
    >>> a = hd.Range(2)
    >>> b = hd.Values(("a", "b", "c"))

    >>> hd.Product((a, b))  # Cartesian product
    <Product size=6 {(0, 'a'), (0, 'b'), (0, 'c'), (1, 'a'), ...}>

    >>> a * b  # Same as above
    <Product size=6 {(0, 'a'), (0, 'b'), (0, 'c'), (1, 'a'), ...}>

    >>> hd.Subsets(a)  # Subsets of 'a'
    <Subsets size=4 {{}, {0}, {0, 1}, {1}}>

    >>> hd.Mappings(a, a)  # Mappings from 'a' to 'a'
    <Mappings size=4 {{0: 0; 1: 0}, {0: 0; 1: 1}, {0: 1; 1: 0}, ...}>

    >>> hd.Sequences(a, 3)  # Sequences of length 3 over 'a'
    <Sequences size=8 {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), ...}>

    >>> hd.Join((a, b))  # Join 'a' and 'b', can also be written as 'a + b'
    <Join size=5 {0, 1, 'a', 'b', 'c'}>

Figure 2: Examples of domain compositions
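For readers without haydi at hand, the sizes reported in Figure 2 can be reproduced with the standard library. The sketch below only mirrors the meaning of the compositions using `itertools`; it does not use or imitate haydi's classes, and the variable names are chosen for illustration.

```python
from itertools import chain, combinations, product

a = [0, 1]
b = ["a", "b", "c"]

cartesian = list(product(a, b))                        # a * b
subsets = [set(c) for k in range(len(a) + 1)
           for c in combinations(a, k)]                # Subsets(a)
mappings = [dict(zip(a, image))
            for image in product(a, repeat=len(a))]    # Mappings(a, a)
sequences = list(product(a, repeat=3))                 # Sequences(a, 3)
joined = list(chain(a, b))                             # a + b

print([len(x) for x in (cartesian, subsets, mappings, sequences, joined)])
# [6, 4, 4, 8, 5]
```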

    >>> a = hd.Range(5)

    >>> a.map(lambda x: x * 10)
    <MapTransformation size=5 {0, 10, 20, 30, ...}>

Figure 3: Examples of domain transformations

4.2 Pipeline

Domains in the previous section describe a set of elements. Pipelines provide a way to work with the elements in these sets. Generally, a pipeline provides methods for generating and iterating elements and optionally applying simple “map & reduce” transformations.

The pipeline creates a stream of elements from a domain by one of three methods. We can apply transformations to elements in the stream. The pipeline ends with a reducing action. The schema is shown in Figure 4. The pipeline consists of:

Figure 4: The pipeline schema

Method. It specifies how to take elements from the domain into the stream. Haydi provides three options: iterate(), generate(n), and cnfs(). The method iterate() iterates over all elements of a given domain, generate(n) creates n random elements of the domain (by default with the uniform distribution over all elements), and cnfs() iterates over canonical forms (Section 5).

Transformations. A transformation modifies or filters elements in a stream. There are three pipeline transformations: map(fn) applies the function fn to each element that goes through the pipeline, filter(fn) filters elements in the pipeline according to the provided function, and take(count) takes only the first count elements from the stream. The reason why transformations on domains and in pipelines are distinguished is described at https://haydi.readthedocs.io/en/latest/pipeline.html#transformations.

Actions. An action is a final operation on a stream of elements. For example: collect() creates a list from the stream, reduce(fn) applies a binary operation to the elements of the stream, and max() takes the maximal elements in the stream.

run(). The previous operations declare the pipeline, which is an immutable representation of a computational graph. The run() method actually executes the pipeline. The optional ctx (context) parameter specifies how the computation should be performed (serially or in a distributed way on a cluster).

Examples of pipelines are shown in Figure 5. Not all parts of a pipeline have to be specified; if some of them are missing, defaults are used: the default method is iterate() and the default action is collect().

    >>> domain = hd.Range(5) * hd.Range(3)

    # Iterate all elements and collect them
    >>> domain.iterate().collect().run()
    [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1),
     (2, 2), (3, 0), (3, 1), (3, 2), (4, 0), (4, 1), (4, 2)]

    # The same as above, since iterate() and collect() are the defaults
    >>> domain.run()
    [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1),
     (2, 2), (3, 0), (3, 1), (3, 2), (4, 0), (4, 1), (4, 2)]

    # Generate three elements
    >>> domain.generate(3).run()
    [(3, 2), (4, 0), (1, 2)]

    # Take elements that are maximal in the first component
    >>> domain.max(lambda x: x[0]).run()
    [(4, 0), (4, 1), (4, 2)]

Figure 5: Examples of pipelines
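The method, transformation, and action stages can be imitated with ordinary Python generators. The following is a rough sketch of the data flow only; the helper names (`iterate`, `generate`, `max_all`) are hypothetical stand-ins, and the sketch ignores haydi's declarative, immutable pipeline objects and its run() step.

```python
from itertools import islice, product
import random

def iterate(domain):
    """Method: stream every element of the domain."""
    return iter(domain)

def generate(domain, n, rng=random):
    """Method: stream n elements drawn uniformly at random."""
    return (rng.choice(domain) for _ in range(n))

def max_all(stream, key):
    """Action: collect all elements whose key is maximal."""
    best, best_key = [], None
    for item in stream:
        k = key(item)
        if best_key is None or k > best_key:
            best, best_key = [item], k
        elif k == best_key:
            best.append(item)
    return best

domain = list(product(range(5), range(3)))

# method -> transformations -> action
evens = filter(lambda x: x[1] % 2 == 0, iterate(domain))   # filter(fn)
swapped = map(lambda x: (x[1], x[0]), evens)                # map(fn)
first_four = list(islice(swapped, 4))                       # take(4) + collect()
maximal = max_all(iterate(domain), key=lambda x: x[0])      # max(...)
sample = list(generate(domain, 3))                          # generate(3)

print(first_four)  # [(0, 0), (2, 0), (0, 1), (2, 1)]
print(maximal)     # [(4, 0), (4, 1), (4, 2)]
```

Every stage is lazy until the final action consumes the stream, which is one reason a pipeline can be cut into independent jobs.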

5 Generating canonical forms

In many cases, when we want to verify a property of a discrete structure, we are not interested in the names of the elements in the structure. For example, in the case of graphs we usually want to see only one graph for each isomorphic class. Another example is finite-state automata; in many cases we are not especially interested in the names of states and the actual symbols of the input alphabet. For example, in the Černý conjecture, the minimal length of reset words does not change when the alphabet is permuted. On the other hand, symbols in some other problems may have special meanings and we cannot freely interchange them.

haydi introduces haydi.USet as a simple but expressive mechanism for describing which permutations we are interested in. It serves to define partitions of atomic objects. Each partition creates a set of atomic objects that can be freely interchanged with one another; we call these partitions unlabeled sets. They determine which structure has to be preserved when isomorphisms on various structures are defined.

haydi allows iterating over a domain in a way where we see only one element for each isomorphic class. It is implemented as an iteration through canonical forms (CNFS).

In this section, a simple theoretical framework is built. It gives a formal background to this feature. Paragraphs starting with “Abstract:” are meant as part of a theoretical description. Paragraphs starting with “Haydi:” describe the implementation of the framework in haydi.

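Abstract: Let $\mathcal{A}$ be a set of all atomic objects whose structure is not investigated any further. The set of objects $\mathcal{O}$ is the minimal set with the following properties:

  • $\mathcal{A} \subseteq \mathcal{O}$ (atoms)

  • If $o_i \in \mathcal{O}$ for $i \in \{1, \dots, n\}$ then $\{o_1, \dots, o_n\} \in \mathcal{O}$ (finite sets)

  • If $o_i \in \mathcal{O}$ for $i \in \{1, \dots, n\}$ then $(o_1, \dots, o_n) \in \mathcal{O}$ (finite sequences)

It is assumed that the type of each object (atom, sequence, or set) can always be determined. Therefore, it is assumed that sequences and sets are not contained in atoms.

The function $atoms: \mathcal{O} \to 2^{\mathcal{A}}$ that returns the atoms contained in an object is defined as follows:

  • $atoms(o) = \{o\}$ if $o \in \mathcal{A}$

  • $atoms(o) = \bigcup_{i=1}^{n} atoms(o_i)$ if $o = \{o_1, \dots, o_n\}$ or $o = (o_1, \dots, o_n)$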
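The recursive definition of the atoms of an object translates directly into code. The sketch below is an illustration with plain Python values rather than haydi's types: tuples stand for sequences, frozensets for sets, and anything else is treated as an atom, which is an assumption of the sketch.

```python
def atoms(obj):
    """Return the set of atoms occurring in an object.

    Tuples model finite sequences, frozensets model finite sets, and every
    other value is treated as an atom (an assumption of this sketch).
    """
    if isinstance(obj, (tuple, frozenset)):
        result = set()
        for part in obj:
            result |= atoms(part)
        return result
    return {obj}

example = (1, frozenset({"x", (2, "y")}), ())
print(sorted(atoms(example), key=str))  # [1, 2, 'x', 'y']
```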

haydi: The Python instantiation of the definitions is the following: $\mathcal{A}$ contains None, True, False, all instances of the types int (integers) and str (strings), and instances of the class haydi.Atom. Except for the last one, they are Python built-in objects; the last one is related to unlabeled sets and will be explained later. Sequences in $\mathcal{O}$ are identified with Python tuples, sets with haydi.Set (analogous to the standard set). haydi also contains the type haydi.Map (analogous to dict) for mappings. In the theoretical framework, mappings are not explicitly distinguished, as they can be considered sets of pairs. For sets and maps, standard Python objects are not directly used for performance reasons (the built-in classes set and dict are optimized for lookups; however, haydi needs fast comparison methods, as will be seen later, and haydi.Set and haydi.Map are stored in a sorted state to enable this); however, both haydi.Set and haydi.Map can be directly transformed into their standard Python counterparts.

Note: Generally, domains in haydi may contain any Python object; however, domains that support iterating over CNFS impose some restrictions that will be shown later. Since the theoretical framework is built just for CNFS, its formalization in Python is mapped in a way that respects these limitations from the beginning. For this reason, $\mathcal{O}$ is not identified with all Python objects. The Python incarnation of $\mathcal{O}$ is called basic objects.

Now we establish an isomorphism between objects. We define two objects to be isomorphic if one can be obtained from the other by permuting its atoms. To control permutations, a partitioning of atoms is introduced, and an atom may be permuted only within its “own” class. These classes are defined through $uset$, which is an abbreviation of “unlabeled set”.

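Abstract: Let us fix a function $uset: \mathcal{A} \to 2^{\mathcal{A}}$ in the following way: for all $a, b \in \mathcal{A}$ it holds that $a \in uset(a)$, and $b \in uset(a)$ implies $uset(a) = uset(b)$.

Obviously $uset$ partitions $\mathcal{A}$ into disjoint classes.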

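Let $P$ be a set of all bijective functions from $\mathcal{A}$ to $\mathcal{A}$ such that for each $\pi \in P$ holds $\forall a \in \mathcal{A}: uset(\pi(a)) = uset(a)$.

Applying $\pi \in P$ to an object $o$ (written as $o^\pi$) is defined as follows:

  • $o^\pi = \pi(o)$ if $o \in \mathcal{A}$

  • $o^\pi = \{o_1^\pi, \dots, o_n^\pi\}$ if $o = \{o_1, \dots, o_n\}$

  • $o^\pi = (o_1^\pi, \dots, o_n^\pi)$ if $o = (o_1, \dots, o_n)$

Let $o_1, o_2 \in \mathcal{O}$; then $o_1$ and $o_2$ are isomorphic (written as $o_1 \equiv o_2$) if there exists $\pi \in P$ such that $o_1 = o_2^\pi$.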

haydi: All integers, strings, None, True, and False have a singleton unlabeled set, i.e., $uset(a) = \{a\}$. Therefore, all objects that contain only these atoms always form their own “private” isomorphic class. For example: ("abc", 1) cannot be isomorphic to anything else since the string "abc" and the integer 1 cannot be replaced, because for each $\pi \in P$ holds $\pi(\text{"abc"}) = \text{"abc"}$ and $\pi(1) = 1$.

The only way to create a non-singleton unlabeled set is to use the domain haydi.USet (unlabeled set) that creates a set of atoms belonging to the same unlabeled set; in other words, if $X$ is created by haydi.USet then for each $a \in X$ holds $uset(a) = X$.

    >>> a = hd.USet(3, "a")
    >>> list(a)
    [a0, a1, a2]

The first argument is the size of the set, and the second one is the name of the set that has only informative character. The name is also used as the prefix of element names, again without any semantical meaning. Elements of USet are instances of haydi.Atom that is a wrapper over an integer and a reference to the USet that contains them.

Method haydi.is_isomorphic takes two objects and returns True iff the objects are isomorphic according to our definition. Several examples are shown in Figure 6.

    >>> a0, a1, a2 = hd.USet(3, "a")
    >>> b0, b1 = hd.USet(2, "b")
    >>> hd.is_isomorphic(a0, a2)
    True
    >>> hd.is_isomorphic(b1, b0)
    True
    >>> hd.is_isomorphic(a0, b0)
    False
    >>> hd.is_isomorphic((a0, b0), (a2, b1))
    True
    >>> hd.is_isomorphic((a0, a0), (a0, a2))
    False

Figure 6: Isomorphism examples
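The definition of isomorphism can be made concrete with a brute-force check that tries every permitted permutation. The sketch below is not how haydi.is_isomorphic is implemented; the `Atom` dataclass, the helper names, and the explicit `usets` argument are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from itertools import permutations, product

@dataclass(frozen=True)
class Atom:
    uset: str    # name of the unlabeled set the atom belongs to
    index: int

def make_uset(name, size):
    return [Atom(name, i) for i in range(size)]

def apply_perm(perm, obj):
    """Apply a permutation of atoms (given as a dict) to an object."""
    if isinstance(obj, tuple):
        return tuple(apply_perm(perm, o) for o in obj)
    if isinstance(obj, frozenset):
        return frozenset(apply_perm(perm, o) for o in obj)
    return perm.get(obj, obj)   # atoms outside any unlabeled set stay fixed

def allowed_permutations(usets):
    """All bijections that move atoms only inside their own unlabeled set."""
    per_set = [[dict(zip(s, p)) for p in permutations(s)] for s in usets]
    for parts in product(*per_set):
        merged = {}
        for part in parts:
            merged.update(part)
        yield merged

def is_isomorphic(x, y, usets):
    return any(apply_perm(p, x) == y for p in allowed_permutations(usets))

a, b = make_uset("a", 3), make_uset("b", 2)
a0, a1, a2 = a
b0, b1 = b
print(is_isomorphic((a0, b0), (a2, b1), [a, b]))  # True
print(is_isomorphic((a0, a0), (a0, a2), [a, b]))  # False
```

The answers agree with Figure 6. The number of candidate permutations is the product of the factorials of the set sizes, so this check is only practical for small unlabeled sets.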

5.1 Canonical forms

haydi implements an iteration over canonical forms as a way to obtain exactly one element for each isomorphic class. We define a canonical form as the smallest element from the isomorphic class according to a fixed ordering.

Abstract: We fix a binary relation $\le$ for the rest of the section such that $\mathcal{O}$ is well-ordered under $\le$. As usual, we write $o_1 < o_2$ if $o_1 \le o_2$ and $o_1 \neq o_2$. A canonical form of an object $o$ is $\min \{o' \in \mathcal{O} \mid o' \equiv o\}$. We denote by $\mathcal{C}$ the set of all canonical forms.

haydi: Canonical forms can be generated by calling cnfs() on a domain. Figure 7 shows simple examples of generating CNFS. In case 1, we have only two results, (a0, a0) and (a0, a1); the former represents a pair with the same two values and the latter represents a pair of two different values. Obviously, we cannot get one from the other by applying any permutation, and all other elements of the Cartesian product a * a can be obtained by permutations. This fact is independent of the size of a (as long as it has at least two elements). In case 2, the result is two elements, since we cannot permute elements from different usets. The third case shows canonical forms of the power set of a; as we see, there is exactly one canonical form for each size of set.

    >>> a = hd.USet(3, "a")
    >>> b = hd.USet(2, "b")

    >>> list((a * a).cnfs())  # 1
    [(a0, a0), (a0, a1)]

    >>> list((a + b).cnfs())  # 2
    [a0, b0]

    >>> list(hd.Subsets(a).cnfs())  # 3
    [{}, {a0}, {a0, a1}, {a0, a1, a2}]

Figure 7: CNFS examples
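The definition of a canonical form as the minimum of an isomorphism class can be executed literally for small cases. The following sketch handles a single unlabeled set and tries every permutation; `sort_key` is one possible fixed ordering chosen for illustration (atoms before tuples before sets), not haydi.compare, and all names are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations, permutations, product

@dataclass(frozen=True)
class Atom:
    uset: str
    index: int

    def __repr__(self):
        return f"{self.uset}{self.index}"

def apply_perm(perm, obj):
    if isinstance(obj, tuple):
        return tuple(apply_perm(perm, o) for o in obj)
    if isinstance(obj, frozenset):
        return frozenset(apply_perm(perm, o) for o in obj)
    return perm.get(obj, obj)

def sort_key(obj):
    """A fixed total order on the objects used below."""
    if isinstance(obj, tuple):
        return (1, tuple(sort_key(o) for o in obj))
    if isinstance(obj, frozenset):
        return (2, tuple(sorted(sort_key(o) for o in obj)))
    return (0, (obj.uset, obj.index))

def canonical(obj, uset_atoms):
    """Smallest member of the isomorphism class, by brute force."""
    images = (apply_perm(dict(zip(uset_atoms, p)), obj)
              for p in permutations(uset_atoms))
    return min(images, key=sort_key)

def cnfs(domain, uset_atoms):
    forms = {canonical(x, uset_atoms) for x in domain}
    return sorted(forms, key=sort_key)

a = [Atom("a", i) for i in range(3)]
pairs = list(product(a, repeat=2))
subsets = [frozenset(c) for k in range(4) for c in combinations(a, k)]

print(cnfs(pairs, a))         # [(a0, a0), (a0, a1)]
print(len(cnfs(subsets, a)))  # 4
```

The two results correspond to cases 1 and 3 of Figure 7.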

Generating CNFS is limited in haydi to strict domains that have the following features:

  1. A strict domain contains only basic objects (defined at the beginning of this section).

  2. A strict domain is closed under isomorphism.

The first limitation comes from the need for ordering. The standard comparison method __eq__ is not sufficient since it may change between executions. (In Python 2, instances of different types are generally unequal, and they are ordered consistently but arbitrarily. Switching to Python 3 does not help, since comparing incompatible types throws an error, e.g. 3 < (1, 2); hence standard comparison cannot serve as the ordering that we need for basic objects.) To ensure deterministic canonical forms, haydi defines the haydi.compare method. This method is responsible for deterministic comparison of basic objects and provides some additional properties that are explained later. The second condition ensures that canonical forms represent all elements of a domain. Usually these conditions do not present a practical limitation. Elementary domains except for haydi.Values are always strict, and standard compositions preserve strictness. The elementary domain haydi.CnfsValues allows one to define a (strict) domain through canonical elements; hence it serves as a substitute for haydi.Values when a strict domain built from explicitly listed elements is needed.

5.2 The Algorithm

This section describes implementation of the algorithm that generates canonical forms. A naïve approach would be to iterate over all elements and filter out non-canonical ones. haydi avoids the naïve approach and makes the generation of canonical forms more efficient. It constructs new elements from smaller ones in a depth-first search manner. On each level, relevant extensions of the object are explored, and non-canonical ones are pruned. The used approach guarantees that all canonical forms are generated, and each will be generated exactly once, hence the already generated elements do not need to be remembered (except the current branch in a building tree).

This approach has already been used in many applications and has been extracted into an abstract framework (e.g. [McKay98]). The main goal of this section is to show the correctness of the approach used in haydi and not to give an abstract framework for generating canonical elements, since that has been done before. However, the goal is not to generate a specific kind of structure but to provide a framework for describing structures; therefore, a rather abstract approach must still be used.

Let us note that the algorithm is not dealing here with efficiency of deciding whether a given element is in a canonical form. In our use cases, most elements are relatively small, hence all relevant permutations are checked during checking the canonicity of an element. Therefore, the implementation in haydi is quite straightforward. It exploits some direct consequences of Proposition 1 that allow the algorithm to reduce the set of relevant permutations and in some cases immediately claim non-canonicity.

The used approach is based on the following two propositions. The first says that an object cannot be canonical if it contains “gaps” in atoms occurring in the object. The second shows that new elements can only be constructed from existing canonical forms and still all of them are reached.

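At the beginning, let us introduce some properties of the ordering given by haydi.compare which allow the propositions to be established. On the abstract level, the following properties of the ordering $\le$ are assumed, where $o_1, \dots, o_n, o'_1, \dots, o'_n \in \mathcal{O}$:

  • Tuples of the same length are lexicographically ordered.

  • If $X = \{o_1, \dots, o_n\}$ where $o_1 < \dots < o_n$ and $Y = \{o'_1, \dots, o'_n\}$ where $o'_1 < \dots < o'_n$ then $X \le Y$ if $(o_1, \dots, o_n) \le (o'_1, \dots, o'_n)$.

A set $X \subseteq \mathcal{A}$ contains a gap if there exists $a \in X$ such that there is $b \in uset(a)$ and $b \notin X$ where $b < a$.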

Proposition 1

If $o \in \mathcal{O}$ and $atoms(o)$ contains a gap, then $o$ is not a canonical form.

Proposition 1 is a direct consequence of the following claim:

Proposition 2

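If $o \in \mathcal{O}$ and $a, b \in \mathcal{A}$ such that $a \in atoms(o)$, $b \notin atoms(o)$, $uset(a) = uset(b)$, $b < a$, and $\pi$ is a permutation that only swaps $a$ and $b$, then $o^\pi < o$.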

Proof

The proposition is proved by induction on the structure of $o$; let $a$, $b$, and $\pi$ be as in the statement of the proposition. If $o$ is an atom then directly $o = a$ and $o^\pi = b < a$ from the assumptions. Now assume that $o = (o_1, \dots, o_n)$ and the proposition holds for all $o_i$. From the assumptions we get that no $o_i$ contains $b$ and there is a minimal index $j$ such that $o_j$ contains $a$. Hence $o_i^\pi = o_i$ for all $i < j$ and $o_j^\pi < o_j$ by the induction assumption. Since tuples are lexicographically ordered, it follows that $o^\pi < o$. Similar ideas apply also for sets. ∎
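The notion of a gap and the effect of the swapping permutation can be tried out on flat tuples of atoms. In the sketch below, atoms are modelled as (uset name, index) pairs ordered by index within an unlabeled set; this encoding and the helper names are assumptions of the sketch.

```python
def has_gap(atom_set):
    """True if some atom is present while a smaller atom of its uset is not.

    Atoms are (uset_name, index) pairs, ordered by index inside a uset.
    """
    for name, index in atom_set:
        if any((name, smaller) not in atom_set for smaller in range(index)):
            return True
    return False

def swap(obj, x, y):
    """Apply the permutation that only swaps atoms x and y to a flat tuple."""
    return tuple(y if o == x else x if o == y else o for o in obj)

print(has_gap({("a", 0), ("a", 1), ("b", 0)}))  # False
print(has_gap({("a", 0), ("a", 2)}))            # True, a1 is missing below a2

o = (("a", 0), ("a", 2))
smaller = swap(o, ("a", 2), ("a", 1))
print(smaller < o)                              # True
```

The tuple (a0, a2) contains a gap, and swapping a2 with the unused a1 produces the lexicographically smaller (a0, a1), so (a0, a2) cannot be canonical.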

Let us define a function $parent: \mathcal{O} \to \mathcal{O} \cup \{\bot\}$ (where $\bot$ is a fresh symbol) that gives rise to a search tree. The function returns a “smaller” object from which the object may be constructed. The function returns $\bot$ for “ground” objects (atoms, empty tuples/sets).

Proposition 3

For each $o \in \mathcal{O}$ holds:

  1. There exists $n \in \mathbb{N}$ such that $parent^n(o) = \bot$.

  2. If $o$ is a canonical form then $parent(o)$ is $\bot$ or a canonical form.

Proof

(1) If then , if is a set/tuple then is the number of elements in the set/tuple.

(2) Assume that there is and and there is such that . Since , has to be a non-empty tuple or set by definition of . If is a tuple then from the lexicographic ordering of tuples follows that and this is a contradiction. Now we explore the case where for and . If there is such that for then from fact that follows that there has to be such that and for all . The last step is to explore what happens when is applied on ; let such that for and let such that . Since applying on an object is bijective, . If then it follows that for and and hence . If then for and and hence . ∎

Proposition 3.1 shows that $parent$ defines a tree where $\bot$ is the root and the non-root nodes are elements from $\mathcal{O}$; 3.2 shows that each canonical form can be reached from the root by a path that contains only canonical forms. Moreover, the elements “grow” with the distance from the root.

This serves as a basis for the algorithm generating canonical forms of elements from a domain. It recursively takes an object and tries to create a bigger one, starting from $\bot$. On each level, it checks whether the new element is canonical; if not, the entire branch is terminated. The way of getting a bigger object from a smaller one depends on the specific domain, on what type of objects are generated, and on the elements by which the already found elements are extended. Since domains are composed from smaller ones, haydi iterates over the elements of a subdomain to obtain possible “extensions” for creating a new object; such extensions are then added to the existing object to obtain a possible continuation in the tree. Since subdomains are also strict domains, only canonical elements of the subdomain are iterated, and new extensions are created by applying permutations to the canonical forms. Once it is clear that an extension leads to an object with a gap, the corresponding permutation is omitted. Therefore, it is not necessary to go through all of the permutations.

Example:

    >>> a = hd.USet(1000, "a")
    >>> b = a * a
    >>> b
    <Product size=1000000 {(a0, a0), (a0, a1), (a0, a2), (a0, a3), ...}>
    >>> list(b.cnfs())
    [(a0, a0), (a0, a1)]

The domain in variable b has one million elements; however, only two of them are canonical forms. haydi starts with an empty tuple, then it asks for the canonical forms of the subdomain a, which is a set containing only a0. The only permutation on a0 that does not create a gap after adding it to the empty tuple is the identity, so the only relevant extension is a0. Therefore, only (a0,) is examined as a continuation. It is a canonical form, so the generation continues. Now the second domain of the Cartesian product is used; in this particular example, the canonical forms of a are used again. At this point the only no-gap (partial) permutations are the identity and the swap of a0 and a1, hence the possible extensions are a0 and a1. Extending (a0,) gives us (a0, a0) and (a0, a1) as results.

The approach is similar when sets are generated. The only thing that needs to be added for this case is a check that the extending object is bigger (w.r.t. $\le$) than the previous ones, to ensure that the current object is the actual parent of the resulting object.
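The walk-through above can be condensed into a small depth-first generator. The sketch is deliberately restricted to tuples over a single unlabeled set, with atoms represented by their integer indices; it is a simplified illustration of gap-free extension, not haydi's general algorithm, and the function name is hypothetical.

```python
def canonical_tuples(uset_size, length):
    """Yield canonical tuples of atom indices over one unlabeled set.

    Depth-first search: a prefix is extended only by an index that is
    already used or by the smallest unused one, so no prefix has a gap.
    """
    def extend(prefix, used):
        if len(prefix) == length:
            yield prefix
            return
        for index in range(min(used + 1, uset_size)):
            yield from extend(prefix + (index,), max(used, index + 1))

    yield from extend((), 0)

print(list(canonical_tuples(1000, 2)))       # [(0, 0), (0, 1)]
print(len(list(canonical_tuples(1000, 4))))  # 15
```

For pairs over 1000 atoms only two tuples are produced and none of the remaining 999,998 elements is ever visited. In this restricted setting every gap-free prefix is already canonical, so no pruning step is needed; in general, each extension still has to be checked for canonicity.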

6 Distributed computations

haydi was designed from the beginning to enable parallel computation on cluster machines. Dask/distributed (https://github.com/dask/distributed) serves as the backend for computations. The code that uses this feature was already shown at the end of Section 2.1.

haydi contains a scheduler that dynamically interacts with the dask/distributed scheduler. haydi's scheduler gradually takes elements from a pipeline and assigns them to dask/distributed. haydi calculates the average execution time of recent jobs and adjusts the job size so that jobs are neither too small nor too large with respect to the job time constraints.

haydi chooses a strategy to create jobs in dependence on a chosen method of a domain exploration. The simplest strategy is for randomly generated elements; the stream in the pipeline induces independent jobs and the scheduler has to care only about collecting results and adhering to a time constraint (that may be specified in run method).

In the case of iterating over all elements, there are three supported strategies: strategy for domains that support full slicing, for domains with filtered slicing, and a generic strategy for domains without slicing. The last one is a fallback strategy where haydi scheduler itself generates elements, these elements with the rest of the pipeline are sent as jobs into dask/distributed.

Full slicing is supported if the number of elements in a domain is known and an iterator over the domain that skips a given number of initial elements can be created efficiently. In this case, the domain may be sliced into disjoint chunks of arbitrary sizes. The haydi scheduler simply creates lightweight disjoint tasks for workers of the form “create an iterator at a given position and process a given number of elements”, without transferring explicit elements of domains. All built-in domains support full slicing as long as no filter is applied.
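The idea of full slicing can be sketched for a Cartesian product of finite sequences. The sketch is independent of haydi; the names domain_size, element_at, iterate_slice and make_jobs are hypothetical. A job is just a pair (start, count), and a worker reconstructs the elements from the position alone by decoding it as a mixed-radix number.

```python
from itertools import product


def domain_size(factors):
    """Number of elements in the Cartesian product of `factors`."""
    size = 1
    for factor in factors:
        size *= len(factor)
    return size


def element_at(factors, index):
    """Return the element at position `index` without iterating to it.

    The index is decoded as a mixed-radix number; the last factor varies
    fastest, which matches the order of itertools.product.
    """
    picked = []
    for factor in reversed(factors):
        index, position = divmod(index, len(factor))
        picked.append(factor[position])
    return tuple(reversed(picked))


def iterate_slice(factors, start, count):
    """Yield at most `count` elements beginning at position `start`."""
    stop = min(start + count, domain_size(factors))
    for index in range(start, stop):
        yield element_at(factors, index)


def make_jobs(size, chunk_size):
    """Split positions 0..size-1 into disjoint (start, count) jobs."""
    return [(start, min(chunk_size, size - start))
            for start in range(0, size, chunk_size)]


factors = [("a", "b", "c"), (0, 1), ("x", "y")]
jobs = make_jobs(domain_size(factors), chunk_size=5)
chunks = [list(iterate_slice(factors, start, count)) for start, count in jobs]
flattened = [element for chunk in chunks for element in chunk]
reference = list(product(*factors))  # sequential order, for comparison
```

Only the two integers of each job need to be sent to a worker, which is why no explicit elements of the domain have to be transferred.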

If a domain is created by applying a filter, both properties are generally lost, i.e., the exact number of elements and an efficient iterator starting from an arbitrary item. However, if the original domain supports slicing, it is possible to exploit this fact. Domains can be sliced as if there were no filter present in the domain or its subdomains at all, while allowing the iterator to signal that some elements were skipped. Note that the filtered elements cannot be silently swallowed, because the knowledge of how many elements were already generated in the underlying domain would be lost. In such a case, it would not be possible to ensure that the iterations go over disjoint chunks of a domain. Iterators that can signal that one or more elements were internally skipped are called “skip iterators”. The ability to signal the skipping of several elements at once makes it possible to implement efficient slicing when filtered domains are used in a composition. For example, consider a Cartesian product of two filtered domains, where a whole run of consecutive elements is dropped when a single element is filtered out in a subdomain. This strategy usually works well in practice for domains where the elements dropped by the filter are spread across the whole domain.
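A minimal sketch of a skip iterator over a single filtered sequence follows. It is an illustration under assumptions, not haydi's API: the event format ("item", value) / ("skip", n) and the names skip_iterate and process_chunk are hypothetical, and every skip event here covers exactly one element, whereas a composed domain could report larger runs.

```python
def skip_iterate(items, keep, start):
    """Iterate over items[start:], reporting filtered positions explicitly."""
    for position in range(start, len(items)):
        item = items[position]
        if keep(item):
            yield ("item", item)
        else:
            yield ("skip", 1)  # one underlying element was dropped


def process_chunk(items, keep, start, count):
    """Collect kept items from exactly `count` positions of the
    underlying (unfiltered) sequence, beginning at `start`."""
    consumed = 0
    kept = []
    events = skip_iterate(items, keep, start)
    while consumed < count:
        event = next(events, None)
        if event is None:  # the underlying sequence is exhausted
            break
        kind, payload = event
        if kind == "item":
            kept.append(payload)
            consumed += 1
        else:
            consumed += payload  # account for the skipped positions
    return kept


def is_even(x):
    return x % 2 == 0


numbers = list(range(10))
first = process_chunk(numbers, is_even, start=0, count=5)
second = process_chunk(numbers, is_even, start=5, count=5)
```

Because skipped positions are counted, the two chunks cover positions 0 to 4 and 5 to 9 of the underlying sequence and do not overlap. If the filter swallowed elements silently, the first worker would keep reading past position 4 until it had collected five even numbers.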

If canonical forms are generated, the goal is to build a search tree. One job assigned to dask/distributed represents the computation of all direct descendants of a node in the search tree. In the current version, this is quite a simple form of distributed tree search and there is room for improvement; it is the youngest part of haydi.

7 Performance

The purpose of this section is to give a basic impression of haydi’s performance. For comparison, the two examples from Sections 2.1 and 2.2 are used.

First, Haydi is compared with other tools. The comparison is demonstrated on the example of generating directed graphs. Geng [McKayGeng] is used as the baseline, since it is a state-of-the-art generator of graphs. In order to simulate a prototyping scenario, a special version is included that loads the graphs generated by geng into Python. The loading is done using the networkx library (https://networkx.github.io/) and a small manual wrapper. Moreover, the results of two other experiments are included. Both experiments cover graphs generated by SageMath; in the first case SageMath uses geng as the backend, while in the other it uses its own graph implementation, Cgraph. haydi was executed with Python 2.7.9 and PyPy 5.8.0. Geng 2.6r7 and SageMath 8.0 were used. The experiments were executed on a laptop with an Intel Core i7-7700HQ (2.8 GHz). The source code of all test scripts can be found in haydi’s git repository. The results are shown in Table 1. In all cases except the last one, the goal was to generate all non-isomorphic graphs with the given number of vertices without any additional computation on them. The last entry generates all possible graphs (including isomorphic ones) and runs in parallel on 8 processes.

| Tool / number of vertices | 5 | 6 | 7 | 8 |
|---|---|---|---|---|
| geng (without loading to Python) | <0.01s | <0.01s | <0.01s | 0.01s |
| geng + manual parser | <0.01s | 0.03s | 0.05s | 0.11s |
| geng + networkx | 0.27s | 0.28s | 0.28s | 0.94s |
| SageMath (geng backend) | 0.17s | 0.18s | 0.26s | 2.17s |
| SageMath (Cgraph backend) | 0.09s | 0.55s | 6.59s | 139.76s |
| haydi canonicals (Python) | 0.53s | 11.55s | timeout | timeout |
| haydi canonicals (PyPy) | 0.34s | 5.46s | timeout | timeout |
| haydi parallel iterate() (PyPy) | 3.27s | 3.49s | 70.99s | timeout |

The timeout is 200s.

Table 1: Performance of generating graphs

It is obvious that Haydi cannot compete with Geng in generating graphs. Geng is hand-tuned for this specific use case, in contrast to Haydi, which is a generic tool. On the other hand, the Haydi program that generates graphs can easily be extended or modified to generate different custom structures, while modifying Geng would be more complicated.

The second benchmark shows the strong scaling of parallel execution of the reset word generator from Section 2.2 for six vertices and a two-character alphabet. It uses the variant where cnfs() was replaced by iterate(), since the parallelization of cnfs() is not yet fully optimized. The iterated domain supports the full slicing mode. The experiment was executed on the Salomon cluster.

| Nodes (24 CPUs/node) | Time | Strong scaling |
|---|---|---|
| 1 | 3424s | 1 |
| 2 | 1908s | 0.897 |
| 4 | 974s | 0.879 |
| 8 | 499s | 0.858 |

Table 2: Performance of iterate() on Salomon
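The strong scaling column of Table 2 is consistent with the usual definition of parallel efficiency, i.e., the single-node time divided by the product of the node count and the measured time. The following sketch recomputes the column from the reported times; the function name is hypothetical, and the use of this particular formula is an assumption that happens to reproduce the published values.

```python
def strong_scaling_efficiency(times_by_nodes):
    """Efficiency relative to the smallest node count in the measurements."""
    base_nodes = min(times_by_nodes)
    base_time = times_by_nodes[base_nodes]
    return {
        nodes: (base_time * base_nodes) / (seconds * nodes)
        for nodes, seconds in sorted(times_by_nodes.items())
    }


# Times in seconds taken from Table 2.
measured = {1: 3424, 2: 1908, 4: 974, 8: 499}
efficiency = strong_scaling_efficiency(measured)
rounded = {nodes: round(value, 3) for nodes, value in efficiency.items()}
```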

A last note on performance: we have experimented with several concepts of the tool. The first version was a tool named Qit (https://github.com/spirali/qit), which shares similar ideas in API design with haydi. It also has a Python API, but behind the scenes it generates C++ code that is compiled and executed. Benchmarks on prototypes showed it was around times faster than the pure Python version (executed in PyPy); however, due to the C++ layer, Qit was less flexible than the current version of haydi and hard to debug for the end user. Therefore, this version was abandoned in favor of the pure-Python version to obtain a more flexible environment for experiments and prototyping. As the problems encountered in the generation of combinatorial objects are often exponential, such a speedup does not compensate for the loss of flexibility.

References

Appendix A Function check_automaton

    from haydi.algorithms import search

    # Let us precompute some values that will be repeatedly used.
    # The names states, n_states and alphabet are expected to come from
    # the surrounding script of the reset word example.
    init_state = frozenset(states)
    # Known result is that we do not need more than (n^3 - n) / 6 steps.
    # Integer division keeps the limit an integer on every Python version.
    max_steps = (n_states**3 - n_states) // 6


    def check_automaton(delta):
        # This function takes an automaton as a transition function and
        # returns the minimal length of a reset word, or 0 if there
        # is no such word

        def step(state, depth):
            # A step in breadth-first search; takes a set of states
            # and yields the sets reachable by one step
            for a in alphabet:
                yield frozenset(delta[(s, a)] for s in state)

        delta = delta.to_dict()
        return search.bfs(
            init_state,  # Initial state
            step,  # Function that takes a node and returns the followers
            # Run until we reach a single state
            lambda state, depth: depth if len(state) == 1 else None,
            max_depth=max_steps,  # Limit depth of search
            not_found_value=0)  # Return 0 when we exceed the depth limit