# Fully Automatic, Verified Classification of all Frankl-Complete (FC(6)) Set Families

Frankl's conjecture, formulated in 1979 and still open, states that in every family of sets closed under unions there is an element contained in at least half of the sets. A family $F_c$ is called Frankl-complete (or an FC-family) if in every union-closed family $F$ containing $F_c$, one of the elements of $\bigcup F_c$ occurs in at least half of the elements of $F$ (so $F$ satisfies Frankl's condition). FC-families play an important role in attacking Frankl's conjecture, since they enable significant search space pruning. We extend previous work by giving a complete characterization of all FC-families over a 6-element universe, by defining and enumerating all minimal FC-families and maximal nonFC-families. We use a fully automated, computer-assisted approach, formally verified within the proof assistant Isabelle/HOL.

## 1 Introduction

The union-closed sets conjecture, an elementary and fundamental statement formulated by Péter Frankl in 1979 (therefore also called Frankl’s conjecture), states that for every family of sets closed under unions, there is an element contained in at least half of the sets (or, dually, in every family of sets closed under intersections, there is an element contained in at most half of the sets). To the best of our knowledge, the problem is still open, and that is not because of a lack of interest: a recent survey by Bruhn and Schaudt lists more than 50 published research articles on the topic frankl-survey.

The conjecture has been confirmed for many finite special cases. For example, Bošnjak and Marković frankl-bosnjak-markovic proved that the conjecture holds for families whose union has at most 11 elements, and Živković and Vučković frankl-zivkovic-vuckovic describe the use of computer programs to check the case of 12 elements. Lo Faro frankl-lo-faro establishes the connection between the size of the union and the size of the minimal counter-example, proving a lower bound on the number of sets in a minimal counter-example in terms of the size of its union. Combined with the results of Živković and Vučković, this confirms the conjecture for every family with a sufficiently small number of sets.

It can easily be shown that if a union-closed family contains a one-element set, then that element is abundant (occurs in at least half of the sets). Similarly, one of the elements of a two-element set in a family is abundant. Unfortunately, as first shown by Renaud and Sarvate frankl-sarvate-renaud, the pattern breaks for a three-element set. This motivates the search for good local configurations, as they enable significant search space pruning. Following Vaughan, these are sometimes called Frankl-complete families (or just FC-families). A family $F_c$ is an FC-family if in every union-closed family $F \supseteq F_c$, one of the elements of $\bigcup F_c$ is abundant. An FC-family is called FC($n$) if its union is an $n$-element set. Most effort has been put into investigating uniform families, where all members have the same number of elements. The number FC($k$, $n$) is the minimal number $m$ such that any family containing $m$ $k$-element sets whose union is an $n$-element set is an FC-family. Poonen gives a necessary and sufficient condition for a family to be FC frankl-poonen.

As is usually the case in finite combinatorics, even for small values of $n$ a combinatorial explosion occurs and the assistance of a computer is welcome for case analysis within proofs. The corresponding paradigm is sometimes called proof-by-evaluation or proof-by-computation. Since these are not classical mathematical results, these proofs sometimes raise controversies. We support this criticism, and advocate that the use of computer programs in classical mathematical proofs should be allowed only if the programs are formally verified.

In our previous work frankl-cicm, we applied proof-by-computation techniques and developed a fully verified algorithm that can formally prove that a given family is FC. We applied it to confirm some known uniform FC-families and to discover a new FC-family: we showed that each family containing four 3-element sets contained in a 7-element set is FC, i.e., that FC($3$, $7$) $\le 4$, which, together with the lower bound on the number of 3-sets of Morris frankl-morris, gives that FC($3$, $7$) $= 4$.

In this paper we extend these results by giving a fully automated and mechanically verified (within a proof assistant) characterization of all FC($n$)-families for $n \le 6$. Such a characterization requires three components:

1. a method to prove (within a proof assistant) that some families are FC (the technique relies on Poonen’s theorem frankl-poonen and was already formalized in our previous work frankl-cicm),

2. a method to prove (within a proof assistant) that some families are not FC (the technique also relies on Poonen’s theorem frankl-poonen, but this is the first time that it is formalized),

3. finding a list of FC-families and a list of nonFC-families that are characteristic in some sense, formally verifying (within a proof assistant) their FC-status (i.e., proving whether a family is FC or nonFC), enumerating (within a proof assistant) all relevant families from a 6-element universe, and proving that all of them are in some sense covered by some of those characteristic families, i.e., that their FC-status directly follows from the status of the covering family (this technique is novel).

Finding a list of characteristic FC and nonFC-families requires a lot of experimenting and checking the FC-status of many candidate families. It has recently been shown that this process can be fully automated. Namely, Pulaj recently proposed a fully automated method for determining the FC-status of an arbitrary given family pulaj. The method is based on linear integer programming, and, although not integrated within a proof assistant, it is very reliable, as it uses exact arithmetic. (All FC-family classification results in the present paper were obtained prior to Pulaj’s algorithm pulaj; for determining the FC-status of various families we used a semi-automated procedure that is in spirit somewhat similar to Pulaj’s technique. Afterwards we fully automated the procedure and confirmed the previous results.) Even with the fully automated FC-status checking procedure, the third point requires nontrivial effort and is the main contribution of this paper (although there are well-known algorithms for exhaustive generation of non-isomorphic objects isomorph-free).

Apart from the significance of this core result, an important contribution of this paper is to demonstrate that in the field of finite combinatorics it is possible to use computer programs to push the bounds and simplify proofs, but in a way that does not jeopardize proof correctness. On the contrary, since all statements and algorithms have been verified within the theorem prover Isabelle/HOL, the trust in our results is significantly higher than in most classical pen-and-paper proofs previously published on this topic. We emphasize that many experiments may be performed by unverified tools, and only the final results need to be checked within proof assistants (e.g., we find the list of characteristic FC and nonFC-families using unverified tools, and verify only the final list using Isabelle/HOL).

#### Overview of the paper

The paper is organized as follows. In the rest of this section we describe proofs by computation and discuss some related work. In Section 2 we describe Isabelle/HOL and the notation that is used in the paper. In Section 3 we formally introduce basic definitions related to Frankl’s conjecture (Frankl’s condition, FC and nonFC-families, etc.). In Section 4 we give a theorem (based on Poonen’s theorem frankl-poonen) that can be used to formally prove that a family is FC and describe two different approaches for checking the conditions of that theorem (one based on a specialized, verified procedure, and one based on linear integer programming). In Section 5 we give a theorem (also based on Poonen’s theorem frankl-poonen) that can be used to formally prove that a family is nonFC. In Section 6 we describe a fully automated (unverified) procedure for checking if an arbitrary given family is FC. In Section 7 we define the notion of covering and describe properties that our characteristic families should satisfy. In Section 8 we describe methods for enumerating all families with certain properties, which are used both within an automated (unverified) procedure for finding all characteristic families and to formally show that all families are covered by the given characteristic families. In Section 9 we give a full characterization of FC(6) families, by listing all characteristic families found and formally proving that they cover all families over the 6-element universe. In Section 11 we draw final conclusions and discuss possible further work.

#### ITPs and Proofs by computation

Interactive theorem provers (sometimes called proof assistants), like Coq, Isabelle/HOL, HOL Light, etc., have made great progress in recent years. Many classical mathematical theorems have been formally proved, and proof assistants have been intensively used in hardware and software verification. Several of the most important results in formal theorem proving are for problems that require proofs with much computational content. These proofs are usually highly complex (and therefore often require justification by formal means) since they combine classical mathematical statements with complex computing machinery (usually computer implementations of combinatorial algorithms). The corresponding paradigm is sometimes referred to as proof-by-evaluation or proof-by-computation. Probably the most famous examples of this approach are the proofs of the Four-Color Theorem gonthier-notices and of the Kepler conjecture flyspeck. One of the authors of this paper recently used a proof-by-computation technique to give a formal proof of the Erdős-Szekeres conjecture for hexagons maric-erdos within Isabelle/HOL.

#### Related work

Bruhn and Schaudt give a detailed survey of Frankl’s conjecture frankl-survey.

Frankl’s conjecture has also been formulated and studied as a question in lattice theory frankl-lattices-reinhold; frankl-lattices-abe, and in graph theory frankl-graphs.

FC-families were introduced by Poonen frankl-poonen, who gave a necessary and sufficient condition for a family to be FC (based on weight functions). The term FC-family was coined by Vaughan frankl-vaughan-1, and they were further studied by Gao and Yu frankl-gao-yu, Vaughan frankl-vaughan-1; frankl-vaughan-2; frankl-vaughan-3, Morris frankl-morris, Marković frankl-markovic, Bošnjak and Marković frankl-bosnjak-markovic, and Živković and Vučković frankl-zivkovic-vuckovic. Poonen frankl-poonen proved that FC(, ) = . Vaughan frankl-vaughan-1; frankl-vaughan-2; frankl-vaughan-3 showed that FC(, ) and FC(, ) . Morris frankl-morris gives a full characterization of all FC()-families. He proves that FC(, )=, FC(, )=. Also, he proves that FC(, )= and FC(, ) . His proofs rely on computer programs, but these are not verified and not even presented in the article (as they are “fairly simple-minded”). In our previous work frankl-cicm we formally confirmed all these results within a theorem prover, additionally formally proving that FC($3$, $7$) $\le 4$.

A computer-assisted approach was applied by Morris frankl-morris and by Živković and Vučković frankl-zivkovic-vuckovic for solving special cases of Frankl’s conjecture. In the latter case, computations are performed by unverified Java programs.

## 2 Background and notation

The logic and the notation used in this paper follow Isabelle/HOL, with some minor simplifications to make it approachable to a wider audience. Isabelle/HOL isabelle is a development of Higher Order Logic (HOL), and it conforms largely to everyday mathematical notation. Embedded in a theory are the types, terms and formulae of HOL. The basic types include truth values (bool), natural numbers (nat) and integers (int).

Terms are formed as in functional programming by applying functions to arguments. Following the tradition of functional programming, functions are curried. For example, $f\ x\ y$ denotes the function $f$ applied to the arguments $x$ and then $y$ (in classical mathematical notation, this would usually be denoted by $f(x, y)$). Terms may also contain $\lambda$-abstractions. For example, $\lambda x.\ x + 1$ is the function that takes an argument $x$ and returns $x + 1$. Let-expressions, if-expressions, and case-expressions are also supported in terms. Let-expressions are of the form "let $x = t$ in $P$". This expression is equivalent to the one obtained from the term $P$ by substituting all free occurrences of the variable $x$ by $t$. For example, "let $x = 0$ in $x + x$" is equivalent to "$0 + 0$". An if-expression is of the form "if $b$ then $x$ else $y$". Case-expressions are of the form "case $x$ of $p_1 \Rightarrow e_1 \mid \ldots \mid p_n \Rightarrow e_n$". This is equivalent to $e_i$ if $x$ matches the pattern $p_i$.

Formulae are terms of the type bool. Standard logical connectives (, , , and ) are supported. Quantifiers are written using dot-notation, as , and .

New functions can be defined by recursion (either primitive or general).

Sets over type , type , follow the usual mathematical conventions (in a strict type setting, sets containing elements of mixed types are not allowed). In the presentation we use the term set for sets of numbers and denote these by , , …, the term family for sets of sets (i.e., objects of the type ) of numbers and denote these by , , …, and the term collection for sets of families (i.e., objects of the type ) and denote these by , , …. The powerset (the set of all subsets) of a set will be denoted by . The union of sets $A$ and $B$ is denoted by $A \cup B$, and the union of all sets in a family $F$ is denoted by $\bigcup F$. The image of a set under a function is denoted by . In this paper, the number of elements in a set $A$ will be denoted by $|A|$. The set will be denoted by .

Lists over type , type , come with the empty list $[\,]$ and the infix prepend constructor $\#$ (every list is either $[\,]$ or is of the form $x \,\#\, xs$, and these two cases are usually considered when defining recursive functions over lists). Standard higher order functions , , are supported and very often used for defining list operations (for details see isabelle). In this paper, the $n$-th element of a list will be denoted by (positions are zero-based). denotes the list obtained from by removing its last element. If contains natural numbers, is the list obtained from by decreasing its last element, and is the list obtained from by increasing its -th element. The predicate distinct checks if the list has no repeated elements, and the function remdups removes duplicates from the list . List will be denoted by .

All definitions and statements given in this paper are formalized within Isabelle/HOL (formal proofs are available at http://argo.matf.bg.ac.rs/downloads/formalizations/FCFamilies.zip). However, in order to make the text accessible to a more general audience not familiar with Isabelle/HOL, many minor details are omitted and some imprecisions are introduced. For example, we use standard symbolic notation common in related work, although it is clear that some symbols are ambiguous. Also, in the paper some notions will be defined by only using sets, while in the formalization they are defined by using lists (to obtain executability). Statements are grouped into propositions, lemmas, and theorems. Propositions usually express simple, technical results and are printed here without proofs, while the proofs of lemmas and theorems are given in the Appendix. All sets and families are considered to be finite, and this assumption (present in the Isabelle/HOL formalization) will not be explicitly stated in the rest of the paper.

## 3 Basic notions

Since we are only dealing with finite sets and families, without loss of generality we can restrict the domain only to natural number domains.

A family over is a collection of sets such that . The collection of all families over will be denoted by .

### 3.1 Union-Closed Families

First we give basic definitions of union-closed families, closure under unions, and operations used to incrementally obtain closed families. Let .

Let and be families.

A family is union-closed, denoted by , iff , (i.e. ). A family is union-closed for , denoted by , iff , (i.e. ).

Union-closure of (abbr. closure), denoted by , is the minimal family of sets (in sense of inclusion) that contains and is union-closed.

Union-closure of for (abbr. closure for ), denoted by , is the minimal family of sets (in sense of inclusion) that contains and is union-closed for .

Insert and close operation of set to family , denoted by , is the family . Insert and close operation for of set to family , denoted by , is the family .
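The notions of this subsection can be illustrated by a minimal Python sketch. It is not the Isabelle/HOL formalization: sets are represented as `frozenset`s of integers, families as Python sets of them, and all function names are hypothetical. The sketch also assumes that "union-closed for $F_c$" means union-closed and, additionally, closed under unions with members of $F_c$.

```python
def is_union_closed(F):
    """True iff the union of any two members of F is again in F."""
    return all(A | B in F for A in F for B in F)


def is_union_closed_for(F, Fc):
    """Assumed reading: F is union-closed and absorbs unions with members of Fc."""
    return is_union_closed(F) and all(A | B in F for A in F for B in Fc)


def closure(F, Fc=()):
    """Smallest family containing F that is union-closed (for Fc, if given)."""
    result = set(F)
    changed = True
    while changed:
        changed = False
        for A in list(result):
            for B in list(result) + list(Fc):
                if A | B not in result:
                    result.add(A | B)
                    changed = True
    return result


def insert_and_close(A, F, Fc=()):
    """Insert the set A into the family F and close the result (for Fc)."""
    return closure(set(F) | {A}, Fc)
```

For example, inserting $\{2\}$ into the empty family and closing for $\{\{0, 1\}\}$ yields the family $\{\{2\}, \{0, 1, 2\}\}$.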

The following proposition gives some trivial properties of these notions.

###### Proposition 1
1. ,

2. If and then .

3. If and then .

### 3.2 The Frankl’s Condition

The next definition formalizes Frankl’s condition and the notion of a Frankl’s family.

A family of sets $F$ is a Frankl’s family, denoted by , if it contains an element $a$ that satisfies Frankl’s condition for $F$, i.e., that occurs in at least half of the sets in the family $F$. Formally, $\exists a \in \bigcup F.\ 2 \cdot \#_a F \ge |F|$, where $\#_a F$ denotes $|\{A \in F.\ a \in A\}|$.
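As a small illustration, Frankl’s condition can be checked directly on a finite family. The Python sketch below is only an assumption-laden illustration (the names `occurrences` and `is_frankl` are hypothetical); it uses the division-free form "twice the number of occurrences is at least the number of sets".

```python
def occurrences(a, F):
    """Number of members of the family F that contain the element a."""
    return sum(1 for A in F if a in A)


def is_frankl(F):
    """True iff some element occurs in at least half of the sets of F."""
    universe = set().union(*F)
    # 2 * #_a F >= |F| avoids rational arithmetic
    return any(2 * occurrences(a, F) >= len(F) for a in universe)
```

For instance, in the family $\{\emptyset, \{0\}, \{0, 1\}\}$ the element $0$ occurs in two of the three sets, so the family is a Frankl’s family.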

### 3.3 FC-families

A family of sets $F_c$ is an FC-family if in every union-closed family $F$ such that $F \supseteq F_c$ one of the elements of $\bigcup F_c$ satisfies Frankl’s condition for $F$. Every family that is not an FC-family is called a nonFC-family.

The next propositions give some properties of FC-families.

###### Proposition 2

Any superset of an FC-family is an FC-family. Any subset of a nonFC-family is a nonFC-family.

###### Proposition 3

A family is an FC-family iff the family is an FC-family.

###### Proposition 4

A family is an FC-family iff its closure is an FC-family.

## 4 Proving that a Family is FC

In this section we describe techniques that can be used to formally prove that a given family is FC. Most statements will be given without proofs, since the proofs are available in frankl-cicm .

### 4.1 Weight Functions and Shares

We describe the central technique for proving that a family is FC, relying on characterizations of Frankl’s condition using weights and shares introduced by Poonen frankl-poonen, but adapted to work in a proof-assistant environment.

A function is a weight function on , denoted by , iff . Weight of a set wrt. weight function , denoted by , is the value . Weight of a family wrt. weight function , denoted by , is the value .

An important technique for checking Frankl’s condition is averaging — family is Frankl’s if and only if there is a weight function such that weighted average of number of occurrences of all elements exceeds . A more formal formulation of this claim (that uses only integers and avoids division) is given by the following Proposition.

###### Proposition 5

A concept that enables a slightly more operational formulation of the previous characterization is the concept of share (again, to avoid rational numbers, the definition differs from the one used in the literature). Let $w$ be a weight function. The share of a set $A$ wrt. $w$ and a set $X$, denoted by $\bar{w}_X(A)$, is the value $2 \cdot w(A) - w(X)$. The share of a family $F$ wrt. $w$ and a set $X$, denoted by $\bar{w}_X(F)$, is the value $\sum_{A \in F} \bar{w}_X(A)$.
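Weights and shares can be sketched in a few lines of Python. The sketch assumes that the weight of a set is the sum of the weights of its elements and that the integer-valued share of a set $A$ wrt. $X$ is $2 \cdot w(A) - w(X)$; the function names are hypothetical. The last function computes the family share grouped by elements, which is the form used later in the nonFC criterion.

```python
def weight(A, w):
    """Weight of a set: sum of the weights of its elements (w is a dict)."""
    return sum(w[a] for a in A)


def set_share(A, w, X):
    """Assumed integer share of the set A wrt. w and X: 2*w(A) - w(X)."""
    return 2 * weight(A, w) - weight(X, w)


def family_share(F, w, X):
    """Share of a family: sum of the shares of its members."""
    return sum(set_share(A, w, X) for A in F)


def family_share_by_elements(F, w, X):
    """Same value regrouped: sum over a in X of w(a) * (2 * #_a F - |F|)."""
    return sum(w[a] * (2 * sum(1 for A in F if a in A) - len(F)) for a in X)
```

With unit weights on $X = \{0, 1, 2\}$, the family $\{\emptyset, \{0\}, X\}$ has share $-3 - 1 + 3 = -1$.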

###### Example 1

Let be a function such that , and for all other elements. is clearly a weight function. Then, and . Also, and

#### Union-closed extensions

The next definition introduces an important notion for checking FC-families. Union-closed extensions of a family are families that are created from elements of the domain of and are union closed for . Collection of all union-closed extensions is denoted by , and defined by .

The following theorem corresponds to the first direction of Poonen’s theorem (Theorem 1 in frankl-poonen). The proof is formalized within Isabelle/HOL and its informal counterpart is given in the Appendix.

###### Theorem 1

A family $F_c$ is an FC-family if there is a weight function $w$ such that the shares (wrt. $w$ and $\bigcup F_c$) of all union-closed extensions of $F_c$ are nonnegative.

In the rest of this section we show two different ways of searching for a union-closed extension with a negative share. The first is based on a specialized algorithm, crafted specifically for this problem, while the other is based on integer linear programming and employs an integer linear programming (ILP) package or a satisfiability modulo theories (SMT) solver.

### 4.2 Search for Negative Shares

Theorem 1 inspires a procedure for verifying FC-families. It should take a weight function $w$ on $\bigcup F_c$ and check that all union-closed extensions of $F_c$ have nonnegative shares. There are only finitely many union-closed extensions, so in principle they can all be checked. However, a naive check is too slow, and further steps must be taken to obtain an efficient procedure. We now define a procedure SomeShareNegative, denoted by $\mathit{ssn}\ F_c\ w$, such that $\mathit{ssn}\ F_c\ w = \top$ iff there is a union-closed extension $F$ of $F_c$ such that $\bar{w}_{\bigcup F_c}(F) < 0$. The procedure is based on a recursive function that performs a systematic traversal of all union-closed extensions of $F_c$, but with pruning that significantly speeds up the search.

The recursive function has four parameters ($F_c$, $w$, $L$, and $F_t$) that we now describe. The two fixed parameters (those that do not change throughout the recursive calls) are the family $F_c$ and the weight function $w$. If a union-closed extension of $F_c$ has a negative share, it must contain one or more sets with a negative share. Therefore, a list $L$ of all different subsets of $\bigcup F_c$ with negative shares is formed, and each candidate family is determined by the elements of $L$ that it includes. The recursive procedure creates all candidate families by processing the elements of that list sequentially, either skipping them (in one recursive branch) or including them into the current candidate family (in the other recursive branch), maintaining the invariant that the current candidate family is always a union-closed extension of $F_c$. The two parameters that change during recursive calls are the remaining part of the list $L$ and the current candidate family $F_t$.

Two pruning rules apply. If the current leading element of $L$ has already been included in $F_t$ (by earlier closure operations required to maintain the invariant), the search can be pruned. If the sum of the (negative) shares of the remaining elements of $L$ and the share of the current $F_t$ is nonnegative, then $F_t$ cannot be extended to a family with a negative share (even in the extreme case when all the remaining elements of $L$ are included), so, again, the search can be pruned.

The function is defined by a primitive recursion (over the structure of the list ):

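$$
\begin{array}{lcl}
\mathit{ssn}_{F_c,w}\ [\,]\ F_t & \equiv & \bar{w}_{\bigcup F_c}(F_t) < 0 \\
\mathit{ssn}_{F_c,w}\ (h \,\#\, t)\ F_t & \equiv & \textrm{if}\ \bar{w}_{\bigcup F_c}(F_t) + \sum_{A \in h \# t} \bar{w}_{\bigcup F_c}(A) \ge 0\ \textrm{then}\ \bot \\
 & & \textrm{else if}\ \mathit{ssn}_{F_c,w}\ t\ F_t\ \textrm{then}\ \top \\
 & & \textrm{else if}\ h \in F_t\ \textrm{then}\ \bot \\
 & & \textrm{else}\ \mathit{ssn}_{F_c,w}\ t\ (\mathit{ic}_{F_c}\ h\ F_t)
\end{array}
$$

Let $L$ be a distinct list such that its set is $\{A \subseteq \bigcup F_c.\ \bar{w}_{\bigcup F_c}(A) < 0\}$.

$$
\mathit{ssn}\ F_c\ w \equiv \mathit{ssn}_{\langle F_c \rangle, w}\ L\ \{\}
$$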

The soundness of the function is given by the following propositions.

###### Proposition 8

If (i) , (ii) for all elements in it holds that , (iii) for all , if , then is in , (iv) , and (v) , then .

###### Proposition 9

If and then .

Apart from being sound, the procedure can also be shown to be complete. Namely, it could be shown that if $\mathit{ssn}\ F_c\ w = \top$, then there is a union-closed extension $F$ of $F_c$ such that $\bar{w}_{\bigcup F_c}(F) < 0$. This follows from the invariant that the current family in the search is always a union-closed extension of $F_c$, which is maintained by taking the closure whenever an element is added. Since this aspect of the procedure is not relevant for the rest of the proofs, it will not be formally stated or proved. However, it gives a method for finding a counterexample family for a given weight function, which is useful for the fully automated classification of a given family (described in Section 6), and which we use to find the minimal FC-families (as described in Section 7.4).
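The search for a negative share can be sketched in Python as follows. This is an unverified illustration, not the Isabelle/HOL function: all names are hypothetical, weights are given as a dict, the share of a set is assumed to be $2 \cdot w(A) - w(X)$, and "closed for $F_c$" is assumed to include closure under unions with members of $F_c$. The candidate list is ordered by set size, so that closure operations can only add sets that appear later in the list. A brute-force enumeration is included for cross-checking on tiny universes.

```python
from itertools import combinations, product


def powerset(X):
    xs = sorted(X)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


def share(A, w, X):
    return 2 * sum(w[a] for a in A) - sum(w[a] for a in X)


def family_share(F, w, X):
    return sum(share(A, w, X) for A in F)


def close(F, Fc):
    """Close F under unions and under unions with members of Fc."""
    result, changed = set(F), True
    while changed:
        changed = False
        for A in list(result):
            for B in list(result) + list(Fc):
                if A | B not in result:
                    result.add(A | B)
                    changed = True
    return frozenset(result)


def ssn(Fc, w):
    """True iff the pruned search finds a union-closed extension with negative share."""
    Fc = [frozenset(A) for A in Fc]
    X = frozenset().union(*Fc)
    Fc_closed = close(Fc, [])
    L = [A for A in powerset(X) if share(A, w, X) < 0]  # ordered by size

    def search(rest, Ft):
        if not rest:
            return family_share(Ft, w, X) < 0
        # prune: even including every remaining set cannot make the share negative
        if family_share(Ft, w, X) + sum(share(A, w, X) for A in rest) >= 0:
            return False
        h, t = rest[0], rest[1:]
        if search(t, Ft):          # branch 1: skip h
            return True
        if h in Ft:                # h was already added by an earlier closure
            return False
        return search(t, close(Ft | {h}, Fc_closed))  # branch 2: include h

    return search(L, frozenset())


def brute_force_negative(Fc, w):
    """Reference check: enumerate every family over the powerset (tiny X only)."""
    Fc = [frozenset(A) for A in Fc]
    X = frozenset().union(*Fc)
    S = powerset(X)
    for bits in product((0, 1), repeat=len(S)):
        F = {A for A, b in zip(S, bits) if b}
        if close(F, Fc) == F and family_share(F, w, X) < 0:
            return True
    return False
```

For the single two-element set $\{0, 1\}$ with unit weights no negative share exists, while for the single three-element set $\{0, 1, 2\}$ with unit weights the family $\{\emptyset, \{2\}, \{0, 1, 2\}\}$ has share $-1$, so the search succeeds.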

#### Optimizations

An important optimization of the basic procedure is to avoid repeated computation of shares (both for the elements of the list $L$ and for the current family $F_t$). So, instead of accepting a list of sets $L$ and the current family of sets $F_t$, the function is modified to accept a list of ordered pairs, where the first component is the corresponding element of $L$ and the second component is its share (wrt. $w$ and $\bigcup F_c$), and to accept an ordered pair $(F_t, s_t)$, where $s_t$ is the family share of $F_t$ (wrt. $w$ and $\bigcup F_c$). The summation of the shares of the elements in $L$ is also unnecessarily repeated. It can be avoided if the sum $s_l$ is passed through the function.

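$$
\begin{array}{lcl}
\mathit{ssn}_{F_c,w}\ ([\,], 0)\ (F_t, s_t) & \equiv & s_t < 0 \\
\mathit{ssn}_{F_c,w}\ ((h, s_h) \,\#\, t,\ s_l)\ (F_t, s_t) & \equiv & \textrm{if}\ s_t + s_l \ge 0\ \textrm{then}\ \bot \\
 & & \textrm{else if}\ \mathit{ssn}_{F_c,w}\ (t,\ s_l - s_h)\ (F_t, s_t)\ \textrm{then}\ \top \\
 & & \textrm{else if}\ h \in F_t\ \textrm{then}\ \bot \\
 & & \textrm{else let}\ F'_t = \mathit{ic}_{F_c}\ h\ F_t;\ s'_t = \bar{w}_{\bigcup F_c}(F'_t) \\
 & & \quad \textrm{in}\ \mathit{ssn}_{F_c,w}\ (t,\ s_l - s_h)\ (F'_t, s'_t)
\end{array}
$$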

Another source of inefficiency is the calculation of $s'_t$. If performed directly based on the definition of family share for $F'_t$, the sum would contain the shares of all elements from $F_t$ and of all elements that are added to $F_t$ when adding $h$ and closing for $F_c$. However, it is already known that the sum of shares for the elements of $F_t$ is $s_t$, and the implementation can benefit from this fact. Also, calculating the shares of the sets that are added to $F_t$ can be made faster. Namely, the share of the same set is calculated over and over again in different parts of the search space. So, it is much better to precompute the shares of all subsets of $\bigcup F_c$ and store them in a lookup table that is consulted each time a set share is needed. Note that in this case there is no longer any need to pass the function $w$ itself, nor to calculate the domain $\bigcup F_c$, but only the lookup table, denoted by $\mathit{sw}$.

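$$
\begin{array}{lcl}
\mathit{ssn}_{F_c,\mathit{sw}}\ ([\,], 0)\ (F_t, s_t) & \equiv & s_t < 0 \\
\mathit{ssn}_{F_c,\mathit{sw}}\ ((h, s_h) \,\#\, t,\ s_l)\ (F_t, s_t) & \equiv & \textrm{if}\ s_t + s_l \ge 0\ \textrm{then}\ \bot \\
 & & \textrm{else if}\ \mathit{ssn}_{F_c,\mathit{sw}}\ (t,\ s_l - s_h)\ (F_t, s_t)\ \textrm{then}\ \top \\
 & & \textrm{else if}\ h \in F_t\ \textrm{then}\ \bot \\
 & & \textrm{else}\ \mathit{ssn}_{F_c,\mathit{sw}}\ (t,\ s_l - s_h)\ (\mathit{icsw}_{F_c}\ h\ (F_t, s_t)) \\[4pt]
\mathit{icsw}_{F_c}\ h\ (F_t, s_t) & \equiv & \textrm{let}\ \mathit{add} = \{h\} \cup (F_t \uplus \{h\}) \cup (F_c \uplus \{h\}); \\
 & & \quad\ \ \mathit{new} = \{A \in \mathit{add}.\ A \notin F_t\} \\
 & & \textrm{in}\ (\mathit{new} \cup F_t,\ s_t + \sum_{A \in \mathit{new}} \mathit{sw}\ A)
\end{array}
$$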

We have shown that this implementation is equivalent to the starting, abstract one (it returns true iff there is a union-closed extension with a negative share).

### 4.3 Integer linear programming

An alternative to using a specialized, verified procedure is to encode the existence of a union-closed extension with a negative share as a linear integer programming problem and to employ an existing solver to do the search pulaj. In our case, we need to formally prove (within Isabelle/HOL) that our characteristic FC-families are indeed FC, so the SMT solver Z3 integrated within Isabelle/HOL can be used isabelle-z3.

Assume that $F_c$ is given and that $n$ is a number such that $\bigcup F_c = \bar{n}$. Each subset of $\bar{n}$ can be either included in or excluded from a family $F$. There are $2^n$ such subsets, which is significantly less than the number of families, which is bounded above by $2^{2^n}$. For each set $A \subseteq \bar{n}$ we define a 0-1 integer (or Boolean) variable $x_A$, whose value is 1 iff the set $A$ is included in the sought family $F$, i.e., $A \in F$. We must encode that the family $F$ is union-closed, so for every two sets $A$ and $B$ it must hold that $A \in F$ and $B \in F$ imply that $A \cup B \in F$, that is, $x_A = 1 \wedge x_B = 1 \longrightarrow x_{A \cup B} = 1$, which can be encoded as

$$x_A + x_B \le 1 + x_{A \cup B}.$$

Next we must encode that the family $F$ is closed for $F_c$, so for every set $A \in F_c$ and every $B \subseteq \bar{n}$ it must hold that $x_B = 1 \longrightarrow x_{A \cup B} = 1$, which can be encoded as

$$x_B \le x_{A \cup B}.$$

Finally, we should encode that $F$ has a negative share, i.e., $\bar{w}_{\bigcup F_c}(F) < 0$. Since $\bigcup F_c = \bar{n}$, the condition is equivalent to

$$\sum_{A \subseteq \bar{n}} x_A \cdot \bar{w}_{\bar{n}}(A) < 0.$$

The conjunction of the three listed types of linear inequalities is given to the SMT solver, and it returns a model iff there is a union-closed extension $F$ of $F_c$ with a negative share (the values of the variables $x_A$ uniquely determine that extension $F$). The result of the SMT solver (a model, or an unsatisfiability proof) is then verified by Isabelle/HOL, yielding a fully formally verified proof isabelle-z3.

Note that the problem could be stated as a problem over rational weights, but in our whole framework we considered only integers, and it turned out that the search is efficient enough.
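The three kinds of inequalities can be illustrated by a small Python sketch. It does not call Z3 or SCIP: as a stand-in for a solver, it simply enumerates all 0-1 assignments, which is feasible only for a very small universe. The function names are hypothetical, the universe is assumed to be $\{0, \ldots, n-1\}$, weights are given as a list indexed by element, and the share of a set is assumed to be $2 \cdot w(A) - w(X)$.

```python
from itertools import combinations, product


def subsets(n):
    """All subsets of {0, ..., n-1}."""
    return [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]


def constraints_hold(x, Fc, w, n):
    """Check the three kinds of linear inequalities for a 0-1 assignment x."""
    S = list(x)
    w_universe = sum(w[a] for a in range(n))
    # union-closed: x_A + x_B <= 1 + x_{A u B}
    for A in S:
        for B in S:
            if x[A] + x[B] > 1 + x[A | B]:
                return False
    # closed for Fc: x_B <= x_{A u B} for every A in Fc
    for A in Fc:
        for B in S:
            if x[B] > x[A | B]:
                return False
    # negative share: sum of x_A * (2 * w(A) - w(universe)) < 0
    return sum(x[A] * (2 * sum(w[a] for a in A) - w_universe) for A in S) < 0


def find_negative_extension(Fc, w, n):
    """Return a model (as a family of sets) or None if the system is infeasible."""
    Fc = [frozenset(A) for A in Fc]
    S = subsets(n)
    for bits in product((0, 1), repeat=len(S)):
        x = dict(zip(S, bits))
        if constraints_hold(x, Fc, w, n):
            return {A for A in S if x[A] == 1}
    return None
```

A real solver replaces the exhaustive loop in `find_negative_extension`; the constraint checks themselves are exactly the inequalities listed above.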

## 5 Proving that a Family is not FC

Proving that a family is not an FC-family is also based on Poonen’s theorem (Theorem 1 in frankl-poonen). The converse of our Theorem 1 also holds: if there is no weight function satisfying the conditions of Theorem 1, then the family is not an FC-family. However, this is hard to prove formally within Isabelle/HOL (Poonen’s original proof uses the hyperplane separation theorem for convex sets), so we formally proved the following variant that is both easier to prove and more suitable for further application.

###### Theorem 2

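Assume that $F_c$ is a union-closed family. If there exist a sequence of families $F_0, \ldots, F_k$ and a sequence of natural numbers $c_0, \ldots, c_k$ such that:

1. for all $0 \le i \le k$ it holds that $F_i$ is a union-closed extension of $F_c$,

2. for every $a \in \bigcup F_c$ it holds that

 $$\sum_{i=0}^{k} c_i \cdot (2 \cdot \#_a F_i - |F_i|) < 0,$$

3. not all $c_i$ are zero,

then the family $F_c$ is not an FC-family.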

The major differences between this and Poonen’s original formulation are the following. Instead of real numbers we use only natural numbers. Instead of considering the whole collection of union-closed extensions we consider only some of its members. Finally, instead of showing that there is no weight function with nonnegative shares for those selected union-closed extensions, i.e., that the system $\bar{w}_{\bigcup F_c}(F_i) \ge 0$, i.e., $\sum_{a \in \bigcup F_c} (2 \cdot \#_a F_i - |F_i|) \cdot w(a) \ge 0$, for every $0 \le i \le k$, has no all-nonnegative, non-all-zero solution, we show that its dual system $\sum_{i=0}^{k} c_i \cdot (2 \cdot \#_a F_i - |F_i|) < 0$, for every $a \in \bigcup F_c$, has a nontrivial solution ($2 \cdot \#_a F_i - |F_i|$ equals the difference between the number of members of $F_i$ that contain $a$ and the number of members of $F_i$ that do not). The proof follows Poonen (for the most part) and is given in the Appendix.

Note that once the sequence of families and the sequence of numbers are known, the formal proof is much easier than in the FC case, as it need not use any search (all conditions of Theorem 2 can be directly checked). Finding those sequences is not trivial, but it can be done outside Isabelle/HOL.
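Since all conditions of Theorem 2 can be checked directly, a witness checker is short. The Python sketch below is an unverified illustration with hypothetical names; it assumes that a union-closed extension of $F_c$ is a family of subsets of $\bigcup F_c$ that is union-closed and closed under unions with members of $F_c$.

```python
def check_nonfc_witness(Fc, families, coeffs):
    """Check the conditions of the nonFC criterion for the given witness."""
    Fc = [frozenset(A) for A in Fc]
    X = frozenset().union(*Fc)
    fams = [{frozenset(A) for A in F} for F in families]

    def union_closed(F):
        return all(A | B in F for A in F for B in F)

    def closed_for_fc(F):
        return all(A | B in F for A in F for B in Fc)

    # Fc itself must be union-closed
    if not union_closed(set(Fc)):
        return False
    # coefficients: natural numbers, one per family, not all zero
    if len(fams) != len(coeffs) or any(c < 0 for c in coeffs):
        return False
    if all(c == 0 for c in coeffs):
        return False
    # condition 1: every F_i is a union-closed extension of Fc
    for F in fams:
        if not all(A <= X for A in F):
            return False
        if not (union_closed(F) and closed_for_fc(F)):
            return False
    # condition 2: for every element a the weighted sum is negative
    for a in X:
        total = sum(c * (2 * sum(1 for A in F if a in A) - len(F))
                    for c, F in zip(coeffs, fams))
        if total >= 0:
            return False
    return True
```

As a usage example, for $F_c = \{\{0, 1, 2\}\}$ the three families $\{\emptyset, \{i\}, \{0, 1, 2\}\}$, $i = 0, 1, 2$, with all coefficients equal to 1 give the sum $1 - 1 - 1 = -1$ for every element, so the checker accepts this witness.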

## 6 Procedure for checking FC-status of a given family and finding witnesses

To prove that a family is FC based on Theorem 1 one requires a witnessing weight function $w$. To prove that a family is nonFC based on Theorem 2 one requires a witnessing sequence of families $F_i$ and numbers $c_i$. For the final formal proof of the FC-status of characteristic families it is not important how those witnesses are obtained, but it is highly desirable to have a procedure that can obtain them fully automatically. Pulaj suggested the first algorithm capable of checking the FC-status of an arbitrary family, based on the cutting planes method and linear (integer) programming implemented in SCIP pulaj, and it can easily be modified to provide the required witnesses (both for the FC and the nonFC case). Note that such a procedure need not be implemented within Isabelle/HOL: its purpose is to determine the status and give witnesses that can be used for Theorem 1 or 2, which are formally checked within Isabelle/HOL.

Assume that a family $F_c$ is given. The procedure alternates two phases. In the first one a candidate weight function is constructed, and in the second it is checked whether it satisfies the condition of Theorem 1.

In the first phase, the candidate weight function $w$ (represented by unknowns $w(a)$, for $a \in \bigcup F_c$) is constructed by solving a system of linear integer inequalities (as we use only natural numbers in our framework). In the beginning the system contains only the conditions required for a weight function ($w(a) \ge 0$ for every $a$, and $\sum_{a} w(a) > 0$), but as new families are constructed in the second phase, it is extended by the condition $\bar{w}_{\bigcup F_c}(F) \ge 0$, for each family $F$ obtained in the second phase. If the current system becomes unsatisfiable, then $F_c$ is not an FC-family, the current set of families can be used as a witness for Theorem 2, and the coefficients $c_i$ are obtained by solving its dual system. Otherwise, its solution is the candidate weight function used in the second phase.

In the second phase it is checked whether the candidate weight function satisfies the conditions of Theorem 1, i.e., that there is no union-closed extension of the given family with a negative share with respect to that weight function. For this, either of the two approaches described in Section 4.2 (the search procedure, or solving the system of linear inequalities) can be used. If all shares are non-negative, then the given family is an FC-family, and the current weight function is used as a witness to formally prove that using Theorem 1. Otherwise, the procedure constructs a family that belongs to the union-closed extensions of the given family and has a negative share. That family is then added to the current set of such families and fed into the first phase again.

Unlike in the final Isabelle/HOL proofs, in the experimentation phase non-verified implementations can be used (since the final witnesses are checked again, using Isabelle/HOL). Therefore, in our implementation we have used the ILP package SCIP (the same one used in pulaj) in all three cases: solving the system that finds a candidate weight function, solving the system that finds the coefficients based on the sequence of families for which finding a weight function was shown to be impossible, and solving the system that finds a family with a negative share with respect to the current weight function. Our preliminary experiments indicated that SCIP gives results faster than the SMT solver Z3 (when run outside Isabelle/HOL). Interestingly, the search procedure often produced a family faster than SCIP, but the overall procedure then required more iterations (we assume that this can be attributed to the very regular order in which the search procedure enumerates families). One additional technique that we noticed significantly speeds up convergence is to favor smaller weights, i.e., to require that the weight function is minimal with respect to the sum of its weights (this was possible in SCIP by using its built-in optimization features, with the sum of the weights as the objective function).
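The alternation of the two phases can be illustrated by the following minimal Python sketch. It is not the implementation used in this work: SCIP is replaced by brute-force enumeration, which is feasible only for domains of two or three elements. Several things are assumptions made for illustration only: sets are encoded as bitmasks, the share of a family is taken to be the sum of `2*w(A) - w(domain)` over its sets `A`, a union-closed extension is taken to be a union-closed family over the same domain that is also closed for unions with the given family, and all function names and the bound `max_weight` are hypothetical. Because the weights are bounded, the answer `"unknown"` is not a proof that the family is nonFC.

```python
from itertools import product


def weight(w, s):
    """Total weight of the set s, given as a bitmask over elements 0..n-1."""
    return sum(wi for i, wi in enumerate(w) if (s >> i) & 1)


def share(w, family):
    """Assumed share: sum over the sets A of the family of 2*w(A) - w(domain)."""
    total = sum(w)
    return sum(2 * weight(w, s) - total for s in family)


def uc_extensions(fc, n):
    """Yield each family over 0..n-1 that is union-closed and closed for unions with fc."""
    for mask in range(1 << (1 << n)):
        fam = [s for s in range(1 << n) if (mask >> s) & 1]
        members = set(fam)
        if all((a | b) in members for a in fam for b in fam) and \
           all((a | c) in members for a in fam for c in fc):
            yield fam


def candidate_weight(cuts, n, max_weight):
    """Phase 1: a non-zero weight vector of smallest sum that gives every
    previously found family a non-negative share (None if there is none)."""
    for w in sorted(product(range(max_weight + 1), repeat=n), key=sum):
        if sum(w) > 0 and all(share(w, f) >= 0 for f in cuts):
            return w
    return None


def negative_share_family(fc, n, w):
    """Phase 2: some extension with a negative share, or None if all are fine."""
    return next((f for f in uc_extensions(fc, n) if share(w, f) < 0), None)


def find_witness(fc, n, max_weight=3):
    """Alternate the two phases until a witness weight is found or phase 1 fails."""
    cuts = []
    while True:
        w = candidate_weight(cuts, n, max_weight)
        if w is None:
            return "unknown", cuts  # no weights within the bound satisfy all cuts
        bad = negative_share_family(fc, n, w)
        if bad is None:
            return "FC", w
        cuts.append(bad)
```

For instance, `find_witness([0b11], 2)` examines the family consisting of the single set {0, 1}. Trying candidates in order of increasing sum mirrors the "favor smaller weights" heuristic described above.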

## 7 Characteristic families

In this section we introduce the notions of FC-covering and nonFC-covering, which make it possible to determine the FC-status of all families in $\{\{\overline{n}\}\}$ from the status of just a small number of FC and nonFC-families that are characteristic in some sense (that we shall precisely define). Our goal is to give a full characterization of all families in $\{\{\overline{n}\}\}$ (i.e., for each family to determine whether it is an FC-family or a nonFC-family), and in theory that can be done by explicitly checking the status of each of them. In practice that is almost impossible, since even for $n = 6$ there are $2^{64}$ families. However, (i) many of them are isomorphic, (ii) many have the same closure, and (iii) many include smaller FC-families or are included in larger nonFC-families. We shall show that in all those cases the FC-status can be deduced from the already known status of other families, so we base our definitions of characteristic families and covering on those facts. We shall devise methods that explicitly check the FC-status for only a minimal set of characteristic families, and that afterwards let us easily obtain the status of every family in $\{\{\overline{n}\}\}$ by checking whether it is covered by the characteristic ones.

### 7.1 Isomorphic families. Representing collections. Bases.

Bijective changes of the domain of a family do not affect whether the family is FC.

Two families and are isomorphic (denoted by ) if there is a bijective function between and such that .

If we consider families and , they are clearly isomorphic, so we consider only families over . The family also shares the same structure with the previous two (although, that might not be so obvious, consider the bijection )), so there are also many isomorphic families over .

Obviously, isomorphism is an equivalence relation, and isomorphic families share all structural properties relevant to us (one is union-closed iff the other is, one satisfies the Frankl's condition iff the other does, the same holds for the FC-family condition, etc.).

###### Proposition 10

If two families are isomorphic, then one is an FC-family iff the other is an FC-family.

#### Checking if two families in $\{\{\overline{n}\}\}$ are isomorphic

One (naive) method to check whether two given families are isomorphic is to check whether the second family is among the families obtained by applying all the permutations of the domain to the first family.

Another approach can be based on defining a canonical representative for each family. It can be the minimal family among the families obtained by applying all permutations of the domain to that family, where families are compared based on some fixed ordering (e.g., a lexicographic ordering, where the sets are also ordered lexicographically). Then two families are isomorphic iff they have the same canonical representative.

There are more efficient orderings and methods of finding the canonical representative, which avoid considering all permutations of the domain generationUCMoore, but since we only consider a case where the number of permutations is rather small, we use only the naive methods.
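Both naive methods can be sketched in a few lines of Python. This is an illustration only, not the Isabelle/HOL implementation: it assumes the domain is `range(n)`, represents a permutation as a tuple `p` in which element `i` is mapped to `p[i]`, and uses the hypothetical names `apply_perm`, `canonical`, `isomorphic_naive` and `isomorphic`. The ordering used for the canonical representative (sorted tuples of sorted tuples) is one possible choice of a fixed lexicographic ordering.

```python
from itertools import permutations


def apply_perm(p, family):
    """Image of the family when every element i is renamed to p[i]."""
    return frozenset(frozenset(p[x] for x in s) for s in family)


def canonical(family, n):
    """Smallest image of the family over all permutations of range(n),
    with each image written as a sorted tuple of sorted tuples."""
    def as_tuple(fam):
        return tuple(sorted(tuple(sorted(s)) for s in fam))
    return min(as_tuple(apply_perm(p, family)) for p in permutations(range(n)))


def isomorphic_naive(f1, f2, n):
    """Is f2 among the images of f1 under all permutations of range(n)?"""
    target = frozenset(frozenset(s) for s in f2)
    return any(apply_perm(p, f1) == target for p in permutations(range(n)))


def isomorphic(f1, f2, n):
    """Two families are isomorphic iff their canonical representatives coincide."""
    return canonical(f1, n) == canonical(f2, n)
```

For example, over a 3-element domain the families `[{0}, {0, 1}]` and `[{2}, {1, 2}]` have the same canonical representative, while `[{0}, {1}]` and `[{0}, {0, 1}]` do not.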

#### Iso-representatives and iso-bases

If a collection of families contains many families whose structural properties should be checked, it suffices to focus only on a single representative from each isomorphism equivalence class.

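A collection of families iso-represents another collection if every family of the latter is isomorphic to some family of the former. If, in addition, no two different families of the representing collection are isomorphic, then it is an iso-base of the represented collection.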

An iso-base of a given collection can be found algorithmically. The computation starts from the given collection, chooses an arbitrary member as a representative, moves it to the resulting collection, removes it and all its permuted variants (under a given set of permutations) from the original collection, and repeats this sieving process until the collection becomes empty. An Isabelle/HOL implementation of this procedure is available in our formal proof documents.

###### Proposition 11

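If the sieving procedure is given a list of permutations of the domain and a collection of families from $\{\{\overline{n}\}\}$, then its result iso-represents the given collection. If the list contains all permutations of the domain, then the result is an iso-base of the given collection.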

If an ordering of families is defined, another way to obtain an iso-base is to find the canonical representative of each family, and form the set of all different canonical representatives.
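The sieving process described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Isabelle/HOL procedure from the formal proof documents: the domain is assumed to be `range(n)`, permutations are tuples mapping `i` to `p[i]`, and the name `iso_sieve` is hypothetical.

```python
from itertools import permutations


def iso_sieve(collection, perms):
    """Repeatedly pick a representative and drop it together with all of
    its variants under the given permutations."""
    remaining = [frozenset(frozenset(s) for s in fam) for fam in collection]
    result = []
    while remaining:
        rep = remaining[0]                      # arbitrary choice: the first member
        result.append(rep)
        variants = {rep} | {
            frozenset(frozenset(p[x] for x in s) for s in rep) for p in perms
        }
        remaining = [fam for fam in remaining if fam not in variants]
    return result


# Usage: five families over a 3-element domain, two isomorphism classes.
families = [[{0, 1}], [{0, 2}], [{1, 2}], [{0}], [{1}]]
base = iso_sieve(families, list(permutations(range(3))))
```

With all permutations of the domain supplied, `base` keeps one family per isomorphism class. With only the identity permutation supplied, nothing but exact duplicates is removed, so the result still iso-represents the input but need not be an iso-base.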

### 7.2 Irreducible families

Another technique that reduces the number of families for which the FC-status explicitly needs to be checked is based on the fact that the FC-status of a family depends only on its closure (and not the family itself). From Proposition 4 the following immediately follows.

###### Proposition 12

If two families have the same closure, then one is an FC-family iff the other is an FC-family.

A set is dependent on a family (denoted by ) if it is a union of some of its members (i.e., if ).

###### Proposition 13

1. If a set is dependent on a family then .

2. If a set is dependent on a family , then .

Therefore, sets that can be expressed as unions of other sets of a family do not affect its closure. An irreducible family is obtained when all dependent sets are removed (so this family is minimal in some sense, and it is a basis of its closure kim).

A family is irreducible if none of its sets can be expressed as a union of some of its other members (i.e., if ).

For each family, an irreducible family can be obtained by removing all expressible sets, one by one, until there are no more such sets. This procedure is guaranteed to terminate for finite families.

###### Proposition 14

Each family has an irreducible subfamily with the same closure.

The following interesting (and non-trivial) lemma, proved in kim and formally proved in the Appendix, shows that for all families having the same closure there is a unique irreducible family, so the previous procedure always yields the same final answer, in whatever order the sets are removed.

###### Lemma 1

If two irreducible families have the same closure, then they are equal.
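The reduction procedure of this section can be sketched in Python as follows. The sketch rests on assumptions made for illustration: the closure is taken to consist of the unions of non-empty subcollections, so the empty set is never treated as dependent, and the names `closure`, `is_dependent` and `irreducible_part` are hypothetical. The dependency test uses the observation that a set is a union of some other members iff it equals the union of all other members that are its subsets.

```python
def closure(family):
    """All unions of non-empty subcollections of the family."""
    closed = {frozenset(s) for s in family}
    while True:
        new = {a | b for a in closed for b in closed} - closed
        if not new:
            return closed
        closed |= new


def is_dependent(s, family):
    """Is s the union of some members of the family other than s itself?"""
    parts = [t for t in family if t != s and t <= s]
    return bool(parts) and frozenset().union(*parts) == s


def irreducible_part(family):
    """Remove dependent sets one by one until none is left."""
    fam = {frozenset(s) for s in family}
    while True:
        dependent = next((s for s in fam if is_dependent(s, fam)), None)
        if dependent is None:
            return fam
        fam.remove(dependent)


# Usage: {0,1} and {0,1,2} are unions of the singletons, so they are removed.
original = [{0}, {1}, {0, 1}, {0, 1, 2}, {2}]
reduced = irreducible_part(original)
```

In line with Lemma 1, the order in which `next` happens to pick dependent sets does not influence the final result, and the reduced family has the same closure as the original one.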

### 7.3 Covering

A total FC characterization of all families in $\{\{\overline{n}\}\}$ is done by defining two collections, one consisting only of FC-families and the other consisting only of nonFC-families, such that the status of each given family can easily be determined by an element of one of the two collections (we say that the given family is covered by the two collections). The following definition formalizes the notion of covering and relies on Propositions 2, 3, and 4, and on some trivial properties of isomorphic families.

1. A family is FC-covered by a family (denoted by ) if there exists such that and . A family is FC-covered by a collection of families (denoted by ) if there is an such that is FC-covered by (i.e., ). A collection of families is FC-covered by a collection of families (denoted by ) if all families are covered by (i.e., ).

2. A family is nonFC-covered by (denoted by ) if there is an such that and . A family is nonFC-covered by a collection of families (denoted by ) if there is an such that is nonFC-covered by (i.e., ). A collection of families is nonFC-covered by a collection of families (denoted by ) if all families are covered by (i.e., ).

3. A family is covered by and (denoted by ) if it is FC-covered by or it is nonFC-covered by . A collection of families is covered by and (denoted by ) if all its families are FC-covered by or nonFC-covered by .

The next lemma (proved in the Appendix) shows that our notion of covering guarantees that FC-covered families are FC and that nonFC-covered families are not FC.

###### Lemma 2

1. Any family that is FC-covered by an FC-family is an FC-family.

2. Any family that is nonFC-covered by a nonFC-family is not an FC-family.

An important aspect of our definition of the notion of covering is that its condition can be checked easily. First, there is only a relatively small number of possible families from $\{\{\overline{n}\}\}$ that are isomorphic to a given family in $\{\{\overline{n}\}\}$ (for example, for $n = 6$, these are generated by the $6! = 720$ permutations of the domain). As the closure of a family can be computed easily and effectively, it remains to check only whether it contains one of these isomorphs, and this can be performed easily.
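The check described in the previous paragraph can be sketched in Python for the FC case. This is an illustration under assumptions, not the verified implementation: following the paragraph above, a family is taken to be FC-covered by a given FC-family when its closure contains some isomorph of that FC-family; the domain is assumed to be `range(n)`, the closure is taken over non-empty unions, and the names `closure` and `fc_covered` are hypothetical.

```python
from itertools import permutations


def closure(family):
    """All unions of non-empty subcollections of the family."""
    closed = {frozenset(s) for s in family}
    while True:
        new = {a | b for a in closed for b in closed} - closed
        if not new:
            return closed
        closed |= new


def fc_covered(family, fc, n):
    """Does the closure of the family contain an isomorph of fc?"""
    closed = closure(family)
    for p in permutations(range(n)):
        isomorph = {frozenset(p[x] for x in s) for s in fc}
        if isomorph <= closed:      # every set of the isomorph is in the closure
            return True
    return False
```

For example, over a 3-element domain, `fc_covered([{0}, {1}], [{0, 1}], 3)` holds because the closure of the first family contains the set {0, 1}, whereas `fc_covered([{0}], [{0, 1}], 3)` does not hold because the closure contains no 2-element set.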

The following proposition gives some other easy consequences of the covering definition.

###### Proposition 15

1. Let . If then . If then .

2. If and then . If and then .

3. If , then . If , then . If , then .

Since covering is preserved by isomorphisms, to show that a collection is covered it suffices to show that its iso-base is covered, as shown by the following lemma, proved in the Appendix.

###### Lemma 3

Assume that iso-represents . If , then . If , then . If , then .

Similarly, if two families have the same closure, even if they are not equivalent, one is covered iff the other one is.

###### Proposition 16

If two families have the same closure, then one is covered by the two collections iff the other one is.

Therefore, the following lemma, proved in the Appendix, reduces the problem of checking all families in $\{\{\overline{n}\}\}$ to checking just the irreducible ones.

###### Lemma 4

If all irreducible families in are covered by and , then all families in are covered by and .

### 7.4 Minimal FC-families and maximal nonFC-families

We want to have as few characteristic families as possible, so we want all our characteristic families to be extreme in some sense.

1. An FC-family is minimal if it is irreducible and removing any of its sets yields a nonFC-family.

2. A nonFC-family is maximal if it is union-closed and adding any new set yields an FC-family.

It can be easily shown that minimal and maximal families are exactly those that cannot be covered by smaller or larger families.

###### Proposition 17

1. An FC-family is minimal iff it is not FC-covered by any other family.

2. A nonFC-family is maximal iff it is not nonFC-covered by any other family.

## 8 Enumerating families

Next we describe efficient generic procedures for enumerating all families with certain properties. Note that all concepts in this section are generic and can be used in a wider context than checking the FC-status.

### 8.1 L-partitioning

In this section we develop efficient methods to enumerate all families in $\{\{\overline{n}\}\}$ that have certain properties. As is usually the case, an inductive construction gives good results. Larger families can be obtained from smaller ones by adding new sets. A good attribute of a family that can be used to control the inductive construction is the number of its members of each cardinality.

Let be the list . A family is -partitioned if it consists of empty sets, sets with 1 element, …, and sets with elements, (i.e.,