A Group-Theoretic Approach to Abstraction: Hierarchical, Interpretable, and Task-Free Clustering

07/30/2018 ∙ by Haizi Yu, et al. ∙ University of Illinois at Urbana-Champaign

Abstraction plays a key role in concept learning and knowledge discovery. While pervasive in both human and artificial intelligence, it remains mysterious how concepts are abstracted in the first place. We study the nature of abstraction through a group-theoretic approach, formalizing it as a hierarchical, interpretable, and task-free clustering problem. This clustering framework is data-free, feature-free, similarity-free, and globally hierarchical---the four key features that distinguish it from common clustering models. Beyond a theoretical foundation for abstraction, we also present a top-down and a bottom-up approach to establish an algorithmic foundation for practical abstraction-generating methods. Lastly, using both a theoretical explanation and a real-world application, we show that the coupling of our abstraction framework with statistics realizes Shannon's information lattice and even further, brings learning into the picture. This gives a first step towards a principled and cognitive way of automatic concept learning and knowledge discovery.


1 Introduction

Abstraction describes the process of generalizing high-level concepts from specific data samples by “forgetting the details” (Weinberg, 1968; Giunchiglia and Walsh, 1992; Saitta and Zucker, 1998). This conceptual process is pervasive in human reasoning, and it is evident that more advanced concepts can be abstracted once a “conceptual base” is established (Mandler, 2000). However, it remains mysterious how concepts are abstracted in the first place, which is generally attributed to innate biology (Mandler, 2000; Gómez and Lakusta, 2004; Biederman, 1987).

Considering artificial intelligence rather than biological minds, there are now algorithms to automate abstraction in various concept learning tasks (Saitta and Zucker, 2013; LeCun et al., 2015; Bredeche et al., 2006; Yu et al., 2016). However, almost all require handcrafted priors—the counterpart to innate biology (Marcus, 2018; Dietterich, 2018). While a prior can take many forms such as rules in automatic reasoning, distributions in Bayesian inference, features in classifiers, or architectures in neural networks, it is typically task-specific and/or domain-specific (Raina et al., 2006; Yu et al., 2007; Krupka and Tishby, 2007). Therefore, extensive hand-design from domain knowledge is sometimes considered “cheating” if one hard-codes all known high-level abstractions as priors (Ram and Jones, 1994). This motivates us to consider only universal priors as innate knowledge for abstraction.

This paper establishes both a theoretical and an algorithmic foundation for abstraction. It is worth noting that our abstraction framework is universal in the following two senses. First, we consider the general question of conceptualizing a domain—a task-free preparation phase before specific problem solving (Zucker, 2003). Second, we consider symmetries in nature (or groups in mathematics)—a universal prior that encodes no domain knowledge. The ultimate goal is to learn domain concepts/knowledge when our group-theoretic abstraction framework is connected to statistical learning. This is contrary to much prior work at the intersection of group theory and learning (Kondor, 2008) that often encodes domain-relevant symmetries in features or kernels rather than learning them as new findings.

1.1 Theoretical Foundation for Abstraction

Existing formalizations of abstraction form at least two camps. One uses mathematical logic where abstraction is explicitly constructed from abstraction operators and formal languages (Saitta and Zucker, 2013; Zucker, 2003; Bundy et al., 1990); another uses deep learning where abstraction is hinted at by the layered architectures of neural networks (LeCun et al., 2015; Bengio, 2009). Their key characteristics—commonly known as rule-based (deductive) and data-driven (inductive)—are quite complementary. The former enjoys model interpretability, but requires explicit handcrafting of complicated logic with massive domain expertise; the latter shifts the burden of model crafting to data, but makes the model less transparent.

This paper takes a new viewpoint, aiming for a middle ground between the two camps. We formalize abstraction as a symmetry-generated clustering, or more precisely a group-generated partition, which admits statistical learning. By clustering, we forget within-cluster variations and discern only between-cluster distinctions (Bélai and Jaoua, 1998; Sheikhalishahi et al., 2016), revealing the nature of abstraction (Livingston, 1998). While clustering is common in machine learning (Duda et al., 2012, chap. 10), our clustering model is in stark contrast with the common settings, as follows.

  • Data-free. Our clustering model considers partitioning an input space rather than data samples. It is treated more as conceptual clustering than data clustering like k-means (Michalski and Stepp, 1983; Fisher, 1987): clusters are formed in a mechanism-driven, not data-driven, fashion; and the mechanisms considered here are symmetries. The process is causal, and the results are interpretable. More importantly, a single clustering mechanism transfers to multiple domains, and a single clustering result transfers to various datasets.

  • Feature-free. Our clustering model involves no feature engineering, and hence no domain expertise. This particularly means three things. First, no feature design for inputs: we directly deal with mathematical spaces, e.g. vector spaces or manifolds. Second, no feature/assignment function for cluster designation: this differs from algorithms that hand-design abstraction operators (Zucker, 2003), arithmetic descriptors (Yu et al., 2016), or decision-tree-like feature thresholding (Sheikhalishahi et al., 2016). Third, no meta-feature tuning such as pre-specifying the number of clusters.

  • Similarity-free. Our clustering model does not depend on a predefined notion of similarity. This differs from most clustering algorithms, where much effort has been expended in defining “closeness” (Raman and Varshney, 2018; Rand, 1971). Instead, pairwise similarity is replaced by an equivalence relation induced from symmetry. Note that the definitions of certain symmetries may require additional structure on the input space, e.g. a topology or a metric, but this structure is not used as a direct measurement of inverse similarity (distance). Therefore, points that are far apart (in terms of metric distance) in a metric space can be grouped together (in terms of equivalence) under certain symmetries, resulting in a “discontinuous” cluster comprising disconnected regions in the input space (see the short sketch after this list). This is not likely to happen for algorithms such as k-means.

It is noteworthy that being feature-free and similarity-free makes a clustering model universal (Raman and Varshney, 2018), becoming more of a science than an art (Von Luxburg et al., 2012). Besides the above three distinguishing features, our clustering model exhibits one more distinction regarding hierarchical clustering for multi-level abstractions:

  • Global hierarchy. Like many hierarchical clusterings (Jain and Dubes, 1988; Rokach and Maimon, 2005), our clustering model outputs a family of multi-level partitions and a hierarchy showing their interrelations. However, here we have a global hierarchy formalized as a partition (semi)lattice, which is generated from another hierarchy of symmetries represented by a subgroup lattice. This is in contrast with greedy hierarchical clusterings such as agglomerative/divisive clustering (Cormack, 1971; Kaufman and Rousseeuw, 2009) or topological clustering via persistent homology (Oudot, 2015). These greedy algorithms lose many possibilities for clusterings since the hierarchy is constructed by local merges/splits made in a one-directional procedure, e.g. growing a dendrogram or a filtration. In particular, greedy hierarchical clustering is oft-criticized since it is hard to recover from bad clusterings in early stages of construction (Oudot, 2015). Lastly, our global hierarchy is represented by a directed acyclic graph rather than tree-like charts (essentially a linear structure) such as dendrograms or barcodes.
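As a minimal illustration of the similarity-free point above (an illustrative sketch only; the toy space and the shift transformation are chosen for demonstration and are not part of the formal development), the following Python snippet clusters the space {0, …, 23} by the orbits of a single shift-by-12 map: every resulting cell consists of two points that are far apart in the usual metric, i.e. a “discontinuous” cluster that distance-based algorithms such as k-means would not form.

```python
# Toy sketch: orbits of a single transformation on a finite space.
# Space X = {0, ..., 23} (e.g. two octaves of pitches); g(x) = (x + 12) mod 24.

def orbits(space, g):
    """Partition `space` into orbits under the cyclic group generated by g."""
    remaining, cells = set(space), []
    while remaining:
        x = next(iter(remaining))
        orbit, y = set(), x
        while y not in orbit:          # follow x, g(x), g(g(x)), ... until it cycles
            orbit.add(y)
            y = g(y)
        cells.append(sorted(orbit))
        remaining -= orbit
    return cells

X = range(24)
g = lambda x: (x + 12) % 24
print(orbits(X, g))   # twelve cells of the form {x, x + 12}; each cell is two far-apart points
```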

1.2 Algorithmic Foundation for Abstraction

Besides a group-theoretic formalism of hierarchical abstractions as clusterings induced from hierarchical symmetries, we introduce two general principles, a top-down approach and a bottom-up approach, which systematically enumerate symmetries to construct hierarchical abstraction families. Each principle leverages a different duality developed in the formalism, and leads to practical algorithms that realize the abstraction generating process.

  • A top-down approach. Starting from all possible symmetries, we gradually restrict our attention to certain types of symmetries which can lead to practical abstraction-construction algorithms. In general, the choices for restrictions can be made arbitrarily. However, it turns out that we can find a complete identification of all symmetries induced from affine transformations, where we explicitly give a full parametrization of affine symmetries. This complete identification not only decomposes a large symmetry-enumeration problem into smaller enumeration subproblems, but also suggests ways of adding restrictions to obtain desired symmetries. This approach from general symmetries to more restrictive ones corresponds to top-down paths in the symmetry hierarchy, which explains where the name comes from.

  • A bottom-up approach. Starting from a set of atomic symmetries, we generate all symmetries that are seeded from the given set. Based on a strong duality result developed in the formalism, we introduce an induction algorithm which directly computes a hierarchical family of abstractions without explicitly enumerating the corresponding symmetries. This induction algorithm is much more efficient than generating all abstractions from scratch, i.e. from symmetries. So, it is a good choice to quickly build an abstraction family in the first place, after which one can fine tune the generating set to balance the trade-off between efficiency and expressiveness. This approach from atomic symmetries to more complicated ones corresponds to bottom-up paths in the symmetry hierarchy, which explains where the name comes from.

The rest of the paper is organized as follows. In Section 2, we describe abstraction informally to give intuition on its nature and key properties via everyday examples; all points made in this section are formalized and algorithmically realized in later sections. Section 3 sets up the theoretical foundation for abstraction, where we formalize abstraction, symmetry, and their hierarchies; and cast symmetry-generated abstractions in a primal-dual viewpoint. Sections 4 and 5 set up the algorithmic foundation for abstraction, where we introduce the top-down approach and the bottom-up approach, respectively. In Section 6, we describe tricks and cautions in real implementations where abstractions have to be restricted to finite subspaces of an input space. In Section 7, we discuss connections to Shannon’s information lattice—a special case under our abstraction formalism—and a real application that realizes learning in an information lattice for automatic concept learning.

2 Abstraction: Informal Description

We informally discuss abstraction by drawing examples from different domains and summarizing their commonalities. Although expressed in everyday terms from specific domains, the conclusions from this section cover all key properties of abstraction that the remainder of the paper aims to capture formally. In particular, the rest of the paper formalizes the ideas from this section in a precise and general manner that, importantly, leads to principled algorithmic approaches for automatic concept learning and knowledge discovery.

2.1 Everyday Abstraction

Whether we are aware of it or not, abstraction is everywhere in our daily behaviors. It is in the nature of abstraction that it treats the set of instances that it subsumes as if they were qualitatively identical, although in fact they are not (Livingston, 1998). Examples of people making abstractions can be as simple as observing ourselves through social categories such as race or gender (Macrae and Bodenhausen, 2000), or as complicated as a systematic taxonomy of a subject domain. Here, we present two examples of systematic abstractions (Figure 1): one is from a taxonomy of animals; the other is from a classification of music chords.

Figure 1: Hierarchical abstractions of the Animal Kingdom (left) and of music chords (right). Both hierarchies are essentially linear, e.g. kingdom → phylum → class → ⋯ → species.

2.2 Abstraction in Common

There are many commonalities in examples from Section 2.1 as well as in many other real-life examples of abstraction. We summarize the key properties shared in these abstraction examples, which will be formalized in the following sections.

Nature of abstraction: clustering or classification?

One shared property among many examples of abstraction is the idea of clustering and then forgetting within-cluster variations. For instance, we cluster people into {men, women}, forgetting the difference between John and David, Mary and Rachel; we cluster animals with a backbone into {fish, amphibians, reptiles, birds, mammals}, forgetting the difference between penguins and eagles, bats and dogs; we cluster music triads into {major, minor, augmented, diminished, …}, forgetting the difference between C-E-G and F-A-C, C-E♭-G and A-C-E. This idea of clustering is pervasive in various definitions of abstraction, but is more often termed classification (or categorization, taxonomy). Although clustering and classification (likewise clusters and classes) are more or less synonyms in everyday life, there is a clear difference between the two in machine learning. The former generally falls under the realm of unsupervised learning, whereas the latter falls under supervised learning. The difference is merely whether or not there is a label for each cluster. Note that labels are important in supervised learning, since a perfect binary classifier with 100% accuracy is clearly different from a bad one with 0% accuracy. However, in light of clustering, the two classifiers are identical: the “bad” one, for instance, simply calls all men women and all women men, but still accurately captures the concept of gender. Consequently, in this paper we treat the nature of abstraction as clustering rather than classification, further formalized as a partition or an equivalence relation. So, men and women are two equivalence classes of people: all men are equivalent, and so are all women. An extended discussion on clustering and classification, relating to information elements and random variables, can be found in Section 7.1.

Hierarchy.

Another shared property among many examples of abstraction is the presence of a hierarchy, where “later” abstractions can be made recursively from “earlier” ones. For instance, we cluster animals into {fish, birds, mammals, annelids, mollusks, …}, and further cluster these abstracted terms into {vertebrates, invertebrates}; we cluster music chords into {major, minor, dominant, German, …}, and further cluster these abstracted terms into {triads, seventh chords, sixth chords, …}, and even further into {trichords, tetrachords, …}. Hierarchy, being either explicit or implicit, brings the notion of the level of an abstraction. For instance, biological taxonomy gives an explicit description of abstraction levels: kingdom → phylum → class → order → family → genus → species; whereas the abstraction levels of music chords are relatively implicit but still present. In general, an abstraction hierarchy can be more complicated than simply linear due to various clustering possibilities. In this paper, a general hierarchy is formalized by a mathematical lattice.

Mechanism.

A third shared property among many examples of abstraction is the existence of a mechanism—a driving force that causes the resulting abstraction. For instance, the presence or absence of a backbone is the underlying mechanism that results in the abstraction of animals into vertebrates and invertebrates; the intervallic quality is the underlying mechanism that results in the abstraction of music chords. Having a mechanism is important for at least three reasons. First, it makes the abstraction process logical, so that every abstraction is made for a reason. This is a distinguishing feature in human intelligence, which is further key to the development of concepts and knowledge. Second, different mechanisms yield different abstractions, which further yield different attributes of an object. For instance, a bat can be abstracted as a mammal since, among many other reasons, it nurses its pups with milk; a bat can also be abstracted as a flying animal based on its capability of flying. In comparison, under the same two mechanisms, a penguin is abstracted as a bird but flightless. Third, perhaps most importantly, having a mechanism allows generalization, i.e. we can transfer a mechanism from one domain to another. For instance, generalizing the same mechanism under which we abstract people into men and women to other species, we get roosters and hens, bulls and cows, etc. As a result, we emphasize the generating mechanisms for abstractions. In this paper, we focus on symmetries—a type of domain-independent mechanism—and symmetry-generated abstractions.

Towards laws of nature.

Lastly, abstraction is a very important stage towards laws—or less seriously, rules or patterns—of nature (Schmidt and Lipson, 2009). An abstraction itself is not a rule, but an abstraction paired with a property describing that abstraction can be treated as a rule. For instance, the abstraction {fish, amphibians, reptiles, birds, mammals} of animals is not a rule, but a statement like “Most birds fly, whereas only a few fish, amphibians, reptiles, or mammals fly” is a rule that indicates what is special about this abstraction. While this paper focuses only on abstractions rather than rules, we discuss probabilistic rules made out of abstractions and their statistical properties in Section 7. There, we introduce the information lattice and a real implementation of probabilistic rule learning from our earlier work (Yu et al., 2016; Yu and Varshney, 2017; Yu et al., 2017).

3 Abstraction: Mathematical Formalism

We formalize an abstraction process on an underlying space as a clustering problem. In this process, elements of the space are grouped into clusters, abstracting away within-cluster variations. The outcome is a coarse-grained abstraction space whose elements are the clusters. Clustering is performed based on certain symmetries such that the resulting clusters are invariant with respect to the symmetries.

3.1 Abstraction as Partition (Clustering)

We formalize an abstraction of a set as a partition of the set, which is a mathematical representation of the outcome of a clustering process. Throughout this paper, we reserve a fixed symbol to exclusively denote the set of which we make abstractions. The set can be as intangible as a mathematical space, e.g. a vector space or a general manifold; or as concrete as a collection of items, e.g. {rat, ox, tiger, rabbit, dragon, snake, horse, sheep, monkey, rooster, dog, pig}.

Preliminaries (Appendix A.1):

partition of a set, partition cell; equivalence relation on a set, quotient.

An abstraction is a partition, and vice versa. The two terms refer to the same thing, with the only nuance being that one is used less formally, whereas the other is used in the mathematical language. When used as a single noun, these two terms are interchangeable in this paper.

A partition is not an equivalence relation. The two terms do not refer to the same thing (one is a set of cells, the other is a binary relation), but they convey equivalent ideas since they induce each other bijectively (Appendix A.1). In this paper, we use an equivalence relation to explain a partition: elements of a set are put in the same cell because they are equivalent. For this reason, abstracting the set amounts to treating equivalent elements as the same, i.e. collapsing equivalent elements into a single entity (namely, an equivalence class or a cell), where collapsing is formalized by taking the quotient.
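A minimal sketch of this correspondence on a small finite set (illustrative only; the example set and helper names are chosen for demonstration): the first function collapses a set into cells via an equivalence predicate, and the second recovers the induced equivalence relation from the resulting partition.

```python
# Toy sketch: the bijection between an equivalence relation and a partition.

def quotient(space, equiv):
    """Partition a finite iterable `space` by the equivalence predicate `equiv`."""
    cells = []
    for x in space:
        for cell in cells:
            if equiv(x, cell[0]):      # transitivity: compare with one representative
                cell.append(x)
                break
        else:
            cells.append([x])
    return cells

def related(partition):
    """Recover the equivalence relation induced by a partition."""
    return lambda x, y: any(x in cell and y in cell for cell in partition)

animals = ["eagle", "penguin", "bat", "dog", "salmon"]
taxon = {"eagle": "bird", "penguin": "bird", "bat": "mammal",
         "dog": "mammal", "salmon": "fish"}
P = quotient(animals, lambda a, b: taxon[a] == taxon[b])
print(P)                               # [['eagle', 'penguin'], ['bat', 'dog'], ['salmon']]
print(related(P)("bat", "dog"))        # True: within-cell variation is forgotten
```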

3.2 Abstraction Universe as Partition Lattice (Hierarchical Clustering)

A set can have multiple partitions, provided that it has at least two elements. The number of all possible partitions of an n-element set is called the Bell number Bₙ. Bell numbers grow extremely fast with the size of the set: starting from B₀ = 1, the first few Bell numbers are 1, 1, 2, 5, 15, 52, 203, 877, 4140, 21147, 115975, …
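A short snippet for concreteness (illustrative only), computing Bell numbers with the standard Bell-triangle recurrence:

```python
# Bell numbers B_n (the number of partitions of an n-element set),
# computed with the Bell triangle.

def bell_numbers(n_max):
    row, bells = [1], [1]              # B_0 = 1
    for _ in range(n_max):
        next_row = [row[-1]]
        for entry in row:
            next_row.append(next_row[-1] + entry)
        row = next_row
        bells.append(row[0])
    return bells

print(bell_numbers(10))
# [1, 1, 2, 5, 15, 52, 203, 877, 4140, 21147, 115975]
```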

The family of all partitions of a set therefore has cardinality given by the corresponding Bell number. We can compare partitions of a set in two ways. One simple way is to compare by size, i.e. by the number of cells: one partition is no larger than another if it has no more cells. Another way of comparison considers the structure of partitions via a partial order on the family of all partitions. The partial order further yields a partition lattice, a hierarchical representation of a family of partitions.

Preliminaries (Appendix A.2):

partial order, poset; lattice, join (∨), meet (∧), sublattice, join-semilattice, meet-semilattice, bounded lattice.

Let and be two abstractions of a set . We say that is at a higher level than , denoted , if as partitions, is coarser than . For ease of description, we expand the vocabulary for this definition, so the following are all equivalent:

  • , or equivalently (Figure 2).

  • As abstractions, is at a higher level than (or is an abstraction of ).

  • As partitions, is coarser than (or is a coarsening of ).

  • As abstractions, is at a lower level than (or is a realization of ).

  • As partitions, is finer than (or is a refinement of ).

  • Any in the same cell in are also in the same cell in .

  • Any in different cells in are also in different cells in .

Figure 2: The partial order compares the levels of abstractions.

It is known that the binary relation “coarser than” on the family of all partitions of a set is a partial order, and so is the binary relation “at a higher level than” on abstractions. Given two partitions of a set, one may be coarser than the other, finer than the other, or the two may be incomparable. Further, the family of all partitions is a bounded lattice, in which the greatest element is the finest partition (every cell is a singleton) and the least element is the coarsest partition (a single cell containing the whole set). For any pair of partitions, their join is their coarsest common refinement; their meet is their finest common coarsening (Figure 3).

Figure 3: Two abstractions and their join and meet .
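A minimal sketch of these notions on a small finite set (illustrative helper code; partitions are represented as lists of cells), implementing the coarsening test, the join as the coarsest common refinement, and the meet as the finest common coarsening:

```python
# Partial order and lattice operations on partitions of a small finite set.

def is_coarsening(P, Q):
    """True iff partition P is coarser than (an abstraction of) partition Q."""
    return all(any(set(q) <= set(p) for p in P) for q in Q)

def join(P, Q):
    """Coarsest common refinement: nonempty pairwise cell intersections."""
    cells = [set(p) & set(q) for p in P for q in Q]
    return [sorted(c) for c in cells if c]

def meet(P, Q):
    """Finest common coarsening: merge cells of P and Q that overlap."""
    cells = [set(c) for c in list(P) + list(Q)]
    merged = True
    while merged:
        merged = False
        for i in range(len(cells)):
            for j in range(i + 1, len(cells)):
                if cells[i] & cells[j]:
                    cells[i] |= cells.pop(j)
                    merged = True
                    break
            if merged:
                break
    return [sorted(c) for c in cells]

P = [[1, 2], [3, 4], [5, 6]]
Q = [[1, 2, 3], [4, 5, 6]]
print(join(P, Q))                      # [[1, 2], [3], [4], [5, 6]]
print(meet(P, Q))                      # [[1, 2, 3, 4, 5, 6]]
print(is_coarsening(P, join(P, Q)))    # True: the join refines both P and Q
```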

An abstraction universe for a set is a sublattice of , or a partition (sub)lattice in short. In particular, we call the partition lattice itself the complete abstraction universe for . An abstraction join-semiuniverse (resp. meet-semiuniverse) for a set is a join-semilattice (resp. meet-semilattice) of . An abstraction family for a set , an even weaker notion, is simply a subset of .

If the complete abstraction universe is finite, we can visualize its hierarchy as a directed acyclic graph where vertices denote partitions and edges denote the partial order. The graph is constructed as follows: plot all distinct partitions of starting at the bottom with the finest partition , ending at the top with the coarsest partition and, roughly speaking, with coarser partitions positioned higher than finer ones. Draw edges downwards between partitions using the rule that there will be an edge downward from to if and there does not exist a third partition such that . Thus, if , there is a path (possibly many paths) downward from to passing through a chain of intermediate partitions (and a path upward from to if ). For any pair of partitions , the join can be read from the graph as follows: trace paths downwards from and respectively until a common partition is reached (note that the finest partition at the bottom is always the end of all downward paths in the graph, so it is guaranteed that always exists). To ensure that , make sure there is no (indicated by an upward path from to ) with upward paths towards both and (otherwise replace with and repeat the process). Symmetrically, one can read the meet from the graph.

There are limitations to this process, especially if the set is infinite. Even for a finite set of relatively small size, the complete abstraction universe can be quite complicated to visualize (recall that we have to draw vertices where grows extremely fast with , let alone the edges). However, not all arbitrary partitions are of interest to us. In the following subsections, we study symmetry-generated abstractions and abstraction universes. So, later we can focus on certain partitions by considering certain symmetries.

3.3 Symmetry-Generated Abstraction

Recall that we explain an abstraction of a set by its inducing equivalence relation, where equivalent elements are treated as the same. Instead of considering arbitrary equivalence relations or arbitrary partitions, we construct every abstraction from an explicit mechanism—a symmetry—so the resulting equivalence classes or partition cells are invariant under this symmetry. To capture various symmetries, we consider groups and group actions.

Preliminaries (Appendix A.3):

group, subgroup, trivial subgroup, subgroup generated by a set, cyclic subgroup; group action, group action on a set, orbit of an element, set of all orbits.

Consider a special type of group, namely the symmetric group defined over a set , whose group elements are all the bijections from to and whose group operation is (function) composition. The identity element of is the identity function, denoted . A bijection from to is also called a transformation of . Therefore, the symmetric group comprises all transformations of , and is also called the transformation group of , denoted . We use these two terms and notations interchangeably in this paper, with a preference for in general, while reserving mostly for a finite .

Given a set and a subgroup , we define an -action on by for any ; the orbit of under is the set . Orbits in under define an equivalence relation: if and only if are in the same orbit, and each orbit is an equivalence class. Thus, the quotient is a partition of . It is known that every cell (or orbit) in the abstraction (or quotient) is a minimal non-empty invariant subset of under transformations in . Therefore, we say this abstraction respects the so-called -symmetry or -invariance.

We succinctly record the above process of constructing an abstraction (of ) from a given subgroup in the following abstraction generating chain:

which can be further encapsulated by the abstraction generating function defined as follows.

The abstraction generating function is the mapping from the collection of all subgroups of the transformation group to the family of all partitions of the underlying set that sends each subgroup to the partition it generates, namely the set of orbits of the subgroup's action (equivalently, the quotient of the set by that action).
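On a finite set, the abstraction generating function can be computed directly by orbit closure. The sketch below (illustrative only; the function names and the example space are chosen for demonstration) takes a list of generating transformations and returns the orbit partition. Closing under the generators alone suffices here because every permutation of a finite set has finite order, so inverses are positive powers of the generators.

```python
# Orbit partition of a finite space under the subgroup generated by some transformations.

def generate_abstraction(space, generators):
    """Return the partition of `space` into orbits under the generated subgroup."""
    cells, remaining = [], set(space)
    while remaining:
        seed = remaining.pop()
        orbit, frontier = {seed}, [seed]
        while frontier:                       # closure of the seed under the generators
            x = frontier.pop()
            for g in generators:
                y = g(x)
                if y not in orbit:
                    orbit.add(y)
                    frontier.append(y)
        cells.append(sorted(orbit))
        remaining -= orbit
    return cells

# Example: the space Z_12 and the cyclic subgroup generated by the shift x -> x + 4 (mod 12).
X = range(12)
shift4 = lambda x: (x + 4) % 12
print(generate_abstraction(X, [shift4]))
# [[0, 4, 8], [1, 5, 9], [2, 6, 10], [3, 7, 11]]  (cell order may vary)
```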

The abstraction generating function is not necessarily injective. Let and be two transformations (also known as permutations, in the cycle notation) of ; consider the cyclic groups:

It is clear that but , the coarsest partition of .

The abstraction generating function is surjective. For any , let be the bijective function of the form

Pick any partition . For any cell , define

We claim . To see this, for any distinct that are in the same cell in , for some , so . This implies that and are in the same orbit in , since . Therefore, . Conversely, for any distinct that are in the same orbit in , there exists an such that . By definition, for some finite integer where . Suppose is the cell that is in, i.e. , then , since if and otherwise. Likewise, we have . This implies that , i.e.  and are in the same cell in . Therefore, . Combining both directions yields , so is surjective.
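A finite-set sketch in the spirit of this surjectivity construction (illustrative demonstration code only): build a bijection that cycles each cell of a given partition, and check that the orbits of the cyclic group it generates recover the original partition.

```python
# From a partition, build one transformation whose cyclic group regenerates it.

def cell_cycler(partition):
    """A bijection sending each element to the next element of its cell (cyclically)."""
    successor = {}
    for cell in partition:
        for i, x in enumerate(cell):
            successor[x] = cell[(i + 1) % len(cell)]
    return lambda x: successor[x]

def orbit_partition(space, g):
    """Orbit partition of a finite space under the cyclic group generated by g."""
    cells, remaining = [], set(space)
    while remaining:
        x = remaining.pop()
        orbit, y = {x}, g(x)
        while y not in orbit:
            orbit.add(y)
            y = g(y)
        cells.append(sorted(orbit))
        remaining -= orbit
    return cells

P = [[1, 3, 5], [2, 6], [4]]
g = cell_cycler(P)
print(sorted(orbit_partition(range(1, 7), g)))   # [[1, 3, 5], [2, 6], [4]]
```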

3.4 Duality: from Subgroup Lattice to Abstraction (Semi)Universe

Given a subgroup of , we can generate an abstraction of via the abstraction generating function . Thus, given a collection of subgroups of , we can generate a family of abstractions of . Further, given a collection of subgroups of with a hierarchy, we can generate a family of abstractions of with an induced hierarchy. This leads us to a subgroup lattice generating a partition (semi)lattice, where the latter is dual to the former via the abstraction generating function .

Preliminaries (Appendix A.4):

the (complete) subgroup lattice for a group, join (∨), meet (∧).

We consider the subgroup lattice for , denoted . Similar to the complete abstraction universe , we can draw a directed acyclic graph to visualize if it is finite, where vertices denote subgroups and edges denote the partial order. The graph is similarly constructed by plotting all distinct subgroups of starting at the bottom with , ending at the top with and, roughly speaking, with larger subgroups positioned higher than smaller ones. Draw an upward edge from to if and there are no subgroups properly between and . For any pair of subgroups , the join can be read from the graph by tracing paths upwards from and respectively until a common subgroup containing both is reached, and making sure there are no smaller such subgroups; the meet can be read from the graph in a symmetric manner. For any subgroup , the subgroup sublattice for is part of the subgroup lattice for , which can be read from the graph for by extracting the part below and above .

[Duality] Let the subgroup lattice of the transformation group and the abstraction generating function be given. Then the image of the subgroup lattice under the abstraction generating function is an abstraction meet-semiuniverse for the underlying set. More specifically, for any two subgroups, the following hold:

  • partial-order reversal: if one subgroup is contained in the other, then the abstraction generated by the smaller subgroup is a refinement of the abstraction generated by the larger subgroup;

  • strong duality: the abstraction generated by the join of the two subgroups equals the meet (finest common coarsening) of the two generated abstractions (Figure 4a);

  • weak duality: the abstraction generated by the meet (intersection) of the two subgroups is a refinement of the join (coarsest common refinement) of the two generated abstractions, and the two need not be equal (Figure 4b).

Figure 4: Duality of join and meet between the subgroup lattice (left in each subfigure) and the partition lattice (right in each subfigure). In (a), the gray vertex denoting , i.e. the actual meet in the partition lattice, is equal to ; in (b), the gray vertex denoting , i.e. the actual join in the partition lattice, can be any vertex below and above or even equal to these three end points.

(Partial-order reversal) Pick any and . For any that are in the same cell in partition , . Since , then , which further implies that . So, and are in the same cell in partition . Therefore, .

(Strong duality) Pick any . By the definition of join, , so from what we have shown at the beginning, , i.e.  is a common coarsening of and . Since is the finest common coarsening of and , then . Conversely, for any that are in the same cell in partition , and must be in the same orbit under -action on , i.e.  which means for some finite integer where (note: the fact that are both subgroups ensures that is closed under inverses). This implies that and are either in the same cell in partition or in the same cell in partition depending on whether or , but in either event, and must be in the same cell in any common coarsening of and . Note that is a common coarsening of and (regardless of the fact that it is the finest), so and are in the same cell in partition . Likewise, and , and , , and are all in the same cell in partition . Therefore, and are in the same cell in partition . So, . Combining both directions yields .

(Weak duality) Pick any . By the definition of meet, , so from what have shown at the beginning, , i.e.  is a common refinement of and . Since is the coarsest common refinement of and , then . We cannot obtain equality in general. For example, let and , . It is clear that and , so , i.e. the finest partition of . However, and , i.e. the coarsest partition of , so . In this example, we see that but . [Practical implication] The strong duality in Theorem 3.4 suggests a quick way of computing abstractions. If one has already computed abstractions and , then instead of computing from , one can compute the meet , which is generally a less expensive operation than computing and identifying all orbits in .
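The practical implication of the strong duality can be checked directly on a toy space (illustrative demonstration only; the space Z_12 and the particular shifts are chosen for the example): the abstraction generated jointly by two transformations is computed once from the orbits of the group they generate together, and once as the meet of the two separately generated abstractions.

```python
# Strong duality on Z_12: Phi(<g1, g2>) equals meet(Phi(<g1>), Phi(<g2>)).

def orbit_partition(space, gens):
    """Orbit partition of a finite space under the group generated by `gens`."""
    cells, remaining = [], set(space)
    while remaining:
        orbit, frontier = set(), [remaining.pop()]
        while frontier:
            x = frontier.pop()
            if x not in orbit:
                orbit.add(x)
                frontier += [g(x) for g in gens]
        cells.append(frozenset(orbit))
        remaining -= orbit
    return set(cells)

def meet(P, Q):
    """Finest common coarsening of two partitions (merge overlapping cells)."""
    cells = [set(c) for c in list(P) + list(Q)]
    changed = True
    while changed:
        changed = False
        for i in range(len(cells)):
            for j in range(i + 1, len(cells)):
                if cells[i] & cells[j]:
                    cells[i] |= cells.pop(j)
                    changed = True
                    break
            if changed:
                break
    return {frozenset(c) for c in cells}

X = range(12)
g1 = lambda x: (x + 4) % 12            # generates cells {0,4,8}, {1,5,9}, ...
g2 = lambda x: (x + 6) % 12            # generates cells {0,6}, {1,7}, ...
lhs = orbit_partition(X, [g1, g2])     # abstraction from the join <g1, g2>
rhs = meet(orbit_partition(X, [g1]), orbit_partition(X, [g2]))
print(lhs == rhs)                      # True
print(sorted(sorted(c) for c in lhs))  # [[0, 2, 4, 6, 8, 10], [1, 3, 5, 7, 9, 11]]
```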

Theorem 3.4 further allows us to build an abstraction semiuniverse with a partial hierarchy directly inherited from the hierarchy of the subgroup lattice. Nevertheless, there are cases where with incomparable and since the abstraction generating function is not injective (Theorem 3.3). If desired, one needs additional steps to complete the hierarchy or even to complete the abstraction semiuniverse into an abstraction universe.

3.5 More on Duality: from Conjugation to Group Action

Partitions of a set generated from two conjugate subgroups of can be related by a group action. We present this relation as another duality between subgroups and abstractions, which can also simplify the computation of abstractions.

Preliminaries (Appendix A.5):

conjugate, conjugacy class.

Let be a group, be a set, and be a -action on . Then

  • for any , , and the corresponding function defined by is a -action on ;

  • for any , , and the corresponding function defined by is a -action on .

See Appendix B.1.

[Duality] Let a set, its transformation group, and the abstraction generating function be given. Then for any subgroup of the transformation group and any transformation, the abstraction generated by the conjugate subgroup equals the image of the abstraction generated by the original subgroup under the cell-wise action of the conjugating transformation,

where refers to the group action defined in Statement 2 in Theorem 3.5. For any , is an orbit in under , then for some . Note that in the above derivation, since . So, is the orbit of under , i.e. . This implies that . Therefore, .

Conversely, for any , for some . Note that is an orbit in under , i.e.  for some , then for some . Note that in the above derivation, since . Therefore, is the orbit of under , i.e. . This implies that . So, . [Practical implication] Theorem 3.5 relates conjugation in the subgroup lattice to group action on the partition lattice . In other words, the group action on the partition lattice is dual to the conjugation in the subgroup lattice. This duality suggests a quick way of computing abstractions. If one has already computed abstraction , then instead of computing from , one can compute , which is generally a less expensive operation than computing and identifying all orbits in .
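Similarly, the conjugation duality and its practical implication can be verified on a toy space (illustrative demonstration only; the space and the particular transformations are chosen for the example): the abstraction generated by the conjugate h∘g∘h⁻¹ is compared against the cell-by-cell image, under h, of the abstraction generated by g.

```python
# Conjugation duality on Z_6: Phi(<h g h^{-1}>) equals h applied cell-wise to Phi(<g>).

def cyclic_orbits(space, t):
    """Orbit partition of a finite space under the cyclic group generated by t."""
    cells, remaining = [], set(space)
    while remaining:
        x = remaining.pop()
        orbit, y = {x}, t(x)
        while y not in orbit:
            orbit.add(y)
            y = t(y)
        cells.append(frozenset(orbit))
        remaining -= orbit
    return set(cells)

X = range(6)
g = lambda x: (x + 3) % 6                 # generator of the subgroup G
h = lambda x: {1: 2, 2: 1}.get(x, x)      # h = transposition (1 2); h is its own inverse
conj = lambda x: h(g(h(x)))               # h o g o h^{-1}

lhs = cyclic_orbits(X, conj)                                     # Phi(h G h^{-1})
rhs = {frozenset(map(h, cell)) for cell in cyclic_orbits(X, g)}  # h . Phi(G)
print(lhs == rhs)                         # True
```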

3.6 Partial Subgroup Lattice

Theoretically, through the abstraction generating function and necessary hierarchy completions, we can construct the complete abstraction universe from the complete subgroup lattice . This is because the subgroup lattice is a larger space that “embeds” the partition lattice (more precisely, Theorem 3.3 and 3.3). However, as we mentioned earlier, it is not practical to even store for small , and not all arbitrary partitions of are equally useful. Instead of considering all subgroups of , we draw our attention to certain parts of the complete subgroup lattice . We introduce two general principles in extracting partial subgroup lattices: the top-down approach and the bottom-up approach.

The Top-Down Approach.

We consider the subgroup sublattice for some chosen subgroup. If the underlying set is finite, this is the part below that subgroup and above the trivial subgroup in the directed acyclic graph for the complete subgroup lattice. As the name suggests, the top-down approach first specifies a “top” (i.e. a subgroup), and then extracts everything below the “top” (i.e. its subgroup sublattice). The computer algebra system GAP (The GAP Group, 2018) provides efficient algorithmic methods to construct the subgroup lattice for a given group, and even maintains several data libraries for special groups and their subgroup lattices. In general, enumerating all subgroups of a group can be computationally intense, and therefore this is applied primarily to small groups. When full enumeration is computationally prohibitive, a general trick is to enumerate subgroups up to conjugacy (which is also supported by the GAP system). Computing abstractions within the conjugacy class of any subgroup is then easy by the duality in Theorem 3.5, once the abstraction generated by a representative is computed. More details on picking a special subgroup (as the “top”) are discussed in Section 4.
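As a toy stand-in for such an enumeration (a brute-force sketch for illustration only, feasible solely for very small groups and not a replacement for the GAP machinery mentioned above), the snippet below enumerates all subgroups of the full transformation group of a three-element set by closure checking.

```python
# Brute-force enumeration of all subgroups of the symmetric group on {0, 1, 2}.

from itertools import permutations, combinations

def compose(p, q):
    """Composition of permutations given as image tuples: (p o q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

S3 = list(permutations(range(3)))          # all 6 transformations of {0, 1, 2}
identity = tuple(range(3))

def is_subgroup(subset):
    """A finite nonempty subset containing e and closed under composition is a subgroup."""
    return identity in subset and all(compose(p, q) in subset for p in subset for q in subset)

subgroups = [set(c)
             for k in range(1, len(S3) + 1)
             for c in combinations(S3, k)
             if is_subgroup(set(c))]
print(len(subgroups))   # 6: the trivial group, three of order 2, one of order 3, and S_3 itself
```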

The Bottom-Up Approach.

We first pick some finite subset of atomic transformations, and then generate a partial subgroup lattice by computing the subgroup generated by every subset of it, starting from smaller subgroups. As the name suggests, the bottom-up approach first constructs the trivial subgroup, i.e. the bottom vertex in the directed acyclic graph (when the underlying set is finite), and then the cyclic subgroups generated by the individual atomic transformations. We continue to construct larger subgroups from smaller ones by taking joins, which corresponds to gradually moving upwards in the graph. In general, this approach produces at most one subgroup per subset of the generating set, and will not, in general, produce the complete subgroup sublattice. Computing abstractions using this bottom-up approach is easy by the strong duality in Theorem 3.4, once the abstractions generated by all cyclic subgroups are computed. More details on this abstraction generating process and picking a generating set (as the “bottom”) are discussed in Section 5.
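A small end-to-end sketch of this bottom-up recipe (illustrative only; the space Z_12 and the atomic transformations are chosen for the example): compute the abstraction of each cyclic subgroup once, then obtain the abstraction of every join of these subgroups by folding the meet over the already-computed atomic abstractions, per the strong duality. Note in the output that distinct generating subsets can yield the same abstraction, reflecting the non-injectivity of the abstraction generating function.

```python
# Bottom-up construction of an abstraction family on Z_12 from atomic transformations.

from itertools import combinations

def cyclic_orbits(space, t):
    """Orbit partition of a finite space under the cyclic group generated by t."""
    cells, remaining = [], set(space)
    while remaining:
        x = remaining.pop()
        orbit, y = {x}, t(x)
        while y not in orbit:
            orbit.add(y)
            y = t(y)
        cells.append(frozenset(orbit))
        remaining -= orbit
    return frozenset(cells)

def meet(P, Q):
    """Finest common coarsening, by merging overlapping cells."""
    cells = [set(c) for c in list(P) + list(Q)]
    i = 0
    while i < len(cells):
        j = i + 1
        while j < len(cells):
            if cells[i] & cells[j]:
                cells[i] |= cells.pop(j)
                j = i + 1              # restart: the enlarged cell may overlap earlier-skipped cells
            else:
                j += 1
        i += 1
    return frozenset(frozenset(c) for c in cells)

X = range(12)
atoms = {'shift4': lambda x: (x + 4) % 12,
         'shift6': lambda x: (x + 6) % 12,
         'negate': lambda x: (-x) % 12}
atomic_abstraction = {name: cyclic_orbits(X, t) for name, t in atoms.items()}

family = {}
for k in range(1, len(atoms) + 1):
    for names in combinations(sorted(atoms), k):
        P = atomic_abstraction[names[0]]
        for name in names[1:]:
            P = meet(P, atomic_abstraction[name])   # strong duality: abstraction of the join
        family[names] = P

print(len(family), "generating subsets;", len(set(family.values())), "distinct abstractions")
# 7 generating subsets; 6 distinct abstractions
```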

4 The Top-Down Approach: Special Subgroups

We follow a top-down approach to discuss subgroup enumeration problems. The plan is to start with the transformation group of , and then to consider special subgroups of and special subspaces of . To do this systematically, we derive a principle that allows us to hierarchically break the enumeration problem into smaller and smaller enumeration subproblems. This hierarchical breakdown can guide us in restricting both the type of subgroups and the type of subspaces, so that the resulting abstraction (semi)universe fits our desiderata, and more importantly can be computed in practice. Figure 5 presents an outline consisting of special subgroups and subspaces considered in this section as well as their hierarchies.

Figure 5: Special subgroups and spaces as well as their hierarchies. (a) presents a backbone of the complete subgroup lattice , including important subgroups and their breakdowns. One can check the above directed acyclic graph indeed represents a sublattice: it is closed under both join and meet. (b) presents important subspaces of , where restrictions are gradually added to eventually lead to practical abstraction-construction algorithms.

Note: we do not claim originality for the content in this section. Indeed, many parts have been studied in various contexts. Our work is to extend existing results from specific contexts to a general setting. This generalization coherently puts different pieces of context-specific knowledge under one umbrella, forming the guiding principle of the top-down approach.

Preliminaries (Appendix A.6):

group homomorphism, isomorphism (≅); normalizer of a set in a group, normal subgroup; group decomposition, inner semi-direct product, outer semi-direct product.

4.1 The Affine Transformation Group

An affine transformation of ℝⁿ is a function of the form x ↦ Ax + b, where A is an n × n real invertible matrix and b is an n-dimensional real vector. The set of all affine transformations of ℝⁿ forms a group, the affine transformation group. There are two special cases:

  • A translation of ℝⁿ is a function of the form x ↦ x + b, where b is an n-dimensional real vector; the translations of ℝⁿ form the translation group.

  • A linear transformation of ℝⁿ is a function of the form x ↦ Ax, where A is an n × n real invertible matrix; the (invertible) linear transformations of ℝⁿ form the linear transformation group.

It is easy to check that the translation group and the linear transformation group are both subgroups of the affine transformation group; further, they are isomorphic to (ℝⁿ, +) and to the general linear group GL(n, ℝ), respectively. It is known that the affine transformation group is the (inner) semidirect product of the translation group and the linear transformation group. So every affine transformation x ↦ Ax + b can be uniquely identified with a pair (A, b). In particular, the identity transformation is identified with (I, 0), the translation group is identified with the pairs (I, b), and the linear transformation group is identified with the pairs (A, 0). Under this identification, compositions and inverses of affine transformations become

(A₁, b₁) ∘ (A₂, b₂) = (A₁A₂, A₁b₂ + b₁),   (A, b)⁻¹ = (A⁻¹, −A⁻¹b).   (1)
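A small numpy sketch of this identification (illustrative only, not part of the formal development): an affine transformation is stored as a pair (A, b), and the composition and inverse formulas in Eq. (1) are exercised on a random example.

```python
# Affine transformations of R^3 as (A, b) pairs, with composition and inverse per Eq. (1).

import numpy as np

def compose(f, g):
    """(A1, b1) o (A2, b2) = (A1 A2, A1 b2 + b1)."""
    (A1, b1), (A2, b2) = f, g
    return A1 @ A2, A1 @ b2 + b1

def inverse(f):
    """(A, b)^{-1} = (A^{-1}, -A^{-1} b)."""
    A, b = f
    A_inv = np.linalg.inv(A)
    return A_inv, -A_inv @ b

def apply(f, x):
    A, b = f
    return A @ x + b

rng = np.random.default_rng(0)
f = (rng.normal(size=(3, 3)), rng.normal(size=3))
g = (rng.normal(size=(3, 3)), rng.normal(size=3))
x = rng.normal(size=3)

print(np.allclose(apply(compose(f, g), x), apply(f, apply(g, x))))   # True
A_id, b_id = compose(f, inverse(f))
print(np.allclose(A_id, np.eye(3)), np.allclose(b_id, 0))            # True True
```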

The above identification further allows us to introduce two functions that extract the linear part and the translation part of an affine transformation, respectively: the first sends the pair (A, b) to A, and the second sends (A, b) to b.

Now we can start our journey towards a complete identification of every subgroup of the affine transformation group. We introduce the first foundational quantity: for a given subgroup, the set of pure translations it contains, called its translation subgroup. It is easy to check that the translation subgroup is normal in the given subgroup, since translations are normal in affine transformations. Therefore, the quotient of the subgroup by its translation subgroup is a well-defined group, whose elements are called cosets. The following theorems reveal more structure of this quotient group, the second foundational quantity.

is a homomorphism. For any , we have , which implies that is a homomorphism.

Let , . Then are in the same coset in if and only if they have the same linear part, i.e. . See Appendix B.2.

Let , . If are in the same coset in , then . See Appendix B.3. Theorems 4.1 and 4.1 present two characterizations of elements in the same coset in , respectively. The former, through the linear part, is an if-and-only-if characterization; while the latter, through the translation part, is a necessary but not sufficient characterization.

Let , . Then . It is clear that , since and is a homomorphism (Lemma 4.1) which preserves subgroups. Let be the function of the form , we claim that is an isomorphism. To see this, for any ,

which implies is a homomorphism. Further, for any , if , then . By Theorem 4.1, this implies that , so is injective. Lastly, for any , there exists an such that . For this particular , , and . This implies that is surjective. Theorem 4.1 can be proved directly from the first isomorphism theorem, by recognizing is a homomorphism whose kernel and image are and , respectively. However, the above proof explicitly gives the isomorphism which is useful in the sequel.

[Compatibility] Let , . For any and , we have . Further, if we define a function of the form , then is a group action of on . See Appendix B.4.

So far, we have seen that for any subgroup , its subset of pure translations is a normal subgroup of ; is also a normal subgroup of , since is a commutative group. As a result, both quotient groups and are well-defined. We next introduce a function, called a vector system, which connects the two quotient groups. It turns out that vector systems comprise the last piece of information that leads to a complete identification of every subgroup of . Note that (Theorem 4.1) and ; thus for conceptual ease (think in terms of matrices and vectors), we introduce vector systems connecting and instead.

[Vector system] For any and , an -vector system is a function , which in addition satisfies the following two conditions:

  • compatibility condition: for any , ;

  • cocycle condition: for any , .

Note: elements in are cosets of the form for . It is easy to check: for any two cosets in , the sum

for any and any coset in , the product

So, the sum and product in the cocycle condition are defined in the above sense. We use to denote the family of all -vector systems. One can check that if and only if are compatible (consider the trivial vector system given by for all ). We use to denote the universe of all vector systems. The universe of all vector systems can be parameterized by the set of compatible pairs . The reason is straightforward: and respectively define the domain and codomain of a function, and two functions are different if either their domains or their codomains are different.

Let , , and , then

  • for the identity matrix

    , ;

  • for any , .

See Appendix B.5.

[Affine subgroup identification] Let