B-Spline CNNs on Lie Groups

09/26/2019
by Erik J. Bekkers, et al.
TU Eindhoven

Group convolutional neural networks (G-CNNs) can be used to improve classical CNNs by equipping them with the geometric structure of groups. Central in the success of G-CNNs is the lifting of feature maps to higher dimensional disentangled representations, in which data characteristics are effectively learned, geometric data-augmentations are made obsolete, and predictable behavior under geometric transformations (equivariance) is guaranteed via group theory. Currently, however, the practical implementations of G-CNNs are limited to either discrete groups (that leave the grid intact) or continuous compact groups such as rotations (that enable the use of Fourier theory). In this paper we lift these limitations and propose a modular framework for the design and implementation of G-CNNs for arbitrary Lie groups. In our approach the differential structure of Lie groups is used to expand convolution kernels in a generic basis of B-splines that is defined on the Lie algebra. This leads to a flexible framework that enables localized, atrous, and deformable convolutions in G-CNNs by means of respectively localized, sparse and non-uniform B-spline expansions. The impact and potential of our approach is studied on two benchmark datasets: cancer detection in histopathology slides in which rotation equivariance plays a key role and facial landmark localization in which scale equivariance is important. In both cases, G-CNN architectures outperform their classical 2D counterparts and the added value of atrous and localized group convolutions is studied in detail.


1 Introduction

Group convolutional neural networks (G-CNNs) are a class of neural networks equipped with the geometry of groups. This enables them to profit from the structure and symmetries in signal data such as images (cohen_group_2016). A key feature of G-CNNs is that they are equivariant with respect to transformations described by the group, i.e., they guarantee predictable behavior under such transformations and are insensitive to both local and global transformations on the input data. Classical CNNs are a special case of G-CNNs that are equivariant to translations and, in contrast to unconstrained NNs, they take advantage of (and preserve) the basic structure of signal data throughout the network (lecun_handwritten_1990). By considering larger groups (i.e., considering not just translation equivariance) additional geometric structure can be utilized in order to improve performance and data efficiency (see G-CNN literature in Sec. 2).

Part of the success of G-CNNs can be attributed to the lifting of feature maps to higher dimensional objects that are generated by matching kernels under a range of poses (transformations in the group). This leads to a disentanglement with respect to pose and, together with the group structure, it enables a flexible way of learning high-level representations in terms of low-level activated neurons observed in specific configurations. From a neuro-psychological viewpoint, this resembles a hierarchical composition from low- to high-level features akin to the recognition-by-components model by biederman_recognition-by-components:_1987, a viewpoint which is also adopted in work on capsule networks (hinton_transforming_2011; sabour_dynamic_2017). In particular, in (lenssen_group_2018) the relation to group theory is made explicit with group equivariant capsules that provide a sparse index/value representation of feature maps on groups. Fig. 1 illustrates how one can think of part-whole relations in terms of a relative configuration of group elements or as a density on the group.

Representing low-level features via feature maps on groups, as is done in G-CNNs, is also motivated by the findings of hubel_receptive_1959 and bosking_orientation_1997 on the organization of orientation-sensitive simple cells in the primary visual cortex V1. These findings are mathematically modeled by sub-Riemannian geometry on Lie groups (petitot_neurogeometry_2003; citti_cortical_2006; duits_association_2014) and led to effective algorithms in image analysis (franken_crossing-preserving_2009; bekkers_pde_2015; favali_analysis_2016; duits_optimal_2018; baspinar_minimal_2018). In recent work, montobbio_receptive_2019 show that such advanced V1 modeling geometries emerge in specific CNN architectures, and in ecker_rotation-equivariant_2019 the relation between group structure and the organization of V1 is explicitly employed to effectively recover actual V1 neuronal activities from stimuli by means of G-CNNs.

(a) Pattern of local orientations
(b) Density on $SE(2)$
Figure 1: (a) A face descriptor in terms of low-level features (e.g. edges) in a pattern of local orientations (elements in $SE(2)$) relative to an origin and (b) the same pattern embedded as a density on $SE(2)$ that represents idealized neuronal activations in a G-CNN feature map.

G-CNNs are well motivated from both a mathematical point of view (cohen_general_2018; kondor_generalization_2018) and neuro-psychological/neuro-mathematical point of view and their improvement over classical CNNs is convincingly demonstrated by the growing body of G-CNN literature (see Sec. 2). However, their practical implementations are limited to either discrete groups (that leave the grid intact) or continuous, (locally) compact, unimodular groups such as roto-translations (that enable the use of Fourier theory). In this paper we lift these limitations and propose a framework for the design and implementation of G-CNNs for arbitrary Lie groups.

The proposed approach for G-CNNs relies on a definition of B-splines on Lie groups which we use to expand and sample group convolution kernels. B-splines are piece-wise polynomials with local support and are classically defined on flat Euclidean spaces $\mathbb{R}^n$. In this paper we generalize B-splines to Lie groups and formulate a definition using the differential structure of Lie groups, in which B-splines are essentially defined on the (flat) vector space of the Lie algebra obtained by the logarithmic map. The result is a flexible framework for B-splines on arbitrary Lie groups that enables the construction of G-CNNs with properties that cannot be achieved via traditional Fourier-type basis expansion methods. Such properties include localized, atrous, and deformable convolutions in G-CNNs by means of respectively localized, sparse, and non-uniform B-spline expansions.

Although the concepts described in this paper apply to arbitrary Lie groups, we here concentrate on the analysis of data that lives on $\mathbb{R}^d$ and consider G-CNNs for affine groups $G = \mathbb{R}^d \rtimes H$ that are the semi-direct product of the translation group $(\mathbb{R}^d, +)$ with a Lie group $H$ that acts on $\mathbb{R}^d$. As such, only a few core definitions about the Lie group $H$ (group product, inverse, logarithmic map, and action on $\mathbb{R}^d$) need to be implemented in order to build full G-CNNs that are locally equivariant to the transformations in $H$.

The impact and potential of our approach are studied on two datasets in which respectively rotation and scale equivariance plays a key role: cancer detection in histopathology slides (PCam dataset) and facial landmark localization (CelebA dataset). In both cases G-CNNs outperform their classical 2D counterparts and the added value of atrous and localized G-convolutions is studied in detail.

2 Related work

G-CNNs

The introduction of G-CNNs to the machine learning community by cohen_group_2016 led to a growing body of G-CNN literature that consistently demonstrates an improvement of G-CNNs over classical CNNs. It can be roughly divided into work on discrete G-CNNs (cohen_group_2016; dieleman_exploiting_2016; winkels_3d_2018; worrall_cubenet:_2018; hoogeboom_hexaconv_2018), regular continuous G-CNNs (oyallon_deep_2015; bekkers_training_2015; bekkers_template_2018; weiler_3d_2018; zhou_oriented_2017; marcos_rotation_2017) and steerable continuous G-CNNs (cohen_spherical_2018; worrall_harmonic_2017; kondor_generalization_2018; thomas_tensor_2018; weiler_3d_2018; esteves_learning_2018; andrearczyk_exploring_2019). Since 3D rotations can only be sampled in very restrictive ways (without destroying the group structure), the construction of 3D roto-translation G-CNNs is limited. In order to avoid having to sample $SO(3)$ altogether, steerable (G-)CNNs can be used. These are specialized G-CNNs in which the kernels are expanded in circular/spherical harmonics and computations take place using the basis coefficients only (chirikjian_engineering_2000; franken_enhancement_2008; almsick_van_context_2007; skibbe_spherical_2017). The latter approach is however only possible for unimodular groups such as roto-translations.

Scale equivariance In this paper we experiment with scale-translation G-CNNs, which is the first direct application of G-CNNs to achieve equivariance beyond roto-translations. Scale equivariance is however addressed in several settings (henriques_warped_2017; esteves_polar_2018; marcos_scale_2018; tai_equivariant_2019; worrall_deep_2019; jaderberg_spatial_2015), of which (worrall_deep_2019) is most related. There, scale-space theory and semi-group theory are used to construct scale equivariant layers that elegantly take care of moving band-limits due to rescaling. Although our work differs in several ways (e.g. non-learned lifting layer, discrete group convolutions via atrous kernels, semi-group theory), the first two layers of deep scale-space networks relate to our lifting layer by treating our B-splines as a superposition of Dirac deltas transformed under the semi-group action of (worrall_deep_2019), as we show in App. C.1. Related work by tai_equivariant_2019 relies on the same Lie group principles as we do in this paper (the logarithmic map) to construct convenient coordinate systems, such as log-polar coordinates (esteves_polar_2018), to handle equivariance. Such methods are however generally not translation equivariant and do not deal with local symmetries as they act globally on feature maps, much like spatial transformer networks (jaderberg_spatial_2015).

B-splines and vector fields in deep learning

The current work can be seen as a generalization of the B-spline based CNNs of bekkers_training_2015; bekkers_template_2018, see Sec. 3.3. Closely related is also the work of fey_splinecnn:_2018 in which B-splines are used to generalize CNNs to non-Euclidean data (graphs). There it is proposed to perform convolution via B-spline kernels on $\mathbb{R}^d$ that take as inputs vectors that relate any two points in the graph to each other. How these vectors are constructed is left as a design choice; in (fey_splinecnn:_2018) this is typically done by embedding the graph in a Euclidean space where points relate via offset vectors. In our work on Lie G-CNNs, two points in the Lie group relate via the logarithmic map. Another related approach in which convolutions take place on manifolds in terms of "offset vectors" is the work by cohen_gauge_2019. There, points relate via the exponential map with respect to gauge frames rather than the left-invariant vector fields as in this paper, see App. C.2.

3 Lie group CNNs

3.1 Preliminaries and notation

Group A group is defined by a set $G$ together with a binary operator $\cdot$, the group product, that satisfies the following axioms: Closure: for all $g, h \in G$ we have $g \cdot h \in G$; Identity: there exists an identity element $e \in G$ such that $g \cdot e = e \cdot g = g$ for all $g \in G$; Inverse: for each $g \in G$ there exists an inverse element $g^{-1} \in G$ such that $g^{-1} \cdot g = g \cdot g^{-1} = e$; and Associativity: for all $g, h, i \in G$ we have $(g \cdot h) \cdot i = g \cdot (h \cdot i)$.

Lie group and Lie algebra If furthermore the group has the structure of a differentiable manifold and the group product and inverse are smooth, it is called a Lie group. The differentiability of the group induces a notion of infinitesimal generators (see also the exponential map below), which are elements of the Lie algebra $\mathfrak{g}$. The Lie algebra consists of a vector space (of generators), that is typically identified with the tangent space $T_e G$ at the identity $e$, together with a bilinear operator called the Lie bracket. In this work the Lie bracket is not of interest and we simply say $\mathfrak{g} = T_e G$.

Exponential and logarithmic map The exponential map $\exp: \mathfrak{g} \to G$ maps elements of the Lie algebra to the group; the logarithmic map $\log: G \to \mathfrak{g}$ is its (local) inverse. In this work we expand vectors in the left-invariant basis $\{A_1, \dots, A_n\}$ and write $A = \sum_{i=1}^{n} a^i A_i$, with components $a^i \in \mathbb{R}$. This allows us to identify the Lie algebra $\mathfrak{g}$ with $\mathbb{R}^n$. We rely on the logarithmic map as an essential tool to map elements from the typically non-flat manifold of $G$ to a flat Euclidean vector space.
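
To make the role of the logarithmic map concrete, here is a minimal NumPy sketch (our own illustration, not part of the paper) for two groups used later in this paper, assuming the matrix convention for $SO(2)$ of App. B.2:

```python
import numpy as np

# The log map sends group elements to Lie algebra coefficients, which live in a
# flat vector space (here simply R).

# Scaling group (R^+, *): log is the natural logarithm, exp is its inverse.
log_scaling = np.log       # R^+ -> R
exp_scaling = np.exp       # R   -> R^+

def log_so2(R):
    """Logarithmic map on SO(2): the Lie algebra coefficient of a rotation
    matrix R, i.e. its rotation angle in (-pi, pi]."""
    return np.arctan2(R[1, 0], R[0, 0])

# round trip on the scaling group
assert np.isclose(exp_scaling(log_scaling(2.0)), 2.0)
```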

Semi-direct product groups In this paper we specifically consider (affine) Lie groups of type $G = \mathbb{R}^d \rtimes H$ that are the semi-direct product of the translation group $(\mathbb{R}^d, +)$ with a Lie group $H$ that acts on $\mathbb{R}^d$. Let $h \odot \mathbf{x}$ denote the action of $h \in H$ on $\mathbf{x} \in \mathbb{R}^d$; it describes how an element in $\mathbb{R}^d$ is transformed by $h$. Then the group product of $G$ is given by

$$g_1 \cdot g_2 = (\mathbf{x}_1, h_1) \cdot (\mathbf{x}_2, h_2) = (\mathbf{x}_1 + h_1 \odot \mathbf{x}_2, \; h_1 \cdot h_2), \qquad (1)$$

with $g_1 = (\mathbf{x}_1, h_1), g_2 = (\mathbf{x}_2, h_2) \in G$, $\mathbf{x}_1, \mathbf{x}_2 \in \mathbb{R}^d$ and $h_1, h_2 \in H$. For example, the special Euclidean motion group $SE(2) = \mathbb{R}^2 \rtimes SO(2)$ is constructed by choosing $H = SO(2)$, the group of rotation matrices with matrix multiplication as the group product. The group product of $SE(2)$ is then given by

$$(\mathbf{x}_1, \mathbf{R}_{\theta_1}) \cdot (\mathbf{x}_2, \mathbf{R}_{\theta_2}) = (\mathbf{x}_1 + \mathbf{R}_{\theta_1} \mathbf{x}_2, \; \mathbf{R}_{\theta_1 + \theta_2}),$$

with $\mathbf{x}_1, \mathbf{x}_2 \in \mathbb{R}^2$ and rotation matrices $\mathbf{R}_{\theta}$ parameterized by a rotation angle $\theta$, and in which rotations act on vectors in $\mathbb{R}^2$ simply by matrix-vector multiplication.
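
The following NumPy sketch (an illustration under the conventions above, not the authors' implementation) spells out the $SE(2)$ instance of the semi-direct product structure of Eq. (1):

```python
import numpy as np

def rot(theta):
    """2D rotation matrix R_theta in SO(2)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def se2_product(g1, g2):
    """Group product of SE(2) = R^2 x| SO(2), cf. Eq. (1):
    (x1, R1) . (x2, R2) = (x1 + R1 x2, R1 R2)."""
    (x1, R1), (x2, R2) = g1, g2
    return (x1 + R1 @ x2, R1 @ R2)

def se2_inverse(g):
    """Inverse element: (x, R)^{-1} = (-R^T x, R^T)."""
    x, R = g
    return (-R.T @ x, R.T)

# sanity check: g . g^{-1} equals the identity element (0, I)
g = (np.array([1.0, 2.0]), rot(0.3))
x_e, R_e = se2_product(g, se2_inverse(g))
assert np.allclose(x_e, 0.0) and np.allclose(R_e, np.eye(2))
```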

Group representations We consider linear transformations $\mathcal{L}_g$ that transform functions (or feature maps) on some space as representations of a group $G$ if they share the group structure via

$$\mathcal{L}_{g_1 \cdot g_2} = \mathcal{L}_{g_1} \circ \mathcal{L}_{g_2},$$

with $\circ$ denoting function composition. Thus, a concatenation of two such transformations, parameterized by $g_1$ and $g_2$, can be described by a single transformation parameterized by $g_1 \cdot g_2$. For semi-direct product groups $G = \mathbb{R}^d \rtimes H$ such a representation can be split into a translation part and an $H$-part:

$$\mathcal{L}_g = \mathcal{L}_{(\mathbf{x}, h)} = \mathcal{T}_{\mathbf{x}} \circ \mathcal{L}_h. \qquad (2)$$

3.2 Group convolutional neural networks

We follow the conventional approach of building artificial neural networks using layers of the form

$$f_{\text{out}} = \sigma\left(\mathcal{K} f_{\text{in}} + b\right),$$

with $f_{\text{in}}$ the input vector, $\mathcal{K}$ a linear map parameterized by a weight vector, $b$ a bias term and $\sigma$ a point-wise non-linearity. In classical neural networks the input and output spaces are Euclidean vector spaces and the linear map is a weight matrix.

In this work we focus on structured data and consider feature maps on some domain $X$ as functions on $X$, the space of which we denote with $\mathbb{L}_2(X)$. In this case the input and output spaces are spaces of multi-channel feature maps, and $\mathcal{K}$ is a kernel operator. It turns out that if we constrain the linear operator $\mathcal{K}$ to be equivariant under transformations in some group $G$ we arrive at group convolutional neural networks. This is formalized in the following theorem on equivariant maps between homogeneous spaces (see (duits_scale_2007; kondor_generalization_2018; cohen_general_2018) for related statements).

Theorem 1.

Let the operator $\mathcal{K}$ be linear and bounded, let $X$ and $Y$ be homogeneous spaces on which the Lie group $G$ acts transitively, and let $\mu$ be a Radon measure on $X$, then

  1. is a kernel operator, i.e.,

  2. with equivariance constraint the map is defined by a one-argument kernel

    (3)

    for any such that for some fixed origin ,

  3. if is the quotient of with then the kernel is constrained via

    (4)
Proof.

See App. A

Corollary 1.

If $X = \mathbb{R}^d$ is a homogeneous space of an affine Lie group $G = \mathbb{R}^d \rtimes H$ and $\mu$ is the Lebesgue measure on $\mathbb{R}^d$, then the kernel front-factor simplifies to $\frac{1}{|\det h|}$, with $\det h$ denoting the determinant of the matrix representation of $h$, for any $h \in H$. If $X = G$ and $\mu$ is a Haar measure on $G$ then the front-factor equals $1$.

Standard CNNs are a special case of G-CNNs in which the kernels are constrained to be translation equivariant. In CNNs the domain of the feature maps coincides with the space $\mathbb{R}^d$ of translation vectors of the translation group $(\mathbb{R}^d, +)$. It is well known that if we want the networks to be translation and rotation equivariant ($G = SE(2)$), but stick to planar feature maps (i.e. $X = Y = \mathbb{R}^2$), then the kernels should be rotation invariant (due to Eq. (4)), which of course limits representation power. If we want to maximize representation power (without constraints on the kernels) the feature maps should be lifted to the higher dimensional domain of the group itself (i.e. $Y = G$). We therefore propose to build G-CNNs with the following 3 types of layers (a numerical sketch of the lifting correlation is given after the list):

  • Lifting layer (): In this layer is defined by lifting correlations

    which by splitting of the representation (Eq. (2)) can be written as

    (5)

    with .

  • Group correlation layer (): In this case is defined by group correlations

    with a Haar measure on . We can again split this cross-correlation into a transformation of followed by a spatial cross-correlation via

    (6)

    with the convolution kernel transformed by and in which we overload to indicate cross-correlation on the part of .

  • Projection layer (): In this case is a linear projection defined by

    (7)

    where we simply integrate over instead of using a kernel that would otherwise be constant over and spatially isotropic with respect to .
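
As an illustration of the lifting correlation (Eq. (5)) for the roto-translation case, the sketch below correlates an image with rotated copies of a 2D kernel and stacks the responses into a feature map on positions and rotations. It is a rough numerical approximation (interpolation-based kernel rotation instead of sampling rotated B-spline kernels, with SciPy helpers as an assumed dependency), not the paper's implementation:

```python
import numpy as np
from scipy.ndimage import rotate
from scipy.signal import correlate2d

def lifting_correlation(image, kernel, num_rotations=8):
    """Lift a 2D image to a feature map on positions x rotations by correlating
    it with rotated copies of the kernel (one copy per sampled group element)."""
    responses = []
    for i in range(num_rotations):
        angle = 360.0 * i / num_rotations
        k_rot = rotate(kernel, angle, reshape=False, order=1)  # approx. of the transformed kernel
        responses.append(correlate2d(image, k_rot, mode="same"))
    return np.stack(responses, axis=0)   # shape: (num_rotations, H, W)

# usage: lift a random image with a small vertical-edge kernel
image = np.random.rand(32, 32)
kernel = np.array([[1.0, 0.0, -1.0]] * 3)
feature_map = lifting_correlation(image, kernel, num_rotations=8)
print(feature_map.shape)   # (8, 32, 32)
```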

3.3 B-Splines on Lie groups

Central in our formulation of G-CNNs is the transformation of convolution kernels under the action of $H$ as described above in Eqs. (5) and (6) in the continuous setting. However, for the implementation of G-CNNs the kernels and their transformations need to be sampled on a discrete grid. We expand on the ideas in (bekkers_training_2015; bekkers_template_2018; weiler_learning_2018) to express the kernels in an analytic form which we can then sample under arbitrary transformations in $H$ to perform the actual computations. In particular we generalize the approach of bekkers_training_2015; bekkers_template_2018 to expand group correlation kernels in a basis of shifted cardinal B-splines, which are localized polynomial functions on $\mathbb{R}^n$ with finite support. In (bekkers_training_2015; bekkers_template_2018), B-splines on $\mathbb{R}^3$ could be used to construct kernels on $SE(2)$ by identifying the group with the space of positions and orientations and simply using periodic splines on the orientation axis. However, in order to construct B-splines on arbitrary Lie groups, we need a generalization. In the following we propose a new definition of B-splines on Lie groups which enables us to construct the kernels on $G$ that are required in the G-correlations (Eq. (6)).

Definition 1 (Cardinal B-spline on $\mathbb{R}^d$).

The 1D cardinal B-spline of degree $n \in \mathbb{N}_0$ is defined as

$$B^n := \mathbb{1}_{[-\frac{1}{2},\frac{1}{2}]} * \dots * \mathbb{1}_{[-\frac{1}{2},\frac{1}{2}]}, \qquad (8)$$

where the right-hand side denotes the $n$-fold convolution of the indicator function $\mathbb{1}_{[-\frac{1}{2},\frac{1}{2}]}$ with itself (with $B^0 := \mathbb{1}_{[-\frac{1}{2},\frac{1}{2}]}$). The multi-variate cardinal B-spline on $\mathbb{R}^d$, with coordinates $\mathbf{x} = (x_1, \dots, x_d)$, is defined via the tensor product

$$B^n(\mathbf{x}) := \prod_{i=1}^{d} B^n(x_i). \qquad (9)$$

Cardinal B-splines are piece-wise polynomials and are localized with support $\left[-\frac{n+1}{2}, \frac{n+1}{2}\right]^d$. Functions can be expanded in a basis of shifted cardinal B-splines, which we simply refer to as B-splines.

Definition 2 (B-splines on $\mathbb{R}^d$).

A B-spline is a function $f: \mathbb{R}^d \to \mathbb{R}$ expanded in a basis that consists of shifted and scaled copies of the cardinal B-spline

$$f(\mathbf{x}) = \sum_{i=1}^{N} c_i \, B^n\left(\frac{\mathbf{x} - \mathbf{x}_i}{s}\right), \qquad (10)$$

and is fully characterized by the spline degree $n$, scale $s > 0$, set of centers $\{\mathbf{x}_i\}_{i=1}^{N}$ with $\mathbf{x}_i \in \mathbb{R}^d$ and corresponding coefficients $c_i \in \mathbb{R}$. The B-spline is called uniform if the set of centers forms a uniform grid on $\mathbb{R}^d$, in which the distance between neighbouring centers is constant along each axis and equal to the scale $s$.
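
A minimal NumPy sketch of Definitions 1 and 2 (our own illustration; the recursion below is a standard equivalent of the repeated-convolution definition of the centered cardinal B-spline):

```python
import numpy as np

def cardinal_bspline(x, n):
    """Centered cardinal B-spline B^n of degree n (Def. 1). B^0 is the indicator
    of [-1/2, 1/2); higher degrees follow the standard recursion, equivalent to
    repeatedly convolving the indicator with itself."""
    x = np.asarray(x, dtype=float)
    if n == 0:
        return ((x >= -0.5) & (x < 0.5)).astype(float)
    return ((x + (n + 1) / 2) / n) * cardinal_bspline(x + 0.5, n - 1) \
         + (((n + 1) / 2 - x) / n) * cardinal_bspline(x - 0.5, n - 1)

def bspline_1d(x, centers, coeffs, s, n=2):
    """B-spline on R (Def. 2): f(x) = sum_i c_i B^n((x - x_i) / s)."""
    x = np.asarray(x, dtype=float)
    return sum(c * cardinal_bspline((x - xi) / s, n)
               for xi, c in zip(centers, coeffs))

# usage: a degree-2 B-spline with three uniformly spaced basis functions
xs = np.linspace(-2.0, 2.0, 9)
print(bspline_1d(xs, centers=[-1.0, 0.0, 1.0], coeffs=[1.0, 2.0, 1.0], s=1.0))
```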

Definition 3 (B-splines on a Lie group $G$).

A B-spline on $G$ is a function $f: G \to \mathbb{R}$ expanded in a basis that consists of shifted (by left multiplication) and scaled copies of the cardinal B-spline

$$f(g) = \sum_{i=1}^{N} c_i \, B^n\left(\frac{\log(g_i^{-1} \cdot g)}{s}\right), \qquad (11)$$

with $g, g_i \in G$ and $\log$ the logarithmic map on $G$. The B-spline is fully characterized by the spline degree $n$, scale $s > 0$, set of centers $\{g_i\}_{i=1}^{N}$ with $g_i \in G$ and corresponding coefficients $c_i \in \mathbb{R}$. The spline is called uniform if the distance between neighbouring centers is constant.
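
To illustrate Eq. (11) concretely, here is a self-contained sketch for $G = SO(2)$ (our own example; the closed-form quadratic spline below equals $B^2$ from Def. 1, and periodicity is handled automatically by taking the log of the relative rotation):

```python
import numpy as np

def b2(x):
    """Quadratic cardinal B-spline B^2 (closed form)."""
    ax = np.abs(x)
    return np.where(ax <= 0.5, 0.75 - ax ** 2,
           np.where(ax <= 1.5, 0.5 * (1.5 - ax) ** 2, 0.0))

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def log_so2(R):
    """Logarithmic map on SO(2): rotation angle in (-pi, pi]."""
    return np.arctan2(R[1, 0], R[0, 0])

def bspline_so2(R, centers, coeffs, s):
    """B-spline on SO(2), cf. Eq. (11): f(R) = sum_i c_i B^2(log(R_i^{-1} R)/s)."""
    return sum(c * b2(log_so2(Ri.T @ R) / s)    # R_i^{-1} = R_i^T for rotations
               for Ri, c in zip(centers, coeffs))

# usage: 4 basis functions uniformly covering SO(2), scale = grid spacing
thetas = np.linspace(0.0, 2.0 * np.pi, 4, endpoint=False)
centers = [rot(t) for t in thetas]
print(bspline_so2(rot(0.4), centers, coeffs=[1.0, 0.5, -0.3, 0.8], s=np.pi / 2))
```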

(Figure 2 panels: rows of individual B-spline basis functions and their sums, for bases with N = 8, 6, 50, 500, and 5000 basis functions.)
Figure 2: Left: The sum of all B-spline basis functions adds up to one, illustrating partition of unity on the 2D rotation group $SO(2)$ (row 1), the scaling/dilation group $\mathbb{R}^+$ (row 2), and the sphere treated as the quotient group $S^2 = SO(3)/SO(2)$, with B-spline centers indicated with green dots (rows 3-5). Right: A random B-spline on $SO(2)$ (row 1) and $\mathbb{R}^+$ (row 2) and reconstruction of a color texture on the sphere at several scales (rows 3-5) to illustrate multi-scale properties.


Figure 3: A B-spline on $\mathbb{R}^2$ (row 1), sampled on a grid (row 2), and a B-spline on the sphere (row 3). From left to right: a localized kernel, a kernel scaled by increasing the scale and the spacing between centers, an atrous kernel, and a deformable kernel. A green circle is drawn around each B-spline center, with a radius proportional to the scale, to indicate the individual basis functions.

Examples of B-splines on Lie groups are given in Fig. 2. In this paper we choose to expand convolution kernels on $G = \mathbb{R}^d \rtimes H$ as the tensor product of B-splines on $\mathbb{R}^d$ and $H$ respectively and obtain functions $k: G \to \mathbb{R}$ via

$$k(g) = k((\mathbf{x}, h)) = \sum_{i=1}^{N_{\mathbb{R}^d}} \sum_{j=1}^{N_H} c_{ij} \, B^n\left(\frac{\mathbf{x} - \mathbf{x}_i}{s_{\mathbb{R}^d}}\right) B^n\left(\frac{\log(h_j^{-1} \cdot h)}{s_H}\right). \qquad (12)$$

Note that one could also directly define B-splines on $G$ via (11), however, this splitting ensures we can use a regular Cartesian grid on the $\mathbb{R}^d$ part. In our experiments we use B-splines as in (12) and consider the coefficients $c_{ij}$ as trainable parameters, while the centers ($\mathbf{x}_i$ and/or $h_j$) and scales ($s_{\mathbb{R}^d}$ and/or $s_H$) are fixed by design. Some design choices are the following (and illustrated in Fig. 3).

Global vs localized uniform B-splines The notion of a uniform B-spline globally covering $H$ exists only for a small set of Lie groups, e.g. for any 1D group and abelian groups, and it is not possible to construct uniform B-splines on Lie groups in general due to non-zero commutators. Nevertheless, we find that it is possible to construct approximately uniform B-splines either by constructing a grid of centers on $H$ that approximately uniformly covers $H$, e.g. by using a repulsion model in which the distance between any two grid points is maximized (as is done in Fig. 2), or by specifying a uniform localized grid on the Lie algebra and obtaining the centers via the exponential map. The latter approach is in fact possible for any Lie group and leads to a notion of localized convolution kernels that have a finite support on $H$, see Fig. 3.
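
A small sketch of the second construction (our own illustration, using SciPy's rotation-vector exponential map and arbitrary grid settings): a uniform grid on a small cube in the Lie algebra of $SO(3)$ is pushed through the exponential map, yielding B-spline centers that localize the kernel around the identity.

```python
import numpy as np
from scipy.spatial.transform import Rotation  # exp map: rotation vector -> SO(3)

def localized_so3_centers(extent=0.5, n_per_axis=3):
    """Uniform grid of rotation vectors in [-extent, extent]^3 (the Lie algebra),
    mapped to SO(3) via the exponential map; these serve as B-spline centers of
    a localized kernel around the identity."""
    axis = np.linspace(-extent, extent, n_per_axis)
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), -1).reshape(-1, 3)
    return [Rotation.from_rotvec(v).as_matrix() for v in grid]

centers = localized_so3_centers()
print(len(centers), centers[0].shape)   # 27 centers, each a 3x3 rotation matrix
```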

Atrous B-splines Atrous convolutions, i.e. convolutions with sparse kernels defined by weights interleaved with zeros (holschneider_real-time_1990), are commonly used to increase the effective receptive field size and add a notion of scale to deep CNNs (yu_multi-scale_2016; chen_deeplab:_2018). Atrous convolution kernels can be constructed with B-splines by fixing the scale factors $s_{\mathbb{R}^d}$ and $s_H$, e.g. to the grid size, and increasing the distance between the center points $\mathbf{x}_i$ and $h_j$.

Non-uniform/deformable B-splines In non-uniform B-splines the centers $\mathbf{x}_i$ and $h_j$ do not necessarily need to lie on a regular grid. Then, deformable CNNs, first proposed by dai_deformable_2017, are obtained by treating the centers as trainable parameters. For B-spline CNNs on $\mathbb{R}^d$ of order $n = 1$ this in fact leads to the deformable convolution layers as defined in (dai_deformable_2017).

Modular design The design of G-correlation layers (Eqs. (5-7)) using B-spline kernels (Eqs. (10-12)) results in a generic and modular construction of G-CNNs that are equivariant to Lie groups $G = \mathbb{R}^d \rtimes H$ and only requires a few group-specific definitions (see examples in App. B): the group structure of $H$ (group product and inverse), the action of $H$ on $\mathbb{R}^d$ (together with the group structure of $H$ this automatically defines the structure of $G$), and the logarithmic map on $H$.
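
To indicate how little group-specific code this requires, here is a minimal sketch of such definitions for $H = (\mathbb{R}^+, \cdot)$ (the class name and interface are our own choice; the paper does not prescribe an API):

```python
import numpy as np

class ScalingGroup:
    """The group-specific ingredients of the modular design, for H = (R^+, *):
    group product, inverse, logarithmic map, and the action of H on R^d.
    Everything else (the structure of G = R^d x| H, representations, B-spline
    kernels) can be derived from these, as described in the text."""

    def product(self, a, b):        # group product
        return a * b

    def inverse(self, a):           # group inverse
        return 1.0 / a

    def log(self, a):               # logarithmic map H -> Lie algebra (= R)
        return np.log(a)

    def action(self, a, x):         # action of H on R^d: isotropic scaling
        return a * np.asarray(x)

H = ScalingGroup()
assert np.isclose(H.product(2.0, H.inverse(2.0)), 1.0)
print(H.action(2.0, [1.0, -3.0]), H.log(4.0))
```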

4 Experiments

4.1 Roto-translation CNNs

Data The PatchCamelyon (PCam) dataset (veeling_rotation_2018) consists of 327,680 RGB patches taken from histopathologic scans of lymph node sections and is derived from Camelyon16 (ehteshami_bejnordi_diagnostic_2017). The patches are binary labeled for the presence of metastasis. The classification problem is truly rotation invariant as image features appear under arbitrary rotations at all levels of abstraction, e.g. from edges (low-level) to individual cells to the tissue (high-level).

Experiments G-CNNs ensure roto-translation equivariance both locally (low-level) and globally (high-level) and invariance is achieved by means of pooling. In our experiments we test the performance of roto-translation G-CNNs (with $G = SE(2)$) against a 2D baseline and investigate the effect of different choices (local, global, atrous) for defining the kernels on the $SO(2)$-part of the network, cf. Eq. (12) and Fig. 3. Each network has the same architecture (detailed in App. D) but the kernels are sampled with a varying number of rotations on $SO(2)$ and with varying resolution of the B-splines, which is achieved by varying $s_H$ and the number of basis functions on $SO(2)$. Each network has approximately the same number of trainable weights.

The results are summarized in Fig. 4. Sampling only a single rotation coincides with standard 2D convolutions (our baseline). A result labeled "dense" with 16 rotations and 8 basis functions means the convolution kernels are rotated 16 times and the kernels are expanded in a B-spline basis with 8 basis functions that fully cover $SO(2)$. The label "local" means the basis is localized, with a fixed spacing between neighbouring basis functions tied to the grid resolution. Atrous kernels are spaced equidistantly on $SO(2)$.

Results We generally observe that a finer sampling of rotations leads to better results up until a point after which results slightly degrade. This is in line with findings in (bekkers_roto-translation_2018). The degradation after this point could be explained by overfitting; there is a limit on the resolution of the signal generated by rotating 5x5 convolution kernels; at some point the splines are described in more detail than the data and thus an unnecessary number of coefficients is trained. One could still benefit from sampling coarse kernels (few basis functions) on a fine grid (many rotations), e.g. compare the cases with a fixed number of basis functions. This is in line with findings in (weiler_learning_2018) where a fixed circular harmonic basis is used. Generally, atrous kernels tend to outperform dense kernels, as do the localized kernels in the low-sampling regime.


Figure 4: Left: results of roto-translation G-CNNs on tumor classification (PCam dataset). Right: results of scale-translation G-CNNs on landmark localization (CelebA dataset).

4.2 Scale-translation CNNs

Data The CelebA dataset (liu_deep_2015) contains 202,599 RGB images of celebrities of varying size, together with labels for attributes (hair color, glasses, hat, etc.) and 5 annotated facial landmarks (2 eyes, 1 nose, 2 corners of the mouth). We reformatted the data as follows. All images are isotropically scaled to a maximum width or height of 128 and if necessary padded in the other dimension with zeros to obtain a size of 128x128. For each image we took the distance between the eyes as a reference for the size of the face and categorized each image into above and below average size. For each unique celebrity with at least 1 image per class, we randomly sampled 1 image per class. The final dataset consists of 17,548 images of 128x128 of 8,774 celebrities with faces at varying scales. Each image is labeled with 5 heatmaps constructed by sampling a Gaussian with standard deviation 1.5 centered around each landmark.
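
A rough sketch of this preparation step (our own illustration; details such as the padding side, the interpolation mode, and the use of PIL are assumptions not specified above):

```python
import numpy as np
from PIL import Image

def preprocess(img, landmarks, size=128, sigma=1.5):
    """Rescale isotropically so the longest side equals `size`, zero-pad to
    size x size, and render one Gaussian heatmap per landmark."""
    scale = size / max(img.size)                      # img.size == (width, height)
    new_w, new_h = int(round(img.width * scale)), int(round(img.height * scale))
    img = img.resize((new_w, new_h), Image.BILINEAR)
    canvas = Image.new("RGB", (size, size))           # zero padding
    canvas.paste(img, (0, 0))
    pts = np.asarray(landmarks, dtype=float) * scale  # landmark (x, y) coordinates
    ys, xs = np.mgrid[0:size, 0:size]
    heatmaps = np.stack([np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
                         for x, y in pts])
    return np.asarray(canvas, dtype=np.float32) / 255.0, heatmaps
```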

Experiments We train a scale-translation G-CNN (with $G = \mathbb{R}^2 \rtimes \mathbb{R}^+$) with different choices for the kernels. The "dense" networks have kernels defined over the whole discretization of $\mathbb{R}^+$ and thus consider interactions between features at all scales. The "local" networks consider only interaction between neighbouring scales via localized kernels, or no scale interaction at all. Either way, each G-CNN is a multi-scale network in which kernels are applied at a range of scales. We compared against a 2D baseline with fixed-scale kernels which we tested for several scales. In the G-CNNs, $\mathbb{R}^+$ is uniformly sampled (w.r.t. the metric on $\mathbb{R}^+$) on a fixed scale range. Each G-CNN is sampled with the same resolution in $\mathbb{R}^+$, and each B-spline basis function is centered on the discrete grid. We note that the discretization of $\mathbb{R}^+$ is formally no longer a group as it is not closed, however, the group structure still applies locally. The result is that information may leak out of the domain in a similar way as happens spatially in standard zero-padded 2D CNNs (translational G-CNNs), in which the discretized domain of translations is also no longer (locally) compact. This information leak can be circumvented by using localized convolution kernels along the scale axis, as is also done in (worrall_deep_2019).

Results Fig. 4 summarizes the results. By testing our 2D baseline at several scales we observe that there is an optimal scale that gives the best trade-off given the scale variations in the data. This set of experiments is also used to rule out the idea that G-CNNs outperform the 2D baseline simply because they have a larger effective receptive field size. For large scale ranges the G-CNNs start to outperform 2D CNNs as these networks consider both small and large scale features (multi-scale behavior). Comparing the differences between the G-CNNs we observe that neighbouring scale interactions, encoded via localized kernels on $\mathbb{R}^+$ ("local"), outperform all-scale interactions ("dense"). This finding is in line with those in (worrall_deep_2019).

5 Conclusion

This paper presents a flexible framework for building G-CNNs for arbitrary Lie groups. The proposed B-spline basis functions, which are used to represent convolution kernels, have unique properties that cannot be achieved by classical Fourier based basis functions. Such properties include the construction of localized, atrous, and deformable convolution kernels. We experimentally demonstrated the added value of localized and atrous group convolutions on two different applications, considering two different groups. In particular in experiments with scale-translation G-CNNs, kernel localization led to improved results. The B-spline basis functions can be considered as smooth pixels on Lie groups and they enable us to design G-CNNs using familiar notions from classical CNN design (localized, atrous, and deformable convolutions). Future work will focus on exploring these options further in new applications that could benefit from equivariance constraints, which now becomes available for a large class of transformation groups via the proposed Lie group B-splines.

Acknowledgments

Remco Duits (Eindhoven University of Technology) is gratefully acknowledged for his contributions to the formulation and proof of Thm. 1 and for helpful discussions on Lie groups. This work is part of the research programme VENI with project number 17290, which is (partly) financed by the Dutch Research Council (NWO).

References

Appendix A Proof of Theorem 1

The following proves the three sub-items of Thm. 1.

  1. It follows from the Dunford-Pettis theorem, see e.g. (arendt_integral_1994, Thm 1.3), (kantorovich_functional_1982, Ch 9, Thm 5), or (duits_perceptual_2005, Thm 1), that if the operator $\mathcal{K}$ is linear and bounded it is an integral operator.

  2. The left-equivariance constraint then imposes bi-left-invariance of the kernel as follows, where and :

    (13)

    Since (13) should hold for all we obtain

    (14)

    Furthermore, since acts transitively on we have that such that and thus

    for every such that with arbitrary fixed origin .

  3. Every homogeneous space of can be identified with a quotient group . Choose an origin s.t. , i.e., , then

We further remark that when the quotient is taken with respect to the trivial subgroup $\{e\}$, with $e$ the identity element of $G$, the symmetry constraint of Eq. (4) vanishes. Thus, in order to construct equivariant maps without constraints on the kernel, the functions should be lifted to the group $G$.

Appendix B Examples of Lie groups

In the following sub-sections some explicit examples of Lie groups $H$ are given, together with their actions on $\mathbb{R}^d$ and the logarithmic maps. The required tools for building B-spline based G-CNNs for Lie groups of the form $G = \mathbb{R}^d \rtimes H$ are then automatically derived from these core definitions. E.g., the action $\odot$ of $H$ on a space $X$ defines a left-regular representation on functions on $X$ via

$$(\mathcal{L}_h f)(\mathbf{x}) = f(h^{-1} \odot \mathbf{x}).$$

When $X$ is the group itself, the action equals the group product. The group structure of semi-direct product groups $G = \mathbb{R}^d \rtimes H$ is automatically derived from the action of $H$ on $\mathbb{R}^d$, see Eq. (1), and is in turn used to define the representations (see Eq. (2)). Some examples are given below.

B.1 Translation group $(\mathbb{R}^d, +)$

The group of translations is given by the space $\mathbb{R}^d$ of translation vectors with the group product and inverse given by

$$g_1 \cdot g_2 = \mathbf{x}_1 + \mathbf{x}_2, \qquad g^{-1} = -\mathbf{x},$$

with $g_1 = \mathbf{x}_1$, $g_2 = \mathbf{x}_2$ and $g = \mathbf{x} \in \mathbb{R}^d$. The identity element is $e = \mathbf{0}$. The left-regular representation on $d$-dimensional functions produces translations of $f$ via

$$(\mathcal{L}_{\mathbf{x}} f)(\mathbf{x}') = f(\mathbf{x}' - \mathbf{x}).$$

The logarithmic map is simply given by

$$\log(\mathbf{x}) = \mathbf{x}.$$

B.2 The 2D rotation group $SO(2)$

The special orthogonal group $SO(2)$ consists of all orthogonal $2 \times 2$ matrices with determinant 1, i.e., rotation matrices of the form

$$\mathbf{R}_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},$$

and the group product and inverse are given by the matrix product and matrix inverse:

$$\mathbf{R}_{\theta_1} \cdot \mathbf{R}_{\theta_2} = \mathbf{R}_{\theta_1 + \theta_2}, \qquad \mathbf{R}_\theta^{-1} = \mathbf{R}_\theta^T = \mathbf{R}_{-\theta},$$

with $\mathbf{R}_{\theta_1}, \mathbf{R}_{\theta_2}, \mathbf{R}_\theta \in SO(2)$. The identity element is $\mathbf{R}_0 = \mathbf{I}$. The action of $SO(2)$ on $\mathbb{R}^2$ is given by matrix-vector multiplication:

$$\mathbf{R}_\theta \odot \mathbf{x} = \mathbf{R}_\theta \, \mathbf{x},$$

with $\mathbf{x} \in \mathbb{R}^2$. The left-regular representations are then

$$(\mathcal{L}_{\mathbf{R}_\theta} f)(\mathbf{x}) = f(\mathbf{R}_\theta^{-1} \mathbf{x}), \qquad (\mathcal{L}_{\theta} f)(\theta') = f(\theta' - \theta \;\mathrm{mod}\; 2\pi).$$

Note that the latter representation, in terms of the rotation parameters, represents a periodic shift. The determinant of the Jacobian of the action of $SO(2)$ on $\mathbb{R}^2$, see Corollary 1, is $1$. The logarithmic map on $SO(2)$ is given by the matrix logarithm

$$\log \mathbf{R}_\theta = \theta \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},$$

which in terms of the Lie algebra basis $A_1 = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ gives a vector with coefficient $\theta \in (-\pi, \pi]$. The B-spline basis functions, centered around each $\mathbf{R}_{\theta_i}$ with scale $s$, as depicted in Fig. 2, are thus computed via

$$B^n\left(\frac{\log(\mathbf{R}_{\theta_i}^{-1} \mathbf{R}_\theta)}{s}\right).$$

B.3 Scaling group $\mathbb{R}^+$

We call the positive real line $\mathbb{R}^+$, together with multiplication, the scaling group. The group product and inverse are given by

$$g_1 \cdot g_2 = a_1 a_2, \qquad g^{-1} = a^{-1},$$

with $g_1 = a_1$, $g_2 = a_2$ and $g = a \in \mathbb{R}^+$. The identity element is $e = 1$. The action of $\mathbb{R}^+$ on $\mathbb{R}^d$ is given by scalar multiplication

$$a \odot \mathbf{x} = a \, \mathbf{x},$$

with $\mathbf{x} \in \mathbb{R}^d$. The determinant of the Jacobian of this action is $a^d$. The logarithmic map on $\mathbb{R}^+$ is provided by the natural logarithm as follows

$$\log(a) = \ln(a).$$

The B-spline basis functions, centered around each $a_i$ with scale $s$, as depicted in Fig. 2, are thus computed via

$$B^n\left(\frac{\ln(a_i^{-1} a)}{s}\right) = B^n\left(\frac{\ln a - \ln a_i}{s}\right).$$

B.4 The 3D rotation group $SO(3)$

The 3D rotation group $SO(3)$ is given by the space of orthogonal $3 \times 3$ matrices with determinant 1, with the group product and inverse given by the matrix product and matrix inverse:

$$\mathbf{R}_1 \cdot \mathbf{R}_2 = \mathbf{R}_1 \mathbf{R}_2, \qquad \mathbf{R}^{-1} = \mathbf{R}^T.$$

The action of $SO(3)$ on $\mathbb{R}^3$ is given by matrix-vector multiplication

$$\mathbf{R} \odot \mathbf{x} = \mathbf{R} \, \mathbf{x},$$

with $\mathbf{x} \in \mathbb{R}^3$.

The logarithmic map from the group to the Lie algebra is given by the matrix logarithm and the resulting matrix can be expanded in a basis for the Lie algebra

$$\log(\mathbf{R}) = a^1 A_1 + a^2 A_2 + a^3 A_3,$$

with

$$A_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix}, \quad A_3 = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$

which at the origin represent an infinitesimal rotation around the $x$, $y$, and $z$ axis respectively. A cardinal B-spline centered at some $\mathbf{R}_i \in SO(3)$ with scale $s$ can then be computed in terms of these coefficients via $B^n\left(\frac{\log(\mathbf{R}_i^{-1} \mathbf{R})}{s}\right)$.
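
A minimal NumPy sketch of this logarithmic map (our own illustration; it assumes rotation angles strictly below $\pi$ and does not handle the boundary cases a robust implementation would need):

```python
import numpy as np

def log_so3(R, eps=1e-8):
    """Logarithmic map on SO(3): coefficients (a^1, a^2, a^3) of log(R) in the
    basis A_1, A_2, A_3 above, i.e. the rotation vector of R."""
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if angle < eps:                        # R is (numerically) the identity
        return np.zeros(3)
    # coefficients read off from the skew-symmetric part of R
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return angle / (2.0 * np.sin(angle)) * w

# usage: a rotation of 0.7 rad around the z-axis gives approx. (0, 0, 0.7)
Rz = np.array([[np.cos(0.7), -np.sin(0.7), 0.0],
               [np.sin(0.7),  np.cos(0.7), 0.0],
               [0.0,          0.0,         1.0]])
print(log_so3(Rz))
```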

In practice it is often convenient to rely on a parameterization of the group and define the group structure in terms of these parameters. A common choice is to do this via ZYZ Euler angles via

$$\mathbf{R}_{\alpha,\beta,\gamma} = \mathbf{R}_{\mathbf{e}_z,\alpha} \, \mathbf{R}_{\mathbf{e}_y,\beta} \, \mathbf{R}_{\mathbf{e}_z,\gamma},$$

with $\mathbf{R}_{\mathbf{a},\phi}$ a rotation of $\phi$ around a reference axis $\mathbf{a}$, and $\alpha, \gamma \in [0, 2\pi)$, $\beta \in [0, \pi]$. A Haar measure in terms of this parameterization is then given by $\mathrm{d}\mu(\alpha,\beta,\gamma) = \sin\beta \, \mathrm{d}\alpha \, \mathrm{d}\beta \, \mathrm{d}\gamma$. We will use this parameterization in the construction of the quotient group next.

B.5 The 2-sphere $S^2$

The 2-sphere is defined as $S^2 = \{\mathbf{x} \in \mathbb{R}^3 \;|\; \|\mathbf{x}\| = 1\}$. Any point $\mathbf{n}$ on the sphere can be obtained by rotating a reference vector $\mathbf{e}_z = (0,0,1)^T$ with elements of $SO(3)$, i.e., $\mathbf{n} = \mathbf{R} \, \mathbf{e}_z$. In other words, the group $SO(3)$ acts transitively on $S^2$. In the ZYZ Euler angle parameterization of $SO(3)$ all angles $\gamma$ leave the reference vector in place, meaning that for each $\mathbf{n} \in S^2$ we have several $\mathbf{R} \in SO(3)$ that map to the same $\mathbf{n}$. As such, we can treat $S^2$ as the quotient group $SO(3)/SO(2)$, where $SO(2)$ refers to the sub-group of rotations around the $z$-axis.

In order to define B-splines on the 2-sphere we need a logarithmic map from a point in $S^2$ to the (Euclidean) tangent vector space at the origin. We will construct this logarithmic map using the $\log$ defined for $SO(3)$. Let us parameterize the sphere with

$$\mathbf{n}(\alpha, \beta) = \mathbf{R}_{\mathbf{e}_z,\alpha} \, \mathbf{R}_{\mathbf{e}_y,\beta} \, \mathbf{e}_z.$$

Any rotation $\mathbf{R}_{\alpha,\beta,\gamma}$ with arbitrary $\gamma$ maps $\mathbf{e}_z$ to the same $\mathbf{n}(\alpha,\beta)$. As such, there are also many vectors in the Lie algebra that map to a suitable rotation matrix via the exponential map. We aim to find the vector in the Lie algebra whose third component (the $z$-rotation coefficient) vanishes, which via the exponential map generates torsion-free exponential curves. The $\log$ of any such rotation with a suitable choice of $\gamma$ results in such a vector (portegies_new_2015). As such we define

which maps any point in $S^2$ to a 2-dimensional vector space. A B-spline on $S^2$ can then be defined via

(15)

in which the individual spline basis functions are centered around points $\mathbf{n}_i \in S^2$.
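
A small NumPy sketch of such a logarithmic map on the sphere (our own construction, consistent with the description above: it returns the in-plane rotation vector, with vanishing $z$-component, of the torsion-free rotation taking the reference vector $\mathbf{e}_z$ to $\mathbf{n}$; the antipodal point is not handled):

```python
import numpy as np

def log_s2(n, eps=1e-8):
    """Map a point n on the 2-sphere to a 2D vector in the tangent plane at the
    reference vector e_z = (0, 0, 1)."""
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)
    e_z = np.array([0.0, 0.0, 1.0])
    axis = np.cross(e_z, n)                 # rotation axis lies in the x-y plane
    s = np.linalg.norm(axis)
    if s < eps:                             # n coincides with e_z
        return np.zeros(2)
    angle = np.arctan2(s, n[2])             # angle between e_z and n
    v = angle * axis / s                    # rotation vector; its z-component is 0
    return v[:2]

# usage: the point (1, 0, 0) maps to approx. (0, pi/2)
print(log_s2([1.0, 0.0, 0.0]))
```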

We remark that the group product $\mathbf{R}_{\alpha,\beta,\gamma}$ generates different rotations when varying $\gamma$, that however still map $\mathbf{e}_z$ to the same $\mathbf{n}$. The vectors obtained by taking the $\log$ of these rotation matrices rotate with the choice for $\gamma$. Since the B-splines are approximately isotropic we neglect this effect and simply fix $\gamma$ in Eq. (15). Finally, we remark that the superposition of shifted splines (as in Eq. (15)) is not isotropic by construction, which is desirable when using the spline as a convolution kernel to lift functions to $SO(3)$. When constraining G-CNNs to generate feature maps on $S^2$, the kernels are constrained to be isotropic. Alternatively one could stay on $S^2$ entirely and resort to gauge-equivariant networks (cohen_gauge_2019), for which the proposed splines are highly suited to move from the discrete setting (as in (cohen_gauge_2019)) to the continuous setting, see also App. C.2. For examples of splines on $S^2$ see Figs. 2 and 3.

Appendix C Related Work

C.1 Deep scale-spaces

C.1.1 Scale space lifting and correlations

In (worrall_deep_2019) images are lifted to a space of positions and scale parameters by constructing a scale space via

with . The kernels and images are sampled on a discrete grid. Let be the support of the kernel. Then the discrete scale space correlation is given by (worrall_deep_2019, Eq. (19))