1 Introduction
Invariance to irrelevant factors of variability is a desirable property of machine learning models, in particular for medical image analysis problems in which models are expected to generalize to unseen shapes, appearances, or arbitrary orientations. For example, histopathology image analysis problems require processing a digital slide of a stained specimen whose global orientation is arbitrary. Indeed, in the preparation workflow of histology slides, resection of the tissue is done arbitrarily and local structures within the section can have any three-dimensional orientation. In this context, models whose output varies with the orientation of the input constitute a source of uncertainty. The output of such image analysis systems should be rotation invariant, meaning that the output of a model should not change when its input is rotated.
Convolutional Neural Networks (CNNs) are the method of choice to solve complex image analysis tasks, in part due to the translation covariance induced by trainable convolution operators. In theory, this structure allows CNNs to learn features in any orientation given sufficient capacity. For example, if a specific edge detector is a relevant filter for the task at hand, it is expected that the CNN learns this filter in all possible directions. Typical solutions to obtain rotation invariance consist in augmenting the dataset by generating additional randomly rotated samples, with the expectation that the model will learn the relevant features that are artificially observed under these additional orientations. Although data augmentation is a way to induce an invariance prior, such approaches do not guarantee that conventional CNNs are rotation invariant. Furthermore, with such approaches it is common practice to average the predictions of the trained model on a set of rotated inputs at test time: this can increase the robustness of the model, but it comes at the cost of computational overhead.
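The test-time averaging mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any specific study: `toy_model` is a hypothetical stand-in for a trained classifier with a scalar output, and only the four rotations that lie on the pixel grid are averaged.

```python
import numpy as np

def predict_with_rotation_averaging(model, image):
    """Average a scalar prediction over the four 90-degree rotations of `image`."""
    predictions = [model(np.rot90(image, k)) for k in range(4)]
    return float(np.mean(predictions))

def toy_model(image):
    # Hypothetical stand-in for a trained model; deliberately NOT rotation invariant.
    weights = np.arange(image.size, dtype=float).reshape(image.shape)
    return float(np.sum(weights * image))

rng = np.random.default_rng(0)
image = rng.normal(size=(5, 5))
rotated = np.rot90(image)

# Change of the prediction when the input is rotated by 90 degrees.
plain_gap = abs(toy_model(image) - toy_model(rotated))
averaged_gap = abs(predict_with_rotation_averaging(toy_model, image)
                   - predict_with_rotation_averaging(toy_model, rotated))
```

The averaged prediction is unchanged under 90-degree rotations of the input, but the model has to be evaluated four times per sample, which is the overhead referred to in the text.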
We propose to replace convolutions in CNNs by group convolutions using representations of the special Euclidean motion group SE(2) (roto-translation of a kernel) so as to explicitly encode the orientation of the learned features. This structure ensures that the learned representation is covariant/equivariant with the orientation of the input for rotations that lie on the pixel grid and, to some extent, for rotations that are off the pixel grid. We achieve orientation encoding at angular resolutions finer than 90 degrees via bilinear interpolation of the SE(2) convolution kernels. Finally, rotation invariance can be achieved via a projection operation with respect to the encoded orientation of the learned representation.
Contributions
This work builds upon our previous work presented at the MICCAI conference 2018 (Bekkers et al., 2018a). In addition to a more detailed description of the proposed framework, we now present a comparative analysis of models with different angular discretization levels of the SE(2)-image representations. Here we focus on three types of histopathology image analysis problems (mitosis detection, nuclei segmentation and tumor classification), for which we conduct experiments on popular and realistic benchmark datasets. With this we also show that the SE(2)-image representations can be integrated in other classical CNN architectures such as U-net (Ronneberger et al., 2015). Finally, in a new series of in-depth experimental analyses we show an increased robustness of the proposed GCNNs compared to standard CNNs with respect to rotational variations in the data. This includes a quantitative and qualitative assessment of rotational invariance of the trained networks, as well as a data regime analysis in which we investigate the effect of increased angular resolution when the data availability is reduced.
2 Rotation Invariance, Related Work, and Contributions
2.1 Rotation Invariance via GCNNs
We distinguish between invariance and equivariance/covariance as follows. An artificial neural network (NN) is invariant with respect to certain transformations when the output of the network does not change under transformations of the input. We call a NN equivariant, or covariant (terminology changes between fields of study such as mathematics, physics and machine learning, and the terms often refer to the same concept; following custom in machine learning research we will use the term equivariance), when the output transforms in a predictable way when the input is transformed (we formalize this statement in Subsec. 3.2). The property of equivariance guarantees that no information is lost when the input is transformed. Standard CNNs are equivariant to translations: if the input is translated the output translates accordingly and we do not need to worry about learning how to deal with translated inputs. It turns out that group convolution layers are the only type of linear NN layers that are guaranteed to be equivariant (see e.g. (Bekkers, 2019, Thm. 1)) and that the standard convolution layer is a special case that is translation equivariant. In this paper, we construct SE(2) equivariant group convolution layers and with them build GCNNs with which we solve problems in histopathology that require rotation invariance.
Nowadays, rotation invariance is often still dealt with via data augmentation. In such an approach the data is rotated during training while keeping the target label fixed, thereby aiming for the network to learn how to classify input samples regardless of their orientation. Downsides of this approach are that 1) valuable network capacity is spent on learning geometric behavior at the cost of descriptive representation learning, 2) rotation invariance is not guaranteed, and 3) augmentation only captures geometric invariance globally. GCNNs solve these problems by hard-coding geometric structure into the network architecture such that 1) geometric behavior does not have to be learned, 2) rotation invariance is guaranteed by construction, and 3) each group convolution layer achieves local equivariance on its own, so that global equivariance is still obtained when the layers are stacked.
The local-to-global equivariance property means that GCNNs recognize low-level features (e.g. edges), mid-level features (e.g. individual cells), and high-level features (e.g. tissue structure) independent of their orientations. In this paper we experimentally show that equivariant GCNNs indeed solve all three aforementioned problems and that in fact the added geometric structure leads to networks that significantly outperform classical CNNs trained with data augmentation.
2.2 Related Work on GCNNs
2.2.1 GCNN Methods
In the seminal work by Cohen and Welling (2016) a framework is proposed for group equivariant CNNs. In GCNNs, the convolution operator is redefined in terms of actions of a transformation group, and by consistent use of the group structure (rules for concatenating transformations) equivariance is ensured. They showed a significant performance gain of GCNNs over classical CNNs; however, the practical applicability was limited to discrete transformation groups that leave the pixel grid intact (such as 90-degree rotations and reflections). Subsequent work in the field focused on expanding the class of transformation groups that are suitable for GCNNs by:

Working with a grid that has more symmetries than the standard Cartesian grid (Hoogeboom et al., 2018).

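Expanding convolution kernels in a special basis, tailored to the transformation group of interest, which makes it possible to build steerable CNNs (Worrall et al., 2017).

Transforming (unconstrained) convolution kernels via interpolation (Bekkers et al., 2018a).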
Extensions to 3D transformation groups are described in (Worrall and Brostow, 2018; Winkels and Cohen, 2019; Weiler et al., 2018; Andrearczyk et al., 2019), generalizations to equivariance beyond roto-translations are described in (Bekkers, 2019; Worrall and Welling, 2019), extensions to spherical data are described in (Cohen et al., 2018a; Kondor and Trivedi, 2018; Thomas et al., 2018; Esteves et al., 2018a), and additional theoretical results and further generalizations of GCNNs are described in (Cohen et al., 2018b; Kondor and Trivedi, 2018; Cohen et al., 2019). Applications of GCNN methods in medical image analysis are discussed below in Subsec. 2.2.4.
Although the first of the above generalizations elegantly enables an exact implementation of GCNNs of roto-translations with a finer resolution than the rotation angles of Cohen and Welling (2016), it is a very specific approach that does not generalize well to other groups. The second approach does not require sampling transformed kernels at all, but works exclusively by manipulations of basis coefficients, in a similar way as standard 2D convolutions (and translations) can be described in the Fourier domain. This approach however requires careful bookkeeping of the coefficients, only optimizes over kernels expressible by the basis, and limits the choice of nonlinear activation functions. In this paper we rely on the third approach. We build upon our previous work (Bekkers et al., 2018a) and use bilinear interpolation to efficiently transform (unconstrained) convolution kernels. This allows us to build SE(2) equivariant GCNNs at arbitrary angular resolutions.
2.2.2 Rotation Equivariant Machine Learning
Prior to, and in parallel with, the GCNN methods discussed above, group convolution methods for pattern recognition have been proposed that, at the time, were not regarded as GCNNs or were not treated in the full generality of (end-to-end) deep learning. For example, Gens and Domingos (2014) redefine the convolution operator and construct sparse (approximative) group convolution layers that are used to build what they called deep symmetry networks. Scattering convolution networks, as proposed by Mallat (2012), involve a concatenation of separable group convolutions with well-designed hand-crafted filters followed by the modulus as activation function. Other examples are orientation score based template matching (Bekkers et al., 2015), cyclic symmetry networks (Dieleman et al., 2016), oriented response networks (Zhou et al., 2017), and vector field networks (Marcos et al., 2017), which can all be considered instances of roto-translation equivariant GCNNs.
Other techniques that focus on equivariance properties of CNNs work via transformations of input feature maps, rather than transformations of convolution kernels as in GCNNs, and are closely related to spatial transformer networks (Jaderberg et al., 2015). These methods include warped CNNs (Henriques and Vedaldi, 2017), polar transformer networks (Esteves et al., 2018b), and equivariant transformer networks (Tai et al., 2019). Although these methods describe elegant and efficient ways for achieving (global) equivariance, they often break translation equivariance and local symmetries as the transformations act globally on the whole inputs.
2.2.3 Group Theory in Medical Image Analysis
Equivariance constraints and group theory take a prominent position in the mathematical foundations of “classical” image analysis, e.g., in scale space and wavelet theory. In medical image analysis, group theoretical algorithms make it possible to respect natural equivariance constraints and to deal with context and the complex geometries that are abundant in medical images. Examples of group theoretical techniques, closely related to GCNNs, are orientation score (Duits et al., 2007; Janssen et al., 2018) methods such as crossing preserving vessel enhancement based on gauge theory on Lie groups (Franken and Duits, 2009; Hannink et al., 2014; Duits et al., 2016), vessel and nerve fiber enhancement (in diffusion imaging) via group convolutions with Gaussian (derivative) kernels (Duits and Franken, 2011; Zhang et al., 2015; Portegies et al., 2015), and anatomical landmark recognition via group convolutions (Bekkers, 2019). In other, non-convolutional methods in medical image analysis, group theory provides a powerful tool to deal with symmetries and geometric structure, such as in statistical shape atlases (Hefny et al., 2015), shape matching (Hou et al., 2018), registration (Arsigny et al., 2006; Ashburner, 2007) and in general in statistics on non-Euclidean data structures (Pennec et al., 2019). Following this successful line of geometry-driven methods in medical image analysis, we propose in this paper to rely on GCNNs to solve tasks in histopathology in an end-to-end learning setting.
2.2.4 GCNNs in Medical Image Analysis
For many medical image analysis tasks, the location, reflection or orientation of objects of interest should not affect the output of the developed models. Although typical solutions rely on data augmentation, several studies investigated GCNNs in the context of medical image analysis to leverage this prior into building equivariant models that outperform classical CNNs.
In Winkels and Cohen (2018, 2019); Andrearczyk et al. (2019), GCNNs were used to detect pulmonary nodules in CT scans. GCNNs were also investigated for segmentation tasks in dermoscopy images (Li et al., 2018), retinal images (Bekkers et al., 2018a) and microscopy images (Bekkers et al., 2018a; Chidester et al., 2019a; Graham et al., 2019). Chidester et al. (2019b) proposed a variation of GCNNs for the classification of subcellular protein localization in microscopy images.
Rotation-equivariant models have been shown to be particularly efficient for problems in histopathology images, at cell level for mitosis detection (Bekkers et al., 2018a) and nuclei segmentation (Chidester et al., 2019a), and at higher tissue levels for tumor classification in lymph node sections (Veeling et al., 2018) and gland-lumen segmentation in colon histology images (Graham et al., 2019).
3 Material and Methods
We evaluate the proposed framework on three relevant histopathology image analysis tasks: mitosis detection, nuclei segmentation, and patch-based tumor classification. In this section, we first describe the benchmark datasets corresponding to the analysis tasks, which we used to train and evaluate the models. We then describe the relationship between the proposed framework and group theory, and our proposed implementation via bilinear interpolation of rotated convolution kernels.
3.1 Datasets
We chose three popular benchmark datasets of hematoxylin-eosin stained histological slides, in order to assess the performances of the proposed framework and its variants in a controlled and reproducible setup. In these datasets, we assume that the orientation of the objects of interest is irrelevant for the classification task. Therefore we hypothesize that any bias in the orientation information captured by a non-rotation-invariant CNN could be reflected in its performance on the selected benchmarks. This hypothesis will be experimentally confirmed in Sect. 5.
Mitosis Detection
We used the public dataset AMIDA13 (Veta et al., 2015) that consists of high-power field (HPF) images (resolution ) from breast cancer cases. Eight cases (458 mitotic figures) were used to train the models and four cases (92 mitoses) for validation. Evaluation is performed on a test set of independent cases (533 mitoses), following the evaluation procedure of the AMIDA13 challenge; for details see (Veta et al., 2015).
Multi-Organ Nuclei Segmentation
We used the subset of the public multi-organ dataset introduced by Kumar et al. (2017), that consists of HPF images (resolution ), selected from WSIs of four different tissue types (Breast, Liver, Kidney and Prostate), provided by The Cancer Genome Atlas (Network and others, 2012), associated with mask annotations of nucleus instances. We used the balanced dataset split proposed in (Lafarge et al., 2019): HPF images for training (7337 nuclei), HPF images for validation (1474 nuclei) and HPF images for testing (4130 nuclei). Given the high staining variability of the dataset, all the images were stain normalized using the method described in (Macenko et al., 2009).
Patch-Based Tumor Classification
We used the public PCam dataset introduced by Veeling et al. (2018), that consists of image patches (resolution ), selected from WSIs of lymph node sections derived from the Camelyon16 Challenge (Ehteshami Bejnordi et al., 2017). The patches are balanced across the two classes (benign or malignant), based on the tumor area provided in (Ehteshami Bejnordi et al., 2017), and we used the dataset split proposed by Veeling et al. (2018).
Data Regime Analysis
In order to study the behavior of the compared models when data availability is reduced, we analyzed the performances under different data regimes, by using reduced versions of the training sets. We constructed:

Three variations of the mitosis dataset by sequentially removing two cases out of the original eight.

Two variations of the nuclei dataset by sequentially removing one HPF image per organ out of the original three HPF images per organ.

Four variations of the patch-based tumor dataset by randomly removing , , and in each class subset of the training data.
3.2 Group Representation in CNNs
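3.2.1 The Roto-Translation Group SE(2)
A group is a mathematical structure that consists of a set $G$, for example a collection of transformations, together with a binary operator $\cdot$ called the group product that satisfies four fundamental properties: Closure: for all $g, h \in G$ we have $g \cdot h \in G$; Identity: there exists an identity element $e \in G$; Inverse: for each $g \in G$ there exists an inverse element $g^{-1} \in G$ such that $g^{-1} \cdot g = g \cdot g^{-1} = e$; and Associativity: for each $g, h, i \in G$ we have $(g \cdot h) \cdot i = g \cdot (h \cdot i)$.
The group product essentially describes how two consecutive transformations, e.g. by $g, h \in G$, result in a single net transformation $g \cdot h \in G$. Here, we consider the group of roto-translations, denoted by $SE(2) = \mathbb{R}^2 \rtimes SO(2)$ (it is the semi-direct product, denoted by $\rtimes$, of the group of planar translations $\mathbb{R}^2$ and rotations $SO(2)$, i.e., it is not the direct product since the rotation part acts on the translations in the group product (1) of $SE(2)$), which consists of the set of all planar translations (in $\mathbb{R}^2$) and rotations (in $SO(2)$), together with the group product given by
$$g \cdot g' = (\mathbf{x}, \mathbf{R}_\theta) \cdot (\mathbf{x}', \mathbf{R}_{\theta'}) = (\mathbf{R}_\theta \mathbf{x}' + \mathbf{x}, \mathbf{R}_{\theta + \theta'}), \tag{1}$$
with group elements $g = (\mathbf{x}, \mathbf{R}_\theta), g' = (\mathbf{x}', \mathbf{R}_{\theta'}) \in SE(2)$, with translations $\mathbf{x}, \mathbf{x}' \in \mathbb{R}^2$ and planar rotations by $\theta, \theta' \in [0, 2\pi)$. The group acts on the space of positions and orientations $\mathbb{R}^2 \times S^1$ via
$$g \cdot (\mathbf{x}', \theta') = (\mathbf{R}_\theta \mathbf{x}' + \mathbf{x}, \theta + \theta').$$
Since $(\mathbf{x}, \mathbf{R}_\theta) \cdot (\mathbf{0}, 0) = (\mathbf{x}, \theta)$, we can identify the group $SE(2)$ with the space of positions and orientations $\mathbb{R}^2 \times S^1$. As such we will often write $g = (\mathbf{x}, \theta)$, instead of $(\mathbf{x}, \mathbf{R}_\theta)$. Note that $g^{-1} = (-\mathbf{R}_\theta^{-1} \mathbf{x}, -\theta)$ since $g \cdot g^{-1} = g^{-1} \cdot g = (\mathbf{0}, 0)$.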
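The roto-translation group product and inverse described above can be checked numerically. The following is a minimal sketch, assuming group elements are stored as pairs of a translation vector and an angle; the function names are illustrative and not taken from the authors' implementation.

```python
import numpy as np

def rotation(theta):
    """2x2 planar rotation matrix R_theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def product(g, h):
    """Group product (x, theta) . (x', theta') = (R_theta x' + x, theta + theta')."""
    (x, theta), (x_prime, theta_prime) = g, h
    return (rotation(theta) @ x_prime + x, (theta + theta_prime) % (2 * np.pi))

def inverse(g):
    """Inverse element g^{-1} = (-R_theta^{-1} x, -theta)."""
    x, theta = g
    return (-rotation(-theta) @ x, (-theta) % (2 * np.pi))

def is_close(g, h, tol=1e-9):
    """Compare two group elements; angles are compared on the circle."""
    angle_gap = np.angle(np.exp(1j * (g[1] - h[1])))
    return bool(np.allclose(g[0], h[0]) and abs(angle_gap) < tol)

identity = (np.zeros(2), 0.0)
g = (np.array([1.0, 2.0]), 0.3)
h = (np.array([-0.5, 4.0]), 1.1)
k = (np.array([3.0, 0.0]), 5.9)
```

With these definitions, `product(g, inverse(g))` returns the identity up to floating-point error, and the product is associative but not commutative, because the rotation part of the left factor acts on the translation part of the right factor.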
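3.2.2 Group Representations
The structure of the group can be mapped to other mathematical objects (such as 2D images) via representations. Representations of a group $G$ are linear transformations $\mathcal{R}_g$, parameterized by group elements $g \in G$, that transform vectors, e.g. signals/images $f \in \mathbb{L}_2(X)$ on a space $X$, and which share the group structure via
$$\mathcal{R}_g \circ \mathcal{R}_h = \mathcal{R}_{g \cdot h}.$$
We use different symbols for the representations of $SE(2)$ on different types of data structures. In particular, we write $\mathcal{U}_g$ for the left-regular representation of $SE(2)$ on 2D images $f \in \mathbb{L}_2(\mathbb{R}^2)$, and it is given by
$$(\mathcal{U}_g f)(\mathbf{x}') = f(\mathbf{R}_\theta^{-1}(\mathbf{x}' - \mathbf{x})), \tag{2}$$
with $g = (\mathbf{x}, \theta) \in SE(2)$ and $\mathbf{x}' \in \mathbb{R}^2$. It corresponds to a roto-translation of the image. We write $\mathcal{L}_g$ for the left-regular representation on functions $F \in \mathbb{L}_2(SE(2))$ on $SE(2)$, which we refer to as $SE(2)$-images, and it is given by
$$(\mathcal{L}_g F)(g') = F(g^{-1} \cdot g') = F(\mathbf{R}_\theta^{-1}(\mathbf{x}' - \mathbf{x}), \theta' - \theta), \tag{3}$$
with $g = (\mathbf{x}, \theta), g' = (\mathbf{x}', \theta') \in SE(2)$. In Sec. 3.3 we define the GCNN layers in terms of these representations.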
3.2.3 Equivariance
Given the above definitions, we can formalize the notion of equivariance. An operator $\Phi : \mathbb{L}_2(X) \to \mathbb{L}_2(Y)$ is equivariant with respect to a group $G$ if
$$\Phi \circ \mathcal{R}'_g = \mathcal{R}''_g \circ \Phi, \tag{4}$$
with $\mathcal{R}'_g$ and $\mathcal{R}''_g$ representations of $G$ on functions on the domains $X$ and $Y$, respectively. That is, if we transform the input by $\mathcal{R}'_g$, then we know that the output transforms via $\mathcal{R}''_g$. To ensure that linear operators $\Phi$ maintain the equivariance property (4), they must be defined in terms of representations of $G$, that is, via group convolutions (see e.g. (Bekkers, 2019, Thm. 1), (Duits, 2005, Thm. 21), or (Cohen et al., 2018b, Thm. 6.1)).
3.3 SE(2) Group Convolutional Network Layers
3.3.1 Notation and 2D Convolution Layers
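In the following we denote the space of multi-channel feature maps on a domain $X$ by $(\mathbb{L}_2(X))^{N_c}$, with $N_c$ the number of channels. The feature maps themselves are denoted by $\underline{f} = (f_1, \dots, f_{N_c})$, with each channel $f_c \in \mathbb{L}_2(X)$. The inner product between such feature maps on $X$ is denoted by
$$(\underline{k}, \underline{f})_{(\mathbb{L}_2(X))^{N_c}} := \sum_{c=1}^{N_c} (k_c, f_c)_{\mathbb{L}_2(X)},$$
with $(k, f)_{\mathbb{L}_2(X)} = \int_X k(x) f(x) \, \mathrm{d}x$ the standard inner product between real-valued functions on $X$. With these notations, the classical 2D cross-correlation operator $\star_{\mathbb{R}^2}$ (in CNNs one can take a convolution or a cross-correlation viewpoint and, since these operators simply relate via a kernel reflection, the terminology is often used interchangeably; we take the second viewpoint, and our GCNNs are implemented using cross-correlations) can be defined in terms of inner products of the input feature map $\underline{f}$ with translated convolution kernels $\underline{k}$ via
$$(\underline{k} \star_{\mathbb{R}^2} \underline{f})(\mathbf{x}) := (\mathcal{T}_{\mathbf{x}} \underline{k}, \underline{f})_{(\mathbb{L}_2(\mathbb{R}^2))^{N_c}} = \sum_{c=1}^{N_c} \int_{\mathbb{R}^2} k_c(\mathbf{x}' - \mathbf{x}) f_c(\mathbf{x}') \, \mathrm{d}\mathbf{x}', \tag{5}$$
with $\mathcal{T}_{\mathbf{x}}$ the translation operator, the left-regular representation of the translation group $(\mathbb{R}^2, +)$. It is well known that convolution layers $\Phi$, mapping between 2D feature maps (i.e. functions on $\mathbb{R}^2$), are equivariant with respect to translations. That is, in Eq. (4) we let $\mathcal{R}'_g = \mathcal{R}''_g = \mathcal{T}_{\mathbf{x}}$ be the left-regular representation of the translation group with $g = \mathbf{x} \in \mathbb{R}^2$.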
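Translation equivariance of the cross-correlation can be verified on a discrete grid. The sketch below is an illustration under simplifying assumptions: a single channel, an odd-sized square kernel, and periodic boundary conditions (so that translations are exact); the function names are hypothetical.

```python
import numpy as np

def cross_correlate(kernel, image):
    """(k * f)(x) = sum_x' k(x' - x) f(x'), single channel, periodic boundaries."""
    radius = kernel.shape[0] // 2
    out = np.zeros(image.shape, dtype=float)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # np.roll by (-dy, -dx) evaluates the image at x + (dy, dx).
            shifted = np.roll(image, shift=(-dy, -dx), axis=(0, 1))
            out += kernel[dy + radius, dx + radius] * shifted
    return out

def translate(image, t):
    """Translation operator: (T_t f)(x) = f(x - t)."""
    return np.roll(image, shift=t, axis=(0, 1))

rng = np.random.default_rng(0)
f = rng.normal(size=(8, 8))
k = rng.normal(size=(3, 3))
t = (2, -3)

# Equivariance: correlating a translated image equals translating the correlation.
lhs = cross_correlate(k, translate(f, t))
rhs = translate(cross_correlate(k, f), t)
```

Here `lhs` and `rhs` agree, which is the discrete counterpart of Eq. (4) with both representations equal to the translation operator.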
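3.3.2 Roto-Translation Equivariant Convolution Layers
Next we define two types of convolution layers that are equivariant with respect to roto-translations. We do so simply by replacing the translation operator in Eq. (5) with a representation of $SE(2)$. When the input is a 2D feature map we need to rely on the representation $\mathcal{U}_g$ of $SE(2)$ on 2D images (Eq. (2)), and define the lifting correlation:
$$(\underline{k} \,\tilde{\star}\, \underline{f})(g) := (\mathcal{U}_g \underline{k}, \underline{f})_{(\mathbb{L}_2(\mathbb{R}^2))^{N_c}} = \sum_{c=1}^{N_c} \int_{\mathbb{R}^2} k_c(\mathbf{R}_\theta^{-1}(\mathbf{x}' - \mathbf{x})) f_c(\mathbf{x}') \, \mathrm{d}\mathbf{x}', \tag{6}$$
with $g = (\mathbf{x}, \theta)$. These correlations lift 2D image data to data that lives on the 3D position-orientation space $\mathbb{R}^2 \times S^1 \equiv SE(2)$ by matching convolution kernels under all possible translations and rotations.
We define the lifting layer, recall Fig. 1, as an operator $\Phi^{(l)}$ that maps a 2D feature map $\underline{f}^{(l-1)}$ with $N_{l-1}$ channels to an $SE(2)$ feature map $\underline{F}^{(l)}$ with $N_l$ channels via lifting correlations with a collection of $N_l$ kernels, denoted with $\underline{k}^{(l)} := (\underline{k}^{(l)}_1, \dots, \underline{k}^{(l)}_{N_l})$, each kernel with $N_{l-1}$ channels, via
$$\underline{F}^{(l)} = \Phi^{(l)}(\underline{f}^{(l-1)}) := \underline{k}^{(l)} \,\tilde{\star}\, \underline{f}^{(l-1)}, \tag{7}$$
where we overload the symbol $\tilde{\star}$ defined in Eq. (6) to also denote the lifting correlation between a set of convolution kernels and a vector-valued feature map via $\underline{k}^{(l)} \,\tilde{\star}\, \underline{f} := (\underline{k}^{(l)}_1 \,\tilde{\star}\, \underline{f}, \dots, \underline{k}^{(l)}_{N_l} \,\tilde{\star}\, \underline{f})$. Note that such operators are equivariant with respect to roto-translations when in (4) we let $\mathcal{R}'_g = \mathcal{U}_g$ and $\mathcal{R}''_g = \mathcal{L}_g$ be the representations of $SE(2)$ given respectively in (2) and (3); indeed $\Phi^{(l)} \circ \mathcal{U}_g = \mathcal{L}_g \circ \Phi^{(l)}$.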
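The lifting correlation and its equivariance can be illustrated for the four rotations that lie on the pixel grid. This is a sketch under stated assumptions, not the authors' implementation: a single input and output channel, periodic boundaries, and `np.rot90` standing in for the kernel rotation (so no interpolation is involved). All function names are hypothetical.

```python
import numpy as np

def cross_correlate(kernel, image):
    """Single-channel cross-correlation with periodic boundaries."""
    radius = kernel.shape[0] // 2
    out = np.zeros(image.shape, dtype=float)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(image, shift=(-dy, -dx), axis=(0, 1))
            out += kernel[dy + radius, dx + radius] * shifted
    return out

def lift(kernel, image, n_theta=4):
    """Lifting correlation: match the image with every rotated copy of the kernel.

    Returns an array of shape (n_theta, height, width); axis 0 is the orientation axis.
    Only n_theta = 4 is supported here, because np.rot90 is exact on the pixel grid.
    """
    assert n_theta == 4
    return np.stack([cross_correlate(np.rot90(kernel, i), image) for i in range(n_theta)])

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))
kernel = rng.normal(size=(3, 3))

lifted = lift(kernel, image)
lifted_of_rotated = lift(kernel, np.rot90(image))

# Expected behaviour: each orientation plane rotates, and the planes shift
# cyclically along the orientation axis.
expected = np.roll(np.rot90(lifted, axes=(1, 2)), shift=1, axis=0)
```

Rotating the input therefore does not destroy information in the lifted feature map: the response is rotated spatially and shifted along the orientation axis, in line with the equivariance property stated for the lifting layer.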
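The lifting layer thus generates higher-dimensional feature maps on the space of roto-translations. An equivariant layer that takes such feature maps as input is then again obtained by taking inner products of the input feature map with (3D) roto-translated convolution kernels $\underline{K}$, where the kernels are transformed by application of the representation $\mathcal{L}_g$ of $SE(2)$ on $\mathbb{L}_2(SE(2))$. Group correlations are then defined as
$$(\underline{K} \star \underline{F})(g) := (\mathcal{L}_g \underline{K}, \underline{F})_{(\mathbb{L}_2(SE(2)))^{N_c}} = \sum_{c=1}^{N_c} \int_{SE(2)} K_c(g^{-1} \cdot g') F_c(g') \, \mathrm{d}g'. \tag{8}$$
Note here that a rotation of an $SE(2)$ convolution kernel is obtained via a shift-twist: a planar rotation and a shift along the $\theta$-axis, see Eq. (3) and Fig. 1. The convolution kernels $\underline{K}$ are 3-dimensional and they assign weights to activations at positions and orientations relative to a central position and orientation (relative to $g \in SE(2)$). A set of $SE(2)$ kernels $\underline{K}^{(l)} := (\underline{K}^{(l)}_1, \dots, \underline{K}^{(l)}_{N_l})$ then defines a group convolution layer, which we denote with $\Phi^{(l)}$, and which maps from $SE(2)$ feature maps $\underline{F}^{(l-1)}$ at layer $l-1$, with $N_{l-1}$ channels, to $SE(2)$ feature maps $\underline{F}^{(l)}$ at layer $l$, with $N_l$ channels, via
$$\underline{F}^{(l)} = \Phi^{(l)}(\underline{F}^{(l-1)}) := \underline{K}^{(l)} \star \underline{F}^{(l-1)}, \tag{9}$$
where we overload the group correlation symbol $\star$, defined in (8), to also denote correlation between a set of convolution kernels and a vector-valued feature map on $SE(2)$ via
$$\underline{K}^{(l)} \star \underline{F} := (\underline{K}^{(l)}_1 \star \underline{F}, \dots, \underline{K}^{(l)}_{N_l} \star \underline{F}).$$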
Finally, we define the projection layer as the operator that projects a multi-channel $SE(2)$ feature map $\underline{F}^{(l-1)}$ back to $\mathbb{R}^2$ via
$$\underline{f}^{(l)}(\mathbf{x}) := \frac{1}{2\pi} \int_0^{2\pi} \underline{F}^{(l-1)}(\mathbf{x}, \theta) \, \mathrm{d}\theta. \tag{10}$$
Here we define the projection layer as taking the mean over the orientation axis; however, we note that any permutation invariant operator (on the $\theta$-axis) could be used to ensure local rotation invariance, such as the commonly used max operator (Cohen and Welling, 2016; Bekkers et al., 2018a).
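The group correlation with its shift-twist kernel rotation, followed by a projection over the orientation axis, can be sketched for the discrete case of four orientations. The code below is an illustration under assumptions (single channel, periodic boundaries, `np.rot90` instead of interpolation); names such as `rotate_se2_kernel` are hypothetical.

```python
import numpy as np

N_THETA = 4  # orientations on the pixel grid (assumption of this sketch)

def cross_correlate(kernel, image):
    """Single-channel planar cross-correlation with periodic boundaries."""
    radius = kernel.shape[0] // 2
    out = np.zeros(image.shape, dtype=float)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(image, shift=(-dy, -dx), axis=(0, 1))
            out += kernel[dy + radius, dx + radius] * shifted
    return out

def rotate_se2_kernel(kernel, i):
    """Shift-twist: planar rotation by i * 90 degrees plus a cyclic shift along theta."""
    return np.roll(np.rot90(kernel, k=i, axes=(1, 2)), shift=i, axis=0)

def group_correlate(kernel, feature_map):
    """kernel: (N_THETA, n, n); feature_map: (N_THETA, height, width)."""
    planes = []
    for i in range(N_THETA):
        k_i = rotate_se2_kernel(kernel, i)
        planes.append(sum(cross_correlate(k_i[j], feature_map[j]) for j in range(N_THETA)))
    return np.stack(planes)

def project(feature_map, reduce=np.mean):
    """Projection layer: permutation-invariant reduction over the orientation axis."""
    return reduce(feature_map, axis=0)

def rotate_se2_image(feature_map):
    """Action of a 90-degree rotation on an SE(2,4) feature map."""
    return np.roll(np.rot90(feature_map, axes=(1, 2)), shift=1, axis=0)

rng = np.random.default_rng(0)
F = rng.normal(size=(N_THETA, 8, 8))
K = rng.normal(size=(N_THETA, 3, 3))

out = group_correlate(K, F)
out_of_rotated = group_correlate(K, rotate_se2_image(F))
```

In this setting `out_of_rotated` equals `rotate_se2_image(out)`, and after projection only a planar rotation remains, so any subsequent spatially global pooling yields a rotation-invariant output. Passing `reduce=np.max` gives the max-projection variant.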
3.4 Discretized SE(2,N) Group Convolutional Network
Discretized 2D images are supported on a bounded subset of $\mathbb{Z}^2$ and the kernels live on a spatially rectangular grid of size $n \times n$ in $\mathbb{Z}^2$, with $n$ the kernel size. We discretize the group as $SE(2,N) := \mathbb{Z}^2 \rtimes SO(2,N)$, with the space of 2D rotations in $SO(2)$ sampled with $N$ rotation angles $\theta_i = \frac{2\pi}{N} i$, with $i = 0, \dots, N-1$.
The discrete lifting kernels $\underline{k}^{(l)}$ at layer $l$ are used to map a 2D input image with $N_{l-1}$ channels to an $SE(2,N)$ image with $N_l$ channels, and thus have a shape of $n \times n \times N_{l-1} \times N_l$ (the discretization of the orientation axis is illustrated in Fig. 1 as a set of rotated kernels, distributed on a circle). Likewise, the $SE(2,N)$ kernels $\underline{K}^{(l)}$ have a shape of $n \times n \times N \times N_{l-1} \times N_l$.
The lifting and group convolution layers require rotating the spatial part of the kernels, and additionally shifting along the $\theta$-axis for the $SE(2,N)$ kernels. We obtain the rotated spatial parts of each kernel via bilinear interpolation. The discretization of a single lifting kernel and its rotated versions is illustrated in the top-left part of Fig. 1. The discretization of a single group correlation kernel and its rotated and shifted versions is illustrated in the bottom part of Fig. 1.
In order to construct the rotated sets of effective lifting or group correlation kernels we rely on bilinear interpolation. We first define a set of vectors containing base weights that are used to generate rotated versions of the same 2D kernel via bilinear interpolation (which we implemented as a sparse matrix multiplication). Although these sets of rotated kernels are used in the computational pipeline, only the base weights are updated during the network optimization. By construction, the effective kernels are differentiable with respect to their base weights, enabling their update in backpropagation of gradients.
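Generating rotated kernels from base weights via an interpolation matrix can be sketched as follows. This is a simplified illustration, not the authors' code: it uses dense NumPy matrices where the text mentions a sparse multiplication, omits circular masking, treats samples outside the kernel support as zero, and picks a sign convention such that a 90-degree rotation coincides with `np.rot90`. The function names are hypothetical.

```python
import numpy as np

def rotation_matrices(n, n_theta):
    """One (n*n, n*n) bilinear-interpolation matrix per angle 2*pi*i/n_theta."""
    center = (n - 1) / 2.0
    mats = np.zeros((n_theta, n * n, n * n))
    for i in range(n_theta):
        theta = 2 * np.pi * i / n_theta
        c, s = np.cos(theta), np.sin(theta)
        for row in range(n):
            for col in range(n):
                y, x = row - center, col - center
                # Location in the base kernel that is sampled for this output pixel.
                src_x = c * x - s * y + center
                src_y = s * x + c * y + center
                x0, y0 = int(np.floor(src_x)), int(np.floor(src_y))
                for yy, xx in ((y0, x0), (y0, x0 + 1), (y0 + 1, x0), (y0 + 1, x0 + 1)):
                    weight = (1 - abs(src_x - xx)) * (1 - abs(src_y - yy))
                    if 0 <= yy < n and 0 <= xx < n and weight > 0:
                        mats[i, row * n + col, yy * n + xx] += weight
    return mats

def rotate_kernels(base_weights, mats):
    """Effective kernels of shape (n_theta, n, n) from one (n, n) set of base weights."""
    n = base_weights.shape[0]
    return (mats @ base_weights.reshape(-1)).reshape(-1, n, n)

rng = np.random.default_rng(0)
base = rng.normal(size=(5, 5))
rotated_4 = rotate_kernels(base, rotation_matrices(5, 4))
rotated_8 = rotate_kernels(base, rotation_matrices(5, 8))
```

Because the effective kernels are a fixed linear function of the base weights, gradients with respect to the base weights follow from multiplication with the transposed interpolation matrices, which is what makes training only the base weights possible.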
4 Experiments
In this section, we present the GCNN architectures that we build using the layers defined in Sec. 3.3 and we describe the experiments that we used to analyze and validate them. In the construction of the GCNNs we adhere to the following principle of group equivariant architecture design.
GCNN design principle
A sequence of layers starting with a lifting layer (Eq. (7)) and followed by one or more group convolution layers (Eq. (9)), possibly interleaved with point-wise nonlinearities, results in the encoding of roto-translation equivariant feature maps. If such a block is followed by a projection layer (Eq. (10)) then the entire block results in an encoding of features that is guaranteed to be rotationally invariant. Our implementation of the GCNN layers is available at https://github.com/tueimage/se2cnn.
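Table 1. SE(2,N) GCNN variants of the mitosis detection baseline (output dimensions per layer).
Layers | SE(2,N) Groups: N=1 () | N=4 (p4) | N=8 | N=16
Input |
Lifting Layer, BN + ReLU, MaxPool() | () | () | () | ()
Group Conv., BN + ReLU, MaxPool() | () | () | () | ()
Group Conv., BN + ReLU, MaxPool() | () | () | () | ()
Group Conv., BN + ReLU | () | () | () | ()
Group Conv., BN + ReLU | () | () | () | ()
Max. Proj. |
FC Layer, Sigmoid | ()
Total Weights |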
4.1 Applications and Model Architectures
For each task introduced in Sect. 3.1 we conducted two experiments: first, we trained a set of variations of a baseline CNN, by changing the orientation sampling level of their SE(2,N) layers, while keeping the total number of weights of each model approximately the same. Second, we trained each model with the reduced data regime counterparts of the training sets introduced in Sect. 3.1.
Mitosis Detection
We used the mitosis classification model originally described in Bekkers et al. (2018a) as a baseline: a 6-layer CNN with three down-sampling steps, such that the overall receptive field is of size .
We designed the GCNN variants of this baseline described in Table 1, by replacing the first convolution layer by a lifting layer, replacing the following convolution layers by group convolution layers and inserting a projection layer before the last fully connected layer.
The models were trained with batches of size balanced across classes. Non-mitosis class patches were sampled based on a hard negative mining procedure (Cireşan et al., 2013) using a first baseline model trained with random negative patches. The models were trained to minimize the cross-entropy of the binary-class predictions.
Nuclei Segmentation
For the nuclei segmentation task, we opted for a 7-layer U-net that corresponds to two spatial down/up-sampling operations with an overall receptive field of size . The sequence of operations defining this GCNN architecture is given in the first column of Table 2.
The label associated with each input image is a 3-class mask corresponding to the foreground, background and border of the nuclei it contains (these masks can then be used to retrieve individual nuclei using a segmentation procedure such as described in Sect. 5).
The models were trained with batches of size balanced across patients, to minimize the class-weighted cross-entropy of the softmax-activated output maps corresponding to the three target masks.
Tumor Classification
The baseline architecture we used for the tumor classification model is a 6-layer CNN with three down-sampling steps, such that the overall receptive field is of size (see Table 3 for the detailed architecture).
The models were trained with batches of size balanced across classes. We refined both classes by running a hard negative mining procedure (Cireşan et al., 2013) using a first baseline model trained with the original dataset of the benchmark. The models were trained to minimize the cross-entropy of the binary-class predictions.
Table 2. SE(2,N) GCNN variants of the nuclei segmentation U-net (output dimensions per layer).
Layers | SE(2,N) Groups: N=1 () | N=4 (p4) | N=8 | N=16
Input |
Lifting Layer, BN + ReLU, MaxPool() | () | () | () | ()
Group Conv., BN + ReLU, MaxPool() | () | () | () | ()
Group Conv., BN + ReLU | () | () | () | ()
Upsampling, Concat(HL.2), Group Conv., BN + ReLU | () | () | () | ()
Upsampling, Concat(HL.1), Group Conv., BN + ReLU | () | () | () | ()
Group Conv., BN + ReLU | () | () | () | ()
Max. Proj. |
FC Layer, Softmax | ()
Total Weights |
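Table 3. SE(2,N) GCNN variants of the tumor classification baseline (output dimensions per layer).
Layers | SE(2,N) Groups: N=1 () | N=4 (p4) | N=8 | N=16
Input |
Lifting Layer, BN + ReLU, MaxPool() | () | () | () | ()
Group Conv., BN + ReLU, MaxPool() | () | () | () | ()
Group Conv., BN + ReLU, MaxPool() | () | () | () | ()
Group Conv., BN + ReLU | () | () | () | ()
Group Conv., BN + ReLU | () | () | () | ()
Mean Proj. |
FC Layer, Sigmoid | ()
Total Weights |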
4.2 Implementation details
For all three baseline architectures, convolution kernels are of size with circular masking and fully connected layers are implemented as convolutional layers with kernels of shape to enable dense application (the resulting models can efficiently be applied on larger input sizes).
Batch Normalization (Ioffe and Szegedy, 2015) is used throughout the networks. Batch statistics are normally computed across the batch and spatial dimensions of the activations, but we also included the orientation axis of the SE(2,N)-image activation maps in the computation of the statistics to ensure their invariance with respect to the orientation of the input.
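The effect of including the orientation axis in the batch statistics can be illustrated with a small sketch. The axis layout (batch, height, width, orientation, channel) is an assumption of this example, and the learned scale and offset of Batch Normalization are omitted for brevity.

```python
import numpy as np

def batch_norm_se2(activations, eps=1e-5):
    """Normalize per channel, with statistics over batch, space AND orientation.

    activations: array of shape (batch, height, width, n_theta, channels).
    """
    axes = (0, 1, 2, 3)
    mean = activations.mean(axis=axes, keepdims=True)
    var = activations.var(axis=axes, keepdims=True)
    return (activations - mean) / np.sqrt(var + eps)

def rotate90(activations):
    """Action of a 90-degree rotation: rotate spatially, shift cyclically along theta."""
    return np.roll(np.rot90(activations, axes=(1, 2)), shift=1, axis=3)

rng = np.random.default_rng(0)
acts = rng.normal(loc=2.0, size=(2, 6, 6, 4, 3))

normalized_then_rotated = rotate90(batch_norm_se2(acts))
rotated_then_normalized = batch_norm_se2(rotate90(acts))
```

Since a rotation only permutes entries along the spatial and orientation axes, statistics pooled over those axes are unchanged, and the normalization commutes with the rotation. Statistics computed separately per orientation would not have this property in general.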
All models were trained with Stochastic Gradient Descent with momentum (learning rate , momentum) and an epoch-wise learning rate decay using a factor of was applied. Training was stopped after convergence of the loss computed on the validation sets. All models were regularized with decoupled weight decay (coefficient ). Baseline augmentation transformations were applied to the training image patches (random spatial transposition, random 90-degree-wise rotation, random channel-wise brightness shifting).
4.3 Experiment: Orientation Sampling
In order to assess the effect of using the proposed SE(2,N) GCNN structure on the benchmark performances, we trained every model with $N \in \{1, 4, 8, 16\}$. In order to allow a fair comparison we adjusted the number of channels in every layer involving SE(2,N)-image representations such that the total number of weights in the models stays close to the count of the corresponding baselines. The detailed distributions of the weights are shown in Tables 1, 2 and 3: for each SE(2,N) group, the dimensions of the output of the layers are shown with the format , with the number of output channels in the layer.
Each model was trained three times with random initialization seeds. We report the mean and standard deviation of the performances across the three random initializations.
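The channel adjustment can be made concrete with a rough weight count. The sketch below is an assumption-laden illustration: it assumes lifting kernels of shape n x n x C_in x C_out and group convolution kernels of shape n x n x N x C_in x C_out, uses a uniform channel width, and ignores circular masking, biases, normalization parameters and the fully connected layer. The function names and the example widths are hypothetical, not the values used in Tables 1 to 3.

```python
def count_weights(n, n_theta, channels, in_channels=3):
    """Kernel weights of one lifting layer followed by group convolution layers."""
    total = n * n * in_channels * channels[0]
    for c_in, c_out in zip(channels[:-1], channels[1:]):
        total += n * n * n_theta * c_in * c_out
    return total

def match_channels(n, n_theta, baseline_channels, in_channels=3):
    """Largest uniform width whose weight count stays within the N=1 baseline budget."""
    budget = count_weights(n, 1, baseline_channels, in_channels)
    depth = len(baseline_channels)
    width = 1
    while count_weights(n, n_theta, [width + 1] * depth, in_channels) <= budget:
        width += 1
    return width

baseline_total = count_weights(5, 1, [16, 16])   # illustrative two-layer baseline
matched_width = match_channels(5, 4, [16, 16])   # width for N=4 under the same budget
```

Because the group convolution weights grow linearly with N, the channel width has to shrink roughly with the square root of N to keep the total budget fixed.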
4.4 Experiment: Data Regime Experiments
In order to assess the effect of using the proposed SE(2,N) layers with varying sampling factor N when data availability is reduced, we trained each model on the data-regime subsets presented in Sect. 3.1. Likewise, each model was trained three times with random initialization seeds so as to report the variability of the performances.
5 Results
This section summarizes the qualitative and quantitative results of the experiments we conducted. Each trained model was evaluated on the test set of its corresponding benchmark dataset based on standard performance metrics.
Mitosis Detection
For the mitosis detection task, models were densely applied on test images, followed by a smoothing operation before extracting all local maxima to be considered candidate detections. We computed the F_{1}-score of the set of detections using an operating point that is optimized on the validation set, as described in the scoring protocol of Veta et al. (2015).
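The selection of the operating point can be sketched as follows. This is a simplified illustration rather than the scoring protocol itself: each candidate detection is assumed to already carry a flag stating whether it matches a ground-truth object, and the function names and validation values are hypothetical.

```python
def f1_score(tp, fp, fn):
    """F1-score from true positive, false positive and false negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_operating_point(candidates, n_ground_truth):
    """Pick the score threshold that maximizes the F1-score.

    candidates: list of (score, is_true_positive) pairs.
    n_ground_truth: number of annotated objects in the validation set.
    """
    best_thr, best_f1 = None, -1.0
    for thr in sorted({score for score, _ in candidates}):
        kept = [hit for score, hit in candidates if score >= thr]
        tp = sum(kept)
        fp = len(kept) - tp
        fn = n_ground_truth - tp
        f1 = f1_score(tp, fp, fn)
        if f1 > best_f1:
            best_thr, best_f1 = thr, f1
    return best_thr, best_f1

# Hypothetical validation candidates: (local-maximum score, matches ground truth?)
validation = [(0.95, True), (0.90, True), (0.80, False),
              (0.70, True), (0.40, False), (0.30, False)]
threshold, f1 = best_operating_point(validation, n_ground_truth=4)
print(threshold, f1)
```

The threshold found on the validation set is then kept fixed when the detections of the test set are scored.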
Nuclei Segmentation
To quantify the performances of the nuclei segmentation model, segmented candidate objects were generated following the protocol used in (Kumar et al., 2017; Lafarge et al., 2019). First, marker seeds are derived from thresholded foreground and background predictions, and border predictions are used as the watershed energy landscape. Then, candidate objects that overlap the nuclei ground-truth masks by at least 50% of their area are considered hits, which enables object-level detection to be quantified using the F_{1}-score. Thresholds to generate marker seeds were selected such that the F_{1}-score is maximized on the validation set.
Patch-based tumor classification
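The object-level hit criterion can be illustrated with a small sketch. Objects are represented as sets of pixel coordinates, and the 50% overlap is measured here relative to the area of the ground-truth mask; this reading of the criterion, the one-to-one greedy matching and the function names are assumptions made for the example.

```python
def count_hits(candidates, ground_truth, min_overlap=0.5):
    """Greedily match candidate objects to ground-truth masks.

    A candidate is a hit if it covers at least `min_overlap` of the area
    of a ground-truth mask that has not been matched yet (assumption).
    Returns (true positives, false positives, false negatives).
    """
    matched = set()
    tp = 0
    for cand in candidates:
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue
            if len(cand & gt) >= min_overlap * len(gt):
                matched.add(j)
                tp += 1
                break
    return tp, len(candidates) - tp, len(ground_truth) - tp

def square(row, col, size):
    """Helper: a filled square object as a set of (row, col) pixels."""
    return {(r, c) for r in range(row, row + size)
            for c in range(col, col + size)}

ground_truth = [square(0, 0, 4), square(10, 10, 4)]
candidates = [square(1, 0, 4),    # covers 12 of 16 pixels of the first nucleus
              square(12, 12, 4),  # covers only 4 of 16 pixels of the second
              square(20, 20, 3)]  # no overlap at all

tp, fp, fn = count_hits(candidates, ground_truth)
f1 = 2 * tp / (2 * tp + fp + fn)
print(tp, fp, fn, f1)
```

Measuring the overlap relative to the candidate area instead only requires replacing `len(gt)` by `len(cand)` in the comparison.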
To evaluate the tumor classification model, we computed the class probability of every patch of the test dataset and calculated the accuracy of the model given the ground-truth labels, as in Veeling et al. (2018), after selecting the operating point that maximizes the accuracy on the validation set.
5.1 Qualitative Results
We qualitatively investigated the robustness of the predictions of different models to controlled rotations of the input. The predictions of our best baseline model can be very inconsistent in comparison to those of the GCNN models (see Fig. 2, 3 and 4), in particular for cell or tissue morphologies that are typically asymmetric. For example, the mitotic figures (h) and (i) shown in Fig. 2 are in telophase (directed separation of the pair of chromosomes), and the variance of the prediction of the baseline model is higher for these cases (green curve) than for the GCNN models (blue and red curves). We also observe that, for the SE(2,4) model, predictions obtained for an input image rotated by an angle below π/2 rad also show some variance, but follow a cyclic pattern with a period of π/2 rad.
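The contrast between the two behaviors can be reproduced on a toy example. The sketch below is not one of the evaluated models: it compares the response of a single arbitrary filter (standing in for an anisotropic feature of a conventional CNN) with the maximum response over four rotated copies of that filter (standing in for a feature expressed in four orientations and pooled over the orientation axis). Only rotations by multiples of 90 degrees are used, since they can be applied exactly with `np.rot90`.

```python
import numpy as np

def response(patch, kernel):
    """Inner product of a kernel with an equally sized patch,
    i.e. a cross-correlation evaluated at a single position."""
    return float(np.sum(patch * kernel))

rng = np.random.default_rng(0)
patch = rng.normal(size=(5, 5))    # toy input patch
kernel = rng.normal(size=(5, 5))   # arbitrary anisotropic filter

rotated_inputs = [np.rot90(patch, k) for k in range(4)]

# Single-orientation feature: the response depends on the input orientation.
plain = [response(p, kernel) for p in rotated_inputs]

# Feature expressed in four orientations, then max-pooled over orientation.
pooled = [max(response(p, np.rot90(kernel, j)) for j in range(4))
          for p in rotated_inputs]

print("single filter :", np.round(plain, 3))
print("pooled over 4 :", np.round(pooled, 3))
```

For rotation angles that are not multiples of 90 degrees the input has to be interpolated, and exact invariance of the pooled response is no longer expected.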
5.2 Quantitative Results
The performances of the trained models for both orientation sampling experiments and data regime experiments are summarized in the box plots of Fig. 5, 6 and 7.
Effect of orientation sampling
For all three studied tasks, we observed an increase of performance with the number of sampled orientations from to . For the full data regime of the mitosis detection experiments, the use of an SE(2,8) GCNN improves the F_{1}-score to on average compared to for the baseline model without test-time rotation augmentation (see Fig. 5). A similar increase of performances is observed for the nuclei segmentation experiments with an improvement of the F_{1}-score from to (see Fig. 6), and for the tumor classification experiments with an improvement of the accuracy from to (see Fig. 7).
We remark that the performances of the SE(2,4) GCNN models are better than those of the baseline with test-time rotation augmentation, as was previously reported in the literature for similar tasks (Bekkers et al., 2018a; Veeling et al., 2018). We also report that, for all three tasks, the SE(2,16) GCNN models perform worse than the SE(2,8) GCNN models.
Effect of reduced data regime with orientation sampling
For all three tasks, we see a globally consistent decrease of performances when less training data is available. In Fig. 7, the performances of the SE(2,4) and SE(2,8) GCNN models trained with the 25%, 50% and 75% data regimes are higher than those of the baseline model at full data regime using test-time rotation augmentation. This reveals that, under these experimental conditions, data availability is not the only reason for limited performances, since the SE(2,N) GCNN models achieve higher performances than the baseline models even when less data is available.
6 Discussion and Conclusions
The presented study investigated the effects of embedding the SE(2) group structure in CNNs, in the context of histopathology image analysis, across multiple controlled experimental setups.
The comparative analysis we conducted shows a consistent increase of performances for three different histopathology image analysis tasks when using the proposed SE(2,N) GCNN architecture compared to conventional CNNs evaluated with test-time rotation augmentation. This is in line with previously reported results when using GCNNs with groups that lie on the pixel grid (p4, p4m) (Cohen and Welling, 2016; Veeling et al., 2018), but we also show that these performances can be surpassed when using groups with higher discretization levels of SE(2).
This confirms that conventional CNNs struggle to learn a rotation-equivariant representation from data alone and that enforcing equivariant representation learning enables reaching higher performances. GCNNs with SE(2,N) structure have the advantage of guaranteeing higher robustness to input orientation without requiring training-time or test-time rotation augmentation. Furthermore, the slight computational overhead for computing rotated convolutional operators and their gradients at training time can be canceled at test time by computing and fixing all final oriented SE(2,N) kernels, resulting in a model that is computationally equivalent to a conventional CNN.
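The idea of fixing the oriented kernels at test time can be sketched for a single lifting layer. The sketch below is an illustration under simplifying assumptions, not the implementation used in the experiments: it uses N = 4 so that the rotated kernels can be obtained exactly with `np.rot90` instead of interpolation, a naive valid cross-correlation in NumPy, and hypothetical function names. Group-convolution layers would additionally require a cyclic shift of the kernels along the orientation axis, which is not shown here.

```python
import numpy as np

def correlate2d_valid(image, kernels):
    """Plain multi-channel cross-correlation ('valid' padding).

    image: (C_in, H, W); kernels: (C_out, C_in, k, k) -> (C_out, H-k+1, W-k+1)
    """
    c_out, _, k, _ = kernels.shape
    h, w = image.shape[1] - k + 1, image.shape[2] - k + 1
    out = np.zeros((c_out, h, w))
    for i in range(h):
        for j in range(w):
            patch = image[:, i:i + k, j:j + k]
            out[:, i, j] = np.tensordot(kernels, patch,
                                        axes=([1, 2, 3], [0, 1, 2]))
    return out

def fixed_kernel_bank(base_kernels):
    """Stack the four 90-degree rotations of every base kernel:
    (C_out, C_in, k, k) -> (C_out * 4, C_in, k, k)."""
    rotated = [np.rot90(base_kernels, k=r, axes=(2, 3)) for r in range(4)]
    return np.stack(rotated, axis=1).reshape(-1, *base_kernels.shape[1:])

rng = np.random.default_rng(0)
image = rng.normal(size=(3, 12, 12))
base = rng.normal(size=(2, 3, 5, 5))

# Test time: the bank is computed once and then used like ordinary CNN filters.
bank = fixed_kernel_bank(base)
fast = correlate2d_valid(image, bank).reshape(2, 4, 8, 8)

# Reference: rotate the base kernels on the fly for every orientation.
slow = np.stack([correlate2d_valid(image, np.rot90(base, k=r, axes=(2, 3)))
                 for r in range(4)], axis=1)

print(np.allclose(fast, slow))
```

Once the bank is frozen, the layer is an ordinary convolution with C_out * N output channels, so no kernel rotation has to be performed during inference.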
These performances can be surpassed by representations with higher angular resolution, as shown by the experiments involving SE(2,8) GCNNs, provided that a sufficient amount of training data is available. This conclusion corroborates the results we reported on other medical image analysis tasks (Bekkers et al., 2018a) and in studies that investigated models with rotated operators that lie outside of the pixel grid (Hoogeboom et al., 2018).
However, we also identified consistently lower performances for SE(2,16) GCNNs compared to SE(2,8) GCNNs at full data regime. We assume that this phenomenon is in part related to the model architectures we chose to enforce a fixed model capacity, which reduce the number of channels in the representations of the SE(2,N) models when N increases. This reduced number of channels might affect the diversity of the features learned by the models, to the point that it limits their overall performances. It therefore appears that there is a trade-off between performances and angular resolution at fixed capacity; further work would be necessary to confirm this hypothesis.
For the tumor classification task, we observed that the performances of the baseline models (with or without test-time rotation augmentation) reached a plateau, regardless of whether 25%, 50%, 75% or 100% of the training data was available. This indicates that, in the conditions of the PCam dataset, the amount of available training data does not significantly influence the performances of the baseline models. However, the rotation-equivariant models were able to achieve better performances with increased data regime.
This behavior was not evidenced for the mitosis detection and nuclei segmentation experiments. We assume this result may be task-dependent, or may be due to the fact that the plateau of performances observed for the tumor classification models was not yet reached for the two other tasks.
We qualitatively showed that, in some cases, the predictions of conventional CNNs are inconsistent when inputs are rotated, whereas SE(2) GCNNs show better stability in that sense. This suggests that the anisotropic learned features of conventional models only get activated when the input is observed in a specific orientation. On the shown examples (Sect. 5.1), the SE(2) models are more robust to the input orientation since their SE(2) structure guarantees that the features are expressed in multiple orientations. We also see that SE(2) models with a limited angular resolution can still produce some variance for rotation angles lower than this resolution. This is also supported by the fact that higher performances were obtained for SE(2,8) models than for SE(2,4) models.
Still, variation of performances for these models was also observed when the input was rotated out of the pixel grid. We attribute this limitation to the approximation errors caused by two of the operators we used, which have a weaker rotation equivariance property. First, the interpolation-based computation of the rotated kernels can cause small variations in the output when the input is rotated. Second, the pooling operators are not rotation equivariant by construction (since they lie on fixed downsampled versions of the pixel grid), and so are another source of error.
In conclusion, we proposed a framework for SE(2) group-convolutional networks and showed its advantages for histopathology image analysis tasks. This framework enables the learned models to be invariant to the natural roto-translational symmetry of histology images. We showed that GCNN models whose representations have an SE(2) structure yield better performances than conventional CNNs, and our experiments suggest that GCNN models are able to fully exploit the amount of data in large datasets. Our results suggest the existence of a trade-off between network capacity and the chosen angular resolution of the SE(2,N) operators. Directions for future work include further analysis of the relationship between the newly introduced architecture-related hyperparameters and their effect on model performances, as well as studying other prior structures that can improve model stability to other families of input transformations.
References
 Exploring local rotation invariance in 3d cnns with steerable filters. In MIDL, pp. 15–26. Cited by: §2.2.1, §2.2.4.
 A Log-Euclidean framework for statistics on diffeomorphisms. In MICCAI, pp. 924–931. Cited by: §2.2.3.
 A fast diffeomorphic image registration algorithm. Neuroimage 38, pp. 95–113. Cited by: §2.2.3.

 Training of Templates for Object Recognition in Invertible Orientation Scores: Application to Optic Nerve Head Detection in Retinal Images. In Energy Minimization Methods in Computer Vision and Pattern Recognition, Vol. 8932, pp. 464–477. Cited by: §2.2.2.
 Roto-translation covariant convolutional networks for medical image analysis. In MICCAI, pp. 440–448. Cited by: §1, item 3, §2.2.1, §2.2.4, §2.2.4, §3.3.2, §4.1, §5.2, §6.
 B-Spline CNNs on Lie groups. arXiv preprint arXiv:1909.12057. Cited by: §2.1, §2.2.1, §2.2.3, §3.2.3.
 Template matching via densities on the roto-translation group. tPAMI 40, pp. 452–466. Cited by: item 3.
 Enhanced rotation-equivariant U-Net for nuclear segmentation. In CVPR Workshops, Cited by: §2.2.4, §2.2.4.
 Rotation equivariant and invariant neural networks for microscopy image analysis. Bioinformatics 35, pp. i530–i537. Cited by: §2.2.4.
 Mitosis detection in breast cancer histology images with deep neural networks. In MICCAI, pp. 411–418. Cited by: §4.1, §4.1.
 Spherical cnns. In ICLR, Cited by: §2.2.1.
 A General Theory of Equivariant CNNs on Homogeneous Spaces. arXiv preprint arXiv:1811.02017. Cited by: §2.2.1, §3.2.3.
 Gauge equivariant convolutional networks and the icosahedral cnn. In ICML, Cited by: §2.2.1.
 Group equivariant convolutional networks. In ICML, pp. 2990–2999. Cited by: §2.2.1, §2.2.1, §3.3.2, §6.
 Exploiting cyclic symmetry in convolutional neural networks. arXiv preprint arXiv:1602.02660. Cited by: §2.2.2.
 Locally Adaptive Frames in the Roto-Translation Group and Their Applications in Medical Imaging. Journal of Mathematical Imaging and Vision 56, pp. 367–402. Cited by: §2.2.3.
 Perceptual organization in image analysis. Ph.D. Thesis, Eindhoven University of Technology, the Netherlands. Cited by: §3.2.3.
 Image analysis and reconstruction using a wavelet transform constructed from a reducible representation of the Euclidean motion group. International Journal of Computer Vision 72, pp. 79–102. Cited by: §2.2.3.
 Left-invariant diffusions on the space of positions and orientations and their application to crossing-preserving smoothing of HARDI images. International Journal of Computer Vision 92, pp. 231–264. Cited by: §2.2.3.
 Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer. Jama 318, pp. 2199–2210. Cited by: §3.1.
 Learning SO(3) Equivariant Representations with Spherical CNNs. In ECCV, Cited by: §2.2.1.
 Polar Transformer Networks. In ICLR, Cited by: §2.2.2.
 Crossing-preserving coherence-enhancing diffusion on invertible orientation scores. International Journal of Computer Vision 85, pp. 253. Cited by: §2.2.3.
 Deep symmetry networks. In NeurIPS, pp. 2537–2545. Cited by: §2.2.2.
 Rotanet: rotation equivariant network for simultaneous gland and lumen segmentation in colon histology images. In ECDP, pp. 109–116. Cited by: §2.2.4, §2.2.4.
 Crossing-Preserving Multiscale Vesselness. In MICCAI, pp. 603–610. Cited by: §2.2.3.
 A liver atlas using the special euclidean group. In MICCAI, pp. 238–245. Cited by: §2.2.3.
 Warped convolutions: efficient invariance to spatial transformations. In ICML, pp. 1461–1469. Cited by: §2.2.2.
 HexaConv. In ICLR, Cited by: item 1, §6.

 Computing CNN Loss and Gradients for Pose Estimation with Riemannian Geometry. In MICCAI, pp. 756–764. Cited by: §2.2.3.
 Batch normalization: accelerating deep network training by reducing internal covariate shift. In ICML, pp. 448–456. Cited by: §4.2.
 Spatial transformer networks. In NeurIPS, pp. 2017–2025. Cited by: §2.2.2.
 Design and Processing of Invertible Orientation Scores of 3d Images. Journal of Mathematical Imaging and Vision, pp. 1–32. Cited by: §2.2.3.
 On the Generalization of Equivariance and Convolution in Neural Networks to the Action of Compact Groups. In ICML, pp. 2747–2755. Cited by: §2.2.1.
 A dataset and a technique for generalized nuclear segmentation for computational pathology. IEEE Transactions on Medical Imaging 36, pp. 1550–1560. Cited by: §3.1, §5.
 Learning domain-invariant representations of histological images. Frontiers in Medicine 6, pp. 162. Cited by: §3.1, §5.
 Deeply supervised rotation equivariant network for lesion segmentation in dermoscopy images. In OR 2.0 Context-Aware Operating Theaters, Computer Assisted Robotic Endoscopy, Clinical Image-Based Procedures, and Skin Image Analysis, pp. 235–243. Cited by: §2.2.4.
 A method for normalizing histology slides for quantitative analysis. In ISBI, pp. 1107–1110. Cited by: §3.1.
 Group invariant scattering. Communications on Pure and Applied Mathematics 65, pp. 1331–1398. Cited by: §2.2.2.
 Rotation Equivariant Vector Field Networks. In CVPR, pp. 5048–5057. Cited by: §2.2.2.
 Comprehensive molecular portraits of human breast tumours. Nature 490, pp. 61–70. Cited by: §3.1.
 Riemannian Geometric Statistics in Medical Image Analysis. Elsevier. Cited by: §2.2.3.
 Improving fiber alignment in HARDI by combining contextual PDE flow with constrained spherical deconvolution. PloS one 10. Cited by: §2.2.3.
 U-Net: convolutional networks for biomedical image segmentation. In MICCAI, pp. 234–241. Cited by: §1.
 Equivariant Transformer Networks. arXiv preprint arXiv:1901.11399. Cited by: §2.2.2.
 Tensor Field Networks: Rotation- and Translation-Equivariant Neural Networks for 3d Point Clouds. arXiv preprint arXiv:1802.08219. Cited by: §2.2.1.
 Rotation equivariant cnns for digital pathology. In MICCAI, pp. 210–218. Cited by: §2.2.4, §3.1, §5, §5.2, §6.
 Assessment of algorithms for mitosis detection in breast cancer histopathology images. Medical Image Analysis 20, pp. 237–248. Cited by: §3.1, §5.
 3d steerable cnns: learning rotationally equivariant features in volumetric data. In NeurIPS, pp. 10381–10392. Cited by: §2.2.1.
 Learning steerable filters for rotation equivariant cnns. In CVPR, Cited by: item 3.
 Pulmonary nodule detection in ct scans with equivariant cnns. Medical Image Analysis 55, pp. 15 – 26. Cited by: §2.2.1, §2.2.4.
 3d gcnns for pulmonary nodule detection. In MIDL, Cited by: §2.2.4.
 Cubenet: equivariance to 3d rotation and translation. In ECCV, pp. 567–584. Cited by: §2.2.1.
 Harmonic networks: deep translation and rotation equivariance. In CVPR, pp. 5028–5037. Cited by: item 2.
 Deep Scalespaces: Equivariance Over Scale. arXiv preprint arXiv:1905.11697. Cited by: §2.2.1.
 Robust and Fast Vessel Segmentation via Gaussian Derivatives in Orientation Scores. In ICIAP, pp. 537–547. Cited by: §2.2.3.
 Oriented response networks. In CVPR, pp. 4961–4970. Cited by: §2.2.2.