Geometric Operator Convolutional Neural Network

09/04/2018
by   Yangling Ma, et al.

The Convolutional Neural Network (CNN) has been successfully applied in many fields during recent decades; however, it lacks the ability to utilize prior domain knowledge when dealing with many realistic problems. We present a framework called the Geometric Operator Convolutional Neural Network (GO-CNN) that uses domain knowledge, wherein the kernel of the first convolutional layer is replaced with a kernel generated by a geometric operator function. This framework integrates many conventional geometric operators, which allows it to adapt to a diverse range of problems. Under certain conditions, we theoretically analyze the convergence and the bound of the generalization errors between GO-CNNs and common CNNs. Although the geometric operator convolution kernels have fewer trainable parameters than common convolution kernels, the experimental results indicate that GO-CNN performs more accurately than the common CNN on CIFAR-10/100. Furthermore, GO-CNN reduces dependence on the number of training examples and enhances adversarial stability. In the practical task of medically diagnosing bone fractures, GO-CNN obtains a 3% improvement in terms of recall.


1 Introduction

Convolutional Neural Networks have been successfully applied in many fields during recent decades, but the theoretical understanding of deep neural networks is still in its preliminary stages. Although Convolutional Neural Networks have strong expressive abilities, they have two clear deficiencies. First, as complex functional mappings, Convolutional Neural Networks behave like black boxes and cannot take full advantage of domain knowledge and prior information. Second, when little data is available for a certain task, the generalization ability of Convolutional Neural Networks weakens. This is due to overfitting, which may occur because of the large number of parameters and the large model size. Motivated by these two defects, a great deal of research has been done to modify CNNs [Dai et al.2017] [Wang et al.2018] [Sarwar, Panda, and Roy2017].

Before CNNs were applied, traditional geometric operators had already been developed quite well. Each geometric operator represents a distillation of domain knowledge and prior information. For example, the Sobel operator [Works] is a discrete difference operator, which can extract image edge information for edge detection. The Schmid operator [Schmid2001] is an isotropic circular operator, which extracts texture information from images for face recognition. The Histogram of Oriented Gradients (HOG) [Dalal and Triggs2005] is a statistical operator of gradient direction, which extracts edge direction distributions from images for pedestrian detection and other uses.

Many computer vision tasks require domain knowledge and prior information. Geometric operators can make use of domain knowledge and prior information, but cannot automatically change parameter values by learning from data. Convolutional Neural Networks have strong data expression and learning abilities, but they struggle to make use of domain knowledge. To learn from data more effectively, we combine the two. It is natural to directly use geometric operators for pre-processing, and then classify the data through a Convolutional Neural Network [Yao et al.2016]. However, this method uses human experience to select geometric operator parameter values, and then carries out the Convolutional Neural Network learning separately. It is a two-stage technique, and because it does not reduce parameter redundancy in the Convolutional Neural Network, it is difficult to achieve global optimization. The method proposed in this paper directly constructs geometric operator convolutions and then integrates them into a Convolutional Neural Network to form a new framework: the Geometric Operator Convolutional Neural Network. This method achieves global optimization and utilizes the properties of geometric operators.


In summary, the contributions of this work are as follows:

  • This framework can integrate many conventional geometric operators, which reveals its broad customization capabilities when handling diverse problems.

  • In theory, the same approximation accuracy and generalization error bounds are achieved when geometric operators meet certain conditions.

  • The Geometric Operator Convolutional Neural Network not only reduces the redundancy of the parameters, but also reduces the dependence on the amount of the training samples.

  • The Geometric Operator Convolutional Neural Network enhances adversarial stability.

We organize the remaining sections as follows. We first briefly introduce related work in Sec. 2. In Sec. 3, we describe our framework for the Geometric Operator Convolutional Neural Network, and in Sec. 4, we introduce theoretical analyses. Experiments and conclusions are presented in Sec. 5 and Sec. 6, respectively.

2 Related Work

In recent years, Convolutional Neural Networks have been widely used in various classification and recognition applications [Krizhevsky, Sutskever, and Hinton2012] [Hu et al.2014] and have achieved considerable success on a variety of problems. All CNNs adopt an end-to-end approach to learning; however, each unique task is associated with its own distinctive domain knowledge and prior information. Thus, to improve classification accuracy, researchers use prior information that is tailored to each specific task and each specific Convolutional Neural Network. One way to do this is to use a traditional image processing algorithm as a preprocessing step. Another way is to use a traditional image processing algorithm to initialize convolution kernels.

Classification accuracy is a primary concern for researchers in the machine-learning community. Different pre-processing models, such as filters or feature detectors, have been employed to improve the accuracy of CNNs. One example of this is the Gabor filter with CNN [Daugman1988]. The Gabor filter is a feature extractor based on human vision. Besides the Gabor filter, Fisher vectors [Cimpoi, Maji, and Vedaldi2014], sparse filter banks [Pfister and Bresler2015], and the HOG algorithm [Lu, Wang, and Zhang2018] have also been combined with a CNN to improve accuracy. Based on the human visual system, these filters are found to be remarkably well-suited for texture representation and discrimination. In the works by Kwolek [Kwolek2005] and Mounika et al. [Mounika, Reddy, and Reddy2012], the Gabor filter is used to extract features from the input image in a pre-processing step. However, these methods require a two-stage procedure that may not reach the globally optimal solution.

In addition, some scholars use traditional image processing algorithms to initialize convolutional kernels, such as building a Feature Pyramid Network with an image pyramid for multi-scale feature extraction [Lin et al.2017]. Geometric operators are widely used in traditional image processing algorithms. Many researchers use the Gabor filter to fix the first convolution layer, while the other layers, which are common convolution layers, are trained to improve accuracy [Yao et al.2016] [Sarwar, Panda, and Roy2017]. John et al. [John, Boyali, and Mita] constrained the weights of the first convolutional layer with both the Gershgorin circle theorem and the Gabor filter during backpropagation to improve classification accuracy. In [Calderón, Roa, and Victorino2003] [Chang and Morgan2014], the authors attempted to remove the pre-processing overhead by introducing Gabor filters in the first convolutional layer of a CNN. In addition, some researchers use filters to initialize multiple convolutional kernels. Lu et al. [Lu, Wang, and Zhang2018] used the Gabor function to create kernels in only four directions to initialize the convolutional kernels of a Convolutional Neural Network. These methods change the initial weights and use domain knowledge, but they do not reduce the redundancy of model parameters, and they do not enhance the transformation ability of the model.

In summary, the above two approaches use domain knowledge and prior information only to improve the classification accuracy of Convolutional Neural Networks. In this paper, a new network, the Geometric Operator Convolutional Neural Network, is proposed. This method integrates geometric operators, namely the filters, into a Convolutional Neural Network. This network not only makes use of domain knowledge and prior information, but also reduces the redundancy of network parameters and enhances the transformation ability of the model.

3 The framework of the Geometric Operator Convolutional Neural Network

Traditional geometric operators have many useful properties. By turning geometric operators into convolutions and integrating them into a deep neural network to form a Geometric Operator Convolutional Neural Network, we not only retain the characteristics of geometric operators, but also exploit the powerful feature expression ability of deep neural networks. This framework renders image classification tasks more effective. The method's construction is described in detail in the following subsections.

3.1 Geometric operators

Before the development of deep Convolutional Neural Networks, traditional image feature extraction methods were based on traditional image processing algorithms, primarily geometric operators. At present, a large number of geometric operators have been applied, such as the Scale Invariant Feature Transform (SIFT) [Lowe1999], the Roberts operator [Rosenfeld1981], the Laplace operator [van Vliet, Young, and Beckers1989], the Gabor operator [Han and Ma2007], and so on. Each operator has different characteristics. Therefore, different geometric operators are used in different application scenarios, according to the characteristics of each unique problem. For example, SIFT looks for feature points in different scale spaces for pattern recognition and image matching. The Roberts operator uses local differences to find edges for edge detection, and the Laplace operator uses isotropic differentials to retain details for image enhancement.

Geometric operators represent a distillation of domain knowledge and prior knowledge. The Geometric Operator Convolutional Neural Network proposed in this paper uses the characteristics of geometric operators. The first step in this framework is to turn geometric operators into convolutions. In this paper, the Gabor operator and the Schmid operator are used as the main examples to illustrate how to construct these convolutions and integrate them into Convolutional Neural Networks. Other geometric operators can be handled in subsequent studies with similar concepts.

3.2 Convolution of geometric operators

Gabor operator

In order to study the frequency characteristics of local range signals, Dennis Gabor [Gabor1946] proposed the famous "window" Fourier transform (also called the short-time Fourier transform, STFT) in the paper "Theory of communication" in 1946. This is now known as the Gabor operator; when applied to images, it is referred to as the Gabor filter. The Gabor filter has since undergone many developments, and its primary characteristics are listed below. First, the Gabor filter has the advantages of both spatial and frequency signal processing. As shown in Eqn. (1.0), the Gabor operator is essentially a Fourier transform with a Gaussian window. For an image, the window function determines its locality in the spatial domain, so the spatial domain information from different positions can be obtained by moving the center of the window. In addition, since the Gaussian function remains Gaussian after the Fourier transform, the Gabor filter can extract local information in the frequency domain. Second, the Gabor filter resembles the response of biological visual cells, which suggests that it may be an optimal feature extraction method. In 1985, Daugman [Daugman1985] extended the Gabor function to a 2-dimensional form and constructed a 2D Gabor filter on this basis. It was surprising to find that the 2D Gabor filter also attains the minimum joint uncertainty in the spatial and frequency domains, while remaining consistent with the mammalian model of retinal nerve cell reception. Third, the Gabor kernels are similar to the convolution kernels from the first convolutional layer in a CNN. An illustration of this similarity is shown in Fig. 1. In the visualization of the first convolutional layer of AlexNet, which was proposed by Krizhevsky et al. [Krizhevsky, Sutskever, and Hinton2012], some convolution kernels present geometric properties, like the kernel function of the Gabor filter. This similarity also suggests that there are parameter redundancies in the Convolutional Neural Network, and that the Gabor operator can be turned into a convolution and integrated into a CNN. Lastly, the Gabor filter can extract direction-dependent texture features from an image. As shown in Fig. 2, 40 Gabor kernels from five scales and eight directions are convolved with an image, and texture feature maps in different directions are obtained from the original image.

(1.0)
(a) Convolution kernels from the first layer of ResNet50
(b) Gabor kernels
Figure 1: The similarities between the CNN’s first convolutional kernels and Gabor kernels.
Figure 2: The results of the Gabor operator on an image

Combining the Gabor operator with the CNN yields better feature expressions for images. There are two main ways to combine them. In the first, the image is preprocessed by the Gabor operator, and then its features are extracted by the CNN. In the second, the Gabor operator is turned into a convolutional layer, and this layer is integrated into the common Convolutional Neural Network. The second approach is used in this article. As shown in Eqn. (1.0), the Gabor kernel function has 5 parameters, which are obtained by learning and then used to generate a kernel of the required size. We replace the common convolution kernels with these Gabor kernels to form a convolutional layer. For the common convolutional layer, by contrast, a kernel of the same size is generated by an identity mapping, which requires one trainable parameter per kernel entry. Our method therefore reduces the number of trainable parameters in the convolutional layer.
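To make the parameter saving concrete, the following is a minimal sketch of a kernel generator. It assumes the widely used real-valued Gabor parameterization with five parameters (wavelength, orientation, phase, envelope width, aspect ratio); Eqn. (1.0) is not reproduced in this text, so the exact form and the names `gabor_kernel`, `lam`, `theta`, `psi`, `sigma`, and `gamma` are assumptions made for illustration only.

```python
import numpy as np

def gabor_kernel(size, lam, theta, psi, sigma, gamma):
    """Real-valued Gabor kernel of shape (size, size) built from five parameters.

    lam: wavelength of the cosine carrier, theta: orientation in radians,
    psi: phase offset, sigma: width of the Gaussian envelope,
    gamma: spatial aspect ratio. All names are illustrative assumptions.
    """
    coords = np.arange(size) - (size - 1) / 2.0          # pixel offsets centred on 0
    y, x = np.meshgrid(coords, coords, indexing="ij")
    x_rot = x * np.cos(theta) + y * np.sin(theta)        # rotate the coordinate frame
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot**2 + (gamma * y_rot) ** 2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * x_rot / lam + psi)
    return envelope * carrier

kernel = gabor_kernel(size=5, lam=4.0, theta=np.pi / 4, psi=0.0, sigma=2.0, gamma=0.5)
n_generator_params = 5            # parameters learned by the generator
n_free_params = kernel.size       # parameters of an unconstrained 5 x 5 kernel
```

In this sketch a 5 × 5 kernel is described by five numbers instead of one number per entry, and the gap widens as the kernel grows.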

Schmid operator

In 2001, Schmid [Schmid2001] proposed a Gabor-like image filter, namely the Schmid operator. As shown in Eqn. (2.0), its composition is similar to the kernel function of the Gabor operator, so it retains the properties of the Gabor operator. In addition, as shown in Fig. 3, when the original image and a version of that image rotated by 90 degrees are both convolved with the same Schmid kernel, the two resulting feature maps differ only by a 90-degree rotation; in other words, the Schmid operator has rotation invariance. The Schmid operator is then turned into a convolution, and we integrate this convolution into the common Convolutional Neural Network. The resulting network improves the model's adversarial stability to rotation and improves image feature extraction. Similar to the convolution of the Gabor operator, as shown in Eqn. (2.0), the Schmid kernel function has two parameters, which are obtained by learning and then used to generate the Schmid kernel. Finally, we replace common convolution kernels with Schmid kernels to form a convolutional layer.

Figure 3: The results of the Schmid operator on an image
(2.0)
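The rotation behaviour described above can be checked numerically. The sketch below assumes the commonly cited form of the Schmid filter, a cosine in the radius under a Gaussian envelope with the mean removed; Eqn. (2.0) is not reproduced in this text, so this form, the normalization, the periodic boundary handling, and the names `schmid_kernel` and `filter_periodic` are assumptions.

```python
import numpy as np

def schmid_kernel(size, sigma, tau):
    """Isotropic Schmid-style kernel generated from two parameters (assumed form)."""
    coords = np.arange(size) - (size - 1) / 2.0
    y, x = np.meshgrid(coords, coords, indexing="ij")
    r = np.sqrt(x**2 + y**2)                              # depends on the radius only
    k = np.cos(np.pi * tau * r / sigma) * np.exp(-(r**2) / (2.0 * sigma**2))
    k = k - k.mean()                                      # remove the constant (DC) part
    return k / np.abs(k).sum()                            # L1 normalization

def filter_periodic(image, kernel):
    """Cross-correlate image with kernel using periodic (wrap-around) borders."""
    half = kernel.shape[0] // 2
    out = np.zeros(image.shape, dtype=float)
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            out += kernel[i, j] * np.roll(image, (half - i, half - j), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
image = rng.random((8, 8))
kernel = schmid_kernel(size=5, sigma=2.0, tau=1.0)

rotated_then_filtered = filter_periodic(np.rot90(image), kernel)
filtered_then_rotated = np.rot90(filter_periodic(image, kernel))
```

Because the kernel depends only on the distance from its centre, it is unchanged by a 90-degree rotation, so filtering the rotated image gives the rotated feature map.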

In this paper, only two geometric operator convolutions are explained. Similarly, for other geometric operators, operator kernels are generated by operator kernel functions, and these kernels replace common convolution kernels to form a convolutional layer. Due to the diversity of geometric operators, different geometric operators can be turned into different geometric operator convolutions, so the geometric operator convolution is customizable: each kind of geometric operator yields its own kind of geometric operator convolution. Consequently, a question that must be addressed is how we combine multiple geometric operators with common Convolutional Neural Networks to form the Geometric Operator Convolutional Neural Network.

3.3 Geometric Operator Convolutional Neural Network

Only the first layer of convolution kernels shows geometric characteristics when visualized, so the Geometric Operator Convolutional Neural Network proposed in this paper replaces only the kernels of the first convolutional layer with geometric operator kernels. The framework of the Geometric Operator Convolutional Neural Network is shown in Fig. 4. First, the kernels of the first convolutional layer are calculated from the parameters of the various geometric operators. Then, we concatenate all the calculated convolutional kernels along the last dimension to obtain a complete convolutional kernel. This convolution kernel is used as the weight of the first convolutional layer in the Geometric Operator Convolutional Neural Network, which is followed by the common convolutional layers and the output layer. In this way, we have defined the forward propagation of the whole Geometric Operator Convolutional Neural Network. In backpropagation, the gradient of the loss is first transferred to the convolution kernel, as in the usual convolution. Unlike the usual convolution, however, the convolution kernel here is generated by a geometric operator, so the chain rule (Eqn. (3.0)) must be applied once more: the gradient of the loss function with respect to each convolution kernel is multiplied by the derivative of that kernel with respect to the parameters that generate it, which transfers the gradient to those parameters. Then, the trainable parameters are updated by gradient descent algorithms, which completes the whole Geometric Operator Convolutional Neural Network.
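The forward construction described above can be sketched as follows. The weight layout (height, width, input channels, output channels), the single input channel, the kernel formulas, and all function and parameter names are assumptions chosen for illustration, not the authors' implementation.

```python
import numpy as np

def _grid(size):
    coords = np.arange(size) - (size - 1) / 2.0
    return np.meshgrid(coords, coords, indexing="ij")     # returns (y, x)

def gabor_kernel(size, lam, theta, psi, sigma, gamma):
    y, x = _grid(size)
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot**2 + (gamma * y_rot) ** 2) / (2.0 * sigma**2))
    return envelope * np.cos(2.0 * np.pi * x_rot / lam + psi)

def schmid_kernel(size, sigma, tau):
    y, x = _grid(size)
    r = np.sqrt(x**2 + y**2)
    k = np.cos(np.pi * tau * r / sigma) * np.exp(-(r**2) / (2.0 * sigma**2))
    return k - k.mean()

def first_layer_weight(size, gabor_params, schmid_params):
    """Generate every kernel from its parameters and stack along the last axis."""
    kernels = [gabor_kernel(size, *p) for p in gabor_params]
    kernels += [schmid_kernel(size, *p) for p in schmid_params]
    # Give each kernel shape (size, size, 1, 1): one input and one output channel.
    kernels = [k[:, :, np.newaxis, np.newaxis] for k in kernels]
    return np.concatenate(kernels, axis=-1)

# Four Gabor orientations plus two Schmid kernels (hypothetical settings).
gabor_params = [(4.0, t, 0.0, 2.0, 0.5) for t in np.linspace(0.0, np.pi, 4, endpoint=False)]
schmid_params = [(2.0, 1.0), (4.0, 2.0)]
weight = first_layer_weight(5, gabor_params, schmid_params)
n_trainable = 5 * len(gabor_params) + 2 * len(schmid_params)
```

Here the first layer holds 150 weight entries but only 24 trainable numbers, since every entry is a function of the generator parameters.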

Figure 4: The architecture of our framework
(3.0)
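The extra chain-rule step can be illustrated with a small numerical check. This sketch assumes a simplified two-parameter generator (envelope width `sigma` and phase `psi`) and a squared-error loss defined directly on the kernel; both are stand-ins chosen so that the derivatives can be written by hand, and every name is hypothetical.

```python
import numpy as np

SIZE, LAM = 5, 4.0
coords = np.arange(SIZE) - (SIZE - 1) / 2.0
Y, X = np.meshgrid(coords, coords, indexing="ij")

def generate(params):
    """Kernel w(sigma, psi): Gaussian envelope times a cosine carrier."""
    sigma, psi = params
    return np.exp(-(X**2 + Y**2) / (2.0 * sigma**2)) * np.cos(2.0 * np.pi * X / LAM + psi)

def kernel_jacobian(params):
    """Analytic dw/dsigma and dw/dpsi, each with the shape of the kernel."""
    sigma, psi = params
    envelope = np.exp(-(X**2 + Y**2) / (2.0 * sigma**2))
    phase = 2.0 * np.pi * X / LAM + psi
    dw_dsigma = envelope * np.cos(phase) * (X**2 + Y**2) / sigma**3
    dw_dpsi = -envelope * np.sin(phase)
    return np.stack([dw_dsigma, dw_dpsi])

def loss(w, target):
    return 0.5 * np.sum((w - target) ** 2)

def param_gradient(params, target):
    """Chain rule: sum over kernel entries of dL/dw times dw/dparam."""
    dL_dw = generate(params) - target
    return np.array([np.sum(dL_dw * dw) for dw in kernel_jacobian(params)])

def numeric_gradient(params, target, eps=1e-5):
    """Central finite differences, used only to check the chain rule."""
    grad = np.zeros(len(params))
    for i in range(len(params)):
        step = np.zeros(len(params))
        step[i] = eps
        plus = loss(generate(params + step), target)
        minus = loss(generate(params - step), target)
        grad[i] = (plus - minus) / (2.0 * eps)
    return grad

target = generate(np.array([2.0, 0.5]))
params = np.array([1.5, 0.0])
grad = param_gradient(params, target)
updated = params - 0.01 * grad        # one plain gradient descent step
```

In a full network the factor `dL_dw` would come from ordinary backpropagation through the later layers; only the multiplication by the kernel Jacobian is specific to generated kernels.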

4 Theoretical analyses

The whole framework of the Geometric Operator Convolutional Neural Network has been introduced above. Next, we describe how to theoretically analyze the Geometric Operator Convolutional Neural Network. It is theoretically proved that although the number of trainable parameters in the Geometric Operator Convolutional Neural Network decreases, the effectiveness for computer vision tasks does not decrease.

4.1 Definition of data and loss function

  • We denote the input by , the corresponding label is .

  • The loss function is Mean Square Error.

  • The output of the neural network is for each input , and the empirical loss function is defined as follows:

    (4.0)
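As a concrete reading of the definitions above, the empirical loss can be sketched as the mean squared difference between network outputs and labels. Eqn. (4.0) is not reproduced in this text, so the averaging constant used here (a plain mean over samples) and the names `empirical_mse` and `predict` are assumptions.

```python
import numpy as np

def empirical_mse(predict, inputs, labels):
    """Mean squared error of predict over a finite sample (assumed normalization)."""
    outputs = np.array([predict(x) for x in inputs], dtype=float)
    return float(np.mean((outputs - np.asarray(labels, dtype=float)) ** 2))

# Toy usage with a hypothetical one-dimensional "network".
toy_loss = empirical_mse(lambda x: 2.0 * x, inputs=[1.0, 2.0, 3.0], labels=[2.0, 4.0, 7.0])
```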
Lemma 1.

[Cybenko1989] Define a functional class , where each can be approximated with error at most by a one hidden-layer neural network , that is:

(5.0)
Lemma 2.

[Bartlett and Mendelson2002] Let and be two hypothesis classes and let be a constant. Then we have:

(6.0)
Lemma 3.

[Mohri, Rostamizadeh, and Talwalkar2012] Let be a random variable of support and distribution . Let be a data set of i.i.d. samples drawn from . Let be a hypothesis class satisfying . Fix . With probability at least over the choice of , the following holds for all :

(7.0)
Definition 1 (Parametric Convolutional Kernel Space).

Let be a function that maps a vector from to a matrix in ; we call this function the convolution kernel generator function. Then we define the Parametric Convolutional Kernel Space as:

(8.0)

We call the parameter number, the kernel size, and (short for output dimension) the output dimension. Since a convolutional kernel in a parametric convolutional kernel space is generated by the function , we call the generator function and the pixel generator function.

Once we have defined the parametric convolutional kernel space, we can use parameters to generate sized convolution kernel through the generator function , which means that the parameters in a convolutional layer can be reduced if . However, the reduction in parameters often causes loss of performance as the hypothesis space becomes smaller. Therefore, how the reduction of parameters affects the performance is the key point. We want to study the simplest situation, that is to say, we want to replace the ordinary kernel in the first convolutional layer with the parameter kernel generated from a parametric convolutional kernel space.

Definition 2 (Geometric Operator CNN).

Assume that is a parametric convolutional kernel space. If the kernel in the first convolutional layer of a convolutional neural network is generated from , we call this network a Geometric Operator CNN. We denote the set of Geometric Operator CNNs by .

Geometric Operator CNN is almost exactly the same as common CNN, except for the kernel in the first convolutional layer. We treat the first convolutional layer as a function from images to outputs, which then act as input of the following layer. If this function is not an injective function, meaning that different inputs can be mapped to identical outputs, then the network takes these identical outputs as the input of the following layers, meaning that the final outputs are still the same. However, the image inputs of the first convolutional layer are different, and corresponding labels can also be different. Thus, when the final outputs are the same, errors must occur.

Therefore, we need to choose the kernel carefully so that this function is injective. Since the convolution operator is a linear operator, we have the following proposition.

Proposition 1.

If the kernel of a convolutional layer, denoted by , satisfies the following:

(9.0)

where is the layer input and is the convolution operation, then this convolutional layer is an injective function.
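Since the layer is linear, injectivity can be examined numerically on small images. Condition (9.0) is not reproduced in this text; the sketch below relies only on the general fact that a linear map is injective exactly when its matrix has full column rank. The periodic boundary handling, the tiny image size, and the two example kernels are assumptions made for illustration.

```python
import numpy as np

def filter_periodic(image, kernel):
    """Cross-correlate image with kernel using periodic (wrap-around) borders."""
    half = kernel.shape[0] // 2
    out = np.zeros(image.shape, dtype=float)
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            out += kernel[i, j] * np.roll(image, (half - i, half - j), axis=(0, 1))
    return out

def layer_matrix(kernels, n):
    """Matrix of the linear map from an n x n image to all stacked feature maps."""
    columns = []
    for index in range(n * n):
        basis = np.zeros(n * n)
        basis[index] = 1.0                                # one-hot basis image
        image = basis.reshape(n, n)
        maps = [filter_periodic(image, k).ravel() for k in kernels]
        columns.append(np.concatenate(maps))
    return np.stack(columns, axis=1)

def is_injective(kernels, n):
    """Full column rank means that only the zero image is mapped to zero."""
    return bool(np.linalg.matrix_rank(layer_matrix(kernels, n)) == n * n)

# A horizontal difference kernel sends every constant image to zero.
difference = np.array([[0, 0, 0], [0, 1, -1], [0, 0, 0]], dtype=float)
# A centred delta kernel copies the image unchanged.
delta = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=float)

single_kernel_injective = is_injective([difference], n=4)
two_kernel_injective = is_injective([difference, delta], n=4)
```

The example also shows why a sufficient output dimension matters: a single kernel may lose information that a bank of several kernels retains.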

We have found a necessary and sufficient condition for a convolutional layer to be an injective function. But which kernels satisfy this condition? In the proposition below, we show that a kernel generated by the Gabor filter function satisfies this condition.

Proposition 2.

Let be the Gabor filter function, that is , where . Let be the corresponding parametric convolutional kernel space with kernel size equal to 3 and sufficient output dimension . Then, there exists a kernel in that satisfies condition (9.0).

Since a kernel generated from may not satisfy (9.0), we have the following definition:

Definition 3 (Well-Defined Geometric Operator CNN).

Let , if there is a kernel generated by that satisfies (9.0), we call a well-defined Geometric Operator CNN. We denote the set of all well-defined Geometric Operator CNNs as .

Corollary 1.

If the generator function is the Gabor filter function, the Geometric Operator CNN is well-defined.

Now, let us consider a Convolutional Neural Network with one convolutional layer and two fully-connected layers, and study the convergence of the common CNN and the Geometric Operator CNN. For the common CNN, denoted by , we define the convolution kernel as . The weights of the remaining fully-connected layers are , and the biases of the three layers are . Let stand for the sigmoid activation function; then the convolutional layer and the fully-connected layer can be defined as follows:

(10.1)

Then, the last two fully-connected layers can be defined as:

(10.2)

Therefore, the output before activation, denoted by , and after activation, denoted by , are defined as:

(10.3)

We denote the set of common CNN as , that is, , and the output before activation and after activation of input as .

For a Geometric Operator CNN , we similarly define the convolutional kernel to be , and the weights and biases are . Then, we have the following shorthand when the input is :

(10.4)

We denote the output before activation and after activation of input as as well.

We maintain the same neuron number for each corresponding layer in the common CNN and the Geometric Operator CNN, that is to say, , since the approximation ability is different when the neuron number is different. We define the width of each layer as

Then, the empirical loss function for common CNN and Geometric Operator CNN is:

(10.5)

We have the following theorem on the difference of these two loss functions.

Theorem 1.

For any , where is the set of common CNNs, if the first fully-connected layer is wide enough, the empirical loss of a well-defined Geometric Operator CNN can be controlled by that of the common CNN. That is, for an arbitrary , there exist and , such that when , the following inequality holds:

(11.0)
Theorem 2.

For any , where is the set of common CNNs, if the first fully-connected layer is wide enough, the generalization error of a well-defined Geometric Operator CNN can be controlled by that of the common Convolutional Neural Network. That is, for an arbitrary , there exist and , such that when , the following inequality holds:

(12.0)

From Theorem 2, we know that well-defined Geometric Operator CNNs have almost the same generalization error as common CNNs. Therefore, we need to find which Geometric Operator CNNs are well defined.


Since a Geometric Operator CNN with the Gabor filter function as its generator function is well defined, we have the following corollary.

Corollary 2.

Let be the Gabor filter function. For any , if the first fully-connected layer is wide enough, the generalization error of the Geometric Operator CNN , which applies as the generator function, can be controlled by that of . That is, for an arbitrary , there exist and , such that when , the following inequality holds:

(13.0)

More generally, if there are many generator functions in the first convolutional layer of a Geometric Operator CNN, then this Geometric Operator CNN is also well defined when the number of kernels generated by the Gabor filter function is sufficiently large. Therefore, we have the following corollary.

Corollary 3.

Let be the set of generator functions. Suppose that there are convolution kernels in the first convolutional layer of a Geometric Operator CNN, denoted by , and each is generated by a function , where . If there exists such that is the Gabor filter function, and the number of kernels generated by , denoted by , is sufficiently large, then is well defined, so that (12.0) holds.

5 Experiments

In the previous section, we gave theoretical assurances for the Geometric Operator Convolutional Neural Network. This section describes the experiments conducted on the Geometric Operator Convolutional Neural Network. All experiments are performed on a single machine with an Intel Core i7-7700 CPU @ 3.60GHz × 8, a TITAN X (Pascal) GPU, and 32G of RAM.

5.1 Approximation accuracy, generalization error, and feature visualization

Approximation accuracy and generalization error. The theoretical analysis ensures that the Geometric Operator Convolutional Neural Network has the same approximation accuracy and the same upper bound for generalization error as the common Convolutional Neural Network. We verify this using two kinds of experiments on CIFAR-10/100. The generalization error refers to the performance of the model on the test set, and the approximation accuracy refers to the performance of the model on the training set.

Figure 5: CIFAR-10
Figure 6: The framework of ResNet
Table 1: The model’s accuracy rates averaged over five experiments on the test set

Model         CIFAR-10   CIFAR-100
ResNet18      94.79%     77.06%
GO-ResNet18   95.17%     77.59%
ResNet34      95.27%     78.26%
GO-ResNet34   95.77%     78.72%
ResNet50      94.44%     78.45%
GO-ResNet50   94.72%     79.50%

Recognizing objects in an actual scene depends not on specialized domain knowledge but on humans’ prior information, so the recognition performance of the Geometric Operator Convolutional Neural Network on object recognition tasks is worth exploring. The commonly used public data sets for common object recognition are CIFAR-10 (ten categories, as shown in Fig. 5) and CIFAR-100 (100 categories). Both consist of three-channel color images with a resolution of 32 × 32. The training set contains 50,000 images and the test set contains 10,000 images. As shown in Fig. 6, ResNet18, ResNet34, and ResNet50 were used on these two public datasets. In the experiment, each image was padded by four pixels on each of its four edges. Then, a random 32 × 32 crop was taken, and a data augmentation method was applied, which involved flipping the image upside down. For both testing and training, the images’ pixels are normalized to a 0-1 distribution. Stochastic gradient descent with a momentum of 0.9 [Loshchilov and Hutter2016] was used during the training process. The batch size was 100, the initial learning rate was 0.1, and the weight decay was 0.0005. The learning rate was reduced by one fifth at epochs 60, 120, and 160. We report the performance of our algorithm on the test set after 200 epochs, averaged over five runs.
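The pad, crop, and flip steps described above can be sketched as follows. Zero padding, the flip probability of 0.5, the flip axis, and the name `augment` are assumptions; the text does not specify them.

```python
import numpy as np

def augment(image, rng, pad=4, crop=32):
    """Pad each edge, take a random crop, and flip upside down half of the time."""
    # Zero padding on the two spatial axes only; the channel axis is untouched.
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    top = rng.integers(0, padded.shape[0] - crop + 1)
    left = rng.integers(0, padded.shape[1] - crop + 1)
    patch = padded[top:top + crop, left:left + crop, :]
    if rng.random() < 0.5:
        patch = patch[::-1, :, :]                         # reverse the row order
    return patch

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))                           # stand-in for a CIFAR image
augmented = augment(image, rng)
```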

(a) CIFAR-10: ResNet18
(b) CIFAR-10: ResNet34
(c) CIFAR-10: ResNet50
(d) CIFAR-100: ResNet18
(e) CIFAR-100: ResNet34
(f) CIFAR-100: ResNet50
Figure 7: Log of cross entropy curve during training in common ResNet 18-34-50 and GO-ResNet 18-34-50.

As shown in Fig. 7, according to the cross-entropy curves on the CIFAR-10 and CIFAR-100 training sets, the GO-CNN’s loss initially fell faster than the common CNN’s, and the two eventually reached almost the same value. This verifies that the Geometric Operator Convolutional Neural Network achieves the same approximation accuracy as the common Convolutional Neural Network. According to the error rate curves on the CIFAR-10 and CIFAR-100 validation sets (Fig. 8), the error rate of the Geometric Operator Convolutional Neural Network is lower than that of the common Convolutional Neural Network. In addition, as shown in Tab. 1, the Geometric Operator Convolutional Neural Network was 0.4% more accurate than the common Convolutional Neural Network on the CIFAR-10 test set and 0.5% more accurate on the CIFAR-100 test set. This verifies that the Geometric Operator Convolutional Neural Network achieves the same generalization error bound as the common Convolutional Neural Network.

(a) CIFAR-10: ResNet18
(b) CIFAR-10: ResNet34
(c) CIFAR-10: ResNet50
(d) CIFAR-100: ResNet18
(e) CIFAR-100: ResNet34
(f) CIFAR-100: ResNet50
Figure 8: Error rate curve during training in common ResNet 18-34-50 and GO-ResNet 18-34-50.

Feature visualization. One way to evaluate a model is through visualizing the features that the model extracts; this is called feature visualization. T-SNE [Maaten and Hinton2008] or PCA [Jolliffe2011] are generally used for visualization. The T-SNE visualization maps data points to a two-dimensional or three-dimensional probability distribution through an affinity transformation. Then, the data points are displayed in a two-dimensional or three-dimensional space.

In this paper, a two-dimensional T-SNE visualization is adopted to display the CIFAR-10 features extracted by the model. As shown in Fig. 9, the CIFAR-10 features extracted by the Geometric Operator Convolutional Neural Network are evenly separated from each other in the two-dimensional visualization of T-SNE, while the features extracted from the common Convolutional Neural Network are mixed. It is apparent that the features extracted by the Geometric Operator Convolutional Neural Network are more separable; in other words, the features learned by the Geometric Operator Convolutional Neural Network are more distinguishable and easy to classify with the last fully connected layer.

(a) Common CNN
(b) GO-CNN
Figure 9: T-SNE two-dimensional visualization of CIFAR-10

The numerical experimental results and the feature visualizations of the two datasets reveal that the Geometric Operator Convolutional Neural Network achieves the same approximation accuracy and the same upper bound for the generalization error as the common Convolutional Neural Network. Moreover, the features extracted by Geometric Operator Convolutional Neural Network are more distinguishable.

5.2 Generalization

In many practical applications, such as the military, medical care, and so on, annotated data are often insufficient. Thus, a model’s generalization ability for small data sets is of great importance. The generalization ability refers to the ability of a model to predict unknown data when it has been learned by a certain method.

For the open datasets CIFAR-10/100 and MNIST, the train sets are large and the test sets are small. MNIST is a public handwritten digit recognition dataset with a total of ten classes. As shown in Fig. 10, each sample is a single-channel image with 28×28 resolution and a clean background. There are 50,000 training images, 5,000 verification images, and 10,000 test images. In these numerical experiments, the roles of the two sets are exchanged: the test set is used to train the model, and the train set is used to evaluate it. These experiments assess the generalization ability of the Geometric Operator Convolutional Neural Network and the common Convolutional Neural Network.

Many training techniques have been used in numerical experiments with CIFAR and MNIST. For numerical experiments with CIFAR-10/100, the techniques and models are the same as in Sec. 5.1. For numerical experiments with the MNIST data set, the adaptive moment estimation (Adam [Kingma and Ba2014]) optimization algorithm was used. In addition, as an image enhancement strategy, the images were padded to 32×32 during the training process. The batch size was set to 11, the initial learning rate was 0.001, and the weight decay was 0.0005. The learning rate remains constant until 20,000 iterations are reached. Consequently, we complete 20,000 iterations on one test set and average the performance over five runs in order to report the final performance evaluation of our algorithm. The basic network structure used in the experiment is LeNet [LeCun et al.1998], as shown in Fig. 11. There are two convolutional layers and two fully connected layers in the network. As before, in the Geometric Operator Convolutional Neural Network, the first convolutional layer is replaced by the operator convolutional layer, whose convolution kernels are composed of trainable Gabor kernels and Schmid kernels. The other convolutional layers are common convolutional layers.
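To make the idea of a kernel generated by a geometric operator function concrete, the following is a minimal sketch of a real-valued Gabor kernel generator. The parameter names (`sigma`, `theta`, `lam`, `gamma`, `psi`) and the exact parameterization are assumptions based on the textbook Gabor function; the paper's own definition may differ in detail.

```python
import math


def gabor_kernel(size, sigma, theta, lam, gamma=1.0, psi=0.0):
    """Return a size x size real Gabor kernel as a list of rows.

    Assumed (textbook) form: a Gaussian envelope times a cosine carrier,
    evaluated on a pixel grid centred on the middle of the kernel.
    """
    half = size // 2
    kernel = []
    for row in range(size):
        y = row - half
        values = []
        for col in range(size):
            x = col - half
            # Rotate the pixel coordinates by the orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
            carrier = math.cos(2.0 * math.pi * xr / lam + psi)
            values.append(envelope * carrier)
        kernel.append(values)
    return kernel


# A 5x5 kernel is fully determined by five scalars instead of 25 free weights.
kernel = gabor_kernel(size=5, sigma=2.0, theta=0.0, lam=4.0)
```

In a trainable setting, gradients would flow to the few scalar parameters rather than to every kernel entry, which is why such a layer has fewer trainable parameters than a common convolutional layer of the same size.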

Figure 10: MNIST
Figure 11: The framework of LeNet
Model          CIFAR-10            CIFAR-100
ResNet18       84.96% (94.79%)     44.97% (77.06%)
GO-ResNet18    86.21% (95.17%)     47.03% (77.59%)
ResNet34       82.33% (95.27%)     44.74% (78.26%)
GO-ResNet34    86.36% (95.77%)     49.00% (78.72%)
ResNet50       83.86% (94.44%)     45.93% (78.45%)
GO-ResNet50    85.64% (94.72%)     47.09% (79.50%)

Model          MNIST
LeNet          97.75% (99.22%)
GO-LeNet       97.97% (99.24%)
Table 2: The accuracy of the test sets for the small train set and the large train set (in brackets) as averaged over five experiments

As shown in Tab. 2, on MNIST and CIFAR-10/100, after the train set drops to one-fifth of the original train set, the accuracy of the common Convolutional Neural Network falls faster than that of the Geometric Operator Convolutional Neural Network. Moreover, the Geometric Operator Convolutional Neural Network is more accurate than the common Convolutional Neural Network on the original train set. That is to say, the GO-CNN is better at predicting unknown data than the common CNN. The Geometric Operator Convolutional Neural Network not only reduces the redundancy of the parameters, but also reduces the dependence on the amount of training samples.
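The comparison can be checked with simple arithmetic on the Tab. 2 values. The sketch below assumes that consecutive rows of the table form (common CNN, GO-CNN) pairs in the order printed; the variable and function names are illustrative only.

```python
# Each entry is ((small, large) for the common CNN, (small, large) for the GO-CNN),
# in percent, copied from Tab. 2. The pairing of rows is an assumption.
table2 = {
    "CIFAR-10": [
        ((84.96, 94.79), (86.21, 95.17)),
        ((82.33, 95.27), (86.36, 95.77)),
        ((83.86, 94.44), (85.64, 94.72)),
    ],
    "CIFAR-100": [
        ((44.97, 77.06), (47.03, 77.59)),
        ((44.74, 78.26), (49.00, 78.72)),
        ((45.93, 78.45), (47.09, 79.50)),
    ],
    "MNIST": [((97.75, 99.22), (97.97, 99.24))],
}


def accuracy_drop(small, large):
    """Accuracy lost (in percentage points) when training on the small set."""
    return round(large - small, 2)


drops = {
    dataset: [(accuracy_drop(*common), accuracy_drop(*go)) for common, go in pairs]
    for dataset, pairs in table2.items()
}

# True when the GO-CNN loses less accuracy than the common CNN in every pair.
go_drops_less = all(go < common for pairs in drops.values() for common, go in pairs)
```

Under this pairing, the GO-CNN's drop is smaller in every pair, although the margin varies from about 0.1 to 3.8 percentage points.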

5.3 Adversarial stability

Although the Geometric Operator Convolutional Neural Network reduces the number of trainable parameters, it enhances adversarial stability. Current machine learning models, including neural networks, are vulnerable to attacks from adversarial samples, and the Convolutional Neural Network in particular shows instability under such attacks [Goodfellow, Shlens, and Szegedy2014]. Adversarial samples are produced when an attacker misleads a classifier by slightly disturbing the original sample. It is very important in practice to study stability against adversarial samples. The false alarm rate of existing intelligent video analysis technology is as high as 30% to 60%, which greatly affects actual application and deployment. For example, the identification system in Tiananmen Square was removed because of high false alarm rates.

Each geometric operator has its own characteristics; the Schmid operator, for example, is rotation invariant. It is therefore worth exploring whether the Geometric Operator Convolutional Neural Network built from the Schmid operator is more stable on adversarial samples produced by rotating the input by a certain angle. Geometric operators also use domain knowledge and prior knowledge to extract image features, so it is likewise worth investigating whether the Geometric Operator Convolutional Neural Network is more stable on adversarial samples produced by noise interference. The stability of a model is measured by the difference between its accuracy on the original test set and its accuracy on the adversarial samples generated from that test set.
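The rotation invariance of the Schmid operator can be illustrated with a small sketch. The form used here, a cosine in the radius r multiplied by a Gaussian and shifted to zero mean, follows the common description of Schmid filters in texture filter banks; the exact constants (for instance the factor inside the cosine) and the names `sigma` and `tau` are assumptions and may differ from the definition used in this paper.

```python
import math


def schmid_kernel(size, sigma, tau):
    """Return a size x size Schmid-style kernel that depends only on the radius."""
    half = size // 2
    raw = []
    for row in range(size):
        y = row - half
        values = []
        for col in range(size):
            x = col - half
            r = math.hypot(x, y)
            values.append(
                math.cos(math.pi * tau * r / sigma)
                * math.exp(-(r ** 2) / (2.0 * sigma ** 2))
            )
        raw.append(values)
    # Subtract the mean so that the kernel has no DC response.
    mean = sum(sum(values) for values in raw) / (size * size)
    return [[v - mean for v in values] for values in raw]


def rot90(kernel):
    """Rotate a square kernel by 90 degrees."""
    n = len(kernel)
    return [[kernel[c][n - 1 - r] for c in range(n)] for r in range(n)]


kernel = schmid_kernel(size=7, sigma=2.0, tau=1.0)
rotated = rot90(kernel)
# Largest entry-wise change caused by the rotation; it is zero up to rounding.
max_change = max(
    abs(kernel[r][c] - rotated[r][c]) for r in range(7) for c in range(7)
)
```

Because every entry is a function of the radius alone, rotating the kernel by 90 degrees leaves it unchanged on the pixel grid; for other angles the invariance holds only up to discretization.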

The open handwriting recognition data set (MNIST) is the primary dataset used in this experiment. The techniques and models are the same as those used for MNIST in Sec. 5.2. Both models are trained on the MNIST train set. Original images, adversarial samples with Gaussian interference, and adversarial samples from random rotation were used to evaluate the two models.
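The two kinds of perturbed samples can be sketched as follows. This is only an illustration under stated assumptions: images are lists of rows with pixel values on a [0, 1] scale, no clipping is applied after adding noise, rotation uses nearest-neighbour sampling about the image centre, and the rotation angle is drawn uniformly from 0 to 90 degrees. None of these details are specified in the text.

```python
import math
import random


def add_gaussian_noise(image, mean=0.0, std=0.3, rng=random):
    """Add independent Gaussian noise to every pixel."""
    return [[pixel + rng.gauss(mean, std) for pixel in row] for row in image]


def rotate_nearest(image, angle_deg, fill=0.0):
    """Rotate an image about its centre using nearest-neighbour sampling."""
    n_rows, n_cols = len(image), len(image[0])
    cy, cx = (n_rows - 1) / 2.0, (n_cols - 1) / 2.0
    angle = math.radians(angle_deg)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    output = []
    for r in range(n_rows):
        out_row = []
        for c in range(n_cols):
            dy, dx = r - cy, c - cx
            # Inverse mapping: locate the source pixel for this output pixel.
            src_c = round(cos_a * dx + sin_a * dy + cx)
            src_r = round(-sin_a * dx + cos_a * dy + cy)
            inside = 0 <= src_r < n_rows and 0 <= src_c < n_cols
            out_row.append(image[src_r][src_c] if inside else fill)
        output.append(out_row)
    return output


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = [[0.0, 0.1, 0.2], [0.3, 0.4, 0.5], [0.6, 0.7, 0.8]]
noisy = add_gaussian_noise(sample, rng=rng)
rotated = rotate_nearest(sample, rng.uniform(0.0, 90.0))
```

Both functions return a new image of the same shape, so the perturbed test set can be passed to the trained models in the same way as the original one.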

It can be seen from Tab. 3 that when the test set is randomly rotated within 90 degrees, the accuracy difference of the Geometric Operator Convolutional Neural Network is 1.21% lower than that of the common Convolutional Neural Network. This indicates that the Geometric Operator Convolutional Neural Network enhances adversarial stability on rotated samples. As can be seen from Tab. 4, when a small Gaussian disturbance (mean 0, standard deviation 0.3) is applied to the test set, the accuracy difference of the Geometric Operator Convolutional Neural Network is 0.6% lower than that of the common Convolutional Neural Network. This indicates that the Geometric Operator Convolutional Neural Network enhances adversarial stability on Gaussian disturbance adversarial samples. In sum, the Geometric Operator Convolutional Neural Network enhances the adversarial stability of certain adversarial samples.
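The stability measure is plain arithmetic on the accuracies reported in Tab. 3 and Tab. 4, as the following sketch shows. The function name `stability_gap` is illustrative only.

```python
def stability_gap(clean_accuracy, perturbed_accuracy):
    """Accuracy lost (in percentage points) on the perturbed test set."""
    return round(clean_accuracy - perturbed_accuracy, 2)


# Accuracies in percent, copied from Tab. 3 (rotation) and Tab. 4 (Gaussian).
rotation = {
    "common CNN": stability_gap(99.22, 58.97),
    "GO-CNN": stability_gap(99.24, 60.20),
}
gaussian = {
    "common CNN": stability_gap(99.22, 95.69),
    "GO-CNN": stability_gap(99.24, 96.31),
}

# How much smaller the GO-CNN's gap is than the common CNN's gap.
rotation_advantage = round(rotation["common CNN"] - rotation["GO-CNN"], 2)
gaussian_advantage = round(gaussian["common CNN"] - gaussian["GO-CNN"], 2)
```

A smaller gap means a more stable model; the two advantages reproduce the 1.21% and 0.6% figures quoted above.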

                      Common CNN    GO-CNN
Original test set     99.22%        99.24%
Rotated test set      58.97%        60.20%
Difference            40.25%        39.04%
Table 3: The adversarial stability of rotated samples (the average accuracy over five experiments)
                      Common CNN    GO-CNN
Original test set     99.22%        99.24%
Gaussian disturbance  95.69%        96.31%
Difference            3.53%         2.93%
Table 4: The adversarial stability of Gaussian disturbance samples (the average accuracy over five experiments)

5.4 Application

Medical imaging in China is developing rapidly, but specialist doctors are in short supply and are mainly concentrated in big cities and big hospitals. Many small and medium-sized cities do not have sufficient diagnostic imaging capacities, so many patients have to go to big cities in order to access better medical resources and obtain better treatment. Similarly, there are few orthopaedic surgeons in China. Fractures often occur in real life due to accidents, such as falls and car accidents. There are many ways to obtain medical data, such as X-ray images, CT images, MRI images, and representational images; however, orthopedists usually use X-ray images to diagnose fractures. With the development of artificial intelligence technology, many scholars use Convolutional Neural Networks to assist doctors in determining whether a bone image reveals a fracture [Chung et al.2018].

Figure 12: The framework of the two-stage method

Doctors usually judge whether a fracture has occurred based on whether there is a fracture line (texture) in the image. In [Cao et al.2015], the texture information from the image is used for an auxiliary diagnosis of a fracture. Using the Schmid operator as prior information, we first pre-process an image with Schmid operators to enhance its texture information. Then, we use the deep Convolutional Neural Network to conduct classification (as shown in Fig. 12). However, this method, in which the input is pre-processed by geometric operators, is a two-stage method. The parameters of the geometric operators are preset by human experience. Because the two stages are optimized separately, the locally optimal parameters of each stage are unlikely to reach the global optimum. Thus, one may consider integrating the pre-processing of geometric operators into the deep network for global parameter learning, without parameters designed in advance from human experience. In other words, this would mean using the Geometric Operator Convolutional Neural Network proposed in this paper, wherein the convolution kernels from the first layer are all trainable Schmid kernels.

Figure 13: Bone images

Around 2,000 samples from X-rays taken at the Hainan People’s Hospital were used as the data for the three kinds of intelligent fracture diagnosis models. Each sample was manually divided into bone regions, as shown in Fig. 13, with a total of 5,743 bone regions, including 723 bone fracture regions. The three models above (the common CNN, the two-stage method, and the GO-CNN) are used for numerical experiments. The basic network framework used in the experiment is ResNet50 [He et al.2016], which mainly consists of a new residual structure unit (Fig. 6). To balance the data during training, the number of fracture patches is increased to 4,016 by rotating the images and changing the background of the images. In the test set, there were 145 fracture patches and 1,004 non-fracture patches. Then, five experiments were conducted to evaluate each model. The stochastic gradient descent optimization algorithm and a fine-tuning strategy were used during the training process, with a batch size of 50. The initial learning rate was 0.001 and the weight decay was 0.0005. The learning rate is reduced by one fifth every 4,000 iterations. Each data class is queued, and each batch draws equally from every data class during training. We report the performance of our algorithm on the test set after 12,000 iterations based on the average over five runs.
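The class-balanced batching and the stepwise learning rate can be sketched as follows. This is a hypothetical illustration, not the authors' training code: the function names are invented, each class queue is simply cycled, and "reduced by one fifth" is read here as multiplying the rate by 0.2 at each step (the alternative reading, multiplying by 0.8, would only change `factor`).

```python
import random
from collections import deque


def balanced_batches(samples_by_class, batch_size, num_batches, seed=0):
    """Yield batches that take the same number of samples from every class."""
    rng = random.Random(seed)
    per_class = batch_size // len(samples_by_class)
    queues = {}
    for label, samples in samples_by_class.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        queues[label] = deque(shuffled)
    for _ in range(num_batches):
        batch = []
        for label, queue in queues.items():
            for _ in range(per_class):
                sample = queue.popleft()
                queue.append(sample)  # re-queue so that small classes cycle
                batch.append((sample, label))
        yield batch


def step_learning_rate(iteration, base_lr=0.001, step=4000, factor=0.2):
    """Learning rate after `iteration` steps under a stepwise decay (assumed factor)."""
    return base_lr * factor ** (iteration // step)


# Toy stand-ins for patch identifiers; the real data are X-ray patches.
data = {"fracture": list(range(10)), "non-fracture": list(range(100, 140))}
batches = list(balanced_batches(data, batch_size=50, num_batches=3))
```

With a batch size of 50 and two classes, every batch contains 25 fracture and 25 non-fracture patches regardless of how imbalanced the underlying data are.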

According to Tab. 5, the Geometric Operator Convolutional Neural Network is the most accurate. Moreover, the fracture recall of the two-stage method is 0.77% higher than that of the Convolutional Neural Network, indicating that domain knowledge from the field of medicine is important for intelligent diagnosis. The fracture recall of the Geometric Operator Convolutional Neural Network is 2.21% higher than that of the two-stage method, which indicates that the Geometric Operator Convolutional Neural Network does make use of medical knowledge for fracture diagnosis. The integration of the geometric operator into the deep neural network indeed achieves global optimization of the parameters.
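Fracture recall is the fraction of true fracture patches that a model flags. The sketch below states the definition and recomputes the two recall gains from the Tab. 5 values; the counts passed to `recall` in the example are hypothetical and are not results from the paper.

```python
def recall(true_positives, false_negatives):
    """Fraction of actual positives that the model detects."""
    return true_positives / (true_positives + false_negatives)


# Hypothetical example: 131 of the 145 fracture patches in a test set are detected.
example_recall = recall(true_positives=131, false_negatives=14)

# Fracture recall in percent, copied from Tab. 5.
fracture_recall = {"CNN": 87.97, "two-stage": 88.74, "GO-CNN": 90.95}
gain_two_stage = round(fracture_recall["two-stage"] - fracture_recall["CNN"], 2)
gain_go_cnn = round(fracture_recall["GO-CNN"] - fracture_recall["two-stage"], 2)
```

In a screening setting, recall on the fracture class is the quantity of most interest because a missed fracture is costlier than a false alarm.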

92.38% 93.05% 93.98%
87.97% 88.74% 90.95%
96.57% 96.17% 96.87%
Table 5: Experimental results of intelligent diagnosis

In the above experiments, the Geometric Operator Convolutional Neural Network uses a priori knowledge from the field of medicine and provides a better recognition effect. Although the trainable parameters decrease, GO-CNN still reaches the same approximation accuracy and a slightly lower generalization error upper bound when compared with the common CNN. The features extracted from the Geometric Operator Convolutional Neural Network are more distinguishable, and the Geometric Operator Convolutional Neural Network reduces the dependence on training samples and enhances the adversarial stability of certain adversarial samples. Moreover, the GO-CNN also uses medical knowledge for the practical purpose of assisting in intelligent medical diagnoses of bone fractures.

6 Conclusion and Future Research

In this paper, we present a new framework named the Geometric Operator Convolutional Neural Network, where the kernel in the first convolutional layer is replaced with kernels generated by geometric operator functions. This new network offers several contributions. First, the Geometric Operator Convolutional Neural Network is customizable for diverse situations. Other geometric operators can be turned into convolutions in the same way as the Gabor operator and the Schmid operator. Because the geometric operator convolution in the GO-CNN can be replaced by different geometric operators, the GO-CNN is highly versatile. Second, there is a theoretical guarantee in the learning framework of the Geometric Operator Convolutional Neural Network. In this paper, the universal approximation theorem and multiple lemmas are used to prove that GO-CNN reaches the same approximation accuracy and the same generalization error upper bound as the common CNN when certain conditions (i.e., when the training sample is singular) are satisfied. In addition, through experiments on CIFAR-10/100, we verify that GO-CNN reaches the same approximation accuracy with a smaller generalization error when compared to the common Convolutional Neural Network. Third, the Geometric Operator Convolutional Neural Network reduces the dependence on training samples. The train set and the test set of CIFAR-10/100 and MNIST were exchanged, and the models were then re-trained and tested. The Geometric Operator Convolutional Neural Network improved the generalization performance by achieving a higher testing accuracy with the same training loss. In other words, GO-CNN has less dependence on the training samples. Lastly, the Geometric Operator Convolutional Neural Network enhances adversarial stability. Gaussian perturbation and random rotation were applied to the MNIST test set, and the models were then evaluated on the perturbed samples. The experimental results show that the Geometric Operator Convolutional Neural Network enhances adversarial stability. Furthermore, the GO-CNN improves diagnostic efficiency by offering intelligent medical diagnostic assistance based on domain knowledge acquired from images of bone fractures.

In this paper, only the convolutions of two kinds of geometric operators are considered. In the future, we can explore more submodules suitable for the Geometric Operator Convolutional Neural Network, namely, the convolutions of more geometric operators with better performance. We can also explore a more appropriate geometric operator convolution block. In addition, we can analyze the internal relations of the Geometric Operator Convolutional Neural Network based on the theoretical analysis provided in this paper.

7 Acknowledgments

We thank Professor Zhouwang Yang for his technical guidance. Notably, we would like to express our full thanks to Shiwei Wang and Haikou People’s Hospital for providing medical data.

Appendix A

Proof of Proposition 1.

Assume that the proposition is not true, then there exist , such that . Thus, if we set , we have , since is a linear operator, which means that according to the condition. Therefore, the assumption is not true, and the conclusion is proved. ∎

Proof of Proposition 2.

Assume that there exists , such that holds for .

We write in the following matrix way:

(14.1)

We define the pixel generator function to be . Then, we have the following equivalence:

(14.2)

We will choose a variety of different parameters to discuss.


I. .

Since , we have , and the following:

(14.3)

We introduce the following shorthand for convenience:

(14.4)

From Eqn.14.2, we can get:

(14.5)

The equation above means that, , such that

(14.6)

Differentiate on both sides of parameter and get:

(14.7)

Since Eqn.14.7 holds for , which indicates that:

(14.8)

In the same way, we can get the following equation from Eqn.14.8:

(14.9)

Therefore, we have the following equations:

(14.10)

II. .

In the same way, we have the following equations:

(14.11)

And we can get:

(14.12)

which indicates that:

(14.13)

From Eqn.14.10, we can get:

(14.14)

III. .

In the same way as discussed in situation II, we can get the following equations:

(14.15)

IV. .

We have this time, and we can get the following equations in the way discussed in situations II and III, :

(14.16)

Combining Eqns. 14.14, 14.15, and 14.16, we can get:

(14.17)

V. .

We have and the following:

(14.18)

Therefore, we have

(14.19)

Combining Eqns. 14.10, 14.17, and 14.19, we can find that , which means that . Therefore, the assumption that is not true.

For an arbitrarily sized input , we can focus on the sized submatrix that takes the inner product with the convolution kernel and get the same conclusion.

Proof of Corollary 1.

From Prop.2, the conclusion is obvious. ∎

Proof of Theorem 1.

Notice that

(15.1)

Apply absolute value on both sides

(15.2)

The last inequality holds as .

We can fix parameters of ordinary CNN, so that there is a mapping between input and output , and the mapping function is as we have defined.

We can also fix parameters of , and choose the convolution kernel of that satisfies (9.0) since is a well-defined Geometric Operator CNN, so that is an injective function, which means that exists. At the same time, can be treated as a one hidden layer neural network.

Define a new hypothesis ranges in , according to Lemma.1, we can find parameters , such that

(15.3)

Replace by we can get

(15.4)

Combine with (15.2), we can get

(15.5)

Proof of Theorem 2.

From Theorem.1, we know that satisfies the following inequality:

(16.1)

From Lemma.3, we know that

(16.2)

Since , we have the following inequality from Lemma.2:

(16.3)

Combined with (16.2), we have

(16.4)

The conclusion is proved!

Proof of Corollary 2.

From Theorem.2 and Corollary.1, this conclusion is obvious. ∎

Proof of Corollary 3.

Let be the set of such that the generator function of is and denote the concatenation of all these as .

Suppose that there exists an input , satisfies that , then . Therefore, holds for any parameters. However, this conflicts with Prop.2.

Therefore, the conclusion is proved! ∎

References

  • [Bartlett and Mendelson2002] Bartlett, P. L., and Mendelson, S. 2002. Rademacher and gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research 3(Nov):463–482.
  • [Calderón, Roa, and Victorino2003] Calderón, A.; Roa, S.; and Victorino, J. 2003. Handwritten digit recognition using convolutional neural networks and gabor filters. Proc. Int. Congr. Comput. Intell.
  • [Cao et al.2015] Cao, Y.; Wang, H.; Moradi, M.; Prasanna, P.; and Syeda-Mahmood, T. F. 2015. Fracture detection in x-ray images through stacked random forests feature fusion. In Biomedical Imaging (ISBI), 2015 IEEE 12th International Symposium on, 801–805. IEEE.
  • [Chang and Morgan2014] Chang, S.-Y., and Morgan, N. 2014. Robust cnn-based speech recognition with gabor filter kernels. In Fifteenth annual conference of the international speech communication association.
  • [Chung et al.2018] Chung, S. W.; Han, S. S.; Lee, J. W.; Oh, K.-S.; Kim, N. R.; Yoon, J. P.; Kim, J. Y.; Moon, S. H.; Kwon, J.; Lee, H.-J.; et al. 2018. Automated detection and classification of the proximal humerus fracture by using deep learning algorithm. Acta orthopaedica 1–6.
  • [Cimpoi, Maji, and Vedaldi2014] Cimpoi, M.; Maji, S.; and Vedaldi, A. 2014. Deep convolutional filter banks for texture recognition and segmentation. arXiv preprint arXiv:1411.6836.
  • [Cybenko1989] Cybenko, G. 1989. Approximation by superpositions of a sigmoidal function. Mathematics of control, signals and systems 2(4):303–314.
  • [Dai et al.2017] Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; and Wei, Y. 2017. Deformable convolutional networks. CoRR, abs/1703.06211 1(2):3.
  • [Dalal and Triggs2005] Dalal, N., and Triggs, B. 2005. Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, 886–893. IEEE.
  • [Daugman1985] Daugman, J. G. 1985. Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. JOSA A 2(7):1160–1169.
  • [Daugman1988] Daugman, J. G. 1988. Complete discrete 2-d gabor transforms by neural networks for image analysis and compression. IEEE Transactions on acoustics, speech, and signal processing 36(7):1169–1179.
  • [Gabor1946] Gabor, D. 1946. Theory of communication. part 1: The analysis of information. Journal of the Institution of Electrical Engineers-Part III: Radio and Communication Engineering 93(26):429–441.
  • [Goodfellow, Shlens, and Szegedy2014] Goodfellow, I. J.; Shlens, J.; and Szegedy, C. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
  • [Han and Ma2007] Han, J., and Ma, K.-K. 2007. Rotation-invariant and scale-invariant gabor features for texture image retrieval. Image and vision computing 25(9):1474–1481.
  • [He et al.2016] He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, 770–778.
  • [Hu et al.2014] Hu, B.; Lu, Z.; Li, H.; and Chen, Q. 2014. Convolutional neural network architectures for matching natural language sentences. In Advances in neural information processing systems, 2042–2050.
  • [John, Boyali, and Mita] John, V.; Boyali, A.; and Mita, S. Gabor filter and gershgorin disk-based convolutional filter constraining for image classification.
  • [Jolliffe2011] Jolliffe, I. 2011. Principal component analysis. In International encyclopedia of statistical science. Springer. 1094–1096.
  • [Kingma and Ba2014] Kingma, D. P., and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • [Krizhevsky, Sutskever, and Hinton2012] Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, 1097–1105.
  • [Kwolek2005] Kwolek, B. 2005. Face detection using convolutional neural networks and gabor filters. In International Conference on Artificial Neural Networks, 551–556. Springer.
  • [LeCun et al.1998] LeCun, Y.; Bottou, L.; Bengio, Y.; and Haffner, P. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11):2278–2324.
  • [Lin et al.2017] Lin, T.-Y.; Dollár, P.; Girshick, R. B.; He, K.; Hariharan, B.; and Belongie, S. J. 2017. Feature pyramid networks for object detection. In CVPR, volume 1,  4.
  • [Loshchilov and Hutter2016] Loshchilov, I., and Hutter, F. 2016. Sgdr: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983.
  • [Lowe1999] Lowe, D. G. 1999. Object recognition from local scale-invariant features. In Computer vision, 1999. The proceedings of the seventh IEEE international conference on, volume 2, 1150–1157. Ieee.
  • [Lu, Wang, and Zhang2018] Lu, T.; Wang, D.; and Zhang, Y. 2018. Fast object detection algorithm based on hog and cnn. In Ninth International Conference on Graphic and Image Processing (ICGIP 2017), volume 10615, 1061509. International Society for Optics and Photonics.
  • [Maaten and Hinton2008] Maaten, L. v. d., and Hinton, G. 2008. Visualizing data using t-sne. Journal of machine learning research 9(Nov):2579–2605.
  • [Mohri, Rostamizadeh, and Talwalkar2012] Mohri, M.; Rostamizadeh, A.; and Talwalkar, A. 2012. Foundations of machine learning. MIT press.
  • [Mounika, Reddy, and Reddy2012] Mounika, B.; Reddy, N.; and Reddy, V. 2012. A neural network based face detection using gabor filter response. International Journal of Neural Networks (ISSN: 2249-2763 & E-ISSN: 2249-2771) 2(1):06–09.
  • [Pfister and Bresler2015] Pfister, L., and Bresler, Y. 2015. Learning sparsifying filter banks. In Wavelets and Sparsity XVI, volume 9597, 959703. International Society for Optics and Photonics.
  • [Rosenfeld1981] Rosenfeld, A. 1981. The max roberts operator is a hueckel-type edge detector. IEEE Transactions on Pattern Analysis and Machine Intelligence (1):101–103.
  • [Sarwar, Panda, and Roy2017] Sarwar, S. S.; Panda, P.; and Roy, K. 2017. Gabor filter assisted energy efficient fast learning convolutional neural networks. arXiv preprint arXiv:1705.04748.
  • [Schmid2001] Schmid, C. 2001. Constructing models for content-based image retrieval. In Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, volume 2, II–II. IEEE.
  • [van Vliet, Young, and Beckers1989] van Vliet, L. J.; Young, I. T.; and Beckers, G. L. 1989. A nonlinear laplace operator as edge detector in noisy images. Computer vision, graphics, and image processing 45(2):167–195.
  • [Wang et al.2018] Wang, X.; Girshick, R.; Gupta, A.; and He, K. 2018. Non-local neural networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • [Works] Works, H. I. Sobel edge detector. cse. secs. oakland. edu.
  • [Yao et al.2016] Yao, H.; Chuyi, L.; Dan, H.; and Weiyu, Y. 2016. Gabor feature based convolutional neural network for object recognition in natural scene. In Information Science and Control Engineering (ICISCE), 2016 3rd International Conference on, 386–390. IEEE.