Alpha-Pooling for Convolutional Neural Networks

11/08/2018 ∙ by Hayoung Eom, et al.

Convolutional neural networks (CNNs) have achieved remarkable performance in many applications, especially image recognition. Sub-sampling is a crucial component of CNNs, and max pooling and arithmetic average pooling are the most commonly used sub-sampling methods. Many other pooling types are possible, however, such as the geometric average and the harmonic average. Since it is not easy for algorithms to find the best pooling method, human experts choose the type of pooling, which might not be optimal for every task. Following the deep learning philosophy, the type of pooling can instead be driven by data for a given task. In this paper, we propose alpha-pooling, which has a trainable parameter α that decides the type of pooling. Alpha-pooling is a general pooling method that includes max pooling and arithmetic average pooling as special cases, depending on the parameter α. In experiments, alpha-pooling improves the accuracy of image recognition tasks, and we find that max pooling is not the optimal pooling scheme. Moreover, each layer has a different optimal pooling type.




1 Introduction

Deep learning has achieved remarkable performance in many applications, especially in image-related tasks [1, 2, 3]. In image recognition, convolutional neural networks (CNNs) are heavily used [4]. They are built from a few components, such as convolutional layers and pooling (or sub-sampling) layers. Pooling layers not only reduce the size of the feature map but also extract features that are more robust to the position or movement of objects [4].

Despite the importance of pooling, to the best of our knowledge, max pooling and arithmetic average pooling are often selected for the pooling layers without much consideration. Max pooling selects the highest value in the pooling window, and arithmetic average pooling takes the arithmetic average over the window area. However, the two pooling methods are not optimal [5]. Arithmetic average pooling degrades the performance of CNNs by losing crucial information carried by strong activation values. Max pooling, in turn, ignores all information except the largest value.

In addition to max and arithmetic average, there are many other average methods including geometric and harmonic average. Then, how can we find the best average method? The approach of human selection is not in line with the philosophy of deep learning. Moreover, it is not practically possible to find a proper average method for the layers whenever the network architecture is reshaped. This might be the fundamental reason why diverse pooling is not used in practice. In order to avoid such limitations, it is desirable to find an optimal average method for pooling layers automatically from training data.

On the other hand, α-integration was proposed as a general data integration framework [6]. α-integration integrates positive values, and the characteristics of the integration are determined by the parameter α. It finds the optimal integration of the input values in the sense of minimizing the α-divergence. Many averaging models, such as the mixture (or product) of experts model [7, 8], can be considered special cases of α-integration [6]. In addition, a training algorithm was proposed to find the best α value from training data for a given task [9].

In this paper, we propose a new pooling algorithm, alpha-pooling, which applies α-integration to the pooling layers. Alpha-pooling finds the optimal α values for the pooling layers automatically from training data by backpropagation, since the α values of the layers are parameters like the other parameters (i.e., weights or biases of the network). So, when we need sub-sampling, we do not have to predefine a specific pooling type. With alpha-pooling, the model finds the optimal pooling from training data for the task.

In experiments, alpha-pooling improves the accuracy of image recognition over max pooling on MNIST and CIFAR10. In other words, max pooling is not the best pooling method. After training the models, we found that the layers have different α values, which means that the optimal average type differs from layer to layer.

The rest of the paper is organized as follows. In section 2, we briefly review α-integration and the pooling methods in CNNs. In section 3, we propose alpha-pooling. In section 4, experiment results confirm our method, followed by the conclusion in section 5.

2 Background

In this section, we briefly review α-integration [6] and pooling methods in CNNs.

2.1 α-Integration

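Given two positive measures of a random variable $x$, $m_1(x)$ and $m_2(x)$, the α-mean is defined by

$$\tilde{m}_\alpha(x) = f_\alpha^{-1}\left( \frac{1}{2} \left\{ f_\alpha(m_1(x)) + f_\alpha(m_2(x)) \right\} \right),$$

where $f_\alpha(\cdot)$ is a differentiable monotone function given by

$$f_\alpha(z) = \begin{cases} z^{\frac{1-\alpha}{2}}, & \alpha \neq 1, \\ \log z, & \alpha = 1. \end{cases}$$

The α-mean includes many means as special cases. That is, the arithmetic mean, geometric mean, harmonic mean, minimum, and maximum are specific cases of the α-mean with α = −1, 1, 3, ∞, or −∞, respectively, as shown in Fig. 1. As α decreases, the α-mean approaches the larger of $m_1(x)$ and $m_2(x)$, and when α = −∞, the α-mean behaves as the max operation.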
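As a quick check of the special cases named above, the α-mean of two positive numbers $a$ and $b$ can be worked out for three finite α values. The symbols $a$ and $b$ are introduced here only for illustration, and the derivation assumes the standard form of $f_\alpha$ from [6], namely $f_\alpha(z) = z^{(1-\alpha)/2}$ for $\alpha \neq 1$ and $f_\alpha(z) = \log z$ for $\alpha = 1$.

```latex
\begin{align*}
\alpha = -1 &: \quad f_{-1}(z) = z, & \tilde{m}_{-1} &= \frac{a + b}{2} && \text{(arithmetic mean)} \\
\alpha = 1 &: \quad f_{1}(z) = \log z, & \tilde{m}_{1} &= \exp\!\left( \frac{\log a + \log b}{2} \right) = \sqrt{ab} && \text{(geometric mean)} \\
\alpha = 3 &: \quad f_{3}(z) = z^{-1}, & \tilde{m}_{3} &= \left( \frac{a^{-1} + b^{-1}}{2} \right)^{-1} = \frac{2ab}{a + b} && \text{(harmonic mean)}
\end{align*}
```

For $a = 1$ and $b = 2$, these evaluate to 1.5, about 1.414, and about 1.333, so the value decreases as α increases, in line with Fig. 1.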

Figure 1: α-integration of two values, 1 and 2. As α increases or decreases, the α-integration value decreases or increases monotonically and converges to 1 or 2, respectively.

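Given $n$ positive values $x_1, \dots, x_n$, the α-mean can be generalized to α-integration, which is defined by

$$\tilde{m}_\alpha = f_\alpha^{-1}\left( \frac{1}{n} \sum_{i=1}^{n} f_\alpha(x_i) \right),$$

where $f_\alpha$ is the same monotone function as in the α-mean and we assume that the values have the same weights.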
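The equal-weight α-integration can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the form of the monotone map (a power $z^{(1-\alpha)/2}$, with $\log z$ at α = 1) is assumed from [6].

```python
import math


def f_alpha(z, alpha):
    """Monotone map assumed from [6]: z**((1 - alpha) / 2), or log z at alpha = 1."""
    if alpha == 1:
        return math.log(z)
    return z ** ((1.0 - alpha) / 2.0)


def f_alpha_inv(y, alpha):
    """Inverse of f_alpha on positive inputs."""
    if alpha == 1:
        return math.exp(y)
    return y ** (2.0 / (1.0 - alpha))


def alpha_integration(values, alpha):
    """Equal-weight alpha-integration of a sequence of positive numbers."""
    if any(v <= 0 for v in values):
        raise ValueError("alpha-integration needs strictly positive inputs")
    mean_f = sum(f_alpha(v, alpha) for v in values) / len(values)
    return f_alpha_inv(mean_f, alpha)


# The two values used in Fig. 1, evaluated at a few alpha values.
for a in (-99, -1, 1, 3, 99):
    print(a, alpha_integration([1.0, 2.0], a))
```

For the pair (1, 2), the result moves from close to 2 at strongly negative α, through the arithmetic, geometric, and harmonic means, to close to 1 at strongly positive α. Raising values to large powers can overflow for very large |α|, so a practical implementation would likely need a numerically more stable form.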

In most existing works on α-integration [6, 10], the value of α is given in advance rather than learned. To find the optimal α value automatically from training data for a given task, a gradient descent algorithm was proposed [9].

2.2 Max Pooling and Average Pooling

CNNs are composed of convolutional layers, nonlinear functions, and pooling layers. Convolutional layers extract patterns from local regions of the input images [4]. Filters in convolutional layers produce high values when a portion of the input image matches the feature. Then, a nonlinear function is applied to the values, which is ReLU in most cases. The output of the nonlinear function is passed to the pooling layer. Pooling provides positional invariance, so that the pooled feature is less sensitive to the precise locations of structures within the image than the original feature maps. This is a crucial transformation for classification tasks.

Although pooling is an important component of CNNs, max pooling (or sometimes arithmetic average pooling) is selected in most cases without much consideration. Max pooling chooses only the highest value in the pooling window. Arithmetic average pooling computes the arithmetic average of all values in the window area. However, it is not guaranteed that these two pooling methods are the best choice in every case.

In general, arithmetic average pooling degrades the performance of CNNs when ReLU is used. Averaging ReLU activation values reduces the high values, which might carry crucial information, because many zero elements are included in the average. The use of tanh makes the problem worse, because strong positive and negative activation values cancel each other out in the average. Although max pooling does not suffer from this problem, it has another one. Max pooling gives weight 1 to the largest value only and weight 0 to all other values. That is, it ignores all information except the largest value. Moreover, max pooling easily overfits to the training dataset [5]. Thus, there have been many attempts to find better pooling methods [5, 11, 12, 13, 14].
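The two failure modes described above can be seen on a single pooling window. The following sketch uses made-up activation values and hypothetical helper names. It only illustrates the argument and is not taken from the experiments.

```python
def max_pool(window):
    """Keep only the largest activation in the window."""
    return max(window)


def avg_pool(window):
    """Arithmetic mean of all activations in the window."""
    return sum(window) / len(window)


relu_window = [0.0, 0.0, 0.0, 8.0]      # sparse ReLU outputs (made-up values)
signed_window = [6.0, -6.0, 5.0, -5.0]  # signed activations (made-up values)

print(max_pool(relu_window), avg_pool(relu_window))      # 8.0 2.0
print(max_pool(signed_window), avg_pool(signed_window))  # 6.0 0.0
```

In the first window the average dilutes the single strong activation from 8 to 2, and in the second the signed values cancel to 0, while max pooling keeps one value and discards the other three in both cases.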

3 Alpha Pooling

To find an optimal average method for pooling layers automatically from training data, we propose a generalized pooling method, alpha-pooling, which applies α-integration to the pooling layers. We treat the α value of alpha-pooling as a parameter like the other parameters (i.e., weights or biases of the network model), so that α can be trained by backpropagation, although α can also be trained in a different way from the other network parameters.
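The idea that α is an ordinary trainable parameter can be illustrated with a toy problem. The sketch below is neither backpropagation through a network nor the algorithm of [9]: it runs plain gradient descent on a single α with a central-difference gradient, so that the pooled value of the pair (1, 2) matches an arbitrary target of 1.6. All names, the target, the learning rate, and the step count are assumptions chosen for illustration, as is the power form of the monotone map taken from [6].

```python
import math


def alpha_mean(values, alpha):
    """Equal-weight alpha-integration, assuming f(z) = z**((1 - alpha) / 2)."""
    p = (1.0 - alpha) / 2.0
    if p == 0.0:  # alpha = 1: geometric mean
        return math.exp(sum(math.log(v) for v in values) / len(values))
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)


def loss(alpha, values, target):
    """Squared error between the pooled value and a target (toy objective)."""
    return (alpha_mean(values, alpha) - target) ** 2


def train_alpha(values, target, alpha=-1.0, lr=20.0, steps=2000, h=1e-4):
    """Plain gradient descent on alpha with a central-difference gradient."""
    for _ in range(steps):
        grad = (loss(alpha + h, values, target)
                - loss(alpha - h, values, target)) / (2.0 * h)
        alpha -= lr * grad
    return alpha


# Start from arithmetic averaging (alpha = -1) and fit a target above 1.5.
learned = train_alpha([1.0, 2.0], target=1.6)
print(learned, alpha_mean([1.0, 2.0], learned))
```

Because the target lies between the arithmetic mean (1.5) and the maximum (2), α moves from −1 to a more negative value, that is, toward max pooling but not all the way. In a real network the gradient with respect to α would come from automatic differentiation rather than finite differences.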

To apply α-integration to the pooling layers, one constraint must be met: all input values to alpha-pooling must be positive. This constraint is not a big problem, because CNNs use ReLU as the activation function in most cases, and the output of ReLU contains no negative values. We only need to be careful to avoid zeros, because α-integration is not defined for every α when a zero is included (for example, it requires log 0 at α = 1). Therefore, we slightly revise the ReLU function by adding ε to its output, which leads to a new activation function:

$$\mathrm{ReLU}(x) + \epsilon = \max(0, x) + \epsilon,$$

where ε is a small positive number that is fixed in our experiments.
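The role of the small offset can be checked on a single window. In this sketch the helper names are hypothetical, and the value 1e-6 passed as the offset is an illustrative placeholder, not the setting used in the experiments. The harmonic mean stands in for α-integration at α = 3, assuming the power form of the monotone map from [6].

```python
def relu(x):
    """Standard ReLU."""
    return max(0.0, x)


def shifted_relu(x, eps):
    """ReLU plus a small positive offset so that every output is positive."""
    return relu(x) + eps


def harmonic_mean(values):
    """alpha = 3 case of alpha-integration, assuming f(z) = 1 / z."""
    return len(values) / sum(1.0 / v for v in values)


pre_activations = [-1.5, 0.0, 2.0, 0.5]  # made-up inputs to the activation

try:
    harmonic_mean([relu(x) for x in pre_activations])
except ZeroDivisionError:
    print("plain ReLU output contains zeros, so 1 / z is undefined")

# 1e-6 is an illustrative offset, not the paper's value.
print(harmonic_mean([shifted_relu(x, 1e-6) for x in pre_activations]))
```

With the offset, every input to the pooling step is strictly positive, so the integration is defined for any α.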

After applying this activation function, we α-integrate the activation values with the current α, since all the values are now positive. Fig. 2 shows an example of how alpha-pooling works. With different α values, the output of the pooling layer is different. Note that as α → −∞, the integration works as the max operation.

Now, with alpha-pooling, the model can find the optimal pooling from training data for the task. All pooling layers can share a single α value, or each layer can learn its own α value and thus its own pooling type.

Figure 2: Example of alpha-pooling results for four positive values, 2, 1, 1, and 8, in a 2×2 pooling window. The outputs of the pooling differ depending on the α value.
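The example of Fig. 2 can be reproduced numerically with a small sketch of non-overlapping 2×2 alpha-pooling on a list-of-lists feature map. The function names are hypothetical and the power form of the monotone map is assumed from [6]. This is an illustration, not the layer used in the experiments.

```python
import math


def alpha_pool_window(window, alpha):
    """Alpha-integrate one window of positive values (equal weights)."""
    p = (1.0 - alpha) / 2.0
    if p == 0.0:  # alpha = 1: geometric mean
        return math.exp(sum(math.log(v) for v in window) / len(window))
    return (sum(v ** p for v in window) / len(window)) ** (1.0 / p)


def alpha_pool_2x2(feature_map, alpha):
    """Non-overlapping 2x2 alpha-pooling; a trailing odd row or column is dropped."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            window = [feature_map[r][c], feature_map[r][c + 1],
                      feature_map[r + 1][c], feature_map[r + 1][c + 1]]
            row.append(alpha_pool_window(window, alpha))
        pooled.append(row)
    return pooled


# The four values from Fig. 2 arranged as one 2x2 window.
fmap = [[2.0, 1.0],
        [1.0, 8.0]]
for alpha in (-201.0, -1.0, 1.0, 3.0):
    print(alpha, alpha_pool_2x2(fmap, alpha)[0][0])
```

For this window the output is 3 at α = −1 (arithmetic average), about 2 at α = 1 (geometric average), and about 1.524 at α = 3 (harmonic average), and it approaches the maximum value 8 as α becomes strongly negative.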

4 Experiments

We present experiment results with different models on two datasets: MNIST and CIFAR10. The results confirm that alpha-pooling outperforms max pooling, which implies that max pooling is not the optimal pooling type. Also, the layers have different optimal pooling policies, which means that there is no single optimal pooling type.

4.1 Model


Figure 3: Experiment models: (a,b) simple CNN models with max pooling or alpha-pooling, respectively, (c) the VGG model with 5 max pooling layers, and (d) a variant of the VGG model with alpha-pooling.

As shown in Fig. 3, we use two CNN models for the experiments. First, we set up a simple CNN model to minimize the impact of other techniques and to isolate the effect of alpha-pooling on image recognition. This model consists of two convolutional layers and two pooling layers. Second, we use the VGG model to check whether alpha-pooling works well in complex models. The VGG model has 5 max pooling layers.

4.2 Data

To evaluate the different pooling methods, we train the CNN models described above on the MNIST and CIFAR10 datasets. MNIST consists of handwritten digit images in 10 classes (digits 0-9), split into a training set (60K images) and a test set (10K images). The CIFAR10 dataset consists of images in 10 classes, with 50K training images and 10K test images.

4.3 Results

Since models with alpha-pooling learn the pooling type, the learned α values lead to meaningful observations. First, max pooling may not be the optimal pooling method. As presented in Table 1, alpha-pooling outperforms max pooling on the image recognition tasks. In other words, alpha-pooling with a learned α value is better suited to the given tasks than conventional max pooling. If max pooling were optimal for our experiment models, the α value would keep decreasing toward −∞. However, in Fig. 4, all α values converge to values between -10 and 0 and do not keep decreasing toward the max pooling limit.

Accuracy (%)
Dataset    Model    Max pooling    Alpha-pooling
MNIST      CNN      99.37          99.49
CIFAR10    CNN      72.52          74.07
CIFAR10    VGG      92.38          93.71
Table 1: Performance of the pooling methods on MNIST and CIFAR10.
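The size of the improvement in Table 1 is easier to read as an absolute gain in accuracy points and as a relative reduction of the error rate. The short sketch below derives both from the accuracies in the table. The derived quantities and variable names are added here for illustration and are not reported in the original results.

```python
# Accuracies (%) copied from Table 1: (dataset, model, max pooling, alpha-pooling).
results = [
    ("MNIST", "CNN", 99.37, 99.49),
    ("CIFAR10", "CNN", 72.52, 74.07),
    ("CIFAR10", "VGG", 92.38, 93.71),
]

for dataset, model, max_acc, alpha_acc in results:
    gain = alpha_acc - max_acc                  # absolute gain in accuracy points
    err_cut = 100.0 * gain / (100.0 - max_acc)  # relative error reduction in %
    print(f"{dataset:8s} {model:4s} gain={gain:.2f} pts, error cut={err_cut:.1f}%")
```

The absolute gains are 0.12, 1.55, and 1.33 points, which correspond to roughly 19%, 6%, and 17% fewer errors, respectively.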



Figure 4: Training curves of the α values over training epochs. (a) Two α values for the two alpha-pooling layers in the simple CNN model, and (b) five α values for the five alpha-pooling layers in the VGG model.

In addition, Fig. 4 shows that the α values of different layers converge to different values. This implies that each layer has a different optimal pooling type, because its role is different. It also implies that there is no single optimal pooling type.

5 Conclusion

In this paper, we questioned the conventional pooling methods, each of which selects a different representative value within the pooling window. We then proposed alpha-pooling, which includes the previous pooling methods as special cases. The parameter α of alpha-pooling is trainable from training data, and the converged α value determines the best pooling type automatically.

Experiment results confirm that alpha-pooling improves performance, implying that max pooling is not optimal. Also, the pooling layers converge to different α values, so there is no single optimal pooling type for all cases. As future work, we can analyze the meaning of the different α values in detail.

6 Acknowledgement

This work was supported by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. 2018-0-00749, Development of virtual network management technology based on artificial intelligence) and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2017R1D1A1B03033341).