Neural networks in the real domain have been studied for a long time and have achieved promising results in many vision tasks in recent years. However, the extensions of neural network models to other number fields and their potential applications have not yet been fully investigated. Focusing on color images, which can be naturally represented as quaternion matrices, we propose a quaternion convolutional neural network (QCNN) model to obtain more representative features. In particular, we redesign basic modules such as the convolution layer and the fully-connected layer in the quaternion domain, which can be used to establish fully-quaternion convolutional neural networks. Moreover, these modules are compatible with almost all deep learning techniques and can easily be plugged into traditional CNNs. We test our QCNN models on both color image classification and denoising tasks. Experimental results show that they outperform real-valued CNNs with the same structures.
As a powerful feature representation method, convolutional neural networks (CNNs) have been widely applied in the field of computer vision. Since the success of AlexNet, many novel CNNs have been proposed, e.g., VGG, ResNet, and DenseNet, which have achieved state-of-the-art performance in almost all vision tasks [12, 4, 23]. One key module of a CNN model is the convolution layer, which efficiently extracts features from high-dimensional structural data with a set of convolution kernels. When dealing with multi-channel inputs (e.g., color images), each convolution kernel merges the channels by summing up the per-channel convolution results and accordingly outputs one single channel, as Fig. 1(a) shows.
Although such a processing strategy performs well in many practical situations, it inherently suffers from some drawbacks in color image processing tasks. Firstly, each kernel simply sums up the outputs corresponding to different channels and ignores the complicated interrelationship between them. Accordingly, we may lose important structural information about color and obtain a non-optimal representation of the color image. Secondly, simply summing up the outputs gives too many degrees of freedom to the learning of convolution kernels, so there is a high risk of over-fitting even when heavy regularization terms are imposed. How to overcome these two challenges has not yet been fully investigated.
Focusing on the problems mentioned above, we propose a novel quaternion convolutional neural network (QCNN) model, which represents color images in the quaternion domain. Fig. 1 illustrates the scheme of the QCNN model. In particular, each color pixel in a color image (e.g., the yellow dot in Fig. 1) is represented as a quaternion, and accordingly, the image is represented as a quaternion matrix rather than as three independent real-valued matrices. Taking the quaternion matrix as the input of our network, we design a series of basic modules, e.g., the quaternion convolution layer and the quaternion fully-connected layer. While the traditional real-valued convolution can only apply scaling transformations to the input, the quaternion convolution achieves both the scaling and the rotation of the input in the color space, which provides a more structural representation of color information. Based on these modules, we can establish fully-quaternion CNNs that represent color images more effectively. Moreover, we study the relationship between our QCNN model and existing real-valued CNNs and find a compatible way to combine them in the same algorithmic framework.
Essentially, our QCNN imposes an implicit regularizer on the network architecture, which ensures that the representations of color images are learned under the guidance of quaternion operations. Such a strategy considers more complicated relationships across different channels while suppressing the degrees of freedom of the model's parameters during training. As a result, using quaternion CNNs, we can achieve better learning results with fewer parameters than real-valued CNNs. Additionally, because a color image is represented as a quaternion matrix in our QCNN, we can transform a color pixel throughout the color space using independent and physically meaningful parameters (i.e., the magnitude and the angle on the color cone shown in Fig. 1(b)), which enhances the interpretability of the model. As Fig. 1 shows, our QCNN preserves more color information than a real-valued CNN, which makes it suitable for color image processing, especially for low-level color feature extraction. Experimental results show that our QCNN model provides benefits for both a high-level vision task (e.g., color image classification) and a low-level vision task (e.g., color image denoising), where it outperforms its competitors.
A quaternion is a kind of hypercomplex number, first described by Hamilton in 1843 and interpreted as a point in three-dimensional space. Mathematically, a quaternion $q$ in the quaternion domain $\mathbb{H}$, i.e., $q \in \mathbb{H}$, can be represented as $q = q_0 + q_1 i + q_2 j + q_3 k$, where $q_l \in \mathbb{R}$ for $l = 0, 1, 2, 3$, and the imaginary units $i$, $j$, $k$ obey the quaternion rules $i^2 = j^2 = k^2 = ijk = -1$.
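An $N$-dimensional quaternion vector can be denoted as $\mathbf{q} \in \mathbb{H}^N$. Similar to real numbers, we can define a series of operations for quaternions:
Scalar multiplication: $\lambda q = \lambda q_0 + \lambda q_1 i + \lambda q_2 j + \lambda q_3 k$ for $\lambda \in \mathbb{R}$.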
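These quaternion operations can be used to represent rotations in three-dimensional space. Suppose that we rotate a 3D vector $\mathbf{v} = [v_1, v_2, v_3]^\top$ by an angle $\theta$ about a rotation axis $\boldsymbol{\mu} = [\mu_1, \mu_2, \mu_3]^\top$ with $\|\boldsymbol{\mu}\| = 1$ to get a new vector $\hat{\mathbf{v}}$. Such a rotation is equivalent to the following quaternion operation:
$$\hat{v} = g\, v\, g^*, \tag{1}$$
where $v = v_1 i + v_2 j + v_3 k$ and $\hat{v}$ are the pure quaternion representations of the two vectors, and
$$g = \cos\frac{\theta}{2} + \sin\frac{\theta}{2}\,\mu, \qquad \mu = \mu_1 i + \mu_2 j + \mu_3 k, \tag{2}$$
is a unit quaternion whose conjugate is $g^* = \cos\frac{\theta}{2} - \sin\frac{\theta}{2}\,\mu$.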
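The rotation in (1) can be checked numerically. The sketch below is a minimal pure-Python illustration, not code from the paper: quaternions are stored as hypothetical 4-tuples `(q0, q1, q2, q3)`, `qmul` implements the Hamilton product implied by $i^2 = j^2 = k^2 = ijk = -1$, and `rotate` embeds a 3D vector as a pure quaternion and applies the sandwich product with the unit quaternion built from the angle and the axis.

```python
import math

# Quaternions are stored as 4-tuples (q0, q1, q2, q3) meaning q0 + q1*i + q2*j + q3*k.


def qmul(p, q):
    """Hamilton product p*q, following i^2 = j^2 = k^2 = ijk = -1."""
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    return (
        p0 * q0 - p1 * q1 - p2 * q2 - p3 * q3,
        p0 * q1 + p1 * q0 + p2 * q3 - p3 * q2,
        p0 * q2 - p1 * q3 + p2 * q0 + p3 * q1,
        p0 * q3 + p1 * q2 - p2 * q1 + p3 * q0,
    )


def qconj(q):
    """Conjugate q* = q0 - q1*i - q2*j - q3*k."""
    return (q[0], -q[1], -q[2], -q[3])


def rotate(v, axis, theta):
    """Rotate the 3D vector v by theta about the unit vector axis via g v g*."""
    half = theta / 2.0
    g = (math.cos(half),) + tuple(math.sin(half) * a for a in axis)
    pure_v = (0.0,) + tuple(v)            # embed v as a pure quaternion
    out = qmul(qmul(g, pure_v), qconj(g))
    return out[1:]                        # real part is (numerically) zero


print(rotate((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), math.pi / 2))
```

Rotating the x-axis by 90 degrees about the z-axis gives (up to rounding) the y-axis, as the right-hand rule predicts.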
Because of its convenience in representing rotations of 3D vectors, the quaternion is widely used in mechanics and physics. In recent years, the theory of quaternion-based harmonic analysis has been well developed and many algorithms have been proposed, e.g., the quaternion Fourier transform (QFT), the quaternion wavelet transform (QWT) [35, 1], and the quaternion Kalman filter [39, 2]. Most of these algorithms have been shown to work better for 3D objects than their real-valued counterparts. In the field of computer vision and image processing, quaternion-based methods have also shown their potential in many tasks. The quaternion wavelet transform [17, 1], quaternion principal component analysis, and other quaternion color image processing techniques have been shown to extract more representative features for color images and have achieved encouraging results in high-level vision tasks such as color image classification. In low-level vision tasks such as image denoising and super-resolution, quaternion-based methods [8, 38] preserve more interrelationship information across different channels and thus can restore images with higher quality. Recently, quaternion-based neural networks have also been put forward and used for classification tasks [27, 30, 3]. However, how to design a quaternion CNN is still an open problem.
The convolutional neural network is one of the most successful models in many vision tasks. Since the success of LeNet in digit recognition, great progress has been made. AlexNet is the first deep CNN that greatly outperformed all past models in the image classification task. Then, a number of models with deep and complicated structures were proposed, such as VGG and ResNet, which achieved great success in ILSVRC. Recently, CNN models have also been introduced for low-level vision tasks. For example, SRCNN applies convolutional neural networks to image super-resolution and outperforms classical methods. For other tasks such as denoising and inpainting, CNNs also achieve encouraging results.
Some efforts have been made to extend real-valued neural networks to other number fields. Complex-valued neural networks have been built and shown to have advantages in generalization ability and to be easier to optimize. Audio signals can be naturally represented as complex numbers, so complex CNNs are more suitable for such tasks than real-valued CNNs. It has been shown that deep complex networks can obtain results competitive with real-valued models on audio-related tasks. A deep quaternion network has also been proposed. However, its convolution simply replaces the real multiplications with quaternion ones, and its quaternion kernel is not further parameterized. In contrast, our proposed quaternion convolution is physically meaningful for color image processing tasks.
Focusing on color image representation, our quaternion CNN treats a color image as a 2D pure quaternion matrix, denoted as $\mathbf{A} = [a_{nn'}] \in \mathbb{H}^{N \times N}$, where $N$ represents the size of the image. (Without loss of generality, we assume in the following that both the width and the height of the image are equal to $N$.) In particular, the elements of the quaternion matrix are
$$a_{nn'} = 0 + r_{nn'}\, i + g_{nn'}\, j + b_{nn'}\, k,$$
where $r_{nn'}$, $g_{nn'}$, $b_{nn'} \in \mathbb{R}$ represent the red, green and blue channels, respectively.
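Suppose that we have an $L \times L$ quaternion convolution kernel $\mathbf{W} = [w_{ll'}] \in \mathbb{H}^{L \times L}$. We aim to design an effective and physically meaningful quaternion convolution operation, denoted as "$\circledast$", between the input $\mathbf{A}$ and the kernel $\mathbf{W}$. Specifically, this operation should (i) apply rotations and scalings to color vectors in order to find the best representation in the whole color space, and (ii) play the same role as real-valued convolution when processing grayscale images. To achieve this aim, we take advantage of the rotational nature of the quaternion shown in (1) and (2) and propose a quaternion convolution of a particular form. Specifically, we set each element of the quaternion convolution kernel as
$$w_{ll'} = s_{ll'}\left(\cos\frac{\theta_{ll'}}{2} + \sin\frac{\theta_{ll'}}{2}\,\mu\right),$$
where $\theta_{ll'} \in [-\pi, \pi]$ and $s_{ll'} \in \mathbb{R}$, and $\mu$ is the gray axis with unit length, i.e., $\mu = \frac{\sqrt{3}}{3}(i + j + k)$. As (2) shows, a unit quaternion is needed to perform a rotation; $w_{ll'}$ is such a quaternion scaled by $s_{ll'}$. Accordingly, the quaternion convolution is defined as
$$\mathbf{A} \circledast \mathbf{W} = \mathbf{F} = [f_{kk'}] \in \mathbb{H}^{(N-L+1)\times(N-L+1)}, \qquad f_{kk'} = \sum_{l=1}^{L}\sum_{l'=1}^{L} \frac{1}{s_{ll'}}\, w_{ll'}\, a_{(k+l-1)(k'+l'-1)}\, w_{ll'}^{*}. \tag{6}$$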
The collection of all such convolution kernels forms the proposed quaternion convolution layer.
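A direct, unoptimized reading of the quaternion convolution in (6) is sketched below in pure Python for one color image and one kernel. It is an illustration under stated assumptions, not the authors' TensorFlow implementation: images are hypothetical $N \times N$ nested lists of (r, g, b) tuples, each kernel element is given by its scaling factor and rotation angle, the output uses "valid" positions (size $N-L+1$), and the scaling factors are assumed to be nonzero because (6) divides by them.

```python
import math

# Unit gray axis mu = (i + j + k) / sqrt(3), stored by its imaginary parts.
MU = (1 / math.sqrt(3),) * 3


def qmul(p, q):
    """Hamilton product of quaternions stored as (q0, q1, q2, q3)."""
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    return (p0 * q0 - p1 * q1 - p2 * q2 - p3 * q3,
            p0 * q1 + p1 * q0 + p2 * q3 - p3 * q2,
            p0 * q2 - p1 * q3 + p2 * q0 + p3 * q1,
            p0 * q3 + p1 * q2 - p2 * q1 + p3 * q0)


def kernel_element(s, theta):
    """w = s * (cos(theta/2) + sin(theta/2) * mu)."""
    half = theta / 2
    return (s * math.cos(half),) + tuple(s * math.sin(half) * m for m in MU)


def qconv2d(image, kernel):
    """'Valid' quaternion convolution: f = sum over the patch of (1/s) w a w*.

    image:  N x N nested list of (r, g, b) pixels, i.e. pure quaternions.
    kernel: L x L nested list of (s, theta) pairs with s != 0.
    Returns an (N-L+1) x (N-L+1) nested list of (r, g, b) tuples.
    """
    n, size = len(image), len(kernel)
    out = []
    for k in range(n - size + 1):
        row = []
        for kp in range(n - size + 1):
            acc = [0.0, 0.0, 0.0]
            for u in range(size):
                for v in range(size):
                    s, theta = kernel[u][v]
                    w = kernel_element(s, theta)
                    w_conj = (w[0], -w[1], -w[2], -w[3])
                    a = (0.0,) + tuple(image[k + u][kp + v])
                    term = qmul(qmul(w, a), w_conj)  # equals s^2 * g a g*
                    for c in range(3):
                        acc[c] += term[c + 1] / s    # keep s * (rotated a)
            row.append(tuple(acc))
        out.append(row)
    return out


# A 120-degree rotation about the gray axis cyclically permutes R -> G -> B.
print(qconv2d([[(1.0, 0.0, 0.0)]], [[(1.0, 2 * math.pi / 3)]]))
```

Looping over every pixel like this is slow; a practical implementation would use the equivalent matrix form, where each kernel element becomes a fixed $3 \times 3$ matrix.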
Different from the real-valued convolution operation, whose elementary operation is the multiplication between real numbers, the elementary operation of the quaternion convolution in (6) actually applies a series of rotations and scalings to the quaternions $a_{nn'}$ in each patch. The rotation axis is set as $\mu$ (i.e., the grayscale axis in color space) for all operations, while the rotation angle and the scaling factor of each operation are specified by $\theta_{ll'}$ and $s_{ll'}$, respectively.
This definition has an interpretable advantage. As shown in Fig. 1(a), the convolution in traditional CNNs applies three independent scaling transforms to each pixel to walk through the three color axes, and accordingly it needs to find the best representation in the whole color space. In our QCNNs, a pixel is a quaternion, i.e., a 3D vector in color space, and the proposed convolution finds its best representation in a small part of the color space, because we restrict the convolution to apply only one rotation and one scaling transform. Such a convolution actually imposes implicit regularizers on the model, so that we can suppress the risk of over-fitting brought by too many degrees of freedom in the learning of kernels. Additionally, in real-valued CNNs, the input layer transforms 3-channel images into single-channel feature maps, ignoring the interrelationship among channels, which causes information loss. Although the loss can be recovered with multiple different filters, the recovery requires redundant iterations, and there is no guarantee that the loss can be recovered perfectly. In QCNNs, the convolution causes no order reduction in the input layer, so the interrelationship information among channels can be fully preserved.
Although our convolution operation is designed for color images, it can be applied to grayscale images as well. A grayscale image can be seen as a color image whose three channels are identical. Because all the corresponding color vectors are parallel to the gray axis, the rotation about the gray axis reduces to the identity transformation, and the quaternion convolution performs the same function as real-valued convolution. From this viewpoint, real-valued convolution is a special case of quaternion convolution for grayscale images.
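The grayscale argument can be verified with a single elementary operation of (6). In this minimal sketch (pure Python, with a hypothetical helper `elementary_op`), a gray pixel with three equal channels is transformed with several rotation angles; because the pixel lies on the rotation axis, only the scaling factor changes it, exactly as a real-valued weight would.

```python
import math


def qmul(p, q):
    """Hamilton product of quaternions stored as (q0, q1, q2, q3)."""
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    return (p0 * q0 - p1 * q1 - p2 * q2 - p3 * q3,
            p0 * q1 + p1 * q0 + p2 * q3 - p3 * q2,
            p0 * q2 - p1 * q3 + p2 * q0 + p3 * q1,
            p0 * q3 + p1 * q2 - p2 * q1 + p3 * q0)


def elementary_op(pixel, s, theta):
    """(1/s) w a w* with w = s (cos(theta/2) + sin(theta/2) mu), mu the unit gray axis."""
    h = math.sin(theta / 2) / math.sqrt(3)  # each imaginary part of sin(theta/2) * mu
    w = (s * math.cos(theta / 2), s * h, s * h, s * h)
    w_conj = (w[0], -w[1], -w[2], -w[3])
    out = qmul(qmul(w, (0.0,) + tuple(pixel)), w_conj)
    return tuple(x / s for x in out[1:])


gray = (0.4, 0.4, 0.4)
for theta in (-3.0, -1.0, 0.0, 0.5, 2.5):
    # The rotation leaves a gray pixel in place, so only the scaling by s = 1.5 acts.
    print(theta, elementary_op(gray, 1.5, theta))
```

Every printed vector is (up to rounding) (0.6, 0.6, 0.6), i.e., the gray pixel multiplied by the real weight 1.5.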
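According to the rules of quaternion computation, if we represent each $a_{nn'}$ as a 3D vector $\mathbf{a}_{nn'} = [r_{nn'}, g_{nn'}, b_{nn'}]^\top$, then the operation in (6) can be represented by a set of matrix multiplications:
$$\mathbf{f}_{kk'} = \sum_{l=1}^{L}\sum_{l'=1}^{L} s_{ll'}\,\mathbf{R}_{ll'}\,\mathbf{a}_{(k+l-1)(k'+l'-1)}, \tag{7}$$
where $\mathbf{f}_{kk'}$ is the vectorized representation of the quaternion $f_{kk'}$, and
$$\mathbf{R}_{ll'} = \begin{bmatrix} f_1 & f_2 & f_3 \\ f_3 & f_1 & f_2 \\ f_2 & f_3 & f_1 \end{bmatrix}, \quad f_1 = \frac{1}{3} + \frac{2}{3}\cos\theta_{ll'}, \quad f_2 = \frac{1}{3} - \frac{2}{3}\cos\!\left(\theta_{ll'} - \frac{\pi}{3}\right), \quad f_3 = \frac{1}{3} - \frac{2}{3}\cos\!\left(\theta_{ll'} + \frac{\pi}{3}\right). \tag{8}$$
The detailed derivation from (6) to (7) is given in the supplementary file. Additionally, because the inputs and outputs of quaternion convolutions are both pure quaternion matrices, quaternion convolution layers can be stacked just as in real-valued CNNs, and most architectures of real-valued CNNs can also be used in QCNNs. In other words, the proposed quaternion convolution is compatible with traditional real-valued convolution.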
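The equivalence between the quaternion form (6) and the matrix form (7) can be spot-checked numerically. This sketch is an illustration, not the derivation in the supplementary file: the hypothetical helper `rotation_matrix` builds the $3 \times 3$ matrix of (8) with entries $f_1 = \frac{1}{3} + \frac{2}{3}\cos\theta$, $f_2 = \frac{1}{3} - \frac{2}{3}\cos(\theta - \frac{\pi}{3})$, $f_3 = \frac{1}{3} - \frac{2}{3}\cos(\theta + \frac{\pi}{3})$, and the script compares one term $s\,\mathbf{R}\,\mathbf{a}$ with the quaternion sandwich $(1/s)\,w\,a\,w^*$ on random inputs.

```python
import math
import random


def qmul(p, q):
    """Hamilton product of quaternions stored as (q0, q1, q2, q3)."""
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    return (p0 * q0 - p1 * q1 - p2 * q2 - p3 * q3,
            p0 * q1 + p1 * q0 + p2 * q3 - p3 * q2,
            p0 * q2 - p1 * q3 + p2 * q0 + p3 * q1,
            p0 * q3 + p1 * q2 - p2 * q1 + p3 * q0)


def rotation_matrix(theta):
    """The 3 x 3 matrix of (8): rotation by theta about the gray axis."""
    f1 = 1 / 3 + 2 / 3 * math.cos(theta)
    f2 = 1 / 3 - 2 / 3 * math.cos(theta - math.pi / 3)
    f3 = 1 / 3 - 2 / 3 * math.cos(theta + math.pi / 3)
    return [[f1, f2, f3], [f3, f1, f2], [f2, f3, f1]]


def matrix_form(a, s, theta):
    """One term of (7): s * R(theta) a."""
    R = rotation_matrix(theta)
    return tuple(s * sum(R[r][c] * a[c] for c in range(3)) for r in range(3))


def quaternion_form(a, s, theta):
    """One term of (6): (1/s) w a w*, with w = s (cos(theta/2) + sin(theta/2) mu)."""
    h = math.sin(theta / 2) / math.sqrt(3)
    w = (s * math.cos(theta / 2), s * h, s * h, s * h)
    w_conj = (w[0], -w[1], -w[2], -w[3])
    out = qmul(qmul(w, (0.0,) + tuple(a)), w_conj)
    return tuple(x / s for x in out[1:])


rng = random.Random(0)
for _ in range(5):
    a = tuple(rng.random() for _ in range(3))
    s, theta = rng.uniform(0.5, 2.0), rng.uniform(-math.pi, math.pi)
    print(matrix_form(a, s, theta), quaternion_form(a, s, theta))
```

The two columns agree to floating-point precision. The matrix form is the one worth implementing, since $s\,\mathbf{R}$ can be precomputed per kernel element and batched.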
According to (7), a quaternion convolution layer has twice as many parameters as a real-valued convolution layer with the same structure and the same number of filtering kernels, since each element of a quaternion convolution kernel has two trainable parameters, $s_{ll'}$ and $\theta_{ll'}$. Denote $K$ as the number of kernels, $L$ as the kernel size, and $C$ as the number of input channels. A real-valued convolution layer with $K$ kernels of size $L \times L$ has $KCL^2$ parameters, and we require about $KCN^2L^2$ multiplications to process $C$ feature maps of size $N \times N$. A quaternion layer with $K$ kernels has $2KCL^2$ parameters: each kernel has $CL^2$ angle parameters $\theta_{ll'}$ and $CL^2$ scaling parameters $s_{ll'}$. To process $C$ color feature maps, we need about $9KCN^2L^2$ multiplications, because each output quaternion requires $9CL^2$ multiplications, as shown in (7). By reducing the numbers of kernels and channels to $K/\sqrt{2}$ and $C/\sqrt{2}$, the number of the quaternion layer's parameters is halved and becomes equal to that of the real-valued layer. Since the number of channels in one layer equals the number of kernels in the previous layer, reducing the number of kernels in all layers by the ratio $1/\sqrt{2}$ halves the number of the QCNN's parameters and halves its number of multiplications, to 4.5 times that of the real-valued CNN. Note that the matrix multiplication in (7) can be optimized and parallelized when implemented in TensorFlow. In our experiments, our QCNNs take only about twice as much time as real-valued CNNs with the same number of parameters. According to the following experiments, such a reduction does not harm our QCNN model: QCNNs with a number of parameters comparable to that of real-valued CNNs still achieve superior performance.
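The parameter and multiplication counts above reduce to simple arithmetic. The sketch below (hypothetical helper names; bias terms ignored and output maps taken as $N \times N$, both simplifying assumptions) reproduces the factor of 2 in parameters for equal kernel counts, and the ratios 1 and 4.5 after shrinking kernels and channels by $\sqrt{2}$. The layer sizes are illustrative, not taken from the experiments.

```python
import math


def real_conv_cost(K, C, L, N):
    """(parameters, multiplications) of a real-valued conv layer, bias ignored."""
    return K * C * L * L, K * C * L * L * N * N


def quat_conv_cost(K, C, L, N):
    """Quaternion conv layer: two parameters (s, theta) per kernel element and,
    following (7), one 3 x 3 matrix-vector product (9 multiplications) per element.
    s * R can be precomputed once per kernel element, so s adds no multiplications."""
    return 2 * K * C * L * L, 9 * K * C * L * L * N * N


K, C, L, N = 64, 64, 3, 32  # illustrative layer sizes

real_p, real_m = real_conv_cost(K, C, L, N)
same_p, same_m = quat_conv_cost(K, C, L, N)
slim_p, slim_m = quat_conv_cost(K / math.sqrt(2), C / math.sqrt(2), L, N)

print(same_p / real_p, same_m / real_m)  # 2.0 and 9.0 with the same K and C
print(slim_p / real_p, slim_m / real_m)  # about 1.0 and 4.5 after the sqrt(2) reduction
```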
The quaternion convolution layer described above preserves more interrelationship information and extracts better features than the real-valued one. However, if we connected it to a common fully-connected layer, the preserved information would be lost. Therefore, we design a quaternion fully-connected layer that performs the same operation as the quaternion convolution layer to keep the interrelationship information between channels. Specifically, the fully-connected layers of real-valued CNNs can be seen as special cases of one-dimensional convolution layers whose kernels have the same shapes as the inputs, and our quaternion fully-connected layers follow the same rule. Suppose that the input is an $M$-dimensional quaternion vector $\mathbf{a} = [a_m] \in \mathbb{H}^M$, for $m = 1, \ldots, M$. Applying $K$ 1D quaternion filtering kernels $\mathbf{w}_k = [w_{km}] \in \mathbb{H}^M$, for $k = 1, \ldots, K$, we obtain an output $\mathbf{f} = [f_k] \in \mathbb{H}^K$ with elements
$$f_k = \sum_{m=1}^{M} \frac{1}{s_{km}}\, w_{km}\, a_m\, w_{km}^{*},$$
where $s_{km}$ is the magnitude of $w_{km}$.
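A quaternion fully-connected layer can be sketched in the same vectorized form as the convolution: each weight acts on an input quaternion as a scaling times a rotation about the gray axis. This is an illustration under assumptions: each weight is parameterized by a hypothetical pair `(s, theta)` with $s > 0$, so that dividing by the magnitude of the weight leaves $s\,\mathbf{R}(\theta)\,\mathbf{a}_m$; inputs are lists of (r, g, b) tuples; biases are omitted.

```python
import math


def rotation_matrix(theta):
    """3 x 3 rotation by theta about the gray axis."""
    f1 = 1 / 3 + 2 / 3 * math.cos(theta)
    f2 = 1 / 3 - 2 / 3 * math.cos(theta - math.pi / 3)
    f3 = 1 / 3 - 2 / 3 * math.cos(theta + math.pi / 3)
    return [[f1, f2, f3], [f3, f1, f2], [f2, f3, f1]]


def quaternion_dense(inputs, weights):
    """inputs: list of M (r, g, b) tuples; weights: K rows of M (s, theta) pairs.
    Returns K (r, g, b) tuples with f_k = sum_m s_km * R(theta_km) a_m."""
    outputs = []
    for row in weights:
        acc = [0.0, 0.0, 0.0]
        for (s, theta), a in zip(row, inputs):
            R = rotation_matrix(theta)
            for r in range(3):
                acc[r] += s * sum(R[r][c] * a[c] for c in range(3))
        outputs.append(tuple(acc))
    return outputs


x = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]                 # M = 2 input quaternions
W = [[(1.0, 0.0), (1.0, 0.0)],                         # k = 1: plain sum
     [(2.0, 2 * math.pi / 3), (1.0, -2 * math.pi / 3)]]  # k = 2: scaled rotations
print(quaternion_dense(x, W))
```

With zero angles the layer reduces to a real-valued fully-connected layer applied to each channel with shared weights; the angles add the cross-channel rotations.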
Similar to our quaternion convolution layer, the computation of the proposed quaternion fully-connected layer can also be reformulated as a set of matrix multiplications, and thus, it is also compatible with real-valued CNNs.
Pooling and activation are important layers for achieving nonlinear operations. For our QCNN model, we extend the widely used real-valued nonlinear layers to quaternion versions. For average-pooling, averaging quaternions is the same as averaging the three imaginary parts separately. For max-pooling, we can define various criteria, such as the magnitude or the projection onto the gray axis, to decide which element to choose.
In our experiments, we find that simply applying max-pooling to the three imaginary parts separately provides good learning results. Similarly, in QCNNs we apply the same activation functions as in real-valued CNNs to each channel separately. For ReLU, if a quaternion vector is rotated out of the valid value range in color space, e.g., to a negative value in an RGB channel, we reset it to the nearest point in color space.
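The channel-wise pooling and activation described above are straightforward to write down. In the sketch below (pure Python, hypothetical helper names), feature maps are nested lists of (r, g, b) tuples, pooling uses non-overlapping $2 \times 2$ windows (an assumption; the window size is not specified here), and ReLU clamps each imaginary part at zero. For the non-negative RGB range, clamping negative components is exactly the projection onto the nearest valid point; an upper bound on color values, if one were used, would need an extra clamp.

```python
def _pool(fmap, size, reduce):
    """Apply `reduce` to each imaginary part over non-overlapping size x size windows."""
    out = []
    for i in range(0, len(fmap) - size + 1, size):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, size):
            window = [fmap[i + u][j + v] for u in range(size) for v in range(size)]
            row.append(tuple(reduce([p[c] for p in window]) for c in range(3)))
        out.append(row)
    return out


def max_pool(fmap, size=2):
    """Channel-wise max-pooling of a quaternion feature map."""
    return _pool(fmap, size, max)


def avg_pool(fmap, size=2):
    """Average-pooling: averaging quaternions equals averaging each imaginary part."""
    return _pool(fmap, size, lambda values: sum(values) / len(values))


def quaternion_relu(fmap):
    """ReLU on each imaginary part, i.e. projection onto non-negative color values."""
    return [[tuple(max(0.0, x) for x in p) for p in row] for row in fmap]


fmap = [[(1.0, -2.0, 3.0), (0.0, 5.0, -1.0)],
        [(2.0, 0.0, 0.0), (-4.0, 1.0, 9.0)]]
print(max_pool(fmap), avg_pool(fmap), quaternion_relu(fmap))
```

Note that channel-wise max-pooling can combine components from different pixels into one output vector; the magnitude or gray-projection criteria mentioned above would instead pick one whole pixel per window.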
Using the modules mentioned above, we can establish arbitrary fully-quaternion CNNs easily. Moreover, because of the compatibility of these modules, we can also build hybrid convolutional neural networks using both quaternion-based layers and common real-valued layers. In particular,
Connect to real-valued convolution layer: The feature map that a quaternion layer outputs can be split into 3 grayscale feature maps, each corresponding to one channel. Then, we can connect each of these three maps to real-valued convolution layers independently, or concatenate them and connect them to a single real-valued convolution layer.
Connect to real-valued fully-connected layer: Similarly, we flatten the output of a quaternion layer and treat each quaternion element as 3 real numbers. Thus, we obtain a real-valued and vectorized output which can be connected to real-valued fully-connected layer easily.
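Both connection schemes amount to reshaping. The sketch below (pure Python, hypothetical helper names) splits an $N \times N$ quaternion feature map of (r, g, b) tuples into three real-valued maps and, alternatively, flattens it into a real vector with three entries per quaternion. The interleaved r, g, b order of the flattened vector is an arbitrary choice; any fixed order works, because the following real-valued fully-connected layer learns its weights for whatever order it receives.

```python
def split_channels(fmap):
    """Split a quaternion feature map into three real maps (r, g, b)."""
    return [[[p[c] for p in row] for row in fmap] for c in range(3)]


def flatten_quaternion(fmap):
    """Flatten to a real vector, each quaternion contributing its three imaginary parts."""
    return [x for row in fmap for p in row for x in p]


fmap = [[(1, 2, 3), (4, 5, 6)],
        [(7, 8, 9), (10, 11, 12)]]
red, green, blue = split_channels(fmap)
print(red)                       # [[1, 4], [7, 10]]
print(flatten_quaternion(fmap))  # 12 values: r, g, b of each pixel in turn
```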
Proper weight initialization is essential for a network to be trained successfully. This principle also applies to our QCNN model. According to our analysis above, the scaling factor $s$ corresponds to the parameters in real-valued CNNs and controls the magnitude of the transformed vector, while the rotation angle $\theta$ is an additional parameter that only rotates the input vector. Additionally, when transformed vectors are added together, although the magnitude of the sum is affected by $\theta$, its projection onto the gray axis is still independent of $\theta$. Therefore, we follow the normalized initialization suggested by Glorot and Bengio in order to keep the variance of the gradients the same during training. Specifically, for each scaling factor $s$ and each rotation factor $\theta$ of the $j$-th layer, i.e., $s_j$ and $\theta_j$, we initialize them as two uniform random variables:
$$s_j \sim U\!\left[-\frac{\sqrt{6}}{\sqrt{n_j + n_{j+1}}},\ \frac{\sqrt{6}}{\sqrt{n_j + n_{j+1}}}\right], \qquad \theta_j \sim U\!\left[-\frac{\pi}{2},\ \frac{\pi}{2}\right],$$
where $U[\cdot,\cdot]$ represents a uniform distribution and $n_j$ is the dimension of the $j$-th layer's input.
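The initialization rule can be sketched as follows. This is a minimal illustration with a hypothetical function name, assuming that `n_in` is the dimension of the layer's input and `n_out` that of the next layer's input, as in normalized (Glorot) initialization, and that rotation angles are drawn uniformly from $[-\pi/2, \pi/2]$. For a convolution layer, using the fan-in $CL^2$ and fan-out $KL^2$ is a common convention and an assumption here.

```python
import math
import random


def init_quaternion_weights(count, n_in, n_out, rng=None):
    """Draw `count` (s, theta) pairs for one quaternion layer.

    s     ~ U[-sqrt(6 / (n_in + n_out)), +sqrt(6 / (n_in + n_out))]  (normalized init)
    theta ~ U[-pi/2, pi/2]
    """
    rng = rng or random.Random()
    bound = math.sqrt(6.0 / (n_in + n_out))
    scales = [rng.uniform(-bound, bound) for _ in range(count)]
    angles = [rng.uniform(-math.pi / 2, math.pi / 2) for _ in range(count)]
    return list(zip(scales, angles))


# Hypothetical 3 x 3 layer with C = 16 input and K = 32 output quaternion channels.
C, K, L = 16, 32, 3
params = init_quaternion_weights(K * C * L * L, n_in=C * L * L, n_out=K * L * L,
                                 rng=random.Random(0))
print(len(params), params[0])
```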
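Denote $\mathcal{L}$ as the real-valued loss function used to train our quaternion CNN model, and let $p$ and $q$ be two pure quaternion variables. The operation we perform in the QCNN, i.e., $p = \frac{1}{s}\, w\, q\, w^*$, can be equivalently represented by a set of matrix multiplications, and so can the corresponding quaternion gradient. In particular, we have
$$\frac{\partial \mathcal{L}}{\partial \mathbf{q}^\top} = \frac{\partial \mathcal{L}}{\partial \mathbf{p}^\top}\,\frac{\partial \mathbf{p}}{\partial \mathbf{q}^\top},$$
where $\mathbf{p}$ and $\mathbf{q}$ are the vectors corresponding to $p$ and $q$. When $p$ and $q$ are arbitrary elements of the output and input feature maps, corresponding to $\mathbf{f}_{kk'}$ and $\mathbf{a}_{(k+l-1)(k'+l'-1)}$ in (7), we have
$$\frac{\partial \mathcal{L}}{\partial \mathbf{q}^\top} = s_{ll'}\,\frac{\partial \mathcal{L}}{\partial \mathbf{p}^\top}\begin{bmatrix} f_1 & f_2 & f_3 \\ f_3 & f_1 & f_2 \\ f_2 & f_3 & f_1 \end{bmatrix},$$
where $f_1$, $f_2$, $f_3$ are defined as in (8). The matrix of $f$'s is exactly the same as that in (7), but the operation switches from left multiplication to right multiplication. In other words, the backward process can be explained as a rotation about the same axis by the reverse angle.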
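The reverse-rotation interpretation can be checked with finite differences. The sketch below is illustrative only: it uses a toy squared-error loss (an assumption, chosen so that the gradient with respect to the output is simply the residual), computes the input gradient of one term $s\,\mathbf{R}(\theta)\,\mathbf{a}$ as $s\,\mathbf{R}(-\theta)$ applied to the output gradient, and compares it with a central-difference estimate. Right multiplication by $\mathbf{R}(\theta)$ equals left multiplication by its transpose, and the transpose of a rotation by $\theta$ is the rotation by $-\theta$.

```python
import math


def rotation_matrix(theta):
    """Rotation by theta about the gray axis, with entries f1, f2, f3 as in (8)."""
    f1 = 1 / 3 + 2 / 3 * math.cos(theta)
    f2 = 1 / 3 - 2 / 3 * math.cos(theta - math.pi / 3)
    f3 = 1 / 3 - 2 / 3 * math.cos(theta + math.pi / 3)
    return [[f1, f2, f3], [f3, f1, f2], [f2, f3, f1]]


def matvec(M, x):
    return [sum(M[r][c] * x[c] for c in range(3)) for r in range(3)]


def forward(a, s, theta):
    """One term of (7): f = s * R(theta) a."""
    return [s * y for y in matvec(rotation_matrix(theta), a)]


def loss(f, target):
    """Toy real-valued loss (assumption): half the squared error."""
    return 0.5 * sum((x - t) ** 2 for x, t in zip(f, target))


a, s, theta, target = [0.2, -0.5, 0.9], 1.3, 0.8, [0.1, 0.4, -0.2]

grad_f = [x - t for x, t in zip(forward(a, s, theta), target)]    # dL/df
grad_a = [s * y for y in matvec(rotation_matrix(-theta), grad_f)]  # reverse rotation

eps = 1e-6
numeric = []
for c in range(3):
    up, down = list(a), list(a)
    up[c] += eps
    down[c] -= eps
    numeric.append((loss(forward(up, s, theta), target)
                    - loss(forward(down, s, theta), target)) / (2 * eps))

print(grad_a)
print(numeric)  # should agree with grad_a to several decimal places
```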
In neural networks, loss and activation functions must be differentiable for the gradient to be generated and propagated. For fully-quaternion CNNs, any function that is differentiable with respect to each part of the quaternion variables also makes the quaternion chain rule hold, and thus can be used as a loss (or activation) function. For hybrid CNNs, we select loss functions according to the category of task. In classification tasks, the top layers of the network are real-valued fully-connected layers, before which the quaternion inputs are flattened as suggested in section 3.4, and the loss function is the cross-entropy loss. In other tasks (e.g., regression tasks) where the network outputs images, the quaternion outputs of the top layer are regarded as 3-channel images, and the loss function can be the mean squared error (MSE) or other similar functions.
|Model||Dataset||Test accuracy|
|Shallow real-valued network||CIFAR-10||0.7546|
|Shallow quaternion network||CIFAR-10||0.7778|
|Real-valued VGG-S||102 flowers||0.7308|
|Quaternion VGG-S||102 flowers||0.7695|
|Quaternion VGG-S with fewer filters||102 flowers||0.7603|
To demonstrate the superiority and universality of our QCNN model, we test it on two typical vision tasks, color image classification and color image denoising, which represent typical high-level and low-level vision tasks, respectively. Compared with real-valued CNN models, our QCNN models consistently improve the learning results in both tasks. Some typical experimental results are shown and analyzed below, and more representative results and details are given in the supplementary file.
We have tested two QCNN architectures in our research: a shallow network for CIFAR-10 and a relatively deep one for 102 Oxford flowers. For comparison, real-valued networks with the same structure and a comparable number of parameters are also trained on the same datasets. Both quaternion and real-valued networks use a real-valued fully-connected layer with a softmax function, or a softmax layer, to classify the input images. The real-valued networks use ReLU as the activation function, while the quaternion ones apply ReLU to each imaginary part separately. All of these networks are trained with cross-entropy loss. Input data is augmented by shifting and flipping.
The shallow network for CIFAR-10 contains 2 convolution blocks, each with 2 convolution layers and a max-pooling layer, followed by 2 fully-connected layers. In this experiment, each layer of the real-valued CNN and the QCNN has the same number of filters, so the QCNN actually has more parameters. Both models are optimized using RMSProp with the learning rate set at 0.0001 and the learning rate decay set at 1e-6. Training ends at epoch 80.
The network for 102 Oxford flowers is VGG-S, which has 5 convolution layers, 3 pooling layers and 3 fully-connected layers. In this experiment, we test two QCNNs: one with the same number of filters as the real-valued network, and another with fewer filters so that it has a similar number of parameters. Models are optimized using Adam with the learning rate set at 0.0001. Training ends at epoch 50.
In Fig. 2, we can see that the performance of our QCNNs is consistently better than that of real-valued CNNs. For each dataset, the loss function of our QCNN converges more quickly than that of the real-valued CNN in the training phase and finally reaches a smaller loss. The classification accuracy on the testing set obtained by our QCNN is also better than that of the real-valued CNN, even at the very beginning of the training phase. Moreover, even if we reduce the number of the QCNN's parameters, the proposed QCNN model is still superior to the real-valued CNN of the same size. These observations support our earlier claims. Firstly, although a QCNN can have more parameters than a real-valued CNN, it suffers less from the risk of over-fitting because of the implicit regularizers imposed by quaternion computation. Secondly, the quaternion convolution achieves both the scaling and the rotation of inputs in color space, which preserves more discriminative information for color images; this information is beneficial for classifying color images, especially those in which the objects have obvious color attributes (e.g., the flowers in the 102 Oxford flower dataset). The quantitative experimental results are given in Table 1, which further demonstrate the superiority of our model.
|Model||Dataset||Test PSNR (dB)||Dataset||Test PSNR (dB)|
|Real-valued CNN||102 flowers||30.9792||subset of COCO||30.4900|
|Quaternion CNN||102 flowers||31.3176||subset of COCO||30.7256|
Besides high-level vision tasks such as image classification, the proposed QCNN also obtains improvements in low-level vision tasks. In fact, because our QCNN model obtains a more structural representation of color information, it is naturally suitable for extracting low-level features and replacing the bottom convolution layers of real-valued CNNs. To demonstrate this claim, we test our QCNN model on the color image denoising task. Inspired by encoder-decoder networks with symmetric skip connections for image restoration [33], we use a U-Net-like encoder-decoder structure with skip connections for denoising. The encoder contains two average-pooling layers, each following two convolution layers, then two convolution layers and a fully-connected layer. The decoder is symmetric to the encoder, containing up-sampling and transposed convolution layers. The layers before pooling and those after up-sampling are connected by shortcuts. A QCNN and a real-valued CNN with this structure are both built, and the QCNN has fewer filters in each layer so that its number of parameters is similar to that of the real-valued CNN. As in the classification networks, both networks use ReLU as the activation function except in the top layer, whose activation function is tanh. Both networks are trained with MSE loss.
We trained and tested these two models on two datasets: the 102 Oxford flower dataset and a subset of the COCO dataset. These two datasets are representative for our research: the flower dataset consists of colorful images and is used to demonstrate the superiority of our QCNN model conceptually, while the COCO subset is a more general set of natural images, containing both colorful and nearly colorless images, and is used to evaluate the performance of our model in practice.
In our experiments, both the training and the testing images are cut and resized to pixels with values normalized to . Then a salt and pepper noise which corrupts of pixels and a Gaussian noise with zero mean and
variance are added. The inputs of networks are corrupted images, and target outputs are original images. For both real-valued CNN and our QCNN model, the optimizer is Adam with 0.001 learning rate, and the batch size is 64 for the 102 Oxford flower data set and 32 for the COCO subset, respectively.
Table 2 shows quantitative comparisons between the real-valued CNN model and the proposed QCNN model. Our QCNN model consistently obtains higher PSNR values on both datasets. The changes of the loss function and of the PSNR on the testing set are given in Fig. 3 for the two datasets. Similar to the color image classification experiments, the loss function of our QCNN converges more quickly to a smaller value, and its PSNR on testing images becomes higher than that of the real-valued CNN after 100 epochs. Furthermore, Fig. 4 shows a visual comparison of the denoising results of the real-valued CNN and our QCNN. Our QCNN preserves more detailed structures in the image (e.g., the pattern on the plate) than the real-valued CNN does. Suffering from information loss during feature encoding, real-valued CNNs cannot perfectly preserve the details of color images, especially when the structure presents sharp color variations. Our QCNN, on the contrary, can avoid this information loss and learn more texture features even in the bottom layers, so it outputs images of higher fidelity. High-resolution visual comparisons can be found in the supplementary file.
Fig. 5: (a) Images for which QCNN achieves at least 0.5 dB higher PSNR than the real-valued CNN in the denoising task. (b) Images for which the real-valued CNN offers better results. (c) The quantile-quantile plot of saturation versus PSNR difference. (d) The quantile-quantile plot of the average angle between color vectors and the gray axis versus PSNR difference.
As mentioned above, our QCNN is motivated by color image representation. For images with little variety of colors, our QCNN degrades to a model similar to a real-valued CNN (as mentioned in section 3.1, for grayscale images QCNNs perform exactly the same as real-valued CNNs with the same number of filters), and thus obtains only comparable or slightly worse results in the denoising task, which is confirmed on the COCO subset.
In particular, according to the results shown above, the superiority of our QCNN on the COCO subset is not as significant as that on the 102 Oxford flower dataset. To further analyze this phenomenon, we select those COCO images for which our QCNN shows a great advantage, as well as those for which it shows no advantage in the denoising task, and compare them visually in Fig. 5. The images on which our QCNN performs better are often colorful, while the images on which it is inferior to the real-valued CNN are close to grayscale images.
To further investigate QCNN’s advantages, we use two metrics as quantitative descriptions of “colorful images”. The first metric is the mean saturation of a color image: a low mean saturation indicates that the image is similar to a grayscale image, while a high value implies that the image has high color saturation (i.e., many colorful parts). The second metric is the averaged angle between the pixels (color vectors) of a color image and the grayscale axis: the larger the averaged angle, the more colorful the image. Fig. 5(c) and Fig. 5(d) show the quantile-quantile plots of these two metrics, respectively, against the difference between the PSNR of our QCNN and that of the real-valued CNN. Both metrics are positively correlated with this PSNR difference, which means that our QCNN shows its dominant advantages over real-valued CNNs when the target images are colorful. Otherwise, its performance is almost the same as that of real-valued CNNs.
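The two colorfulness metrics can be computed per image as sketched below. The exact definitions behind Fig. 5 are not given in this section, so this is a sketch under assumptions: saturation is taken as the HSV saturation $(\max - \min)/\max$ of each pixel (0 for black pixels), and the angle to the gray axis is $\arccos\big((r+g+b)/(\sqrt{3}\,\|\mathbf{v}\|)\big)$ in radians, with black pixels counted as angle 0. Images are hypothetical lists of (r, g, b) tuples with values in $[0, 1]$.

```python
import math


def mean_saturation(pixels):
    """Mean HSV saturation (max - min) / max over all pixels; black pixels count as 0."""
    total = 0.0
    for p in pixels:
        hi, lo = max(p), min(p)
        total += (hi - lo) / hi if hi > 0 else 0.0
    return total / len(pixels)


def mean_gray_angle(pixels):
    """Mean angle (radians) between each color vector and the gray axis (1, 1, 1)/sqrt(3)."""
    total = 0.0
    for p in pixels:
        norm = math.sqrt(sum(x * x for x in p))
        if norm > 0:
            cos_angle = sum(p) / (math.sqrt(3) * norm)
            total += math.acos(max(-1.0, min(1.0, cos_angle)))  # clamp rounding error
    return total / len(pixels)


gray_image = [(0.3, 0.3, 0.3), (0.8, 0.8, 0.8)]
red_image = [(1.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
print(mean_saturation(gray_image), mean_gray_angle(gray_image))  # both near 0
print(mean_saturation(red_image), mean_gray_angle(red_image))    # 1.0 and about 0.955
```

Both metrics are 0 for a gray image and grow as pixels move away from the gray axis; for non-negative RGB values the angle is at most about 0.955 rad (a pure primary color).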
In this paper, we introduce QCNN, a quaternion-based neural network, which obtains better performance than traditional real-valued CNNs on both color image classification and color image denoising. A novel quaternion convolution operation is defined to represent color information in a more structural way. A series of quaternion-based layers are designed with good compatibility with existing real-valued networks and reasonable computational complexity. In summary, the proposed model is a valuable extension of neural network models to other number fields. In the future, we plan to explore more efficient algorithms for the learning of QCNNs. For example, as mentioned in section 4.2, the backpropagation of gradients in QCNNs can be represented by reverse rotations of color vectors with respect to the forward propagation of inputs. This property provides an opportunity to reduce the computation of backpropagation given the intermediate information of forward propagation, and accordingly to accelerate the learning of QCNNs. Additionally, we will extend our QCNN model to large-scale data and more applications.
This work was supported in part by National Science Foundation of China (61671298, U1611461, 61502301, 61521062), STCSM (17511105400, 17511105402, 18DZ2270700), China’s Thousand Youth Talents Plan, the 111 project B07022, the MoE Key Lab of Artificial Intelligence, AI Institute of Shanghai Jiao Tong University, and the SJTU-UCLA Joint Center for Machine Perception and Inference. The corresponding author of this paper is Yi Xu.
Bayro-Corrochano, E., Zhang, Y.: The motor extended kalman filter: A geometric approach for rigid motion estimation. Journal of Mathematical Imaging and Vision 13(3), 205–228 (2000)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In: Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. pp. 248–255. IEEE (2009)
Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. Journal of Machine Learning Research 9, 249–256 (2010)
Jones, C.F., Abbott, A.L., Chair, R.W., Conners, R.W., Ehrich, I., Jacobs, S., Midkiff: Color face recognition using quaternionic gabor filters (2003)