
FFCNet: Fourier Transform-Based Frequency Learning and Complex Convolutional Network for Colon Disease Classification

by   Kai-Ni Wang, et al.

Reliable automatic classification of colonoscopy images is of great significance in assessing the stage of colonic lesions and formulating appropriate treatment plans. However, it is challenging due to uneven brightness, location variability, inter-class similarity, and intra-class dissimilarity, all of which affect classification accuracy. To address these issues, we propose a Fourier-based Frequency Complex Network (FFCNet) for colon disease classification. Specifically, FFCNet is a novel complex network that combines complex convolutional networks with frequency learning to overcome the loss of phase information caused by real-valued convolution operations. Also, the Fourier transform transfers the average brightness of an image to a single point in the spectrum (the DC component), alleviating the effects of uneven brightness by decoupling image content and brightness. Moreover, the image patch scrambling module in FFCNet generates random local spectral blocks, empowering the network to learn long-range and local disease-specific features and improving the discriminative ability on hard samples. We evaluated the proposed FFCNet on an in-house dataset with 3568 colonoscopy images, showing that our method outperforms previous state-of-the-art methods with an accuracy of 86.35%, 4.46% higher than the backbone network.





1 Introduction

Accurate classification of early colon lesions plays an important role in diagnosis and treatment [25]. Colorectal cancer is usually diagnosed at an advanced stage because early clinical symptoms are insignificant, resulting in a high mortality rate [10, 5]. In clinical practice, colonoscopy is the most commonly used method to diagnose colorectal lesions [1, 17]. However, manual lesion classification is generally time-consuming and potentially subjective. Automated classification of colorectal lesions from colonoscopy images is therefore critical in clinical analysis because it: 1) helps physicians determine the type of colonic disease; 2) supports formulating the most appropriate treatment options; 3) shortens the duration of colonoscopy [12].

Figure 1: Frequency-domain learning has potential and limitations in colon disease classification tasks. a) Potential: Spectrum is brightness-insensitive and displacement-invariant. b) Limitations: Lack of image local information and phase information.

Existing research has achieved progress in classifying colon diseases [26, 2], but brightness imbalance and location variability remain intractable challenges (Fig. 1 a)). The unbalanced illumination from the endoscope probe induces apparent color and brightness differences even in normal images, degrading classification performance and complicating model generalization. Some studies rely on data augmentation to resolve brightness imbalance [15]. Wei et al. designed a color exchange operation that generates images of various colors to force the model to focus on target shape and structure [23]. However, the significant differences in brightness and color of endoscopic images prevent data augmentation from covering all distributions, resulting in poor performance in some distinct regions. On the other hand, the locational variability of intestinal lesions is reflected in their appearance in various regions of the lumen wall. As a result, it is difficult for the network to learn all positional changes on small datasets, and it is prone to overfitting [11]. Several works applied transfer learning to alleviate the location variability issue by acquiring features from large datasets [13, 22]. Nonetheless, differing feature distributions between the source and target domains in transfer learning can lead to insufficient adaptability during network training.

Frequency learning has great potential due to its brightness insensitivity and displacement invariance, which can improve the ability to discriminate colon diseases (Fig. 1 a)) [18, 24]. The DC component (a single value) in the spectrum represents the average brightness of the image [3]. This breaks the correlation between image content and brightness, prompting the model to focus more on target shape and structure. The magnitude map is displacement-invariant: it remains unchanged under spatial shifts of the time-domain image, which removes the influence of changes in lesion location. Moreover, the phase map provides contour and structural information about the image. However, the spectrum obtained by directly Fourier-transforming the whole image contains only global information, which limits the model's ability to learn local lesion information. Besides, phase information is lost because a real-valued network can learn only from the magnitude spectrum (Fig. 1 b)).
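These two properties are easy to verify numerically. The following NumPy sketch (an illustration of the general DFT facts invoked here, not code from the paper) checks that a constant brightness offset moves only the DC term and that the magnitude spectrum is invariant to circular spatial shifts:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((8, 8))          # stand-in for a grayscale image patch

# The DC component F[0, 0] of the 2D DFT is the sum of all pixels,
# i.e. N*M times the average brightness.
F = np.fft.fft2(img)
assert np.isclose(F[0, 0].real, img.sum())

# A constant brightness offset changes ONLY the DC term; the rest of
# the spectrum (the image "content") is untouched.
diff = np.fft.fft2(img + 0.5) - F
assert np.isclose(diff[0, 0].real, 0.5 * img.size)
off_dc = np.ones_like(img, dtype=bool)
off_dc[0, 0] = False
assert np.allclose(diff[off_dc], 0)

# The magnitude spectrum is invariant to circular spatial shifts:
# shifting the image changes only the phase of each coefficient.
shifted = np.roll(img, shift=(3, 5), axis=(0, 1))
assert np.allclose(np.abs(np.fft.fft2(shifted)), np.abs(F))
```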

In this study, we propose a novel frequency learning framework (FFCNet) for colon disease classification. Our work has the following contributions: 1) Our method is one of the first to study the automatic classification of colon diseases (normal, polyps, adenomas, cancers) in the whole process. This four-level classification helps colonoscopists to determine the type of lesions accurately and advance the clinical diagnosis of early colorectal cancer. 2) For the first time, we propose a framework that can be trained directly in the frequency domain by combining complex convolutional networks and frequency learning. The convolution kernels, blocks, and architectures in our complex network are modified into complex operations to enable direct learning of the spectrum with complex numbers, thus avoiding the loss of phase features caused by real network operations. Moreover, the spectral brightness insensitivity and displacement invariance have the ability to resolve the uneven brightness and positional variability of time-domain images. 3) We innovatively present an image patch scrambling module embedded in FFCNet to generate local spectrograms. Spectral blocks provide local features of lesions so that the model has stronger discriminative ability of inter-class similarity. Also, through random shuffling operations, spectral patches that appear at different locations in the image will reveal long-range information to the model. 4) The proposed method is competitive against well-known CNN architectures in experiments. This work has also sparked discussions on how classical CNN architectures can exploit spatial and frequency features in solving real-world problems to improve performance.

2 Methodology

FFCNet (Fig. 2) is composed of a patch scrambling module and a frequency-domain complex network. The patch scrambling module (Sect. 2.1) obtains a complex spectrogram by slicing the time-domain image, applying the Discrete Fourier Transform (DFT), and then scrambling, which effectively aggregates local information and improves the learning of non-local information. The frequency-domain complex network (Sect. 2.2) handles complex numbers within the original network architecture: replacing convolution, ReLU, and batch normalization (BN) with complex convolution, complex ReLU, and complex BN enables the network to process the complex spectrum and extract richer feature information.

Figure 2: FFCNet is an end-to-end architecture consisting of a patch scrambling algorithm and a frequency complex CNN. (a) An overview of the proposed method. (b) The patch scrambling module which is introduced in Sect.2.1. (c) The frequency-domain complex CNN which is introduced in Sect.2.2.

2.1 Patch Shuffling Module (PSM)

The proposed PSM, which transforms time-domain images into frequency-domain representations, consists of image dicing, DFT, and random shuffling. Without dicing, the network would learn only global information, since a single point in the frequency domain affects the entire spatial image, neglecting the local features necessary for colon classification. Hence, as shown in Fig. 2 b), we perform the DFT after slicing images to guide the network to focus on recognizable local features. On the other hand, the scrambled spectral blocks further improve the model's long-distance feature learning.

Given an input image $I$, we first uniformly partition the image into patches denoted by the matrix $R$, where $r_{i,j}$ denotes the image patch with horizontal index $i$ and vertical index $j$. After the dicing, each block is transformed to the frequency domain. For each image block $f_m$ of size $P \times P$, the DFT $F_m$ is computed according to the following expression:

$$F_m(u, v) = \sum_{x=0}^{P-1} \sum_{y=0}^{P-1} f_m(x, y)\, e^{-j 2\pi \left( \frac{ux}{P} + \frac{vy}{P} \right)}$$

Finally, the spectral patches are randomly shuffled with probability $p$. Since the neatly arranged spectrum has been corrupted, the classification network must find discriminative regions and identify small inter-class differences in order to recognize the randomly arranged spectral blocks.

Summarized advantages: Our PSM compensates for both local and long-range features of the image while preserving the advantages of the frequency domain. In addition, the local spectrogram has a smaller numerical distribution range than the original spectrogram, which improves the convergence degree of the gradient and speeds up the training of the model.
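The steps above (dicing, per-block DFT, probabilistic shuffling) can be sketched in PyTorch as follows. This is our own minimal reconstruction under assumptions: the function name, the 4×4 default grid, and the whole-grid permutation are illustrative, not the authors' implementation.

```python
import torch

def patch_scramble(img, patch=4, p=0.3, generator=None):
    """Dice `img` (C, H, W) into a patch x patch grid, apply a 2D DFT to
    each block, then (with probability p) randomly permute the blocks.
    Illustrative sketch of PSM, not the authors' code."""
    C, H, W = img.shape
    ph, pw = H // patch, W // patch
    blocks = img.unfold(1, ph, ph).unfold(2, pw, pw)       # (C, patch, patch, ph, pw)
    blocks = blocks.permute(1, 2, 0, 3, 4).reshape(-1, C, ph, pw)
    spectra = torch.fft.fft2(blocks)                       # per-block complex spectrum
    if torch.rand(1, generator=generator).item() < p:
        perm = torch.randperm(spectra.shape[0], generator=generator)
        spectra = spectra[perm]                            # shuffle spectral blocks
    grid = spectra.reshape(patch, patch, C, ph, pw).permute(2, 0, 3, 1, 4)
    return grid.reshape(C, H, W)                           # reassembled complex spectrogram

x = torch.randn(3, 64, 64)
spec = patch_scramble(x)
assert spec.shape == (3, 64, 64) and spec.is_complex()
```

At test time the paper disables scrambling, which in this sketch corresponds to calling it with p=0.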

2.2 Frequency-domain complex network

The proposed frequency learning network can directly learn complex-valued spectrograms through complex operations during training. The backbone of the complex network adopts ResNet [7], with the internal operations replaced by complex sub-components (complex convolution, complex ReLU, complex batch normalization). Each residual block consists of two 3×3 convolutional layers and one skip connection. The proposed network thus combines the advantages of frequency-domain features with the rich expressive power of complex operations.

Complex convolution. To perform an operation equivalent to traditional real-valued 2D convolution in the complex domain, the real part and the imaginary part of the complex matrix in the spectrogram are fed into the network separately. Meanwhile, two sets of convolution kernels $a$ and $b$ simulate the real and imaginary parts of the complex convolution kernel $a + jb$. For a complex feature map $c + jd$, complex convolution can be expressed as:

$$(a + jb) * (c + jd) = (a * c - b * d) + j(b * c + a * d)$$

where $a, b, c, d$ are all real-valued.
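Following the expression above, complex convolution can be realised with two ordinary real-valued convolutions, one for the real kernel $a$ and one for the imaginary kernel $b$. A minimal PyTorch sketch under our own naming (bias terms omitted for clarity; not the authors' code):

```python
import torch
import torch.nn as nn

class ComplexConv2d(nn.Module):
    """Complex conv via two real convs: for kernel a + jb and input
    c + jd, (a + jb) * (c + jd) = (a*c - b*d) + j(b*c + a*d)."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.A = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)  # real kernel a
        self.B = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)  # imaginary kernel b

    def forward(self, z):                             # z: complex (N, C, H, W)
        c, d = z.real, z.imag
        return torch.complex(self.A(c) - self.B(d),   # real part: a*c - b*d
                             self.B(c) + self.A(d))   # imag part: b*c + a*d

z = torch.complex(torch.randn(1, 3, 8, 8), torch.randn(1, 3, 8, 8))
out = ComplexConv2d(3, 16)(z)
assert out.shape == (1, 16, 8, 8) and out.is_complex()
```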

Complex ReLU. The neural network relies on the ReLU function to introduce nonlinearity and promote sparsity. ReLU sets all negative values in a matrix to zero and leaves the remaining values unchanged. The complex ReLU applies ReLU to the real and imaginary parts separately and sums the results; it satisfies the Cauchy-Riemann equations when both the real and imaginary parts are strictly positive or strictly negative [21]. The specific formula is as follows:

$$\mathbb{C}\mathrm{ReLU}(z) = \mathrm{ReLU}(\Re(z)) + j\,\mathrm{ReLU}(\Im(z))$$
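This operation is a one-liner in PyTorch (our illustration, not the authors' code):

```python
import torch

def complex_relu(z):
    """Apply ReLU to the real and imaginary parts independently
    and recombine into a complex tensor."""
    return torch.complex(torch.relu(z.real), torch.relu(z.imag))

z = torch.tensor([-1 + 2j, 3 - 4j])
# Negative real/imaginary components are zeroed independently.
assert torch.equal(complex_relu(z), torch.tensor([0 + 2j, 3 + 0j]))
```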
Complex BN. BN is often employed to accelerate learning in neural networks by pulling the distribution of each layer's inputs back to a standard normal distribution with mean 0 and variance 1. For complex values, simply translating and scaling so that the mean is 0 and the variance is 1 is not sufficient: it does not ensure that the variances of the real and imaginary parts are equal, so the resulting distribution can be elliptical, possibly with high eccentricity. Hence, we treat each complex value as a two-dimensional vector and whiten the data distribution. Given a batch input $x$, the normalized $\tilde{x}$ is expressed as:

$$\tilde{x} = V^{-\frac{1}{2}}\left(x - \mathbb{E}[x]\right)$$

where $\mathbb{E}[x]$ is the mean of the data and $V$ is the $2 \times 2$ covariance matrix

$$V = \begin{pmatrix} V_{rr} & V_{ri} \\ V_{ir} & V_{ii} \end{pmatrix},$$

whose entries are the covariances of the real and imaginary parts of $x$. Similar to real-valued normalization, learnable reconstruction parameters $\gamma$ and $\beta$ are introduced to restore the feature distribution to be learned by the network. The difference is that the shift parameter $\beta$ is a complex parameter with two learnable components (real and imaginary), while the scaling parameter $\gamma$ is a $2 \times 2$ positive semi-definite matrix with only three degrees of freedom, i.e. three learnable components. The complex BN is defined as:

$$\mathrm{BN}(\tilde{x}) = \gamma \tilde{x} + \beta$$
Summarized advantages: Our elaborate complex CNN implements full frequency-domain analysis, learning both magnitude and phase information in a unified architecture. Moreover, complex convolution, ReLU, and BN retain the easier optimization of complex representations and further improve the expressive power of frequency features.
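The whitening step above can be sketched by treating each activation as a 2-vector and multiplying by $V^{-1/2}$. For brevity this version normalises over the whole batch rather than per channel and omits the learnable $\gamma$ and $\beta$; it is an assumption-laden illustration, not the authors' implementation.

```python
import torch

def complex_batchnorm(z, eps=1e-5):
    """Whiten complex activations: subtract the mean of (Re, Im) pairs,
    then multiply by the inverse square root of their 2x2 covariance."""
    x = torch.stack([z.real, z.imag], dim=-1).reshape(-1, 2)   # (N, 2)
    mu = x.mean(dim=0)
    xc = x - mu
    V = xc.T @ xc / xc.shape[0] + eps * torch.eye(2)           # 2x2 covariance
    # Inverse square root of a 2x2 SPD matrix via eigendecomposition.
    w, Q = torch.linalg.eigh(V)
    V_inv_sqrt = Q @ torch.diag(w.rsqrt()) @ Q.T
    y = xc @ V_inv_sqrt.T
    return torch.complex(y[:, 0], y[:, 1]).reshape(z.shape)

z = torch.complex(torch.randn(4, 8, 8) * 3 + 1, torch.randn(4, 8, 8) * 0.5)
out = complex_batchnorm(z)
# After whitening, real and imaginary parts have equal unit variance
# and zero correlation, i.e. a circular rather than elliptical spread.
```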

3 Experiments and Results

Experiment protocol. 1) Datasets. The study included 3568 standard white-light endoscopic images: 865 normal, 843 polyps, 896 adenomas, and 964 cancers. We randomly split the dataset into training, validation, and testing sets in a 6:2:2 ratio. 2) Settings. We use PyTorch 1.1 [14] to perform all experiments on NVIDIA TITAN X (Pascal) GPU machines and evaluate our proposed method on the widely used classification backbone ResNet-18. Input images are resized to a fixed size of 400×400. Random horizontal and vertical flips were applied for data augmentation. During training, all networks were trained with SGD optimizers with a learning rate of 0.1 and a mini-batch size of 32 for 600 epochs. The probability $p$ of patch shuffling was set to 0.3. During testing, data augmentation and patch scrambling were disabled; the diced spectrum of the original image was fed into the complex classification network for the final prediction.

3) Evaluation metrics. We evaluate the classification performance using four metrics: Accuracy, Precision, Recall, and F1-score. More details are in our Supplementary Material.

Method             Accuracy  Precision  Recall  F1-score
ResNet [7]         81.89     81.96      81.89   81.91
MobileNet [8]      81.11     81.14      81.11   81.07
EfficientNet [20]  84.44     84.76      84.44   84.55
DenseNet [9]       84.86     84.86      84.86   84.80
GoogLeNet [19]     85.42     85.72      85.42   85.52
CoAtNet [4]        85.93     86.08      85.93   85.94
Fast [3]           81.57     81.86      81.57   81.60
GFNet [16]         84.81     84.92      84.81   84.86
K-Space [6]        83.28     83.42      83.28   83.24
FFCNet             86.35     86.61      86.35   86.44
Table 1: FFCNet yields higher performance than classical classification methods on every metric (%).

Comparative experiments show the superiority of our FFCNet: the comparison of our network with several classical classification methods shows great potential for application in colonoscopy classification scenarios. Compared with other methods (Table 1), FFCNet has the highest accuracy (86.35%), precision (86.61%), recall (86.35%), and F1-score (86.44%). In particular, the accuracy of FFCNet is 4.46% higher than that of the backbone network ResNet, indicating that frequency features provide a significant improvement to the architecture.

Figure 3: The results of the confusion matrix illustrate that FFCNet obtains excellent performance in all categories. The numbers in the confusion matrix represent the percentage of predicted classes.

Comparisons with frequency, complex, and transformer networks also indicate the superiority of our FFCNet: a joint CNN and transformer network (CoAtNet, 85.93%), a CNN with added frequency modules (Fast, 81.57%), a transformer with added frequency modules (GFNet, 84.81%), and another complex network (K-Space, 83.28%). FFCNet outperforms them all without requiring pre-training, indicating that the designed model is more efficient.

The results of the confusion matrix demonstrate the superiority of FFCNet in discriminating inter-class similarity. Our network obtains excellent performance in all categories (Fig. 3). Moreover, the network achieves the highest accuracy on the two hardest-to-distinguish categories, polyps and adenomas. This finding is attributed to our PSM guiding the network to acquire disease-specific information by learning local and long-range features.

Ablation experiments demonstrate the contribution of the proposed modules: Fig. 4 a) quantitatively shows the average accuracy of the ablation experiments, demonstrating that each submodule contributes to the performance improvement. We first build a baseline model using magnitude maps with ResNet and gradually incorporate each of the submodules discussed in Section 2 into the baseline. Adding PSM alone supplements local information, improving accuracy over both the baseline and the baseline with the complex network. Incorporating the proposed complex network into the baseline also improves accuracy, verifying the ability of complex networks to mine phase information. The last two columns validate the importance of random shuffling, which significantly improves classification accuracy by learning long-range information.

Figure 4: Ablation experiments and hyperparameter experiments on the test set illustrate the necessity of submodules and the influence of parameters in the network, respectively. In (a), 'C' and 'PSM' represent the complex network and the patch scrambling module, respectively; 'PM' is the PSM without scrambling.

Hyperparameter experiments analyze the superiority of the architecture: We evaluate the effect of the number of patches and the random scramble probability on the network (Fig. 4 b) and c)). The slicing operation brings in local image information, so network accuracy gradually increases as the number of slices grows. However, if the image patches are too small, the advantages of frequency-domain features are destroyed and model performance drops. Likewise, random shuffling allows the network to learn long-distance information, yet it also increases the difficulty of learning. Therefore, we shuffle the image with a fixed probability to stabilize network performance.

4 Conclusion

In this study, we propose a novel method to advance colon disease classification in colonoscopy images from a frequency domain perspective. The proposed framework, FFCNet, introduces complex convolution operations, enabling the network to directly operate on complex spectra to obtain rich texture features and eliminate the influence of brightness imbalance. Furthermore, the patch scrambling algorithm we developed preprocesses the spectrogram so that the network can learn both long-range and local information. Finally, we compare the performance of the framework with other methods. The results show that our frequency-domain complex number framework is competitive with time-domain models in diagnosing colon diseases.

Acknowledgements

This work was supported by the National Key R&D Program Project (2018YFA0704102).


  • [1] Bibbins-Domingo, K., Grossman, D.C., Curry, S.J., Davidson, K.W., Epling, J.W., García, F.A., Gillman, M.W., Harper, D.M., Kemper, A.R., Krist, A.H., et al.: Screening for colorectal cancer: US Preventive Services Task Force recommendation statement. JAMA 315(23), 2564–2575 (2016)
  • [2] Carneiro, G., Pu, L.Z.C.T., Singh, R., Burt, A.: Deep learning uncertainty and confidence calibration for the five-class polyp classification from colonoscopy. Medical Image Analysis 62, 101653 (2020)
  • [3] Chi, L., Jiang, B., Mu, Y.: Fast fourier convolution. Advances in Neural Information Processing Systems 33, 4479–4488 (2020)
  • [4] Dai, Z., Liu, H., Le, Q.V., Tan, M.: Coatnet: Marrying convolution and attention for all data sizes. Advances in Neural Information Processing Systems 34, 3965–3977 (2021)
  • [5] Elbediwy, A., Vincent-Mistiaen, Z.I., Spencer-Dene, B., Stone, R.K., Boeing, S., Wculek, S.K., Cordero, J., Tan, E.H., Ridgway, R., Brunton, V.G., et al.: Integrin signalling regulates yap and taz to control skin homeostasis. Development 143(10), 1674–1687 (2016)
  • [6] Han, Y., Sunwoo, L., Ye, J.C.: k-space deep learning for accelerated mri. IEEE transactions on medical imaging 39(2), 377–386 (2019)
  • [7] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 770–778 (2016)
  • [8] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
  • [9] Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 4700–4708 (2017)
  • [10] Ladabaum, U., Dominitz, J.A., Kahi, C., Schoen, R.E.: Strategies for colorectal cancer screening. Gastroenterology 158(2), 418–432 (2020)
  • [11] Liu, X., Guo, X., Liu, Y., Yuan, Y.: Consolidated domain adaptive detection and localization framework for cross-device colonoscopic images. Medical Image Analysis 71, 102052 (2021)
  • [12] Mármol, I., Sánchez-de-Diego, C., Pradilla Dieste, A., Cerrada, E., Rodriguez Yoldi, M.J.: Colorectal carcinoma: a general overview and future perspectives in colorectal cancer. International journal of molecular sciences 18(1), 197 (2017)
  • [13] Misawa, M., Kudo, S.e., Mori, Y., Cho, T., Kataoka, S., Yamauchi, A., Ogawa, Y., Maeda, Y., Takeda, K., Ichimasa, K., et al.: Artificial intelligence-assisted polyp detection for colonoscopy: initial experience. Gastroenterology 154(8), 2027–2029 (2018)
  • [14] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al.: Pytorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems 32 (2019)
  • [15] Qadir, H.A., Shin, Y., Solhusvik, J., Bergsland, J., Aabakken, L., Balasingham, I.: Toward real-time polyp detection using fully cnns for 2d gaussian shapes prediction. Medical Image Analysis 68, 101897 (2021)
  • [16] Rao, Y., Zhao, W., Zhu, Z., Lu, J., Zhou, J.: Global filter networks for image classification. Advances in Neural Information Processing Systems 34, 980–993 (2021)
  • [17] Rex, D.K., Boland, C.R., Dominitz, J.A., Giardiello, F.M., Johnson, D.A., Kaltenbach, T., Levin, T.R., Lieberman, D., Robertson, D.J.: Colorectal cancer screening: recommendations for physicians and patients from the us multi-society task force on colorectal cancer. Gastroenterology 153(1), 307–323 (2017)
  • [18] Stuchi, J.A., Boccato, L., Attux, R.: Frequency learning for image classification. CoRR abs/2006.15476 (2020)
  • [19] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 1–9 (2015)
  • [20] Tan, M., Le, Q.: Efficientnet: Rethinking model scaling for convolutional neural networks. In: International conference on machine learning. pp. 6105–6114. PMLR (2019)
  • [21] Trabelsi, C., Bilaniuk, O., Serdyuk, D., Subramanian, S., Santos, J.F., Mehri, S., Rostamzadeh, N., Bengio, Y., Pal, C.J.: Deep complex networks. CoRR abs/1705.09792 (2017)
  • [22] Wang, Y., Feng, Z., Song, L., Liu, X., Liu, S.: Multiclassification of endoscopic colonoscopy images based on deep transfer learning. Computational and Mathematical Methods in Medicine 2021 (2021)
  • [23] Wei, J., Hu, Y., Zhang, R., Li, Z., Zhou, S.K., Cui, S.: Shallow attention network for polyp segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 699–708. Springer (2021)
  • [24] Xu, K., Qin, M., Sun, F., Wang, Y., Chen, Y.K., Ren, F.: Learning in the frequency domain. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1740–1749 (2020)
  • [25] Zhang, R., Zheng, Y., Mak, T.W.C., Yu, R., Wong, S.H., Lau, J.Y., Poon, C.C.: Automatic detection and classification of colorectal polyps by transferring low-level cnn features from nonmedical domain. IEEE Journal of Biomedical and Health Informatics 21(1), 41–47 (2016)
  • [26] Zhang, R., Zheng, Y., Poon, C.C., Shen, D., Lau, J.Y.: Polyp detection during colonoscopy using a regression-based convolutional neural network with a tracker. Pattern recognition 83, 209–219 (2018)