I Introduction
Road damage in the form of cracks may reduce road performance and pose potential safety hazards [1]. Every year, government bodies across the globe allocate funds to enhance the quality of their road networks [2]. Road safety should be taken very seriously, and authorities are fully aware of the need for suitable road inspection and maintenance techniques [3]. Crack detection is an essential part of road maintenance systems, and it has attracted growing interest from researchers over the past few years [4]. Traditional manual road crack detection approaches are known to be time-consuming, dangerous, labor-intensive and subjective [5]. Therefore, they have been gradually replaced by automated crack detection systems, which provide fast and reliable analysis in intelligent transportation systems (ITS) [6]. Automated crack detection systems can effectively assess the quality of road surfaces and help governments plan and prioritize the maintenance of the road network, thereby keeping roads in good condition and extending their service life [7].
With the development of image analysis techniques, road crack detection and recognition have been widely investigated over the past few decades [8, 9, 10, 11]. The traditional framework for crack detection consists of defining a variety of gradient features for each image pixel using gradient filters, such as Sobel [12, 13], and then using a binary classifier to determine whether an image pixel is part of a crack region or not [8]. In early methods, such as [14] and [15], the authors used threshold-based approaches to find crack regions, based on the assumption that a pixel lying in a crack area is consistently darker than the others [7, 16]. Furthermore, many researchers [10, 11, 17, 18] tried to suppress the interference of noise by considering additional local features, such as the mean and the standard deviation of an image region. However, these methods are still very sensitive to noise because only the brightness features are taken into consideration.
In recent years, some novel algorithms, such as minimal path selection (MPS) [19, 20, 21], minimum spanning tree (MST) [22, 23] and crack fundamental element (CFE) [24, 25], have been proposed to improve the existing crack detection approaches. In addition, Hu and Zhao [26] proposed a crack detection algorithm based on local binary patterns (LBP), whereas the authors of [27] utilized a Gabor filter for the same purpose. In [22], an automated crack detection algorithm based on a tree structure, referred to as CrackTree, was introduced. Moreover, Oliveira et al. [7, 28] utilized a comprehensive set of image analysis algorithms to detect and characterize cracks in road pavements. Although the above-mentioned algorithms have been widely used in crack detection and perform well on high-quality datasets [22, 28, 29], they are not accurate enough to distinguish cracks from the complex background in low-quality images.
Furthermore, some machine learning-based crack detection approaches [30, 31, 32, 33, 34, 35] have been proposed in recent years, and the features produced by neural networks are very likely to replace the local features utilized in traditional methods [36]. For example, restricted Boltzmann machines (RBMs), autoencoders and their variants are capable of detecting cracks when the training samples are limited [37]. In addition, deep convolutional neural networks (DCNNs) are popular for feature learning and supervised classification [38]. Zhang et al. [36] trained a neural network to determine whether the patches in an image contain cracks or not. Hence, in this paper, we build on the recent successful application of deep neural networks to image classification and train a convolutional neural network (CNN) to find the images that contain cracks. Then, we present a novel thresholding method to extract cracks from the classified color images.
The remainder of this paper is structured as follows: Section II introduces the proposed crack detection algorithm. In Section III, we present our experimental results and discuss the performance of the proposed method. Finally, Section IV concludes the paper and provides some recommendations for future work.
II Methodology
The proposed crack detection method consists of two steps: image classification and image segmentation. For notational convenience, images showing the presence and absence of cracks are referred to as positive and negative images, respectively. Firstly, an image is classified as either positive or negative using a deep convolutional neural network. The positive images are then processed using an adaptive thresholding method. The cracks in the positive images can therefore be extracted. The rest of this section gives a detailed description of these two steps.
II-A Image Classification
The structure of the proposed deep convolutional neural network is shown in Fig. 1, where ReLU represents a rectified linear unit, which is the most popular activation function for deep neural networks because it outperforms both the sigmoid function and the hyperbolic tangent function in terms of training and evaluation [38]. A CNN is generally considered as a hierarchical feature extractor [36]. A convolutional layer performs a convolution operation on the image input and passes the extracted features to the next layer [38]. Batch normalization is then performed on the output of the convolutional layer, whereby the extracted features are normalized by adjusting and scaling the activations [39]. The structure of the green block in Fig. 1 is shown in Fig. 2. Max pooling downsamples the input representations [38], whereas the softmax function translates a vector into a probability distribution. Finally, a fully connected layer computes the score of each class and infers the category of the input image [36]. Therefore, the proposed network is also referred to as a fully connected network (FCN). More details on the training process are provided in Section III.
II-B Image Segmentation
Since the images have already been classified using our proposed deep neural network, only the positive images are considered for processing in this subsection. Before performing image segmentation, we first utilize a bilateral filter [40, 41] to smooth the input images. The bilateral filter outperforms other image filters in terms of edge preservation [40]. A general expression for bilateral filtering is as follows:
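$$i^{bf}(x,y)=\frac{\sum_{(u,v)\in\mathcal{W}}\omega_d(u,v)\,\omega_r(u,v)\,i(u,v)}{\sum_{(u,v)\in\mathcal{W}}\omega_d(u,v)\,\omega_r(u,v)} \tag{1}$$
where
$$\omega_d(u,v)=\exp\left\{-\frac{(u-x)^2+(v-y)^2}{\sigma_1^2}\right\} \tag{2}$$
$$\omega_r(u,v)=\exp\left\{-\frac{\left(i(u,v)-i(x,y)\right)^2}{\sigma_2^2}\right\} \tag{3}$$
$i(x,y)$ represents the intensity of a pixel at $(x,y)$ in the input image. $i^{bf}(x,y)$ denotes the intensity of a pixel at $(x,y)$ in the filtered image. $\omega_d$ and $\omega_r$ are based on spatial distance and color similarity, respectively. Their values are controlled by two parameters $\sigma_1$ and $\sigma_2$, respectively. In our experiments, the values of $\sigma_1$ and $\sigma_2$ are set to 300 and 0.1, respectively. $\rho$, which controls the size of the filter window $\mathcal{W}$ centered at $(x,y)$, is set to 5. The filtered image is shown in Fig. 3.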
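The filtering step described above can be sketched in a few lines of NumPy. This is only an illustrative brute-force version, not the authors' implementation: the function name `bilateral_filter`, the parameter names `rho`, `sigma_1` and `sigma_2`, the reading of the window parameter as a half-width, the Gaussian form of both weights (without a factor of 2 in the exponent) and the edge-replicating border handling are all assumptions.

```python
import numpy as np

def bilateral_filter(img, rho=5, sigma_1=300.0, sigma_2=0.1):
    """Brute-force bilateral filter for a 2D image with intensities in [0, 1].

    Assumptions (not confirmed by the paper): rho is the window half-width,
    and both weights are Gaussians of the form exp(-d^2 / sigma^2).
    """
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    padded = np.pad(img, rho, mode="edge")  # replicate borders (assumption)

    # Spatial weights depend only on the offset, so compute them once.
    offsets = np.arange(-rho, rho + 1)
    dy, dx = np.meshgrid(offsets, offsets, indexing="ij")
    w_d = np.exp(-(dx ** 2 + dy ** 2) / sigma_1 ** 2)

    out = np.empty_like(img)
    for y in range(h):
        for x in range(w):
            window = padded[y:y + 2 * rho + 1, x:x + 2 * rho + 1]
            # Range weights penalize intensity differences to the center pixel.
            w_r = np.exp(-((window - img[y, x]) ** 2) / sigma_2 ** 2)
            weights = w_d * w_r
            out[y, x] = np.sum(weights * window) / np.sum(weights)
    return out
```

With a small range parameter such as 0.1, pixels on the other side of a strong edge receive a weight close to zero, which is why a sharp crack boundary survives the smoothing. The double loop is slow on full-size images and is meant for reading rather than for production use.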
To further reduce noise in the filtered image, the latter is downsampled as shown in Fig. 4. The downsampled image is approximately nine times smaller than the original filtered image, and it is used to determine the threshold for image segmentation. Note that the pixel intensities in the downsampled image are normalized.
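As a rough illustration of this step, the sketch below shrinks an image by a factor of three along each axis, which gives roughly nine times fewer pixels, and rescales the result to [0, 1]. The paper does not state how the downsampling or the normalization is done, so the block averaging, the min-max normalization and the function name `downsample_and_normalize` are assumptions.

```python
import numpy as np

def downsample_and_normalize(img, factor=3):
    """Average non-overlapping factor x factor blocks, then rescale to [0, 1].

    Block averaging and min-max normalization are assumptions; the paper only
    states that the image becomes about nine times smaller and is normalized.
    """
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    # Crop so that the blocks tile the image exactly.
    h_c, w_c = h - h % factor, w - w % factor
    blocks = img[:h_c, :w_c].reshape(h_c // factor, factor, w_c // factor, factor)
    small = blocks.mean(axis=(1, 3))

    lo, hi = small.min(), small.max()
    if hi == lo:  # constant image: avoid division by zero
        return np.zeros_like(small)
    return (small - lo) / (hi - lo)
```

For a 227×227 input, this sketch crops to 225×225 and returns a 75×75 image.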
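The proposed thresholding method hypothesizes that the downsampled image is composed of two parts, foreground (cracks) and background (road surface), and that they can be separated using one threshold $T$. Furthermore, we assume that a pixel lying in a crack area is consistently darker than the others. To find the best threshold $T$, we formulate the thresholding problem as a 2D vector quantization problem, where each pixel $\mathbf{p}$ and its neighborhood system $\mathcal{N}_{\mathbf{p}}$ provide a vector $\mathbf{g}=[i_{\mathbf{p}}, m_{\mathbf{p}}]^\top$, where $i_{\mathbf{p}}$ represents the intensity of $\mathbf{p}$, and $m_{\mathbf{p}}$ denotes the mean intensity of $\mathcal{N}_{\mathbf{p}}$. The vectors are stored in a 2D histogram, as shown in Fig. 5. The relationship between $i_{\mathbf{p}}$ and $m_{\mathbf{p}}$ is as follows:
$$m_{\mathbf{p}}=\frac{1}{|\mathcal{N}_{\mathbf{p}}|}\sum_{\mathbf{q}\in\mathcal{N}_{\mathbf{p}}} i_{\mathbf{q}} \tag{4}$$
where $\kappa$ dictates the size of the neighborhood system $\mathcal{N}_{\mathbf{p}}$. The threshold can be determined by partitioning the vectors into two clusters $\{\mathcal{S}_1, \mathcal{S}_2\}$, where $\mathcal{S}_1$ and $\mathcal{S}_2$ correspond to the foreground and background, respectively.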
According to the Markov random field (MRF) model [42], the intensity of a pixel which is not located near the boundary between foreground and background is similar to those of its neighbors in all directions. Hence, we search for the threshold along the principal diagonal of the 2D histogram using $k$-means clustering [43]. Given a threshold $T$, the 2D histogram can be divided into four regions, as shown in Fig. 5. Regions 1 and 2 store the vectors of the foreground and background, respectively, whereas regions 3 and 4 store the noisy vectors. In our method, the vectors in regions 3 and 4 are excluded from the clustering process. The best threshold is computed by minimizing the within-cluster sum of squares as follows:
$$\hat{T}=\underset{T}{\arg\min}\sum_{k=1}^{2}\sum_{\mathbf{g}\in\mathcal{S}_k}\left\|\mathbf{g}-\boldsymbol{\mu}_k\right\|_2^2 \tag{5}$$
where $\boldsymbol{\mu}_k$ denotes the mean of the vectors in cluster $\mathcal{S}_k$ ($k=1$ for the foreground and $k=2$ for the background). The within-cluster sum of squares with respect to different values of $T$ is shown in Fig. 5, and the corresponding segmentation result is shown in Fig. 3, where the crack and road surface are in white and black, respectively. The performance of the crack detection algorithm is evaluated in Section III.
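The threshold search described in this subsection can be sketched as follows. This is a simplified reading rather than the authors' code: the names `neighborhood_mean`, `find_threshold` and `kappa`, the square (2·kappa+1)×(2·kappa+1) neighborhood with the center pixel excluded, the uniform grid of candidate thresholds and the rule of skipping candidates that leave a cluster empty are all assumptions.

```python
import numpy as np

def neighborhood_mean(img, kappa=1):
    """Mean intensity of the neighbors of each pixel, center pixel excluded."""
    img = np.asarray(img, dtype=np.float64)
    size = 2 * kappa + 1
    h, w = img.shape
    padded = np.pad(img, kappa, mode="edge")
    total = np.zeros((h, w), dtype=np.float64)
    for dy in range(size):
        for dx in range(size):
            total += padded[dy:dy + h, dx:dx + w]
    return (total - img) / (size * size - 1)

def find_threshold(img, kappa=1, n_candidates=256):
    """Search the diagonal of the (intensity, neighborhood mean) plane."""
    img = np.asarray(img, dtype=np.float64)
    g = np.stack([img.ravel(), neighborhood_mean(img, kappa).ravel()], axis=1)

    best_t, best_cost = None, np.inf
    for t in np.linspace(0.0, 1.0, n_candidates):
        # Region 1: dark pixel in a dark neighborhood (foreground).
        fg = g[(g[:, 0] <= t) & (g[:, 1] <= t)]
        # Region 2: bright pixel in a bright neighborhood (background).
        bg = g[(g[:, 0] > t) & (g[:, 1] > t)]
        # Vectors in the two off-diagonal regions are treated as noise.
        if len(fg) == 0 or len(bg) == 0:
            continue
        # Within-cluster sum of squares of the two retained clusters.
        cost = sum(np.sum((s - s.mean(axis=0)) ** 2) for s in (fg, bg))
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t
```

Given a normalized image `img`, a crack mask would then be obtained as `img <= find_threshold(img)`, following the assumption that crack pixels are darker than the road surface. Because the off-diagonal vectors are dropped from the cost, a practical version may need extra safeguards on noisy images; none are included here.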
III Experimental Results
In our experiments, the proposed deep neural network is trained on an NVIDIA GTX 1080 Ti GPU (https://www.nvidia.com/enus/geforce/products/10series/geforcegtx1080ti/), which has 3584 CUDA cores and 11 GB of GDDR5X memory. The GPU memory bandwidth is 484 GB/s. The training is implemented in MATLAB R2018b. Our trained neural network is publicly available at: https://github.com/ruirangerfan/road_crack_detection_net.git.
The dataset (http://dx.doi.org/10.17632/5y9wdsg2zt.1) utilized for training the proposed network was created by researchers from Middle East Technical University. It contains 40000 RGB images (resolution: 227×227), of which 20000 are positive and 20000 are negative.
In our experiments, we randomly select 15000 positive images and 15000 negative images from the dataset to train the neural network. The remaining 10000 images are utilized to evaluate the performance of the proposed approach. The initial learning rate, the maximum number of epochs and the validation frequency are set to 0.01, 16 and 60, respectively. Stochastic gradient descent with momentum (SGDM) is utilized as the optimizer, and the value of momentum is set to 0.9.
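To make the setup above concrete, the sketch below shows a per-class random split and a single SGDM update using the learning rate and momentum quoted in the text. The training itself was done in MATLAB, so this Python version is only an illustration; the helper names `split_indices` and `sgdm_step`, the random seed and the toy quadratic objective are hypothetical.

```python
import numpy as np

def split_indices(n_images=20000, n_train=15000, seed=0):
    """Randomly split the image indices of one class into train and test sets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_images)
    return perm[:n_train], perm[n_train:]

def sgdm_step(weights, grad, velocity, lr=0.01, momentum=0.9):
    """One SGDM update: reuse part of the previous step, then follow the gradient."""
    velocity = momentum * velocity - lr * grad
    return weights + velocity, velocity

# Applied once per class, this gives 15000 training and 5000 test images each.
train_idx, test_idx = split_indices()

# Toy objective (hypothetical): minimize 0.5 * ||w||^2, whose gradient is w.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(500):
    w, v = sgdm_step(w, grad=w, velocity=v)
```

The momentum term carries over 90% of the previous update, which smooths the noisy mini-batch gradients that plain stochastic gradient descent would follow directly.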
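To quantify the accuracy of our proposed image classifier, we compute $n_{tp}$, $n_{fp}$, $n_{fn}$ and $n_{tn}$, which represent the number of testing images that are true positive, false positive, false negative and true negative, respectively. The precision, recall, accuracy and F-measure can be computed using the following equations:
$$\text{precision}=\frac{n_{tp}}{n_{tp}+n_{fp}} \tag{6}$$
$$\text{recall}=\frac{n_{tp}}{n_{tp}+n_{fn}} \tag{7}$$
$$\text{accuracy}=\frac{n_{tp}+n_{tn}}{n_{tp}+n_{tn}+n_{fp}+n_{fn}} \tag{8}$$
$$\text{F-measure}=2\times\frac{\text{precision}\times\text{recall}}{\text{precision}+\text{recall}} \tag{9}$$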
The values of precision, recall, accuracy and F-measure achieved using the proposed network are all 99.92% (the numbers of false positive images and false negative images are both four). The true positive and true negative examples are shown in Fig. 6, while the false positive and false negative results are shown in Fig. 7. The image classification takes about 4.8 ms on an Intel Core i7-8700K CPU using a single core (3.7 GHz).
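The four scores can be reproduced from the counts given in the text. The sketch below assumes the test set holds the 5000 positive and 5000 negative images left over after training, so four false positives and four false negatives leave 4996 true positives and 4996 true negatives; the function name `classification_metrics` is illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy and F-measure from the four outcome counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f_measure

# 5000 positive and 5000 negative test images, four errors of each kind.
p, r, a, f = classification_metrics(tp=4996, fp=4, fn=4, tn=4996)
print(f"{p:.4f} {r:.4f} {a:.4f} {f:.4f}")
```

Because the two error counts are equal and the classes are balanced, all four scores coincide at 0.9992.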
Furthermore, we evaluate the performance of the image segmentation at the pixel level. Some experimental results of image filtering and segmentation are shown in Fig. 8. Since the dataset we use does not contain pixel-level ground truth, we manually label the crack areas in a set of images and use these ground truth images to quantify the performance of our proposed segmentation method. Moreover, we compare our method with Otsu’s thresholding method [44] with respect to pixel-level precision, recall, accuracy and F-measure. Here, the notations $n_{tp}$, $n_{fp}$, $n_{fn}$ and $n_{tn}$ in (6), (7) and (8) represent the number of pixels that are true positive, false positive, false negative and true negative, respectively. The comparison between these two methods is shown in Table I. Note that crack areas with fewer than 100 pixels are ignored in our experiments. From Table I, it can be observed that our method outperforms Otsu’s thresholding method in terms of precision, accuracy and F-measure, and the proposed segmentation method achieves the best performance when the neighborhood size parameter $\kappa$ is set to 1.
Method               Precision  Recall  Accuracy  F-measure
Otsu’s thresholding  0.9590     0.9339  0.9848    0.9462
Proposed ()          0.9774     0.9331  0.9870    0.9548
Proposed ()          0.9854     0.9246  0.9867    0.9541
Proposed ()          0.9967     0.9046  0.9848    0.9484
Proposed ()          0.9955     0.8952  0.9831    0.9427
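For readers unfamiliar with the baseline in the comparison above, Otsu's method picks the threshold that maximizes the between-class variance of the intensity histogram. The sketch below is a generic NumPy version for images normalized to [0, 1]; it is not the implementation used in the experiments, and the function name `otsu_threshold` and the 256-bin histogram are assumptions.

```python
import numpy as np

def otsu_threshold(img, n_bins=256):
    """Threshold that maximizes the between-class variance of the histogram."""
    hist, edges = np.histogram(img, bins=n_bins, range=(0.0, 1.0))
    prob = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2.0

    w0 = np.cumsum(prob)                  # probability of the darker class
    w1 = 1.0 - w0                         # probability of the brighter class
    cum_mean = np.cumsum(prob * centers)  # cumulative first moment
    total_mean = cum_mean[-1]

    # Between-class variance for every split; empty classes give 0/0.
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (total_mean * w0 - cum_mean) ** 2 / (w0 * w1)
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)

    # The split after bin k corresponds to the right edge of that bin.
    return edges[np.argmax(between) + 1]
```

Unlike the proposed method, this baseline looks only at individual pixel intensities and does not use any neighborhood information.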
IV Conclusion and Future Work
A novel crack detection approach was proposed in this paper. The main novelties include a fully connected neural network for image classification and a $k$-means clustering-based image segmentation algorithm. Firstly, our neural network classified the input images as either positive (crack present) or negative (crack absent). The positive images were then processed using a bilateral filter, which not only minimized the number of noisy pixels but also preserved the edges between the cracks and the road surface. Finally, the filtered images were downsampled, and an adaptive threshold was computed by minimizing the within-cluster sum of squares. The cracks can therefore be detected by segmenting the filtered images using the adaptively determined threshold. The experimental results showed that the precision of the image classification is 99.92% and the pixel-level segmentation accuracy is around 98.70%.
Although the proposed image segmentation algorithm performs better than Otsu’s thresholding method in terms of distinguishing between foreground (cracks) and background (road surface), some color images with a large number of noisy pixels cannot be properly segmented. Therefore, as future work, a deep neural network can be trained to segment the positive images into a set of semantically meaningful regions, i.e., cracks and road surface.
Acknowledgment
This work is supported by grants from the Research Grants Council of the Hong Kong SAR Government, China (No. 11210017 and No. 21202816) awarded to Prof. Ming Liu. This work is also supported by grants from the Shenzhen Science, Technology and Innovation Commission, JCYJ20170818153518789, and National Natural Science Foundation of China (No. 61603376) awarded to Dr. Lujia Wang.
References
 [1] R. Fan, Y. Liu, X. Yang, M. J. Bocus, N. Dahnoun, and S. Tancock, “Real-time stereo vision for road surface 3D reconstruction,” in 2018 IEEE International Conference on Imaging Systems and Techniques (IST). IEEE, 2018, pp. 1–6.
 [2] H. Oliveira and P. L. Correia, “Automatic road crack segmentation using entropy and image dynamic thresholding,” in Signal Processing Conference, 2009 17th European. IEEE, 2009, pp. 622–626.
 [3] R. Fan, X. Ai, and N. Dahnoun, “Road surface 3D reconstruction based on dense subpixel disparity map estimation,” IEEE Transactions on Image Processing, vol. 27, no. 6, pp. 3025–3035, Jun. 2018.
 [4] T. S. Nguyen, M. Avila, and S. Begot, “Automatic detection and classification of defect on road pavement using anisotropy measure,” in Signal Processing Conference, 2009 17th European. IEEE, 2009, pp. 617–621.
 [5] R. Fan, “Real-time computer stereo vision for automotive applications,” Ph.D. dissertation, University of Bristol, 2018.
 [6] R. Fan, J. Jiao, J. Pan, H. Huang, S. Shen, and M. Liu, “Real-time dense stereo embedded in a UAV for road inspection,” arXiv preprint arXiv:1904.06017.
 [7] H. Oliveira and P. L. Correia, “Automatic road crack detection and characterization,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 1, pp. 155–168, 2013.
 [8] H. Oh, N. W. Garrick, and L. E. Achenie, “Segmentation algorithm using iterative clipping for processing noisy pavement images,” in Imaging Technologies: Techniques and Applications in Civil Engineering. Second International Conference. Engineering Foundation; and Imaging Technologies Committee of the Technical Council on Computer Practices, American Society of Civil Engineers, 1998.
 [9] M. Petrou, J. Kittler, and K. Song, “Automatic surface crack detection on textured materials,” Journal of Materials Processing Technology, vol. 56, no. 1–4, pp. 158–167, 1996.
 [10] Y. Huang and B. Xu, “Automatic inspection of pavement cracking distress,” Journal of Electronic Imaging, vol. 15, no. 1, p. 013017, 2006.
 [11] M. Gavilán, D. Balcones, O. Marcos, D. F. Llorca, M. A. Sotelo, I. Parra, M. Ocaña, P. Aliseda, P. Yarza, and A. Amírola, “Adaptive road crack detection system by pavement classification,” Sensors, vol. 11, no. 10, pp. 9628–9657, 2011.
 [12] U. Ozgunalp, R. Fan, X. Ai, and N. Dahnoun, “Multiple lane detection algorithm based on novel dense vanishing point estimation,” IEEE Transactions on Intelligent Transportation Systems, vol. 18, no. 3, pp. 621–632, 2017.
 [13] R. Fan, V. Prokhorov, and N. Dahnoun, “Faster-than-real-time linear lane detection implementation using SoC DSP TMS320C6678,” in 2016 IEEE International Conference on Imaging Systems and Techniques (IST). IEEE, 2016, pp. 306–311.
 [14] M. S. Kaseko and S. G. Ritchie, “A neural network-based methodology for pavement crack detection and classification,” Transportation Research Part C: Emerging Technologies, vol. 1, no. 4, pp. 275–291, 1993.
 [15] Q. Li and X. Liu, “Novel approach to pavement image segmentation based on neighboring difference histogram method,” in Image and Signal Processing, 2008. CISP’08. Congress on, vol. 2. IEEE, 2008, pp. 792–796.
 [16] R. Fan, M. J. Bocus, and N. Dahnoun, “A novel disparity transformation algorithm for road segmentation,” Information Processing Letters, vol. 140, pp. 18–24, 2018.
 [17] N. Tanaka and K. Uematsu, “A crack detection method in road surface images using morphology,” MVA, vol. 98, pp. 17–19, 1998.
 [18] H. Oliveira and P. L. Correia, “Supervised strategies for cracks detection in images of road pavement flexible surfaces,” in Signal Processing Conference, 2008 16th European. IEEE, 2008, pp. 1–5.
 [19] J. Jiao, R. Fan, H. Ma, and M. Liu, “Using DP towards a shortest path problem-related application,” arXiv preprint arXiv:1903.02765, 2019.
 [20] M. Avila, S. Begot, F. Duculty, and T. S. Nguyen, “2D image based road pavement crack detection by calculating minimal paths and dynamic programming,” in Image Processing (ICIP), 2014 IEEE International Conference on. IEEE, 2014, pp. 783–787.
 [21] R. Amhaz, S. Chambon, J. Idier, and V. Baltazart, “Automatic crack detection on 2D pavement images: An algorithm based on minimal path selection,” accepted to IEEE Trans. Intell. Transp. Syst., 2015.
 [22] Q. Zou, Y. Cao, Q. Li, Q. Mao, and S. Wang, “CrackTree: Automatic crack detection from pavement images,” Pattern Recognition Letters, vol. 33, no. 3, pp. 227–238, 2012.
 [23] K. Fernandes and L. Ciobanu, “Pavement pathologies classification using graph-based features,” in Image Processing (ICIP), 2014 IEEE International Conference on. IEEE, 2014, pp. 793–797.
 [24] Y.-C. Tsai, C. Jiang, and Y. Huang, “Multiscale crack fundamental element model for real-world pavement crack classification,” Journal of Computing in Civil Engineering, vol. 28, no. 4, p. 04014012, 2012.
 [25] Y. J. Tsai, C. Jiang, and Z. Wang, “Implementation of automatic crack evaluation using crack fundamental element,” in Image Processing (ICIP), 2014 IEEE International Conference on. IEEE, 2014, pp. 773–777.
 [26] Y. Hu and C.-X. Zhao, “A novel LBP based methods for pavement crack detection,” Journal of Pattern Recognition Research, vol. 5, no. 1, pp. 140–147, 2010.
 [27] M. Salman, S. Mathavan, K. Kamal, and M. Rahman, “Pavement crack detection using the Gabor filter,” in 16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013). IEEE, 2013, pp. 2039–2044.
 [28] H. Oliveira and P. L. Correia, “CrackIT: an image processing toolbox for crack detection and characterization,” in Image Processing (ICIP), 2014 IEEE International Conference on. IEEE, 2014, pp. 798–802.
 [29] S. Varadharajan, S. Jose, K. Sharma, L. Wander, and C. Mertz, “Vision for road inspection,” in Applications of Computer Vision (WACV), 2014 IEEE Winter Conference on. IEEE, 2014, pp. 115–122.
 [30] H. R. Roth, L. Lu, J. Liu, J. Yao, A. Seff, K. Cherry, L. Kim, and R. M. Summers, “Improving computer-aided detection using convolutional neural networks and random view aggregation,” IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1170–1181, 2016.
 [31] D. Ciresan, A. Giusti, L. M. Gambardella, and J. Schmidhuber, “Deep neural networks segment neuronal membranes in electron microscopy images,” in Advances in Neural Information Processing Systems, 2012, pp. 2843–2851.
 [32] D. C. Cireşan, A. Giusti, L. M. Gambardella, and J. Schmidhuber, “Mitosis detection in breast cancer histology images with deep neural networks,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2013, pp. 411–418.
 [33] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
 [34] Y. Zhang, K. Sohn, R. Villegas, G. Pan, and H. Lee, “Improving object detection with deep convolutional networks via Bayesian optimization and structured prediction,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 249–258.
 [35] J. Kivinen, C. Williams, and N. Heess, “Visual boundary prediction: A deep neural prediction network and quality dissection,” in Artificial Intelligence and Statistics, 2014, pp. 512–521.
 [36] L. Zhang, F. Yang, Y. D. Zhang, and Y. J. Zhu, “Road crack detection using deep convolutional neural network,” in Image Processing (ICIP), 2016 IEEE International Conference on. IEEE, 2016, pp. 3708–3712.
 [37] Y. Xu, S. Li, D. Zhang, Y. Jin, F. Zhang, N. Li, and H. Li, “Identification framework for cracks on a steel structure surface by a restricted Boltzmann machines algorithm based on consumer-grade camera images,” Structural Control and Health Monitoring, vol. 25, no. 2, p. e2075, 2018.
 [38] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, p. 436, 2015.
 [39] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015.
 [40] R. Fan and N. Dahnoun, “Real-time stereo vision-based lane detection system,” Measurement Science and Technology, vol. 29, no. 7, p. 074005, 2018.
 [41] R. Fan, Y. Liu, M. J. Bocus, L. Wang, and M. Liu, “Real-time subpixel fast bilateral stereo,” arXiv preprint arXiv:1807.02044.
 [42] Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1222–1239, Nov. 2001.
 [43] A. Ahmad and L. Dey, “A k-mean clustering algorithm for mixed numeric and categorical data,” Data & Knowledge Engineering, vol. 63, no. 2, pp. 503–527, 2007.
 [44] N. Otsu, “A threshold selection method from gray-level histograms,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62–66, Jan. 1979.