Automatic Pavement Crack Detection Based on Structured Prediction with the Convolutional Neural Network

02/01/2018 ∙ by Zhun Fan, et al.

Automated pavement crack detection is a challenging task that has been researched for decades because of the complicated pavement conditions in the real world. In this paper, a supervised method based on deep learning is proposed, which is capable of dealing with different pavement conditions. Specifically, a convolutional neural network (CNN) is used to learn the structure of the cracks from raw images, without any preprocessing. Small patches are extracted from crack images as inputs to generate a large training database, a CNN is trained on them, and crack detection is modeled as a multi-label classification problem. Typically, crack pixels are far fewer than non-crack pixels. To deal with this severe data imbalance, a strategy that modifies the ratio of positive to negative samples is proposed. The method is tested on two public databases and compared with five existing methods. Experimental results show that it outperforms the other methods.




I Introduction

Pavement distress detection is a significant step in pavement management [1]. The goal is to obtain pavement condition information for maintenance. Cracking is the most common type of pavement distress. Since automated crack detection systems are safer, cheaper, more efficient, and more objective than manual inspection, research on them has attracted wide attention from both academia and industry [2, 3].

Automated crack detection is a challenging task. In the past few decades, image based crack detection algorithms have been widely discussed. Early methods are mostly based on combinations or improvements of conventional digital image processing techniques, such as thresholding [4], mathematical morphology [5], and edge detection [6]. These methods are generally based on photometric and geometric hypotheses about the properties of crack images [7]. The most distinctive photometric property is that crack pixels are the darker pixels in an image. Based on that, a threshold value is determined globally or locally to separate cracks from background [8, 4, 9]. However, these approaches are very sensitive to noise, as they operate on individual pixels. To overcome this problem, other methods take geometric information into account. For instance, the continuity property of cracks is considered to reduce false detections [5]. A local binary pattern operator is used to determine whether a pixel belongs to a crack based on the local orientation [10]. The wavelet transform is applied to separate crack regions from crack-free regions with multi-scale analysis [11]. These methods detect cracks efficiently, but they are not precise enough to find all the cracks in an image. More recent works have been proposed to improve detection accuracy, and can be sorted into the following branches.

Minimal Path Based Methods: The minimal path problem is to find the best paths between nodes of a graph. Several methods based on the minimal path principle have been proposed. Kaul et al. propose an algorithm that can find an open curve without knowing either the endpoints or the topology of the curve [12]. Nguyen et al. take brightness and connectivity into account simultaneously to provide features of anisotropic cracks along free-form paths [13]. Amhaz et al. propose strategies to select endpoints at the local scale and to select minimal paths at the global scale [14]. These methods exploit distinctive features of crack pixels in a global view. However, their drawback is that they are too computationally intensive for practical applications.

Machine Learning: With the development of machine learning, several methods focusing on feature extraction and pattern recognition have been proposed for crack detection [15, 16, 17, 18]. In CrackIT [16], Oliveira et al. use the mean and the standard deviation for unsupervised learning to distinguish blocks with cracks from blocks without cracks. Cord et al. use AdaBoost to select textural descriptors that can describe crack images [17]. In CrackForest [18], Shi et al. propose a new descriptor based on random structured forests to characterize cracks. These methods perform well, but their performance depends heavily on the extracted features. Because pavement conditions are complicated, it is hard to find features that are effective for all pavements.

Fig. 1: Two examples of structured prediction based on CNN. (a) Two pixel examples of the original image; $(x, y)$ are the coordinates of the pixel. (b) The extracted patches centered on the pixels. In the method the size of the patches is . (c) The structured prediction results of CNN with the patches as input; each result in (c) is obtained from the corresponding patch in (b). In the method the size of the output structure is $5 \times 5$. (d) The probability map obtained by applying structured prediction on all pixels.

Deep Learning: For image classification tasks, deep learning has been proven to work better than traditional machine learning methods [19]. Recently, deep learning methods have been successfully applied to damage and distress detection [20, 21, 22, 23]. Cha et al. use a sliding window to divide the image into blocks, and a CNN is used to predict whether each block contains cracks [20]. However, the method can only find cracks at the patch level, not at the pixel level. In [23], Zhang et al. use a CNN to predict whether an individual pixel belongs to a crack based on the local patch information. However, the method ignores the spatial relations between pixels and overestimates crack width. In [21], Zhang et al. use a CNN to predict the class of each pixel of the image. However, it still needs manually designed feature extractors for preprocessing, and the CNN is only used as a classifier. Besides, its network architecture is strictly tied to the input image size, which prevents generalization of the method.

Since a CNN can extract good features from raw data, this paper proposes a CNN based method that learns the crack structure of a small patch within an image in order to find the whole crack at the pixel level without preprocessing. Specifically, patches are extracted around individual pixels of a raw image to build a training database. Then a CNN architecture is proposed for structured prediction as a multi-label problem [24]. Since crack pixels are far fewer than non-crack pixels in a typical image, applying structured prediction directly can hardly obtain a satisfactory result. To solve the multi-label classification problem with imbalanced samples, we propose a strategy that modifies the ratio of positive to negative training samples. After training, the CNN can predict the centered structure of each input patch. Then all structure results of an image are summed to obtain a probability output, which reflects the overall condition of the pavement, as shown in Fig. 1. Finally, a decision threshold is set to obtain a binary output. The proposed method is compared with traditional methods such as Canny [25] and local thresholding [26], and with state-of-the-art methods such as Free-Form Anisotropy (FFA) [13], CrackForest [18], and Minimal Path Selection (MPS) [14], to validate its effectiveness. The contributions of the paper are the following: 1) apply CNN based structured prediction to crack detection and obtain an outstanding performance; 2) propose a strategy to solve the multi-label classification problem with imbalanced samples and analyze its influence.

This paper is organized as follows. Section II describes the implementation details of the convolutional neural network, including data preparation, the architecture of the CNN and training details. Section III gives the results and evaluations. Section IV describes the structured prediction in detail. Section V describes the influence of the imbalanced samples and the strategy to deal with it. The generalization of the method is discussed in Section VI. Conclusions and perspectives are given in Section VII.

II The Method

II-A Data Preparation

The proposed method is trained and tested on two databases: CFD [18] with RGB images and AigleRN [7] with gray-level images. The CFD database contains 118 RGB images with a resolution of 320 × 480 pixels. All images were taken with an iPhone 5 on pavements in Beijing, China. These images contain noise such as shadows, oil spots and water stains, and have non-uniform illumination. We use 72 images for training and 46 images for testing. The AigleRN database contains 38 gray-level images, which were collected on French pavements and have been pre-processed to reduce the non-uniform illumination. Compared to the images in CFD, the pavement images in AigleRN have a more complex texture. We use 24 images for training and 14 images for testing.

The inputs of the network are patches extracted from images, each centered on a specific pixel, as shown in Fig. 2. In this work, we set the patch size as and thus . Typically, RGB images with 3 channels and gray-level images with 1 channel are used as input of the network. So we propose a network with 3-channel input and a network with 1-channel input, respectively.

Fig. 2: The extracted patch centered on pixel $(x, y)$. $k$ is the distance between the edges and the center. Thus the size of the patch is $(2k+1) \times (2k+1)$.
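The patch geometry can be illustrated with a minimal sketch, assuming the image is stored as a list of rows of gray levels. The helper names `reflect_index` and `extract_patch` are hypothetical, and the border handling mirrors the symmetric padding that the text mentions for the testing procedure; this is not the authors' implementation.

```python
def reflect_index(i, n):
    """Map an index outside [0, n) back inside by mirroring at the border.

    The border pixel itself is repeated (symmetric padding).
    Valid as long as the overshoot is at most n pixels.
    """
    if i < 0:
        return -i - 1
    if i >= n:
        return 2 * n - i - 1
    return i


def extract_patch(image, row, col, k):
    """Return the (2k+1) x (2k+1) patch centered on (row, col)."""
    height, width = len(image), len(image[0])
    return [
        [image[reflect_index(r, height)][reflect_index(c, width)]
         for c in range(col - k, col + k + 1)]
        for r in range(row - k, row + k + 1)
    ]
```

For a pixel far from the border the patch is a plain crop; near the border the missing rows and columns are filled with mirrored image content, so every pixel of an image can serve as a patch center.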

In a typical crack image, non-crack pixels far outnumber crack pixels. For a classification task with imbalanced samples, the proportion of positive and negative samples has a great impact on the network performance. Specifically, according to the manually labeled ground truth, if the center pixel of the extracted patch is a crack pixel, the corresponding patch is a positive sample; otherwise it is a negative sample. To modify the proportion and keep the samples uniform, all positive samples are extracted first for training. Then negative samples are extracted randomly according to the set proportion. Generally, the ratio of positive to negative samples is set to 1:3, which is discussed in Section V. The training database is then obtained as shown in Fig. 3. In the testing procedure, all pixels of an image are used, and symmetric padding is used to deal with the boundary of the image. As a result, we can obtain all samples from an image in testing. For instance, 153,600 samples can be obtained from a 320 × 480 image. Details of the training and testing databases are shown in Table I.
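The sampling rule described above (keep every positive, draw negatives at random up to a fixed multiple) can be sketched as follows. This is an illustrative sketch for a single ground-truth mask, whereas the paper builds its database over all training images; the function name `sample_training_centers` and the fixed seed are assumptions made for the example.

```python
import random


def sample_training_centers(ground_truth, neg_per_pos=3, seed=0):
    """Pick patch centers: every crack pixel plus random non-crack pixels.

    `ground_truth` is a list of rows with 1 for crack and 0 for non-crack.
    Returns (positives, negatives) as lists of (row, col) tuples, with at
    most `neg_per_pos` negatives for each positive.
    """
    positives, negatives = [], []
    for r, line in enumerate(ground_truth):
        for c, label in enumerate(line):
            (positives if label else negatives).append((r, c))
    wanted = min(len(negatives), neg_per_pos * len(positives))
    rng = random.Random(seed)
    return positives, rng.sample(negatives, wanted)


# At test time every pixel is a patch center, so a single 320 x 480 image
# yields 320 * 480 samples, the figure quoted in the text.
samples_per_test_image = 320 * 480
```

With the default `neg_per_pos=3` the result follows the 1:3 ratio of the training columns in Table I, for example 185,483 positives and 3 × 185,483 = 556,449 negatives for CFD.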

Fig. 3: Example of negative (left) and positive (right) training patches extracted from the CFD database.
                      CFD                       AigleRN
Image dimensions      320 × 480                 991 × 462 / 311 × 462
                      Training    Testing       Training    Testing
Images                72          46            24          14
Positive patches      185,483     107,112       39,208      42,317
Negative patches      556,449     6,958,488     117,624     4,168,351
Total patches         741,932     7,065,600     156,832     4,210,668
Positive : Negative   1:3         1:65          1:3         1:98.5
TABLE I: Training Database

Data with zero mean and equal variance is preferred for optimization in machine learning. For image input, a common practice is to scale the raw data from [0, 255] to [-1, 1]. This scaling is adopted in our work.
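As a small illustration of this scaling, the linear map from [0, 255] to [-1, 1] can be written as below. The function names are hypothetical, and the exact formula is an assumption: the text only states the two ranges, and a linear map is the usual choice.

```python
def scale_pixel(value):
    """Map a raw intensity in [0, 255] linearly onto [-1, 1]."""
    return value / 127.5 - 1.0


def scale_patch(patch):
    """Apply the scaling to every value of a 2-D patch (list of rows)."""
    return [[scale_pixel(v) for v in row] for row in patch]
```

Under this map 0 becomes -1, 255 becomes 1, and the mid-gray level 127.5 becomes 0.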

II-B Network Architecture

A typical CNN contains convolutional layers, max-pooling layers and fully connected (FC) layers. The proposed network has 4 convolutional layers with 2 max-pooling layers and 3 FC layers. The architecture of the network is illustrated in Fig. 4 and Table II. Extracted patches are used as input of the network, with 3 channels in CFD and 1 channel in AigleRN. All convolutional layers use a kernel of $3 \times 3$ and a stride of 1, whose effectiveness has been verified in VGG-Net [27]. Zero padding on the boundary of each convolutional layer is used to preserve the spatial resolution of the resulting feature maps after convolution. Max pooling is performed with stride 2 over a $2 \times 2$ window.

Fig. 4: An illustration of the CNN architecture. The leftmost image is the input patch with 3 channels. Other cubes indicate the feature maps obtained from convolution (Conv) or max pooling. All convolutional layers use a kernel of $3 \times 3$, stride 1 and zero padding. Max pooling is performed with stride 2 over a $2 \times 2$ window.
Input  conv16     conv16     maxpool    conv32     conv32     maxpool    FC64  FC64  FC25
       [3×3,1,1]  [3×3,1,1]  [2×2,2,0]  [3×3,1,1]  [3×3,1,1]  [2×2,2,0]
TABLE II: Architecture of the CNN. Layer names are followed by numbers of feature maps. Square brackets specify kernel size, stride and padding.
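To make the layer settings concrete, the sketch below traces the spatial size of the feature maps through the convolutional part of the network. It reads the bracket entries of Table II as 3 × 3 convolutions with stride 1 and padding 1, and 2 × 2 max pooling with stride 2 and no padding. The 27 × 27 input used in the example is an assumption chosen for illustration, since the patch size is not legible in the text; `trace_shapes` is a hypothetical helper.

```python
# (name, kind, feature maps, kernel, stride, padding), following Table II.
LAYERS = [
    ("conv16", "conv", 16, 3, 1, 1),
    ("conv16", "conv", 16, 3, 1, 1),
    ("maxpool", "pool", None, 2, 2, 0),
    ("conv32", "conv", 32, 3, 1, 1),
    ("conv32", "conv", 32, 3, 1, 1),
    ("maxpool", "pool", None, 2, 2, 0),
]


def output_size(size, kernel, stride, padding):
    """Spatial size after a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(patch_size, in_channels):
    """Return [(name, height_or_width, channels), ...] for each layer."""
    size, channels, shapes = patch_size, in_channels, []
    for name, kind, out_channels, kernel, stride, padding in LAYERS:
        size = output_size(size, kernel, stride, padding)
        if kind == "conv":
            channels = out_channels  # pooling keeps the channel count
        shapes.append((name, size, channels))
    return shapes


# ASSUMPTION: a 27 x 27 RGB patch, used only to have a concrete number.
shapes = trace_shapes(patch_size=27, in_channels=3)
flattened = shapes[-1][1] ** 2 * shapes[-1][2]  # inputs to the first FC layer
```

Because the convolutions are padded, only the two pooling layers change the spatial size, so the network itself places few constraints on the patch size beyond the width of the first FC layer.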

As shown in Fig. 1, the output of the network is a prediction of the structure centered in the input patch. For generality, we use $s$ to denote the size of the output structure. Thus $s = 5$, as shown in Fig. 1(c). The $s \times s$ window is flattened into $s^2$ neurons in the network, forming the output layer of the CNN shown in Fig. 4, and is fully connected to the second-to-last layer. Thus the structured prediction problem is modeled as a multi-label problem, and the labels are taken from the corresponding ground truth. Details about the multi-label problem are discussed in Section IV.
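How the label vector of a training sample could be built from the ground truth is sketched below. The name `s` is used here for the side length of the output window, `center_labels` is a hypothetical helper, and the row-major flattening order is an assumption; the text only states that the window is flattened.

```python
def center_labels(gt_patch, s):
    """Flatten the s x s window at the center of a square ground-truth patch.

    `gt_patch` is a (2k+1) x (2k+1) list of rows of 0/1 labels. The result
    is a list of s*s labels, one per output unit, in row-major order.
    """
    side = len(gt_patch)
    start = (side - s) // 2
    return [gt_patch[r][c]
            for r in range(start, start + s)
            for c in range(start, start + s)]
```

With `s = 5` the target has 25 entries, matching the FC25 output layer in Table II, and with `s = 1` it reduces to the single label of the center pixel.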

For a multi-label classification problem, the sigmoid function is the commonly used activation function for the output. Except for the output units, all hidden layers are equipped with Rectified Linear Units (ReLU) [28] to introduce nonlinearity.

II-C Training and Testing

In a multi-label problem, the output units are not mutually exclusive, which means that more than one positive output may exist at the same time. In this case, cross entropy is commonly used as the loss function, which is defined as:

$$L = -\sum_{i=1}^{n} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right]$$

where $y_i$ and $\hat{y}_i$ are the label and prediction of the $i$th output unit, respectively; $n$ is the number of labels. To prevent the network from overfitting, weight decay is adopted in the paper to penalize large weights. The penalty term is added to the loss function, and the loss used for optimization is:

$$L' = L + \lambda \sum_{j} w_j^2$$

where $L$ is the cross entropy function, $\lambda$ is the penalty factor and $w_j$ is the $j$th weight in the network, including weights in convolutional layers and FC layers. $\lambda$ is set to 0.0005 in the experiment.
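A minimal numerical sketch of this loss is given below. It assumes the cross entropy is summed (not averaged) over the output units and that the penalty is the plain sum of squared weights multiplied by the penalty factor; the exact normalization in the original equations is not recoverable from the text, so both choices are assumptions. The clipping constant `eps` is added only for numerical safety.

```python
import math


def cross_entropy(labels, predictions, eps=1e-12):
    """Multi-label cross entropy summed over the output units."""
    total = 0.0
    for y, p in zip(labels, predictions):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def penalized_loss(labels, predictions, weights, penalty=0.0005):
    """Cross entropy plus an L2 penalty on all weights (weight decay)."""
    return cross_entropy(labels, predictions) + penalty * sum(w * w for w in weights)
```

Each output unit contributes its own binary term, so several units can be positive at once without competing with each other, which is the property the multi-label formulation relies on.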

Dropout [29] is also a simple and effective approach for preventing overfitting. During training, the outputs of randomly chosen hidden neurons are set to zero. It is applied to the first two FC layers, and the dropout ratio is set to 0.5.

Weights in convolutional layers and FC layers are initialized using the Xavier method [30]. Adam [31] is adopted as the optimizer with the default settings of the original paper (learning rate set to 0.001). The batch size for each iteration is set to 256, and 20,000 iterations (43 epochs) on AigleRN and 30,000 iterations (13 epochs) on CFD are conducted.

In testing, each pixel is used to generate an input patch, so input patches overlap, which leads to overlapping output windows. Typically, as many decisions as there are output units (25 for a $5 \times 5$ output) can be obtained for a pixel, unless it is near the boundary. Then for a testing image, all outputs for a pixel are summed and normalized to [0,1]. As a result, we can obtain a probability map from the results of all the pixels of an image.
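The accumulation of overlapping output windows can be sketched as follows. The callable `predict` stands in for the trained network and is hypothetical, as is the function name `probability_map`. The final step uses min-max rescaling, which is one possible reading of "normalized to [0,1]" and should be treated as an assumption.

```python
def probability_map(height, width, s, predict):
    """Sum overlapping s x s predictions and rescale the result to [0, 1].

    `predict(row, col)` returns an s x s list of rows with the network
    outputs for the window centered on pixel (row, col).
    """
    half = s // 2
    acc = [[0.0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            window = predict(row, col)
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    r, c = row + dr, col + dc
                    if 0 <= r < height and 0 <= c < width:
                        acc[r][c] += window[dr + half][dc + half]
    low = min(min(line) for line in acc)
    high = max(max(line) for line in acc)
    span = (high - low) or 1.0  # constant map: avoid division by zero
    return [[(v - low) / span for v in line] for line in acc]
```

An interior pixel receives one vote from each of the `s * s` windows that cover it, while pixels near the image border receive fewer, which is why the text singles out the boundary.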

III Results

The network is implemented in TensorFlow. All experiments are conducted on a workstation with an Intel Xeon E5-2690 2.9GHz CPU, 64GB RAM and an Nvidia Quadro K5000 GPU.

III-A Evaluation

We test the performance of the network on test images. For evaluation, precision ($Pr$), recall ($Re$) and F1 score ($F1$) are introduced, which are commonly used in classification problems, and defined as:

$$Pr = \frac{TP}{TP + FP}, \quad Re = \frac{TP}{TP + FN}, \quad F1 = \frac{2 \cdot Pr \cdot Re}{Pr + Re}$$

where $TP$, $FP$, $FN$ are the numbers of true positives, false positives and false negatives, respectively. Considering that the ground truth consists of subjective manual labels and that there are transitional areas between crack pixels and non-crack pixels in real images, a small distance between the detected result and the reference, which is 2 pixels in [14] and 5 pixels in [18], is accepted for evaluation. We accept 2 pixels in the paper.
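One way to compute these scores with a pixel tolerance is sketched below. The matching rule is an assumption, since the text does not spell out the protocol: a detected pixel counts as a true positive if a reference crack pixel lies within `tol` pixels (Chebyshev distance), and a reference pixel counts as a false negative if no detected pixel lies within `tol` pixels of it. The helper names are hypothetical.

```python
def near(pixel, pixels, tol):
    """True if some element of the set `pixels` lies within `tol` pixels
    of `pixel`, i.e. inside the (2*tol+1)-wide square around it."""
    r, c = pixel
    return any((r + dr, c + dc) in pixels
               for dr in range(-tol, tol + 1)
               for dc in range(-tol, tol + 1))


def evaluate(detected, reference, tol=2):
    """Precision, recall and F1 for two sets of (row, col) crack pixels."""
    tp = sum(1 for p in detected if near(p, reference, tol))
    fp = len(detected) - tp
    fn = sum(1 for p in reference if not near(p, detected, tol))
    precision = tp / (tp + fp) if detected else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1
```

With `tol=0` this reduces to exact pixel-wise matching; a larger tolerance forgives detections that are shifted by a pixel or two relative to the manual labels.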

III-B Results on CFD

Example detections on CFD are shown in Fig. 5. The output of the network is a probability map, as shown in the third row. A whiter color indicates that the pixel is more likely to be a crack pixel. Then the decision threshold is set to 0.5 to remove the pixels with low probability, and a binary output is obtained. It can be observed that only a few pixels are removed by binarization. The reason is that the contrast between the detected crack pixels and the detected non-crack pixels in the probability map is high. According to the output images in Fig. 5, our method can effectively deal with complex backgrounds and complex crack topology.

Fig. 5: Part of crack detection results by the proposed method (from top to bottom: original image, ground truth, probability map, binary output).

Fig. 6 and Table III show the comparison with some other algorithms. It can be observed that the two traditional methods are not suitable for crack detection, as they are sensitive to noise. Since both CrackForest and the proposed method are supervised learning methods, we generate the same training and testing databases for them. Notice that crack width is overestimated by CrackForest, leading to high recall but low precision, as shown in Table III. The evaluation results in Table III indicate that the proposed method outperforms CrackForest.

Fig. 6: Comparison of results on CFD (from left to right: original image, ground truth, Canny, local thresholding, CrackForest, the proposed method).
Method               Pr      Re      F1
Canny                0.4377  0.7307  0.4570
Local thresholding   0.7727  0.8274  0.7418
CrackForest          0.7466  0.9514  0.8318
The proposed method  0.9119  0.9481  0.9244
TABLE III: Crack Detection Results Evaluation on CFD

III-C Results on AigleRN

Fig. 7 and Table IV present the results of different algorithms on AigleRN. It is again observed that the two traditional methods are sensitive to noise and fail to find the continuous cracks. FFA can find some cracks in a local scale, but can hardly find all the continuous cracks. MPS is more effective to find the continuous cracks on a global view. Compared to MPS, our proposed method can find the cracks with better precision and higher recall. From image results, it is clear that the proposed method outperforms the others.

Fig. 7: Results of different methods on AigleRN (from left to right: original image, ground truth, Canny, local thresholding, FFA, MPS, the proposed method).
Method               Pr      Re      F1
Canny                0.1989  0.6753  0.2881
Local thresholding   0.5329  0.9345  0.6670
FFA                  0.7688  0.6812  0.6817
MPS                  0.8263  0.8410  0.8195
The proposed method  0.9178  0.8812  0.8954
TABLE IV: Crack Detection Results Evaluation on AigleRN

IV Output Structure

Generally, a CNN is used for classification [19]. Typical classification problems include single-label classification and multi-label classification. In a single-label classification problem, a sample can only be classified into one class. For example, in [23], crack detection is modeled as a binary classification problem, and a pixel is classified as a crack pixel or a non-crack pixel based on its neighbouring information. In a multi-label classification problem, however, a sample can be classified into more than one class. Structured prediction based on CNN, modeled as a multi-label classification problem, is quite a new idea [24]. The method uses the information of a patch to predict not only the center pixel but the centered structure. This operation takes the spatial relations between pixels into account, i.e., a crack pixel is more likely to be neighboured by another crack pixel, and vice versa.

In this experiment, we explore the performance of the structured prediction. The size of the output structure is denoted by $s \times s$. Results presented are obtained for $s = 1, 3, 5, 7$, i.e., the number of output units is set to 1, 9, 25, 49, respectively. Other configurations remain unchanged. Specifically, for $s = 1$, the model is simplified to the pixel classification problem similar to [23]. The performance with different sizes of output structure is shown in Fig. 8.

Fig. 8: Precision, recall and F1 score value variations with different sizes of output structure. Testing on CFD and AigleRN respectively.

From Fig. 8, it can be observed that all results for structured prediction, particularly for , are better than those for pixel classification. To identify the difference between structured prediction and pixel classification, we present probability maps for both, as shown in Fig. 9. Both can detect the cracks, but pixel classification is much more sensitive to noise. This is natural when considering the procedure of patch extraction and classification. The original design is to classify a pixel based on its neighbouring information. However, the network is essentially classifying patches. Thus, limited by the samples, it may overfit and misunderstand the classification problem. For example, if the mean of the extracted patches distinguishes positive samples from negative samples, the CNN may essentially turn into a classifier of patches based on the mean. The mean is a good feature for patches but is not enough to classify the pixel, since it is sensitive to interference such as non-uniform illumination. This is problematic because we cannot be sure which features the CNN relies on to classify the patches. Structured prediction can solve this problem to some extent, as it aims to predict several outputs at the same time. The outputs differ slightly from patch to patch according to the labels. Therefore, the network cannot rely on a single feature to accurately predict all outputs of a patch. This prevents the network from misunderstanding the problem.

Fig. 9: Example probability maps with (middle) and (right) for the original image (left).

Another explanation lies in the testing procedure. In pixel classification, only one decision is obtained from the network for a pixel. In structured prediction, however, multiple decisions (one per output unit) can be obtained for a pixel by scanning all pixels of the whole image. These decisions are summed to obtain a more accurate result. Besides, normalizing the whole output image to [0,1] gives a global view that smooths the detected pixels.

V Ratio of Positive to Negative Samples

Applying structured prediction directly can hardly obtain a satisfactory result. As shown in Table I, crack pixels are far fewer than non-crack pixels in typical crack images. The ratio of positive to negative samples is as low as 1:65 in CFD, and even 1:98.5 in AigleRN. In such a classification task with severely imbalanced samples, the ratio of positive to negative samples has a great impact on the training results. To clarify the problem, assume that the natural ratio (i.e., 1:65) is used in training. If the predicted results of the CNN are all negative, i.e., it predicts all pixels to be non-crack pixels, it still obtains a high accuracy of $65/66 \approx 98.5\%$. Thus, if nothing is done to deal with the imbalanced data, the network prefers predicting negative results in training, which leads to overestimation of the number of non-crack pixels in testing, as shown in Fig. 11.
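The arithmetic behind this argument can be checked in a few lines. The function name is hypothetical; the two ratios are the natural ones reported in Table I.

```python
def all_negative_accuracy(negatives_per_positive):
    """Accuracy of a classifier that labels every pixel as non-crack,
    given the number of negative samples per positive sample."""
    return negatives_per_positive / (1 + negatives_per_positive)


cfd = all_negative_accuracy(65)        # natural ratio 1:65 in CFD
aiglern = all_negative_accuracy(98.5)  # natural ratio 1:98.5 in AigleRN
```

Both values are above 98%, although such a classifier finds no crack at all (its recall is zero), which is why accuracy alone is a misleading training signal here.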

Typically, in a multi-label problem, one output unit represents one class. Specifically, in our method, an output unit represents a pixel at a particular position. For the output units, a significant characteristic is that if a crack pixel appears on one unit, crack pixels are likely to be found on other units. Based on this characteristic, we introduce the terms positive sample and negative sample that are used in binary classification, as we only care about the probabilities of crack pixels and non-crack pixels appearing. The particular position is less important. According to the patch extraction procedure, we define an input extracted from a crack pixel as a positive sample, and an input extracted from a non-crack pixel as a negative sample. Experiments are conducted to thoroughly investigate the influence of the ratio.

In the experiment, the total number of positive and negative samples is kept constant, and the ratio of positive to negative samples is modified. The ratio is defined as:

$$r = \frac{N_{neg}}{N_{pos}}$$

where $N_{pos}$ and $N_{neg}$ are the numbers of positive and negative samples. For example, if we keep the total number of samples at 120, $r = 1$ means 60 positive samples and 60 negative samples, while $r = 2$ means 40 positive samples and 80 negative samples. We set the total number of training samples to 360,000 for CFD and 60,000 for AigleRN. All training samples are generated randomly from the training dataset, and we test the model with different training conditions on the whole testing dataset. The result is shown in Fig. 10.
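The worked example in this paragraph corresponds to the small helper below, which splits a fixed sample budget for a positive-to-negative ratio of 1:r. The function name and the rounding of the positive count are assumptions made for illustration.

```python
def split_by_ratio(total, r):
    """Numbers of positive and negative samples for a positive:negative
    ratio of 1:r with a fixed total number of samples."""
    positives = round(total / (1 + r))
    return positives, total - positives
```

For a total of 120 samples this gives 60 and 60 for a ratio of 1:1, and 40 and 80 for a ratio of 1:2, as in the text. With the 360,000 CFD training samples and a ratio of 1:3 it gives 90,000 positives and 270,000 negatives.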

Fig. 10: Precision, recall and F1 score value variations with different ratios of positive to negative samples. Testing on CFD and AigleRN respectively.
Fig. 11: Results of the network with different ratios of positive to negative samples in training. Panel (a) shows the original image and panel (f) the result with the natural ratio. Natural means all samples are extracted randomly.

From Fig. 10, an upward trend of precision and a downward trend of recall can be clearly seen. The reason is that increasing the number of negative samples in training evidently increases the probability of negative predictions by the model, as discussed above. This can be observed more clearly in Fig. 11. With a higher proportion of negative samples in training, the model overestimates the number of non-crack pixels, which leads to higher precision and lower recall. The F1 score varies accordingly. Besides, precision and recall are computed according to manual labels, which differ from database to database. Since cracks in the ground truth of CFD are generally wider than those of AigleRN, the best ratios for the two databases are different. From Fig. 10, the acceptable range of the ratio is for CFD and for AigleRN.

Fig. 12: Examples of cross database testing. From left to right on the top (training/testing): original image, CFD/CFD, AigleRN/CFD, Hybrid/CFD. From left to right on the bottom: original image, AigleRN/AigleRN, CFD/AigleRN, Hybrid/AigleRN. All outputs are probability maps.

VI Generalization Study

From the results shown in Section III, we can see that the model works well on the CFD database and the AigleRN database, respectively. In this section, the generalization of the model is discussed. In practical applications, different data acquisition systems produce different types of data. Besides, pavement conditions differ from road to road. We conduct cross database testing on the two databases to validate the generalization of the model.

We conduct three experiments: training on CFD and testing on AigleRN, training on AigleRN and testing on CFD, and training and testing on a hybrid database. In the first two experiments, the training and testing databases generated previously are used. For the hybrid database, we take half of both training databases to generate a new training database and test on the other images. The examples of results are shown in Fig. 12 and the evaluations are shown in Table V.

Train / Test    AigleRN    CFD
TABLE V: Cross Database Testing

It is observed that in cross testing, cracks detected by the model trained on AigleRN are very thin, leading to low recall and high precision, and the opposite is true for the model trained on CFD. In addition, a hybrid data training can make a trade-off. Moreover, results from hybrid data training suggest that with more data from different pavement conditions, CNN can learn better to predict cracks for different roads with the same architecture.

VII Conclusion and Perspectives

In this paper, an effective method based on CNN is proposed for pavement crack detection. Compared with other state-of-the-art methods, the proposed method shows better performance in dealing with different pavement textures. After training, the model can predict cracks close to the manual labels. This shows that deep learning methods have potential for crack structure prediction. The network can learn from raw images without any preprocessing and output very satisfactory results.

With training and testing on different databases, we observe a good generalization of the model. With hybrid data training, the potential of a general model for different conditions of pavements may be brought out. Since CNN has a good learning ability for images, the network architecture can also be considered for other types of data, such as laser and 3D camera images, which are commonly obtained from systems of pavement distress detection.

Moreover, from cross testing we can see the influence of the manual labels on the detection results. As shown in the experiments, thinner crack labels lead to thinner crack outputs. This is basically because labelling is still limited by the judgements of human experts. Our main perspective is to develop semi-supervised or unsupervised learning methods to break through the limitations of human judgement.


  • [1] H. Zakeri, F. M. Nejad, and A. Fahimifar, “Image based techniques for crack detection, classification and quantification in asphalt pavement: a review,” Archives of Computational Methods in Engineering, vol. 24, no. 4, pp. 935–977, 2017.
  • [2] Q. Zou, Y. Cao, Q. Li, Q. Mao, and S. Wang, “Cracktree: Automatic crack detection from pavement images,” Pattern Recognition Letters, vol. 33, no. 3, pp. 227–238, 2012.
  • [3] D. Zhang, Q. Li, Y. Chen, M. Cao, L. He, and B. Zhang, “An efficient and reliable coarse-to-fine approach for asphalt pavement crack detection,” Image and Vision Computing, vol. 57, pp. 130–146, 2017.
  • [4] H. Oliveira and P. L. Correia, “Automatic road crack segmentation using entropy and image dynamic thresholding,” in Proc. Eur. Signal Process. Conf., 2009, pp. 622–626.
  • [5] N. Tanaka and K. Uematsu, “A crack detection method in road surface images using morphology.” in Proc. Workshop Mach. Vis. Appl., vol. 98, 1998, pp. 17–19.
  • [6] H. Zhao, G. Qin, and X. Wang, “Improvement of canny algorithm based on pavement edge detection,” in Proc. Int. Conf. Image, Signal process., vol. 2.   IEEE, 2010, pp. 964–967.
  • [7] S. Chambon and J.-M. Moliard, “Automatic road pavement assessment with image processing: Review and comparison,” International Journal of Geophysics, vol. 2011, 2011.
  • [8] J. Tang and Y. Gu, “Automatic crack detection and segmentation using a hybrid algorithm for road distress analysis,” in Proc. Int. Conf. Syst., Man, Cybern.   IEEE, 2013, pp. 3026–3030.
  • [9] Q. Li and X. Liu, “Novel approach to pavement image segmentation based on neighboring difference histogram method,” in Proc. Int. Conf. Image, Signal process., vol. 2.   IEEE, 2008, pp. 792–796.
  • [10] Y. Hu and C.-x. Zhao, “A local binary pattern based methods for pavement crack detection,” Journal of Pattern Recognition Research, vol. 1, no. 20103, pp. 140–147, 2010.
  • [11] P. Subirats, J. Dumoulin, V. Legeay, and D. Barba, “Automation of pavement surface crack detection using the continuous wavelet transform,” in Proc. Int. Conf. Image Process.   IEEE, 2006, pp. 3037–3040.
  • [12] V. Kaul, A. Yezzi, and Y. Tsai, “Detecting curves with unknown endpoints and arbitrary topology using minimal paths,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 10, pp. 1952–1965, 2012.
  • [13] T. S. Nguyen, S. Begot, F. Duculty, and M. Avila, “Free-form anisotropy: A new method for crack detection on pavement surface images,” in Proc. Int. Conf. Image Process.   IEEE, 2011, pp. 1069–1072.
  • [14] R. Amhaz, S. Chambon, J. Idier, and V. Baltazart, “Automatic crack detection on two-dimensional pavement images: An algorithm based on minimal path selection,” IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 10, pp. 2718–2729, 2016.
  • [15] P. Delagnes and D. Barba, “A markov random field for rectilinear structure extraction in pavement distress image analysis,” in Proc. Int. Conf. Image Process., vol. 1.   IEEE, 1995, pp. 446–449.
  • [16] H. Oliveira and P. L. Correia, “Automatic road crack detection and characterization,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 1, pp. 155–168, 2013.
  • [17] A. Cord and S. Chambon, “Automatic road defect detection by textural pattern recognition based on adaboost,” Computer-Aided Civil and Infrastructure Engineering, vol. 27, no. 4, pp. 244–259, 2012.
  • [18] Y. Shi, L. Cui, Z. Qi, F. Meng, and Z. Chen, “Automatic road crack detection using random structured forests,” IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 12, pp. 3434–3445, 2016.
  • [19] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
  • [20] Y.-J. Cha, W. Choi, and O. Büyüköztürk, “Deep learning-based crack damage detection using convolutional neural networks,” Computer-Aided Civil and Infrastructure Engineering, vol. 32, no. 5, pp. 361–378, 2017.
  • [21] A. Zhang, K. C. Wang, B. Li, E. Yang, X. Dai, Y. Peng, Y. Fei, Y. Liu, J. Q. Li, and C. Chen, “Automated pixel-level pavement crack detection on 3D asphalt surfaces using a deep-learning network,” Computer-Aided Civil and Infrastructure Engineering, vol. 32, no. 10, pp. 805–819, 2017.
  • [22] K. Gopalakrishnan, S. K. Khaitan, A. Choudhary, and A. Agrawal, “Deep convolutional neural networks with transfer learning for computer vision-based data-driven pavement distress detection,” Construction and Building Materials, vol. 157, pp. 322–330, 2017.
  • [23] L. Zhang, F. Yang, Y. D. Zhang, and Y. J. Zhu, “Road crack detection using deep convolutional neural network,” in Proc. Int. Conf. Image Process.   IEEE, 2016, pp. 3708–3712.
  • [24] P. Liskowski and K. Krawiec, “Segmenting retinal blood vessels with deep neural networks.” IEEE transactions on medical imaging, vol. 35, no. 11, pp. 2369–2380, 2016.
  • [25] J. Canny, “A computational approach to edge detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 6, pp. 679–698, 1986.
  • [26] W. Niblack, An introduction to digital image processing.   Strandberg Publishing Company, 1985.
  • [27] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
  • [28] V. Nair and G. E. Hinton, “Rectified linear units improve restricted boltzmann machines,” in Proc. Int. Conf. Mach. Learn., 2010, pp. 807–814.
  • [29] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting.” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
  • [30] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proc. Int. Conf. Art. Intell., Stat., 2010, pp. 249–256.
  • [31] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.