Pavement diseases pose a great threat to the driving safety of vehicles: roads age over time due to wear and tear, which reduces the stability of the road surface and forms defects of various shapes. Detecting pavement disease is one of the most important steps in maintaining the stability of roads. The traditional pavement disease detection scheme is mainly manual, which requires a large number of professionals and extensive domain knowledge. Moreover, professional testing requires expensive professional sensors [2]. As the total mileage and the usage frequency of roads increase, it is almost impossible to accomplish such a detection task manually. Thanks to the rapid progress in artificial intelligence, recent computer vision techniques provide an elegant and effective way of detecting pavement diseases automatically.
There are quite a lot of impressive works addressing different pavement disease analysis issues from the perspective of computer vision. Conventionally, they are mainly based on low-level image analysis, hand-crafted features and classical classifiers [22, 10, 13, 8, 17]. For example, Shi et al. [18] presented a random structured forest named CrackForest, combined with integral channel features, for automatic road crack detection. In [17], a filter bank consisting of multiple oriented Gabor filters is proposed to detect road cracks. Pan et al
Inspired by the recent remarkable successes of deep learning in a wide range of applications, more and more researchers are applying advanced deep learning approaches to tackle these tasks [5, 4, 6]. Zhang et al. [25] segmented pavement cracks by detecting crack points with convolutional neural networks (CNN). In [6], an ImageNet pre-trained VGG-16 DCNN is applied to categorize pavement images into "crack" or "non-crack". The works in [12, 23, 1] utilize YOLO v2, Faster R-CNN, and RetinaNet, respectively, to localize pavement diseases. Fan et al. [4] produced a novel automatic road crack detection system. In this system, a CNN is used for determining whether the pavement image contains cracks or not, and then an adaptive thresholding method is presented for segmenting the cracks based on the image smoothed by bilateral filters.
In summary, the tasks of the aforementioned works can be grouped into three categories: pavement crack segmentation [18, 17, 25, 24, 9], pavement crack localization [12, 23, 1] and specific pavement distress detection [14, 4, 6]. The task we concentrate on is related to these tasks but also quite different from them. We intend to judge whether there are diseases or not based on the pavement image. The pavement diseases we want to detect are not limited to cracks and potholes, but also include other general distresses, such as repairs and crack pouring. We name this task automatic pavement disease detection, which can be deemed a generalization of pavement crack detection. This task is also an important pre-step of pavement crack segmentation and the core step of pavement crack localization. Although this task can be considered a typical binary classification problem on pavement images, it is very challenging, since pavement imaging suffers from uneven illumination, chromatic aberration, road markings in the background, and the high diversity in the appearance of various diseases, such as cracks, potholes, erosive pits, and their mixtures, as shown in Figure 1.
In this paper, we intend to address this issue using deep learning. Classic CNN-based approaches such as ResNet [7] and GoogLeNet [20] often need to resize the image to a fixed low resolution and accomplish the classification based on the entire image. However, such resizing loses a lot of image information, particularly for high-resolution images. For example, the input of ResNet is fixed to $224\times224$ while the resolution of our pavement images is $1200\times900$. After resizing, the input image loses about 95% of its pixels. Moreover, the diseased area is often just a very small fraction of the entire pavement image. Consequently, such global approaches may be more easily disturbed by noise and background variations. Therefore, we propose a novel local-based deep learning framework named Iteratively Optimized Patch Label Inference Networks (IOPLIN) for addressing the automatic pavement disease detection issue.
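The pixel loss quoted above can be checked with a few lines of arithmetic. The sketch below assumes a $224\times224$ network input and a $1200\times900$ source image; the function name is illustrative only.

```python
def retained_pixel_fraction(src_w, src_h, dst_w, dst_h):
    """Fraction of source pixels that survive a resize to dst_w x dst_h."""
    return (dst_w * dst_h) / (src_w * src_h)


# Assumed sizes: 1200x900 pavement image, 224x224 network input.
kept = retained_pixel_fraction(1200, 900, 224, 224)
print(f"kept {kept:.1%}, lost {1 - kept:.1%}")  # kept 4.6%, lost 95.4%
```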
In IOPLIN, the pavement image is divided into dozens of patches, and an EfficientNet [21] is employed as a Patch Label Inference Network (PLIN) for inferring the labels of patches. Finally, the detection result of a pavement image is obtained by maximum pooling of its inferred patch labels. The main obstacle of this methodology is that only image-level labels are available. To address this issue, we propose the Expectation-Maximization Inspired Patch Label Distillation (EMIPLD) strategy for iteratively and gradually optimizing PLIN based only on the image labels. Different from the conventional CNN-based pavement disease detection regime, IOPLIN can not only offer good detection results at the image level, but also roughly localize the disease in the pavement image via EMIPLD in a weakly supervised manner. To evaluate the effectiveness of our work, we introduce a novel large-scale Bituminous Pavement Disease Detection database named CQU-BPDD, which consists of 60059 high-resolution pavement images covering seven different diseases and the normal class. These images are automatically captured by in-vehicle cameras from different areas in southern China. The extensive experimental results on this dataset validate the effectiveness and superiority of IOPLIN in comparison with state-of-the-art CNN algorithms.
The main contributions are summarized as follows:
To the best of our knowledge, we are the first to formally define the automatic pavement disease detection task, which is not limited to specific diseases such as cracks and potholes.
We release a novel large-scale automatic pavement disease detection dataset which involves various diseases and is acquired from different environments.
We present a novel deep learning-based automatic pavement disease detection approach named Iteratively Optimized Patch Label Inference Networks (IOPLIN). It can not only sufficiently utilize the information of an image of any resolution for detecting pavement disease, but also roughly localize the distress position based only on the image label.
We systematically and empirically compare the performances of the recent state-of-the-art CNN approaches in automatic pavement disease detection, and validate the superiority of our work over them.
II-A Problem Formulation and Overview
Let $x_i$ be a pavement image associated with a binary label $y_i \in \{0, 1\}$ indicating whether there exist diseases or not. Automatic pavement disease detection is essentially a binary image classification task that aims to derive a detector which classifies a pavement image $x_i$ as "diseased" ($y_i = 1$) or "normal" ($y_i = 0$).
To tackle the automatic pavement disease detection task, we present a novel deep learning approach named Iteratively Optimized Patch Label Inference Networks (IOPLIN). In IOPLIN, the pavement image is first preprocessed by Contrast Limited Adaptive Histogram Equalization (CLAHE) [16] to suppress the negative effect of uneven illumination. The processed image is then divided into patches and a Patch Label Inference Network (PLIN) is trained for inferring the patch labels. Finally, the pavement image label is obtained by maximum pooling of its patch labels. The core of our approach is the PLIN. However, the PLIN cannot be trained well directly, since only the image label is available while the patch labels of each image are unavailable in the training phase. To overcome this difficulty, we present the Expectation-Maximization Inspired Patch Label Distillation (EMIPLD) strategy for iteratively optimizing the training of PLIN from reasonable initializations of the patch labels. In the next subsections, we go into the details of our method.
II-B Histogram Equalization and Patches Collection
Since the pavement images are captured at different times and from different areas, they suffer from seriously uneven illumination. To suppress the negative impact of illumination, the pavement image is processed by Contrast Limited Adaptive Histogram Equalization (CLAHE) [16]. The empirical analysis also implies that such preprocessing indeed improves the detection performance.
The traditional Convolutional Neural Networks (CNN), such as VGGNet [19], GoogLeNet [20], and ResNet [7], often require an input image of around $300\times300$, while the size of the pavement images in our dataset is $1200\times900$. Instead of resizing the high-resolution image to a low-resolution one and directly feeding it into the CNN to yield the final detection result, our approach partitions the image into patches and performs the detection by inferring the patch labels with a CNN. In this manner, the image information can be fully exploited, and by-products such as patch labels or patch-based disease confidences are produced, which may offer a good explanation of the results or benefit other follow-up tasks.
In our case, we simply follow a non-overlapping image blocking strategy and fix the patch size to $300\times300$, since the input size of our backbone network (EfficientNet-B3) is $300\times300$ and our $1200\times900$ pavement images can be evenly divided in this manner. For pavement images of other resolutions, the blocking strategy and the patch size can be designed empirically according to the backbone network and the image size, such that all pixels of the pavement image are exploited evenly.
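We assume each image is divided into $m$ patches of size $300\times300$. This step can be mathematically denoted as follows:
$$\{p_{i1}, p_{i2}, \ldots, p_{im}\} \leftarrow \tilde{x}_i = \mathcal{H}(x_i), \tag{1}$$
where $\tilde{x}_i$ is the $i$-th pre-processed image, $\mathcal{H}(\cdot)$ is the CLAHE operation and $p_{ij}$ represents the $j$-th patch of the $i$-th image. $m$ is the number of patches and is equal to 12 in our implementation. We also assume there are $n$ images for training. Thus, the total number of patches for training is $n \times m$.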
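The non-overlapping partition described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes the image is a 2-D list of grayscale values on which CLAHE has already been applied by an image library, and the function name `split_into_patches` is hypothetical.

```python
def split_into_patches(image, patch_h=300, patch_w=300):
    """Split a 2-D image (list of rows) into non-overlapping patches.

    Patches are returned in row-major order. The image height and width
    must be exact multiples of the patch size.
    """
    height, width = len(image), len(image[0])
    if height % patch_h or width % patch_w:
        raise ValueError("image size must be a multiple of the patch size")
    patches = []
    for top in range(0, height, patch_h):
        for left in range(0, width, patch_w):
            patch = [row[left:left + patch_w] for row in image[top:top + patch_h]]
            patches.append(patch)
    return patches


# A dummy image with 900 rows and 1200 columns (all zeros).
image = [[0] * 1200 for _ in range(900)]
patches = split_into_patches(image)
print(len(patches))  # 12
```

With a $1200\times900$ image and $300\times300$ patches this yields a $3\times4$ grid, which matches the 12 patches per image used in the text.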
II-C Patch Label Inference Network
There are many classical CNN models that have proved effective in image classification. We empirically evaluate several CNN models with similar numbers of parameters and eventually choose the recent EfficientNet-B3 as our backbone network for inferring the labels of patches. This network is pre-trained on the ImageNet dataset and its output layer is replaced with a two-node output layer. For the details of EfficientNet-B3, please refer to [21]. We name this network the Patch Label Inference Network (PLIN), and the patch label inference for the $j$-th patch $p_{ij}$ of the $i$-th image is denoted as follows,
$$\hat{y}_{ij} = f(p_{ij}; \theta), \tag{2}$$
where $f(\cdot)$ is the mapping function of PLIN and $\theta$ denotes its associated network parameters. $\hat{y}_{ij}$ is the predicted value of the true patch label $y_{ij}$, which is equal to 1 when disease exists in the patch and 0 otherwise.
II-D EM Inspired Patch Label Distillation
Unfortunately, only the image label is available while the ground truth of the patch labels is unavailable, which impedes the normal training of PLIN. In this section, we introduce an iterative PLIN training strategy named Expectation-Maximization Inspired Patch Label Distillation (EMIPLD). The basic idea of EMIPLD is to give a reasonable initialization of the patch labels for training a PLIN, and then to retrain the PLIN based on the new labels inferred by its previous version. These steps are executed iteratively until convergence. If we regard the training step as the M step and the label inference step as the E step, this iteration scheme is very similar to the Expectation-Maximization (EM) algorithm, and the patch labels are progressively refined during the iterations, just as the name suggests. This idea works because the labels of the patches from normal pavement images are always normal, and these credibly labeled data drive the continual optimization of PLIN and the progressive distillation of the patches from diseased pavement images.
II-D1 Initialization of Patch Labels
We take the image label $y_i$ as the initial label of each of its patches, $y_{ij}^{(0)} = y_i$. In this case, the labels of the patches from normal pavement images are credible while the ones from diseased pavement images are suspicious, since the diseased areas may not cover the whole image.
II-D2 The Maximization (M) Step
We train PLIN with all training patches and their current labels $y_{ij}^{(t-1)}$ to obtain the network parameters $\theta^{(t)}$ of PLIN in the $t$-th iteration.
II-D3 The Expectation (E) Step
The E step leverages the trained PLIN to infer the labels of patches. According to Equation 2, each patch obtains a label prediction value, referred to as the confidence score and denoted $\hat{y}_{ij}^{(t)}$ in the $t$-th iteration. We present the Image-based Rank Aware Threshold (IRAT) scheme for adaptively updating the label of each patch based on the confidence scores. However, we only update the labels of patches from diseased pavement images, since the labels of patches from normal images should always be 0 ("normal"). IRAT is the core of the E step.
Image-based Rank Aware Threshold (IRAT): A patch from a diseased image is labelled as a diseased patch by IRAT if it meets either of the following two conditions:
(a) Its confidence score is above $\gamma^{(t-1)}$, the ratio of the number of diseased patches to the total number of patches in the previous iteration; this ratio is recalculated automatically in each iteration, starting from an initial value;
(b) Its confidence score belongs to the top $k$ percent of the scores in its image.
This label updating strategy can be mathematically denoted as follows:
$$y_{ij}^{(t)} = \begin{cases} 1, & \text{if } y_i = 1 \text{ and } \left(\hat{y}_{ij}^{(t)} > \gamma^{(t-1)} \ \text{or} \ \hat{y}_{ij}^{(t)} \geq T_i(k)\right), \\ 0, & \text{otherwise}, \end{cases}$$
where $T_i(k)$ returns the minimum threshold in the top $k$ percentage specific to the $i$-th image. In our implementation, the value of $k$ is empirically learned on a small validation set.
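Conditions (a) and (b) can be illustrated with a short sketch for a single diseased image. This is an assumption-laden illustration rather than the authors' code: the names `irat_update` and `diseased_ratio` are hypothetical, the example value of `top_percent` is arbitrary, and rounding the top-percentage count up to at least one patch is a choice made here.

```python
import math


def irat_update(scores, prev_ratio, top_percent):
    """Relabel the patches of ONE diseased image.

    scores      -- confidence scores of the image's patches, each in [0, 1]
    prev_ratio  -- fraction of patches labelled diseased in the previous iteration
    top_percent -- percentage in (0, 100] of top-scoring patches that are always kept
    Returns a list of 0/1 labels, one per patch.
    """
    # Minimum score among the top `top_percent` percent of this image's patches
    # (assumption: round up, and always keep at least one patch).
    k = max(1, math.ceil(len(scores) * top_percent / 100))
    rank_threshold = sorted(scores, reverse=True)[k - 1]
    # A patch is diseased if it meets condition (a) OR condition (b).
    return [1 if (s > prev_ratio or s >= rank_threshold) else 0 for s in scores]


def diseased_ratio(all_labels):
    """Fraction of patches currently labelled diseased, over all images."""
    flat = [label for labels in all_labels for label in labels]
    return sum(flat) / len(flat)


scores = [0.95, 0.10, 0.40, 0.70]
print(irat_update(scores, prev_ratio=0.5, top_percent=25))  # [1, 0, 0, 1]
```

Condition (b) guarantees that every diseased image keeps at least its highest-scoring patch labelled as diseased, even when all of its scores fall below the global ratio.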
II-D4 Prior Knowledge Biased Cross-Entropy
We believe that the labels of diseased patches with higher confidence scores from PLIN in the previous iteration are more reliable than those with lower scores, and that a good PLIN should also suppress normal patches with high confidence scores. Therefore, we treat the confidence scores and the distribution of the patch labels from the previous iteration as prior knowledge, and incorporate them into a weighting scheme for cross-entropy. We introduce this novel cross-entropy loss, named Prior Knowledge Biased Cross-Entropy (PKBCE), to the PLIN.
The weight of each patch in this loss is normalized, and a higher weight implies that the corresponding patch receives more attention in the next round of training.
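To make the idea of a confidence-weighted cross-entropy concrete, the sketch below implements a generic weighted binary cross-entropy. It is not the PKBCE formula itself: using the previous confidence score directly as the raw weight and normalizing the weights to mean 1 are assumptions made purely for illustration, and both function names are hypothetical.

```python
import math


def normalized_weights(prev_scores):
    """Scale raw weights (here: previous confidence scores) to have mean 1.

    Assumes at least one score is strictly positive.
    """
    total = sum(prev_scores)
    return [s * len(prev_scores) / total for s in prev_scores]


def weighted_bce(probs, labels, weights, eps=1e-7):
    """Weighted binary cross-entropy averaged over patches."""
    loss = 0.0
    for p, y, w in zip(probs, labels, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        loss -= w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return loss / len(probs)


prev_scores = [0.9, 0.6, 0.3, 0.2]  # scores from the previous iteration
labels = [1, 1, 0, 0]               # current patch labels
probs = [0.8, 0.7, 0.4, 0.1]        # current PLIN predictions
weights = normalized_weights(prev_scores)
print(round(weighted_bce(probs, labels, weights), 4))
```

With all weights equal to 1 the function reduces to the ordinary binary cross-entropy, so the weighting only redistributes attention among patches.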
II-E Pavement Disease Detection
After the optimization of PLIN has converged, the trained PLIN model is used to label the patches of test images. The detection label of a test image is then obtained by maximum pooling of its patch labels, $\hat{y}_i = \max_{j} \hat{y}_{ij}$. With this strategy, the final detection label inference does not depend on the number of patches in an image. In other words, our model can handle images of any resolution.
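The max-pooling decision can be written in a couple of lines. In this sketch the function name and the 0.5 decision threshold are assumptions; the only part taken from the text is that the image-level score is the maximum over the patch scores.

```python
def detect_image(patch_scores, threshold=0.5):
    """Image-level decision by max pooling over patch confidence scores.

    Returns (image_score, image_label); label 1 means "diseased".
    """
    image_score = max(patch_scores)
    return image_score, int(image_score >= threshold)


print(detect_image([0.05, 0.10, 0.92, 0.20]))  # (0.92, 1)
print(detect_image([0.05, 0.10]))              # (0.1, 0)
```

Because `max` works on a list of any length, the same function applies whether an image yields 12 patches or some other number.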
Algorithm 1 presents the specific steps of our approach.
To speed up the convergence, the PLIN is also fine-tuned with thumbnails of the training pavement images before the iterative optimization. Our empirical study shows that this trick is quite effective and can even further improve the performance of IOPLIN. The details are discussed in the experimental part.
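The overall training procedure of this section can be sketched as a schematic loop. This is a structural illustration under stated assumptions, not Algorithm 1 itself: the training, inference and relabelling routines are passed in as callables, the stopping rule (labels no longer change, or a maximum number of iterations) is an assumption, and the toy functions at the bottom merely stand in for a real network.

```python
def emipld(images, image_labels, train_fn, infer_fn, relabel_fn, num_iters=5):
    """Iterative patch-label distillation loop (schematic).

    images       -- list of images, each a list of patches
    image_labels -- list of 0/1 image-level labels
    train_fn(patches, labels) -> model            (M step)
    infer_fn(model, patch) -> score in [0, 1]     (E step, inference)
    relabel_fn(scores, prev_ratio) -> 0/1 labels  (E step, e.g. an IRAT-style rule)
    """
    # Initialization: every patch inherits its image label.
    patch_labels = [[y] * len(img) for img, y in zip(images, image_labels)]
    model = None
    for _ in range(num_iters):
        flat_patches = [p for img in images for p in img]
        flat_labels = [l for labels in patch_labels for l in labels]
        prev_ratio = sum(flat_labels) / len(flat_labels)
        model = train_fn(flat_patches, flat_labels)  # M step
        new_labels = []
        for img, y, old in zip(images, image_labels, patch_labels):
            if y == 0:
                new_labels.append(old)  # patches of normal images stay normal
            else:
                scores = [infer_fn(model, p) for p in img]
                new_labels.append(relabel_fn(scores, prev_ratio))
        if new_labels == patch_labels:  # labels stopped changing
            break
        patch_labels = new_labels
    return model, patch_labels


# Toy stand-ins: a "patch" is one number, the "model" is a threshold.
def toy_train(patches, labels):
    pos = [p for p, l in zip(patches, labels) if l == 1]
    neg = [p for p, l in zip(patches, labels) if l == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def toy_infer(threshold, patch):
    return 1.0 if patch > threshold else 0.0


def toy_relabel(scores, prev_ratio):
    return [1 if s > prev_ratio else 0 for s in scores]


images = [[0.1, 0.2, 0.9, 0.1], [0.1, 0.2, 0.15, 0.1]]
model, labels = emipld(images, [1, 0], toy_train, toy_infer, toy_relabel)
print(labels)  # [[0, 0, 1, 0], [0, 0, 0, 0]]
```

In the toy run, the diseased image starts with all four patches labelled diseased and ends with only the one outlying patch labelled diseased, which is the distillation effect described in the text.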
II-F The Merits of IOPLIN
In contrast to other deep learning models, IOPLIN enjoys many merits:
IOPLIN is essentially a flexible local-based deep learning framework. Any CNN model can be plugged into IOPLIN as the backbone network.
IOPLIN can handle images of any resolution and sufficiently exploit the image information. If the image size is not larger than the patch size ($300\times300$ in our implementation), IOPLIN degenerates to a regular EfficientNet model.
IOPLIN pays more attention to local visual features and can roughly localize the diseased areas without using any patch-level supervision.
IOPLIN significantly outperforms state-of-the-art CNN models, particularly in the high recall case.
III Experiments and Results
III-A Dataset and Setup
Dataset: We release a novel large-scale Bituminous Pavement Disease Detection dataset named CQU-BPDD for evaluation. CQU-BPDD consists of 60059 bituminous pavement images with a resolution of $1200\times900$, which were automatically captured by in-vehicle cameras at different times from different areas in southern China (the database website: https://huangsheng-cqu.github.io/). CQU-BPDD involves seven different distresses, namely transverse crack, massive crack, alligator crack, crack pouring, longitudinal crack, ravelling, and repair, as well as the normal class. The data distribution of CQU-BPDD is shown in Figure 3.
Data Split Protocol: We randomly select 5140 diseased pavement images covering all diseases and 5000 normal pavement images to produce the training set, while the rest of the dataset is used as the testing set. In the testing set, there are 11589 diseased pavement images and 38330 normal images.
HOG [3], LBP [8], Fisher Vector (FV) [15, 11], VGG-19 [19], ResNet-50 [7], Inception-v3 [20], and EfficientNet-B3 [21] are used for comparison. HOG and LBP are local hand-crafted representation methods, while FV is a shallow learning-based representation method. The last four are state-of-the-art deep learning approaches which have similar numbers of parameters and have been successfully applied to numerous image classification tasks. As automatic pavement disease detection is a typical binary classification problem, we adopt the Area Under Curve (AUC) of the Receiver Operating Characteristic (ROC) curve as the comprehensive performance metric. All the hyper-parameters involved in the compared methods are well tuned.
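For readers unfamiliar with the metric, AUC equals the probability that a randomly chosen diseased image receives a higher score than a randomly chosen normal one. The sketch below computes it directly from that definition; the function name is illustrative, and the quadratic pairwise loop is chosen for clarity rather than speed (a sorting-based computation would be preferable on a test set of this size).

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```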
III-B Pavement Disease Detection
Table I reports the detection performance of the compared methods, and Figure 5 shows their P-R curves. From these results, it is clear that our work consistently outperforms the compared methods by a significant margin on the different evaluation metrics. EfficientNet-B3 achieves the best performance among the seven compared methods, and it is also adopted as the backbone of our Patch Label Inference Network (PLIN). Even so, our work achieves a 2% gain in AUC over EfficientNet-B3, and its precision gains over EfficientNet-B3 are 12.8% and 15.9% when the recall is fixed to 90% and 95%, respectively. The hand-crafted feature and shallow learning-based methods clearly perform much worse than the deep learning ones; they cannot even reach 90% AUC.
In addition, the observations in Figure 5 reveal a very interesting property of IOPLIN: the precision gain of our work over EfficientNet-B3 increases with the recall. This is a very desirable property for automatic pavement disease detection, since people always pay more attention to the diseased images than to the normal ones. Missing a diseased image may cause serious safety risks, while misjudging a normal one incurs almost no cost in real life. Therefore, a good pavement disease detection approach should perform well at high recall. All results imply that our work meets this requirement better.
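Precision at a fixed recall, the quantity compared above, can be read off a ranked list of scores. The sketch below is one straightforward way to compute it and is not taken from the authors' evaluation code; the function name and the convention of reporting the best precision among all thresholds that reach the target recall are assumptions.

```python
def precision_at_recall(labels, scores, target_recall):
    """Highest precision among thresholds whose recall is >= target_recall."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive sample")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    best = 0.0
    tp = 0
    for i, (score, y) in enumerate(ranked, start=1):
        tp += y
        # Only evaluate at the last sample of a group of tied scores.
        if i < len(ranked) and ranked[i][0] == score:
            continue
        if tp / total_pos >= target_recall:
            best = max(best, tp / i)
    return best


labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
print(precision_at_recall(labels, scores, 0.3))  # 1.0
print(precision_at_recall(labels, scores, 1.0))  # 0.75
```

The toy output shows the usual trade-off: demanding that every diseased image be found lowers the achievable precision, which is why gains at 90% and 95% recall are the ones that matter in this application.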
III-C Ablation Study
Table II shows the ablation analysis results, where HE, IRAT, FT and PKBCE respectively represent histogram equalization, the image-based rank aware threshold, fine-tuning with thumbnails of the pavement images, and the prior knowledge biased cross-entropy. The comparison of the first two rows implies that the HE step slightly improves the Pavement Disease Detection (PDD) performance. The backbone network of PLIN is EfficientNet-B3. However, IRAT+HE+PLIN performs slightly worse than HE+EfficientNet-B3. We attribute this to the different training schemes of EfficientNet-B3 in these two approaches. The former is iteratively trained for patch label inference without any patch label ground truth, while the latter is adequately trained with reliable ground truths for image label inference. With fine-tuning on the thumbnails, IOPLIN obtains a 0.7% AUC gain. This indicates that a good initialization of PLIN is helpful for model optimization. Among all the tricks in IOPLIN, PKBCE contributes the most, improving the AUC of IOPLIN by 1%. We also plot the relationship between the iteration number and the detection performance in Figure 6. It reveals another benefit of fine-tuning: it speeds up the convergence of the model optimization.
III-D Visualization of Inferred Patch Labels
Different from the conventional detection regime, IOPLIN accomplishes the detection by judging whether there exist any diseased patches in the image. With this strategy, the labels of patches in an image can be roughly inferred, and these labels are important by-products that help explain the results and may even benefit follow-up tasks. We visualize the inferred labels (confidence scores) of patches from two testing images in Figure 7. The observations show that the inferred patch labels can roughly localize the diseased areas in an image.
In this paper, we proposed a novel deep learning framework named Iteratively Optimized Patch Label Inference Network (IOPLIN) for automatic pavement disease detection. IOPLIN iteratively trains the Patch Label Inference Network (PLIN) only with the image labels by applying the EM Inspired Patch Label Distillation strategy. Then it infers the patch labels for a testing pavement image and accomplishes the detection task by maximum pooling of its patch labels. A novel large-scale Bituminous Pavement Disease Detection dataset named CQU-BPDD was constructed for evaluating the effectiveness of our work. The experimental results demonstrate the superiority of our method in comparison with some state-of-the-art CNN approaches and also show that IOPLIN can roughly localize the diseased areas without any location prior information.
The work described in this paper was partially supported by National Natural Science Foundation of China (No. 61602068), Fundamental Research Funds for the Central Universities (No. 106112015CDJRC091101) and the Science and Technology Research Program of Chongqing Municipal Education Commission of China under Grant No. KJQN201800705 and KJQN201900726.
[1] (2018) Road damage detection using RetinaNet. In IEEE International Conference on Big Data (Big Data), pp. 5197–5200.
[2] (2014) FDTD simulation of the GPR signal for effective inspection of pavement damages. In International Conference on Ground Penetrating Radar.
[3] Histograms of oriented gradients for human detection. In IEEE conference on computer vision and pattern recognition, Vol. 1, pp. 886–893.
[4] (2019) Road crack detection using deep convolutional neural network and adaptive thresholding. arXiv preprint arXiv:1904.08582.
[5] (2018) Automatic pavement crack detection based on structured prediction with the convolutional neural network. arXiv preprint arXiv:1802.02208.
[6] Deep convolutional neural networks with transfer learning for computer vision-based data-driven pavement distress detection. Construction and Building Materials 157, pp. 322–330.
[7] (2016) Deep residual learning for image recognition. In IEEE conference on computer vision and pattern recognition, pp. 770–778.
[8] (2010) A novel LBP based methods for pavement crack detection. Journal of Pattern Recognition Research 5 (1), pp. 140–147.
[9] (2019) Automatic pavement crack detection by multi-scale image fusion. IEEE Transactions on Intelligent Transportation Systems 20 (6), pp. 2025–2036.
[10] (2008) Novel approach to pavement image segmentation based on neighboring difference histogram method. In IEEE Congress on Image and Signal Processing, Vol. 2, pp. 792–796.
[11] (2012) Local descriptors encoded by Fisher vectors for person re-identification. In IEEE international conference on computer vision, pp. 413–422.
[12] (2018) Automated road crack detection using deep convolutional neural networks. In IEEE International Conference on Big Data (Big Data), pp. 5212–5215.
[13] Multiple lane detection algorithm based on novel dense vanishing point estimation. IEEE Transactions on Intelligent Transportation Systems 18 (3), pp. 621–632.
[14] (2017) Object-based and supervised detection of potholes and cracks from the pavement images acquired by UAV. International Archives of the Photogrammetry, Remote Sensing & Spatial Information Sciences 42.
[15] (2010) Improving the Fisher kernel for large-scale image classification. In European conference on computer vision, Vol. 6314, pp. 143–156.
[16] (1990) Contrast-limited adaptive histogram equalization: speed and effectiveness. In IEEE Conference on Visualization in Biomedical Computing, pp. 337–345.
[17] (2013) Pavement crack detection using the Gabor filter. In IEEE international conference on intelligent transportation systems, pp. 2039–2044.
[18] (2016) Automatic road crack detection using random structured forests. IEEE Transactions on Intelligent Transportation Systems 17 (12), pp. 3434–3445.
[19] (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
[20] (2016) Rethinking the inception architecture for computer vision. In IEEE conference on computer vision and pattern recognition, pp. 2818–2826.
[21] (2019) EfficientNet: rethinking model scaling for convolutional neural networks. arXiv preprint arXiv:1905.11946.
[22] (1998) A crack detection method in road surface images using morphology. In IAPR Workshop on Machine Vision Application, pp. 154–157.
[23] (2018) Deep proposal and detection networks for road damage detection and classification. In IEEE International Conference on Big Data (Big Data), pp. 5224–5227.
[24] (2019) Feature pyramid and hierarchical boosting network for pavement crack detection. IEEE Transactions on Intelligent Transportation Systems, pp. 1–11.
[25] (2016) Road crack detection using deep convolutional neural network. In IEEE international conference on image processing (ICIP), pp. 3708–3712.