Unsupervised Pixel-level Road Defect Detection via Adversarial Image-to-Frequency Transform

01/30/2020 ∙ by Jongmin Yu, et al. ∙ RMIT University

In the past few years, the performance of road defect detection has improved remarkably thanks to advances in computer vision and deep learning. Although large-scale, well-annotated datasets improve the detection of road pavement defects to some extent, it is still challenging to derive a model that performs reliably under the various road conditions encountered in practice, because it is intractable to construct a dataset covering diverse road conditions and defect patterns. To this end, we propose an unsupervised approach to detecting road defects, using Adversarial Image-to-Frequency Transform (AIFT). AIFT adopts an unsupervised manner and adversarial learning in deriving the defect detection model, so AIFT does not need annotations for road pavement defects. We evaluate the effectiveness of AIFT using the GAPs384, Cracktree200, CRACK500, and CFD datasets. The experimental results demonstrate that the proposed approach detects various road defects and outperforms existing state-of-the-art approaches.




I Introduction

Road defect detection is an important research topic for preventing vehicle accidents and managing road conditions effectively. Across the United States, road conditions contribute to the frequency and severity of motor vehicle accidents. Almost a third of all motor vehicle crashes are related to poor road conditions, resulting in more than two million injuries and 22,000 fatalities [zaloshnja2009cost]. As road infrastructure ages, its condition steadily declines, and the volume and severity of defects increase [carr2018road]. Therefore, the need for methods that detect road defects continues to grow [hadavandsiri2019concrete], and numerous studies have been proposed in the literature.

Over the past decades, diverse studies have considered image processing and machine learning approaches with hand-crafted features [acosta1992low, bray2006neural, chambon2010road, deutschl2004defect]. Statistical analysis [acosta1992low, chambon2010road] is the oldest and also the most popular approach. Acosta et al. [acosta1992low] and Deutschl et al. [deutschl2004defect] have proposed vision-based methods based on partial differential techniques. Chambon et al. [chambon2010road] have presented a method based on Markovian modelling that takes into account the local geometric constraints of road cracks. Bray et al. [bray2006neural] have used a neural-network classification approach to identify road defects. These approaches usually identify road pavement defects from the contrast of texture information on the road surface.

However, the contrast between roads and the defects on them may be reduced by illumination conditions and changes in weather [sun2009automated]. Additionally, the specifications of the cameras used to capture the road surface can also affect detection accuracy. Hence, it is still challenging to develop a defect detection method that covers the various road conditions of the real world using simple image processing or machine learning methods alone [baygin2015new].

Recently, various approaches [pauly2017deeper, fan2019road] based on deep learning have been proposed to overcome these drawbacks. Pauly et al. [pauly2017deeper] have proposed a method for road defect detection employing convolutional neural networks (CNNs). Fan et al. [fan2019road] have proposed a segmentation method based on CNNs and apply adaptive thresholding. These approaches need a well-annotated dataset of road pavement defects, and their performance may also depend on the scale of the given dataset. Regrettably, constructing such a dataset containing various patterns of road pavement defects is problematic in practice.

Developing an unsupervised method that does not need annotations of road pavement defects in the training step has long been recognized as an important issue in this literature. Various unsupervised approaches based on image processing and machine learning have been proposed [abdel2006pca, oliveira2012automatic]. However, these approaches still have an inherent weakness: their detection performance is highly dependent on camera specifications and image quality. Recently, among the approaches based on deep learning, several studies [mujeeb2019one, kang2018deep] have presented unsupervised methods using autoencoders [vincent2010stacked]. These approaches take normal road images as training samples and optimize their models to minimize the reconstruction error between input and output. They recognize defects when the reconstruction error of an input sample is larger than a predefined threshold.
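The threshold rule used by these reconstruction-based methods can be sketched as below. This is a minimal illustration under stated assumptions, not code from [mujeeb2019one] or [kang2018deep]: the per-pixel squared error, the function names, and the scalar threshold are hypothetical choices, and `x_rec` stands in for the output of a trained autoencoder.

```python
import numpy as np


def reconstruction_error_map(x: np.ndarray, x_rec: np.ndarray) -> np.ndarray:
    """Per-pixel squared error between an input image and its reconstruction."""
    return (np.asarray(x, dtype=np.float64) - np.asarray(x_rec, dtype=np.float64)) ** 2


def flag_defects(x: np.ndarray, x_rec: np.ndarray, threshold: float) -> np.ndarray:
    """Boolean mask that is True where the reconstruction error exceeds the threshold."""
    return reconstruction_error_map(x, x_rec) > threshold


# Usage with a toy 4x4 image in which one pixel is badly reconstructed.
x = np.zeros((4, 4))
x_rec = x.copy()
x_rec[1, 2] = 0.9
mask = flag_defects(x, x_rec, threshold=0.5)  # only mask[1, 2] is True
```

Because the decision depends only on the reconstruction error, a model that generalizes well enough to reconstruct a defect will leave the mask empty for it, which is the weakness discussed in the next paragraph.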

However, according to Perera et al. [perera2019ocgan] and Pidhorskyi et al. [pidhorskyi2018generative], even when a model based on the reconstruction setting reaches a well-optimized solution, it may still reconstruct samples that did not appear in the training step. This can be a significant disadvantage for detecting road pavement defects: the model may produce a lower error than expected even when it takes a defect sample as input, making it hard to distinguish whether the sample contains defects.

Fig. 1: Architectural detail of the adversarial image-to-frequency transform. The blue objects denote the operation units, including the generator $G$ and the discriminators $D_I$ and $D_F$. The red circles indicate the loss functions corresponding to each operation unit. The red arrows show the workflow of the image-to-frequency cycle, and the blue arrows represent the frequency-to-image cycle. The dotted arrows show how each component relates to the loss functions.

To tackle this issue, we present an unsupervised approach to detecting road defects that exploits domain transformation based on adversarial learning. The proposed approach, called Adversarial Image-to-Frequency Transform (AIFT), is trained with normal road images only and needs no annotations of defects. In contrast to other approaches [mujeeb2019one, kang2018deep] that optimize their models by minimizing reconstruction errors, AIFT focuses on deriving a mapping function between an image domain and a frequency domain in an adversarial manner. To demonstrate the effectiveness of the proposed approach for road defect detection, we compare it with various state-of-the-art approaches, including supervised and unsupervised methods. The experimental results show that the proposed approach can outperform existing state-of-the-art methods.

The main contributions of our work are summarized as follows:

  • An unsupervised method for detecting road defects that provides outstanding performance without a well-annotated dataset of road defects.

  • Adversarial learning for deriving the image-to-frequency mapping function, which yields a more optimal transform model than typical approaches such as reconstruction or classification settings.

  • Extensive experiments on road defect detection, including an ablation analysis of the loss functions and a comprehensive comparison with existing state-of-the-art methods.

In the following sections, we describe our approach in detail and present the experimental results and their analysis. We conclude the paper by summarizing our work.

II The Proposed Method

II-A Adversarial Image-to-Frequency Transform

It is essential to derive a robust model that is invariant to the environment in order to detect the large variety of defect patterns on roads. Our method is inspired by novelty detection studies [perera2019ocgan, pidhorskyi2018generative], which derive a model using inlier samples only and recognize outliers by computing a likelihood or a reconstruction error. The proposed method, called Adversarial Image-to-Frequency Transform (AIFT), first derives a transform model between the image domain and the frequency domain using normal road pavement images only. The frequency-domain sample corresponding to an image is generated by applying the Fourier transform to that image. Road pavement defects are detected by comparing the given and generated samples of each domain.
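As a rough illustration of how a frequency-domain sample might be produced from a road image, the sketch below applies a 2-D discrete Fourier transform with NumPy. It assumes the frequency sample is the centred log-magnitude spectrum rescaled to [0, 1]; the text only says that a Fourier transform is applied, so this representation and the function name `image_to_frequency` are hypothetical.

```python
import numpy as np


def image_to_frequency(image: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Hypothetical mapping from a grayscale image to a frequency-domain sample.

    Assumption: the sample is the centred log-magnitude of the 2-D DFT, scaled to [0, 1].
    """
    spectrum = np.fft.fftshift(np.fft.fft2(np.asarray(image, dtype=np.float64)))
    log_magnitude = np.log1p(np.abs(spectrum))  # compress the large dynamic range
    lo, hi = log_magnitude.min(), log_magnitude.max()
    return (log_magnitude - lo) / (hi - lo + eps)


# A constant image has all of its energy at the zero frequency,
# which fftshift moves to the centre of the array.
freq = image_to_frequency(np.ones((32, 32)))
peak = np.unravel_index(np.argmax(freq), freq.shape)  # row 16, column 16
```

Keeping only the magnitude discards the phase, so a learned frequency-to-image mapping would have to recover image content from a phase-less spectrum; whether AIFT uses the magnitude, the phase, or both is not specified here.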

AIFT is composed of three components for applying adversarial learning: the generator $G$, the image discriminator $D_I$, and the frequency discriminator $D_F$. The original intention of adversarial learning is to learn generative models while avoiding the approximation of many intractable probabilistic computations that arise in other strategies, e.g., maximum likelihood estimation. This property is suitable for deriving an optimal model that covers the various visual patterns of road pavement defects. The workflow of AIFT is illustrated in Fig 1.


The generator $G$ acts as a mapping function between the image domain and the frequency domain, $G: I \leftrightarrow F$. For convenience of notation, we denote the image-to-frequency and frequency-to-image mappings separately as $G_{I \to F}$ and $G_{F \to I}$. $G$ generates the transformed results from each domain as follows:

$$\hat{F} = G_{I \to F}(I), \qquad \hat{I} = G_{F \to I}(F) \tag{1}$$

where $\hat{F}$ and $\hat{I}$ indicate the transformed results from $I$ and $F$, respectively. $\hat{I}$ and $\hat{F}$ are passed to the discriminators $D_I$ and $D_F$, respectively, to compute an adversarial loss. For a computationally cost-effective implementation, weight sharing is employed.

Fig. 2: Structural details of the network models in the generator $G$ and the discriminators $D_I$ and $D_F$. (a) and (b) show the structural details of the generator $G$ and of the two discriminators $D_I$ and $D_F$, respectively. The green, blue, and red boxes denote the convolutional layers, the deconvolutional layers, and the fully-connected layers, respectively.

The discriminators $D_I$ and $D_F$ are defined as follows:

$$y_k = D_k(x_k), \qquad k \in \{I, F\} \tag{2}$$

where $k$ is the indicator that assigns the discriminator according to the type of input $x_k$. $D_I$ takes $I$ and $\hat{I}$ as input, and $D_F$ takes $F$ and $\hat{F}$ as input. $y_k$ denotes the output $y_I$ or $y_F$ according to the type of input and discriminator. The value of $y_k$ can be regarded as a likelihood for discriminating whether a given sample is real or generated. Each component consists of CNNs and fully-connected neural networks, and their structural details are shown in Fig 2.

II-B Adversarial transform consistency learning

As shown in the workflow of AIFT in Fig 1, the generator $G$ acts as a bidirectional mapping function between the image domain and the corresponding frequency domain generated from it by the Fourier transform. The underlying assumption for detecting road pavement defects using AIFT is as follows. Since AIFT is trained only with normal road pavement images, if AIFT takes an image containing defect patterns as input, the error between the given sample and the transformed result should be larger than for normal images. Given this assumption, the prerequisite for precise road defect detection with AIFT is deriving a strict transform model between the image domain and the frequency domain from a dataset of normal road pavement samples.

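To this end, we present an adversarial transform consistency loss for training AIFT, defined by

$$\mathcal{L}_{atc}(G, D_I, D_F) = \mathbb{E}_{I}\left[\log D_I(I)\right] + \mathbb{E}_{F}\left[\log\left(1 - D_I(G_{F \to I}(F))\right)\right] + \mathbb{E}_{F}\left[\log D_F(F)\right] + \mathbb{E}_{I}\left[\log\left(1 - D_F(G_{I \to F}(I))\right)\right] \tag{3}$$

where $G$ tries to generate images and frequency samples via $G_{F \to I}$ and $G_{I \to F}$ that look similar to the given images $I$ and frequencies $F$, while $D_I$ and $D_F$ aim to distinguish between the given samples ($I$ and $F$) and the transformed results ($\hat{I}$ and $\hat{F}$).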
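One way to compute an adversarial objective of this kind is sketched below. It assumes the log-likelihood GAN form that CycleGAN [zhu2017unpaired] uses for its adversarial loss, applied once per domain and summed; the reported critic-iteration setting suggests a Wasserstein-style critic could have been used instead, so treat this as illustrative. The function name and argument layout are hypothetical, and the inputs are discriminator outputs in (0, 1) rather than the networks themselves.

```python
import numpy as np


def adversarial_transform_consistency_loss(d_i_real, d_i_fake, d_f_real, d_f_fake, eps=1e-12):
    """Log-likelihood GAN objective summed over the image and frequency discriminators.

    Arguments are discriminator outputs in (0, 1):
      d_i_real: image discriminator on given images
      d_i_fake: image discriminator on images transformed from frequency samples
      d_f_real: frequency discriminator on given frequency samples
      d_f_fake: frequency discriminator on frequency samples transformed from images
    The discriminators try to maximise this value; the generator tries to minimise it.
    """
    def domain_term(real, fake):
        # Clip so that log() stays finite when a discriminator saturates at 0 or 1.
        real = np.clip(np.asarray(real, dtype=np.float64), eps, 1.0 - eps)
        fake = np.clip(np.asarray(fake, dtype=np.float64), eps, 1.0 - eps)
        return np.mean(np.log(real)) + np.mean(np.log(1.0 - fake))

    return float(domain_term(d_i_real, d_i_fake) + domain_term(d_f_real, d_f_fake))


# Discriminators that cannot tell real from transformed samples output 0.5 everywhere,
# which gives 4 * log(0.5), roughly -2.77.
undecided = np.full(8, 0.5)
value = adversarial_transform_consistency_loss(undecided, undecided, undecided, undecided)
```

The value approaches its maximum of 0 when both discriminators separate real and transformed samples perfectly, which is what the generator's updates push against.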

Fig. 3: Comparison of the given and generated samples for the road pavement image and the corresponding frequency.

Adversarial learning can, in theory, learn mappings that produce outputs identically distributed as the image and frequency domains, respectively [zhu2017unpaired]. However, with large enough capacity, $G$ can map the same set of input samples to any random permutation of samples in the other domain, and any of these learned mappings can induce an output distribution that matches the target distribution. Thus, the adversarial transform consistency loss alone may not guarantee that the learned function maps an individual input to the desired output.

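To further reduce the space of possible mapping functions, we use a reconstruction loss to optimize the generator $G$. A common way to enforce the output of the generator to be close to the target is to minimize a reconstruction error based on the pixel-wise mean squared error (MSE) [ying2019x2ct, bai2018finding, liu2018future, sabokrou2018deep]. It is calculated as

$$\mathcal{L}_{rec}(G) = \mathbb{E}_{(I, F)}\left[\lVert F - G_{I \to F}(I) \rVert_2^2 + \lVert I - G_{F \to I}(F) \rVert_2^2\right] \tag{4}$$

Consequently, the total loss function is

$$\mathcal{L}(G, D_I, D_F) = \mathcal{L}_{atc}(G, D_I, D_F) + \lambda\, \mathcal{L}_{rec}(G) \tag{5}$$

where $\mathcal{L}_{atc}$ is the adversarial transform consistency loss (Eq 3) and $\lambda$ is the balancing parameter that weights the reconstruction loss.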
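The reconstruction and total losses can be sketched in the same style. This assumes the MSE is applied in both directions (the frequency output against the Fourier transform of the input image, and the image output against the input image) with equal weight; the function names are hypothetical, and the default weight of 0.1 is the balancing value reported in the experiment settings.

```python
import numpy as np


def reconstruction_loss(image, frequency, image_hat, frequency_hat):
    """Pixel-wise MSE for both transform directions, summed with equal weight (assumption)."""
    image_term = np.mean(
        (np.asarray(image, dtype=np.float64) - np.asarray(image_hat, dtype=np.float64)) ** 2
    )
    freq_term = np.mean(
        (np.asarray(frequency, dtype=np.float64) - np.asarray(frequency_hat, dtype=np.float64)) ** 2
    )
    return float(image_term + freq_term)


def total_loss(adversarial_term, reconstruction_term, lam=0.1):
    """Adversarial term plus the lambda-weighted reconstruction term."""
    return adversarial_term + lam * reconstruction_term


# Usage: only the image direction has an error here, so the reconstruction term is 1.0.
image = np.zeros((2, 2))
image_hat = np.ones((2, 2))
freq = np.full((2, 2), 0.3)
rec = reconstruction_loss(image, freq, image_hat, freq)
total = total_loss(-2.0, rec)  # -2.0 + 0.1 * 1.0
```

Since the reconstruction term does not depend on the discriminators, it only changes the generator's side of the min-max game.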

Given the above loss functions, the discriminators and the generator are trained by maximizing and minimizing the corresponding loss terms, respectively:

$$\theta_G^{*}, \theta_{D_I}^{*}, \theta_{D_F}^{*} = \arg\min_{\theta_G} \max_{\theta_{D_I}, \theta_{D_F}} \mathcal{L}(G, D_I, D_F) \tag{6}$$

where $\mathcal{L}$ is the total loss (Eq 5), and $\theta_G$, $\theta_{D_I}$, and $\theta_{D_F}$ denote the parameters of the generator $G$, the image discriminator $D_I$, and the frequency discriminator $D_F$, respectively. Fig 3 illustrates examples of the given samples and the transformed results for the image and frequency domains. We have conducted ablation studies to observe the effect of each loss term on training AIFT.

Fig. 4: The trends of AIU over the training epochs. (a) shows the AIU trend on the GAPs384 dataset, and (b) shows the AIU trend on the CFD dataset. The red curve (AIFT_total) denotes the AIU trend of AIFT trained with the total loss (Eq 5). The green curve (AIFT_adv) indicates the AIU trend of AIFT trained with the adversarial transform consistency loss (Eq 3) only. The blue curve (AIFT_rec) shows the AIU trend of AIFT trained with the reconstruction loss (Eq 4) only.

II-C Road pavement defect detection

Detecting defects on a road is straightforward. First, AIFT produces the frequency sample $\hat{F}$ from a given image sample $I$ via $G_{I \to F}$. Second, AIFT transforms $\hat{F}$ back into an image sample $\hat{I}$ via $G_{F \to I}$. Pavement defects are detected by comparing the given image sample $I$ with the transformed result $\hat{I}$.

The similarity metric for comparing the two samples $I$ and $\hat{I}$ is defined as follows:

$$d_J(I, \hat{I}) = \sum_{k} \left( I_k \log \frac{I_k}{m_k} + \hat{I}_k \log \frac{\hat{I}_k}{m_k} \right), \qquad m_k = \frac{I_k + \hat{I}_k}{2} \tag{7}$$

where $m$ is the element-wise mean of $I$ and $\hat{I}$. This similarity metric is based on the Jeffrey divergence, a modification of the KL divergence that is symmetric. Distances such as the $\ell_1$-norm and the $\ell_2$-norm are not suitable as similarity metrics for images since neighboring values are not considered [rubner2000earth]. The Jeffrey divergence is numerically stable, symmetric, and invariant to noise and input scale [puzicha1997non].
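A sketch of the comparison step follows, assuming the Jeffrey divergence is evaluated element-wise so that each pixel receives its own score, and that a pixel is flagged when its score exceeds a threshold. The threshold value, the small `eps` that keeps the logarithms finite for zero-valued pixels, and the function names are assumptions; in AIFT the second input would be the image reconstructed through the frequency domain by the trained generator.

```python
import numpy as np


def jeffrey_divergence_map(x, x_hat, eps=1e-8):
    """Per-element Jeffrey divergence terms between two non-negative arrays."""
    p = np.asarray(x, dtype=np.float64) + eps
    q = np.asarray(x_hat, dtype=np.float64) + eps
    m = 0.5 * (p + q)  # element-wise mean of the two samples
    return p * np.log(p / m) + q * np.log(q / m)


def jeffrey_divergence(x, x_hat, eps=1e-8):
    """Scalar divergence: the sum of the per-element terms."""
    return float(np.sum(jeffrey_divergence_map(x, x_hat, eps)))


def detect_defects(image, image_hat, threshold):
    """Hypothetical pixel-level decision: True where the divergence term exceeds the threshold."""
    return jeffrey_divergence_map(image, image_hat) > threshold


# Usage: the reconstruction disagrees with the input at a single pixel.
x = np.zeros((4, 4))
x_hat = x.copy()
x_hat[2, 1] = 1.0
mask = detect_defects(x, x_hat, threshold=0.1)  # only mask[2, 1] is True
```

Each per-element term is zero when the two values agree and positive otherwise, so summing the map gives a symmetric, non-negative score for the whole image.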

Model        GAPs384 [eisenbach2017get]   CFD [shi2016automatic]
             AIU   ODS   OIS              AIU   ODS   OIS
AIFT_rec     0.052 0.181 0.201            0.152 0.562 0.572
AIFT_adv     0.081 0.226 0.234            0.187 0.642 0.659
AIFT_total   0.083 0.247 0.249            0.203 0.701 0.732
TABLE I: Quantitative comparison of the detection performance of AIFT on the GAPs384 and CFD datasets depending on the loss function: the reconstruction loss (Eq 4), the adversarial transform consistency loss (Eq 3), and the total loss (Eq 5). AIFT_total achieves the best performance on every metric.

III Experiment

III-A Experiment setting and dataset

To evaluate the performance of the proposed method on road defect detection, we employ the best F-measure on the dataset for a fixed scale (ODS), the aggregate F-measure on the dataset for the best scale in each image (OIS), and AIU, which was proposed by Yang et al. [yang2019feature]. AIU is computed on the detection and ground truth without non-maximum suppression (NMS) and thinning operations, and is defined by $\mathrm{AIU} = \frac{1}{N_t} \sum_{t} \frac{N_{pg}^{t}}{N_{p}^{t} + N_{g}^{t} - N_{pg}^{t}}$, where $N_t$ denotes the total number of thresholds $t$, taken at intervals of 0.01; for a given $t$, $N_{pg}^{t}$ is the number of pixels in the intersection of the predicted and ground-truth crack regions, and $N_{p}^{t}$ and $N_{g}^{t}$ denote the number of pixels in the predicted and ground-truth crack regions, respectively. The proposed method has been evaluated on four publicly available datasets, described as follows.
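A direct NumPy sketch of the AIU formula follows. It assumes thresholds run from 0.01 to 0.99 in steps of 0.01, that a pixel is predicted as a crack when its probability is strictly greater than the threshold, and that an empty prediction against an empty ground truth scores 1; these conventions and the function name `aiu` are assumptions rather than details given in the text.

```python
import numpy as np


def aiu(prob_map, gt_mask, thresholds=None):
    """Average intersection over union over a sweep of binarization thresholds."""
    prob = np.asarray(prob_map, dtype=np.float64)
    gt = np.asarray(gt_mask, dtype=bool)
    if thresholds is None:
        thresholds = np.arange(1, 100) / 100.0  # 0.01, 0.02, ..., 0.99 (assumed range)
    n_g = gt.sum()  # ground-truth crack pixels
    scores = []
    for t in thresholds:
        pred = prob > t
        n_pg = np.logical_and(pred, gt).sum()  # intersection
        n_p = pred.sum()  # predicted crack pixels
        union = n_p + n_g - n_pg
        scores.append(n_pg / union if union > 0 else 1.0)
    return float(np.mean(scores))


# A binary map identical to the ground truth gives IoU 1 at every threshold.
gt = np.array([[1, 0], [0, 1]], dtype=bool)
perfect = aiu(gt.astype(float), gt)  # 1.0
```

Averaging over thresholds rewards probability maps that separate crack and background pixels across the whole range, rather than only at one tuned cut-off.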

The GAPs384 dataset is the German Asphalt Pavement Distress (GAPs) dataset presented by Eisenbach et al. [eisenbach2017get], which was constructed to address the issue of comparability in the pavement distress domain by providing a standardized, high-quality, large-scale dataset. The dataset contains 1,969 grayscale images of pavement defects, with various defect classes such as cracks, potholes, and inlaid patches. The image resolution is 1,920×1,080.

The Cracktree200 dataset [zou2012cracktree] contains 206 pavement images with 800×600 resolution, which can be categorized into various types of pavement defects. The images in this dataset were captured under challenging conditions such as shadows, occlusions, low contrast, and noise.

The CRACK500 dataset was constructed by Yang et al. [yang2019feature]. It is composed of 500 images with 2,000×1,500 resolution, and each image has a pixel-level annotation. The images are cropped into non-overlapping patches, and the patches are split into a training set of 1,896 images and a test set of 1,124 images.

The CFD dataset [shi2016automatic] contains 118 images with 480×320 resolution. Each image has a pixel-level annotation of pavement defects. The images were captured with an iPhone 5 with a 4 mm focal length and an exposure time of 1/135 s.

The hyperparameter setting for the best performance is as follows. The number of epochs and the batch size are 50 and 64, respectively. The balancing weight $\lambda$ for the reconstruction loss is 0.1, and the number of critic iterations is set to 10. The networks are optimized with Adam [kingma2014adam]. The proposed approach is implemented with the PyTorch library (source code is publicly available at https://github.com/andreYoo/Adversarial-IFTN.git), and the experiments were conducted with a GTX Titan XP and 32GB of memory.
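The reported settings can be collected in a small configuration object, sketched below with hypothetical field names. The learning rate and Adam's betas are not reported, and reading "critic iteration" as the number of discriminator updates per generator update (as in WGAN-style training) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AIFTTrainingConfig:
    """Hyperparameters reported for AIFT; the field names are hypothetical."""
    epochs: int = 50
    batch_size: int = 64
    reconstruction_weight: float = 0.1  # balancing weight for the reconstruction loss
    critic_iterations: int = 10  # assumed: discriminator updates per generator update
    optimizer: str = "adam"  # learning rate and betas are not reported


def is_generator_step(iteration: int, cfg: AIFTTrainingConfig) -> bool:
    """Assumed schedule: update the generator once every `critic_iterations` steps."""
    return (iteration + 1) % cfg.critic_iterations == 0


cfg = AIFTTrainingConfig()
generator_steps = sum(is_generator_step(i, cfg) for i in range(100))  # 10
```

Under this reading, the discriminators are updated at every iteration and the generator at every tenth one.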

Fig. 5: Visualization of the road pavement defect detection results. The images on the first row represent the input images. The second row’s images illustrate the ground-truths. The images on the third row denote the detection results for road pavement defects.
Methods                         S/U  GAPs384              Cracktree200         CRACK500             CFD                  FPS(s)
                                     AIU   ODS   OIS      AIU   ODS   OIS      AIU   ODS   OIS      AIU   ODS   OIS
HED [xie2015holistically]       S    0.069 0.209 0.175    0.040 0.317 0.449    0.481 0.575 0.625    0.154 0.683 0.705    0.0825
RCF [liu2017richer]             S    0.043 0.172 0.120    0.032 0.255 0.487    0.403 0.490 0.586    0.105 0.542 0.607    0.079
FCN [long2015fully]             S    0.015 0.088 0.091    0.008 0.334 0.333    0.379 0.513 0.577    0.021 0.585 0.609    0.114
CrackForest [shi2016automatic]  U    -     0.126 0.126    -     0.080 0.080    -     0.199 0.199    -     0.104 0.104    3.971
FPHBN [yang2019feature]         S    0.081 0.220 0.231    0.041 0.517 0.579    0.489 0.604 0.635    0.173 0.683 0.705    0.237
AAE [makhzani2015adversarial]   U    0.062 0.196 0.202    0.039 0.472 0.491    0.371 0.481 0.583    0.142 0.594 0.613    0.721
SVM [zhang2016road]             S    0.051 0.132 0.162    0.017 0.382 0.391    0.362 0.418 0.426    0.082 0.352 0.372    0.852
ConvNet [zhang2016road]         S    0.079 0.203 0.211    0.037 0.472 0.499    0.431 0.591 0.609    0.152 0.579 0.677    0.921
AIFT_total                      U    0.083 0.247 0.249    0.045 0.607 0.642    0.478 0.549 0.561    0.203 0.701 0.732    1.1330
TABLE II: Quantitative performance comparison of road pavement defect detection on GAPs384 [eisenbach2017get], Cracktree200 [zou2012cracktree], CRACK500 [yang2019feature], and CFD [shi2016automatic]. Each dataset block lists AIU, ODS, and OIS; "-" means the result is not provided. 'S/U' denotes whether a method is supervised or unsupervised. FPS indicates the execution speed of each method, computed by averaging the execution speeds over all datasets.

III-B Ablation study

We have conducted an ablation study to observe the effect of the loss function terms on the performance of AIFT. We trained AIFT with each of the three loss functions, namely the reconstruction loss (Eq 4), the adversarial transform consistency loss (Eq 3), and the total loss (Eq 5), on the GAPs384 and CFD datasets, and measured AIU every two epochs. The hyperparameter settings used to train each model are identical; only the loss functions differ. Fig 4 shows the AIU trends of the AIFT models trained with the three loss functions, and Table I lists their AIUs, ODSs, and OISs on the GAPs384 and CFD datasets. The AIFT trained with the total loss (AIFT_total) achieves the best performance in these experiments. As shown in Table I, AIFT_total achieves 0.083 AIU, 0.247 ODS, and 0.249 OIS on the GAPs384 dataset. These figures show that AIFT_total performs approximately 7% better than the other variants. On the CFD dataset, AIFT_total achieves 0.203 AIU, 0.701 ODS, and 0.732 OIS, all higher than those of the other variants.

Notably, the overall experimental results demonstrate that the AIFT models trained with adversarial learning outperform the AIFT trained with the reconstruction setting (AIFT_rec): not only AIFT_total but also AIFT_adv outperforms AIFT_rec. The AIU trends (Fig 4) also show that AIFT trained in an adversarial manner outperforms AIFT trained with the reconstruction setting. These results indicate that adversarial learning can improve the robustness of AIFT for detecting road pavement defects.

III-C Comparison with existing state-of-the-art methods

We have compared AIFT with existing state-of-the-art methods for crack detection [xie2015holistically, shi2016automatic, yang2019feature] and road pavement defect detection [zhang2016road]. For efficiency, only AIFT_total is compared with the other methods. Table II lists AIUs, ODSs, and OISs on the GAPs384, Cracktree200, CRACK500, and CFD datasets. AIFT_total achieves state-of-the-art performance on the GAPs384, Cracktree200, and CFD datasets. On the GAPs384 dataset, AIFT_total achieves 0.083 AIU, 0.247 ODS, and 0.249 OIS, outperforming the previous state of the art, FPHBN [yang2019feature], which obtains 0.081 AIU, 0.220 ODS, and 0.231 OIS; AIFT_total thus performs about 3% better than FPHBN. The experiments on the Cracktree200 and CFD datasets also show that AIFT_total surpasses the other methods: it produces 0.045 AIU, 0.607 ODS, and 0.642 OIS on Cracktree200, and 0.203 AIU, 0.701 ODS, and 0.732 OIS on CFD. These figures are 8.8% and 3% better, respectively, than those of the previous state-of-the-art methods.

However, AIFT_total does not obtain the highest performance on the CRACK500 dataset. The state-of-the-art performance on CRACK500 is achieved by FPHBN [yang2019feature], which produces 0.489 AIU, 0.604 ODS, and 0.635 OIS, whereas AIFT_total obtains 0.478 AIU, 0.549 ODS, and 0.561 OIS. The gaps between FPHBN and AIFT_total are 0.011 in AIU, 0.055 in ODS, and 0.074 in OIS. However, FPHBN is a supervised approach that requires pixel-level annotations of road pavement defects, and its network architecture is much deeper than ours. Both factors give FPHBN a considerable advantage in detecting road pavement defects.

The overall experiments show that AIFT can outperform existing state-of-the-art methods. As shown in Table II, the detection performance of AIFT_total surpasses that of the other unsupervised methods [shi2016automatic, makhzani2015adversarial]. Moreover, on most datasets AIFT_total achieves better detection performance than the methods based on supervised learning, even though it does not need annotations of road pavement defects in the training step. This suggests that AIFT can be applied in various practical situations in which a large-scale, well-annotated dataset is not available.

IV Conclusions

In this paper, we have proposed an unsupervised approach to detecting road pavement defects, based on adversarial image-to-frequency transform. The experimental results demonstrate that the proposed approach can detect various patterns of road pavement defects without explicit annotations of road defects in the training step, and that it outperforms existing state-of-the-art methods for detecting road pavement defects.


This work was partly supported by the ICT R&D program of MSIP/IITP. (2014-0-00077, Development of global multi target tracking and event prediction techniques based on real-time large-scale video analysis).