Lane Detection in Low-light Conditions Using an Efficient Data Enhancement: Light Conditions Style Transfer

02/04/2020 · Tong Liu et al., Beijing Institute of Technology

Nowadays, deep learning techniques are widely used for lane detection, but applying them in low-light conditions remains a challenge. Although multi-task learning and contextual-information-based methods have been proposed to address this problem, they either require additional manual annotations or introduce extra inference computation. In this paper, we propose a style-transfer-based data enhancement method that uses Generative Adversarial Networks (GANs) to generate images in low-light conditions, thereby increasing the environmental adaptability of the lane detector. Our solution consists of three parts: the proposed Better-CycleGAN, light conditions style transfer, and the lane detection network. It requires neither additional manual annotation nor extra inference computation. We validate our method on the lane detection benchmark CULane using ERFNet. Empirically, a lane detection model trained with our method demonstrates adaptability in low-light conditions and robustness in complex scenarios. Our code for this paper will be publicly available.


I Introduction

With the development of deep learning, neural networks have been widely used in lane detection [1]. Lane detection can be challenging in low-light conditions, such as at night, on cloudy days, and in shadows. When the light condition is not ideal, as shown in Fig. 1(a), the lane markings become fuzzy, which makes feature extraction difficult and thus hinders lane detection. At present, there are two main approaches to lane detection in low-light conditions: (1) multi-task learning and (2) contextual information. Multi-task learning can provide more supervisory information. For example, VPGNet[2] adds road marking detection and vanishing point prediction as auxiliary tasks, which improves lane detection performance. However, this method needs additional manual labeling. Contextual information is another approach; for example, SCNN[3] replaces layer-by-layer connections with slice-by-slice connections, which lets pixels pass messages across rows and columns. Contextual information helps the detector understand the traffic scene better in low-light conditions, but the message-passing process increases inference computation.

Fig. 1: (a) Low-light conditions, such as night and shadow. (b) Proportion of day and night in the CULane training set. The data in low-light conditions make up a very small percentage of the training set.

In deep learning, the amount of data and the variety of environments directly determine the performance of the detector. Due to the scarcity of data in low-light conditions, such as night and shadow, the detector performs poorly in these conditions. CULane[3] is a challenging lane detection benchmark. As shown in Fig. 1(b), night-time data account for only a small fraction of the CULane training set, which may be one reason why detector performance is not ideal in low-light conditions.

In this paper, we propose a style-transfer-based data enhancement method for lane detection, which uses Generative Adversarial Networks (GANs)[4] to generate images in low-light conditions and improves the performance of the lane detector. The proposed light conditions style transfer method can be easily applied to various detection tasks to alleviate the difficulty of detection in low-light conditions; it requires neither additional manual collection and labeling nor extra inference computation for the lane detector.

For light conditions style transfer, we propose Better-CycleGAN, which achieves better style transfer than CycleGAN[5]. Better-CycleGAN adds a scale-match operation to solve the scale variation problem caused by non-proportional resizing and generates images in low-light conditions. The generated images reuse the labels of the original images, so there is no need to collect and annotate data manually. We add the generated images, with the original labels, to the training set and use them to train the lane detection model. At inference time, the lane detection model runs on its own without Better-CycleGAN, so inference computation does not increase. We use ERFNet as our lane detection model. In our experiments, ERFNet trained with our light conditions style transfer method not only performs better in low-light conditions but also improves in other challenging environments, making the model more robust.

Our main contributions are summarized as follows:

  • We propose Better-CycleGAN, which solves the scale variation problem of CycleGAN and makes the generated images more realistic.

  • We propose an efficient data enhancement method for low-light conditions based on light conditions style transfer, which requires neither additional manual labeling nor extra inference computation.

  • We use ERFNet as our lane detection model and verify the effectiveness of our method.

II Related Work

II-A Lane Detection

Most traditional lane detection methods are based on hand-crafted features. Cheng et al.[6] extract lanes based on color information, and Aly[7] applies Line Segment Detection (LSD) followed by post-processing steps. Apart from color and gradient features, ridge features are also used for lane detection, together with techniques such as the Hough transform[8], Kalman filtering[9], and particle filtering[10].

Since Convolutional Neural Networks (CNNs) can extract features at different levels, they have been widely used for lane detection in different scenes [11, 12]. Van Gansbeke et al.[13] propose an end-to-end lane detector consisting of two parts: a deep network that predicts a segmentation-like weight map for each lane, and a differentiable least-squares fitting module that returns the fitting parameters. Hou et al.[14] present Self Attention Distillation (SAD) for lane detection, in which the network performs top-down, hierarchical attention distillation within itself to achieve further representation learning.

II-B GANs for Image Transfer

Building on conditional Generative Adversarial Networks (cGANs)[15], several methods perform image-to-image translation, including translation between unpaired samples [5, 16]. CycleGAN[5] essentially consists of two mirrored, symmetrical GANs that share two generators, each paired with its own discriminator, enabling unsupervised transfer between the two domains. Compared with CycleGAN, the proposed Better-CycleGAN avoids non-proportional resizing and makes the generated images more realistic.

Fig. 2: The main framework of our method. The proposed Better-CycleGAN is shown on the left. One generator transfers images from suitable light conditions to low-light conditions, while the other transfers them in the opposite direction; each discriminator feeds a single scalar (real or fake) back to its generator. The middle section shows light conditions style transfer from suitable light conditions to low-light conditions by the trained Better-CycleGAN. The lane detection model, whose baseline is ERFNet, is shown on the right. We add a lane existence branch for better performance.

II-C GANs for Data Enhancement

Several works use GANs to generate images for data enhancement to improve object detection. Wang et al.[17] observe that occluded or deformed objects account for a small percentage of the data, so they design ASDN and ASTN to learn how to occlude and rotate objects, respectively. Liu et al.[18] develop a model that jointly optimizes a GAN and a detector, training the generator with the detection loss so that it produces objects that are difficult for the detector, thereby improving the detector's robustness. The methods above only use GANs to address object occlusion and small-object detection; they have no effect on the problem of low-light conditions.

Closely related to our work, Arruda et al.[19] perform style transfer from day to night with CycleGAN to enhance data in night scenes. However, when using CycleGAN for style transfer, the authors do not account for the unequal width and height of images taken by the vehicle camera: they directly crop the two sides of the images and then resize them, which discards challenging detection areas. In contrast, the proposed method adds scale-information matching to the generator to avoid forced resizing, which better preserves the authenticity of the generated images and benefits the training of the detector.

III Proposed Method

The proposed method, illustrated in Fig. 2, consists of three steps: (i) Better-CycleGAN, (ii) light conditions style transfer, and (iii) lane detection. Better-CycleGAN is composed of two generators and two discriminators, where the generators focus on generating images and the discriminators give feedback to the generators on whether the generated images look authentic. We use Better-CycleGAN to perform light conditions style transfer from suitable light conditions to low-light conditions, implementing data enhancement without any manual collection or labeling. Finally, we use ERFNet as our lane detection model. ERFNet is trained on the data enhanced by light conditions style transfer, and at inference time it runs on its own without any other process.
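The minimal Python sketch below illustrates how the three steps fit together at training time, assuming the Better-CycleGAN generator from step (i) has already been trained; the callables and variable names are placeholders, not the authors' actual interfaces.

```python
# Hypothetical high-level sketch of the training-time pipeline. The callables
# passed in (a trained day-to-night generator and a detector-training routine)
# are placeholders for the components described in the following sections.
def enhance_and_train(day_to_night_generator, train_detector,
                      labeled_train_set, labeled_day_subset):
    # Step (ii): transfer a subset of daytime frames to low-light conditions;
    # each generated frame reuses the original lane annotation unchanged.
    generated = [(day_to_night_generator(image), label)
                 for image, label in labeled_day_subset]
    # Step (iii): train the lane detector on the original data plus the
    # generated low-light images; inference later uses the detector alone.
    return train_detector(list(labeled_train_set) + generated)
```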

III-A Better-CycleGAN

Fig. 3: Generator architecture, composed of convolution layers, residual blocks, and deconvolution layers. The convolution layers record the scale changes during encoding and map them to the corresponding operations during decoding.

Because the resolution ratio of CycleGAN's input does not match that of the real images in the dataset, the scale variation caused by non-proportional resizing can easily distort the generated image. Better-CycleGAN achieves automatic scale adaptation by adding a scale-information match operation to the generator, making the generated images more realistic. We implement our architecture based on CycleGAN, with two generators and two discriminators.

Generator Network Inspired by the architecture of CycleGAN, the generator of Better-CycleGAN contains two stride-2 convolutions, several residual blocks, and two fractionally-strided (stride-1/2) deconvolutions, as shown in Fig. 3. The difference is that a plain convolutional auto-encoder cannot handle images of arbitrary resolution: when the resolution ratio does not match, the input images have to be resized to the required resolution. In the standard encoding process, when the spatial size of a feature map becomes odd, the fractional part is simply dropped, and the decoder then has to compensate with padding that depends on the original image resolution. This is inconvenient and cannot be applied to images of different resolutions. Better-CycleGAN instead records the scale changes automatically during encoding and maps them to the corresponding operations during decoding, which solves the scale variation problem caused by non-proportional resizing and adapts to images of different resolutions without any additional operation.
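As an illustration, a minimal PyTorch sketch of the scale-match idea follows; the layer widths, kernel sizes, and the use of `output_size` in the transposed convolutions are our assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class ScaleMatchGenerator(nn.Module):
    """Encoder-decoder generator that records feature-map sizes while encoding
    and maps them to the corresponding decoding steps (scale-information match)."""
    def __init__(self, channels=3, base=64, n_res=6):
        super().__init__()
        self.stem = nn.Conv2d(channels, base, 7, padding=3)
        self.down1 = nn.Conv2d(base, base * 2, 3, stride=2, padding=1)
        self.down2 = nn.Conv2d(base * 2, base * 4, 3, stride=2, padding=1)
        self.res = nn.Sequential(*[ResBlock(base * 4) for _ in range(n_res)])
        self.up1 = nn.ConvTranspose2d(base * 4, base * 2, 3, stride=2, padding=1)
        self.up2 = nn.ConvTranspose2d(base * 2, base, 3, stride=2, padding=1)
        self.head = nn.Conv2d(base, channels, 7, padding=3)

    def forward(self, x):
        x = torch.relu(self.stem(x))
        size0 = x.shape[-2:]          # record scale before the first stride-2 conv
        x = torch.relu(self.down1(x))
        size1 = x.shape[-2:]          # record scale before the second stride-2 conv
        x = torch.relu(self.down2(x))
        x = self.res(x)
        # Decoding: restore exactly the recorded sizes, so images with odd
        # (non-divisible) resolutions are reconstructed without forced resizing.
        x = torch.relu(self.up1(x, output_size=size1))
        x = torch.relu(self.up2(x, output_size=size0))
        return torch.tanh(self.head(x))
```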

Discriminator Network We employ a sequence of convolution layers as our discriminator network: three stride-2 convolutions followed by one stride-1 convolution. The discriminator outputs a judgment on the generated image, which is fed back to the generator.
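A compact sketch of such a discriminator is shown below; the kernel sizes, channel widths, and averaging of the prediction map into a single score are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Three stride-2 convolutions plus one stride-1 convolution, reduced to a
    single real/fake score per image (widths are assumed, not the paper's)."""
    def __init__(self, channels=3, base=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(channels, base, 4, stride=2, padding=1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(base * 2, base * 4, 4, stride=2, padding=1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(base * 4, 1, 4, stride=1, padding=1),  # stride-1 prediction map
        )

    def forward(self, x):
        # Average the local predictions into the single scalar (real/fake logit)
        # fed back to the generator.
        return self.features(x).mean(dim=(1, 2, 3))
```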

III-B Light Conditions Style Transfer

In this paper, light conditions style transfer is performed by Better-CycleGAN. We define the data in suitable light conditions (such as day) as domain X and the data in low-light conditions (such as night and shadow) as domain Y. We denote the generator from X to Y as $G$ and the generator from Y to X as $F$, with corresponding discriminators $D_Y$ and $D_X$. Better-CycleGAN can transform domain X to domain Y, and domain Y back to domain X, through a cycle.

In the training process, the generators produce increasingly realistic images based on feedback from the discriminators, aiming to fool them, while the discriminators learn to judge the authenticity (real/fake) of the generated images more accurately, until a dynamic balance is reached. We therefore introduce an adversarial loss to describe this process, which is the key to GANs generating realistic images. We define the adversarial loss from domain X to domain Y as:

$\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{data}(x)}[\log(1 - D_Y(G(x)))]$   (1)

where $G$ tries to generate images that look like images from domain Y by minimizing this objective, while $D_Y$ aims to distinguish between generated samples $G(x)$ and real samples $y$ by maximizing it, i.e. $\min_G \max_{D_Y} \mathcal{L}_{GAN}(G, D_Y, X, Y)$. The adversarial loss from domain Y to domain X is defined analogously as $\mathcal{L}_{GAN}(F, D_X, Y, X)$.

For better performance, we adopt cycle consistency, i.e. $F(G(x)) \approx x$ and $G(F(y)) \approx y$. In this way, we can monitor the difference between the original images and the images obtained after two successive style transfers, which pushes the generated images to be more realistic. We incentivize this behavior using a cycle consistency loss:

$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}[\|F(G(x)) - x\|_1] + \mathbb{E}_{y \sim p_{data}(y)}[\|G(F(y)) - y\|_1]$   (2)

The total loss is the sum of the three terms:

$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda \mathcal{L}_{cyc}(G, F)$   (3)

where $\lambda$ controls the relative importance of the two objectives.
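The sketch below computes the generator-side objective of Eq. (3) in PyTorch, using the non-saturating binary cross-entropy form of the adversarial terms that is common in practice; the discriminators are trained separately with the complementary real/fake terms, and the weight `lambda_cyc = 10.0` is an assumed default, not a value reported here.

```python
import torch
import torch.nn.functional as F

def generator_objective(G, F_net, D_X, D_Y, real_x, real_y, lambda_cyc=10.0):
    fake_y = G(real_x)        # X (suitable light) -> Y (low light)
    fake_x = F_net(real_y)    # Y -> X

    # Adversarial terms: each generator tries to make the corresponding
    # discriminator label its outputs as real (non-saturating BCE form).
    pred_fake_y = D_Y(fake_y)
    pred_fake_x = D_X(fake_x)
    adv_xy = F.binary_cross_entropy_with_logits(pred_fake_y, torch.ones_like(pred_fake_y))
    adv_yx = F.binary_cross_entropy_with_logits(pred_fake_x, torch.ones_like(pred_fake_x))

    # Cycle-consistency term of Eq. (2): F(G(x)) should recover x and
    # G(F(y)) should recover y (L1 distance).
    cyc = (F_net(fake_y) - real_x).abs().mean() + (G(fake_x) - real_y).abs().mean()

    # Total generator loss, Eq. (3): adversarial terms plus weighted cycle term.
    return adv_xy + adv_yx + lambda_cyc * cyc
```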

Some examples of light conditions style transfer results are shown in Fig. 4. The generated images reuse the annotations of the corresponding original images, so no additional annotation is required. This makes Better-CycleGAN an efficient tool for data enhancement.

Fig. 4: Examples of real images in normal light conditions and their corresponding transferred images in low-light conditions. Although some details are not handled well, most generated images are converted to low-light conditions with high fidelity.

III-C Lane Detection

For lane detection, we use ERFNet[20] as our baseline; its novel layers use residual connections and factorized convolutions to remain efficient while retaining remarkable accuracy. Considering the slow convergence of ERFNet, we add a lane existence branch, as shown in Fig. 5. In our architecture, the decoder is in charge of the instance segmentation task and outputs probability maps for the different lane markings, while the second branch predicts the existence of each lane. The loss function is as follows:

$L = \alpha L_{seg} + \beta L_{exist}$   (4)

where $L_{seg}$ is the instance segmentation negative log-likelihood loss and $L_{exist}$ is the lane existence binary cross-entropy loss. We balance the two tasks with the weight terms $\alpha$ and $\beta$.
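A minimal sketch of Eq. (4) in PyTorch follows; the default weights below are placeholders only, since the final values of $\alpha$ and $\beta$ are not restated here.

```python
import torch.nn.functional as F

def lane_detection_loss(seg_logits, seg_target, exist_logits, exist_target,
                        alpha=1.0, beta=0.1):
    # alpha/beta are placeholder weights standing in for the paper's settings.
    # Segmentation branch: per-pixel negative log-likelihood over background
    # plus lane classes (seg_target holds integer class indices).
    seg_loss = F.nll_loss(F.log_softmax(seg_logits, dim=1), seg_target)
    # Existence branch: binary cross-entropy on the per-lane existence flags.
    exist_loss = F.binary_cross_entropy_with_logits(exist_logits, exist_target.float())
    return alpha * seg_loss + beta * exist_loss
```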

During training, the generated images in low-light conditions are fed to ERFNet. At inference time, we only need ERFNet, without any other operation, and we convert the probability maps into curves. For each lane marking whose existence value is larger than 0.5, we search the corresponding probability map every 20 rows for the position with the highest response. These positions are then connected by lines, which form the final predictions.
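A sketch of this decoding step is given below; `prob_thresh` is an added assumption used to skip rows where the lane response is weak.

```python
import numpy as np

def decode_lanes(prob_maps, exist_scores, row_step=20, prob_thresh=0.3):
    """prob_maps: (num_lanes, H, W) per-lane probability maps;
    exist_scores: (num_lanes,) existence probabilities."""
    lanes = []
    height = prob_maps.shape[1]
    for lane_map, exist in zip(prob_maps, exist_scores):
        if exist <= 0.5:                               # skip lanes predicted absent
            continue
        points = []
        for row in range(height - 1, -1, -row_step):   # scan every 20 rows, bottom-up
            col = int(np.argmax(lane_map[row]))        # column with highest response
            if lane_map[row, col] > prob_thresh:       # keep only confident rows
                points.append((col, row))
        if len(points) >= 2:                           # connect points into a polyline
            lanes.append(points)
    return lanes
```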

Fig. 5: Our lane detection model architecture. The decoder outputs probability maps of the different lane markings, and the second branch predicts the existence of each lane.
Category ERFNet CycleGAN + ERFNet Better-CycleGAN + ERFNet (ours) SCNN [3] ENet-SAD [14] ResNet-101-SAD [14]
Normal 91.5 91.7 91.8 90.6 90.1 90.7
Crowded 71.6 71.5 71.8 69.7 68.8 70.0
Night 67.1 68.9 69.4 66.1 66.0 66.3
No Line 45.1 45.2 46.1 43.4 41.6 43.5
Shadow 71.3 73.1 76.2 66.9 65.9 67.0
Arrow 87.2 87.2 87.8 84.1 84.0 84.1
Dazzle Light 66.0 67.5 66.4 58.5 60.2 59.9
Curve 71.6 69.0 72.4 64.4 65.7 65.7
Crossroad 2199 2402 2346 1990 1998 2052
Total 73.1 73.6 73.9 71.6 70.8 71.8
TABLE I: Performance (F1-measure) of different methods on the CULane test set. For the crossroad category, only the number of false positives (FP) is shown.
Fig. 6: Probability maps from our method and other methods. The brightness of a pixel indicates the probability of that pixel belonging to a lane. As can be clearly seen, in low-light conditions the probability maps generated by our method are clearer and more accurate.

IV Experimental Results

IV-A Dataset

The CULane[3] dataset is widely used for lane detection and contains many challenging driving scenarios, such as crowded roads, shadow, night, dazzle light, and so on.

For light conditions style transfer, 3,200 images in suitable light conditions and 3,200 images in low-light conditions were selected. These images are split at a ratio of 3:1 into the training and test sets of Better-CycleGAN. After training Better-CycleGAN, we select 13,000 images in suitable light conditions and transfer them into low-light conditions with the trained model.

IV-B Implementation Details

Light conditions style transfer Due to the limited GPU memory, we resize the images to a smaller resolution. Better-CycleGAN is trained for 100 epochs with one image per batch. To compare Better-CycleGAN with CycleGAN, we carry out the same procedure with CycleGAN, resizing the images before training.

Lane detection A public source code implementation (https://github.com/cardwing/Codes-for-Lane-Detection/tree/master/ERFNet-CULane-PyTorch) is used to carry out the experiments. Before training, we resize the CULane images. ERFNet is pre-trained on the Cityscapes dataset. Our lane detection model is trained for 12 epochs with 12 images per batch. We use SGD to train our models, and the initial learning rate is set to 0.01.
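For reference, a minimal training-loop sketch with the stated settings (SGD, learning rate 0.01, 12 epochs, batch size 12) is shown below; the momentum value and data-loading details are assumptions, and `model`/`criterion` stand in for the ERFNet-based detector and the loss of Eq. (4).

```python
import torch
from torch.utils.data import DataLoader

def train(model, dataset, criterion, epochs=12, batch_size=12, lr=0.01, device="cuda"):
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=4)
    # SGD with the stated initial learning rate; momentum 0.9 is an assumption.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.to(device).train()
    for epoch in range(epochs):
        for images, seg_target, exist_target in loader:
            seg_logits, exist_logits = model(images.to(device))
            loss = criterion(seg_logits, seg_target.to(device),
                             exist_logits, exist_target.to(device))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```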

To verify the effectiveness of our method, we design a comparative experiment with three groups: (1) the original CULane training set, (2) the CULane training set plus 13,000 low-light images generated by CycleGAN, and (3) the CULane training set plus 13,000 low-light images generated by Better-CycleGAN. These three groups are denoted ERFNet, CycleGAN+ERFNet, and Better-CycleGAN+ERFNet, respectively. All groups are evaluated on the CULane test set.

IV-C Evaluation Metrics

Following [3], to judge whether a lane is correctly detected, we treat each lane marking as a line of 30-pixel width and compute the intersection-over-union (IoU) between labels and predictions. Predictions whose IoU is larger than a threshold are considered true positives (TP); here the threshold is set to 0.5. We then use the F1 measure as the evaluation metric, defined as $F_1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$, where $Precision = \frac{TP}{TP + FP}$ and $Recall = \frac{TP}{TP + FN}$.
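A small helper illustrating the metric, assuming TP/FP/FN have already been counted from the IoU matching described above:

```python
def f1_from_counts(tp, fp, fn):
    # Precision and recall from the matched lane counts, then the F1 measure.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```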

Iv-D Results

The results of our comparative experiment are shown in Table I. Compared with ERFNet without data enhancement, our method (Better-CycleGAN+ERFNet) performs better in low-light conditions such as night and shadow, where the F1 measure increases from 67.1 to 69.4 and from 71.3 to 76.2, respectively. This indicates that the light conditions style transfer method helps the lane detection performance of ERFNet in low-light conditions. At the same time, light conditions style transfer also helps our lane detection model perform better in other scenarios, increasing the total F1 measure from 73.1 to 73.9. Specifically, our method increases the F1 measure from 45.1 to 46.1 in the no-line scenario and from 66.0 to 66.4 in the dazzle light scenario. This shows that our method not only helps ERFNet detect lanes better in low-light conditions but also prompts ERFNet to understand lane markings in other challenging scenarios, making the trained network more robust.

We also include the CycleGAN+ERFNet method in the comparison, which shows that the proposed method is superior to CycleGAN+ERFNet in most traffic scenes. This is because the low-light images generated by Better-CycleGAN are more realistic than those generated by CycleGAN, which reduces the impact of noise in the generated images on the network.

The probability maps output by the three methods above in low-light conditions (night, shadow) are shown in Fig. 6. The probability maps generated by our method are clearer and more accurate, indicating that our method captures the characteristics of lane markings better in low-light conditions.

IV-E Ablation Study

Generated Images vs. Real Images To compare the influence of generated and real images, 13,000 low-light images from the original CULane training set and 13,000 low-light images generated by Better-CycleGAN are used as two training sets, and the CULane test set is used for evaluation. The results, shown in Fig. 7, indicate that the model trained on generated images converges, and overfits, faster. Data collected in real environments contain interference and noise that cannot be eliminated, whereas the images generated by Better-CycleGAN are easier for the lane detection model to understand. In addition, since deep learning models fit a probability distribution, the images generated by Better-CycleGAN may lie closer to the distribution of actual traffic scenarios, so that the data-driven detection model can understand different lane markings better.

Light Conditions Style Transfer vs. Retinex Retinex theory assumes that a low-light image can be enhanced by removing the illumination and keeping only the reflectance as the final result. We therefore use Retinex to enhance the low-light images of the CULane test set and then run ERFNet for lane detection. The results show that this approach is not as effective as ours. The essence of Retinex is a conversion between the image domain and the logarithmic domain, which can easily distort slender targets such as lane markings. In addition, the inference computation of Retinex cannot meet the requirement of real-time detection.
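For context, a minimal single-scale Retinex sketch is shown below; the Gaussian-blur illumination estimate and the sigma value are assumptions about the variant used, not the exact setup of this comparison.

```python
import cv2
import numpy as np

def single_scale_retinex(image_bgr, sigma=80):
    # Estimate illumination with a large Gaussian blur and remove it in the
    # log domain, keeping only the reflectance (single-scale Retinex).
    img = image_bgr.astype(np.float32) + 1.0           # avoid log(0)
    illumination = cv2.GaussianBlur(img, (0, 0), sigma)
    reflectance = np.log(img) - np.log(illumination)
    # Rescale the result to a displayable 8-bit range.
    reflectance = cv2.normalize(reflectance, None, 0, 255, cv2.NORM_MINMAX)
    return reflectance.astype(np.uint8)
```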

Fig. 7: The training process of ERFNet on real images and generated images. The ordinate is the mIoU on the validation set, and the abscissa is the epoch.

Image Amounts to Generate We investigate how many generated images are best for our lane detection model. Let N be the ratio of generated images to real low-light images; we take N = 0.25, 0.5, 1, 2, and 4 for the comparative experiment. The results are shown in Table II. Although more low-light images are generated, the F1 measure in low-light conditions stops rising for larger N, and performance even drops in other scenes. We therefore cannot conclude that generating more images always yields better performance: too many low-light images no longer match the distribution of actual traffic scenarios, which biases the trained model toward low-light conditions. In this experiment, N = 1 is the best ratio for generating images by light conditions style transfer. In conclusion, an appropriate number of generated images is necessary for better lane detection performance.

Category N=0.25 N=0.5 N=1 N=2 N=4
Normal 91.5 91.7 91.8 91.7 91.5
Crowded 71.3 72.1 71.8 71.9 71.6
Night 67.5 68.9 69.4 68.7 69.4
No Line 45.0 45.5 46.1 46.6 45.6
Shadow 71.8 72.3 76.2 63.2 69.3
Arrow 87.1 87.0 87.8 87.0 86.9
Dazzle Light 66.0 66.3 66.4 65.1 65.1
Curve 71.6 67.5 72.4 65.8 70.5
Crossroad 3300 2861 2346 2393 2620
Total 73.2 73.7 73.9 73.4 73.4
TABLE II: F1-measure for different values of N on the CULane test set. For the crossroad category, only FP is shown.

V Acknowledgment

This work was partly supported by National Natural Science Foundation of China (Grant No. NSFC 61473042).

VI Conclusions

Lane detection can be challenging in low-light conditions. In this paper, we propose a style-transfer-based data enhancement method for lane detection in low-light conditions. Our method uses the proposed Better-CycleGAN to generate images in low-light conditions, improving the environmental adaptability of the lane detector without additional annotations or extra inference computation. We validated our method on CULane, showing that it not only helps ERFNet detect lanes better in low-light conditions but also gives ERFNet a better understanding of lane markings in other challenging scenarios. Since we do not jointly train Better-CycleGAN and ERFNet, the benefit of light conditions style transfer to ERFNet may be limited; we would like to explore joint training in future work.

References

  • [1] M. Bertozzi and A. Broggi, “Gold: A parallel real-time stereo vision system for generic obstacle and lane detection,” IEEE transactions on image processing, vol. 7, no. 1, pp. 62–81, 1998.
  • [2] S. Lee, J. Kim, J. Shin Yoon, S. Shin, O. Bailo, N. Kim, T.-H. Lee, H. Seok Hong, S.-H. Han, and I. So Kweon, "Vpgnet: Vanishing point guided network for lane and road marking detection and recognition," in The IEEE International Conference on Computer Vision (ICCV), Oct 2017.
  • [3] X. Pan, J. Shi, P. Luo, X. Wang, and X. Tang, "Spatial as deep: Spatial cnn for traffic scene understanding," in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
  • [4] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in neural information processing systems, 2014, pp. 2672–2680.
  • [5] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2223–2232.
  • [6] H.-Y. Cheng, B.-S. Jeng, P.-T. Tseng, and K.-C. Fan, “Lane detection with moving vehicles in the traffic scenes,” IEEE Transactions on intelligent transportation systems, vol. 7, no. 4, pp. 571–582, 2006.
  • [7] M. Aly, “Real time detection of lane markers in urban streets,” in 2008 IEEE Intelligent Vehicles Symposium.   IEEE, 2008, pp. 7–12.
  • [8] H. Jung, J. Min, and J. Kim, “An efficient lane detection algorithm for lane departure detection,” in 2013 IEEE Intelligent Vehicles Symposium (IV).   IEEE, 2013, pp. 976–981.
  • [9] Z. Kim, “Robust lane detection and tracking in challenging scenarios,” IEEE Transactions on Intelligent Transportation Systems, vol. 9, no. 1, pp. 16–26, 2008.
  • [10] A. Borkar, M. Hayes, and M. T. Smith, “A novel lane detection system with efficient ground truth generation,” IEEE Transactions on Intelligent Transportation Systems, vol. 13, no. 1, pp. 365–374, 2011.
  • [11] D. Neven, B. De Brabandere, S. Georgoulis, M. Proesmans, and L. Van Gool, “Towards end-to-end lane detection: an instance segmentation approach,” in 2018 IEEE intelligent vehicles symposium (IV).   IEEE, 2018, pp. 286–291.
  • [12] M. Ghafoorian, C. Nugteren, N. Baka, O. Booij, and M. Hofmann, “El-gan: Embedding loss driven generative adversarial networks for lane detection,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 0–0.
  • [13] B. De Brabandere, W. Van Gansbeke, D. Neven, M. Proesmans, and L. Van Gool, “End-to-end lane detection through differentiable least-squares fitting,” arXiv preprint arXiv:1902.00293, 2019.
  • [14] Y. Hou, Z. Ma, C. Liu, and C. C. Loy, “Learning lightweight lane detection cnns by self attention distillation,” in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 1013–1021.
  • [15] M. Mirza and S. Osindero, “Conditional generative adversarial nets,” arXiv preprint arXiv:1411.1784, 2014.
  • [16] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
  • [17] X. Wang, A. Shrivastava, and A. Gupta, “A-fast-rcnn: Hard positive generation via adversary for object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2606–2615.
  • [18] L. Liu, M. Muelly, J. Deng, T. Pfister, and L.-J. Li, “Generative modeling for small-data object detection,” in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 6073–6081.
  • [19] V. F. Arruda, T. M. Paixão, R. F. Berriel, A. F. De Souza, C. Badue, N. Sebe, and T. Oliveira-Santos, “Cross-domain car detection using unsupervised image-to-image translation: From day to night,” in 2019 International Joint Conference on Neural Networks (IJCNN).   IEEE, 2019, pp. 1–8.
  • [20] E. Romera, J. M. Alvarez, L. M. Bergasa, and R. Arroyo, “Erfnet: Efficient residual factorized convnet for real-time semantic segmentation,” IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 1, pp. 263–272, 2017.