Sim-to-Real Domain Adaptation for Lane Detection and Classification in Autonomous Driving

02/15/2022
by Chuqing Hu, et al.
University of Waterloo

While supervised detection and classification frameworks in autonomous driving require large labelled datasets to converge, Unsupervised Domain Adaptation (UDA) approaches, facilitated by synthetic data generated from photo-real simulated environments, are considered low-cost and less time-consuming solutions. In this paper, we propose UDA schemes using adversarial discriminative and generative methods for lane detection and classification applications in autonomous driving. We also present the Simulanes dataset generator, which creates naturalistic synthetic datasets by utilizing CARLA's vast traffic scenarios and weather conditions. The proposed UDA frameworks take the synthesized, labelled dataset as the source domain, whereas the target domain is the unlabelled real-world data. Using adversarial generative and feature discriminators, the learnt models are tuned to predict the lane location and class in the target domain. The proposed techniques are evaluated using both real-world and our synthetic datasets. The results show that the proposed methods outperform other baseline schemes in terms of detection and classification accuracy and consistency. The ablation study reveals that the size of the simulation dataset plays an important role in the classification performance of the proposed methods. Our UDA frameworks are available at https://github.com/anita-hu/sim2real-lane-detection and our dataset generator is released at https://github.com/anita-hu/simulanes


I Introduction

As simulators become increasingly photorealistic and simulation platforms become highly flexible in terms of sensor configuration and environmental setting, synthetic data has growing potential for filling gaps in existing real-world datasets [2, 27]. Nevertheless, a domain difference exists between the appearance of simulation and real-world data [28], which domain adaptation techniques can be employed to minimize. While domain adaptation is an extensively researched topic in digit recognition [20, 10], object classification [29, 5] and, in recent years, sim-to-real object detection [11] and semantic segmentation [31, 27], few works [14, 9] explore its potential utilization in lane detection for autonomous driving applications.

Lane detection is crucial in autonomous driving systems since it serves as the foundation for path planning decisions and is utilized for vehicle localization in high-resolution maps. Lane detection is also widely used in Advanced Driver Assistance Systems (ADAS) such as lane keeping assist and adaptive cruise control [3]. The state-of-the-art works in lane detection [19, 1] focus on evaluating against existing open-source datasets. However, training a robust neural network requires a large dataset with labelled data covering a wide range of scenarios and environment conditions. Gathering such a dataset is costly and time consuming, and for tasks such as lane detection and classification, the availability of such datasets can be a limiting factor. Hence, Unsupervised Domain Adaptation (UDA) methods can be utilized in this situation.

UDA techniques are designed to transfer knowledge from a labelled source domain to a similar target domain that is fully unlabelled by addressing the domain shift between the two domains [33]. Motivated by the recent work by [9, 34] in unsupervised and semi-supervised domain adaptation for learning tile-based lane representations in a top view image, our work explores domain adaptation techniques applied in end-to-end lane detection. Utilizing CARLA [6], we introduce Simulanes, a synthetic data generator for lane detection and classification, which is capable of generating photorealistic traffic scenarios under a variety of weather conditions. Built upon the Unreal Engine, Simulanes addresses the limitation in photorealism of generated synthetic data from [8] used in [9].

In [4], the domain-invariant latent space learnt in UNIT [20] through image translation and reconstruction is exploited to transfer an end-to-end driving policy from a simulation domain to an unlabelled real-world domain. Inspired by their work and recent successes in adversarial methods for domain adaptation [34], in this paper we focus on developing unsupervised domain adaptation techniques using adversarial generative and adversarial discriminative approaches for lane detection and classification in autonomous driving. Our proposed method performs end-to-end lane detection and classification on the domain-invariant latent space of UNIT and MUNIT [13]. To further encourage the latent features to be domain-invariant, we introduce adversarial discriminators that predict the domain of the latent feature. To the best of the authors’ knowledge, this paper is the first to apply sim-to-real domain adaptation to lane detection and classification tasks. In summary, this paper offers the following contributions:

  • unsupervised domain adaptation techniques using adversarial discriminative and generative methods are proposed for lane detection and classification enhancement in autonomous driving.

  • a new synthetic data generator, Simulanes, for lane detection and classification applications is implemented with CARLA, taking into account the heterogeneity of traffic conditions, weather and the surrounding environment.

  • the proposed detection and classification frameworks are evaluated using TuSimple [30] and our synthetic dataset, and an ablation study on the effect of simulation dataset size on model performance is carried out.

II Related Work

II-A Unsupervised Domain Adaptation

While early UDA methods match feature distributions between source and target domains, deep UDA methods focus on learning domain-invariant features [34]. These methods can be grouped into discrepancy-based, adversarial discriminative, adversarial generative, and self-supervision methods, or a combination of these approaches. Discrepancy-based methods, such as [29, 5], introduce a loss function to minimize the discrepancy between the prediction and/or activation layers from source and target streams. Adversarial discriminative methods achieve a similar goal of minimizing domain shift using an adversarial objective [34]. For instance, the authors in [31] use a discriminator to align feature distributions, whereas Ganin et al. [7] minimize cross-covariance between both features and class predictions. Instead of explicitly comparing source and target representations, self-supervision-based methods add auxiliary self-supervised learning task(s), e.g. image rotation prediction in [32], to help close the domain gap. Adversarial generative methods combine a generative adversarial network (GAN) with discriminator(s) at the image and/or feature level. CycleGAN [35] is adopted in [12] for semantic segmentation adaptation by utilizing feature and image discriminators and enforcing cycle consistency. [18] uses two feature translators with cycle consistency and conditions the adversarial networks with the cross-covariance of learned features and classifier predictions as in [21]. While discrepancy-based and self-supervision-based methods are easier to optimize, for more challenging tasks like object detection and semantic segmentation, adversarial learning-based methods are more effective due to their strength in local feature alignment [34].

II-B Unsupervised Image-to-Image Translation

Unsupervised image-to-image methods such as [20, 13, 17] learn a shared latent space in order to translate an image from one domain to another. [20] uses variational autoencoders (VAEs) to map images to the shared latent space while GANs generate corresponding images in the two domains. To increase the diversity of generated images, [13, 17] break down an image representation into a domain-invariant content code and a domain-specific style code. [20] and [17] explore different approaches when applying their method to domain adaptation. [20] employs a multi-task learning framework where the network learns image translation and classification using high-level features in the discriminator. [17] takes a two-stage approach by first training an image translation network from source to target and then training a classifier with translated images in the labelled source domain.

II-C Sim-to-Real for Lane Detection

Recent works, [14] and [9], investigate domain adaptation for lane detection, leveraging synthetic data to enhance the available real-world data. [14] improves real-world performance in autonomous driving perception tasks, including lane detection, by mixing GAN-translated simulation images with labelled real-world data during training. Garnett et al. [9] is the first to apply unsupervised and semi-supervised domain adaptation for lane detection, using a two-stage approach similar to [17] with CycleGAN [35]. In the second stage, [9] trains lane detection in conjunction with lane segmentation and self-supervised view orientation prediction tasks. [9] adopts a segment-based approach to lane detection which requires clustering on the model output for lane-based evaluation. By contrast, our work explores an end-to-end approach to lane detection using row anchors as in [23, 25]. [9] uses the method in [8] to generate scenes with randomized lane topology, road textures and objects within Blender as labelled training data. While this approach is capable of generating scenes with high variability in lane topology and 3D geometry, it has limited variety in the appearance of lanes and the surrounding environment. We aim to fill this gap with our simulation framework using CARLA [6] to generate photo-realistic scenes within urban, rural and highway environments.

III Method

The challenge of utilizing simulation data to develop a model for a task without real-world labels can be formulated as a UDA problem. Here, the source domain is a labelled dataset generated from simulation and the target domain is a real-world dataset without any labels. Naturally, the images from simulation and the images from the real-world dataset are unpaired. The goal of the learned model is to correctly predict the labels of the target domain.

III-A Simulanes: Simulation Data Generator

Real-world driving is heterogeneous, with diverse traffic conditions, weather, and surrounding environments. Thus, diversity in the simulated scenarios is crucial for the model to adapt well to the real world. There are many open-source simulators for autonomous driving, with CARLA and LGSVL [26] regarded as the state of the art for end-to-end testing with high-quality simulation environments [15]. In this paper, we chose CARLA for generating our simulation dataset due to its rich content of premade maps covering urban, rural and highway scenarios, in addition to its flexible Python API.

Fig. 1: Sample images from Simulanes showing highway, rural and urban scenarios with varied weather.

Our simulation data generator, Simulanes, generates various simulation scenarios in urban, rural and highway environments with 15 lane classes and dynamic weather. Figure 1 shows samples from our synthetic dataset. Pedestrian and vehicle actors are randomly generated and placed to roam on the maps, increasing the difficulty of the dataset through occlusions. Following TuSimple [30] and CULane [23], we limit the maximum number of lanes to the 4 adjacent to the vehicle and use row anchors for the labels. To generate a dataset with $N$ frames, we divide the frames evenly across all available maps. For each map, the vehicle actor is spawned at a random location and roams randomly. Dynamic weather is achieved by smoothly varying the sun's position with time as a sinusoidal function and occasionally generating storms, which affect the look of the environment through variables such as cloudiness, precipitation and precipitation deposits. To avoid saving multiple frames at the same location, we check that the vehicle has moved from the previous frame location and respawn the vehicle if it has stopped for too long.
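
The released Simulanes repository contains the full implementation; below is only a minimal, illustrative sketch of how such a dynamic-weather loop can be driven through the CARLA Python API, assuming a carla.World handle. The update period, storm probability and parameter ranges are assumptions for illustration, not the values used by Simulanes.

```python
import math
import random

import carla  # CARLA Python API


def update_weather(world: carla.World, elapsed_s: float) -> None:
    """Illustrative dynamic-weather update (assumed parameters)."""
    weather = world.get_weather()

    # Smooth day/night cycle: sun altitude follows a sinusoid over time.
    weather.sun_altitude_angle = 70.0 * math.sin(2.0 * math.pi * elapsed_s / 600.0)

    # Occasionally trigger a storm, which changes the look of the scene
    # through cloudiness, precipitation and precipitation deposits.
    if random.random() < 0.001:
        weather.cloudiness = random.uniform(60.0, 100.0)
        weather.precipitation = random.uniform(40.0, 90.0)
        weather.precipitation_deposits = random.uniform(30.0, 80.0)

    world.set_weather(weather)
```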

III-B Lane Detection Model

As we apply the proposed sim-to-real algorithm for lane detection, we adopt an end-to-end approach and use Ultra-Fast-Lane-Detection (UFLD) [25] as our base network. We chose UFLD due to its lightweight architecture, achieving 300+ FPS with the same input resolution while having comparable performance to state-of-the-art methods. UFLD formulates the lane detection task as a row-based selecting method where each lane is represented by a series of horizontal positions at predefined rows, i.e., row anchors. For each row anchor, the position is divided into $w$ gridding cells. For the $i$-th lane and $j$-th row anchor, the prediction of the position becomes a classification problem where the model outputs the probability, $P_{i,j,:}$, of selecting each of the $w+1$ gridding cells. The extra dimension in the output indicates the absence of a lane. The lane location loss is given by:

$$L_{loc} = \sum_{i=1}^{C}\sum_{j=1}^{h} L_{CE}\left(P_{i,j,:}, T_{i,j,:}\right) \quad (1)$$

where $C$ is the maximum number of lanes, $h$ is the number of row anchors, $L_{CE}$ is the cross entropy loss, and $T_{i,j,:}$ is the one-hot label of the correct position of the $i$-th lane at the $j$-th row anchor. To ensure that the predicted lanes are continuous, a similarity loss $L_{sim}$ is added to constrain the distribution of classification vectors over adjacent row anchors. This is done by calculating the L1 norm as follows:

$$L_{sim} = \sum_{i=1}^{C}\sum_{j=1}^{h-1} \left\lVert P_{i,j,:} - P_{i,j+1,:} \right\rVert_{1} \quad (2)$$

An auxiliary segmentation branch is proposed in [25] to model local features by aggregating features at multiple scales. Following UFLD, cross entropy loss is used for the segmentation loss $L_{seg}$. For lane classification, a small branch with fully connected (FC) layers is added, which receives the same features as the FC layers for lane location prediction. The lane classification loss $L_{cls}$ also uses cross entropy loss.

The overall supervised lane detection and classification task loss is formulated as:

$$L_{task} = L_{loc} + \lambda_{sim} L_{sim} + \lambda_{seg} L_{seg} + \lambda_{cls} L_{cls} \quad (3)$$

where $\lambda_{sim}$, $\lambda_{seg}$ and $\lambda_{cls}$ are loss coefficients.
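
For concreteness, the following is a minimal PyTorch sketch of the location loss (1) and similarity loss (2). The tensor layout, function name and the use of a mean rather than a sum are assumptions for illustration, not the UFLD reference implementation.

```python
import torch
import torch.nn.functional as F


def lane_location_losses(logits: torch.Tensor, targets: torch.Tensor):
    """Sketch of Eqs. (1)-(2).
    logits:  (B, C, h, w+1) scores over w gridding cells plus a 'no lane' bin,
             for C lanes and h row anchors (assumed layout).
    targets: (B, C, h) integer index of the correct cell (w means absent)."""
    bins = logits.shape[-1]

    # Eq. (1): cross entropy over gridding cells for every lane / row anchor.
    loc_loss = F.cross_entropy(logits.reshape(-1, bins), targets.reshape(-1))

    # Eq. (2): L1 distance between classification vectors of adjacent rows.
    probs = logits.softmax(dim=-1)
    sim_loss = (probs[:, :, 1:, :] - probs[:, :, :-1, :]).abs().mean()

    return loc_loss, sim_loss
```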

Fig. 2: Proposed adversarial generative (A) and adversarial discriminative methods (B). Both UNIT and MUNIT are represented in (A) with generator inputs shown for image translation. MUNIT’s additional style input is shown in blue dotted lines. For simplicity, MUNIT’s style encoder output is omitted as it is not used in image translation.

III-C Adversarial Generative and Discriminative Models

To mitigate domain shift in the UDA setting, we employ an adversarial generative approach using UNIT and MUNIT, and an adversarial discriminative approach using a feature discriminator. Our proposed architectures are presented in Figure 2.

Adversarial generative

The UNIT framework utilizes an encoder-generator pair $(E_i, G_i)$ for each image domain $X_i$, $i \in \{1, 2\}$. The encoder $E_i$ is a VAE that maps an image $x_i$ to a code $z_i$ in the shared latent space $\mathcal{Z}$. The generator then reconstructs the input image from a randomly perturbed version of the latent code. Image translation or reconstruction is implemented as follows:

$$z_i = E_i(x_i) \quad (4)$$

$$\hat{x}_{i \to j} = G_j(z_i + \eta) \quad (5)$$

where $\eta$ is random noise sampled from $\mathcal{N}(0, I)$ and is only added during training. Image translation is when $x_i$ and $\hat{x}_{i \to j}$ are from different domains ($i \neq j$), whereas image reconstruction is when they are from the same domain ($i = j$).
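
A compact sketch of Eqs. (4)-(5), with hypothetical encoder_src and generator_tgt modules standing in for the UNIT encoder and generator:

```python
import torch


def translate(encoder_src, generator_tgt, x_src, training=True):
    """Eq. (4): encode to the shared latent space; Eq. (5): perturb with
    Gaussian noise during training and decode in the target domain."""
    z = encoder_src(x_src)
    eta = torch.randn_like(z) if training else torch.zeros_like(z)
    return generator_tgt(z + eta)
```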

MUNIT is similar to UNIT in that an encoder-generator pair is used for each domain. However, the encoder consists of a context encoder and a style encoder. The two encoders map the image $x_i$ to a context code $c_i$ and a style code $s_i$. The generator reconstructs input images using the context code and style code. Image translation is done using the context code and a random style code drawn from a Gaussian distribution $\mathcal{N}(0, I)$.

To ensure cyclic consistency, cyclic reconstruction is used, where the input is translated to the other domain and then back. Image reconstruction and cyclic reconstruction are learned via the losses $L_{recon}$ and $L_{cyc}$, which calculate the L1 loss, i.e. the mean absolute error, between the reconstructed image and the original image.

The encoder-generator pairs are trained along with two domain adversarial discriminators $D_1$ and $D_2$. The discriminator $D_i$ learns to predict true for real images in domain $X_i$ and false for the images generated by $G_i$. The image translation task optimizes the encoder-generator and discriminator using the Least-Squares Generative Adversarial Network (LSGAN) objective from [22]. Here the generator $G_i$ learns to generate images that resemble real images from domain $X_i$ to trick the corresponding discriminator $D_i$. This is formulated as:

$$L_{D_i} = \mathbb{E}_{x_i \sim X_i}\left[(D_i(x_i) - 1)^2\right] + \mathbb{E}_{x_j \sim X_j}\left[D_i(\hat{x}_{j \to i})^2\right] \quad (6)$$

$$L_{G_i} = \mathbb{E}_{x_j \sim X_j}\left[(D_i(\hat{x}_{j \to i}) - 1)^2\right] \quad (7)$$

To ensure the translated image contains similar semantics to the original, we compare their features using a pretrained VGG16 network. The original and translated images are passed through the VGG network, and the features from the last convolution layer are normalized with Instance Norm to compute the perceptual loss $L_{vgg}$, which measures the feature difference via a Mean Squared Error (MSE) loss.

The lane detection task is learned with the domain-invariant latent features as input. The detector is a Convolutional Neural Network (CNN) that is trained in a supervised manner with the task loss $L_{task}$ and the cyclic task loss $L_{task\text{-}cyc}$ using labelled simulation data. $L_{task\text{-}cyc}$ is the cyclic task loss where the simulation image is first translated to the real domain, then encoded using the real-domain encoder and passed to the detector. For the UNIT framework, the total loss for the encoder-generator pairs and the lane detector is the sum of these losses weighted by coefficients $\lambda$:

$$L_{total} = \lambda_{GAN} L_{GAN} + \lambda_{recon} L_{recon} + \lambda_{cyc} L_{cyc} + \lambda_{vgg} L_{vgg} + \lambda_{task} L_{task} + \lambda_{task\text{-}cyc} L_{task\text{-}cyc} \quad (8)$$

Different from UNIT, MUNIT adds context code and style code reconstruction losses on top of $L_{total}$ defined for UNIT,

$$L_{total\text{-}MUNIT} = L_{total} + \lambda_{c} L_{c\text{-}recon} + \lambda_{s} L_{s\text{-}recon} \quad (9)$$

where $\lambda_{c}$ and $\lambda_{s}$ are used to control the weight of these loss terms. See MUNIT [13] for details of these losses.

Adversarial discriminative

We chose to implement a feature discriminator following ADA [31]. The feature discriminator aligns the marginal feature distributions of the source and target domains, hence increasing the performance of the detector whose decision boundaries are optimised on the source domain. The discriminator $D_f$ is optimized jointly with the encoder through the adversarial losses:

$$L_{D_f} = -\mathbb{E}_{z_r}\left[\log D_f(z_r)\right] - \mathbb{E}_{z_s}\left[\log\left(1 - D_f(z_s)\right)\right] \quad (10)$$

$$L_{adv} = -\mathbb{E}_{z_s}\left[\log D_f(z_s)\right] - \mathbb{E}_{z_r}\left[\log\left(1 - D_f(z_r)\right)\right] \quad (11)$$

where $z_r$ and $z_s$ are the latent features of real and simulation images, respectively. Here the encoder is trained to maximize the domain confusion of the discriminator. In our experiments, we found that adding a feature discriminator also improves the performance of the adversarial generative method. Thus, the encoder-generator loss defined in Equation 8 is modified to:

$$L_{total\text{-}adv} = L_{total} + \lambda_{adv} L_{adv} \quad (12)$$
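
The following is a hedged PyTorch sketch of one adversarial-discriminative update in the spirit of Eqs. (10)-(12). The module names, the binary cross-entropy formulation, the simplified task loss and the weight lam_adv are assumptions for illustration, not the released implementation.

```python
import torch
import torch.nn.functional as F


def ada_step(encoder, detector, feat_disc, x_sim, y_sim, x_real, lam_adv=0.1):
    """One illustrative update: supervised task loss on simulation features
    plus an adversarial loss pushing sim and real features to be
    indistinguishable (names, loss form and weight are assumptions)."""
    z_sim, z_real = encoder(x_sim), encoder(x_real)

    # Feature discriminator: real-domain features -> 1, simulation -> 0.
    d_real = feat_disc(z_real.detach())
    d_sim = feat_disc(z_sim.detach())
    disc_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
                 + F.binary_cross_entropy_with_logits(d_sim, torch.zeros_like(d_sim)))

    # Encoder/detector: supervised loss on labelled simulation data
    # (simplified to a plain cross entropy here) plus domain confusion.
    task_loss = F.cross_entropy(detector(z_sim), y_sim)
    d_sim_enc = feat_disc(z_sim)
    confusion = F.binary_cross_entropy_with_logits(d_sim_enc, torch.ones_like(d_sim_enc))
    enc_loss = task_loss + lam_adv * confusion

    return disc_loss, enc_loss
```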

IV Experiment

In this section, we evaluate our method on TuSimple, a widely used real-world lane detection dataset, with classification labels for the training and validation sets provided by [24]. As described in Section III, the evaluated methods do not use labels from the real-world dataset during training, only real-world images and labelled simulation data generated with Simulanes. By default, the number of images in the simulation dataset matches the number of training images in the real-world dataset. In our ablation studies, we further explore the effect of simulation dataset size on performance. All experiments were run 3 times with different random seeds, and the mean and standard deviation are reported.

IV-A Experimental Setup

Dataset

TuSimple has 6,408 frames with 1280×720 resolution, which are split into 3,268 training, 358 validation, and 2,782 test images. TuSimple contains daytime highway scenarios in fair weather with varied traffic conditions.

Evaluation metrics

For TuSimple, the official evaluation metric is accuracy, described by:

$$accuracy = \frac{\sum_{clip} C_{clip}}{\sum_{clip} S_{clip}} \quad (13)$$

where $C_{clip}$ is the number of correctly predicted points in each frame of the clip and $S_{clip}$ is the number of ground truth points in each frame of the clip. A correctly predicted point is within a width threshold of the ground truth.
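
Equation (13) reduces to a ratio of point counts; a minimal sketch (the per-frame counting of correct points within the width threshold is assumed to happen upstream):

```python
def tusimple_accuracy(correct_per_frame, gt_per_frame):
    """Eq. (13): correctly predicted lane points (within the width threshold)
    divided by ground-truth points, summed over the frames of a clip."""
    return sum(correct_per_frame) / sum(gt_per_frame)


# Example: 45 and 48 correct points out of 50 ground-truth points per frame.
print(tusimple_accuracy([45, 48], [50, 50]))  # 0.93
```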

For classification, we follow the two classes used in [24]: dashed and continuous. Since TuSimple and CARLA have different lane classes available, we mapped each separately into the two classes. For TuSimple, we used the same mapping as [24], and for the additional CARLA classes, we mapped solid-dashed lane combinations as continuous as well.
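
As an illustration of this mapping, a sketch with assumed simulator class names (the exact set of CARLA lane classes used is not listed here):

```python
# Hypothetical simulator class names mapped to the two evaluation classes.
SIM_TO_EVAL_CLASS = {
    "broken": "dashed",
    "solid": "continuous",
    "solid_solid": "continuous",
    # Solid-dashed combinations are also mapped to "continuous".
    "solid_broken": "continuous",
    "broken_solid": "continuous",
}
```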

Benchmarks

Our baseline model is the UFLD model with a classification branch, as described in Section III-B. The most direct method is to train the baseline model on the synthetic dataset without any domain adaptation. Another baseline takes a two-stage approach where UNIT or MUNIT is first trained to translate images from simulation to the real world, and the baseline model is then trained on the translated simulation images. As an upper bound, we compare to the baseline model trained on the TuSimple dataset with labels.

Implementation details

For the lane model, we use the same gridding cell and row anchor configuration as [25], and the input image is resized to 288×800. During training, data augmentation consisting of rotation and vertical and horizontal shifts is applied. The networks are optimized using ADAM [16] with a cosine decay learning rate schedule for a maximum of 100 epochs. The lane detector and discriminators use an initial learning rate of 0.0004 while the encoder-generator pairs use 0.0001. For the generative approach, we found that a linear learning rate warmup from 0 to 0.0004 over 25 epochs was optimal. Fixed coefficients are used for the encoder-generator loss weights, the lane detector loss weights and the MUNIT-specific loss weights. The validation set lane detection accuracy is used to select the best model for testing.
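
A sketch of an optimizer and learning-rate schedule consistent with the description above (Adam, linear warmup, cosine decay). Composing warmup and decay through a single LambdaLR is an illustrative choice, not necessarily how the released code does it.

```python
import math
import torch


def build_optimizer(params, base_lr=4e-4, warmup_epochs=25, max_epochs=100):
    """Adam with linear warmup to base_lr followed by cosine decay."""
    optimizer = torch.optim.Adam(params, lr=base_lr)

    def lr_lambda(epoch):
        if epoch < warmup_epochs:
            return epoch / float(warmup_epochs)            # linear warmup from 0
        progress = (epoch - warmup_epochs) / float(max_epochs - warmup_epochs)
        return 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay to 0

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```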

IV-B Results

Table I summarizes our results on TuSimple. The detection accuracy was obtained using the test set, while the classification accuracy was only generated on the validation set, as the test set does not have classification labels. Compared to detection accuracy, the higher variance in classification scores could be due to the small validation set used to generate these results. During training, the models were saved based on the highest lane detection accuracy on the validation set, without considering the classification accuracy. Two-stage models consistently underperform other methods, by 4 to 6 percentage points in detection accuracy. Classification accuracy suffers more, with both two-stage models averaging around 18% worse. This poor performance is likely due to the translator not preserving the small lane features in the content of the image during the first training stage, as the lane task losses are not included. Qualitatively, the translated images show very little contrast between the road and lane markings. The lanes are difficult to identify in the translated image, meaning that the translation step makes the detection task more difficult for the downstream network.

MUNIT consistently outperforms UNIT in detection and classification in the proposed method and in image translation. This can be a result of separating the context and style codes in the latent space allowing MUNIT to have a slightly larger feature space to store context information for the detector. Our adversarial discriminative method (ADA) outperforms the adversarial generative methods. This may be due to the small size of TuSimple, making it more difficult for generative methods to converge. Thus, ADA only optimizing for feature similarity via the discriminator was an easier task. Lastly, we observe that the proposed methods perform similarly in detection but outperform the baseline trained on simulation data by a margin in classification. This suggests that our simulation dataset is close to TuSimple in terms of lane location but more different in lane appearance, creating a bigger domain gap for classification. This is further demonstrated in Figure 3.

Model                      Det-Acc (mean ± stddev)   Cls-Acc (mean ± stddev)
Baseline trained on sim    82.60 ± 1.177             42.25 ± 6.358
S2R translation (UNIT)     77.55 ± 4.485             34.19 ± 0.772
S2R translation (MUNIT)    78.61 ± 1.371             38.57 ± 7.836
Adv. discriminative        82.90 ± 0.069             55.31 ± 3.829
MUNIT + adv. dis.          82.28 ± 1.605             53.70 ± 1.228
UNIT + adv. dis.           81.62 ± 0.648             55.77 ± 3.847
Baseline trained on real   95.38 ± 0.130             37.08 ± 0.185
TABLE I: Comparison of our proposed methods and baselines on TuSimple lane detection and classification.
Fig. 3: Sample images from TuSimple test set. Baseline predictions on the left column and ADA on the right. The lane classes are labelled using red and green for solid and dashed respectively.

IV-C Ablation Study

For our ablation study, we examine the effect of the simulation dataset size on test set performance. Since generating more images for the simulation dataset is far more feasible than collecting and labelling real-world images, an increase in test set performance from simply increasing the simulation dataset size would be beneficial. Since the adversarial discriminative approach was found to be our best performing method, we ran this study with ADA and the baseline that is trained only on labelled simulation data. For the study, we generated a simulation dataset 5x the size of the TuSimple training set, resulting in a dataset of 16,344 images. We then trained the baseline and ADA on 5 subsets of this dataset, each with the number of images being an integer multiple of the TuSimple training set size, ranging from 1x to 5x. Since ADA requires a real image and a simulated image at each training step, we trained with a 1x number of simulation images at each epoch but re-sampled the subset between epochs for both models. The results for detection and classification accuracy are shown in Figures 4 and 5, respectively. The dark line represents the mean and the shaded area shows the standard deviation from the mean in both directions. We observe a general increase in both detection and classification accuracy as the simulation dataset size increases.

Fig. 4: Test detection accuracy with increasing simulation dataset size.
Fig. 5: Validation classification accuracy with increasing simulation dataset size.

V Conclusions

In this paper, we propose unsupervised domain adaptation techniques using adversarial generative and feature discriminative approaches for lane detection and classification applications in autonomous driving. To facilitate UDA, we introduced Simulanes, a simulation data generator for lane detection and classification using CARLA. Using our synthetic dataset and TuSimple, we evaluated our proposed methods against benchmarks in unsupervised sim-to-real domain adaptation. We observed that with a fixed number of real-world images, we can improve detection and classification performance by increasing the number of synthetic images available to the model. The Simulanes dataset generation tool can be used in future work for evaluating sim-to-real domain adaptation methods and for producing lane detection models without the use of real-world labelled data. In future work, we aim to extend the learnt UDA-based models to more complicated scenarios, e.g. the night images presented in [23].

Acknowledgment

This work was supported by NSERC CRD 537104-18, in partnership with General Motors Canada and the SAE AutoDrive Challenge.

References

  • [1] H. Abualsaud, S. Liu, D. B. Lu, K. Situ, A. Rangesh, and M. M. Trivedi (2021) LaneAF: robust multi-lane detection with affinity fields. IEEE Robotics and Automation Letters 6 (4), pp. 7477–7484. External Links: Document Cited by: §I.
  • [2] N. Alshammari, S. Akcay, and T. P. Breckon (2021) Competitive simplicity for multi-task learning for real-time foggy scene understanding via domain adaptation. In 2021 IEEE Intelligent Vehicles Symposium (IV), pp. 1413–1420. External Links: Document Cited by: §I.
  • [3] A. Bar Hillel, R. Lerner, D. Levi, and G. Raz (2014-04-01) Recent progress in road and lane detection: a survey. Machine Vision and Applications 25 (3), pp. 727–745. External Links: ISSN 1432-1769, Document Cited by: §I.
  • [4] A. Bewley, J. Rigley, Y. Liu, J. Hawke, R. Shen, V. Lam, and A. Kendall (2018) Learning to drive from simulation without real world labels. CoRR abs/1812.03823. External Links: 1812.03823 Cited by: §I.
  • [5] C. Chen, Z. Fu, Z. Chen, S. Jin, Z. Cheng, X. Jin, and X. Hua (2020-Apr.) HoMM: higher-order moment matching for unsupervised domain adaptation. Proceedings of the AAAI Conference on Artificial Intelligence 34 (04), pp. 3422–3429. External Links: Document Cited by: §I, §II-A.
  • [6] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun (2017-13–15 Nov) CARLA: An open urban driving simulator. In Proceedings of the 1st Annual Conference on Robot Learning, Proceedings of Machine Learning Research, Vol. 78, pp. 1–16. Cited by: §I, §II-C.
  • [7] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. March, and V. Lempitsky (2016) Domain-adversarial training of neural networks. Journal of Machine Learning Research 17 (59), pp. 1–35. Cited by: §II-A.
  • [8] N. Garnett, R. Cohen, T. Pe’er, R. Lahav, and D. Levi (2019-10) 3D-LaneNet: end-to-end 3D multiple lane detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Cited by: §I, §II-C.
  • [9] N. Garnett, R. Uziel, N. Efrat, and D. Levi (2021) Synthetic-to-real domain adaptation for lane detection. In Computer Vision – ACCV 2020, Cham, pp. 52–67. External Links: ISBN 978-3-030-69544-6 Cited by: §I, §I, §II-C.
  • [10] M. Ghifary, W. B. Kleijn, M. Zhang, D. Balduzzi, and W. Li (2016) Deep reconstruction-classification networks for unsupervised domain adaptation. In Computer Vision – ECCV 2016, Cham, pp. 597–613. Cited by: §I.
  • [11] D. Ho, K. Rao, Z. Xu, E. Jang, M. Khansari, and Y. Bai (2021) RetinaGAN: an object-aware approach to sim-to-real transfer. In 2021 IEEE International Conference on Robotics and Automation (ICRA), Vol. , pp. 10920–10926. External Links: Document Cited by: §I.
  • [12] J. Hoffman, E. Tzeng, T. Park, J. Zhu, P. Isola, K. Saenko, A. Efros, and T. Darrell (2018-10–15 Jul) CyCADA: cycle-consistent adversarial domain adaptation. In Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol. 80, pp. 1989–1998. Cited by: §II-A.
  • [13] X. Huang, M. Liu, S. Belongie, and J. Kautz (2018) Multimodal unsupervised image-to-image translation. In Computer Vision – ECCV 2018, Cham, pp. 179–196. External Links: ISBN 978-3-030-01219-9 Cited by: §I, §II-B, §III-C.
  • [14] N. Jaipuria, X. Zhang, R. Bhasin, M. Arafa, P. Chakravarty, S. Shrivastava, S. Manglani, and V. N. Murali (2020-06) Deflating dataset bias using synthetic data augmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Cited by: §I, §II-C.
  • [15] P. Kaur, S. Taghavi, Z. Tian, and W. Shi (2021-04) A survey on simulators for testing self-driving cars. In 2021 Fourth International Conference on Connected and Autonomous Driving (MetroCAD), Vol. , Los Alamitos, CA, USA, pp. 62–70. External Links: ISSN , Document Cited by: §III-A.
  • [16] D. P. Kingma and J. Ba (2015) Adam: a method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), Cited by: §IV-A.
  • [17] H. Lee, H. Tseng, J. Huang, M. Singh, and M. Yang (2018-09) Diverse image-to-image translation via disentangled representations. In Proceedings of the European Conference on Computer Vision (ECCV), Cited by: §II-B, §II-C.
  • [18] J. Li, E. Chen, Z. Ding, L. Zhu, K. Lu, and Z. Huang (2019) Cycle-consistent conditional adversarial transfer networks. In Proceedings of the 27th ACM International Conference on Multimedia, MM ’19, New York, NY, USA, pp. 747–755. External Links: ISBN 9781450368896, Document Cited by: §II-A.
  • [19] L. Liu, X. Chen, S. Zhu, and P. Tan (2021-10) CondLaneNet: a top-to-down lane detection framework based on conditional convolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3773–3782. Cited by: §I.
  • [20] M. Liu, T. Breuel, and J. Kautz (2017) Unsupervised image-to-image translation networks. In Advances in Neural Information Processing Systems, Vol. 30, pp. . Cited by: §I, §I, §II-B.
  • [21] M. Long, Z. CAO, J. Wang, and M. I. Jordan (2018) Conditional adversarial domain adaptation. In Advances in Neural Information Processing Systems, Vol. 31, pp. . Cited by: §II-A.
  • [22] X. Mao, Q. Li, H. Xie, R. Y.K. Lau, Z. Wang, and S. Paul Smolley (2017-10) Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Cited by: §III-C.
  • [23] X. Pan, J. Shi, P. Luo, X. Wang, and X. Tang (2018) Spatial as deep: spatial cnn for traffic scene understanding. In Proceedings of the AAAI Conference on Artificial Intelligence, pp. 7276–7283. Cited by: §II-C, §III-A, §V.
  • [24] F. Pizzati, M. Allodi, A. Barrera, and F. García (2020) Lane detection and classification using cascaded cnns. In Computer Aided Systems Theory – EUROCAST 2019, R. Moreno-Díaz, F. Pichler, and A. Quesada-Arencibia (Eds.), Cham, pp. 95–103. External Links: ISBN 978-3-030-45096-0 Cited by: §IV-A, §IV.
  • [25] Z. Qin, H. Wang, and X. Li (2020) Ultra fast structure-aware deep lane detection. In Computer Vision – ECCV 2020, Cham, pp. 276–291. External Links: ISBN 978-3-030-58586-0 Cited by: §II-C, §III-B, §III-B, §IV-A.
  • [26] G. Rong, B. H. Shin, H. Tabatabaee, Q. Lu, S. Lemke, M. Možeiko, E. Boise, G. Uhm, M. Gerow, S. Mehta, E. Agafonov, T. H. Kim, E. Sterner, K. Ushiroda, M. Reyes, D. Zelenkovsky, and S. Kim (2020) LGSVL simulator: a high fidelity simulator for autonomous driving. In 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Vol. , pp. 1–6. External Links: Document Cited by: §III-A.
  • [27] A. Savkin, M. Kasperek, and F. Tombari (2019) Sampling/importance resampling for semantically consistent synthetic to real image domain adaptation in urban traffic scenes. In 2019 IEEE Intelligent Vehicles Symposium (IV), Vol. , pp. 1061–1068. External Links: Document Cited by: §I.
  • [28] A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, and R. Webb (2017) Learning from simulated and unsupervised images through adversarial training. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vol. , pp. 2242–2251. External Links: Document Cited by: §I.
  • [29] B. Sun and K. Saenko (2016) Deep CORAL: correlation alignment for deep domain adaptation. CoRR abs/1607.01719. External Links: 1607.01719 Cited by: §I, §II-A.
  • [30] TuSimple lane detection challenge. Note: https://github.com/TuSimple/tusimple-benchmark, accessed Jun. 2021. Cited by: 3rd item, §III-A.
  • [31] M. Wulfmeier, A. Bewley, and I. Posner (2017) Addressing appearance change in outdoor robotics with adversarial domain adaptation. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vol. , pp. 1551–1558. External Links: Document Cited by: §I, §II-A, §III-C.
  • [32] J. Xu, L. Xiao, and A. M. López (2019) Self-supervised domain adaptation for computer vision tasks. IEEE Access 7 (), pp. 156694–156706. External Links: Document Cited by: §II-A.
  • [33] K. You, X. Wang, M. Long, and M. Jordan (2019-09–15 Jun) Towards accurate model selection in deep unsupervised domain adaptation. In Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol. 97, pp. 7124–7133. Cited by: §I.
  • [34] S. Zhao, X. Yue, S. Zhang, B. Li, H. Zhao, B. Wu, R. Krishna, J. E. Gonzalez, A. L. Sangiovanni-Vincentelli, S. A. Seshia, and K. Keutzer (2020) A review of single-source deep unsupervised visual domain adaptation. CoRR abs/2009.00155. External Links: 2009.00155 Cited by: §I, §I, §II-A.
  • [35] J. Zhu, T. Park, P. Isola, and A. A. Efros (2017-10) Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Cited by: §II-A, §II-C.