
A System-driven Automatic Ground Truth Generation Method for DL Inner-City Driving Corridor Detectors

07/20/2022
by Jona Ruthardt, et al.
Bosch

Data-driven perception approaches are well-established in automated driving systems; in many fields, they even reach super-human performance. Unlike in prediction and planning, mainly supervised learning algorithms are used in the perception domain. A major remaining challenge is therefore the efficient generation of ground truth data. As perception modules are positioned close to the sensor, they typically run on raw sensor data of high bandwidth. Consequently, the generation of ground truth labels typically causes significant manual effort, which leads to high costs for the labeling itself and the necessary quality control. In this contribution, we propose an automatic labeling approach for semantic segmentation of the drivable ego corridor that reduces the manual effort by a factor of 150 and more. The proposed holistic approach can be used in an automated data loop, allowing a continuous improvement of the dependent perception modules.



I Introduction

Lane detection is an essential part of the perception sub-architecture of any automated driving (AD) or advanced driver assistance system (ADAS). In highway scenarios, reasoning and vehicle control strongly rely on information provided by high-definition (HD) maps and lane markings. Here, HD maps are used as the primary input for lateral control as well as for the selection of relevant objects via map-based lane association. More recently, and with a focus on inner-city scenarios, data-driven approaches have been proposed that detect the freespace (the drivable area in a physical sense). AD approaches in inner-city scenarios focus less on lane-based driving because the lane concept does not fully apply in unmarked, residential urban environments (e.g., a narrow inner-city street with rows of parked vehicles and oncoming traffic). In unmarked inner-city environments, human drivers tend to drive according to the available freespace rather than according to a fuzzy lane definition. Hence, representing the unstructured inner-city world with pre-defined HD maps and reasoning on top of them does not work, as other traffic participants do not behave according to such a static representation.

Therefore, in a recent contribution [1], we proposed to rely on a data-driven approach for generating the drivable space that extends the state of the art by adding more semantics to the typically used freespace. More specifically, we proposed the concept of an "AI ego corridor" that classifies the corridor the ego vehicle is allowed to use. Instead of a static map, this is the basis for lateral control and for determining behavior-relevant objects, especially in challenging inner-city scenarios.

As for all image-based semantic segmentation approaches, we also face the challenge of creating sufficient labelled ground truth (GT) data in terms of quantity as well as quality. In this contribution, we present a pipeline for automated GT generation that is closely coupled to a L3 AD stack running on an AD vehicle featuring a high-performance multi-modal sensor set (see Figure 1).

The proposed system-driven solution allows for an automatic ground truth generation even in unmarked, complex inner-city scenarios. The reduction in labeling effort is very significant. Given our experiments, a reduction factor in manual labeling time of more than 150 was achieved.

The remainder of the paper is structured as follows: Section II discusses related contributions and derives the still unresolved research questions. Section III gives a detailed description of our system-driven approach for automated ground truth generation. In Section IV, we apply the approach to a semantic segmentation algorithm for detecting the drivable ego corridor and evaluate its impact on the manual labeling effort as well as on the labeling noise.

Fig. 1: Used AD stack (all modules marked with * contribute data to the automatic GT generation pipeline)

II Related Work

Detecting lanes and thus determining the trajectory to follow is an essential part of AD and has been researched for decades. Consequently, many different approaches have been proposed to tackle this problem [2]. However, the technologies and concepts deployed for this purpose vary substantially [3].

Whereas feature-based (e.g., [4]) or model-based (e.g., [5]) methods rely on characteristics within the available input data that can be extracted and evaluated in a rule-based or heuristic framework, data-driven machine learning (ML) approaches require training a model using supervisory signals. The drawback that such ML models require large volumes of annotated training and evaluation data is often offset by their substantial performance advantage over classical approaches [6]. As a result, the development of data-driven models currently receives much attention from the research community, a trend that is likely to continue in the coming years [7].

Owing to the emerging interest in ML-based lane detection approaches and their demand for large volumes of GT data, a variety of public datasets has been created (e.g., [8, 9, 10, 11]). However, the distribution and complexity of the included traffic scenarios, as well as the types of annotations and the number of contained frames, vary substantially [12]. Moreover, the utilized (deep) learning techniques also differ in the type of lane representation they output and, subsequently, in the GT labels they require [13]. Taken together, this often results in a shortage of GT data with suitable characteristics, high-quality annotations, and coverage of a broad spectrum of driving situations, including complex and rare cases, which constitutes one central limitation of current ML approaches. Especially for less common lane representations, as in our approach, obtaining sufficiently large datasets to train robust and accurate models often involves the cumbersome process of manually creating the desired GT annotations.

The problem of limited GT data for a specific lane detection use case can be approached in different ways.

In case GT data is sparse in a particular environment but readily available in another, transfer learning can be utilized by fine-tuning a model that was first trained on the more extensive source domain with data from the ultimate target domain. In doing so, the model can generalize from the insights learned in the first step and is less likely to overfit the task-specific training data. Kim et al. [14], for example, have employed transfer learning by fine-tuning a general semantic image segmentation model to extract and segment only the left and right lanes in given images.

Another option is to combine the transfer learning methodology with computer-simulated traffic scenarios. In this way, models can be trained with data generated in almost arbitrary quantities in simulations, where the environment can be exactly observed and altered as desired, and optionally be fine-tuned only on task-specific real-world GT data. In lane detection applications, this approach has already proved to be effective and very time-efficient (e.g., [15, 16]). However, even with increasingly photo-realistic simulation techniques (e.g., [17]) and approaches that try to close the simulation-to-reality gap (e.g., [18]), there are still persisting distributional differences between data generated in simulation and real-world data, making it more difficult for models to adapt to the eventual application domain [19].

Over the last years, there has been an increase in publications proposing novel approaches for (automatically) creating large quantities of real-world lane detection and segmentation GT data with no or only little human intervention. As a result, larger volumes of training and evaluation data become available that also share the exact properties of the eventual application domain.

Methods such as [20] or [21] reduce the ratio of lane markings that have to be labeled to accurately approximate the lane course. This is achieved by creating time-sliced images, composed of horizontal image slices of a video sequence, which are manually annotated and then deconstructed into the original input frames again while the lane course is inferred from the interpolated labeled marker positions. While these approaches are more efficient than fully manual annotation and can achieve good GT quality [21], they still require manual labeling effort.

More sophisticated approaches can forgo manual intervention in the GT creation process altogether by utilizing a-priori knowledge (e.g., maps) or unsupervised methods that solely rely on characteristics of the input data to extract lane properties and augment the GT representations.

Properties of roads and lanes, such as their course and position, are stored in most maps, making maps a prime source of a-priori knowledge. This is reflected in the literature, where many automatic GT generation frameworks primarily rely on information extracted from maps. However, the level of detail and the type of map utilized vary broadly. HD maps of static objects, as used in [22] and [23], can be created under good environmental conditions and constitute a detailed and accurate representation of specific environmental entities. This high definition comes at the expense of time- and resource-intensive creation and maintenance, and such maps are not broadly available. Therefore, others choose to use less detailed but openly available maps for their systems. Kasmi et al. [24], for instance, extract road segments from OpenStreetMap [25] and detect lanes more accurately by matching the hereby presumed course of the road with LiDAR point clouds. While the objective of this work was not directly to create GT data but to detect lanes, the approach could be used with little modification for automatic GT generation. In [26], only the road skeletons obtained from geographic information system (GIS) maps were used and augmented with additional assumptions before ultimately being projected as a road surface segmentation mask into the image plane. By adjusting the projection parameters in accordance with additional road clues (e.g., vanishing points, surface color and structure, horizon line, etc.) in an unsupervised manner to improve the fit between the map-based segmentation mask and the corresponding camera image, this approach can generate GT data on unmarked roads as well. Because they rely on maps that only include the position of lane markers ([22, 23]) or on the LiDAR reflection intensity of lane markings ([24]), the other methods can only be deployed on roads with existing lane markers. This significantly limits their applicability, especially in urban scenarios where unmarked roads are common.

Another aspect that makes GT generation in urban scenarios more challenging is the increased complexity of the environment and of the encounterable scenarios. Steep declines or inclines, obstructions by dynamic or static objects, and dynamic vehicle movements all impact the desired GT annotation. While some methods implicitly [26] or explicitly [23] consider the dynamic vehicle position and orientation to ensure accurate annotations, there is no holistic system-driven approach that considers all of the aforementioned scenarios in the GT generation process by utilizing additional sensors and sources of information about the environment.

Most publications focus on the detection and segmentation of individual lane markers ([22, 23]) or on inferring the course of the road from them ([24, 21]), rather than on constructing an ego-lane corridor that is bounded by lane markings, road curbs, and road users or obstacles ahead. Although it is often possible to create specific GT types by modifying the given representation (e.g., lane markers can be converted to a lane spline from which the preceding lane corridor can be constructed), the absence of explicit object and obstacle detection algorithms in these frameworks prevents the GT data from being used to train models suitable for longitudinal control. With other GT representations, such as in [26] or [22], not even lateral control behaviors can be inferred, as either the entire road surface is segmented instead of specific lanes or no semantic information about the identified lane markers is stored, respectively. To the best of our knowledge, no automatic GT generation approach exists that can create the specific kind of data required for the AI ego corridor model presented in [1].

Summarizing the known literature in the area of automatic GT generation in the lane detection domain, Table I shows that our work contributes the following new aspects compared to others:

  • Support of extended semantics beyond a drivable freespace: labels for the "AI ego corridor" that enable lateral and longitudinal control,

  • Explicit consideration of static and dynamic objects and occlusions that restrict the drivable corridor,

  • Road type independent and compatible with marked and unmarked roads,

  • Support of complex inner-city scenarios,

  • Close coupling to a full-fledged L3 AD stack.

Major advantages are:

  • Holistic GT generation approach that supports all possible camera types given their intrinsics and extrinsics,

  • Significant reduction of manual labeling effort by factor of >150.

| Method | Year | Ref. | Label Types | Ego Corridor GT Representation | Consideration of Objects and Occlusions | Compatible with Unmarked Roads + Inner-city Scenes | Used Sensor Modalities |
| Álvarez et al. | 2014 | [26] | binary, pixel-wise road segmentation | - | implicitly | X | vision, GPS, GIS map |
| Behrendt and Witt | 2017 | [22] | binary, pixel-wise segmentation of lane markings | - | - | - | vision, GPS, HD map |
| Behrendt and Soussan | 2019 | [23] | pixel-wise segmentation of individual lane markings and their semantic association | - | - | - | vision, LiDAR, radar, odometry, GPS, HD map |
| Kasmi et al. | 2020 | [24] | polynomial ego-lane representation | - | - | - | vision, LiDAR, odometry, GPS, OpenStreetMap |
| Our approach | - | - | binary, pixel-wise segmentation of drivable ego corridor | X | X | X | vision, LiDAR, odometry, GPS, HD map |
TABLE I: Selected state-of-the-art approaches for map-based automatic ground truth generation in the lane detection domain.

III Method

The following section describes the proposed automatic GT generation method in detail, especially elaborating on which individual processing steps are performed to obtain an increasingly detailed and accurate GT representation. Each of those steps augments and enhances the desired GT instances with more information on the vehicle’s current state and its surroundings to best capture the complexity and diversity of real-world traffic scenarios. These steps, together with the sequence in which they are executed and their respective information sources, are depicted in Figure 2. Owing to its tight integration with the AD system as introduced in Figure 1, this processing pipeline can be flexibly adjusted to generate alternative GT representations, to work with different sensor combinations, or to utilize additional information resources for specific scenarios.

Fig. 2: Data flow and processing sequence for automatic GT generation

The following sub-sections discuss the individual processing steps in more detail.

III-A Border Coordinate Acquisition

As the generated GT annotations are supposed to classify and indicate the areas in a camera image that belong to the current lane, accurate and suitable data on the position of the lane within the vehicle's surrounding environment is the most essential ingredient of this endeavor. Specifically, the course of the left and right lane boundaries, subsequently also referred to as lines, is required to construct the corresponding corridor polygon.

Utilizing HD maps from which the lines can be extracted has several advantages. As the course and position of the road can be inferred directly from the map, the range in which this approach can operate is practically unlimited. Additionally, even in challenging environmental situations or with objects obstructing parts of the lane, highly accurate and detailed line information is still available. This HD map-based approach is only limited by the precision of the ego-localization module that determines the vehicle's current position within the environment and the map. Localization inaccuracies and errors can be significantly mitigated, for example, by aligning the map-based and online lane detections, as illustrated in Figure 3.

Fig. 3: Effects of shifting original map-based corridor coordinates (green) according to online lane detection system to obtain better fit (blue). The error was induced by a local ego-localization inaccuracy.
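To make this step concrete, the following minimal sketch illustrates how a corridor polygon could be assembled from the two boundary polylines and shifted according to an online lane detection. The function names and the numpy-based formulation are our own illustrative assumptions, not the system's actual implementation:

```python
import numpy as np

def build_corridor_polygon(left_line: np.ndarray, right_line: np.ndarray) -> np.ndarray:
    """Join the left and right lane-boundary polylines (each N x 2, in
    vehicle coordinates) into one closed corridor polygon by reversing
    the right line so the vertices trace the corridor outline."""
    return np.vstack([left_line, right_line[::-1]])

def shift_corridor(corridor: np.ndarray, map_line: np.ndarray,
                   detected_line: np.ndarray) -> np.ndarray:
    """Compensate a local ego-localization error (cf. Figure 3) by
    shifting the map-based corridor by the mean offset between the
    map-based line and the corresponding online lane detection,
    assuming point-wise correspondence between the two polylines."""
    offset = (detected_line - map_line).mean(axis=0)
    return corridor + offset
```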

III-B Consideration of Traffic Participants

To adjust the corridor such that it does not overlap with any traffic participant occupying parts of the ego lane, information on traffic participants determined by the object fusion module of the vehicle system is used. Here, data and predictions from video, radar, and LiDAR object detection algorithms are combined and tracked, leading to highly accurate and dependable object classification and property assessment. Besides the position of an individual object with respect to the ego vehicle, the relative velocity and the direction of travel are also ascertained. The classes of objects and traffic participants distinguished and detected by the module include motorized vehicles such as cars, trucks, and motorcycles, non-motorized road users in the form of pedestrians and cyclists, as well as stationary objects such as cones and bollards.

While there may be many traffic participants and objects in the surrounding environment of the ego vehicle, most of them are irrelevant and do not need to be considered in the GT generation process. Only objects adjoining or intersecting the corridor are of importance. For traffic participants that meet these requirements, two different scenarios are distinguished (see Figure 4): traffic that uses the same stretch of road and follows the same general route as the ego vehicle (object A), and cross traffic that intersects or crosses the ego lane (object B). The distinction between these scenarios is necessary because the corridor has to be cut off differently depending on how the object intersects it: the object side closest to the corridor is taken as the cut-off line.

Fig. 4: Corridor cut-off scenarios for objects and traffic participants

For traffic driving ahead, the own corridor needs to be cut off such that it ends right before the preceding traffic participant. For this purpose, the corridor is split along a line that is perpendicular to the object's direction of travel and originates from the rearmost point of the object. Taking the part of the split result closest to the ego vehicle yields the new corridor instance. In the cross-traffic scenario, the corridor is modified similarly, but the cut-off line originates not from the rearmost point but from a point on the side of the bounding box facing the corridor, and it shares the same orientation as the object. Figures 5(a) and 5(b) illustrate the original corridor (red), the position of the objects relative to the ego vehicle, and the accordingly cut-off new corridor instance (blue) for the two introduced scenarios.

(a) Traffic within own lane
(b) Traffic crossing own lane
Fig. 5: Modifications to the original corridor in case of traffic participants
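A possible geometric realization of this cut-off step is sketched below using the shapely library. The function name, the shapely-based approach, and the fixed line-extension length are our illustrative assumptions; the sketch merely demonstrates the described splitting logic:

```python
from shapely.geometry import Polygon, LineString, Point
from shapely.ops import split

def cut_corridor(corridor: Polygon, anchor: tuple, direction: tuple,
                 ego_position: tuple = (0.0, 0.0)) -> Polygon:
    """Split the corridor along a line through `anchor` with unit vector
    `direction` and keep the piece closest to the ego vehicle.

    In-lane traffic (object A): `anchor` is the rearmost object point and
    `direction` is perpendicular to its direction of travel.
    Cross traffic (object B): `anchor` lies on the bounding-box side
    facing the corridor and `direction` matches the object orientation.
    """
    ax, ay = anchor
    dx, dy = direction
    reach = 1000.0  # extend the cut line well beyond the corridor extent
    cut_line = LineString([(ax - reach * dx, ay - reach * dy),
                           (ax + reach * dx, ay + reach * dy)])
    pieces = split(corridor, cut_line).geoms
    # The piece containing (or nearest to) the ego vehicle becomes the new corridor.
    return min(pieces, key=lambda piece: piece.distance(Point(ego_position)))
```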

III-C Masking of Occluded Areas

At this stage, the corridor is already cut off for traffic participants present within the ego lane, but other obstacles that may obstruct the camera's view are not considered yet. Especially in urban driving scenarios, however, obstacles like buildings, parked cars, or vegetation can block the direct line of sight from the camera to the corridor. It is therefore especially important for these areas to be identified and removed from the corridor representation accordingly.

In contrast to the removal workflow for dynamic objects, areas not visible to the camera are only stamped out here instead of cutting off the corridor completely. To obtain information about these areas, a visibility grid constructed on the basis of LiDAR sensor data is employed. This grid is built by subdividing the world around the vehicle into distinct cells and computing the cost of traversing each cell to determine whether an obstacle is present [27]. This three-dimensional representation of the surrounding environment, the so-called costmap, is then projected into a two-dimensional binary map that denotes either the existence of an obstruction or the possibility of unobstructed view and movement. By assuming that the areas behind obstacles are occluded and not visible in camera images, the intended visibility grid can be obtained. It is now possible to determine which parts of the drivable corridor are obstructed and which are not. When this processing procedure is applied to an urban scenario, the parts of the initially projected drivable corridor that correspond to the obstructed areas are removed, as depicted in Figure 6.

(a) Visibility grid not considered
(b) Visibility grid considered
Fig. 6: Effects of modifying the corridor according to the LiDAR-based visibility grid
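The occlusion reasoning can be approximated with simple ray casting on the binary map, as in the following sketch. The brute-force sampling scheme and the function interface are our own assumptions and stand in for the costmap-based mechanism of [27]:

```python
import numpy as np

def occluded_cells(occupancy: np.ndarray, sensor_rc: tuple,
                   n_samples: int = 64) -> np.ndarray:
    """Derive a binary occlusion grid from a 2D occupancy grid
    (True = obstacle) by ray casting from the sensor cell: a cell is
    considered occluded if any occupied cell lies on its line of sight."""
    rows, cols = occupancy.shape
    occluded = np.zeros_like(occupancy, dtype=bool)
    sr, sc = sensor_rc
    ts = np.linspace(0.0, 1.0, n_samples)[1:-1]  # exclude both endpoints
    for r in range(rows):
        for c in range(cols):
            # Sample grid cells between the sensor and the target cell.
            rr = np.round(sr + ts * (r - sr)).astype(int)
            cc = np.round(sc + ts * (c - sc)).astype(int)
            occluded[r, c] = occupancy[rr, cc].any()
    return occluded

# The occluded cells are then stamped out of the rasterized corridor:
#   visible_corridor = corridor_mask & ~occluded_cells(occupancy, sensor_rc)
```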

III-D Consideration of Landscape Profile

The previous processing steps share a flat-world assumption in which the environment is regarded as an even two-dimensional surface without height differences. In reality, however, the landscape is shaped by variations in altitude and slope that are sometimes significant. It is therefore essential to consider and compensate the effects of changes in the landscape profile if an accurate projection is desired.

As the utilized HD map does not incorporate precise altitude data, it is necessary to construct a height map specifically along the driven route. By recording the altitude measurements of the localization system and combining them with the associated measured position of the vehicle, each height reading can be mapped to a unique and unambiguous location. In contrast to the online application of the AD stack, the GT generation can perform an ex-post analysis: the AD system is run on recorded datasets to gather the altitude and corresponding positional data. These measurements can then be stitched together into a map of the desired format, which can be used in future GT generation runs to approximate the altitude of arbitrary points along the route. The originally two-dimensional corridor representation can thus be iteratively transformed to also incorporate elevation data and constitute a three-dimensional representation of the drivable area. Thereby, it is eventually possible to substitute a projection like the one in Figure 7(a) with a significantly more accurate annotation, as Figure 7(b) demonstrates.

(a) Height map not utilized
(b) Height map utilized
Fig. 7: Effects of height consideration on corridor projection in an urban scenario with notable changes in elevation
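A minimal height-map sketch could accumulate (x, y, z) localization samples into grid cells and return per-cell mean altitudes. The class design and cell size are our own choices for illustration, not the map format actually used in the pipeline:

```python
import numpy as np
from collections import defaultdict

class HeightMap:
    """Grid-based height map stitched together from (x, y, z)
    localization samples recorded ex-post along the driven route."""

    def __init__(self, cell_size: float = 0.5):
        self.cell = cell_size
        self._sums = defaultdict(float)   # per-cell altitude sum
        self._counts = defaultdict(int)   # per-cell sample count

    def _key(self, x: float, y: float) -> tuple:
        return (int(np.floor(x / self.cell)), int(np.floor(y / self.cell)))

    def add_sample(self, x: float, y: float, z: float) -> None:
        key = self._key(x, y)
        self._sums[key] += z
        self._counts[key] += 1

    def altitude(self, x: float, y: float, default: float = 0.0) -> float:
        """Mean altitude of the cell containing (x, y); used to lift the
        2D corridor vertices into a 3D corridor representation."""
        key = self._key(x, y)
        return self._sums[key] / self._counts[key] if self._counts[key] else default
```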

III-E Corridor Projection

While the corridor has so far been available and modified in its 3D representation relative to a vehicle reference point, it is ultimately necessary to project the relevant areas onto the actual image. For this transformation, a pinhole camera model is employed. Given calibrated intrinsic and extrinsic camera parameters, this model allows a world position to be translated into the corresponding pixel coordinates of any camera system mounted on the vehicle. Doing so for all positions spanning the drivable corridor eventually yields the corresponding drivable areas within the image plane. After these steps, projections like those already depicted in Figures 6 and 7, with the drivable areas clearly and accurately designated, can be obtained.
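The pinhole projection itself reduces to the standard mapping x ~ K [R | t] X. The following sketch shows this mapping; the helper name and the suggestion to rasterize with OpenCV are our additions, not details given in the paper:

```python
import numpy as np

def project_points(points_world: np.ndarray, K: np.ndarray,
                   R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Project N x 3 world points to pixel coordinates with a pinhole
    model, x ~ K [R | t] X, dropping points behind the camera."""
    pts_cam = points_world @ R.T + t        # world -> camera frame
    pts_cam = pts_cam[pts_cam[:, 2] > 0]    # keep points in front of the camera
    uvw = pts_cam @ K.T                     # apply the intrinsic matrix
    return uvw[:, :2] / uvw[:, 2:3]         # perspective division -> (u, v)

# Filling the projected corridor polygon (e.g., with cv2.fillPoly) then
# yields the binary segmentation mask that serves as the GT label.
```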

III-F Dynamic Movement Correction

As a dynamic system affected by various external forces and influences, the ego vehicle's positional state, and as a result the camera's position and orientation, are constantly changing. This may be due to inclines and slopes of the road, potholes and other surface irregularities, or varying pitch and roll angles caused by changes in inertia and centrifugal forces. Such variations need to be considered and compensated for the projected corridor to match the corresponding image positions. Besides compensating the movement brought about by these effects, it is also necessary to consider the current vehicle tilt when utilizing the height-map approach from Section III-D. Both of these effects impact the projection in Figure 8(a) but can be compensated by appropriate measures to obtain the accurate projection in Figure 8(b).

(a) No tilt compensation
(b) Active tilt compensation
Fig. 8: Impact of compensating dynamic vehicle movements and road gradients

In particular, changes in the roll and pitch angles are relevant for this process, as even minor deviations in the orientation of the camera can result in substantial displacements for faraway objects and landmarks [23]. In contrast, slight changes in the effective camera height above the ground, caused for example by the compression of the suspension, only have a minor impact on the projection and occur to a considerably smaller extent in typical driving scenarios. The roll and pitch angles of the vehicle are measured by the DGPS-supported inertial measurement unit of the AD system and are compensated by an upstream modification of the world coordinates before they are plugged into the camera matrix model. Specifically, the employed coordinate system is rotated such that it still matches the camera's orientation in the real world. To this end, a separate rotation matrix is constructed that rotates the input positions around the roll axis followed by the pitch axis, compensating the dynamic variations.
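Such a compensation matrix could be composed as follows, assuming a vehicle frame with the x-axis as roll axis and the y-axis as pitch axis (the axis convention and function name are our assumptions):

```python
import numpy as np

def tilt_compensation(roll: float, pitch: float) -> np.ndarray:
    """Rotation matrix (roll about x, then pitch about y, angles in
    radians from the IMU) applied to the world coordinates upstream of
    the pinhole projection to keep the frame aligned with the tilted camera."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    R_roll = np.array([[1.0, 0.0, 0.0],
                       [0.0, cr, -sr],
                       [0.0, sr, cr]])
    R_pitch = np.array([[cp, 0.0, sp],
                        [0.0, 1.0, 0.0],
                        [-sp, 0.0, cp]])
    return R_pitch @ R_roll  # roll first, then pitch

# Usage: points_compensated = points_world @ tilt_compensation(roll, pitch).T
```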

IV Experiments

In the following section, qualitative and quantitative measures are employed to assess both the automatically generated GT annotations and an ML model trained with them.

For the quality of a generated GT annotation to be conclusively assessed, it has to be compared against a manually created annotation representing the segmentation mask's desired target state. Highway and more challenging urban driving scenes are analyzed separately. Owing to the laborious nature of the manual labeling process, only 20 batches of 25 images each, i.e., a total of 500 images per scene type, were randomly selected from all available image-mask pairs to be manually annotated. Relative to the full corpora of GT instances that passed basic quality assurance measures for the highway and urban scenarios, this sampling strategy still retains a representative, dependable, and unbiased assessment of the approach's performance.

Using metrics that measure the misalignment between the automatically created segmentation masks $A$ and the manually created masks $M$, it is possible to quantitatively assess the quality of the generated GT labels. As is common practice in the semantic segmentation domain [28], we opted to use the Dice coefficient (Equation (1)) [29] and the Jaccard index (Equation (2)) [30] to ascertain the differences:

$$DSC(A, M) = \frac{2\,|A \cap M|}{|A| + |M|} \qquad (1)$$

$$J(A, M) = \frac{|A \cap M|}{|A \cup M|} \qquad (2)$$
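On rasterized masks, both metrics reduce to a few array operations. This sketch, our own formulation of the two standard definitions, assumes boolean numpy arrays of equal shape:

```python
import numpy as np

def dice(A: np.ndarray, M: np.ndarray) -> float:
    """Dice coefficient between a generated mask A and a manual mask M."""
    intersection = np.logical_and(A, M).sum()
    return 2.0 * intersection / (A.sum() + M.sum())

def jaccard(A: np.ndarray, M: np.ndarray) -> float:
    """Jaccard index (IoU) between a generated mask A and a manual mask M."""
    intersection = np.logical_and(A, M).sum()
    union = np.logical_or(A, M).sum()
    return intersection / union
```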

The high mean of both metrics in the highway scenarios shows that the generated GT annotations closely correspond to the manually created masks. Only slight variations in the scores across the individual sample batches suggest consistent qualitative characteristics in different driving and environmental scenarios.

This provides preliminary evidence of the approach's suitability for deployment, which is further reinforced by the encouraging, albeit somewhat lower, scores in the more diverse and complex urban scenarios, where a weighted overall mean of 0.940 is reached (see Table II). Dissecting the urban sample batches into further types of scenarios for a more detailed explanation of the underlying effects shows that especially parking cars negatively impact the quantitative results.

In some cases, this effect is due to inaccurate placements of detected objects, which causes the corridor to be mistakenly cut off when vehicles are parked close to the current lane (see Figure 9(a)). Another related problem is that parked and moving cars are not robustly distinguished. Therefore, the corridor is cut off right in front of parked vehicles instead of detouring around them, as depicted in Figure 9(b). These problems could be solved by discerning mobile and immobile objects and either cutting off the corridor or removing only the intersection with the corridor, respectively.

| Scenario | # Batches | Dice | Jaccard | Mean |
| Sharp Curve | 6 | 0.934 | 0.891 | 0.913 |
| No Markings | 7 | 0.974 | 0.960 | 0.967 |
| Parking Cars | 3 | 0.880 | 0.838 | 0.859 |
| Others | 4 | 0.997 | 0.994 | 0.995 |
| All (weighted) | 20 | 0.953 | 0.928 | 0.940 |
TABLE II: Performance of the GT generation approach on urban sequences
(a) Cut-off corridor due to overly extended object bounding box
(b) Cut-off corridor due to parking car within ego-lane
Fig. 9: Issues evoked by insufficient consideration of parked vehicles

Aside from these effects, the quality in the other scenarios was reasonably good. Particularly on road segments without lane markings and in the "Others" category (which mainly includes inter-urban passages with clear lane markings and no significant curvature), the attained scores demonstrate a solid segmentation accuracy. The reduced quality during sharp curves and junctions is, to a large extent, caused by sequences in which no clear lane markings or directional references are available. One such instance from the evaluation dataset is illustrated in Figure 10. Whereas the generated mask directs the route more straightforwardly towards the diverging street, the manual annotation intends for the turning maneuver to happen later and therefore covers a considerably greater ego-lane area at the given moment, leading to a penalization that is reflected in the scores. Due to the lack of conclusive markings, it is hard to judge which alternative is more accurate, which reduces the expressiveness of the utilized metrics in such scenarios.

(a) Generated GT image
(b) Human annotation
Fig. 10: Difference in the proposed route during sharp turn on unmarked road

Summarizing these results, the obtained scores suggest that the automatically created GT annotations, despite not quite matching their human-created counterparts across all situations, are of sufficient accuracy to train ML models. In addition, higher degrees of labeling noise can be compensated by the sheer volume of generatable GT data. This can be seen in Table III, where a network trained on manually labelled data is compared to a network trained on automatically generated data. We used the highway network architecture proposed in [1] and unseen hand-labeled frames for the KPI generation.

| GT type | # of labels | Manual effort | Jaccard | Dice |
| Auto-generated | 20,000 | 15 min | 0.884 | 0.936 |
| Manual | 5,000 | 11 h | 0.869 | 0.923 |
TABLE III: Comparison of model performance trained with manual and auto-generated GT on highway sequences

V Conclusion and Outlook

The results presented in the previous section provide convincing evidence that the proposed GT generation approach yields GT annotations that nearly match the quality of their manually created counterparts. Requiring only basic quality assurance of 15 minutes or less to remove obviously flawed instances when generating 20,000 GT annotations, our approach vastly outperforms the manual labeling process, which requires approximately 40 hours of work for a dataset of the same magnitude. In our experience, deploying the automatic GT generation can result in time savings by a factor of over 150.

As a result, significantly larger volumes of GT data are generatable with only minimal human intervention. Consequently, ML models with more complex architectures and better, more robust performance should be trainable. The availability of training and evaluation data is therefore no longer the bottleneck of the AI corridor approach, making it much more practicable and supporting its deployment in the real world.

In the future, we plan to realize an automated data loop and to research its impact on the network performance.

References

  • [1] T. Michalke, C. Wüst, D. Feng, M. Dolgov, C. Gläser, and F. Timm, "Where can I drive? A system approach: Deep ego corridor estimation for robust automated driving," in 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), 2021, pp. 1565–1571.
  • [2] A. Kumar and P. Simon, “Review of lane detection and tracking algorithms in advanced driver assistance system,” Intern. Journal of Computer Science and Inform. Technology, vol. 7, pp. 65–78, 08 2015.
  • [3] S. P. Narote, P. N. Bhujbal, A. S. Narote, and D. M. Dhane, “A review of recent advances in lane detection and departure warning system,” Pattern Recognition, vol. 73, pp. 216–234, 2018.
  • [4] J. Niu, J. Lu, M. Xu, P. Lv, and X. Zhao, "Robust lane detection using two-stage feature extraction with curve fitting," Pattern Recognition, vol. 59, 12 2015.
  • [5] J. Lee, “A machine vision system for lane-departure detection,” Computer Vision and Image Underst., vol. 86, pp. 52–78, 04 2002.
  • [6] Y. Xing, C. Lv, L. Chen, H. Wang, H. Wang, D. Cao, E. Velenis, and F. Wang, “Advances in vision-based lane detection: Algorithms, integration, assessment, and perspectives on acp-based parallel vision,” IEEE/CAA Journal of Automatica Sinica, vol. 5, pp. 645–661, 2018.
  • [7] N. Chetan et al., "An overview of recent progress of lane detection for autonomous driving," 01 2020, pp. 341–346.
  • [8] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: The kitti dataset,” Intern. Journal of Robotics Research (IJRR), 2013.
  • [9] T. Scharwächter, M. Enzweiler, U. Franke, and S. Roth, “Efficient multi-cue scene segmentation,” in Pattern Recognition, J. Weickert, M. Hein, and B. Schiele, Eds.   Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 435–445.
  • [10] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, "The cityscapes dataset for semantic urban scene understanding," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  • [11] F. Yu, W. Xian, Y. Chen, F. Liu, M. Liao, V. Madhavan, and T. Darrell, “BDD100K: A diverse driving video database with scalable annotation tooling,” CoRR, vol. abs/1805.04687, 2018. [Online]. Available: http://arxiv.org/abs/1805.04687
  • [12] Y. Zhang, Z. Lu, X. Zhang, J.-H. Xue, and Q. Liao, "Deep learning in lane marking detection: A survey," IEEE Transactions on Intelligent Transportation Systems, vol. PP, pp. 1–17, 04 2021.
  • [13] J. Tang, S. Li, and P. Liu, “A review of lane detection methods based on deep learning,” Pattern Recognition, vol. 111, p. 107623, 03 2021.
  • [14] J. Kim and C. Park, “End-to-end ego lane estimation based on sequential transfer learning for self-driving cars,” in 2017 IEEE Conf. on CV and Pattern Recog. Workshops (CVPRW), 2017, pp. 1194–1202.
  • [15] N. Garnett, R. Uziel, N. Efrat, and D. Levi, “Synthetic-to-real domain adaptation for lane detection,” in ACCV, 2020.
  • [16] C. Hu, S. Hudson, M. Ethier, M. Al-Sharman, D. Rayside, and W. Melek, “Sim-to-real domain adaptation for lane detection and classification in autonomous driving,” 2022.
  • [17] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “CARLA: An open urban driving simulator,” in Proceedings of the 1st Annual Conference on Robot Learning, 2017, pp. 1–16.
  • [18] S. R. Richter, H. A. AlHaija, and V. Koltun, “Enhancing photorealism enhancement,” 2021.
  • [19] M. Ranaweera and Q. H. Mahmoud, “Virtual to real-world transfer learning: A systematic review,” Electronics, vol. 10, no. 12, 2021.
  • [20] A. Borkar, M. Hayes, and M. T. Smith, “An efficient method to generate ground truth for evaluating lane detection systems,” in 2010 IEEE Int. Conf. on Acoustics, Speech, Sig.Proc., 2010, pp. 1090–1093.
  • [21] A. Borkar, M. Hayes, and M. Smith, “A novel lane detection system with efficient ground truth generation,” IEEE Transactions on Intelligent Transportation Systems, vol. 13, no. 1, pp. 365–374, 2012.
  • [22] K. Behrendt and J. Witt, “Deep learning lane marker segmentation from automatically generated labels,” in 2017 IEEE/RSJ Intern. Conf. on Intelligent Robots and Systems (IROS), 2017, pp. 777–782.
  • [23] K. Behrendt and R. Soussan, “Unsupervised labeled lane markers using maps,” in 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 2019, pp. 832–839.
  • [24] A. Kasmi, J. Laconte, R. Aufrere, R. Theodose, D. Denis, and R. Chapuis, “An information driven approach for ego-lane detection using lidar and openstreetmap,” in 2020 16th Intern. Conf. on Control, Automation, Robotics and Vision (ICARCV), 2020, pp. 522–528.
  • [25] OpenStreetMap contributors, “Planet dump retrieved from https://planet.osm.org ,” https://www.openstreetmap.org, 2017.
  • [26] J. M. Álvarez, A. M. López, T. Gevers, and F. Lumbreras, “Combining priors, appearance, and context for road detection,” IEEE Transactions on Intell. Transp. Systems, vol. 15, no. 3, pp. 1168–1178, 2014.
  • [27] ROS Wiki Contributors, "ROS costmap_2d package summary." [Online]. Available: http://wiki.ros.org/costmap_2d
  • [28] A. A. Taha and A. Hanbury, “Metrics for evaluating 3d medical image segmentation: analysis, selection, and tool,” BMC Medical Imaging, vol. 15, p. 29, Aug. 2015.
  • [29] L. R. Dice, “Measures of the amount of ecologic association between species,” Ecology, vol. 26, no. 3, pp. 297–302, 1945.
  • [30] P. Jaccard, “The distribution of the flora in the alpine zone,” New Phytologist, vol. 11, no. 2, pp. 37–50, 1912.