A perception centred self-driving system without HD Maps

by Alan Sun, et al.
Washington University in St Louis

This paper proposes a new self-driving system that solves the localization and lines detection problems with scalability in mind. The proposed system does not depend on HD Maps. All path planning is based on a scene rebuilt from a topological map and the traffic lines detection results from our detection subsystem. The proposed lines detection subsystem achieves state-of-the-art performance without using deep learning. The proposed localization subsystem relies on neither GPS nor IMU and provides a human-level localization result by counting stop lines and intersections. The system was tested on diverse datasets covering complicated urban situations and proved to be robust and easy to implement on a large scale.






1 Introduction

The problem of building a fully autonomous self-driving system has been active for more than 20 years yet remains unsolved. Industry leaders are still struggling to pass the necessary tests according to [7]. Before discussing the bottleneck, let us briefly review the architecture of a typical self-driving system, which can be abstracted into five subsystems: perception, localization, planning (or path planning), control, and system management. The last subsystem handles all functions besides the four main pillars, such as user interface interactions or log maintenance. Readers are referred to [15] for more information about these subsystems and to [34] for a remarkable implementation of this four-subsystem architecture. Under a controlled environment, the system from [34] could drive fully autonomously for over 100 kilometres without any interruption in 2014. However, it is tough to scale this system to other, more general driving environments. To understand the intrinsic reason for this lack of scalability, let us briefly review how human drivers tackle the tasks of localization and detection.

Human drivers make driving decisions based on what they see. They make sense of the surrounding environment and decide whether to turn or to keep the current driving direction. They cannot mark their exact position on a map, but they know how to travel through a complicated intersection based on the knowledge of which way they should take. Likewise, does a self-driving system really have to depend on a centimetre-level-accuracy localization subsystem?

In this paper, a new perception centred self-driving system is proposed. Two driving scenarios are discussed: cruising and turning. The cruising scenario is when the vehicle cruises on parallel lanes. The turning scenario is when the vehicle drives through free spaces (defined as the drivable area outside of lanes, like intersections or parking areas). The proposed system comes with several advantages. Firstly, it does not rely on HD Maps, so it is easy to scale without recording new HD Maps. Secondly, the proposed feature detection is not based on any specialized end-to-end deep learning solution, so it is easy to debug and visualize, and it does not need an additional time-consuming training process for scaling. Lastly, it performs more robustly when the environment changes severely (with seasons, weather or lighting conditions).

The system only involves task-related visual features (called traffic features, including traffic lines, traffic lights and traffic signs), just as human drivers do, rather than SIFT-like visual features. The pipeline of the detection and localization parts of the system is shown in Figure 1. In the cruising scenario, we only need to finish the first step, including 1.1 and 1.2. In the turning scenario, we need to finish all four steps of the pipeline. Note that the localization subsystem provides location information based on the rebuilt scene rather than a global map or GPS coordinates. The localization subsystem also projects the rebuilt scene onto a digital map to provide navigation instructions while crossing free spaces. The navigation part instructs the car to travel from one exit to the target entrance of the free space. The path planning system and control system work only on the rebuilt scene; hence they are map-independent.

Figure 1: The workflow of the proposed perception centered system

The proposed system uses traffic lines (including curbs) as the significant feature for tracking the vehicle's position. Hence, this paper focuses on building and verifying the lines detector in the detection subsystem. We need a general lines detector for understanding complicated traffic lines on the road. The experiment covers several types of lines, including lane lines, stop lines, curbs, merging and splitting lines, and intersections in a roundabout. The proposed traffic lines detector performs as well as deep-neural-network-supported approaches on lane lines detection by leveraging prior knowledge of line positions and angles with simple erosion and clustering operations. This robust and straightforward method generalizes and successfully detects other kinds of lines as well. The process of localizing the position in the rebuilt scene will also be discussed with examples and limitations. In that example, the system requires neither GPS nor IMU signals nor dense 3D HD Maps to locate the vehicle.

2 Related Work

Most self-driving systems rely on a map-based localization subsystem. They are categorized as localization centred systems because all other subsystems work in the map space produced by the localization subsystem. A perception centred system instead uses a local scene, rather than a global map, as the working space for all other subsystems. Limited research has been done in this direction. One of the exceptions is [2] by Bojarski from Nvidia. In this work, they tried to build an end-to-end system from camera images to control signals with a deep neural network trained on augmented driving data. It is also map-independent. However, this system only works for minimal lane-keeping tasks in the cruising scenario. It is not designed to work with other subsystems, and its scalability was not tested on more sophisticated roads or sensor settings.

2.1 Localization

For most localization centred systems, all decision making and path planning are based on centimetre-level localization accuracy from the localization subsystem. Using GPS, with the aid of an IMU, is a popular solution and provides accuracy better than 20 centimetres with SLAM over an HD Map [4]. The problem with GPS is that the signal is not always available, and the result tends to drift. For quite a long time, SLAM has been considered the key to solving the localization problem for self-driving cars. The SLAM algorithm matches visual features stored in the HD Map against features extracted from the live camera on the self-driving car. Visual features are usually organized as bags of words (BOW) in the descriptor space.

Without HD Maps or an IMU, researchers can hardly reach centimetre-level accuracy like [32] and [10]. Moreover, two problems of the SLAM-based localization approach are tricky to solve. Firstly, the performance decreases once the environment changes. Changing light angles cause different shadow shapes, and season changes cause massive appearance changes in trees and grass. These changes yield new visual features which cannot be matched with the ones recorded in the HD Map. This problem requires routine labour-intensive map recording after each change occurs. Secondly, the localization result tends to drift after long-range driving, and the error accumulates with the driven distance, as discussed in [4]. The intrinsic reason for these problems is that the original SLAM algorithm was designed for indoor localization problems, where dramatic environment changes and long-distance movement were not considered. Hence these problems are hard to eliminate.

Recent researchers, like Ma [22], started to use as few visual features as possible for localization. Besides saving storage for the BOW of these features, using fewer features decreases the risk of being affected by environment changes [28]. This trend brings the idea of using minimal features for localization. The LaneLoc system proposed by Schreiber [25] tried to match the exact appearance of lane markings against pre-recorded maps. This approach can be seen as counting the number of dashed fragments the vehicle has travelled past to localize the car. It still has several limitations. Firstly, it will not work on solid lines, in which case it ends up relying only on the IMU without any visual aid. Secondly, the exact appearance will eventually change: think of the time when those dashed lines are repainted or worn out, which are both prevalent cases. Thirdly, the performance is fragile. Slight disturbances, like occlusions or heavy shadows, will make the system omit one or more fragments and yield a persistent error as a result. Lastly, the labelling process is both complicated and hard to finish accurately, as discussed by Schreiber in their paper. Our proposed system overcomes these limitations by abstracting line features further into types and directions with the proposed lines detector.

2.2 Lines Detection

Line detection, or the narrower problem of lane detection, was the essence of many early driving assistant systems [26] like the Lane Departure Warning System (LDWS) and the Lane Keeping Assist System (LKAS). Many researchers, like Kim [13], used Convolutional Neural Networks (CNN) to reduce noise and segment the line markings. Wang [29] used shapes extracted from OpenStreetMap (OSM) as prior knowledge to help detect the lanes. Some problems remain for the CNN-supported approaches.

Firstly, they still cannot solve the long-tail challenging situations because CNNs rely heavily on the distribution of the training dataset. As a result, CNNs generally work poorly in rare situations. Secondly, the segmentation results of CNN approaches often have blurry edges when the network is not confident about its prediction. These blurry edges cause difficulty for the following algorithms when they try to form a line from the ambiguous pixels. Lastly, CNNs are significantly dataset-dependent. They tend to work well only on the dataset they have been trained on [3], because different datasets and sensor settings tend to create distinctive patterns of noise in the images. For example, in the KITTI dataset [9], the same line marks show different appearances in different locations in the BEV space. Lines far from the camera show clear artifacts caused by the BEV transformation. The self-driving related datasets often cover just one type of camera setting; a vast and comprehensive dataset like MS-COCO [19] for object detection does not yet exist for lines detection.

As a result, CNNs were not used for lines detection. The proposed lines detector leverages line information from a topology map, similar to what Wang did in [29] with the OSM, as prior knowledge. The proposed lines detector separates different line types to boost performance even further, by using a different lines detector for each type of line (solid or dashed, straight or curved). It also uses a sliding window to detect and connect traffic lines, similar to what Tsai did in [20]. The sliding window approach proved to be both robust and easy to visualize for debugging.

3 System Design

The overall working pipeline is shown in Figure 1. In the cruising scenario, the detection subsystem finishes parts 1.1 and 1.2 to give the current lane number of the vehicle, and that is enough for generating a driving path and control signal without involving the localization system at all. However, the detection system needs to continuously detect the traffic features of the next traffic part (which could be another lane ahead or a free space connected with an exit). The order of the series of traffic features is based on the topology map. The topology map is drawn before the system can run in a new area. The topology map also provides lane information that helps lines detection as prior knowledge, and it helps the vehicle change to a preferred lane in advance in the cruising scenario.

In the turning scenario, the vehicle needs to travel through a free space. The first problem to solve is which exit and entrance of the free space the vehicle should take. The topological map, used as the descriptor space for matching with the digital map and the rebuilt scene, is the centre, and the relationship is shown in Figure 2. The topology map contains the following information:

  • Lanes information: (1) the lines information on both sides (like a straight yellow line on the left and a straight curb on the right), (2) ending information (like ending with a stop line or merging with another lane on the left), (3) direction information (like starting direction and turning angle limitation for each window), (4) neighbour lanes used for lane changing while cruising, (5) connected entrance and exit numbers, (6) traffic rules metadata (like speed limits), (7) status (like normal, under maintenance, or closed during specific time windows)

  • Entrance and exit: (1) position, (2) direction, (3) the relationship (an N to N relationship) with each other.

  • Free spaces: (1) detectable traffic features used for localization (including stop lines, crosswalk lines, traffic lights, traffic signs, lines of adjacent lanes) and their relative position in a real-world scale, (2) adjacent entrance and exit numbers, (3) traffic rules (like speed limits), (4) status

Figure 2: The left is a digital map used for navigation, the middle is the topology map, the right is the rebuilt scene

3.1 Matching digital map with topology map for navigation

Each turning point on the digital map is used for finding the nearest entrance-exit pair with correlated directions. Define T as the set of all turning points on the digital map, where (x_i, y_i) is the latitude and longitude of turning point t_i, a_i is the direction before the turning and b_i is the direction after the turning. E and X are the sets of all entry points and all exit points. The score function f is the multiplication of g and h, as in equation 1, where g is the Euclidean distance between two points and h is the difference between two angles. The set of all legal pairs of exits and entrances is a subset of X × E. All legal pairs must connect with the same free space and follow the traffic law. For example, the exit at the end of a right-turning lane cannot pair with the entrance ahead in the same direction. The legal pair with the minimal f score is the matched result. This method assumes the turning point on the digital map is the centre point between the target exit and the target entrance.
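As a minimal sketch of this matching step (the data layout and the exact form of h are assumptions, since the symbols of equation 1 did not survive extraction; h is read here as the sum of the two direction mismatches, with a small epsilon so distance still matters when directions match exactly):

```python
import math

def angle_diff(a, b):
    """Smallest absolute difference between two headings in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def pair_score(turn, exit_pt, ent_pt):
    """Score f = g * h for one candidate (exit, entrance) pair.

    turn = (x, y, dir_before, dir_after); exit_pt / ent_pt = (x, y, direction).
    Positions are assumed to be in a local metric frame; lower is better.
    """
    tx, ty, d_before, d_after = turn
    # g: Euclidean distance from the turning point to the pair's midpoint
    mx = (exit_pt[0] + ent_pt[0]) / 2.0
    my = (exit_pt[1] + ent_pt[1]) / 2.0
    g = math.hypot(tx - mx, ty - my)
    # h: direction mismatch at the exit (before the turn) plus at the
    # entrance (after the turn)
    h = angle_diff(d_before, exit_pt[2]) + angle_diff(d_after, ent_pt[2])
    return g * (h + 1e-6)

def match_turning_point(turn, legal_pairs):
    """Pick the legal (exit, entrance) pair with the minimal f score."""
    return min(legal_pairs, key=lambda p: pair_score(turn, p[0], p[1]))
```

Restricting the search to the legal pair set keeps the minimization cheap, since a free space typically has only a handful of exits and entrances.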


The data of the entrance and exit sets are manually initialized as part of the topology map. These data usually do not need to be changed unless the traffic features change, for example when an intersection is updated with an additional right-turning lane or new construction on the road updates the lane changing rules temporarily. The maintenance of the topology map is easy and fast, since we only need to change the lane data in the entrance and exit sets.

3.2 Matching topology map with perception scene for localization

Lanes form two kinds of lane sets: driving lane sets and detectable lane sets. The driving lane sets provide information about lane changing behaviour and traffic laws, like speed limits. Two examples of driving lane sets are illustrated in Figure 3. The vehicle can change to other lanes within the same driving lane set. While lane changing, the target lane and original lane information is passed to the following subsystems to plan and finish the changing maneuver.

Figure 3: An illustration of lane sets under two different situations, overlapped on a satellite map. The left is a highway exit; the right is a complicated lane topology near a roundabout. The red and green arrows represent an entrance or an exit of that lane.

The detectable lane sets provide information about how to detect these lanes. Lanes with either the same travelling direction or the opposite one can be grouped into the same set. Each detectable lane set has left and right line types, lane width (used as a detection aid, not a restriction), dashed line intervals, a suggested detection window size and other metadata which can be added at one's convenience. A detectable lane set must have line information for at least two sides for lanes tracking. The lane width follows the priority of (1) the width between the two detected lines, (2) the width of other detected lanes within the same detectable lane set, (3) the equally divided width if the two outer lines (probably curbs) of the whole set are detected, (4) the default lane width of the detectable lane set. For example:

  • Suppose there are two lanes travelling in opposite directions and there is no middle line to separate them. If both sides are detected, the space in between is divided by two for the width of each lane. If the vehicle only detects the right side (assuming right-hand driving), the lane width for the current lane is the default lane width of the lane set.

  • Suppose there are four lanes travelling in opposite directions in two groups of two lanes, the middle line is a solid line, and the line between two same-direction lanes is a dashed line. The number one lane (counting from the right) is the space between the curb and the dashed line, and the number two lane is the space between the middle solid line and the dashed line. If the vehicle cannot detect the curb to get the lane width of lane number one, the width of lane number two is used for the width of lane number one.
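The four-level lane-width priority above can be sketched as a simple fallback chain (the argument names are illustrative, not from the paper; any unavailable measurement is passed as None or an empty list):

```python
def resolve_lane_width(detected_width, sibling_widths, set_span, n_lanes,
                       default_width):
    """Lane-width fallback following the four-level priority from the text.

    detected_width: width between the two detected lines of this lane, or None
    sibling_widths: widths of other detected lanes in the same detectable set
    set_span:       distance between the two outer lines of the whole set, or None
    n_lanes:        number of lanes in the set (used to divide set_span equally)
    default_width:  the default lane width stored in the topology map
    """
    if detected_width is not None:          # (1) both lines of this lane detected
        return detected_width
    if sibling_widths:                      # (2) another lane in the same set
        return sibling_widths[0]
    if set_span is not None and n_lanes:    # (3) whole-set span divided equally
        return set_span / n_lanes
    return default_width                    # (4) topology-map default
```

In the four-lane example above, case (2) is the branch taken when the curb of lane number one is occluded but lane number two was measured.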

For the cruising scenario, we care about two questions: (1) which lane set are we in (to prepare for the next exit), and (2) which is the ego lane in that lane set. For these two questions, the system relies on lane information either initialized at the beginning of the current running period or initialized after driving through a free space via a specific entrance. The detection system verifies and corrects the current lane set and lane number by matching detected lane line types with the ones from the topology map. The detection system provides four lines detectors for different types of lines: (1) solid straight line, (2) solid curve, (3) dashed straight line, (4) dashed curve. In the remainder of this paper, curbs are treated the same as traffic lines unless stated otherwise.

A change of lane line type usually represents an end of the current lane. If multiple line types are possible on one side of a lane, the system uses the detector for the highest-priority line type, because a higher-level detector is more complicated and can also handle the task of detecting lower-level lines. The line type levels (one is the highest and four is the lowest) are: (1) dashed curves, (2) solid curves, (3) dashed straight lines, (4) solid straight lines.
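The priority rule can be stated as a one-line selection (the type names are illustrative labels for the four levels above):

```python
# Priority levels from the text: level 1 is highest, level 4 is lowest.
# A higher-priority detector subsumes the simpler line types below it.
LINE_TYPE_PRIORITY = {
    "dashed_curve": 1,
    "solid_curve": 2,
    "dashed_straight": 3,
    "solid_straight": 4,
}

def pick_detector(possible_types):
    """Choose the single detector to run for a lane side that may carry
    several line types, per the priority rule above."""
    return min(possible_types, key=LINE_TYPE_PRIORITY.__getitem__)
```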

For the turning scenario, the detection subsystem only needs to detect one pair of non-parallel lines to rebuild the scene. For example, in the intersection scenario shown in Figure 4, a middle lane line and a stop line are enough to form a strong anchor for rebuilding the scene based on the relative positions given by the topology map. The target entrance on the right side can then be predicted and used for path planning. Once the vehicle has driven into the free space past the stop line, which will no longer be detected, the stop line of the target lane will be detected and provide a strong anchor to follow. The starting point of the target lane forms a weak anchor as an additional clue for localization.

The detection of anchors might be affected by occlusions caused by other objects on the road. In other situations, such as when the vehicle is crossing a large intersection, the vehicle may have no available anchor in sight. The target lane direction and the current drivable area, as a backup, will then aid the vehicle to finish the turning and pass the next entrance. The free space situation ends with a positive detection of the next detectable lane set. If there are multiple lines parallel to each other nearby, the system assumes the detected one is the nearest one based on the current lane-level position.

The system needs to be initialized at the beginning of each run, based on GPS signals and the current driving direction from the gyroscope, to tell the system which lane the vehicle is on. The GPS signal does not need to be centimetre-level accurate; the detection system will update the lane number to the correct one by counting the line numbers between the vehicle and the detected curbs.

Figure 4: How the vehicle locates itself using lines detection results. The yellow triangle is a weak anchor and the red triangle is a strong anchor.

This paper does not cover behaviour decision among lanes because it can be considered a solved problem thanks to previous research like [34]. Behaviour decision includes behaviours like yielding to vehicles coming out of merging lanes. These rules are universal and consistent.

3.3 General lines detector

The proposed lines detector in the detection subsystem can detect diverse types of lines with minor changes to the algorithm. The code for lane lines detection on KITTI can be found on this repository. The following types of lines were tested: (1) lane lines, (2) curbs, (3) stop lines, (4) merging or splitting points of two lines (pairs of lines), (5) special lane lines or curbs (which are not parallel to the current ego lane). The lines detection problem was dissected by tracing back to the most significant visual feature of lines, which is their long and narrow appearance. A sliding window is used to follow possible lines to exploit this feature. All noise without this narrow feature is eliminated by applying the following methods:

  • Region Restriction: The detection subsystem leverages given prior knowledge about the starting points to eliminate noise in unrelated regions. This knowledge comes either from previous lines detection results or from prediction based on the positive detection results of neighbour lines, given the lane width from the topology map. For dashed lines, the sliding window moves at a step size of the dash segment interval given by the topology map, to ensure an optimal detection position for each segment. The system tolerates minor errors in this interval distance. The more knowledge we have about the lines, the smaller the detection window we can use. A smaller region of interest gives better resilience to challenging conditions, helps the segment normalize better and speeds up the lines detection process.

  • Special Convolution Kernel: The system uses a special kernel for line detection, as shown in Figure 5. This kernel yields a cleaner result in Hough space for the following steps, with less noise. This kernel is also friendlier for detecting bending curves, merging lines and splitting lines than a simple vertical kernel.

  • Directional Erosion: The system uses a special directional erosion kernel to erode noise which does not span a specific direction (I ⊖ B, where I is the pixels in the window and B is a 5-by-1 narrow structuring element), as illustrated in Figure 6. The direction of the target lines is given by the topology map. Within one sliding window, a line segment can be considered a straight line. Sharply turning lines or circles will also be eroded into small segments, which are filtered out afterwards. Though there are other, more complicated ways to leverage the direction information for lines detection [12], directional erosion is simpler and also works.

  • Types of Lines: Curves in each detection window are restricted to have smaller angle changes than a threshold, which is usually very small and given by the topology map. Exceptionally sharp turning lane lines have large turning thresholds. For straight lines, we can use a much narrower window. For dashed lines, marks which are too long or too short are filtered out; the topology map gives the length of the segments of the dashed line. This method helps to filter some unusual noise, as shown in Figure 7.
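The directional erosion step can be sketched in NumPy, assuming the window has already been rotated so the target line direction is vertical (as the rotation step of the sliding-window loop provides); a pixel survives only if a full 5-by-1 run of pixels along that direction is set:

```python
import numpy as np

def directional_erosion(mask, length=5):
    """Erode a binary mask with a vertical (length x 1) structuring element.

    A pixel survives only if `length` consecutive pixels in its column are
    set, so blobs that do not span the line direction are removed. For an
    arbitrary line direction, rotate the window upright first, erode, then
    rotate back.
    """
    h, w = mask.shape
    pad = length // 2
    padded = np.zeros((h + 2 * pad, w), dtype=bool)
    padded[pad:pad + h] = mask.astype(bool)
    out = np.ones((h, w), dtype=bool)
    for k in range(length):
        out &= padded[k:k + h]   # AND over the sliding vertical run
    return out
```

On a window containing a long vertical line plus isolated speckle noise, the line (slightly shortened at its ends, as with any erosion) survives while the speckles vanish.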

Figure 5: Four results for different convolution kernels. The first is the proposed kernel and the last is a typical square edge detection kernel. The result on the left is smoother and cleaner than the ones on the right in the noisy area.
Figure 6: Directional erosion eliminates strong noise in the red circles while detecting stop lines.

The proposed lines detector uses the Y channel of the YUV colour space, since it was proved to perform better by Lin in [18]. The system works in the Bird's-Eye-View (BEV) space, since there we can leverage the lines prior knowledge without predicting the camera pose or estimating the vanishing point (VP) [17]. More about the homography transformation from the camera image to the BEV space with a given camera pose can be found in [14].
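The paper defers the image-to-BEV homography to [14]; as a generic illustration only (not the authors' calibration procedure), such a mapping can be estimated from four image-to-ground correspondences with the direct linear transform and then applied to image points:

```python
import numpy as np

def fit_homography(src, dst):
    """Direct linear transform: homography mapping src points to dst points.

    src/dst are sequences of four or more corresponding (x, y) points, e.g.
    image pixels of known ground markings and their metric BEV positions.
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)        # null-space vector = homography entries
    return H / H[2, 2]              # fix the scale ambiguity

def to_bev(H, pts):
    """Apply homography H to (N, 2) image points; returns BEV coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]
```

With the homography fixed for a rigidly mounted camera, every frame can be warped once into the BEV space where the topology-map priors (lane widths, line directions) apply directly.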

For feature detection in Hough space, a low-high-low kernel was widely used by [1], [16] and [31]. We rectify it into a low-middle-high kernel and then mirror it to run the detection on the left and right sides separately. We can detect merging or splitting spots and their directions (merging from / splitting to the left or the right) by comparing these two results. For example, at a place where a line splits to the right, the line detection from the right side will break, giving a shorter line length than the left side, as shown in Figure 8. To separate splitting from merging, an additional window is created both upwards and downwards, leaning to one side. A positive lines detection result in the upwards window means splitting, and a positive result in the downwards window means merging.
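The left/right mirroring can be illustrated on a single BEV image row; the kernel values here ([-1, 0, 1] and its mirror) are a minimal stand-in, since the paper does not give the exact coefficients. The two responses peak at the two edges of a bright line, and comparing where each response disappears along the line reveals a split or merge:

```python
import numpy as np

def correlate1d(row, k):
    """Plain correlation (no kernel flip) with zero padding, same length."""
    row = np.asarray(row, dtype=float)
    k = np.asarray(k, dtype=float)
    pad = len(k) // 2
    padded = np.concatenate([np.zeros(pad), row, np.zeros(pad)])
    return np.array([padded[i:i + len(k)] @ k for i in range(len(row))])

def edge_positions(row):
    """Locate the left and right edges of a bright line in one row using a
    low-middle-high kernel and its mirror (illustrative values)."""
    left = correlate1d(row, [-1.0, 0.0, 1.0])   # fires where brightness rises
    right = correlate1d(row, [1.0, 0.0, -1.0])  # fires where brightness falls
    return int(np.argmax(left)), int(np.argmax(right))
```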

Lastly, the procedure for stop lines detection is as follows. After the detection, if the line is broken at the upper end, two side windows are created. A horizontal line detection, using a horizontal convolution kernel and erosion structure, is applied to detect the stop lines. If the result is positive, this lane line is marked as finished, and no window above is created.

Figure 7: An example of how length information helps to filter noise within a single window.
Figure 8: An illustration that when a line splits, the left side and right side line detection results will not agree with each other.

The overall process of the lines detector is shown as pseudo code in Algorithm 1. For special lines which are not parallel to the current ego lines, the problem is that we do not have any initial position for the sliding window to start from. However, we can still use the direction information from the topology map. Spotting the anchors while turning in free spaces is one of the situations which require detecting special lines, as shown in Figure 4. The algorithm is slightly different, shown as pseudo code in Algorithm 2. The image after erosion is cut into several blocks, and the blocks containing valid pixels are used for forming windows.

Data: Input Image to detect and topology map information
Result: Detected lines
Initialize the first window;
while sliding window does not go out of the image do
       Cut pixels in the window;
       Rotate the window;
       Commit Convolution;
       Commit Directional Erosion;
       Cluster pixels;
       Form lines for all candidates;
       Filter and get the result for current window;
end while
Connect results and form a line;
Algorithm 1 The lines detector
Data: Input Image to detect and topology map information
Result: Detected lines
Rotate the image;
Commit Convolution;
Commit Directional Erosion;
Get valid pixel blocks;
Form valid blocks into windows;
Cut pixels for each window;
for each window do
       Cluster pixels;
       Form lines for all candidates;
       Filter and get the result for current window;
end for
Connect and merge similar lines;
if any positive result in any window then
       Return the longest detection result;
else
       Return negative as a result;
end if
Algorithm 2 The special lines detector

4 Results

The first part of this section shows that the proposed general lines detector is robust to typical noise on the road, works well under different weather and lighting conditions and detects multiple types of lines. The second part shows that the localization method helps the vehicle travel through the turning scenario.

For lane lines detection, the method was tested on KITTI [9] and Cityscapes [6]. For general traffic lines detection, the proposed method was tested on Berkeley DeepDrive (BDD 100k) [33], KITTI and a self-recorded video. The results for general lines detection cannot be compared with other methods because of the lack of metrics. Finally, the BDD 100k dataset and images from a self-recorded video are used for testing the localization method while passing free spaces.

4.1 Lane lines detection

Figure 9: The first row is some of the detection results of KITTI-UM. The second row is some of the detection results of Cityscapes. The red is the converted ego lane based on lines detection.

The proposed lines detector, ECPrior (Erosion and Clustering with Prior knowledge), performs as well as other deep-neural-network-supported approaches [23] [30] [5] [21] on the KITTI behaviour evaluation [8], as shown in Table 1. Some of the detection results are shown in the first row of Figure 9. The proposed detector does not include object detection; hence it is affected by other cars close to the lines. A typical object detector can be added before the proposed lines detector to get a better result, as Satzoda did in [24]. Object detection is usually a separate module, and the same feature should not be implemented again in the lines detection module. The proposed lines detector works equally well on Cityscapes, showing its scalability, as shown in the second row of Figure 9, despite the two datasets having very different aspect ratios. A different aspect ratio was used because the Cityscapes dataset does not provide official BEV transformation methods or camera coordinates.

Method HR-30 PRE-40 F1-40
CyberMELD 97.55 % 94.57 % 89.66 %
RBNet 95.92 % 95.56 % 87.21 %
RoadNet3 95.57 % 94.57 % 83.72 %
ECPrior (Mine) 93.96 % 96.70 % 91.86 %
Up-Conv-Poly 93.14 % 90.11 % 83.72 %
Table 1: KITTI (UM Lane) lane lines detection result

The limitations of the current lines detector are:

  • Like all other methods, the detector relies on a stable and accurate BEV transformation. The transformation is hard to keep accurate when the ground is not flat. Although deep neural networks can learn to compensate for this on a specific dataset, they are still hard to scale due to some degree of overfitting. On a non-flat surface, the width of a lane might shrink, as shown in the first failure case in Figure 10. Dynamic adjustment of the window width can keep windows from merging.

  • Because ECPrior targets general cases, the input images should not contain artifacts that disturb the detector, as shown in the second failure case in Figure 10. For KITTI, these artifacts are mainly introduced by the BEV transformation.

Figure 10: Two failure cases, one per row: (1) the first image is the ego-lane result, and the second is the lines detection result for the right-side curb; (2) the middle image is the gradient map showing clear artifacts, and the lines detection result is shown in the right image.
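The dynamic window-width adjustment mentioned in the first limitation above can be sketched as follows. The row-wise centre lists, base width, and minimum gap are hypothetical parameters, not values from the paper:

```python
def sliding_windows(xs_left, xs_right, base_width=80, min_gap=20):
    """Track two lane lines row by row in a BEV image; shrink the window
    width when the lane narrows so the two search windows never merge
    (hypothetical sketch of the adjustment, not the paper's code)."""
    centers_l, centers_r = [], []
    cl, cr = xs_left[0], xs_right[0]
    for xl, xr in zip(xs_left, xs_right):
        width = base_width
        # Shrink both windows once the lane is narrower than 2*width + min_gap.
        gap = cr - cl
        if gap < 2 * width + min_gap:
            width = max((gap - min_gap) // 2, 1)
        # Accept a new centre only if it falls inside the current window.
        if abs(xl - cl) <= width:
            cl = xl
        if abs(xr - cr) <= width:
            cr = xr
        centers_l.append(cl)
        centers_r.append(cr)
    return centers_l, centers_r
```

Because the width shrinks with the observed gap, a candidate pixel that would pull one window into the other is rejected rather than merged.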

4.2 General lines detection

ECPrior can solve the problems caused by typical shadows or short breaks in the lines, and it also proved robust under different lighting and weather conditions. For stop lines, images from BDD100K were used for testing. The results are shown in Figure 11. The upper case in that figure is a daylight environment with light snow, and the lower case is a night-time environment. In both cases, the ECPrior lines detector successfully detects the stop line ahead.

ECPrior also detects special lines well. A self-recorded video was used for testing. The example in Figure 12 shows its ability to detect special lines in a turning scenario while travelling through a roundabout. In this situation, the lines detector needs to detect the rear inner side of the roundabout. The left-side curb of the current lane and the inner-side curb of the roundabout can then form a strong anchor used to rebuild the scene of the free space for localization.

ECPrior uses aggressive erosion and thresholding, so only a small portion of the target lines is detected at the pixel level; ECPrior is therefore not a pixel-level detector. As a complete line detection module, ECPrior provides the full lines detection result by fitting the detected pixels into line features (regression for dashed segments and straight lines, splines for the others). ECPrior inevitably relies on an accurate BEV transformation to leverage the prior knowledge of the lines. Deformation caused by a camera mounted behind the windshield, or by problematic camera settings, narrows the effective detection area; in that situation, only lines lying in the middle of the view ahead can be detected. As an example, the detector fails to detect the left side of the inner curb in Figure 12 due to such deformation.

Figure 11: Stop lines detected results on BDD 100K dataset. Blue pixels are the detected stop line and green pixels are the guiding lane line of that stop line.
Figure 12: Detecting the inner side of the roundabout is an example of detecting a special line with only its direction given. The top two results are shown in red and blue in the last image, inside the red box.
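The erode-threshold-fit pipeline described above can be illustrated in pure NumPy. This sketch covers only the straight-line regression case (the paper also fits splines for curved lines), and the threshold value and 3x3 kernel are illustrative assumptions:

```python
import numpy as np

def erode(mask):
    """3x3 binary erosion: a pixel survives only if its whole
    8-neighbourhood is set (pure NumPy, no OpenCV dependency)."""
    p = np.pad(mask, 1, constant_values=0)
    h, w = mask.shape
    stack = [p[i:i + h, j:j + w] for i in range(3) for j in range(3)]
    return np.min(stack, axis=0)

def detect_line(gray, thresh=200):
    """Threshold + erosion keeps only the strongest line pixels, then a
    first-order polynomial fit recovers a straight line in BEV coordinates
    (hypothetical sketch of the straight-line case only)."""
    mask = erode((gray > thresh).astype(np.uint8))
    ys, xs = np.nonzero(mask)
    if len(xs) < 2:
        return None  # too few surviving pixels to fit a line
    return np.polyfit(ys, xs, 1)  # x = a*y + b
```

The aggressive erosion is what makes the surviving pixels sparse but trustworthy: the fit, not the pixel mask, is the detector's output.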

4.3 Localization

Based on detections like these, strong and weak anchors can be established to locate the vehicle in the turning scenario. This localization approach relies on neither GPS nor IMU for vehicles travelling through urban areas. In the turning scenario, the system provides a stable and accurate position, based on the rebuilt scene, for the path planning and control subsystems. In the cruising scenario, the detection system gives a lane-level localization result (which lane the vehicle is in), which is enough for the following subsystems.
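The anchor-based localization amounts to advancing a position index along a topological route each time an expected anchor (a stop line, an intersection, a curb feature) is detected. The class below is a hypothetical minimal sketch of that bookkeeping, not the paper's implementation:

```python
class TopoLocalizer:
    """Localize by counting detected anchors along a known topological
    route (hypothetical sketch; anchor labels are illustrative)."""

    def __init__(self, route):
        self.route = route  # ordered anchor labels along the planned path
        self.index = 0      # index of the next anchor we expect to pass

    def observe(self, anchor):
        """Advance only when a detection matches the expected anchor;
        unrelated detections leave the position estimate unchanged."""
        if self.index < len(self.route) and anchor == self.route[self.index]:
            self.index += 1
        return self.index  # anchors passed so far = position on the route
```

A real system would also need to tolerate missed anchors (e.g. by accepting the next-expected anchor with a penalty), which is exactly where the occlusion limitations discussed below bite.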

There are several limitations to using this approach alone for localization. First, the proposed localization method relies on visual clues from specific traffic features (traffic lines only, in this paper). Heavy occlusion blocking most of the target traffic lines will affect the localization result in several ways. In one situation, the vehicle approaches an intersection with heavy traffic ahead blocking most of the stop line. The localization system will not spot anchors until the vehicle is very close to the stop line, leaving the following subsystems little reaction time to stop. In another situation, the vehicle is about to turn right into a small alley according to the navigation route, and parked vehicles block the view of the turning part of the right curb. The detection system will not detect the right-turn feature needed to spot this alley, and the vehicle will miss the target turn.

In the first situation, we can use the behaviour of other vehicles as an input for localization, the way Gao leveraged the positions of other vehicles in [27]. For example, when we detect a line of idle vehicles, we can assume that the position of the first stopped car indicates the position of the stop line, forming a prediction that extends the reaction time for the following subsystems. In the second situation, a more comprehensive drivable-area analysis, aided by LiDAR signals, would reveal a right-side road extension indicating the alley. With limited visual clues, the 3D height information of the road is essential for detecting the road's topology. Additionally, the localization subsystem is compatible with traffic lights, traffic signs, and GPS as additional sources of information.
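The first mitigation, using a queue of idle vehicles as a proxy anchor for an occluded stop line, might look like the following sketch. The speed threshold, minimum queue size, and standoff distance are assumed parameters, and the `(distance, speed)` detection format is hypothetical:

```python
def predict_stop_line(vehicles, speed_eps=0.3, min_queue=2, standoff_m=2.0):
    """When queued idle vehicles hide the stop line, use the nearest
    stopped vehicle as a proxy anchor (hypothetical sketch).

    vehicles: list of (distance_m, speed_mps) detections in ego coordinates.
    Returns the predicted stop-line distance in metres, or None."""
    stopped = sorted(d for d, v in vehicles if abs(v) < speed_eps)
    if len(stopped) < min_queue:
        return None  # not enough evidence of a waiting queue
    # Assume the stop line sits a standoff distance beyond the first car.
    return stopped[0] + standoff_m
```

Requiring a minimum queue size guards against a single stopped car (e.g. one parking) being mistaken for a queue at a stop line.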

5 Conclusions

This paper proposes a new perception-centred self-driving system and focuses on testing the proposed general lines detector, ECPrior, and the localization method on several urban cases. The proposed system design is a skeleton and a starting point, with the potential to work with additional modules for better performance. For example, users can apply the method by Hillel in [11] to remove lens flare and make ECPrior's detection more robust when driving towards the sun. This extensibility makes the system more promising than other detection methods based on deep neural networks. The scene rebuilding module of the localization subsystem was only briefly validated, without touching more complicated settings; diverse types of scene rebuilding can be discussed in future work. Places like indoor parking areas without GPS signals will rely heavily on the rebuilt scene to localize the vehicle, so they should be prioritized.

Finally, I appeal to the community to reconsider the necessity of SIFT-like visual features for localization, as well as the reliance on deep neural networks for traffic lines detection, in the context of self-driving.


  • [1] M. Bertozzi and A. Broggi (1998) GOLD: a parallel real-time stereo vision system for generic obstacle and lane detection. IEEE Transactions on Image Processing 7 (1), pp. 62–81. Cited by: §3.3.
  • [2] M. Bojarski, D. D. Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang, X. Zhang, J. Zhao, and K. Zieba (2016) End to end learning for self-driving cars. CoRR abs/1604.07316. External Links: Link, 1604.07316 Cited by: §2.
  • [3] A. Borkar, M. Hayes, M. T. Smith, and S. Pankanti (2009) A layered approach to robust lane detection at night. In 2009 IEEE Workshop on Computational Intelligence in Vehicles and Vehicular Systems, Vol. , pp. 51–57. Cited by: §2.2.
  • [4] G. Bresson, Z. Alsayed, L. Yu, and S. Glaser (2017) Simultaneous localization and mapping: a survey of current trends in autonomous driving. IEEE Transactions on Intelligent Vehicles 2 (3), pp. 194–220. Cited by: §2.1, §2.1.
  • [5] Z. Chen and Z. Chen (2017) Rbnet: a deep neural network for unified road and road boundary detection. In International Conference on Neural Information Processing, pp. 677–687. Cited by: §4.1.
  • [6] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele (2016) The cityscapes dataset for semantic urban scene understanding. CoRR abs/1604.01685. External Links: Link, 1604.01685 Cited by: §4.
  • [7] DMV (2019) Autonomous vehicle disengagement reports. Note: Last accessed 15 August 2020 External Links: Link Cited by: §1.
  • [8] J. Fritsch, T. Kühnl, and A. Geiger (2013) A new performance measure and evaluation benchmark for road detection algorithms. In 16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013), Vol. , pp. 1693–1700. Cited by: §4.1.
  • [9] A. Geiger, P. Lenz, and R. Urtasun (2012) Are we ready for autonomous driving? the kitti vision benchmark suite. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, Vol. , pp. 3354–3361. Cited by: §2.2, §4.
  • [10] T. Heidenreich, J. Spehr, and C. Stiller (2015) LaneSLAM – simultaneous pose and lane estimation using maps with lane-level accuracy. In 2015 IEEE 18th International Conference on Intelligent Transportation Systems, Vol. , pp. 2512–2517. Cited by: §2.1.
  • [11] A. B. Hillel, R. Lerner, D. Levi, and G. Raz (2014) Recent progress in road and lane detection: a survey. Machine vision and applications 25 (3), pp. 727–745. Cited by: §5.
  • [12] A. S. Huang, D. Moore, M. Antone, E. Olson, and S. Teller (2009) Finding multiple lanes in urban road networks with vision and lidar. Autonomous Robots 26 (2), pp. 103–122. External Links: Document, ISBN 1573-7527, Link Cited by: 3rd item.
  • [13] J. Kim and M. Lee (2014-11) Robust lane detection based on convolutional neural network and random sample consensus. pp. 454–461. External Links: Document Cited by: §2.2.
  • [14] Z. Kim (2008) Robust lane detection and tracking in challenging scenarios. IEEE Transactions on Intelligent Transportation Systems 9 (1), pp. 16–26. Cited by: §3.3.
  • [15] S. Kuutti, S. Fallah, K. Katsaros, M. Dianati, F. Mccullough, and A. Mouzakitis (2018) A survey of the state-of-the-art localization techniques and their potentials for autonomous vehicle applications. IEEE Internet of Things Journal 5 (2), pp. 829–846. Cited by: §1.
  • [16] R. Labayrade, J. Douret, J. Laneurit, and R. Chapuis (2006-07) A reliable and robust lane detection system based on the parallel use of three algorithms for driving safety assistance. IEICE - Trans. Inf. Syst. E89-D (7), pp. 2092–2100. External Links: ISSN 0916-8532, Link, Document Cited by: §3.3.
  • [17] C. Lee and J. Moon (2018) Robust lane detection and tracking for real-time applications. IEEE Transactions on Intelligent Transportation Systems 19 (12), pp. 4043–4048. Cited by: §3.3.
  • [18] Q. Lin, Y. Han, and H. Hahn (2010) Real-time lane departure detection based on extended edge-linking algorithm. In 2010 Second International Conference on Computer Research and Development, Vol. , pp. 725–730. Cited by: §3.3.
  • [19] T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick (2014) Microsoft coco: common objects in context. Lecture Notes in Computer Science, pp. 740–755. External Links: ISBN 9783319106021, ISSN 1611-3349, Link, Document Cited by: §2.2.
  • [20] Luo-Wei Tsai, Jun-Wei Hsieh, Chi-Hung Chuang, and Kuo-Chin Fan (2008) Lane detection using directional random walks. In 2008 IEEE Intelligent Vehicles Symposium, Vol. , pp. 303–306. Cited by: §2.2.
  • [21] Y. Lyu, L. Bai, and X. Huang (2019) Road segmentation using cnn and distributed lstm. In 2019 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1–5. Cited by: §4.1.
  • [22] W. Ma, R. Urtasun, I. Tartavull, I. A. Barsan, S. Wang, M. Bai, G. Mattyus, N. Homayounfar, S. K. Lakshmikanth, and A. Pokrovsky (2019-11) Exploiting sparse semantic hd maps for self-driving vehicle localization. 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). External Links: ISBN 9781728140049, Link, Document Cited by: §2.1.
  • [23] G. Oliveira, W. Burgard, and T. Brox (2016) Efficient deep models for monocular road segmentation. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2016), Cited by: §4.1.
  • [24] R. K. Satzoda and M. M. Trivedi (2014) Efficient lane and vehicle detection with integrated synergies (elvis). In 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Vol. , pp. 708–713. Cited by: §4.1.
  • [25] M. Schreiber, C. Knöppel, and U. Franke (2013) LaneLoc: lane marking based localization using highly accurate maps. In 2013 IEEE Intelligent Vehicles Symposium (IV), Vol. , pp. 449–454. Cited by: §2.1.
  • [26] Y. Son, E. S. Lee, and D. Kum (2019-02) Robust multi-lane detection and tracking using adaptive threshold and lane classification. 30 (1). External Links: ISSN 0932-8092, Link, Document Cited by: §2.2.
  • [27] Tianshi Gao and H. Aghajan (2009) Self lane assignment using egocentric smart mobile camera for intelligent gps navigation. In 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Vol. , pp. 57–62. Cited by: §4.3.
  • [28] C. Toft, E. Stenborg, L. Hammarstrand, L. Brynte, M. Pollefeys, T. Sattler, and F. Kahl (2018-09) Semantic match consistency for long-term visual localization. In Proceedings of the European Conference on Computer Vision (ECCV), Cited by: §2.1.
  • [29] X. Wang, Y. Qian, C. Wang, and M. Yang (2020) Map-enhanced ego-lane detection in the missing feature scenarios. IEEE Access 8, pp. 107958–107968. External Links: ISSN 2169-3536, Link, Document Cited by: §2.2, §2.2.
  • [30] X. Wang, Y. Qian, C. Wang, and M. Yang (2020) Map-enhanced ego-lane detection in the missing feature scenarios. arXiv preprint arXiv:2004.01101. Cited by: §4.1.
  • [31] S. Wu, H. Chiang, J. Perng, C. Chen, B. Wu, and T. Lee (2008) The heterogeneous systems integration design and implementation for lane keeping on a vehicle. IEEE Transactions on Intelligent Transportation Systems 9 (2), pp. 246–263. Cited by: §3.3.
  • [32] Yan Jiang, Feng Gao, and Guoyan Xu (2010) Computer vision-based multiple-lane detection on straight road and in a curve. In 2010 International Conference on Image Analysis and Signal Processing, Vol. , pp. 114–117. Cited by: §2.1.
  • [33] F. Yu, W. Xian, Y. Chen, F. Liu, M. Liao, V. Madhavan, and T. Darrell (2018) BDD100K: A diverse driving video database with scalable annotation tooling. CoRR abs/1805.04687. External Links: Link, 1805.04687 Cited by: §4.
  • [34] J. Ziegler, P. Bender, M. Schreiber, H. Lategahn, T. Strauss, C. Stiller, T. Dang, U. Franke, N. Appenrodt, C. G. Keller, E. Kaus, R. G. Herrtwich, C. Rabe, D. Pfeiffer, F. Lindner, F. Stein, F. Erbs, M. Enzweiler, C. Knöppel, J. Hipp, M. Haueis, M. Trepte, C. Brenk, A. Tamke, M. Ghanaat, M. Braun, A. Joos, H. Fritz, H. Mock, M. Hein, and E. Zeeb (2014) Making bertha drive—an autonomous journey on a historic route. IEEE Intelligent Transportation Systems Magazine 6 (2), pp. 8–20. Cited by: §1, §3.2.