Soft Constrained Autonomous Vehicle Navigation using Gaussian Processes and Instance Segmentation

01/18/2021 ∙ by Bruno H. Groenner Barbosa, et al. ∙ University of Waterloo Federal University of Lavras 0

This paper presents a generic feature-based navigation framework for autonomous vehicles using a soft constrained Particle Filter. Selected map features, such as road and landmark locations, and vehicle states are used for designing soft constraints. After obtaining features of mapped landmarks in instance-based segmented images acquired from a monocular camera, vehicle-to-landmark distances are predicted using Gaussian Process Regression (GPR) models in a mixture of experts approach. Both mean and variance outputs of GPR models are used for implementing adaptive constraints. Experimental results confirm that the use of image segmentation features improves the vehicle-to-landmark distance prediction notably, and that the proposed soft constrained approach reliably localizes the vehicle even with reduced number of landmarks and noisy observations.



There are no comments yet.


page 4

page 7

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Reliable and accurate navigation is a critical necessity for safe performance of mobile autonomous systems, such as in autonomous vehicles, service robots, and mobile robots for automated storage in warehouses. Leveraging a wide range of proprioceptive and exteroceptive sensory information, mobile autonomous systems enhance navigation and mapping algorithms for accurate motion planning, trajectory tracking, and robot stabilization algorithms [levinson2010robust, li2019deep, li2020toward]

. Integration of semantic knowledge of the scene, in which appearance-based features of the explored space are classified, improves reliability of navigation

[salas2013slam++, lateef2019survey]. In this regard, map-based methods are promising approaches for achieving high-accuracy localization [kassas2020cellular, polemap2020]. Dense maps, such as Point Cloud Map (PCM) [burgard2006], or landmark maps [trafficlight2019] are also widely used to perform navigation. The former requires a large amount of storage memory and may be not applicable for large scale environments, whereas the latter is more compact. Existing dense mapping approaches are computationally expensive for on-board computation in autonomous mobile robots and vehicles, limiting real-time capabilities. To resolve this, simplified multi-frame inference that decouples geometry and semantics are utilized [dai2017bundlefusion, bloesch2018codeslam].

In the map-based navigation approach, the selected features are expected to contain relevant prior knowledge about the static information the robot/vehicle may encounter in the environment. In the intelligent transportation setting, traffic signs [trafficsign2015, sparsemap2019], traffic lights [trafficlight2019], lanes locations [roadmarkings2017, lin2020lane], pole-like structures [polelike2017], curbs, and other road markings are normally used in this context [survey2020ieee]. Besides, artificial landmarks such as QR-codes are also utilized to reduce ambiguity [polelike2017]

. For feature detection, semantic segmentation, and instance segmentation, Convolutional Neural Networks (CNN) have been widely utilized

[deephrr2020]. Complex CNNs such as AlexNet [alexnet], VGGNet [simonyan2015deep], Inception [szegedy2014going], and Resnet [he2015deep], have been developed due to the increasing computational capability and the availability of large datasets. As opposed to object detection, in instance segmentation, besides detecting (objects’) bounding boxes, a per-pixel segmented mask and classification is obtained that results in better time efficiency [xie2020polarmask] and precise detection, especially for irregular objects [qi2020pointins]. Indeed, existing approaches have reached real-time operation capability comparable to object detection performance [Bolya_2020, tian2020conditional, wang2020solov2, lee2020centermask]. According to [wang2020solov2], the instance segmentation approaches can be grouped into three categories: i) top-down, in which the object bounding box is detected (anchor-free or not) and then semantic segmentation is applied [maskrcnn, Bolya_2020, xie2020polarmask, qi2020pointins]; ii) bottom-up, where pixels are firstly labeled and then clustered [deepwatershed2017, gao2019ssap]; and iii) direct methods, where instance segmentation is performed directly without box detection or feature embedding [wang2020solov2].

The proposed and experimentally verified soft constrained navigation approach in this paper is considered a landmark-aided method (i.e., so-called a priori map-based localization [survey2020ieee]) and is different from Simultaneous Localization and Mapping (SLAM) techniques [orbslam22017], Structure-from-Motion (SfM) methods [sfm2016], or GPS-IMU fusion systems [caron2006]

. An important feature of the proposed method is that it could be used together with other approaches to enhance state estimation and localization, such as an add-on approach discussed in

[bundle2019traffic, visionenhanced2020] to reduce possible drifts in SLAM. The main contributions of the paper are summarized as:

  • A generic feature-based navigation framework is developed and verified experimentally.

  • Within this framework, a hybrid distance predictor is developed through a Gaussian Process Regression (GPR) model on an instance-based segmented landmarks; this is to increase the reliability of relative distance estimation for far landmarks as well as the close ones.

  • A soft-constrained particle filter is designed for pose estimation and navigation using kinematic/geometric constraints and the hybrid GPR distance predictor.

The remainder of the paper is organized as follows. Section 2 presents landmark segmentation and the hybrid GPR-based distance predictor. The constrained particle filter is developed in Section 3. Experimental validation through an autonomous vehicle platform and corresponding discussions are provided in section 4. The conclusion is drawn at last.

2 Segmentation and Distance Estimation

An overview of the proposed localization framework is presented in Figure 1. Firstly, global position of selected map features, such as landmarks and road central line position, are obtained. By means of a monocular camera, images are acquired and, based on an instance segmentation technique, known landmarks are detected and segmented (Sec. 4.1

). According to features extracted from the detected and segmented landmarks, a Gaussian Process Regression model is implemented to predict the distance between the vehicle and each landmark (Sec.

2). Considering these estimated relative distances, the detected landmarks extracted from images are compared to those from the available map and are properly associated (landmark matching process). With the predicted distances and the known road boundaries, constraints are developed to estimate the vehicle position based on a soft constrained particle filter algorithm (Sec. 3).

Figure 1: General structure of the feature-based navigation algorithm.

The proposed localization framework relies on a predefined global positioning planar map of landmarks (Ground Control Points - GCP) and road shape (course) model, i.e., on a static HD map [surveylocalization2018]. Thus, some relevant map features must be selected in order to generate the map before applying the proposed self-localization procedure. From this premise, a vision based semantic landmark search approach is employed here [combininddeep2019lanmark]. Feature maps can be obtained using different approaches, such as by means of LiDAR and accurate GNSS systems or from public crowdsourced maps such as OpenStreetMap or even proprietary HD maps.

2.1 Instance-Based Segmentation

In order to define landmark constraints for implementing the Soft Constrained Particle filter (Sec. 3), previously selected map landmarks must be detected from RGB images and the relative distance between the vehicle and the detected landmarks have to be estimated. In this regard, for the sake of simplicity and considering that only planar HD map is available, vehicle, landmark and road elevations are neglected for relative distance prediction. Stereo-vision approaches [stereo2006] and monocular depth estimation methods [yin2019enforcing] are normally used to address scene analysis and calculate the relative positioning of objects therein [geotagging2018]. However, considering that only one monocular camera is available, the proposed approach applies instance-based segmentation approach to obtain relevant features of known landmarks for predicting the relative distance between the vehicle and the detected and segmented landmark.

Figure 2 presents an example of instance segmentation of landmarks (e.g. light poles). As can be noticed, besides providing the bounding boxes of the two detected landmarks, the algorithm also yields their associated pixel masks. We argue that the use of features extracted from the segmented landmark pixels together with bounding boxes features improve the object-to-camera distance prediction.

Some simple features such as bounding box height and width may be directly used for object-to-camera distance prediction. In some cases, these features would be enough since the detected object geometry is previously known and modification of the bounding box size is probably related to the change in the relative distance between the camera and the object. However, change in the bounding box size may be not related to object-to-camera distance due to camera distortion or even object occlusion. In this case, models based only on bounding boxes may fail. For instance, Figure

2 presents two detected and segmented light poles (landmarks). It is worth noting that although the detected objects share the same geometry, their estimated bounding boxes as well as their segmented masks are of very different shapes. This occurs, in this example, not only because their distance to the camera are different but also due to the absence of the upper elbow (with light bulb) of the left light pole in the image. Besides, its bounding box is not coherent with the segmented pixels; it is wider than expected due to the camera distortion. Thus, other object features from the segmented mask can be included. For instance, the thickness of the left segmented pole (e.g. from number of pixels) could be obtained and also used as an input feature for the distance prediction model. This could improve the model accuracy and hence justifies the use of an instance segmentation algorithm.

Despite all recent improvements on the algorithms for instance segmentation, the Mask R-CNN method [maskrcnn] remains a versatile state-of-the-art technique [tian2020conditional]. Briefly, Mask R-CNN is based on two-stage object detector, such as the Faster R-CNN [ren2016faster], to predict bounding-box for each instance and a compact fully convolutional network (FCN) for mask prediction, besides some operations on regions-of-interest (ROIs) of the network feature maps.

Figure 2: Relative distance prediction with GPR based on Landmarks (light poles) instance-based segmentation.

Thus, Mask R-CNN is used in this paper since our aim is to show the relevance of using instance segmentation to predict camera-to-object distance. Any other recent, more accurate and faster algorithm proposed in the literature could alternatively be promptly applied.

2.2 Distance Prediction with GPR

After detecting and segmenting a known landmark, the next step of our approach is to predict the distance between the landmark (object) and the robot/vehicle in order to implement constraints for the localization algorithm, as shown in Figure 2. Considering real environments, the bounding boxes and the segmented masks of the detected objects are not perfect. For instance, partial object occlusion may occur or even the scene illumination may not be proper for correct object detection.

In this way, some information about how reliable is the predicted distance should be used in the vehicle localization algorithm. Some papers have included uncertainties about object class identification for vehicle localization estimation, such as [semantic2020, Ganti_2019]. However, they were applied to visual SLAM approaches and not for landmark-aided approaches such the one presented in this paper. Besides, the use of distance prediction uncertainty in vehicle localization algorithm were not found in the literature.

Considering some landmarks of a map are well known, such as light poles or traffic signs, a regression model is implemented to predict the aforementioned distance with uncertainty. Each kind of landmark must have its own regression model since this model is designed using features of this specific landmark. One interesting approach for this task is the Gaussian Process Regression (GPR) model, a nonparametric probabilistic model based on Bayes theory [rasmussen2006]. An advantage of GPR is that the data define the input-output mapping complexity since no specific model structure is fitted [barbosa20tire]. Thus, GPR can handle complex relations between inputs and outputs with a relatively simple structure based on the mean and covariance functions [SCHULZ20181]

. Providing the prediction variance value distinguishes GPR from other machine learning methods

[jin2015] and make it suitable for distance prediction in our proposed approach.

In order to build a GPR model to predict the relative camera-to-object distance (desired output) considering some landmark features (input variables), assume that a data set is available such that , where , represents the known euclidean distance between the landmark and the camera (for instance in UTM coordinates), represents the extracted features, is the number of samples (detected landmarks available in acquired frames) and is the number of features used as model’s inputs.

Consider that a function is distributed as a Gaussian process, and that , with ) as the i.i.d noise term. The observed target prior distribution can be described as where is the covariance matrix between all observed data and

is the identity matrix

[barbosa20tire, rasmussen2006].

To make a prediction

for a new input vector

, the joint distribution of the observed target values and

follow a joint (multivariate) Gaussian distribution given by

The mean and variance values of the posterior distribution are expressed as, respectively,


To improve the GPR model accuracy (same object can have very distinct features according to the scene position as shown in Figure 2) and to reduce its computational complexity for real-time application (due to matrix inverse computation) a Mixture of Experts approach is employed [liu2020gprscale]. In this case, the whole input space is divided by a gating network into

smaller sub-spaces within which a simpler GPR model (expert) is used. The gating network can be seen as a clustering algorithm and different methods may be used, such as the Gaussian Mixture Model (GMM). The output of the ME approach could be a weighed combination of the

experts outputs or even the output of the most appropriate one as implemented in this work.

Note that the coordinate frames involved in the labelling of the training data as well as for real-time implementation includes: GPS, camera, and image frame. Additionally, knowing the transformation between these frames allows for local to global frame conversion and vice-versa.

After predicting the object-to-camera distance and considering that the camera coordinates origin is the lens optical center, this predicted distance should be converted to object-to-vehicle body coordinate. Considering the vehicle body is rigid and the camera is installed in a fixed and known position, the distance between the object and the vehicle body coordinates origin can be obtained regarding the bearing angle between the camera and the landmark is also known. To obtain the referred angle, it is necessary to convert image frame positions (pixel coordinates) to the camera coordinates, this can be done by intrinsic parameters calibration using chessboard, as suggested in [trafficlight2019].

3 Vision-Map Localization

After obtaining the landmark-to-vehicle relative distance, the landmark detected in an image must be associated to a landmark present in the available map (database), this is known as the association or landmark matching problem [polemap2020]. An approach to landmark matching is to apply a nearest neighbor search around the estimation of the vehicle’s pose in preceding frames [trafficsign2015]. If more than one match is found, the predicted landmark-to-vehicle relative distance can be used to find the proper correspondence [polemap2020]. However, this method may be unreliable when there are many landmarks in the map (dense map), or when the vehicle has a large displacement. The matching problem is widely discussed in the literature and it is not the focus of this work.

3.1 Soft Constrained Particle Filter Design

The vehicle motion is usually represented in a discrete form by a dynamic state model (or process model) describing how the vehicle states evolve over time and an observation model (or measurement model) that describes the relation between the states and the available observations, such that


where is the process model and is the observation model, is the state vector, are measured outputs, and are zero-mean uncorrelated process and measurement noises, respectively. Uncertainty models are included in order to deal with unmodeled dynamics, perturbations or measurement errors [merlinge2019boxPF]. In this context, Bayesian filters are of paramount importance, e. g.Kalman Filter (KF) and Particle Filter (PF), and applied to estimate the posterior distribution over the current vehicle state denoted as .

In this paper, Particle Filter is applied to vehicle position estimation due its capability of dealing with nonlinear and non-Gaussian dynamic systems [arulampalam2002PF]. Aiming at improving the localization estimation, additional information about the system besides the vehicle fundamental dynamic are incorporated as states constraints in the state estimation problem, restricting the estimation to a certain feasible region. The use of state constraints may improve the accuracy of the estimation and reduce its uncertainty degree [liu2019cPF].

This information could be directly extracted from the available HD map, such as the road network (Road Constraints). It could be also related to known kinematics constraints of the object, physical laws, speed constraints, among others (Kinematic and Other Constraints) [SHAO2010143]. Particularly, we are interested in additional external knowledge obtained from the vehicle perception system, i. e. information extracted from images acquired by the monocular camera. More precisely, the objective is to implement state constraints by means of the predicted relative distance between the detected objects (matched landmarks) and the vehicle (Landmark Constraints).

State constraints are normally defined as a set of linear or nonlinear inequalities which can be expressed as,


where are hard constraint functions at time , and the inequality sign holds for all elements.

Considering that there are also uncertainties related to the constraints, such as in the landmarks global position or in the predicted relative distance, designing soft constraints instead of hard constraints is advisable, as discussed in [liu2019cPF]. In this case, and for convenience, Eq. 4 can be replaced by [liu2019cPF]:



is an unknown vector of nonnegative random variables,

with , that follows a pdf , which makes Eq. 5 nondeterministic. Considering the random variables are independent of each other, , and the pdf

must be defined prior to state estimation. Two distributions are considered in this work, although others could be also selected. The first one is the exponential distribution with mean

defined as,


that may be used for road and speed constraints, as suggested by [liu2019cPF]. The second one is the zero-mean and truncated Gaussian distribution with variance , given by


that can be promptly applied to the landmark constraints since the proposed GPR model already provides the variance of the predicted relative distance that could replace in the above equation. It is worth mentioning that this approach makes the soft constraint adaptive to the relative distance prediction uncertainty, relaxing the constraint when the prediction model is not reliable or tightening it in the opposite scenario.

Input: Data set: consisting of images Road Boundaries (polynomials) Kinematics Constraints GPS co-ordinates of features for Landmark Constraints State Constraints
for img  do
         Distance Prediction:
        (bbox,mask) (img);
        (,) (bbox,mask) from Eq. (1) and (2);
         State Estimation - scPF:
        Compute from Eq. (6) for road and kinematic (e.g. speed) constraints;
        Compute from Eq. (7) using and for landmarks based constraints - i.e. distance predictions;
        Stack the obtained constraints;
        Estimate the state using a Bayesian filter (e.g. PF) with the stacked constraints: and the process model in Eq. (3);
Output: : Vehicle position and velocities
Algorithm 1 Soft Constrained Navigation

Thus, assuming that matched landmarks with known global localization are detected in image , acquired at discrete time , and their distance to the vehicle is already predicted by the GPR model with mean and variance, and assuming that other external knowledge may be also available, such as vehicle maximum speed and road network, the problem is to estimate the vehicle localization at time given the HD map, the system state space-model and the designed soft constraints. In order to solve this state estimation problem, the Soft Constrained Particle Filter (scPF) proposed in [liu2019cPF] is implemented.

A remarkable feature of the proposed approach is that it can be used together with other localization methods, such as SLAM. Moreover, if there is not landmark detected in the current camera image, the state estimation proceeds normally as an unconstrained approach. The proposed approach is summarized in Alg. 1.

Figure 3: WATonoBus: (a) photo of the bus, and (b) dimensions (in meters) and position of the sensors (red) relative to the vehicle body – heights above ground (green) are measured with respect to the road surface, transformations between sensors are shown in blue. Note that the LiDAR frame is used as a reference for measurements, but is not required by the algorithm. However, it might be required if the approach in Figure 10 is pursued for further enhancement.

4 Experiments and Discussions

The proposed framework was evaluated using a data set acquired from WATonoBus, the University of Waterloo autonomous shuttle bus (Figure 3 (a)). WATonoBus is an electric vehicle with a capacity of 7 passengers. Its main dimensions, the sensors mounting positions, reference frames, and transforms are shown in Figure 3 (b). The measurements from the monocular camera and the GPS+IMU form the necessary sensory inputs. In our experiments, we used a 3.2MP Basler camera and Applanix POS LVX to obtain these measurements respectively.

The shuttle bus was driven on part of the university Ring Road (a route of 1.2 km concluded in approximately 5 minutes as shown in Fig. 4 (a)), and a data set composed of 4000 samples was built. Ground truth of the shuttle bus position was acquired from Applanix POS LVX to calculate the algorithm errors and also to yield position measurements (observation data) to emulate rough GPS position data corrupted by a white Gaussian noise with variance . Images were acquired from the monocular camera, and the overall sampling frequency was 13 Hz during the experiment.

4.1 Map Features

4.1.1 Ring Road and Poles Localization

It is important to have an accurate position of the poles locations for GPR models training purposes and also for the algorithm operation. However, in order to make our approach more practical, we obtained the 99 Ring Road poles locations directly from Google Maps although we know this may lead to errors in the range of 1-2 meters. Besides, the Ring Road center line was obtained directly from Applanix POS LVX measurements. The Ring Road and mapped Light Poles are shown in Figure 4 (a).

Figure 4: Ring Road and light poles localization (a), details of constraints designing (b), constraints’ surface response (c), and some prediction of the proposed scPF localization approach (d).

4.1.2 Poles Detection and Segmentation

In this work, Mask R-CNN instance-based segmentation algorithm was implemented to detect and segment Ring Road light poles in the monocular camera frames. To this end, 250 images out of the collected 4000 images were labeled (pole’s bounding box and segmentation) using Labelme annotation tool, from which 200 images (or 314 poles instances) were used for model training and the remaining 50 (or 72 poles instances) for model validation. We fine-tuned a Mask-RCNN with Resnet 50 and Feature Pyramid Network (FPN) backbone (ResNet-50-FPN) pretrained on COCO (Common Object in COntext) dataset


. This procedure was realized on Detectron2, an open source PyTorch based modular computer vision model library developed by the Facebook AI Team. After fine tuning the model over 10000 epochs, no false positive was detected by the model on both training and validation data, the threshold used for detection was 0.9. Besides, 16 out of 386 (6 %) poles were not detected (false negatives), 10 of training data and 6 of the validation one. These not detected poles were far from the vehicle, making the detection task more difficult. It is important to emphasize that precision is more important than recall in our approach since a false positive would add an erroneous constraint. Figure

2 presents two light poles detected and segmented by the trained Mask-RCNN in a sample image. As a role of thumb, we observed that bounding boxes were well predicted in all samples, but closer poles (those with vanished light bulbs) were much better segmented than the others. This is expected since closer poles are much well defined in a pixel-level, which may be useful to predict the distance between these poles an the vehicle, as discussed next.

4.1.3 Poles Distance Prediction

In order to predict vehicle-to-landmark distance, a mixture of experts approach composed of two GPR models was developed. The implementation of this approach has two motivations: i. different models should be used since detected landmarks shapes may vary (for instance, bounding boxes’ height and width ratios or the availability of segmented pixels could be very distinct as shown in Fig. 2); ii. GPR models may be very computational demanding if large data sets are used to train, although this is not an issue in the present work.

For developing the mixture approach, we implemented a Gaussian Mixture Model (GMM) clustering algorithm to find two clusters, one to represent the closest poles (without lamp bulbs) and another related to poles that are far from the vehicle (normally with lamp bulbs). We defined two features as inputs for the clustering technique: ratio between bounding box’s width and height and pole thickness (see Fig. 2). Thus, the GMM was trained using the training output of Mask-RCNN, during 1000 iterations, considering diagonal co-variance.

Although we have implemented an unsupervised method, we also labeled the poles in order to validate the clusters found by the GMM. In this case, poles with light bulbs were defined as 1s, poles where the upper parts are completed vanished were designed as 0s, and poles which the upper part does not contain the lamp bulb and is not completely vanished, we attributed numbers between 0 and 1, accordingly. Figure 2

shows poles assigned values used to verify the clusters separation surface. As can be observed, the GMM separation surface defined well the two kinds of detected poles (poles are assigned to the cluster with highest posterior probability), and those poles we labeled with values different from 0 or 1 are normally found close to the separation curve (the color bar represents our labeled values and poles which assigned values are greater than 0.5 should be classified as 1 and


After clustering the poles, two GPR models were trained, one for each cluster. Both were trained using a holdout cross-validation (CV) procedure (100 repetitions) where 80% of data were used for training and the remaining 20% for validation (training data was the same as of Mask-RCNN). The selected kernel function was from class Matérn with parameter and with automatic relevance determination (ARD) [rasmussen2006]:


where hyperparameters

are positive, is the output scale and is the input scale. The hyperparameters are obtained through a gradient-ascent based optimization tool, by maximizing the log-likelihood of the training data [rasmussen2006].

The GPR model trained with closer poles cluster was fed with four bounding boxes features (minimum and maximum pixels values on and image frame - ) and one segmentation feature (thickness of segmented pole on central part of the bounding box). The GPR model trained with far poles cluster was fed with only the four bounding boxes features. Figure 5

(a) and (b) presents some poles-to-vehicle distances predicted by the trained GPRs over validation data. It is interesting to observe that the target outputs are inside the GPR provided confidence bars proving that GPR is a very promising approach for this application. The mean value of the CV predicted distance absolute error, for training and validation data on both clusters were, respectively,

 m and  m (close poles cluster), and  m and  m (far poles cluster).

Figure 5: Comparison between GPR performance for some samples of validation data (cross-validation): (a) Cluster of close poles and (b) Cluster of far poles.

To verify that the use of segmentation feature for closer poles helped to improve the distance prediction, we compared the previous cross-validation prediction results with those from a GPR model with only Bounding Box information as inputs. A Tukey test procedure with 95% confidence was preformed and shown in Fig. 6 (a), from where we can observe that the extracted thickness of the poles improved the prediction accuracy. It is worth mentioning that only one simple feature based on segmentation was used in this work, we believe that adding other segmentation features may further improve the model prediction. Besides, the use of a mixture of experts (ME) is also advisable, since our ME approach outperformed a single model using the same inputs as can be inferred from Fig. 6 (b). The use of ME is recommended for large data sets.

Figure 6: Comparison of models distance prediction absolute error on cross-validation data: (a) using only bounding box features (BB) or using bounding box together with segmentation features (SEG); and (b) using single model or mixture of expert (ME) approach.

4.2 Vehicle Navigation

For the sake of simplicity, we assume that the shuttle bus dynamics can be expressed by the discrete-time nearly constant velocity (CV) model (process model) [li2003surveydyn],


where is the state vector composed of the vehicle positions and velocities in and directions, is the sampling time and is a zero-mean white Gaussian noise with covariance , given by [liu2019cPF]:


where is the process noise intensity. Besides, we consider the road surface flat, thus neglecting any height information in the vehicle movement and in the map data. It is worth mentioning that since our proposed approach is based on designing constraints for state estimation, any other vehicle motion model could be used, such as the Constant Turn Rate and Velocity (CTRV) model among many others, as described in [li2003surveydyn, schubert2011models]. The same applies to the observation model here designed, where only rough GPS and position measurements are used, such that , and is the sensor noise modeled as a two-dimensional zero-mean white Gaussian noise with covariance .

It should be noticed that the proposed approach does not depend on the implementation of any observation model, regarded the detected landmarks in the current frame are correctly matched and the vehicle initial state is known. That is, after initialization, our approach can be executed without GPS signal. Nonetheless, the observation model was implemented in our experiments to verify the influence of in the localization results and to compare the proposed approach with an unconstrained PF.

In addition to the implementation of the aforementioned state space model, three types of soft state constraints were designed to implement the scPF:

where , and are nonlinear polynomial functions of related, respectively, to the Ring Road constraint, the Speed constraint and the Light poles constraints, where and is the number of detected landmarks. Besides, are random variables that account for the uncertainties about the road boundaries, vehicle speed limit and landmark distance prediction. It should be pointed out that many other constraints could be also included here, such as curb or lane distance prediction, some ambient signals of opportunity (SOPs) as the cellular signals, among others.

The Ring Road constraint is used to constrain the vehicle position by the road boundaries. Considering that the centerline of the Ring Road is available, we assume that penalties would be only applied to a certain Cartesian point (, ) if it is more than 4 meters away from the centerline. Figure 4 (b) presents examples of Ring Road boundaries where the light blue area is considered unconstrained. To this end, and considering the smoothness and shape of the Ring Road, the road centerline was decomposed into a sequence of road segments represented by third order polynomials, as used in [liu2019cPF]. The number of segments is the same as the number of light poles (99) since each segment represents the centerline in the vicinity of the respective light pole, such that the road constraint can be written as:


where represents the coefficients of the segment . Thus, the coefficients to be used in the constraint are those related to the light pole closer to the last estimated vehicle position. The interested reader could refer to other papers in order to apply more efficient and accurate approaches [chen2010, schindler2013]. Since we are using soft constraints, the localization approach may deal with some road network uncertainties. In this regard, is defined as a random variables that follows an exponential distribution as presented in Eq. 6, with equals to 0.25 as suggested in [liu2019cPF].

Considering the shuttle bus has a speed limit of 12 m/s, the following nonlinear function was designed,

and with equals to 1 as in [liu2019cPF].

Finally, for light poles (landmarks) constraints, the distance between the vehicle and the landmark predicted by the GPR model (Eq. 1) is employed, such that

where and represent the detected landmark localization and is the predicted distance considering the current camera frame, as presented in Fig. 4 (b). Besides, the predicted distance variance, (Eq. 2), is used as the variance of the zero-mean truncated Gaussian distribution (Eq. 7). It is worth mentioning that, differently from the other implemented constraints, the landmark constraints uncertainties can change over time, according to the GPR variance predictions. This is an interesting feature of the proposed approach since unforeseen situations may occur during real application, e. g. object partial occlusion, and the predicted variance should be higher in those scenarios indicating that the predicted distance is probably not reliable. In other words, the variance ponders the use of that specific landmark constraint during the estimation of the vehicle localization.

In order to verify the scPF performance, the acquired data set composed of 4000 samples with sampling time  s was used. Applying the trained Mask-RCNN model to all 4000 images, it was possible to observe that the number of poles detected varies for each frame, as shown in Fig. 7. The most frequent scenario is the detection of only one pole, although some occurrences of 2 or even 3 detected poles were found. Moreover, in some frames no poles were detected and then no landmark constraints are applied to the PF in that specific time step.

Figure 7: Number of poles detected by Mask-RCNN in each image frame.

To analyze the proposed scPF performance, 50 Monte Carlo (MC) runs were performed where different realizations of the measurement noise were generated. The proposed approach was also compared with an unconstrained PF, and both methods were tested on different measurement noise standard deviations

in order to observe the influence of this noise on algorithm’s performance. To achieve the best results of each algorithm, the process noise intensities were defined as for the unconstrained PF and for the scPF. The initial condition of the filters was achieved by the initial frame ground truth localization corrupted by noise with covariance matrix as in [liu2019cPF].

The results of the scPF and PF algorithms based on 500 particles over different measurement noise levels are presented in Fig. 8. It is shown that the proposed scPF can reach low localization error (below 1 meter) even increasing the level of the measurement noise, which is not true for the unconstrained PF. Indeed, it can be concluded that the proposed approach does not depend on observation data since vehicle absolute localization information is available from the detected poles and the respective vehicle-to-pole distance predicted by the GPR models. Besides, when the measurement noise level is low, the localization performance is still improved, achieving a mean value of 0.88 m when  m.

Figure 8: Mean localization error and confidence levels of 95% over 50 Monte Carlo runs of PF and scPF algorithms using different measurement noise levels.

It is worth mentioning that the proposed scPF is very promising since good localization results were found using a very simple process model and using a limited number of detected landmarks for constraints design. To make the roles of the designed constraints on the proposed approach clearer, we presented in Fig. 4 (c) the sum of the constraints values (two landmarks constraints together with the road and speed constraints) for a specific time step on the testing route (Fig. 4 (c) should be compared with Fig. 4 (b)). Coordinates that do not violate the constraints have dark blue colors whereas coordinates with yellow color strongly violate one or more constraints. Thus, the constraints add supplementary information to help the PF algorithm predict the vehicle localization basing on the previous system states.

Figure 9: Localization error of scPF for each time step on the Ring Road testing route.

Figure 9 presents the localization error for each time step of the shuttle bus trajectory predicted by the scPF (  m). Some predicted localization values are also shown in Fig. 4 (d). The mean error of this specific run of the scPF was 0.98 m and the median error was 0.73 m. In general, lower errors were found in frames with more landmark constraints (more poles detected), as expected. On the other hand, higher errors were achieved when no poles were detected, in these situations the localization approach relies only on the process and observation models. Since both models have high associated uncertainties due to their simplicity or measurement errors, the predicted localization error is higher.

Figure 10: Sensor coordinate frames as well as the position vectors including the static translation between the GPS frame {A} and the LiDAR frame {V}, and the position vector of the GPS frame {A} and the feature point of interest P in the fixed local reference frame {R}.

Besides, it is worth mentioning that obtaining pole locations directly from mapping software such as Google Maps is subjected to uncertainties, which might have also worsened our algorithm performance in some frames. In order to overcome this shortcoming, an approach based on LiDAR measurements (if available) may be applied to obtain the position of the poles with respect to the moving LiDAR frame, {V}, and subsequently transform it to a fixed local reference frame, {R}, using a sub decimeter accurate GPS, as presented in Figure 10.

Thus, in order to improve the localization performance we recommend to use more mapped landmarks (for instance, traffic signs), to obtain the landmark location based on a precise method, and to implement a process model that explains the vehicle dynamics in more details.

5 Conclusion

In this paper a generic feature-based localization approach using soft constrained Particle Filter is proposed to enhance autonomous vehicle navigation. Examples of soft constraints based on road location, on landmark-to-vehicle distance prediction and on vehicle maximum speed were provided. The use of soft constraints in this application is very promising since uncertainties are expected to be found, for instance, on road location, landmark position and landmark-to-vehicle distance prediction. Localization error below 1 m was found even in scenarios with very noisy observation GPS data, showing that the proposed approach can be used as an add-on to other navigation approaches is such situations.

Regarding the landmark based constraints, a novel approach was introduced where Gaussian Process Regression models arranged in a Mixture of Experts scheme were obtained to predict the distance between the vehicle and known mapped landmarks. In this way, GPR predicted distance’s mean and variance were both used for soft constraints designing, meaning that the constraints are continuously adapted to each frame. The output variance provides a reliability index to the current constraint to predict the vehicle location in that specific time step, i. e. the greater the variance the less the effect of the constraint. Moreover, in order to improve the vehicle-to-landmark distance prediction, features based on landmark instance-based segmentation were used as GPR inputs.