Using Machine Learning to Detect Ghost Images in Automotive Radar

07/10/2020 ∙ by Florian Kraus, et al. ∙ Daimler AG

Radar sensors are an important part of driver assistance systems and intelligent vehicles due to their robustness against all kinds of adverse conditions, e.g., fog, snow, rain, or even direct sunlight. This robustness is achieved by a substantially larger wavelength compared to light-based sensors such as cameras or lidars. As a side effect, many surfaces act like mirrors at this wavelength, resulting in unwanted ghost detections. In this article, we present a novel approach to detect these ghost objects by applying data-driven machine learning algorithms. For this purpose, we use a large-scale automotive data set with annotated ghost objects. We show that we can use a state-of-the-art automotive radar classifier in order to detect ghost objects alongside real objects. Furthermore, we are able to reduce the amount of false positive detections caused by ghost images in some settings.




I Introduction

Advanced driver assistance systems and automated vehicles are major trends in the current automotive industry. For environmental perception, a wide sensor suite is used to provide high robustness and fulfill safety requirements. The three most popular sensors in this regard are camera, lidar, and radar. Each sensor has its own advantage, e.g., the camera has a high angular resolution, lidar has the densest 3D perception, and radar provides single-shot velocity estimation via the Doppler effect. Another major difference for radar sensors is their substantially larger wavelength compared to light-based sensors. This long wavelength allows the radar signal to pass through many objects and consequently makes it much more robust to adverse weather conditions such as snow, rain, and fog. Nevertheless, a disadvantage of radar waves is that many objects in real-world scenarios act like mirror surfaces due to their highly specular reflection properties. Recently, we showed that mirrored non-line-of-sight detections have the potential to be utilized as an early warning system for collision prevention [1]. While applications like this are certainly beneficial for autonomous driving, the task of detecting the presence of a multi-path detection still relies on additional sensors. In this article, we use a large data set to investigate this challenge in real-world scenarios. We utilize machine learning techniques to discriminate mirror objects from real objects and background detections using only radar data without any additional information. To this end, we apply PointNet++ [2], i.e., an end-to-end neural network architecture which directly processes radar point clouds. We compare several interesting settings and report quantitative and qualitative classification results.

Fig. 1: Real-world scenario with multi-path reflections (ghost objects) of a cyclist. Different types of multi-path reflections are highlighted alongside the original ones. The small dots correspond to a lidar reference system, the bigger ones are the radar reflections. Both radar and lidar indicate the presence of a reflective wall which is responsible for the multi-path wave propagation.

II Related Work

Multi-path radar detections are a well-known phenomenon in radar processing. They can be utilized, e.g., to detect objects via reflections beneath a car that occludes the direct path, or for height estimation with conventional 2D radar sensors [3]. Different types of multi-path occurrences are analyzed, e.g., in [4]. Many of the possible multi-path types can be removed by using an active beam steering method for the transmission signal as well as direction-of-arrival estimation in the receive array [5]. However, this method requires several repeated measurements, which decreases the sensor update rate. Moreover, it cannot remove all types of multi-path effects, which motivates the search for a detection system distinguishing multi-path from direct-line-of-sight objects. To this end, other authors detect and even reconstruct ghost objects, but require knowledge about the reflector geometry [6, 7, 8, 9] or assume rigid objects with a known motion model [10] or object orientation [11]. These assumptions can be eliminated by using a lidar system for detecting reflective surfaces as shown in [1]. In the latter, the ghost images are even used for a collision prevention system, highlighting the severity of mirror objects if they are undetected. A first attempt to remove multi-path detections using machine learning was made by [12]. They used handcrafted features with random forests and SVM variants on a small automotive data set.

III Multi-Path Detections

Fig. 2: Illustration of different multi-path occurrences.

Ghost detections occur whenever a multi-path reflection is received by the sensor and not filtered out during early preprocessing. For example, the signal could first bounce off a reflective surface, then off the object of interest, and then travel back to the receiver. This example would lead to a detection at the same angle as the original object, but it would be perceived at a greater distance.

There are two fundamental types of multi-path reflections: those where the last bounce happens on the object and those where it happens on the reflective surface. To distinguish between both, we use the convention from [13] and refer to them as type-1 and type-2 reflections. Type-1 detections bounce back from the object and type-2 detections from the reflective surface, cf. Fig. 2. Furthermore, multi-path reflections are categorized by the number of objects they bounce off. A signal which is directly reflected off an object is called a first-bounce detection. Each additional bounce increases the order, leading to second-, third-, and higher-order bounce detections. With this nomenclature, the above example would be classified as a type-1 second-bounce multi-path reflection, or type-1 second-bounce detection.

Each bounce consists of a diffuse and a specular reflection part. Hence, not all energy is preserved in the signal, i.e., higher-order bounces correspond to a lower received signal energy. This leads to a rapid decrease in the number of detection points at higher-order bounces for most reflection surfaces. In this article, we focus on third-bounce reflections of type-2. This focus has several reasons: First, type-2 odd-bounce reflections can occur irrespective of the presence of a direct-line-of-sight path to the real object. While this gives the opportunity to predict objects before they would be visible otherwise (cf. [1]), there are no direct reflections for additional reasoning about the likelihood of the estimated detections to be ghost reflections in this case. Second, the independence from direct-line-of-sight objects increases the amount of data that can be utilized in scenarios where the object of interest is occluded over longer periods of time, as is the case in the utilized data set. In addition, higher-order bounces, e.g., fifth or seventh, usually result in substantially fewer detection points and lower signal amplitudes, i.e., they are less likely to be mistaken for a real object.
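To make the nomenclature above concrete, the following minimal sketch computes where a ghost would appear under an idealized model. It assumes a perfectly specular, infinitely long planar reflector in 2D and a radar that reports half the total path length as range and the arrival direction of the last bounce as azimuth. The function names and the example coordinates are hypothetical and are not taken from the data set.

```python
import math

def mirror_point(p, a, b):
    """Mirror the 2D point p across the infinite line through a and b."""
    ax, ay = a
    dx, dy = b[0] - ax, b[1] - ay
    # Projection parameter of p onto the line, then the foot of the perpendicular.
    t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / (dx * dx + dy * dy)
    fx, fy = ax + t * dx, ay + t * dy
    return (2 * fx - p[0], 2 * fy - p[1])

def polar(sensor, point):
    """Range and azimuth (radians) of point as seen from sensor."""
    dx, dy = point[0] - sensor[0], point[1] - sensor[1]
    return math.hypot(dx, dy), math.atan2(dy, dx)

def type1_second_bounce(sensor, obj, wall_a, wall_b):
    """Path sensor -> wall -> object -> sensor (last bounce on the object)."""
    direct_range, azimuth = polar(sensor, obj)
    # The specular leg sensor -> wall -> object is as long as sensor -> mirrored object.
    via_wall, _ = polar(sensor, mirror_point(obj, wall_a, wall_b))
    return (direct_range + via_wall) / 2, azimuth

def type2_third_bounce(sensor, obj, wall_a, wall_b):
    """Path sensor -> wall -> object -> wall -> sensor (last bounce on the wall)."""
    # Both legs run via the wall, so the ghost sits at the mirror image of the object.
    return polar(sensor, mirror_point(obj, wall_a, wall_b))

# Hypothetical scene: sensor at the origin, wall along y = 2, object at (4, 1).
sensor, obj = (0.0, 0.0), (4.0, 1.0)
wall_a, wall_b = (0.0, 2.0), (1.0, 2.0)
r1, az1 = type1_second_bounce(sensor, obj, wall_a, wall_b)
r2, az2 = type2_third_bounce(sensor, obj, wall_a, wall_b)
```

In this toy scene the type-1 second-bounce detection shares the azimuth of the real object but lies farther away, whereas the type-2 third-bounce ghost appears behind the wall at the mirrored position.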

IV Data Set

(a) Model trained on: background and object. Object is correctly detected (green) and ghost is correctly classified as background (blue). 

(b) Model trained on: background and ghost object. Object is correctly classified as background (black) and ghost is correctly classified (yellow).

(c) Model trained on: background, object, and ghost object. Object is correctly detected (green) and ghost is correctly classified (yellow).

(d) Model trained on: background, object, and ghost object. The images in this row are one frame apart each. Here everything is correctly classified.

(e) In the next frame the ghost object is confused with a real object. 

(f) Back to correctly detecting real and ghost object. 

(g) Model trained on: background and object. Object is correctly detected (green) but some ghost detections are confused with a real object (red) whereas others are correctly classified as background (blue).

(h) Model trained on: background and object. Object is correctly detected (green) but one ghost detection is confused with a real object (red).

(i) Model trained on: background and object. Object is correctly detected (green) but the type-1 second-order multi-path reflections are wrongly classified as a real object (pink).

Fig. 3: Qualitative evaluation: The first row shows the same frame for different models, all classified correctly. In the second row, a failure example is shown where a ghost object is wrongly classified as a real object during a single time frame. In the last row, different failure modes are highlighted. In all figures the cyan colored points represent correctly classified background. For better visualization, we only show a small region of interest; as a result, the ego-vehicle is not visible.

For all our experiments, we use the same data set as recorded in [1]. The data set consists of 25 different scenarios which are repeated four times on average, resulting in a total of 100 data recordings. Each recording contains a single vulnerable road user (VRU) which is either a pedestrian or a cyclist. Each scenario starts with the VRU moving away from the ego-vehicle on a fixed path next to a reflector. Reflectors comprise, e.g., parked cars, building structures, curbstones, or guardrails. The recordings continue until the object is out of direct sight, turns around, and approaches the vehicle again. Once the VRU is back at the ego-vehicle, the measurement stops.

For data recording, we use two experimental radar sensors mounted in the front bumper of a test vehicle. The sensor specifications can be found in Tab. I.

TABLE I: Radar sensor specification.

The upper half of the table lists the frequency range of the emitted signal and the operational bands for range (distance), azimuth angle, and radial (Doppler) velocity, respectively. The lower half lists the resolutions for range, azimuth angle, radial velocity, and time. The radar data is labeled using a global navigation satellite system (GNSS) reference which is mounted in a wearable backpack following [14]. All automatically labeled data were manually checked and corrected if necessary. This step is important because high buildings, used as reflective surfaces, sometimes lead to severe multi-path errors in the GNSS signal.

Our data set originally consists of five different classes: pedestrians, cyclists, their corresponding type-2 third-bounce ghost objects, and other background detection points. The background class consists of static points, measurement artifacts, and other clutter.

The total recording time adds to about , the class distribution among detection points is given in Tab. II.

Pedestrian Ghost Ped. Bike Ghost Cycl. Garbage
TABLE II: Data set detection distribution over all classes. Ghost detections represent type-2 third-bounce reflections in this case.
(a) Model trained on: background and object. 
(b) Model trained on: background and ghost object.
(c) Model trained on: background, object, and ghost object.
(d) Model trained on: background, cyclist, and pedestrian.
(e) Model trained on: background, ghost cyclist, and ghost pedestrian.
(f) Model trained on: background, cycl., ped., ghost cycl., and ghost ped.
Fig. 4: Confusion matrices for all trained models.

V Methodology and Experimental Results

Our method expects as input a radar point cloud which is resolved in range, azimuth angle, Doppler velocity, and time. Furthermore, each point is described by an amplitude, which is an estimate of the radar cross section of the reflecting object part.

V-A PointNet++

To segment the radar point cloud, PointNet++ is utilized [2]. PointNet++ is a neural network architecture for processing point cloud data (PCD). It is based on the earlier PointNet architecture, which was developed to consume point clouds without preprocessing steps such as rendering the point cloud into an image grid. PointNet++ can be used for classification and semantic segmentation of point clouds.

When used for segmentation tasks, the architecture of PointNet++ resembles that of hourglass networks used for image-based semantic segmentation.

The point cloud is downsampled multiple times into a coarser representation by using "multi-scale grouping" (MSG) modules. Each MSG module consists of three stages. First, iterative farthest point sampling is applied to collect center points. Second, clusters are built around those center points by grouping all points within an epsilon region around the center point. This second grouping stage is applied for different radii, hence the name "multi-scale grouping". Then, in a third and final step, each cluster or local region is processed by a mini PointNet to encode local features. The output of the final stage is the input to the next MSG module.
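The first two stages of an MSG module can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: the sampling starts at the first point (implementations often start at a random one), groups are not capped to a maximum size, and the mini PointNet of the third stage is omitted. The function names and the toy points are hypothetical.

```python
import math

def farthest_point_sampling(points, num_centers):
    """Return indices of center points chosen by iterative farthest point sampling."""
    centers = [0]  # assumption: start from the first point
    # Distance from every point to its nearest already chosen center.
    nearest = [math.dist(p, points[0]) for p in points]
    while len(centers) < min(num_centers, len(points)):
        # The next center is the point farthest away from all chosen centers.
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(idx)
        nearest = [min(d, math.dist(p, points[idx])) for d, p in zip(nearest, points)]
    return centers

def group_by_radius(points, center_indices, radius):
    """For each center, collect the indices of all points within the given radius."""
    return [
        [i for i, p in enumerate(points) if math.dist(p, points[c]) <= radius]
        for c in center_indices
    ]

# Toy 2D point cloud with two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.0, 0.2), (0.0, 4.0)]
centers = farthest_point_sampling(points, 3)
# "Multi-scale": the same centers are grouped with several radii.
groups = {radius: group_by_radius(points, centers, radius) for radius in (0.5, 6.0)}
```

With the small radius each center only collects its immediate neighbors, while the large radius yields overlapping regions that capture coarser context.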

In a second upsampling stage, the downsampled data is upsampled to reconstruct the original point cloud and provide a label for each original data point. The upsampling is handled by "feature propagation" (FP) modules. These modules propagate the low-level features to the original points.

After the upsampling, the features are processed by fully connected (FC) layers with dropout for a final classification of each point.

Since PointNet++ was designed to process 3D data, the original code is changed to handle the 2D data provided by radar sensors, as demonstrated in [15].

For more information on PointNet++ see the original paper [2] and [15] where it was first used to segment radar data.

The architecture proposed by [15] is used. It consists of three MSG modules with three corresponding FP modules and three subsequent FC layers. The input point cloud is fixed to 2048 points. We accumulate the radar data over a time window of , which is a common approach to counteract the sparsity of radar data. Furthermore, this is an easy and effective way to combine the data of our dual-sensor setup. In [15], the data were accumulated over due to a sparser radar system than the one used by us. If the accumulated point cloud does not consist of at least 2048 points, the ones with the highest amplitude are duplicated. If there are too many points, the points with the lowest amplitude are discarded.
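The padding and trimming rule for the fixed input size can be sketched as follows. The point layout (a tuple with the amplitude as last entry) and the function name are hypothetical. The text does not state how often each point is duplicated, so this sketch assumes that duplicates are taken in order of descending amplitude and that the order wraps around if necessary.

```python
def fix_point_count(points, target=2048, amplitude_index=-1):
    """Pad or trim a point cloud to exactly `target` points based on amplitude."""
    if not points:
        raise ValueError("cannot pad an empty point cloud")
    # Strongest reflections first.
    ranked = sorted(points, key=lambda p: p[amplitude_index], reverse=True)
    if len(ranked) >= target:
        # Too many points: discard the ones with the lowest amplitude.
        return ranked[:target]
    # Too few points: duplicate the ones with the highest amplitude (assumed order).
    padding = [ranked[i % len(ranked)] for i in range(target - len(ranked))]
    return ranked + padding

# Toy cloud: (range, azimuth, Doppler velocity, amplitude), values made up.
cloud = [(10.0, 0.1, -1.2, 1.0), (12.0, 0.2, 0.0, 5.0), (8.0, -0.3, 0.4, 3.0)]
padded = fix_point_count(cloud, target=5)
trimmed = fix_point_count(cloud, target=2)
```

The returned list is sorted by amplitude, which is harmless here because PointNet-style networks treat their input as an unordered set.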

To account for class imbalances present in the data set, the loss for each point is scaled relative to the inverse proportion each class takes up in the data set. This results in the following loss $\mathcal{L}$:

$$\mathcal{L}(y, \hat{y}) = -\sum_{c=1}^{N} \frac{1}{w_c} \, y_c \log \hat{y}_c$$

with the ground truth $y$, the prediction $\hat{y}$, the total number of classes $N$, and the proportion $w_c$ of the occurrence of class $c$ in the data set.
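The effect of the class-weighted loss described above can be illustrated with a small sketch. It assumes a cross-entropy base loss, one-hot ground truth (so only the term of the true class remains), and class proportions estimated directly from the label counts. All names and the toy labels are hypothetical.

```python
import math

def class_weights(labels, num_classes):
    """Inverse class proportion, i.e. total / count, for every class index."""
    counts = [0] * num_classes
    for label in labels:
        counts[label] += 1
    total = len(labels)
    # Classes that never occur get weight 0 to avoid a division by zero.
    return [total / count if count else 0.0 for count in counts]

def weighted_cross_entropy(probabilities, label, weights):
    """Loss of one point: cross-entropy of the true class, scaled by its class weight."""
    return -weights[label] * math.log(probabilities[label])

# Toy labels: class 0 (e.g. background) is four times as frequent as class 1.
labels = [0] * 8 + [1] * 2
weights = class_weights(labels, num_classes=2)

# The same uncertain prediction is penalized more on the rare class.
loss_frequent = weighted_cross_entropy([0.5, 0.5], 0, weights)
loss_rare = weighted_cross_entropy([0.5, 0.5], 1, weights)
```

Here an equally uncertain prediction costs four times as much on the rare class, which counteracts the dominance of background points during training.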

For each experiment, the network is trained for 100 epochs and the best checkpoint is selected based on the performance on the validation split. A random subset of the training set is used as a validation set.

V-B Experimental Setup

Index Classes used during training
1 bg, obj
2 bg, ghost-obj
3 bg, obj, ghost-obj
4 bg, ped, cycl
5 bg, ghost-ped, ghost-cycl
6 bg, ped, cycl, ghost-ped, ghost-cycl
TABLE III: Training Setups
Score Obj Ghost-Obj Ped Ghost-Ped Cycl Ghost-Cycl Average Bg Experiment
F1 1
Recall 1
Precision 1
TABLE IV: Scores for all experiments.

For training, the labels are split into two experiments. One experiment with the original five labels described in Sec. IV: Pedestrians (ped), cyclists (cycl), ghost pedestrians (ghost-ped), ghost cyclists (ghost-cycl), and background (bg). The second experiment groups (ghost) pedestrians and (ghost) cyclists together in the (ghost) object class (obj, ghost-obj). The first experiment aims not only to classify vulnerable road users against the background but also to discriminate between different kinds of road users. In the second experiment, the aim is to separate relevant objects from the background without further differentiation.

For each experiment, three models are trained. One standard model which only classifies objects (ped, cycl), another which classifies only ghost objects, and a third one classifying ghost and real objects. An overview is given in Tab. III.

When evaluating each model, all classes from the associated experiment are used to create a full confusion matrix, e.g., the model trained on background vs. object is evaluated against background, object, and ghost object. In this manner, it is possible to evaluate the confusion of each model between ghost and real objects.
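The evaluation against all classes can be sketched as follows: the ground truth keeps the ghost label even if the model cannot predict it, which makes it possible to count how many false positives of the object class stem from ghosts. The function names and the toy labels are hypothetical and do not reproduce any result of this article.

```python
from collections import Counter

def confusion_counts(true_labels, predicted_labels):
    """Count (true class, predicted class) pairs over all points."""
    return Counter(zip(true_labels, predicted_labels))

def ghost_share_of_false_positives(confusion, target="obj", ghost="ghost-obj"):
    """Fraction of the target-class false positives whose true class is the ghost class."""
    false_positives = sum(
        count for (true, pred), count in confusion.items()
        if pred == target and true != target
    )
    from_ghosts = confusion[(ghost, target)]
    return from_ghosts / false_positives if false_positives else 0.0

# Toy example: the model only knows "bg" and "obj", the ground truth also has ghosts.
true = ["bg", "bg", "bg", "obj", "obj", "ghost-obj", "ghost-obj", "bg"]
pred = ["bg", "obj", "bg", "obj", "obj", "obj", "bg", "obj"]
confusion = confusion_counts(true, pred)
share = ghost_share_of_false_positives(confusion)
```

In this toy example one of the three object false positives is caused by a ghost point, while the other ghost point is assigned to the background.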

VI Results

In this section, the quantitative and qualitative results of the experiments described in Sec. V-B are presented.

VI-A Quantitative Evaluation

VI-A1 Performance Metrics

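For the evaluation, we employ three different scores: precision, recall, and F1, where the latter is the harmonic mean of recall and precision. We employ those scores for each class separately. The scores are defined as follows, where $\mathrm{TP}$, $\mathrm{FP}$, and $\mathrm{FN}$ denote the number of true positive, false positive, and false negative points of the respective class. Precision is the percentage of correct predictions per class:

$$\mathrm{precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}$$

Recall is the percentage of correctly identified points per class:

$$\mathrm{recall} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$$

The F1 score combines the above into a single score as the harmonic mean of precision and recall:

$$F_1 = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$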
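The three scores can be computed per class from a confusion matrix, as in the following minimal sketch. The nested-dictionary layout, the function name, and all counts are made up for illustration. The average over the foreground classes excludes the background class, in line with the footnote of Tab. IV.

```python
def class_scores(confusion, classes, cls):
    """Precision, recall, and F1 of one class.

    confusion[t][p] is the number of points with true class t predicted as p.
    """
    tp = confusion[cls][cls]
    fp = sum(confusion[t][cls] for t in classes if t != cls)
    fn = sum(confusion[cls][p] for p in classes if p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up confusion matrix for background, object, and ghost object.
classes = ["bg", "obj", "ghost-obj"]
confusion = {
    "bg": {"bg": 90, "obj": 6, "ghost-obj": 4},
    "obj": {"bg": 2, "obj": 16, "ghost-obj": 2},
    "ghost-obj": {"bg": 3, "obj": 2, "ghost-obj": 5},
}
scores = {cls: class_scores(confusion, classes, cls) for cls in classes}

# Average F1 over the foreground classes only (background excluded).
foreground = [cls for cls in classes if cls != "bg"]
average_f1 = sum(scores[cls][2] for cls in foreground) / len(foreground)
```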


VI-A2 Evaluation

The results for this evaluation are listed in Tab. IV. Furthermore, confusion matrices are presented in Fig. 4.

First, we compare the difference in performance when training only on objects, ghosts, or both at once. We note a slightly higher F1 score for the object class when only training on real objects compared to training on real objects and ghosts. However, the difference of to is minor, highlighting that it is possible to detect ghost objects without sacrificing performance on real objects. When only training on ghost objects vs. background, we observe an increase of from to in the F1 score on ghost objects compared to the model jointly trained on ghosts and objects. However, this increase in the F1 score is due to a large increase in recall ( to ). At the same time, a decrease in precision is present ( to ). The only slightly increased F1 score reflects this loss in precision.

Footnote 1: The average in Tab. IV does not include the background (bg) score, since only the scores on the foreground classes are relevant for the overall (average) model performance.

This is further demonstrated by the two confusion matrices in Fig. 4(b) and Fig. 4(c). The model trained on ghosts and background correctly classifies ghost detections, but also misclassifies real objects ( background points) as ghosts. The model trained on objects, ghosts, and background, however, only misclassifies () points while only correctly classifying ghosts.

Based on those results, it seems beneficial to train combined on objects and ghost objects at the same time to reduce confusion between the two.

When evaluating the model only trained on object vs. background, another interesting fact emerges. Only out of false positives are due to misclassification of a ghost as a real object, accounting for only of all false positives, cf. Fig. 4(a). This is certainly less than expected, showcasing that modern deep-learning-based approaches can already distinguish between real and ghost objects. This percentage is slightly increased to when jointly training on objects and ghost objects. However, in a qualitative analysis we find that misclassified ghosts are often correctly classified as ghosts a few timesteps earlier. This is further discussed in Sec. VI-B.

A similar picture is painted when looking at the models trained on pedestrians and cyclists separately. However, the performance difference when jointly training on ghosts and real objects versus only training on real objects is slightly larger.

The most interesting fact emerges when comparing the model trained on background, pedestrians, and cyclists to the one trained on background and object. A much larger proportion of false positives is due to confusion with ghost objects, cf. Fig. 4(d). For the cyclist class out of false positives are due to misclassified ghosts or . If we ignore the false positives on the pedestrian class, this increases to . On the other hand, for the pedestrian class we only have or . This suggests that when trying to discriminate between pedestrians and cyclists, the model has a higher chance to get confused by ghost objects. This is also highlighted by the fact that the model trained on pedestrians and cyclists has a total of false positives on ghosts compared to the of the model trained on background vs. object. This is an increase of roughly . In contrast, the false positives on the background only increase by roughly : compared to .

Nevertheless, this effect vanishes completely when jointly training on background, pedestrians, cyclists, ghost pedestrians, and ghost cyclists, cf. Fig. 4(f). In this scenario only false positives are due to ghost objects, an improvement of almost .

But on the other hand, this yields a higher confusion between real pedestrians and cyclists.

From this we conclude that it is beneficial to add ghost labels when training on multiple classes. Even if this increases intra-class confusion, it greatly reduces the number of false positives due to ghost objects. In either case, adding ghost objects to the training data yields a respectable F1 score of around for ghost objects. Although not on par with the detection rate of real objects, this can be beneficial in scenarios where ghost objects are desired as, e.g., in [1] or when trying to estimate the existence and orientation of a mirror-like surface.

VI-B Qualitative Examples

In Fig. 3, qualitative results are shown. The first row consists of images where all models correctly segment the scene according to their different training objectives.

In the second row, an interesting case is highlighted. Each image in this row is one frame apart from the other. In the first and the last image the ghost object is correctly classified as such (Fig. 3(d), Fig. 3(f)). However, in the middle frame the ghost object is suddenly misclassified as a real object (Fig. 3(e)). This shows that using temporal information could be of great value when combined with our work. For example, applying a tracker could help to suppress those types of failure modes.

In the last row, two confusions of ghost objects with real objects are showcased, cf. Fig. 3(g) and Fig. 3(h). In Fig. 3(i), we showcase a failure mode caused by our training data. The model wrongly classifies a type-1 second-bounce object as a real object. These second-bounce objects are not annotated in our data set and, thus, are easily misclassified as real objects.

VII Discussion

In this article, we evaluate the effects of ghost objects on modern deep-learning-based approaches on a large-scale automotive data set. Furthermore, we show that by using labeled ghost objects during training, we can detect those challenging objects with a precision and recall of around and , respectively. We found that adding ghost labels to training scenarios with multiple positive labels, e.g., pedestrians and cyclists, dramatically reduces false positives.

There are also certain shortcomings to our approach, all due to missing data. In a qualitative analysis we found false positives due to unlabeled second-bounce multi-path reflections. Those kinds of reflections are currently not annotated in our data set. Labeling those other multi-path reflections is a challenging task and requires a lot of resources. Annotating even the real first-bounce reflections is hard because of sparse and noisy radar data. Therefore, we used an auto-labeling system for our annotations. Due to the largely unknown reflection characteristics of the VRUs in our experiments, this approach is not feasible for second-order multi-path reflections.

The insights of this article indicate the potential of our approach and database. They show that modern machine learning algorithms can indeed distinguish real objects from ghost detections. For future work, we plan to annotate all type-1 and type-2 reflections manually, which will allow for a better analysis of ghost objects in radar data and for the evaluation of more sophisticated temporal approaches such as tracking. Furthermore, it would be interesting to investigate ghost objects caused by other road users such as cars or motorcycles.


The research leading to these results has received funding from the European Union under the H2020 ECSEL Programme as part of the DENSE project, contract number 692449.


  • [1] N. Scheiner, F. Kraus, F. Wei, B. Phan, F. Mannan, N. Appenrodt, W. Ritter, J. Dickmann, K. Dietmayer, B. Sick, and F. Heide, “Seeing Around Street Corners: Non-Line-of-Sight Detection and Tracking In-the-Wild Using Doppler Radar,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, jun 2020, pp. 2068–2077.
  • [2] C. R. Qi, L. Yi, H. Su, and L. J. Guibas, “PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space,” in 31st International Conference on Neural Information Processing Systems (NIPS).   Long Beach, CA, USA: Curran Associates Inc., dec 2017, pp. 5105–5114.
  • [3] A. Laribi, M. Hahn, J. Dickmann, and C. Waldschmidt, “Vertical Doppler Beam Sharpening Goes Self Parking,” in 2018 IEEE Radar Conference (RadarConf18).   Oklahoma City, OK, USA: IEEE, apr 2018, pp. 0383–0388.
  • [4] A. Kamann, P. Held, F. Perras, P. Zaumseil, T. Brandmeier, and U. T. Schwarz, “Automotive Radar Multipath Propagation in Uncertain Environments,” in 2018 21st Intelligent Transportation Systems Conference (ITSC).   Maui, HI, USA: IEEE, nov 2018, pp. 859–864.
  • [5] I. Vermesan, D. Carsenat, C. Decroze, and S. Reynaud, “Ghost Image Cancellation Algorithm Through Numeric Beamforming for Multi-Antenna Radar Imaging,” IET Radar, Sonar and Navigation, vol. 7, no. 5, pp. 480–488, 2013.
  • [6] A. Sume, M. Gustafsson, M. Herberthson, A. Janis, S. Nilsson, J. Rahm, and A. Orbom, “Radar Detection of Moving Targets Behind Corners,” IEEE Transactions on Geoscience and Remote Sensing, vol. 49, no. 6, pp. 2259–2267, 2011.
  • [7] L. Qiu, T. Jin, and Z. Zhou, “Multipath Model and Ghosts Localization in Ultra-Wide Band Virtual Aperture Radar,” in 2014 12th International Conference on Signal Processing (ICSP).   Hangzhou, China: IEEE, oct 2014, pp. 2149–2152.
  • [8] R. Zetik, M. Eschrich, S. Jovanoska, and R. S. Thoma, “Looking Behind a Corner Using Multipath-Exploiting UWB Radar,” IEEE Transactions on aerospace and electronic systems, vol. 51, no. 3, pp. 1916–1926, 2015.
  • [9] O. Rabaste, J. Bosse, D. Poullin, I. Hinostroza, T. Letertre, T. Chonavel, et al., “Around-the-Corner Radar: Detection and Localization of a Target in Non-Line of Sight,” in 2017 IEEE Radar Conference (RadarConf).   IEEE, may 2017, pp. 0842–0847.
  • [10] F. Roos, M. Sadeghi, J. Bechter, N. Appenrodt, J. Dickmann, and C. Waldschmidt, “Ghost Target Identification by Analysis of the Doppler Distribution in Automotive Scenarios,” in 2017 18th International Radar Symposium (IRS).   Prague, Czech Republic: IEEE, jun 2017.
  • [11] I. H. Ryu, I. Won, and J. Kwon, “Detecting Ghost Targets Using Multilayer Perceptron in Multiple-Target Tracking,” Symmetry, vol. 10, no. 1, 2018.
  • [12] R. Prophet, J. Martinez, J.-C. F. Michel, R. Ebelt, I. Weber, and M. Vossiek, “Instantaneous Ghost Detection Identification in Automotive Scenarios,” in 2019 IEEE Radar Conference (RadarConf), apr 2019.
  • [13] J. Liu, L. Kong, X. Yang, and Q. H. Liu, “First-Order Multipath Ghosts’ Characteristics and Suppression in MIMO Through-Wall Imaging,” IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 9, pp. 1315–1319, 2016.
  • [14] N. Scheiner, S. Haag, N. Appenrodt, B. Duraisamy, J. Dickmann, M. Fritzsche, and B. Sick, “Automated Ground Truth Estimation For Automotive Radar Tracking Applications With Portable GNSS And IMU Devices,” in 2019 20th International Radar Symposium (IRS).   Ulm, Germany: IEEE, jun 2019.
  • [15] O. Schumann, M. Hahn, J. Dickmann, and C. Wöhler, “Semantic Segmentation on Radar Point Clouds,” in 2018 21st International Conference on Information Fusion (FUSION).   Cambridge, UK: IEEE, jul 2018, pp. 2179–2186.