Parking slot detection plays a critical role in valet parking, which necessitates the reliable method – being efficient and universal in practice [1, 2, 3, 4]. Recently, lots of works have been proposed to tackle this problem including free-space-based parking slot detection , ,  and vision-based method.
Free-space-based methods utilize sensors to detect free slots whose neighboring slots are occupied. Despite being simple yet effective in some cases, these methods are limited in realistic cases where neighboring slots are also free. Vision-based methods locate parking slots by recognizing parking slot markings from images such as line segments [9, 10] and marking points[4, 17]. In contrast with free-space-based method, it shows the great potential for the universal parking slot detection due to the rich contextual information from images, while overcomes the inability of localizing parking slot without nearby slots being occupied. Therefore, we base our method on vision.
However, it is still very challenging to detect the parking markings because of immense images complexity. Lots of traditional methods have been proposed – e.g., Hough Transform for line segment detection. However, the shortcomings of these methods have been widely recognized. Despite being efficient, it still has the poor performance due to the considerable variability of parking markings in practice. Nevertheless, the recent deep methods have endowed parking slot detection with the ability of tackling the variability by increasing the capacity of networks. Besides, these methods aim to detect marking points – intersection of line segments, to leverage point’s simplicity. By doing so, marking points based deep methods have dominated the parking slot detection.
Seminal works including DeepPS  and DMPR-PS  have been proposed to identify marking points for parking slot detection. The main difference of these two works lies in the manner of describing marking points. DeepPs  utilizes the rectangular descriptor to extract the pattern within the rectangular neighborhood of the parking slot vertexes . However, the rectangular descriptor is sensitive to the change of direction. Thus, directional descriptors with the T/L templates have been applied in DMPR-PS  to describe the vertex patterns. While this descriptor is more robust to the direction variation, it can only extract the vertex patterns of T/L-shaped parking slots, which is not suitable for describing the complex non-T/L-shaped scenarios such as oblique and trapezoid parking slots. In this work, we aim to improve this limitation.
To do so, we argue that there is no fixed pattern for the various parking slot vertexes, which makes it hard to find a universal way to describe different parking slot vertexes. To address this issue, we proposed a deformable circular descriptor in this paper to enable the network to learn the feature patterns of different types of parking slot vertexes. For different types of parking slot vertexes, the corresponding type of feature mode is used as the descriptor of the parking slot vertexes. Therefore, this descriptor can be compatible with different types of parking slot detection tasks and has better generalization ability. The comparison of different parking slot detection methods is depicted in Table I.
Additionally, the computation overhead of network severely restricts the application of deep learning algorithms in practical engineering application. For example, DeepPS and DMPR-PS  require powerful GPU to run deep learning algorithm. However, the mass-produced embedded environments merely have CPU, or less powerful GPU. Even though DMPR-PS is designed for the task of embedded system, it is still difficult to process real-time detection without powerful GPU. Given this situation, it is imperative to seek a highly efficient way of slot detection algorithm.
To this end, we tackle the task in the coarse-to-fine style to reduce the model complexity of the networks. Specifically, our algorithm decomposes the task into two stages as shown in Fig. 1. In particular, the first stage learns to regress the coarse position of the marking points. This denotes that the optimization of the first stage has the fast convergence due to the simplicity of task. The second stage takes as input the cropped sub-images centered at the predicted coarse position and outputs the finer position to further boost the performance – offset between coarse position and ground truth. Please note that we use the circular descriptor with different sizes for two stages. Coarse stage (i.e., the first stage) utilizes the larger size of circular descriptor than the fine stage (i.e., the second stage). In this way, our network (i.e., PSDet–Parking slot detection) is capable of being real-time while effective in practice.
Besides, to validate the performance in real-world applications, we collect and annotate a large-scale benchmark dataset – Parking Slot Detection Dataset (PSDD) which consists of 7 parking scenarios including brick, grass, oblique, trapezoid, open, rectangular and stereo parking slot. We empirically demonstrate the effectiveness and efficiency of our methods on the PSDD and ps2.0 datasets. The experimental results show that PSDet has much smaller computational complexity than other top-performing methods while achieving the competitive performance. In particular, PSDet achieves state-of-the-art precision rate of 95.67% and recall rate 98.21% on PSDD with 71 kb parameters and 25.3 FPS on CPUs – being much faster than the top performers.
In short, we summarize our contributions as follows:
We propose the two-stage PSDet to realize practical parking slot vertex detection – being real-time while achieving state-of-the-art precision rate and recall rate.
We annotate the large-scale benchmark dataset – PSDD which includes 7 different scenarios of parking slot. To the best of our knowledge, it is currently the dataset with the largest size and most types of parking slots from real world. We release the dataset for the benefit of community.
Ii Related Work
|Free-space based detection||Simple to implement|
|Sensitive to the surrounding environments|
|Vision based detection||Line Based detection||Sensitive to the appearance of lines, and lines are not uniform|
|Weak illumination robustness|
|Marking point based detection||Direction descriptor||Better detection of T-shaped or L-shaped|
|Not suitable for other type parking slot detection such as oblique|
|Rectangular descriptor||Weak ability to extract common pattern|
|Sensitive to rotation|
|Circular descriptor||More stable|
|Better ability to extract common pattern|
|Strong rotation robustness|
Dataset for Parking Slot Detection
As the largest and completely labeled public benchmark dataset, ps2.0  has been widely used for parking slot detection. It contains 12165 surround-view images, which are collected from typical indoor and outdoor parking slots under different lighting conditions, and provides the locations of the marking points. We recognize that the release of ps2.0 has greatly promoted the development of the valet parking field, and provided a benchmark to measure the effectiveness of different methods. However, we observe that the most of parking slots in ps2.0 have simply shapes such as T-shaped and L-shaped. It ignores the complex shapes in real-world such as brick, trapezoidal, stereo and oblique parking slots. Thus, networks trained with this dataset will be biased towards simply shape. To overcome this limitation, we annotate the PSDD to further boost the performance in real-world applications.
Descriptor for marking points
The existing parking slot vertex descriptors include rectangular descriptor and directional descriptor ,  – the details are shown in Fig. 2. The rectangular descriptor used in DeepPS  is a template for describing features in the rectangular neighborhood of parking slot vertexes, which can extract features of fixed-type parking slots, such as T-shaped, L-shaped, or oblique parking slot. Although it has less limitation than the line-based approach, it is sensitive to the orientation change of the parking line due to its rectangular shape, which degrades generalization ability to different scenes. To address this issue, the directional descriptor is proposed in DMPR-PS , which is a circular template with T-shape or L-shape vertexes inside. In this method, the position, direction and shape of the parking slot vertex are detected to obtain the parking slots. DMPR-PS has achieved state-of-the-art performance on ps2.0 dataset and argued that the architecture of combining marking point detection and deep learning networks is effective in parking slot detection tasks. Despite the excellent performance of the directional descriptor in T-shaped and L-shaped parking slot detection schemes, it is still limited by the inability of handling complex parking scenarios such as oblique, trapezoid and stereo parking slots. We demonstrate the comparison of the different detection algorithms and the corresponding supporting scenarios in Table III. Inspired by the constraints of the other methods, we propose the deformable circular descriptor to extract the universal vertex feature pattern, which can describe different types of parking slot vertexes. In this way, our method exhibits the capability of detecting parking slots in more complex scenarios.
|Wang et al.’s method ||L||✓|
|Hamada et al.’s method ||L||✓|
Efficient Parking Slot Detection
Among the prior parking slot detection methods based on deep learning, DeepPS  is the first network structure to utilize CNN for parking slot detection. In order to improve the efficiency of the parking slot detection task, Lin Zhang et al.  have proposed a concept named directional marking-point and designed a network structure DMPR-PS. By predicting position, shape and orientation of all directional marking-points of given surround-view image in a single forward evaluation, DMPR-PS obtains the information of marking-points and their neighborhood. As a consequence, DMPR-PS is more efficient than DeepPS. Although DMPR-PS has achieved tremendous progresses in efficiency, it can only perform real-time detection on GPU. However, in the mass production of embedded environment, many platforms only support CPU. Even though some resort to GPU, the computing ability of the GPU is also limited. Some previous researchs ,  have verified that using decoupling method can reduce the difficulty of network convergence by experiments. Inspired by these works, we use the idea of cascade to decouple this problem. In particular, detecting marking point from the input image can be divided into two stages. In the first stage, we employ a circular descriptor to obtain the initial position of the marking point in a grid. In the second stage, we use the initial position as the center to crop the sub-image in the fixed size from the original image. Then we employ a smaller circular descriptor than the first stage to regress the position shifted by the initial position for a higher accuracy. We corroborated the state-of-the-art effectiveness and efficiency of PSDet in experiments conducted on the benchmark dataset ps2.0 and PSDD.
Iii Universal Representation of Vertex Feature
Among the existing parking slot detection approaches, it is difficult to find a universal feature descriptor to describe the parking slot vertexes with complex and variable types. Therefore, we define various types of parking slot vertexes as a universal feature paradigm, and use this paradigm to describe different types of parking slot vertexes.
Compared with the previous rectangular descriptor and directional descriptor, circular descriptors proposed in our paper can describe different types of parking vertex patterns. This chapter will mainly introduce the definition of vertex paradigm and circular descriptors.
Iii-a The Concept of Vertex Paradigm
Vertex Paradigm. The vertex paradigm is a common pattern of the neighbor pixels around the marking point, which represents the overlapping relationship between the deformable marking-lines around the marking point, as illustrated in Fig. 3.
Vertex Area and Non-Vertex Area. The vertex area is a collection of pixels containing the vertex of the parking line, represented as . The Non-Vertex Area is a collection of pixels that are not centered by any parking slot vertex, represented as , as depicted in Fig. 3. And represents the image.
The Metric of Vertex Paradigm. The boundary of the vertex area and the marking lines of a parking slot intersect at 2 points and . We take the vertex as the initial point and the intersection points and
as the end points to obtain two vectorsand , respectively, as illustrated in Fig. 3. The metric of the vertex paradigm is as follows:
, which represents the inner product between vectors and .
Circular Descriptor. To describe the vertex paradigm in the parking slot vertex area , we introduce a circular region descriptor. Circular descriptor is a deformable circular template that can contain various types of parking slot vertexes with a sufficiently large radius. Circular descriptors of different parking slots are depicted in Fig. 4. Different from the T-shaped or L-shaped marking-line pattern around a marking point , the circular descriptor is able to extract more common pattern and help to solve the non-L-shaped and non-T-shaped situation, such as oblique, brick and trapezoid etc. Circular descriptors are able to contain various categories of patterns. These circular descriptors can be learned according to the paradigm of corresponding feature patterns given by different labels, as illustrated in Fig. 4.
Iii-B Characteristics of the Vertex Paradigm
In order to obtain the bounds of the circular descriptor, we will explore the characteristics of the vertex paradigm to determine the boundary range of the circular descriptor.
The Lower Bound of Vertex Paradigm. The lower bound is the minimum area of the vertex paradigm on the image, which is centered at the vertex with the minimum radius that intersects the parking line, as show in Fig. 5:
where is the minimum point set containing the vertex, is the circle with the radius of , is the lower bound of vertex paradigm.
The Upper Bound of Vertex Paradigm. The upper bound represents the maximum area of the vertex paradigm on the image, as shown in Fig. 5:
where is the maximum point set containing the vertex, is the circle with the radius of , is the upper bound of vertex paradigm.
Iv PSDet: EFFICIENT AND UNIVERSAL PARKING SLOT DETECTION
In this section, we describe the proposed parking slot vertex detection approach (i.e.,PSDet) in detail.
Iv-a The Structure of PSDet
To achieve an efficient and universal parking slot vertex detection, we propose a two-stage cascade network PSDet. The structure of PSDet is depicted in Fig. 1.
Cascade Network Structure. It is very challenging to regress the accurate position of marking points from a large image due to immense complexity. In this work, we tackle this problem in the manner of divide-and-conquer. We propose the two-stage PSDet that firstly computes vertex region proposal and then regresses to the accurate vertex location. To more precise, in the first-stage, we extract the approximate region of the vertex appear to coarsely locate the marking points initially. We then crop from the input image sub-images centered around the vertex proposals resulted from the first stage. Additionally, we utilize the second-stage network to regress the accurate vertex position from sub-images in the form of offset to the coarse vertex proposal.
As shown in Eq. (4), the complexity of detecting marking points directly in the whole image is much higher than detecting marking points in a smaller sub-image.
where is the complexity of detecting marking points in an original image of size . The size also indicates the detection range of the first stage. is the complexity of detecting marking points in an image of size , where and represent the down sampling factor of the original image in the vertical and horizontal direction, respectively. is the complexity of detecting marking points in an image of size , where represents the detection range in the second-stage, as demonstrated in Fig. 6. In this way, the proposed PSDet achieves the best performance while being real-time in practice.
First Stage. Given a 320240 surround-view image , two 32096 images are cropped with the left and right sides of as the initial boundaries. Then a set of feature maps is extracted from the 32096 image, as depicted in Fig. 1. In addition, the pyramidal network is employed to extract feature maps with different resolutions, which can introduce scaling robustness to the network. Afterwards, these feature maps are interpolated to a fixed size and concatenated into synthesized feature maps. Consequently we obtain a series of feature map of size , as shown in Fig. 6. As an example, We named one of the feature maps as and the value of point with coordinates on as . can be regarded as the point response intensity of input image to the first-stage circular descriptor template. Furthermore, is normalized to [0,1] through softmax, as shown in Eq.(5). Finally, the point position whose normalized value is retained as vertex proposal of parking slot.
Second Stage. After obtaining the initial location of the marking points in the first-stage, we utilize the positions of vertex proposals as centers to crop a series of sub-images from the input image. Then a CNN-based regression model and the second-stage circular descriptor template are used to further detect all vertexes in the sub-images. Finally the position of the point with the highest response intensity on the output feature map is retained as the final position of parking slot vertex, and accordingly correcting the position deviation of the parking slot vertex proposal in the first stage. In this way, the accurate position of the parking slot marking point is detected.
The loss used in PSDet is defined as the sum of squared errors between predictions and ground-truths, and the details will be discussed as follows:
First Stage Loss. The loss used in the first-stage is expressed as following equation:
where and represent the width and length of the output feature map from the first-stage. is the predicted value in the neighborhood with as the center, as depicted by the red circle in Fig. 6, where is the radius of the first-stage circular descriptor. is the circular descriptor of the first-stage, as depicted by the green circle in Fig. 5. We measure the response of each pixel in the output feature map with the method shown in Eq. (1), and obtain the metric result
. With regression objectives defined, the loss function in in the first-stage is defined as the sum of square errors between predictionsand ground-truths .
Second Stage Loss. The loss used in the second-stage is defined below:
where and represent the width and length of the output feature map from the second-stage, respectively. In this experiment, both width and length are 25. is the predicted value in the neighborhood with as the center and as the radius, as depicted by the red circle in Fig. 6, where is the radius of the second-stage circular descriptor. is the circular descriptor template of the second-stage, as illustrated by the red circle in Fig. 5.
V-a Benchmark Dataset
In this work, we annotate the large-scale benchmark PSDD which is composed of 14628 calibrated surround-view images collected from typical indoor and outdoor parking slots. We sample the images from 21 video sequences which are captured in 7 different scenarios – 3 sequences per each scenario. Samples in PSDD are filtered after frame splitting which results in 14628 samples. The amount of each category of data in the dataset is different, owning to the diverse popularity of the different parking slots in real-world application, such as the rectangular and open parking slots are most frequent. There are 3342 samples in open parking slot category, 5667 samples in rectangular parking slot category, 1242 samples in grass parking slot category, 63 samples in stereo parking slot category, 1946 samples in trapezoidal parking slot category, 500 samples in oblique parking slot category, and 1868 samples in brick parking slot category respectively. A set of samples are shown in Fig. 7, and the ratio of training set to testing set in all experiment is 1:1.
V-B Deformable Marking Point Detection Experiment
The circular descriptors and heatmaps of different parking slot are depicted in Fig. 8
. It shows that our approach has better performance. Besides, the precision and recall rate curves obtained after the PSDet first-stage and second-stage are illustrated in Fig.9 and Fig. 10, respectively. From all these figures, we observe that the second-stage network significantly improves the precision of the first-stage network while slightly degrades the recall. Please note that the second-stage network is not able to improve the recall due to the hard selection of vertex proposals – False Positive vertexes are discarded in the second-stage.
V-C Parking Slot Detection Experiment
We validate the performance of PSDet on both available benchmarks – ps2.0 
and PSDD respectively. Precision and recall rates serve as the evaluation metrics. To compare with other methods, we also evaluate the performance of various existing parking slot detection approaches. TableIII shows the results on ps2.0. Compared with the most competitive performer – i.e., DMPR-PS, our PSDet tends to have better recall rate but worse precision. Despite this, we argue that our PSDet achieves the similar precision and recall while with approximately 5 faster speed on simpler dataset. Furthermore, we compare competitors on our more complex PSDD in Table IV which has more types of parking slots. In this case, PSDet achieves state-of-the-art in both precision rate and recall rate.
|Wang et al.’s method ||L||98.29%||58.33%|
|Hamada et al.’s method ||L||98.45%||61.37%|
|Wang et al.’s method ||L||60.13%||10.24%|
|Hamada et al.’s method ||L||40.96%||12.11%|
We qualitatively compare the performance of DMPR-PS and our PSDet on parking slot detection. As illustrated in Fig. 11, we observe that PSDet is more accurate and suitble for diverse scenarios. For example, our method is able to accurately detect the parking slots in some extremely challenging cases such as grass scenes with blurry parking lines shown as Fig. 11(a). Scenes with strong light reflection are shown in Fig. 11(c) and (d). Reflectional and unclear parking lines of stereo parking slot are shown in Fig. 11(f).
V-D Running Speed of PSDet
PSDet is implemented in C++, and the model is transformed from caffemodel to NCNN format . We conduct experiments on the CPU platform Qualcomm 820a, which is low-cost and can be used in mass production. The experimental results show that after converting the model into NCNN format, it can run at speed up to 25.3 FPS on CPU platform Qualcomm 820a (101.8 FPS on GPU platform Qualcomm 820a). In this way, our method is capable of detecting parking slot in real-time. To distinguish our improvement in terms of efficiency, we summarize the time efficiency of other methods on CPU in Table V. All methods are converted to NCNN format before running speed tests. We observe that PSDet achieves the fastest speed.
V-E Potential Practical Application
We show the potential of the proposed method to be applied in practical scenarios. With the known camera parameters, we can easily project the detection results into the world coordinate system. By doing so, we annotate the parking slots in the local semantic map, which is a critical process in self-parking. Furthermore, it is of great significance for the valet parking and autonomous system as well.
V-F Failure Cases of PSDet
We summarize two main reasons for failure cases of PSDet: (1), Missing or false detection occurs when image contains adverse factors, such as different light conditions and blur. (2), Severe occlusion of parking lines leads to the inaccurate orientations of them. To name a few examples, we illustrate some typical failure cases in Fig. 12.
Vi Conclusion and Future Work
In this paper, we have proposed an efficient and universal method for parking slot detection, named PSDet. In contrast with existing work, PSDet is capable of detecting parking slots in various scenarios such as trapezoid, brick and grass parking slots. In addition, PSDet achieves a new state-of-the-art running speed on CPU in embedded platform Qualcomm 820a with a smaller model size – being much faster than the top performers. With improved effectiveness and time efficiency, our method therefore provides the technical feasibility for various applications which has limited computing resources such as mobile devices. Besides, we annotate and release the large-scale benchmark dataset PSDD, which has more types of parking slots and more complex than the existing datasets. We believe that it will be beneficial to the future efforts in enhancing parking slot detection for real-world applications.
-  Y. Song and C. Liao, Analysis and review of state-of-the-art automatic parking assist system, in 2016 IEEE International Conference on Vehicular Electronics and Safety (ICVES).
-  J. K. Suhr and H. G. Jung, Automatic parking space detection and tracking for underground and indoor environments. IEEE Transactions on Industrial Electronics, 63(9), pp. 5687–5698, 2016.
-  J. K. Suhr, H. G. Jung, K. Bae and J. Kim, Automatic free parking space detection by using motion stereo-based 3d reconstruction. Machine Vision & Applications, 21(2), pp. 163–176, 2010.
-  L. Li, L. Zhang, X. Li, X. Liu, Y. Shen and L. Xiong, Vision-based parking-slot detection: A benchmark and a learning-based approach, in 2017 IEEE International Conference on Multimedia and Expo (ICME), pp. 649–654, July 2017.
-  J. K. Suhr and H. G. Jung, Sensor fusion-based vacant parking slot detection and tracking. IEEE Transactions on Intelligent TransportationSystems, 15(1), 21–36, Feb. 2014.
-  M. R. Schmid, S. Ates, J. Dickmann, F. V. Hundelshausen, and H. J. Wuensche, Parking space detection with hierarchical dynamic occupancy grids, in Intelligent Vehicles Symposium, 2011.
N. Kaempchen, U. Franke, and R. Ott, Stereo vision based pose estimation of parking lots using 3d vehicle models, in Intelligent VehicleSymposium, 2002. IEEE, volume 2, pp. 459–464 vol.2, June 2002.
-  H. G. Jung, D. S. Kim, P. J. Yoon, and J. H. Kim, 3d vision system for the recognition of free parking site location. International Journal of Automotive Technology, 7(3), 351–357, 2006.
H. G. Jung, S. K. Dong, P. J. Yoon, and J. Kim, Structure analysis based parking slot marking recognition for semi-automatic parking system, in Structural, Syntactic, & Statistical Pattern Recognition, Joint Iapr International Workshops, Sspr & Spr, Hong Kong,China, August 2010.
-  C. Wang, H. Zhang, Y. Ming, X. Wang, and C. Guo, Automatic parking based on a bird’s eye view vision system. Advances in Mechanical Engineering, 6, pp. 847406–847406, 2014.
-  H. H. Barrett, The radon transform and its applications. Volume 21 of Progress in Optics, pp. 217 – 286, Elsevier, 1984.
J. Matas, J. Kittler, and C. Galambos, Robust detection of lines using the progressive probabilistic hough transform. Computer Vision & Image Understanding, 78(1), pp. 119–137, 2000.
-  K. Hamada, Z. Hu, M. Fan, and C. Hui, Surround view based parking lot detection and tracking, in Intelligent Vehicles Symposium, 2015.
-  M. Sonka, V. Hlavac, and R. Boyle, Image processing. Analysis and and machine vision, 2008.
-  G. Borgefors, Hierarchical chamfer matching: A parametric edge matching algorithm. Pattern Analysis & Machine Intelligence IEEE Transactions on, 10(6), pp. 849–865, 1988.
-  S. Lee and S. W. Seo, Available parking slot recognition based on slot context analysis. Iet Intelligent Transport Systems, 10(9), 594–604, 2016.
-  J. Huang, L. Zhang, Y. Shen, H. Zhang, S. Zhao, and Y. Yang, Dmpr-ps: A novel approach for parking-slot detection using directional marking-point regression, in 2019 IEEE International Conference on Multimedia and Expo (ICME), pp. 212–217, July 2019.
-  T. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, .Feature Pyramid Networks for Object Detection. arXiv e-prints, arXiv:1612.03144, Dec 2016.
-  Z. W. Cai and N. Vasconcelos, Cascade R-CNN: Delving into High Quality Object Detection, arXiv:1712.00726, Dec 2017.
-  A. Simonelli, S. T. Bulo, L. Porzi, M. L. Antequera and P. Kontschieder. Disentangling monocular 3d object detection. In ICCV, 2019.
-  C. Vestri, S. Bougnoux, R. Bendahan, K. Fintzel, and T. Kakinami, Evaluation of a vision-based parking assistance system, in Intelligent Transportation Systems, 2005.
-  Tencent.Inc. Ncnn. https://github.com/Tencent/ncnn/, 2017.