, outdoor scene understanding [7, 25], and de-identification for privacy protection. In the last few years, LPR has been widely studied theoretically, experimentally, and numerically to provide robust image representations. Many LPR methods [2, 1, 11, 20] can capture the structural properties of images and noise in carefully constrained settings. Despite this success, recognizing license plates in the wild is still far from satisfactory because of variations in appearance, noise, angle, and illumination.
Recently, owing to their hierarchical feature extraction and learning capability, deep convolutional neural networks (CNNs) have made remarkable advances in many computer vision applications, such as object detection [30, 29], semantic segmentation [23, 31], action recognition, and face recognition [36, 27]. As a result, CNN-based LPR methods have also been widely applied to recognize license plates captured directly by real-world cameras. For example, Zhuang et al. transform license plate recognition into a semantic segmentation problem combined with a counting network to handle appearance variations. Although numerous LPR methods have been developed [35, 41], they are still not capable of handling all types of samples in the wild; in practice, they assume a high-quality input image. License plates collected in real-world scenes, however, typically exhibit the aforementioned challenges, which degrade LPR performance. Hence, developing a robust LPR framework is indispensable, especially for real-world scenes.
In this paper, we design an end-to-end single noisy image denoising and rectification network (SNIDER) for better LPR based on multiple auxiliary tasks. Figure 1 illustrates the LPR framework, in which the proposed SNIDER is combined with a pre-trained LPR network. SNIDER consists of two sub-networks: a denoising network and a rectification network. Motivated by the success of U-Net in recovering object details, we employ a U-Net structure as the image recovery backbone to extract visual content with structural-level detail. The denoising sub-network (DSN) directly transforms a low-quality image into a high-quality image pixel by pixel. By penalizing the loss between noisy and noise-free image pairs, the DSN produces an output image with the fine textures of the clean component while learning an independent realization of the noise. However, even with such a sophisticated DSN, the denoised images remain unsatisfactory because they still contain arbitrary geometric variations. We therefore propose a rectification sub-network (RSN) that corrects the geometric distortions of the denoised license plates. Furthermore, we leverage new auxiliary tasks to further optimize the image recovery sub-networks (DSN and RSN) of SNIDER. There are two auxiliary tasks, text counting and segment prediction, each solved with a CNN decoder. The counting module predicts the number of characters in the image as a classification problem; even when the boundaries between consecutive characters are ambiguous, counting encourages individual characters to be distinguished, which makes the recovered image better suited to text detection. The segment prediction module performs binary segmentation to emphasize the foreground over the background, so that the license plate is clean for text recognition. Learning these auxiliary tasks guides the intermediate features of the main recovery networks to cope with difficulties such as geometric variations and low image quality. More importantly, we introduce a new loss function that trains SNIDER together with the auxiliary tasks, yielding significantly higher license plate quality for robust LPR.
To sum up, we highlight the main contributions of this paper as follows:
We propose a novel end-to-end license plate recovery network in which denoising and rectification networks generate a clear recovered image for robust LPR performance.
We introduce auxiliary tasks that improve the quality of license plate recovery from low-quality inputs. In particular, a new loss provides regularization effects to the SNIDER backbone for robust representation and license plate recovery.
Finally, we demonstrate the effectiveness of the proposed method in recovering high-quality license plates from low-quality ones in the real world, and show that its LPR performance surpasses state-of-the-art methods on two challenging datasets: AOLP-RP and VTLP, a new dataset collected in highly challenging real-world environments.
2 Related Work
In this section, we briefly review the low-quality image recovery and license plate recognition methods most related to this work.
2.1 Low-Quality Image Recovery
To obtain high-quality images, most traditional methods rely on hand-crafted algorithms that assume both signal and noise follow particular statistical regularities, such as anisotropic diffusion and total variation. Non-parametric models [8, 26] were also developed to model image noise, but they are not robust in unconstrained environments in the wild because their priors are estimated from limited observations. With recent advances in deep learning, most denoising algorithms adopt deep neural network architectures and a data-driven approach rather than relying on priors. Burger et al. employ multi-layer perceptrons trained in a data-driven manner on an extensive image database. Zhang et al. train a deep CNN using batch normalization (BN) and residual learning.
Although these methods are useful for estimating a clean image, the text in it can still be hard to recognize because of irregular text geometry. This motivates extending image recovery to image rectification. Shi et al. use a spatial transformer network (STN) to rectify text distortion. Cheng et al. adopt deeper image representations through a residual network. Unlike these methods, we extract deep image representations with a U-Net-based CNN for both denoising and rectification. To the best of our knowledge, our work may be the first to apply these two modules simultaneously for LPR.
2.2 License Plate Recognition
Before the advent of deep learning, most traditional LPR methods [16, 1, 13, 40] employed a two-stage pipeline of text detection followed by text recognition. With the advancement of deep learning, many approaches employ a one-stage pipeline without text detection. Li et al. extract deep feature representations with an RNN using LSTM to acquire sequential features of the license plate. Bulan et al. estimate domain shifts between the target and multiple source domains to select the domain that yields the best recognition performance, based on a fully convolutional network. However, these methods consider only high-quality license plate images and not low-quality ones, which easily leads to poor performance in real-world scenes. Moreover, they make little or no effort to improve image quality while requiring substantial computing power. Unlike existing methods, we adopt image recovery to achieve high LPR performance on low-quality images in real-world scenes. To the best of our knowledge, this is the first work to apply sophisticated image recovery to challenging real-world LPR environments. In addition, our method is computationally efficient and capable of real-time recognition despite the additional recovery modules.
3 Proposed Method
The proposed approach consists of three parts: 1) main-task prediction networks for denoising and rectification; 2) auxiliary-task prediction networks for count classification and segment prediction; 3) an LPR network for text detection and classification. The proposed architecture is illustrated in Figure 2. For training, the data for both the main and auxiliary tasks can be derived by deliberately transforming the original images with simple rotation (for rectification) and down-resizing (for denoising), as shown in Figure 3. In particular, a single original image can generate four training samples transformed by different angles. Given these training samples, the main-task networks extract the recovery result from the input image and its corresponding samples, and the LPR network then recognizes the text in the recovered image.
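The data-generation step described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names, the four rotation angles, the nearest-neighbour rotation, and block-average down-resizing are all hypothetical choices, since the text only states that rotation (for rectification) and down-resizing (for denoising) are used and that one image yields four samples.

```python
import numpy as np


def rotate_nearest(img, angle_deg):
    """Rotate a 2-D image about its centre (nearest-neighbour sampling, zero fill)."""
    h, w = img.shape
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: for each output pixel, locate its source pixel in the input.
    src_x = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    src_y = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    src_x = np.rint(src_x).astype(int)
    src_y = np.rint(src_y).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(img)
    out[valid] = img[src_y[valid], src_x[valid]]
    return out


def degrade_resolution(img, factor):
    """Down-resize by block averaging, then up-resize back by pixel repetition."""
    h, w = img.shape
    assert h % factor == 0 and w % factor == 0, "image size must be divisible by factor"
    small = img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)


# Hypothetical angles and down-resizing factor; the paper does not report them here.
def make_training_samples(clean, angles=(-20.0, -10.0, 10.0, 20.0), factor=4):
    """Return (input, DSN target, RSN target) triples, one per rotation angle."""
    samples = []
    for angle in angles:
        tilted = rotate_nearest(clean, angle)             # noise-free but not rectified
        low_quality = degrade_resolution(tilted, factor)  # network input
        samples.append((low_quality, tilted, clean))
    return samples
```

For a 320 × 320 grayscale plate, `make_training_samples` returns four triples: the DSN is supervised with the tilted clean image and the RSN with the original, upright image.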
In the following subsections, Section 3.1 introduces the networks that predict the main tasks, Section 3.2 addresses the auxiliary-task prediction, Section 3.3 describes the training of the proposed architecture, and Section 3.4 describes the testing process.
3.1 Denoising and Rectification Network
Our main-task network consists of two sub-networks, the denoising sub-network and the rectification sub-network. The first sub-network takes the low-quality image as input, and the final output is the recovered image. Specifically, we design the rectification network to rectify the denoised results produced by the denoising network.
Previous image recovery results have shown the effectiveness of U-Net, which preserves high-quality overall details of image objects without harming image generation. We therefore adopt a U-Net-based architecture whose skip connections shuttle low-level information shared between input and output directly across the network. In contrast to previous networks, our recovery network includes two sub-networks, each of which is itself a U-Net. As shown in Table 1 and Figure 2(b, c), both the denoising network and the rectification network consist of an encoder and a decoder module.
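To make the skip-connection data flow concrete, the parameter-free NumPy sketch below traces feature-map shapes through a five-level encoder-decoder. It is an illustration under stated assumptions, not the architecture in Table 1: average pooling and nearest-neighbour upsampling stand in for the learned convolution blocks, each block is assumed to halve (or double) the resolution, and the depth of five follows the five convolution blocks mentioned in Section 4.3 for the full SNIDER model.

```python
import numpy as np


def down(x):
    """2x2 average pooling on a (C, H, W) map; stands in for an encoder block."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def up(x):
    """Nearest-neighbour 2x upsampling; stands in for a decoder block."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def encode(x, depth=5):
    """Return the bottleneck features and the per-level encoder features."""
    skips = []
    for _ in range(depth):
        skips.append(x)  # saved for the matching decoder level
        x = down(x)
    return x, skips


def decode(x, skips):
    """Upsample and concatenate each saved encoder map along the channel axis."""
    for skip in reversed(skips):
        x = np.concatenate([up(x), skip], axis=0)  # skip connection
    return x
```

For a 3 × 320 × 320 input, `encode` yields a 3 × 10 × 10 bottleneck and `decode` returns an 18-channel 320 × 320 map; in a real U-Net, convolutions after each concatenation would control the channel count. The bottleneck is the kind of last-encoder feature that Section 3.2 sums across the main-task encoders before the auxiliary decoders.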
To achieve the main tasks, we first feed the low-quality input image into the DSN to generate denoised results. Given a pair consisting of an input image and its non-rectified, noise-free ground-truth image, the DSN is trained with the pixel-wise MSE loss, calculated as in Eq. (1):
where the loss is minimized with respect to the parameters of the denoising network. This loss encourages the DSN not only to extract the content information of the input image but also to generate a high-quality, natural image at the pixel level.
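A batched pixel-wise MSE loss of the kind described for the DSN can be sketched as follows. It assumes the loss is averaged over all images, pixels, and channels; the callable `dsn` is a hypothetical placeholder for the denoising sub-network, not the authors' code.

```python
import numpy as np


def mse_loss(pred, target):
    """Pixel-wise mean squared error over all images, pixels, and channels."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {target.shape}")
    return float(np.mean((pred - target) ** 2))


def denoising_loss(dsn, noisy_batch, clean_batch):
    """Run a (hypothetical) DSN callable and compare with the non-rectified clean targets."""
    return mse_loss(dsn(noisy_batch), clean_batch)
```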
The RSN then processes the output of the DSN and produces a rectified high-quality image, from which the LPR network can more easily recognize the identification text. Given training pairs of denoised inputs and rectified ground-truth images, the RSN is trained with an L1 loss on its predicted result, as in Eq. (2):
where the loss is minimized with respect to the parameters of the rectification network. Unlike an L2 loss, a pixel-level L1 loss helps preserve object appearance, such as color, intensity, and illumination, so that the denoised result undergoes only a geometric transformation. The rectification process can therefore apply geometric transformations without damaging the appearance of the image, which benefits the recognizer.
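One common intuition for preferring an L1 pixel loss, offered here as an illustration rather than the authors' analysis, is that the constant prediction minimizing an L2 loss is the mean of the targets, whereas for L1 it is the median. In the hypothetical example below, an ambiguous pixel has five plausible target values; L2 settles on an intermediate gray that matches none of them, while L1 picks a value that actually occurs.

```python
import numpy as np


def l1_loss(pred, target):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(pred, float) - np.asarray(target, float))))


def l2_loss(pred, target):
    """Mean squared error."""
    return float(np.mean((np.asarray(pred, float) - np.asarray(target, float)) ** 2))


def best_constant(loss_fn, targets, candidates):
    """Return the candidate constant prediction with the lowest loss."""
    losses = [loss_fn(np.full(targets.shape, c), targets) for c in candidates]
    return float(candidates[int(np.argmin(losses))])


# Hypothetical ambiguous pixel: three targets say background (0), two say stroke (1).
targets = np.array([0.0, 0.0, 0.0, 1.0, 1.0])
candidates = np.linspace(0.0, 1.0, 101)
print(best_constant(l1_loss, targets, candidates))  # 0.0 (the median)
print(best_constant(l2_loss, targets, candidates))  # approximately 0.4 (the mean)
```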
3.2 Auxiliary Tasks Prediction
In complex real-world environments, with extremely irregular text geometry and complicated image backgrounds, the binary information of the license plate is often noisy. Although we intend the DSN and RSN to capture robust features for image recovery, this structure does not always guarantee a well-enhanced output. We therefore add learning branches that obtain a richer feature representation from the backbone network. Motivated by multi-task learning, we employ two auxiliary tasks, binary segmentation and count estimation, which help our main-task networks produce more discriminative feature representations. To this end, we sum the features of the last layer of each main-task encoder and feed them to the auxiliary task networks, which helps the main-task networks effectively extract critical information from the low-quality image.
For the binary segmentation task, we introduce a segment decoder based on the U-Net architecture; its detailed architecture is shown in Table 1. The segment decoder accepts the feature set summed from the last features of each main-task encoder and outputs a license plate segment whose values indicate the probability that each pixel belongs to the license plate. Ground-truth segmentation labels can be inferred from the dotted annotations with an Otsu-based thresholding method, as shown in Figure 3. Although these annotations do not fully reflect the detailed appearance of an image, our experiments show that this simple auxiliary learning strategy effectively improves image recovery. Given the summed features and the ground-truth segmentation map, the segment decoder is trained with the binary cross-entropy loss in Eq. (3):
where the ground-truth label of each pixel is 1 for the license plate area and 0 for the background, and the prediction is the pixel-wise probability output by the segment decoder.
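The label generation and loss for the segment decoder can be sketched as below. This assumes grayscale images scaled to [0, 1] and uses plain Otsu thresholding; the authors' exact thresholding variant may differ, and the `bright_foreground` flag and the clipping constant `eps` are hypothetical choices for illustration.

```python
import numpy as np


def otsu_threshold(gray, bins=256):
    """Return the Otsu threshold for values in [0, 1] (maximizes between-class variance)."""
    hist, edges = np.histogram(gray.ravel(), bins=bins, range=(0.0, 1.0))
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(hist).astype(float)   # pixel count in the lower class
    w1 = w0[-1] - w0                     # pixel count in the upper class
    m0 = np.cumsum(hist * centers)       # cumulative intensity sum of the lower class
    mu0 = m0 / np.maximum(w0, 1.0)
    mu1 = (m0[-1] - m0) / np.maximum(w1, 1.0)
    between = w0 * w1 * (mu0 - mu1) ** 2  # unnormalized between-class variance
    k = int(np.argmax(between))
    return edges[k + 1]                   # lower class = bins 0..k


def segmentation_label(gray, bright_foreground=True):
    """Binary mask: 1 for license plate (foreground) pixels, 0 for background."""
    mask = gray >= otsu_threshold(gray)
    return (mask if bright_foreground else ~mask).astype(float)


def bce_loss(prob, label, eps=1e-7):
    """Pixel-wise binary cross-entropy averaged over the map."""
    p = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(label * np.log(p) + (1.0 - label) * np.log(1.0 - p)))
```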
We also find that the recovered images often fail to separate successive characters because they are close to each other. Motivated by this observation, we add a counting decoder that predicts the number of characters in the image. The counting decoder plays two roles: first, it encourages adjacent characters to be separated more clearly; second, by backpropagating its penalty, it pushes the encoders of each main task to generate higher-quality images. The counting decoder is trained with the L2 loss in Eq. (4):
where the loss compares the predicted character count with the ground-truth count.
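The counting loss can be sketched as an L2 regression on the number of characters, following the L2 formulation stated in this section. Deriving the target from the length of an annotated plate string and rounding the prediction to an integer are assumptions for illustration, and the plate strings below are made up.

```python
import numpy as np


def count_targets(plate_strings):
    """Ground-truth character counts taken from (hypothetical) plate annotations."""
    return np.array([len(s) for s in plate_strings], dtype=float)


def count_loss(pred_counts, true_counts):
    """L2 (mean squared error) between predicted and ground-truth counts."""
    pred = np.asarray(pred_counts, dtype=float)
    true = np.asarray(true_counts, dtype=float)
    return float(np.mean((pred - true) ** 2))


def to_integer_count(pred_counts):
    """Round continuous predictions to the nearest non-negative integer count."""
    return np.maximum(np.rint(pred_counts), 0).astype(int)


targets = count_targets(["AB1234", "5678"])  # counts 6 and 4
print(count_loss([6.5, 3.0], targets))       # (0.25 + 1.0) / 2 = 0.625
```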
3.3 Network Training
The full objective function is a weighted sum of all the losses from Eq. (1) to (4):
We employ a stage-wise training strategy to optimize main tasks with auxiliary tasks and empirically set the weights of each loss as detailed in Section 5.3.
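A minimal sketch of the weighted objective in Eq. (5), using the weights reported in Section 5.3 (0.4, 0.4, 0.15, 0.05) for the denoising, rectification, segmentation, and counting losses. The dictionary keys are hypothetical names, and the stage-wise training schedule is not modeled here.

```python
# Loss weights from Section 5.3; the key names are hypothetical.
LOSS_WEIGHTS = {"denoise": 0.4, "rectify": 0.4, "segment": 0.15, "count": 0.05}


def total_loss(losses, weights=LOSS_WEIGHTS):
    """Weighted sum of the four task losses (main tasks weighted highest)."""
    missing = set(weights) - set(losses)
    if missing:
        raise KeyError(f"missing losses: {sorted(missing)}")
    return sum(weights[name] * losses[name] for name in weights)


print(total_loss({"denoise": 0.5, "rectify": 0.2, "segment": 0.1, "count": 2.0}))
```

The weights sum to 1, so the objective is a convex combination of the four losses.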
At test time, the auxiliary tasks are removed. Given a low-quality test image, the DSN and RSN output the recovered image via denoising and rectification. An LPR network based on a YOLO v3 detector pre-trained on ImageNet then takes the recovered image and generates the recognition result, as denoted in Eq. (6):
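The test-time path of Eq. (6) is a composition of three stages in which the auxiliary decoders are never called. In the sketch below, `dsn`, `rsn`, and `lpr` are hypothetical callables standing in for the trained sub-networks and the YOLO v3-based recognizer.

```python
from typing import Callable

import numpy as np

Image = np.ndarray


def recognize_plate(
    low_quality: Image,
    dsn: Callable[[Image], Image],
    rsn: Callable[[Image], Image],
    lpr: Callable[[Image], str],
) -> str:
    """Denoise, rectify, then recognize; auxiliary decoders are unused at test time."""
    denoised = dsn(low_quality)
    rectified = rsn(denoised)
    return lpr(rectified)
```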
4 Experimental Setting
In this section, we describe the datasets, evaluation metric, and implementation details for the proposed method.
We use the AOLP LPR dataset and a newly collected dataset named VTLP.
AOLP-RP: AOLP-RP consists of 611 images collected in Taiwan, covering ten digits and 25 letters (all except "O"). Its main challenge is geometric distortion, since many LPs are captured at oblique angles. In terms of resolution, however, the images are relatively easy, as they are of higher resolution than those of other datasets.
VTLP: We introduce a new, challenging, large-scale dataset collected in South Korea. The dataset contains 10,650 LP images, divided into 6,400 training and 4,250 test images. All Korean letters are hidden for privacy protection, so the text in VTLP images consists only of the 10 digits. Compared with public LPR datasets, our dataset has the following challenging factors: 1) the large-scale images are selected from unconstrained real-world scenes covering a variety of challenging situations and are manually annotated with bounding boxes; 2) vehicles are farther from the camera than in other datasets; 3) various scene texts interfere with detection, and the LPs often have low resolution and are very oblique.
4.2 Evaluation Metric
We follow the evaluation metric widely used in LPR research [13, 41]: a license plate is treated as a failure case if even one of its characters is misclassified or not detected. We denote this metric as recognition accuracy. We consider 36 character classes for text recognition (26 letters and 10 digits).
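Under this metric, a plate counts as correct only when its full character sequence is recovered. A minimal sketch, assuming predictions and ground truths are uppercase strings over the 36-class alphabet and that any extra, missing, or wrong character makes the plate a failure; the function name and example plates are hypothetical.

```python
import string

# 26 uppercase letters + 10 digits = 36 character classes.
ALPHABET = set(string.ascii_uppercase + string.digits)


def plate_accuracy(predictions, ground_truths):
    """Percentage of plates whose predicted string exactly matches the ground truth."""
    if len(predictions) != len(ground_truths) or not ground_truths:
        raise ValueError("need equally sized, non-empty prediction and label lists")
    for label in ground_truths:
        if not set(label) <= ALPHABET:
            raise ValueError(f"label {label!r} uses characters outside the 36 classes")
    correct = sum(pred == gt for pred, gt in zip(predictions, ground_truths))
    return 100.0 * correct / len(ground_truths)


# The second plate has its last "5" read as "6"; the third misses its last character.
print(plate_accuracy(["AB1234", "CD5676", "EF12"], ["AB1234", "CD5675", "EF123"]))
```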
4.3 Implementation Details
All reported implementations are based on the TensorFlow framework, and all experiments are run on a single NVIDIA TITAN X GPU and an Intel Core i7-6700K CPU. In all experiments, we resize all images to 320 × 320. For stable training, we use gradient clipping and the Adam optimizer with high momentum. The proposed network is trained for 1 million iterations with a batch size of 16. The weights in all SNIDER layers are initialized from a zero-mean Gaussian distribution with a standard deviation of 0.01, and all biases are initialized to 0. All models are trained with a learning rate of  for the first 100 epochs and a learning rate of  for the remaining epochs. Batch normalization and LeakyReLU are used in all layers of our networks. For the baseline LPR network, we use the YOLO v3 detector model pre-trained on ImageNet.
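The initialization and gradient-clipping settings above can be sketched in NumPy as follows. The zero-mean Gaussian with standard deviation 0.01 and the zero biases come from the text; the (H, W, C_in, C_out) kernel layout, the choice of global-norm clipping, and the `max_norm` value are assumptions, since the clipping threshold is not reported. In TensorFlow, `tf.clip_by_global_norm` performs the same global-norm rescaling.

```python
import numpy as np


def init_conv(kernel_shape, rng, std=0.01):
    """Zero-mean Gaussian kernel (std 0.01) and zero bias; layout (H, W, C_in, C_out)."""
    kernel = rng.normal(loc=0.0, scale=std, size=kernel_shape)
    bias = np.zeros(kernel_shape[-1])
    return kernel, bias


def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients jointly so their combined L2 norm is at most max_norm."""
    global_norm = float(np.sqrt(sum(np.sum(np.square(g)) for g in grads)))
    if global_norm <= max_norm:
        return list(grads), global_norm
    scale = max_norm / global_norm
    return [g * scale for g in grads], global_norm


rng = np.random.default_rng(0)
kernel, bias = init_conv((3, 3, 16, 32), rng)
clipped, norm = clip_by_global_norm([np.array([3.0, 4.0])], max_norm=1.0)  # hypothetical threshold
```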
Two SNIDER models are trained for evaluation and benchmarking against state-of-the-art methods. The first is the backbone model, SNIDER, which uses five convolution blocks in both the encoder and the decoder. The other, denoted SNIDER-Tiny, uses a lighter network and is therefore faster at test time. All SNIDER models are trained with the same parameter settings.
In this section, we evaluate the proposed approach on two datasets: AOLP-RP  and VTLP.
|Type||Method||AOLP-RP LPR accuracy (%)||VTLP LPR accuracy (%)|
|a||Baseline (YOLO v3)||91.65||80.45|
|c||Add DSN, RSN||98.53||90.71|
|d||Add DSN, RSN, SD||99.02||92.08|
|d||Add DSN, RSN, CD||98.69||91.08|
|e||Add DSN, RSN, SD, CD (ours)||99.18||93.08|
5.1 Ablation Study
We first compare our proposed method with the baseline LPR network to demonstrate the effectiveness of image recovery. We report LPR results on both datasets for the following five variants, in which modules are added incrementally: a) the baseline alone, without the proposed method; b) one main task added; c) both main tasks added; d) both main tasks and one auxiliary task added; e) all modules added (the proposed method).
We present the LPR accuracy of each type on the two datasets in Table 2, and visual comparisons are shown in Figure 4. Table 2 shows that adding a single main task significantly improves LPR performance (type b), and that performance improves further when denoising and rectification are applied together (type c). As shown in Figure 4(c), noise and blurring are removed from the low-quality image (a), and the characters are clearly enhanced. This confirms that performing the two tasks together helps recover high-quality images. Despite this better LPR performance (Table 2, type c), the output image still contains elements that interfere with LPR. For example, detecting the correct text region remains difficult: regions unnecessary for recognition, such as a manufacturer's logo, may be included (see Figure 4(d)), and consecutive characters can be ambiguous and poorly separated. When each auxiliary task is added to the main tasks, the recovered image quality improves (Figure 4(e, f)), and so does LPR performance (Table 2, type d). Finally, incorporating all the tasks yields the best LPR performance (Table 2, type e), and the recovered image in Figure 4(g) is the most realistic of all results.
|Method||AOLP-RP Full LPR accuracy (%)|
|Baseline (YOLO v3)||91.65|
|Hsu et al. ||85.76|
|Li et al. ||88.38|
|Silva et al. ||98.36|
|Zhuang et al. ||99.02|
5.2 Comparison with State-of-the-art Methods
We compare the proposed method with several state-of-the-art LPR methods [13, 21, 35, 41]. With the baseline LPR network, SNIDER is evaluated on the two datasets described in Section 4.1, which contain low-quality license plate images with a variety of geometric variations.
As Tables 2, 3, and 4 show, SNIDER consistently outperforms SNIDER-Tiny across all datasets owing to its deeper and wider backbone network. Nevertheless, SNIDER-Tiny outperforms most of the compared methods, and where it does not, the performance gap is relatively small. This suggests that SNIDER is more useful than other methods for LPR on low-quality images.
AOLP-RP dataset results. On AOLP-RP, SNIDER demonstrates that our recovered images can significantly improve LPR performance on real-world images. This is mainly because the AOLP images, which are often geometrically tilted, are transformed into well-rectified images. The results are listed in Table 3: our method obtains the highest accuracy (99.18%), outperforming the best competing method (Zhuang et al., 99.02%) by 0.16 percentage points. In particular, the gap between the baseline and our method in Table 3 (91.65% vs. 99.18%) shows that our method benefits from SNIDER, which enhances image quality despite oblique angles.
VTLP dataset results. The quantitative results on the VTLP dataset are shown in Table 4, and visual comparisons are illustrated in Figure 5. Our approach outperforms other LPR algorithms in both LPR accuracy and image recovery. We also compare our results with the state-of-the-art LPR methods [18, 35]. As Table 4 shows, our method obtains the highest accuracy (93.08%), outperforming the state-of-the-art methods by 5.74 percentage points (93.08% vs. 87.34%). Notably, SNIDER performs robustly on VTLP, whose images are captured at lower resolution than those of the other datasets.
5.3 Parameter Study of the Weights for Tasks
The set of weights in Eq. (5) determines the influence of each task. To choose these weights, we perform various experiments with the SNIDER model on the AOLP-RP and VTLP datasets. Because the main tasks have a larger influence than the auxiliary tasks, their weights are set higher. The weights within the auxiliary tasks must also be balanced for fast optimization. As Figure 4 shows, the segment decoder plays an important role in eliminating unnecessary regions that interfere with LPR; we therefore set the weight of the segment decoder higher than that of the counting decoder. In our experiments, we set the weights of the denoising, rectification, segment decoder, and counting decoder losses to 0.4, 0.4, 0.15, and 0.05, respectively.
|Detector||VTLP LPR accuracy (%)||Speed (FPS)|
|Faster R-CNN||87.06||2.7|
|CenterNet ResNet-18||84.68||46|
|YOLO v3 (SNIDER-Tiny)||86.66||44|
|YOLO v3 (ours)||93.08||37|
5.4 Impact of LPR Network
We evaluate how the choice of LPR network affects LPR performance on the VTLP test set. Results are shown in Table 5. We mainly adopt real-time detectors for fast processing. The comparison with [30, 39] indicates that, with SNIDER, the detector plays an important role in LPR performance. Although lightweight detectors process images quickly, they do not guarantee accuracy. We therefore adopt YOLO v3, which has enough capacity for rich feature representation while still running in real time.
5.5 Weakness Analysis
Figure 6 shows some failure cases, including some false recovery results. These results indicate that further progress is needed to improve rectification performance. Future work will address this problem by exploiting adjacent context to recover these more challenging license plate images.
In this paper, we propose a new end-to-end trainable image recovery method that enables license plate recognition in the real world. The proposed recovery network consists of two sub-networks: the denoising sub-network and the rectification sub-network. In particular, two auxiliary tasks are designed to improve license plate recovery, making the feature set more robust against the geometric variations and blurry data in real-world scenes. Moreover, a new loss function is introduced to the backbone network to provide regularization effects and higher-quality recovered images. Extensive experiments on two challenging datasets demonstrate superior performance in license plate recovery and recognition.
This work was supported by an Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. B0101-16-0525, Development of global multi-target tracking and event prediction techniques based on real-time large-scale video analysis). We also appreciate useful discussions with Kyungho Won, Jaewoong Yun, and Sangwoo Park.
-  C. N. E. Anagnostopoulos, I. E. Anagnostopoulos, V. Loumos, and E. Kayafas. A license plate-recognition algorithm for intelligent transportation system applications. IEEE Transactions on Intelligent transportation systems, 7(3):377–392, 2006.
-  O. Bulan, V. Kozitsky, P. Ramesh, and M. Shreve. Segmentation-and annotation-free license plate recognition with deep localization and failure identification. IEEE Transactions on Intelligent Transportation Systems, 18(9):2351–2363, 2017.
-  H. C. Burger, C. J. Schuler, and S. Harmeling. Image denoising: Can plain neural networks compete with BM3D? In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 2392–2399. IEEE, 2012.
-  H. Cai, Z. Yang, X. Cao, W. Xia, and X. Xu. A new iterative triclass thresholding technique in image segmentation. IEEE transactions on image processing, 23(3):1038–1046, 2014.
-  R. Caruana. Multitask learning. Machine learning, 28(1):41–75, 1997.
-  Z. Cheng, F. Bai, Y. Xu, G. Zheng, S. Pu, and S. Zhou. Focusing attention: Towards accurate text recognition in natural images. In The IEEE International Conference on Computer Vision (ICCV), Oct 2017.
-  S. Cherng, C.-Y. Fang, C.-P. Chen, and S.-W. Chen. Critical motion detection of nearby moving vehicles in a vision-based driver-assistance system. IEEE Transactions on Intelligent Transportation Systems, 10(1):70–82, 2009.
-  K. Dabov, A. Foi, and K. Egiazarian. Video denoising by sparse 3d transform-domain collaborative filtering. In 2007 15th European Signal Processing Conference, pages 145–149. IEEE, 2007.
-  J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. IEEE, 2009.
-  L. Du and H. Ling. Preservative license plate de-identification for privacy protection. In 2011 International Conference on Document Analysis and Recognition, pages 468–472. IEEE, 2011.
-  C. Gou, K. Wang, Y. Yao, and Z. Li. Vehicle license plate recognition based on extremal regions and restricted Boltzmann machines. IEEE Transactions on Intelligent Transportation Systems, 17(4):1096–1107, 2015.
-  K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
-  G.-S. Hsu, J.-C. Chen, and Y.-Z. Chung. Application-oriented license plate recognition. IEEE transactions on vehicular technology, 62(2):552–561, 2012.
-  S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
-  P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
-  K. K. Kim, K. Kim, J. Kim, and H. J. Kim. Learning-based approach for license plate recognition. In Neural Networks for Signal Processing X. Proceedings of the 2000 IEEE Signal Processing Society Workshop (Cat. No. 00TH8501), volume 2, pages 614–623. IEEE, 2000.
-  D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
-  R. Laroca, E. Severo, L. A. Zanlorensi, L. S. Oliveira, G. R. Gonçalves, W. R. Schwartz, and D. Menotti. A robust real-time automatic license plate recognition based on the yolo detector. In 2018 International Joint Conference on Neural Networks (IJCNN), pages 1–10. IEEE, 2018.
-  H. Law, Y. Teng, O. Russakovsky, and J. Deng. Cornernet-lite: Efficient keypoint based object detection. arXiv preprint arXiv:1904.08900, 2019.
-  H. Li and C. Shen. Reading car license plates using deep convolutional neural networks and lstms. arXiv preprint arXiv:1601.05610, 2016.
-  H. Li, P. Wang, and C. Shen. Toward end-to-end car license plate detection and recognition with deep neural networks. IEEE Transactions on Intelligent Transportation Systems, 20(3):1126–1136, 2018.
-  H. Liu, Y. Tian, Y. Yang, L. Pang, and T. Huang. Deep relative distance learning: Tell the difference between similar vehicles. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2167–2175, 2016.
-  J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3431–3440, 2015.
-  A. L. Maas, A. Y. Hannun, and A. Y. Ng. Rectifier nonlinearities improve neural network acoustic models. In Proc. icml, volume 30, page 3, 2013.
-  S. Noh and M. Jeon. A new framework for background subtraction using multiple cues. In Asian Conference on Computer Vision, pages 493–506. Springer, 2012.
-  J. Pan, D. Sun, H. Pfister, and M.-H. Yang. Blind image deblurring using dark channel prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1628–1636, 2016.
-  S. Park, J. Yu, and M. Jeon. Learning feature representation for face verification. In 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pages 1–6. IEEE.
-  P. Perona and J. Malik. Scale-space and edge detection using anisotropic diffusion. IEEE Transactions on pattern analysis and machine intelligence, 12(7):629–639, 1990.
-  J. Redmon and A. Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
-  S. Ren, K. He, R. Girshick, and J. Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems, pages 91–99, 2015.
-  O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pages 234–241. Springer, 2015.
-  L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D: nonlinear phenomena, 60(1-4):259–268, 1992.
-  Y. Shen, T. Xiao, H. Li, S. Yi, and X. Wang. Learning deep neural networks for vehicle re-id with visual-spatio-temporal path proposals. In Proceedings of the IEEE International Conference on Computer Vision, pages 1900–1909, 2017.
-  B. Shi, X. Wang, P. Lyu, C. Yao, and X. Bai. Robust scene text recognition with automatic rectification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4168–4176, 2016.
-  S. M. Silva and C. R. Jung. License plate detection and recognition in unconstrained scenarios. In European Conference on Computer Vision, pages 593–609. Springer, 2018.
-  Y. Wen, K. Zhang, Z. Li, and Y. Qiao. A discriminative feature learning approach for deep face recognition. In European conference on computer vision, pages 499–515. Springer, 2016.
-  J. Yu, S. Park, S. Lee, and M. Jeon. Driver drowsiness detection using condition-adaptive representation learning framework. IEEE Transactions on Intelligent Transportation Systems, 2018.
-  K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 2017.
-  X. Zhou, D. Wang, and P. Krähenbühl. Objects as points. arXiv preprint arXiv:1904.07850, 2019.
-  S. Zhu, S. A. Dianat, and L. K. Mestha. End-to-end system of license plate localization and recognition. Journal of Electronic Imaging, 24(2):023020, 2015.
-  J. Zhuang, S. Hou, Z. Wang, and Z.-J. Zha. Towards human-level license plate recognition. In Proceedings of the European Conference on Computer Vision (ECCV), pages 306–321, 2018.