License Plate (LP) recognition in the wild is a fundamental problem in intelligent transportation systems. It can be used in a variety of applications including self-driving vehicles, traffic control and surveillance.
The LP numbers enable the link to a large body of information, including ownership, vehicle condition and driving records. Therefore, the technique of LP recognition in the wild can play a key role in road safety, traffic control and law enforcement. Although the recognition accuracy is acceptable for images shot under constrained conditions, recognizing license plates in complex environments is still far from satisfactory, especially for images photographed in dark, glare, occluded, rainy, snowy, tilted or blurred scenarios, as shown in Figure 1.
With the advance of deep neural networks, numerous works have been proposed in recent years for license plate recognition, with Convolutional Neural Networks (CNNs) used for feature extraction, followed by Connectionist Temporal Classification (CTC), number classifiers, etc. for character reading. These methods perform well on regular license plates (e.g., nearly horizontal ones). When the license plate images are tilted or bent, an extra rectification step is required before recognition.
This paper tackles the task of license plate recognition in unconstrained scenarios. A robust framework is proposed to handle license plate recognition effectively in both regular and challenging cases. Our proposed license plate recognizer is composed of a 30-layer lightweight Xception for feature extraction and a 2D-attention based decoding module for character sequence recognition. Without extra processing steps like image rectification or character segmentation, the proposed model is capable of recognizing license plates in both regular and irregular patterns under various practical scenarios. Different from current methods that treat a license plate as a one-dimensional sequence, our method uses 2D-attention, which considers the license plate image as a two-dimensional signal. Trained in a weakly supervised manner, the proposed model is able to approximately localize the corresponding characters on license plates during the decoding process, regardless of the license plate pattern.
Many license plate datasets are collected from one region, which causes bias in the datasets. For example, Xu et al. introduce a license plate dataset, CCPD, which contains about 290K real-world license plate images in various complex situations, as shown in Figure 1. However, since most of the images are photographed in a single city, the first two characters of the license plates are mostly the same, which may bias the trained model. In order to obtain a robust model which can be generally used for recognizing license plates from different regions, a CycleGAN model is tailored here which can mimic real scenarios and generate different kinds of license plate images, such as in dark or strong lighting conditions, containing shadows, etc. Moreover, license plates with various province characters can be synthesized, which alleviates the exhausting human annotation effort to a large extent and enables a more general license plate recognition model. Our framework is evaluated on four public datasets. The competitive performance demonstrates the robustness of our framework. Moreover, we also collect a new license plate dataset with images from all provinces in China, named “CLPD”. It enables a more comprehensive evaluation of current plate recognition methods, and promotes the research of a more practical model.
It should be noted that the focus of this work is license plate recognition, so we simply train an off-the-shelf YOLOv2 detector to obtain bounding boxes of license plates.
The main contributions of this paper can be summarized as follows:
1. We design a robust method for license plate recognition in natural scene images. It consists of a tailored Xception module and an encoder-decoder module, and the recognition framework is optimized with a 2D attention mechanism, which extracts local features for individual characters in a weakly supervised manner, without requiring character-level annotations. Compared to existing license plate recognition approaches, our method needs neither an extra module to handle the irregularity of license plates nor segmentation of each character for recognition.
2. A tailored CycleGAN is proposed to synthesize license plates under various scenarios, including adding shadows, glare or darkness, perspective transformation, etc. With this engine we can generate license plate images with less data bias, and thus obtain models with better generalization ability.
3. We build a new dataset, named CLPD. It covers a large variety of photographing conditions, vehicle types and region codes, which provides a more comprehensive evaluation benchmark for plate recognition algorithms and promotes a more practical model design.
II Related Work
In this section, we present a concise introduction to related works on license plate recognition, lightweight convolutional neural networks, generative adversarial networks and license plate datasets.
II-A License Plate Recognition
Existing methods for license plate recognition can be divided into two categories: segmentation based [3, 5, 6, 7, 8] and non-segmentation based methods [1, 2, 9]. The segmentation based methods generally segment the license plate into characters and then recognize individual characters by OCR models [3, 10, 11]. Bulan et al. perform segmentation and OCR jointly using a hidden Markov model (HMM) based probabilistic inference method, where the most likely character sequence is determined by the Viterbi algorithm. Segmentation based methods rely heavily on the segmentation performance, which is very susceptible to the environment, including strong or weak lighting, bad weather, blurring, etc., and will result in low recognition accuracy even with a strong recognizer.
Recent methods are mostly segmentation free. For example, Li et al. propose to treat the license plate as a character sequence. Sequential features are encoded by CNNs and Bidirectional RNNs (BRNNs), and decoded by CTC without character separation. The CNN features are extracted from a separately well-trained CNN classifier, so the model cannot be trained end-to-end. RPnet, proposed by Xu et al., extracts ROI features from several different convolutional layers, and feeds the combined features to a series of classifiers for recognition. The number of classifiers is determined by the number of characters in the license plate, which limits its generalization ability across regions. Li et al. later propose a unified network which is able to localize license plates and recognize the letters at the same time in a single forward pass. Similarly, the region features are encoded by BRNNs and decoded by CTC, which hinders its application to oriented LPs. Compared to these previous works, our method uses a 2D attention based encoder-decoder framework, where characters can be approximately localized by the 2D attention regardless of the LP image appearance, which enables its application to arbitrarily-oriented LPs.
II-B Scene Text Recognition
License plate recognition can be regarded as a special case of general scene text recognition, although the two tasks have different characteristics. Characters on license plates usually use the same font within one region. There is no language model hidden in license plates, and no strong relationship with contextual semantic information. In contrast, general scene text shows great variability in fonts, a language lexicon exists, and the text content is often highly relevant to the objects or scenes in the image. Xie et al. propose a novel method where aggregation cross-entropy (ACE) is used for sequence recognition, replacing the commonly used CTC loss owing to the latter's inconvenience in processing 2D problems. A multi-object rectified attention network (MORAN) for scene text recognition is proposed by Luo et al., which contains a multi-object rectification network (MORN) and an attention-based sequence recognition network (ASRN). The image is rectified by MORN and then input to ASRN for recognition. Shi et al. put forward a system in which a flexible Thin-Plate Spline transformation is used to adaptively rectify a text image; a recognition model then predicts a character sequence directly from the rectified image. Li et al. use a 2D attention based encoder-decoder framework for irregular text recognition, which is similar to our work. However, in our framework, a tailored CycleGAN is added for synthetic license plate generation, which reduces data bias and improves model generalization ability.
II-C Generative Adversarial Networks
With the invention of Generative Adversarial Networks (GANs), many improved models have emerged, such as Deep Convolutional GANs (DCGANs), Conditional GANs, Cycle-Consistent Adversarial Networks (CycleGAN) and Wasserstein GANs (WGAN). Zhu et al. propose CycleGAN, which learns the mapping between an input image and an output image using a training set of unaligned image pairs. In order to migrate the style of one image set to another, a cycle-consistency loss is introduced. Based on this model, we propose an improved algorithm to generate synthetic license plate images in more complex environments, which further improves the accuracy of license plate recognition. Wang et al. adopt CycleWGAN to generate license plate images for improving recognition performance, where images simulating different shooting conditions are generated simultaneously. BRNN+CTC is used for plate recognition, which likewise does not take oriented license plates into consideration. In contrast, we use a tailored CycleGAN to generate license plates under different conditions separately, which leads to better recognition performance.
II-D Datasets of License Plates
Most datasets for license plate detection and recognition are collected from one area, and the types of license plates are monotonous (e.g., only containing civilian cars, no buses or trucks). Images are taken under similar conditions, such as highway toll stations and parking lots. Hence those datasets cannot verify the robustness of a model.
Silva et al. collect a dataset named CD-HARD, which covers some difficult situations, including tilted plates. However, because of the small number of images, the test results are sensitive to implementation tricks. PKUData captures images through road surveillance cameras, and includes a variety of license plate types and different lighting conditions. Unfortunately, all license plates are horizontal and taken from one province, so they share the same province code; models trained on PKUData cannot be used to recognize license plates from other regions. The AOLP database consists of images of Taiwan license plates. This dataset is categorized into three subsets according to different levels of difficulty and photographing conditions. CCPD is currently the largest license plate dataset, with about 290k images, and is divided into multiple subsets such as tilt, difficulty, glare, and distance according to license plate conditions, which contributes greatly to the community. Nevertheless, most of its images are from one city as well, which limits the trained model in recognizing license plates from other areas. In this work, we propose to synthesize license plates with CycleGAN so as to make up for this deficiency. A new dataset named CLPD is introduced, which includes license plates from different provinces, to evaluate recognition models comprehensively.
III The Proposed Model

We introduce our proposed model in this section. As presented in Figure 2, the whole LP recognition model consists of two main parts: a tailored Xception network for feature extraction and a 2D-attention based RNN model for character decoding.
III-A The Convolutional Image Encoder
A 30-layer Xception encoder is tailored from the original Xception framework to fit our application; its details are presented in Figure 3. The convolutional part of our model is based entirely on depthwise separable convolution layers. The convolutional layers are structured into modules, all of which have linear residual connections except for the first and the last one. The term “ResSeparableConv” stands for a stack of three separable convolution layers with an identity residual connection.

The entry flow progressively downsamples the spatial size and increases the number of feature channels using interleaved separable convolutions and max-poolings. In the middle flow, we adopt repeated ResSeparableConv blocks to extract deep features that contain higher-level representations, while the spatial size and channel number stay fixed. In the exit flow, we extract a middle-level feature map as context for the attention network and a final holistic feature vector.
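As a concrete illustration of the depthwise separable convolutions the encoder is built from, the following NumPy sketch separates the per-channel spatial (depthwise) step from the 1×1 channel-mixing (pointwise) step. The loop-based implementation, the “valid” padding and all shapes are illustrative only, not the paper's actual implementation.

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_kernels):
    """Depthwise separable conv = per-channel spatial conv + 1x1 pointwise conv.

    x:          (H, W, C_in) input feature map
    dw_kernels: (k, k, C_in) one spatial filter per input channel
    pw_kernels: (C_in, C_out) 1x1 convolution that mixes channels
    Uses 'valid' padding and stride 1 for simplicity.
    """
    H, W, C_in = x.shape
    k = dw_kernels.shape[0]
    Ho, Wo = H - k + 1, W - k + 1

    # Depthwise step: each channel is convolved with its own k x k filter.
    dw = np.zeros((Ho, Wo, C_in))
    for c in range(C_in):
        for i in range(Ho):
            for j in range(Wo):
                dw[i, j, c] = np.sum(x[i:i+k, j:j+k, c] * dw_kernels[:, :, c])

    # Pointwise step: a 1x1 conv linearly mixes channels at every position.
    return dw @ pw_kernels  # shape (Ho, Wo, C_out)
```

A standard convolution needs k·k·C_in·C_out weights, while the separable factorization needs only k·k·C_in + C_in·C_out, which is why the 30-layer encoder remains lightweight.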
III-B The Recurrent Sequence Decoder
RNNs are widely used in translation, image captioning and scene text recognition tasks. Here we extend them to license plate recognition. With a two-dimensional attention mechanism integrated, there is no need to rectify irregular license plate images or segment out each character for recognition. The proposed model can handle LPs of arbitrary shapes.
Stacked LSTMs are adopted in the sequence decoder. As shown in Figure 2, the holistic feature vector is fed into the LSTMs at the first time step, which provides overall information about the input image. Then a “START” token is input into the model at the next time step. From then on, the output of the previous time step is fed into the LSTMs until the “END” token is received. The inputs of the LSTMs are one-hot vectors embedded by a linear transformation. The calculation of a single LSTM cell can be expressed as:

$$h_t = \mathrm{LSTM}\big(h_{t-1}, \psi(y_{t-1})\big),$$
$$y_t = \mathrm{softmax}\big(W_o\,[h_t; g_t]\big),$$

where $h_t$ is the current hidden state, $\mathrm{LSTM}(\cdot)$ represents the LSTM operation at each time step and $\psi(\cdot)$ is the embedding operation. In the inference process, $y_{t-1}$ is the output of the previous time step, while in the training stage the ground-truth character is adopted directly as $y_{t-1}$. $W_o$ is a linear transformation, and $g_t$ is the output of the 2D-attention module, which is calculated as follows:

$$e_{ij} = w_e^{\top}\tanh\big(W_v\,v_{ij} + W_h\,h_t\big),$$
$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{i',j'}\exp(e_{i'j'})},\qquad g_t = \sum_{i,j}\alpha_{ij}\,v_{ij},$$

where $v_{ij}$ is the feature vector at position $(i,j)$ in the feature map and $h_t$ is the hidden state at time step $t$; $W_v$, $W_h$ and $w_e$ are linear transformation matrices to be learned; $\alpha_{ij}$ is the attention weight at location $(i,j)$; and $g_t$ is the weighted sum of image features, i.e., the local feature of the character to be decoded at the current time step $t$. The schematic of the 2D attention mechanism is illustrated in Figure 4.
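One attention step can be sketched in plain NumPy as follows; the array names mirror the symbols in the equations, and all dimensions are illustrative.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attend_2d(V, h_t, W_v, W_h, w_e):
    """One step of 2D attention over a convolutional feature map.

    V:   (H, W, D) feature map from the CNN encoder
    h_t: (S,) decoder hidden state at the current step
    W_v: (A, D), W_h: (A, S), w_e: (A,) learned projections
    Returns the glimpse g_t (D,) and the attention map alpha (H, W).
    """
    H, W, D = V.shape
    # Score every spatial location against the current hidden state.
    scores = np.einsum('a,ahw->hw', w_e,
                       np.tanh(np.einsum('ad,hwd->ahw', W_v, V)
                               + (W_h @ h_t)[:, None, None]))
    alpha = softmax(scores.ravel()).reshape(H, W)
    # Glimpse: attention-weighted sum of local image features.
    g_t = np.einsum('hw,hwd->d', alpha, V)
    return g_t, alpha
```

At each decoding step the glimpse g_t replaces any explicit character segmentation: the softmax over all H×W locations lets the decoder focus on the region of the character currently being read.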
IV AsymCycleGAN for LP Image Generation
As aforementioned, it is difficult to manually collect LP images from a variety of regions, which makes most existing LP datasets heavily biased towards specific regional identifiers. In this section, we introduce a method for generating high-quality synthetic LP images using OpenCV and a tailored CycleGAN model (termed AsymCycleGAN). With this approach, we are able to construct a balanced training set and reduce the reliance on manually collected data.
IV-A The Architecture of AsymCycleGAN
CycleGAN is an approach to translate an image from a source domain $X$ to a target domain $Y$ in the absence of paired training examples. In this work, the source domain is composed of fake LP images generated by OpenCV and the target domain is made up of real LP images. There are four learnable modules in CycleGAN: two mapping functions $G: X \to Y$ and $F: Y \to X$, and two discriminators $D_X$ and $D_Y$. The loss function of the standard CycleGAN can be expressed as follows:

$$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y) + \mathcal{L}_{\mathrm{GAN}}(F, D_X, Y, X) + \lambda\,\mathcal{L}_{\mathrm{cyc}}(G, F),$$

where $\mathcal{L}_{\mathrm{GAN}}$ represents the adversarial loss and $\mathcal{L}_{\mathrm{cyc}}$ denotes the cycle-consistency loss:

$$\mathcal{L}_{\mathrm{cyc}}(G, F) = \mathbb{E}_{x \sim p(x)}\big[\|F(G(x)) - x\|_1\big] + \mathbb{E}_{y \sim p(y)}\big[\|G(F(y)) - y\|_1\big].$$

In our case, what we need is the mapping function $G$ to generate realistic images from synthetic ones. The cycle $F(G(x))$ can be roughly regarded as generating a noisy image from a clean one and then removing the noise, while $G(F(y))$ is the opposite process. Note that in the process of $G(F(y))$, the noise in $y$ removed by $F$ is in theory difficult to be exactly recovered by $G$, as one clean image can be associated with multiple real images carrying different noises. To this end, we replace the original cycle-consistency loss with

$$\mathcal{L}_{\mathrm{cyc}}'(G, F) = \mathbb{E}_{x \sim p(x)}\big[\|F(G(x)) - x\|_1\big],$$

where the term with respect to $G(F(y))$ is removed. We term the modified CycleGAN model AsymCycleGAN, as its cycle-consistency loss is asymmetric. The architecture of the proposed AsymCycleGAN model is shown in Figure 5.
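The difference between the two losses can be made concrete with a small NumPy sketch, where `G` and `F` stand in for the learned generators and the adversarial terms are omitted.

```python
import numpy as np

def cycle_losses(x_clean, y_real, G, F):
    """Compare the symmetric CycleGAN cycle loss with the asymmetric variant.

    x_clean: batch of synthetic (clean) plate images, (N, ...) array
    y_real:  batch of real (noisy) plate images, (N, ...) array
    G: maps clean -> real style; F: maps real -> clean style.
    """
    # Standard CycleGAN enforces both F(G(x)) ~ x and G(F(y)) ~ y.
    l_sym = (np.abs(F(G(x_clean)) - x_clean).mean()
             + np.abs(G(F(y_real)) - y_real).mean())
    # AsymCycleGAN drops the G(F(y)) term: the noise removed by F cannot
    # be exactly re-synthesized by G, since one clean plate corresponds to
    # many plausible real images.
    l_asym = np.abs(F(G(x_clean)) - x_clean).mean()
    return l_sym, l_asym
```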
IV-B AsymCycleGAN Generation Results
As in CycleGAN, training our proposed AsymCycleGAN model only requires two sets of unaligned images: synthetic and real ones. As shown in Figure 6, the synthetic images are generated using OpenCV, while the real images are sampled from the CCPD dataset. To generate different types of real images, we further divide the CCPD images into two subsets with different illumination conditions: dark and bright. We use this data, consisting of synthetic LPs generated by OpenCV and real-life license plate images in dark or glare environments, to train the standard CycleGAN and our asymmetric CycleGAN model respectively. The images generated by CycleGAN and asymmetric CycleGAN are shown in Figure 6. Moreover, we also add shadows to the synthetic images so as to imitate real environments; the generated images are presented in Figure 6 (e).
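Shadow synthesis of this kind can be approximated in a few lines; the half-plane darkening below is a simplified NumPy stand-in for the actual OpenCV-based pipeline, with an arbitrarily chosen darkening factor.

```python
import numpy as np

def add_shadow(img, darken=0.5, rng=None):
    """Darken one side of a random line through the plate to fake a shadow.

    img: (H, W, 3) float image in [0, 1]; `darken` scales the shadowed side.
    """
    if rng is None:
        rng = np.random.default_rng()
    H, W, _ = img.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Random line a*x + b*y = c passing through a random interior point.
    theta = rng.uniform(0.0, np.pi)
    a, b = np.cos(theta), np.sin(theta)
    c = a * rng.uniform(0, W) + b * rng.uniform(0, H)
    shadow = (a * xs + b * ys) > c
    out = img.copy()
    out[shadow] *= darken  # uniformly darken the shadow side
    return out
```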
V The Proposed LP Dataset
In this section, we introduce a new LP dataset named CLPD (China License Plate Dataset) for a more comprehensive evaluation of LP detection and recognition algorithms, including how it is collected (Section V-A) and how it compares with other datasets (Section V-B).
V-A Data Collection
The LP images in the proposed CLPD dataset are collected from a variety of real-scene sources: searched from the Internet, taken by mobile phones, or captured by car driving recorders. All faces shown in the images are blurred for privacy reasons. When taking LP photos, we also diversify the photographing angles, shooting times, resolutions and backgrounds so as to cover different conditions. The proposed dataset includes multiple vehicle types, such as trucks, cars, police cars and new energy vehicles. Note that new energy vehicles in China have license plates with eight letters, while other vehicles have seven-letter license plates. We also allow occluded license plates with fewer than seven visible letters. The variation in the number of license plate letters increases the recognition difficulty as well, and makes rule-based recognition methods infeasible. The bounding boxes and license plate letters are annotated manually. In summary, the CLPD dataset contains LP images from all provinces in mainland China. Some examples are shown in Figure 7. To our knowledge, our proposed LP dataset is the only one that covers all mainland China provinces with real captured images.
V-B Dataset Comparison
As presented in Table I, we compare our proposed dataset with other LP datasets in several aspects. Although the size of our dataset is small, it contains the largest number of region codes. As we collect LP images from multiple sources, the image sizes are not fixed, in contrast to other datasets. Furthermore, AOLP, CCPD and our CLPD contain tilted images, while PKUData does not. Finally, our dataset contains LPs from different types of vehicles, including police cars, new energy cars and trucks, which further increases the diversity of LP styles.
VI Experiments

In this section, we conduct extensive experiments on different license plate datasets, comparing our license plate recognition method with state-of-the-art recognition methods to demonstrate the effectiveness of the proposed model.
VI-A Datasets

CCPD is currently the largest publicly available License Plate (LP) dataset, providing about 290k unique Chinese LP images with detailed annotations. This dataset is separated into different groups according to the difficulty of recognition, the illumination on the LP area, the distance from the license plate when photographing, the degrees of horizontal and vertical tilt, and the weather (rainy, snowy or foggy). Each category includes 10k to 20k images. CCPD-Base consists of approximately 200k images, half of which are used for training and the other half for test. The other subsets (CCPD-DB, CCPD-FN, CCPD-Rotate, CCPD-Weather, CCPD-Challenge) are also used for test.
The AOLP database consists of images of Taiwan license plates. This dataset is categorized into three subsets according to complexity levels and photographing conditions: Access Control (AC), Traffic Law Enforcement (LE) and Road Patrol (RP). Since we do not have any other images of Taiwan license plates, we use any two of these subsets for training and the remaining one for test, following previous practice [1, 9, 26].
PKUData, released by Yuan et al., provides images for license plate detection. The license plate characters are not annotated, so we labeled the images in this dataset ourselves. Three-fifths of the images are randomly selected for training and the rest are used for test.
CLPD is our newly proposed LP dataset, which contains real images covering all provinces in mainland China, with different vehicle types included and a large variety of photographing conditions and region codes. Its images are only used for test, to verify the practicality of LP recognition models.
VI-B Implementation Details
In this work, we mainly focus on license plate recognition. To obtain the bounding boxes of license plates, a YOLOv2 detector is trained on the training set of CCPD, achieving high detection performance on the CCPD test sets under the chosen IoU threshold. For fair comparison, we use the same evaluation criteria as in . An LP recognition result is counted as correct if and only if the IoU between the detected bounding box and the ground truth is greater than the threshold and all characters of the LP are correctly recognized (including the region code).
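The criterion can be stated precisely as a small predicate; the 0.6 IoU threshold below is an assumed placeholder following common CCPD-style evaluation, not necessarily the exact value used in the experiments.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def is_correct(pred_box, pred_text, gt_box, gt_text, iou_thresh=0.6):
    """Correct only if the box overlaps enough AND every character
    (region code included) matches the ground truth exactly."""
    return iou(pred_box, gt_box) > iou_thresh and pred_text == gt_text
```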
The recognition network is trained with the cross-entropy loss and the ADAM optimizer, without any pre-training. In the training process, we adopt a fixed batch size and an initial learning rate. The learning rate is multiplied by a decay factor at regular iteration intervals until it reaches a minimum value. The heights of input images in a batch are fixed, while the widths are computed according to the aspect ratios of the original images. All experiments are conducted on an NVIDIA GTX 1080Ti GPU with 11GB memory.
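Such a step-decay schedule can be sketched as follows, where the base rate, decay factor, interval and floor are all placeholder values rather than the paper's settings.

```python
def step_lr(base_lr, iteration, decay_factor=0.9, decay_every=5000, min_lr=1e-5):
    """Step decay: multiply the learning rate by `decay_factor` every
    `decay_every` iterations, clamped below by `min_lr`.
    All hyperparameter values here are illustrative placeholders."""
    lr = base_lr * decay_factor ** (iteration // decay_every)
    return max(lr, min_lr)
```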
VI-C Ablation Studies
To analyze our proposed framework in detail, in this section we evaluate it with different settings on the CCPD dataset.
VI-C1 Effect of CNN Structures
In order to analyze the impact of CNN capacity, we first experiment with different numbers of CNN channels and layers. As shown in Table II, using more CNN channels indeed improves the license plate recognition accuracy, but the performance saturates beyond a certain channel number. Experimental results with different numbers of convolutional layers are shown in Table III. The 30-layer Xception performs better than models with fewer layers, but the performance does not improve significantly when the depth is increased further. Hereinafter, we use the 30-layer Xception.
VI-C2 Effect of Inaccurate Bounding Boxes
Secondly, we test the recognition performance with detected and ground truth bounding boxes respectively, to demonstrate the robustness of our algorithm. Note that the detected bounding boxes may not encompass the license plates as exactly as the ground truth; this experiment shows the effect of bounding box variance on recognition performance. As shown in Table IV, the recognition accuracy drops only slightly when detected bounding boxes are used (a small drop in all cases except the “Challenge” one), which validates the robustness of our algorithm to inaccurate bounding boxes. One possible reason is that the adopted 2D attention mechanism keeps our algorithm from depending heavily on accurate bounding boxes: at each character decoding step, the attention module extracts the most relevant local feature for each character in 2D space, instead of relying on heuristic rules for character separation.
VI-C3 Effect of Synthetic Data
The last ablation study is on the effectiveness of the generated synthetic data. Here we also compare the performance obtained with different GAN models. We train our model with different numbers of real and synthetic images (20k, 50k and 100k), and then test the performance on the CCPD-DB dataset. As shown in Table V, using the synthetic data generated by our proposed AsymCycleGAN offers larger improvements than using the data generated by the original CycleGAN, which demonstrates the superiority of our proposed AsymCycleGAN.
In addition, when comparing the improvements obtained with different numbers of real images, it can be found that the synthetic data plays a more important role when the real data size is smaller. We can also see that the improvement shrinks when a smaller number of synthetic images is used. Note that generating synthetic images is very cheap: they need no human annotation and the generation speed is fast (about 1k images per minute). So we can easily employ massive synthetic data for training to improve the accuracy of LP recognition algorithms.
|Real (20k) + CycleGAN (20k)|
|Real (20k) + AsymCycleGAN (20k)|
|Real (20k) + CycleGAN (200k)|
|Real (20k) + AsymCycleGAN (200k)|
|Real (50k) + CycleGAN (50k)|
|Real (50k) + AsymCycleGAN (50k)|
|Real (50k) + CycleGAN (200k)|
|Real (50k) + AsymCycleGAN (200k)|
|Real (100k) + CycleGAN (100k)|
|Real (100k) + AsymCycleGAN (100k)|
|Real (100k) + CycleGAN (200k)|
|Real (100k) + AsymCycleGAN (200k)|
VI-D Experiments on Existing Benchmarks
VI-D1 Results on CCPD
|Ren et al. (2015) |
|Liu et al. (2016) |
|Joseph et al. (2016) |
|Li et al. (2017) |
|Zherzdev et al. (2018) |
|Xu et al. (2018) |
|Zhang et al. (2019) [30, 29]|
|Luo et al. (2019) |
|Wang et al. (2020) |
|Ours (Real Data Only)|
|Ours (Real + Synthetic data)|
It can be seen from Table VI that, using the same real training data, our algorithm outperforms the other algorithms in terms of the overall accuracy and on most of the subsets. The only exception is that the method of Luo et al. is better than ours on the rotate and tilt subsets. The reason may be that Luo et al. adopt an STN-based technique which is specifically designed for rotated images. Note that our algorithm could also benefit from this technique, and the accuracy on the rotate and tilt subsets is expected to improve further.
Our algorithm shows significant superiority on subsets with irregular LP images, such as “Rotate”, “Weather” and “Challenge”, which again demonstrates the robustness of our model to the deformation of license plates. Moreover, by adding synthetic images generated by our AsymCycleGAN, the recognition accuracies rise further and consistently on all subsets, including the overall accuracy. The improvement is even more obvious when LPs are rotated or tilted (on CCPD-Rotate and CCPD-Tilt). The main reason is that random perspective transformations and rotations are applied to the synthesized data, which makes them a great complement to the real data.
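The random perspective transformations applied to the synthesized plates can be sketched with a plain homography warp; the nearest-neighbour NumPy implementation below is for illustration only (a real pipeline would typically use OpenCV's warpPerspective).

```python
import numpy as np

def warp_perspective(img, H_mat, out_shape):
    """Apply a 3x3 homography with inverse nearest-neighbour mapping.

    Output pixels are mapped back into the source image; pixels that fall
    outside the source stay 0.
    """
    Ho, Wo = out_shape
    ys, xs = np.mgrid[0:Ho, 0:Wo]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(Ho * Wo)])
    src = np.linalg.inv(H_mat) @ pts        # output pixel -> source pixel
    sx = np.round(src[0] / src[2]).astype(int)
    sy = np.round(src[1] / src[2]).astype(int)
    out = np.zeros((Ho, Wo) + img.shape[2:], dtype=img.dtype)
    valid = (0 <= sx) & (sx < img.shape[1]) & (0 <= sy) & (sy < img.shape[0])
    out[ys.ravel()[valid], xs.ravel()[valid]] = img[sy[valid], sx[valid]]
    return out
```

In practice a random homography would be sampled (e.g., by jittering the four plate corners) and applied to each synthetic plate before training.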
We select some extremely distorted images and visualize the 2D attention heat maps when decoding each character in Figure 8. The results show that even for very tilted images, the 2D attention model can localize the character being decoded and extract the corresponding features for recognition. It should be noted that the attention module does not require additional character-level annotations: it is trained in a weakly supervised manner by the cross-entropy loss on whole-plate recognition.
VI-D2 Results on AOLP
|Li et al. (2016) |
|Li et al. (2017) |
|Wu et al. (2018) |
In this section, we compare our model with other state-of-the-art methods on the AOLP dataset. For fair comparison, we do not use any synthetic data during model training; perspective transformation is employed for data augmentation. The results in Table VII show that our approach performs better than the other methods on all three subsets, which validates its superiority. In particular, our method yields accuracy increments on AC, LE and RP compared to the second best results. Note that the RP subset is mainly composed of oriented or distorted license plates, on which our method obtains the largest performance gain. This result further demonstrates the effectiveness of our model in recognizing irregular license plates.
VI-D3 Results on PKUData
For PKUData, we randomly sample three-fifths of the images for training and use the remaining two-fifths for test. For fair comparison, we re-train the model proposed in  on the same training data. An open API called Sighthound  is tested as well, although the training data it used is unknown. We evaluate the LP recognition accuracy in two settings, i.e., with and without the region code (a Chinese character) considered. The model in Sighthound  does not support region code recognition, so we only report its accuracy without the region code. The recognition results are shown in Table VIII. Our model outperforms that in  when only real data is adopted, and surpasses Sighthound  when synthetic training data is added. Compared with the improvement on the CCPD dataset, the accuracy gain from using synthetic data is even more obvious here because of the limited number of real training images in PKUData, which demonstrates the usefulness of our synthesis engine when training data is scarce.
|Criterion||ACC||ACC w/o RC||ACC||ACC w/o RC|
|Masood et al. (2017) ||-||-|
|Xu et al. (2018) |
|Ours (Real Data Only)|
|Ours (Real + Synthetic Data)|
VI-E Experiments on Our CLPD Dataset
As aforementioned, the diversity of our proposed CLPD dataset is much larger than that of existing LP datasets, which provides a platform to evaluate current algorithms comprehensively. We train the proposed model on the CCPD-Base dataset, and test it on CLPD. Experimental results in Table VIII show the advantage of our model: it achieves the highest accuracy whether or not the region code is considered. By adding synthetic data, the accuracy increases further when the region code is considered, benefiting from the more balanced region code distribution in our synthetic data, which can be easily obtained with the proposed engine. Some experimental results are visualized in Figure 9.
We also present some failure cases in Figure 10. As no specific language rules are used in license plates, some similar characters are rather difficult to distinguish, such as “4” and “A”, “8” and “B”, and “0”, “D” and “O”. Images with extreme blur or occlusion also fail to be recognized.
In this paper, we present a robust model for license plate recognition in unconstrained environments. The proposed model is built upon an Xception CNN module for feature extraction, and a 2D-attention based RNN module for sequence decoding. To handle the shortage or imbalance of real training data, CycleGAN is tailored to generate synthetic LP images with different deformation styles and more balanced region codes, which provides a simple yet effective way to complement the available real data. Extensive experimental results demonstrate the superiority of our method, especially when addressing distorted license plates or limited training data. An LP dataset containing images captured in different ways from various regions is collected so as to evaluate LP recognition methods more comprehensively.

We use an LSTM-based sequence decoder for license plate recognition, which cannot be trained in parallel over time steps. For future work, a transformer-like decoder may be explored to accelerate training.
-  Hui Li and Chunhua Shen. Reading car license plates using deep convolutional neural networks and LSTMs. arXiv:1601.05610, 2016.
-  Zhenbo Xu, Wei Yang, Ajin Meng, Nanxue Lu, Huan Huang, Changchun Ying, et al. Towards end-to-end license plate detection and recognition: A large dataset and baseline. In Proceedings of the European Conference on Computer Vision (ECCV), pages 255–271, 2018.
-  Sérgio Montazzolli Silva and Cláudio Rosito Jung. License plate detection and recognition in unconstrained scenarios. In Proceedings of the European Conference on Computer Vision (ECCV), pages 593–609. Springer, 2018.
-  Joseph Redmon and Ali Farhadi. Yolo9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7263–7271, 2017.
-  Jing-Ming Guo and Yun-Fu Liu. License plate localization and character segmentation with feedback self-learning and hybrid binarization techniques. IEEE Transactions on Vehicular Technology, 57(3):1417–1424, 2008.
-  Gabriel Resende Gonçalves, Sirlene Pio Gomes da Silva, David Menotti, and William Robson Schwartz. Benchmark for license plate character segmentation. Journal of Electronic Imaging, 25(5):053034, 2016.
-  Piyuan Li, Minh Nguyen, and Wei Qi Yan. Rotation correction for license plate recognition. In 2018 4th International Conference on Control, Automation and Robotics (ICCAR), pages 400–404. IEEE, 2018.
-  Hui Li, Peng Wang, and Chunhua Shen. Toward end-to-end car license plate detection and recognition with deep neural networks. IEEE Transactions on Intelligent Transportation Systems, 20(3):1126–1136, 2018.
-  Chao Gou, Kunfeng Wang, Yanjie Yao, and Zhengxi Li. Vehicle license plate recognition based on extremal regions and restricted boltzmann machines. IEEE transactions on intelligent transportation systems, 17(4):1096–1107, 2015.
-  Orhan Bulan, Vladimir Kozitsky, Palghat Ramesh, and Matthew Shreve. Segmentation-and annotation-free license plate recognition with deep localization and failure identification. IEEE Transactions on Intelligent Transportation Systems, 18(9):2351–2363, 2017.
-  Zecheng Xie, Yaoxiong Huang, Yuanzhi Zhu, Lianwen Jin, Yuliang Liu, and Lele Xie. Aggregation cross-entropy for sequence recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6538–6547, 2019.
-  Canjie Luo, Lianwen Jin, and Zenghui Sun. Moran: A multi-object rectified attention network for scene text recognition. Pattern Recognition, 90:109–118, 2019.
-  Baoguang Shi, Mingkun Yang, Xinggang Wang, Pengyuan Lyu, Cong Yao, and Xiang Bai. Aster: An attentional scene text recognizer with flexible rectification. IEEE transactions on pattern analysis and machine intelligence, 2018.
-  Hui Li, Peng Wang, Chunhua Shen, and Guyu Zhang. Show, attend and read: A simple and strong baseline for irregular text recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 8610–8617, 2019.
-  Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Proc. Advances in Neural Inf. Process. Syst., pages 2672–2680, 2014.
-  Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv:1511.06434, 2015.
-  Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv:1411.1784, 2014.
-  Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 2223–2232, 2017.
-  Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein gan. arXiv:1701.07875, 2017.
-  Xinlong Wang, Mingyu You, and Chunhua Shen. Adversarial generation of training examples for vehicle license plate recognition. arXiv preprint arXiv:1707.03124, 2017.
-  Yule Yuan, Wenbin Zou, Yong Zhao, Xinan Wang, Xuefeng Hu, and Nikos Komodakis. A robust and efficient approach to license plate detection. IEEE Transactions on Image Processing, 26(3):1102–1114, 2016.
-  Gee-Sern Hsu, Jiun-Chang Chen, and Yu-Zu Chung. Application-oriented license plate recognition. IEEE transactions on vehicular technology, 62(2):552–561, 2012.
-  François Chollet. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1251–1258, 2017.
-  Laurent Sifre and Stéphane Mallat. Rigid-motion scattering for image classification. Ph. D. dissertation, 2014.
-  Changhao Wu, Shugong Xu, Guocong Song, and Shunqing Zhang. How many labeled license plates are needed? In Chinese Conference on Pattern Recognition and Computer Vision (PRCV), pages 334–346. Springer, 2018.
-  Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems, pages 91–99, 2015.
-  Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision (ECCV), pages 21–37. Springer, 2016.
-  Sergey Zherzdev and Alexey Gruzdev. Lprnet: License plate recognition via deep neural networks. arXiv preprint arXiv:1806.10447, 2018.
-  Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, and Yu Qiao. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters, 23(10):1499–1503, 2016.
-  Tianwei Wang, Yuanzhi Zhu, Lianwen Jin, Canjie Luo, Xiaoxue Chen, Yaqiang Wu, Qianying Wang, and Mingxiang Cai. Decoupled attention network for text recognition. arXiv preprint arXiv:1912.10205, 2019.
-  Max Jaderberg, Karen Simonyan, Andrew Zisserman, et al. Spatial transformer networks. In Advances in neural information processing systems, pages 2017–2025, 2015.
-  Syed Zain Masood, Guang Shu, Afshin Dehghan, and Enrique G Ortiz. License plate detection and recognition using deeply learned convolutional neural networks. arXiv:1703.07330, 2017.