
Computer vision based vehicle tracking as a complementary and scalable approach to RFID tagging

by   Pranav Kant Gaur, et al.
Bhabha Atomic Research Centre

Logging of incoming/outgoing vehicles serves as critical information for root-cause analysis to combat security-breach incidents in various sensitive organizations. RFID tagging hampers the scalability of vehicle-tracking solutions on both the logistics and the technical fronts. For instance, requiring each incoming vehicle (departmental or private) to be RFID-tagged is a severe constraint, and coupling video analytics with RFID to detect abnormal vehicle movement is non-trivial. We leverage publicly available implementations of computer vision algorithms to develop an interpretable vehicle-tracking algorithm using the finite-state machine formalism. The state machine consumes input from cascaded object detection and optical character recognition (OCR) models for its state transitions. We evaluated the proposed method on 75 video clips covering 285 vehicles from our system deployment site. We observed that the detection rate is most affected by the speed and the type of the vehicle. The highest detection rate is achieved when vehicle movement is restricted to follow movement restrictions (a standard operating procedure, SOP) at the checkpoint, similar to RFID tagging. We further analyzed 700 vehicle-tracking predictions on live data and identified that the majority of vehicle-number prediction errors are due to illegible text, image blur, text occlusion and out-of-vocabulary letters in vehicle numbers. Towards system deployment and performance enhancement, we expect our ongoing system monitoring to provide evidence for establishing a higher vehicle-throughput SOP at the security checkpoint, as well as to drive the fine-tuning of the deployed computer-vision models and the state machine, establishing the proposed approach as a promising alternative to RFID tagging.





1. Introduction

Logging of incoming/outgoing vehicles serves as critical information for root-cause analysis to combat security-breach incidents in various sensitive organizations. Subject to the severity of various internal maintenance and expansion activities, such organizations witness a large inflow of both internal (or official) and external (or private) vehicles, thereby rendering manual tracking of vehicle movement cumbersome, resource-intensive, time-consuming and, most importantly, ineffective in providing rapid and reliable inputs. RFID vehicle tagging alleviates the speed and accuracy limitations but limits applicability to departmental vehicles, as private vehicles with temporary entry permits cannot be tagged without introducing further delays in the process. Further, in practice, coupling video solutions (for instance, to track vehicle direction) with RFID technology has been tricky, as multiple incoming vehicles can be detected in arbitrary order, and capturing video/photographic evidence sometimes generates data without the RFID-reported vehicle, rendering it useless. On the other hand, computer vision algorithms, owing to their rapid advancement in the public domain, provide access to state-of-the-art implementations for problems like object detection and optical character recognition (OCR), which can be leveraged to build a reliable, scalable and complementary solution to RFID tagging.

Related work: Distinct vehicle counting and tracking are active research areas in smart transportation, surveillance and security, and researchers have followed diverse methodologies to tackle the problem. Broadly, existing solutions are based on frame differencing, counting by detection and density estimation. With the advancement of deep learning and computer vision, researchers have also applied deep learning to detection and tracking. In (Yang et al., 2020), the authors present a detection-based tracking method, where vehicles are detected using YOLO and a lightweight feature-extraction network then extracts discriminative features; they also discuss the robustness of the method against external interference such as occlusion. In (Hao et al., 2014), the authors propose a novel particle-filter algorithm for vehicle tracking, introducing block symmetry in the observation model to improve results against similarly colored backgrounds and under partial occlusion. In (Withopf and Jahne, 2006), the authors train a tracking-specific model with automatic feature selection, constructing discriminative features from pairs of image patches. In (Mandal and Adu-Gyamfi, 2020), the authors conduct experiments with different detection and tracking models, deploying and testing deep-learning combinations such as CenterNet with Deep SORT, Detectron2 with Deep SORT, and YOLOv4 with Deep SORT. In (Hua and Anastasiu, 2019), a vehicle-tracking algorithm is proposed for smart traffic networks following the track-by-detection paradigm, where vehicle detection is followed by an extended IoU tracker, fused with historical tracking information to improve robustness.

Contribution of this work: We demonstrate that a performant and interpretable (Demichelis, 2022) baseline approach for vehicle tracking can be developed over publicly available object detection and OCR models using the finite-state formalism, without resorting to deep-learning-based object-tracking methods. We highlight the effect of vehicle-classification bias on the vehicle detection rate as a limitation of the reported method, as well as ways to combat it. In the same spirit, we reaffirm the known fact that off-the-shelf OCR models are robust enough to reduce the need for complicated prediction-filtering logic over the vehicle-number predictions generated across multiple image frames of a vehicle.

2. Problem formulation

A vehicle is considered logged if an entry as shown in Table-1 is generated when the corresponding vehicle arrives at the security checkpoint.

Vehicle number Vehicle type Vehicle timestamp
MH03CS0071 Car June 7 2022 10:40:00 GMT
KA06N9659 Bus June 12 2022 12:23:00 GMT
Table 1. ANPR vehicle log

2.1. Constraints

Vehicle movement is bidirectional. Vehicle numbers may include special characters in addition to the regular alphabet of Indian vehicle numbers. Vehicle classes include Car, Jeep, Bus and Truck, and it is important to have uniform vehicle-tracking and number-prediction performance across all vehicle classes. A very high vehicle-number prediction accuracy is desired because departmental vehicles are procured in bulk and therefore get assigned consecutive vehicle numbers; a single letter flip can map a prediction to another valid vehicle number.

2.2. Metric

The vehicle-detection rate (Lyons, 2020) is the fraction of vehicles detected by the system out of the total number of observed vehicles over a specific duration. Vehicle-number prediction from images can be treated as an applied case of optical character recognition; therefore, the negation of the word error rate (Smith, 2007), i.e. the word accuracy (WA), referred to here as the vehicle-number prediction accuracy, has been used to assess the quality of vehicle-number predictions.
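Both metrics can be computed directly from the logged predictions; a minimal sketch (the helper names are ours, and word accuracy here treats each full vehicle number as a single "word", matching the one-letter-flip concern of Sec. 2.1):

```python
def detection_rate(num_detected, num_observed):
    """Fraction of observed vehicles that the system detected."""
    return num_detected / num_observed

def word_accuracy(predictions, ground_truth):
    """Word accuracy (1 - word error rate), treating each full
    vehicle number as one word: a prediction counts only if it
    matches the ground-truth string exactly."""
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)

# A single flipped digit makes the whole prediction wrong:
word_accuracy(["MH03CS0071", "KA06N9659"],
              ["MH03CS0071", "KA06N9658"])  # 0.5
```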


[Figure 1 depicts the vehicle-tracking automaton; its transitions are labelled with (input, state) pairs and counter actions such as fc++, zc++ and the resets fc = 0, zc = 0, with "write vehicle log" as the action on the transition that confirms a unique vehicle.]
Figure 1. Vehicle tracking automaton

3. Methodology

The vehicle-tracking algorithm receives inputs from the vehicle-type, number-plate and vehicle-number prediction models. Algorithm 1 summarises the overall approach:

Function trackVehicle
       frame_counter = 0;
       zero_counter = 0;
       frame_thresh = …;  (heuristically derived, ref. Sec. 3.3)
       zero_thresh = …;   (heuristically derived, ref. Sec. 3.3)
       last_state = 0;
       forall image in image_stream do
             lp_bbox, lp_pred, v_class = pfrs_process(image, pred_confidence);
             next_state = pers_process(lp_bbox, lp_pred, last_state, frame_counter, zero_counter);
             last_state = next_state;
       end forall
Algorithm 1 Proposed vehicle tracking algorithm

The algorithm takes each image as input and passes it to pfrs_process, which encapsulates the joint vehicle detection (with class prediction) followed by vehicle-number prediction. The vehicle-tracking automaton, pers_process, initializes in the no-vehicle-detected state and receives the number-plate bounding-box, vehicle-class and number-plate text predictions from pfrs_process to update its internal state with corresponding actions; the transition from the vehicle-detected-in-current-image state to the unique-vehicle-detected (across the last few frames) state implies writing a new vehicle entry (ref. Table-1) to the vehicle-log database.

3.1. Joint Vehicle and Number-plate detection

The objective of the vehicle and number-plate detection stage is to filter the number-plate bounding-box predictions for downstream consumption by the number-plate prediction model. pfrs_process receives an image from an image source (a video, webcam feed or live-stream) and invokes a YOLOv5 model (Jocher et al., 2022) for vehicle and number-plate detection. In each image frame, multiple vehicles and number-plates may get detected; therefore, we apply the following 3-stage prediction-filtering criterion (in order): 1. All predictions must have a prediction confidence beyond a threshold. 2. A number-plate prediction is passed downstream only if it lies within a vehicle bounding box. 3. Vehicle bounding-box predictions are then filtered out, and the remaining number-plate bounding-box predictions are sent downstream.
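The three filtering stages can be sketched as follows (bounding boxes as (x1, y1, x2, y2) tuples; the strict-containment test and all names here are our illustration, not the authors' exact implementation):

```python
def inside(inner, outer):
    """True if the inner box lies entirely within the outer box."""
    return (inner[0] >= outer[0] and inner[1] >= outer[1] and
            inner[2] <= outer[2] and inner[3] <= outer[3])

def filter_predictions(detections, conf_thresh=0.5):
    """detections: list of (bbox, class_name, confidence).
    Stage 1: drop low-confidence predictions.
    Stage 2: keep a number-plate only if it lies inside a vehicle box.
    Stage 3: drop the vehicle boxes, pass only plate boxes downstream."""
    confident = [d for d in detections if d[2] >= conf_thresh]
    vehicles = [d[0] for d in confident if d[1] != "number_plate"]
    plates = [d[0] for d in confident if d[1] == "number_plate"]
    return [p for p in plates if any(inside(p, v) for v in vehicles)]
```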

3.2. Number-plate recognition

The objective of the number-plate recognition model is to generate a vehicle-number prediction from the cropped image received from the joint vehicle and number-plate detection model. Considering the generality of OCR datasets (Jaderberg et al., 2016) and the variety of Indian number-plates (Nadiminti et al., 2022), both in terms of number-plate background and text font, we have used an open-source OCR model, PaddleOCR (Du et al., 2020), for vehicle-number recognition. Unlike YOLOv5, we did not need to fine-tune PaddleOCR for the majority of vehicle numbers, but defence vehicle numbers (which contain special characters) will require model fine-tuning.
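Since a single letter flip can map a prediction to another valid departmental number (Sec. 2.1), a cheap sanity check is to validate each OCR output against the common Indian registration format before logging it. A hedged sketch (the pattern below covers only the common civilian format, not the defence-series plates mentioned above):

```python
import re

# Common civilian Indian format: state code, district code, optional
# series letters, four-digit number (e.g. MH03CS0071, KA06N9659).
PLATE_RE = re.compile(r"^[A-Z]{2}\d{2}[A-Z]{0,3}\d{4}$")

def is_plausible_plate(text):
    """Reject OCR outputs that cannot be a standard vehicle number."""
    return bool(PLATE_RE.match(text.replace(" ", "").upper()))
```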

3.3. Vehicle tracking algorithm

The vehicle-tracking algorithm has been modelled as a finite-state automaton, as shown in Fig. 1. The automaton starts in the state representing the no-vehicle-detected condition, and updates its state and takes corresponding actions for each image frame received from the video source. Inputs for the automaton are the number-plate bounding-box and number-plate text predictions received from pfrs_process. A second state represents the vehicle-detected-in-current-image-frame condition; the system stays in it until it has received a sufficient number of similar vehicle bounding-box predictions, accumulated in the state variable frame_counter (fc). For all state transitions except the one confirming a unique vehicle, the actions of the automaton involve updating its state variables frame_counter (fc) and zero_counter (zc) against the heuristically derived thresholds frame_thresh and zero_thresh, which represent the minimum votes, in terms of the number of successive predictions and no-predictions respectively, required to make a state transition. zero_counter accumulates the number of no-detection image frames in any state, in order to prevent frequent transitions back to the no-vehicle-detected state. On the transition confirming a unique vehicle, the automaton writes the vehicle log for the vehicle just detected to the database.

3.4. Standard operating procedure

An SOP defines spatial and temporal restrictions on the movement of vehicles at the security checkpoint. The vehicle is expected to stop within a rectangular region, defined as the ROI, for a minimum duration (0.6 seconds). The ROI constraint assists vehicle-number prediction by providing an optimal trade-off between image quality and perspective distortion, whereas the minimum duration assists the vehicle-tracking algorithm by providing it a sufficient number of frames to register a track as valid.
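The two SOP conditions (inside the ROI, minimum dwell of 0.6 s) translate into a simple per-frame check. A sketch under the assumption of a 25 fps feed (the frame rate, ROI coordinates and helper names are ours):

```python
FPS = 25
MIN_DWELL_FRAMES = int(0.6 * FPS)  # 0.6 s at the assumed frame rate

def center_in_roi(bbox, roi):
    """bbox, roi: (x1, y1, x2, y2); test the bbox center point."""
    cx, cy = (bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2
    return roi[0] <= cx <= roi[2] and roi[1] <= cy <= roi[3]

def satisfies_sop(bboxes, roi):
    """True if the vehicle's box center stays inside the ROI for at
    least MIN_DWELL_FRAMES consecutive frames of its track."""
    run = best = 0
    for b in bboxes:
        run = run + 1 if center_in_roi(b, roi) else 0
        best = max(best, run)
    return best >= MIN_DWELL_FRAMES
```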

4. Experiments

4.1. Ground-truth generation process

We have evaluated our method on both test videos and a live-stream. The test video clips were collected covering all times of day for an entire week. Of the 75 test clips, 65 include vehicles following the SOP, whereas 10 have a mix of vehicles under SOP and rush-hour (>80 vehicles/hour) conditions. For each video, we manually recorded the entire vehicle log, resulting in a database of 285 vehicles across the entire week, sampled at specific times of day. Achieving accurate vehicle-number predictions under rainfall has been challenging; therefore, we monitored live predictions of our system over 700 vehicle-movement instances during the month of July in Mumbai.

4.2. Implementation

We have fine-tuned the PyTorch implementation of the YOLOv5 base model trained on the COCO dataset (Lin et al., 2014) for vehicle and number-plate detection, over 150 images spread across the vehicle classes. Specifically, we annotated each input image with vehicle bounding boxes, vehicle classes (car, bus, jeep and truck) and number-plate coordinates. The model was fine-tuned for 250 epochs with batch size 16 and a 640×640 input size.

For number-plate recognition, we have used the publicly available PP-OCRv2 model for English text (Du et al., 2020). The vehicle-tracking algorithm is implemented in Python. The system runs on a virtual machine with a Quadro M4000 GPU in our on-premise cloud service and accesses the live feed from an IP camera installed at the security checkpoint.

4.3. Context for experiments

The proposed system is targeted towards the gradual automation of the manual vehicle-logging practice at the security checkpoint of our organization. While duplicate detection of a vehicle is tolerable for system acceptance in the initial stages, a missed vehicle is a red flag. The vehicle detection rate is, therefore, the most crucial aspect of the system. Our goal has been to establish a data-driven vehicle-movement policy (or SOP) to ensure a detection rate close to 100% in subsequent iterations. In Table-2, we report the sensitivity of the detection rate of the proposed method to vehicle-movement constraints.

Vehicle movement Detection rate(%) # of instances
Vehicle following SOP 89.6 155
Vehicle violating SOP 42.3 130
Table 2. Effect of vehicle movement on detection rate

Once the SOP was established as a way to achieve reasonable performance, we examined the fairness of our method with respect to its detection-rate performance across majority and minority vehicle classes. From the data-collection activity to train, validate and monitor our system, we observed that around 98% of vehicles at the security checkpoint are either cars (17.5%) or jeeps (80.1%), whereas the truck and bus classes represent only the remaining 2% of the traffic. The system, however, should have a uniform detection rate across all vehicle classes. Given the dependence of the detection bounding-box filtering phase on vehicle classes (ref. Sec. 3.1), we measured the detection rate of our system across these vehicle classes. Table-3 provides quantitative evidence for our observation that there is vehicle-class bias in the proposed method.

Vehicle class(under SOP) Detection rate(%) # of instances
Car, Jeep 94 139
Truck, Bus 25 8
Table 3. Effect of vehicle-class on detection rate

We assessed the robustness of vehicle-number prediction under rainfall for the system running on the live-stream from the deployment site. Table-4 lists the impact of the various issues we observed across 700 vehicle-detection instances once the system was evaluated in the rainy season under SOP movement restrictions.

Issue Drop in WA(%) # of instances(out of 700)
Illegible text 6 42
Motion blur 2.5 18
Text occlusion 2.5 18
Out-of-vocabulary vehicle numbers 1.2 9
Table 4. Effect of image-quality, input-vocab on Vehicle number prediction

5. Discussion


The results in Table-2 established that there is a strong correlation between placing restrictions on vehicle movement at the security checkpoint and the detection rate. The proposed method, however, still misses vehicles under SOP conditions. Debugging those cases revealed that the vehicle-class detection model generates predictions for rarely occurring vehicle classes with lower confidence, resulting in those predictions being filtered out, so the tracking algorithm never receives them as input for state transitions. We can address this either by removing such bias from the vehicle-detection model, employing upsampling, data-augmentation and weighted loss-function approaches (Qiu and Song, 2018; Johnson and Khoshgoftaar, 2019) during model training, or by decoupling the prediction-filtering process from vehicle-class predictions. We prefer the model-training approach, as it will also help us address the issues reported in Table-3.
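Of these remedies, inverse-frequency class weighting is the simplest to bolt onto training. A sketch of deriving per-class loss weights from the traffic mix reported in Sec. 4.3 (the normalization choice and the approximate counts are our assumptions):

```python
def inverse_frequency_weights(class_counts):
    """Weight each class inversely to its frequency, normalized so the
    weights average to 1; rare classes (truck, bus) then contribute
    proportionally more to the training loss."""
    total = sum(class_counts.values())
    raw = {c: total / n for c, n in class_counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {c: w / mean for c, w in raw.items()}

# Approximate mix observed at the checkpoint (Sec. 4.3):
weights = inverse_frequency_weights(
    {"jeep": 801, "car": 175, "truck": 12, "bus": 12})
```

The resulting dictionary can be passed as the class-weight vector of a standard weighted cross-entropy loss.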

In Table-4, we identified the major factors affecting vehicle-number prediction performance. Given that the most impactful issue was illegibility of the text, this provided evidence that publicly available OCRs now match human performance. Text occlusion covers cases where the number-plate text is covered with dirt or rust, or is partially visible due to occlusion by another vehicle or a security operator. Incorrect predictions due to motion blur result not only from the mismatch between vehicle speed and the shutter speed of the IP camera, but also from the naive prediction-selection strategy currently employed, which selects the last prediction before the transition back to the no-vehicle-detected state (ref. Figure-1) as the representative vehicle number. The performance drop due to motion blur could therefore be addressed first by using more sophisticated strategies, such as vehicle-number string clustering and fine-tuning PaddleOCR, in addition to increasing the frame threshold of the vehicle-tracking algorithm, thereby allowing the vehicle to come to a halt. Most importantly, the evidence from this monitoring exercise has helped us prioritize fine-tuning existing OCR models over training number-plate recognition models from scratch (Wu et al., 2018). In terms of the effort involved in monitoring predictions on a daily basis, the reported exercise took 12 man-hours, after which we started noticing patterns in the vehicle-number prediction issues that could be monitored algorithmically, giving us sufficient motivation to automate the monitoring process in the near future (Klaise et al., 2020; Rukat et al., 2020).
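The string-clustering alternative mentioned above can be approximated with a per-character majority vote across a track's frame-level predictions; a sketch (assumes most predictions for a track share the same length, so no sequence alignment is attempted):

```python
from collections import Counter

def consensus_plate(predictions):
    """Majority-vote each character position across the frame-level
    OCR strings of one vehicle track, so a single blurred frame
    cannot flip the logged number."""
    # keep only strings of the most common length, then vote per slot
    length = Counter(len(p) for p in predictions).most_common(1)[0][0]
    votes = [p for p in predictions if len(p) == length]
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*votes))
```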

6. Conclusions

We have demonstrated that, under the standard operating conditions of security checkpoints such as toll booths, it is feasible to achieve high vehicle-logging accuracy using an interpretable vehicle-tracking algorithm that leverages existing open-domain implementations of object detection and OCR models, without resorting to deep-learning-based object-tracking solutions. Experiments revealed that the automaton is sensitive to its vehicle-class input, which could be addressed by removing the corresponding bias in the underlying vehicle-detection model. Our work is in progress along three lines: developing experimental evidence for establishing a high vehicle-throughput SOP for peak-hour traffic at the security checkpoint, addressing model bias against rarely encountered vehicle classes, and automating our system-monitoring activity.


  • Demichelis (2022) Remy Demichelis. 2022. Science Facing Interoperability as a Necessary Condition of Success and Evil.
  • Du et al. (2020) Yuning Du, Chenxia Li, Ruoyu Guo, Xiaoting Yin, Weiwei Liu, Jun Zhou, Yifan Bai, Zilin Yu, Yehua Yang, Qingqing Dang, and Haoshuang Wang. 2020. PP-OCR: A Practical Ultra Lightweight OCR System.
  • Hao et al. (2014) Yanshuang Hao, Yixin Yin, and Jinhui Lan. 2014. Vehicle Tracking Algorithm Based on Observation Feedback and Block Symmetry Particle Filter. Journal of Electrical and Computer Engineering 2014 (13 Feb 2014), 520342.
  • Hua and Anastasiu (2019) S. Hua and D. C. Anastasiu. 2019. Effective Vehicle Tracking Algorithm for Smart Traffic Networks. In 2019 IEEE International Conference on Service-Oriented System Engineering (SOSE). IEEE Computer Society, Los Alamitos, CA, USA, 67–6709.
  • Jaderberg et al. (2016) Max Jaderberg, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2016. Reading Text in the Wild with Convolutional Neural Networks. International Journal of Computer Vision 116, 1 (Jan 2016), 1–20.
  • Jocher et al. (2022) Glenn Jocher, Ayush Chaurasia, Alex Stoken, Jirka Borovec, NanoCode012, Yonghye Kwon, TaoXie, Jiacong Fang, imyhxy, Kalen Michael, Lorna, Abhiram V, Diego Montes, Jebastin Nadar, Laughing, tkianai, yxNONG, Piotr Skalski, Zhiqiang Wang, Adam Hogan, Cristi Fati, Lorenzo Mammana, AlexWang1900, Deep Patel, Ding Yiwei, Felix You, Jan Hajek, Laurentiu Diaconu, and Mai Thanh Minh. 2022. ultralytics/yolov5: v6.1 - TensorRT, TensorFlow Edge TPU and OpenVINO Export and Inference.

  • Johnson and Khoshgoftaar (2019) Justin M. Johnson and Taghi M. Khoshgoftaar. 2019. Survey on deep learning with class imbalance. Journal of Big Data 6, 1 (19 Mar 2019), 27.
  • Klaise et al. (2020) Janis Klaise, Arnaud Van Looveren, Clive Cox, Giovanni Vacanti, and Alexandru Coca. 2020. Monitoring and explainability of models in production.
  • Lin et al. (2014) Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common Objects in Context. In Computer Vision – ECCV 2014, David Fleet, Tomas Pajdla, Bernt Schiele, and Tinne Tuytelaars (Eds.). Springer International Publishing, Cham, 740–755.
  • Lyons (2020) Vivienne Lyons. 2020. Guidance on ANPR Performance Assessment and Optimisation.
  • Mandal and Adu-Gyamfi (2020) Vishal Mandal and Yaw Adu-Gyamfi. 2020. Object Detection and Tracking Algorithms for Vehicle Counting: A Comparative Analysis.
  • Nadiminti et al. (2022) Sai Sirisha Nadiminti, Pranav Kant Gaur, and Abhilash Bhardwaj. 2022. Exploration of an End-to-End Automatic Number-plate Recognition neural network for Indian datasets.
  • Qiu and Song (2018) Qiang Qiu and Zichen Song. 2018. A Nonuniform Weighted Loss Function for Imbalanced Image Classification. In Proceedings of the 2018 International Conference on Image and Graphics Processing (Hong Kong, Hong Kong) (ICIGP 2018). Association for Computing Machinery, New York, NY, USA, 78–82.
  • Rukat et al. (2020) Tammo Rukat, Dustin Lange, Sebastian Schelter, and Felix Biessmann. 2020. Towards automated ML model monitoring: Measure, improve and quantify data quality. In MLSys 2020 Workshop on MLOps Systems.
  • Smith (2007) R. Smith. 2007. An Overview of the Tesseract OCR Engine. In Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), Vol. 2. 629–633.
  • Withopf and Jahne (2006) D. Withopf and B. Jahne. 2006. Learning Algorithm for Real-Time Vehicle Tracking. In 2006 IEEE Intelligent Transportation Systems Conference. 516–521.
  • Wu et al. (2018) Changhao Wu, Shugong Xu, Guocong Song, and Shunqing Zhang. 2018. How many labeled license plates are needed?
  • Yang et al. (2020) Bo Yang, Mingyue Tang, Shaohui Chen, Gang Wang, Yan Tan, and Bijun Li. 2020. A vehicle tracking algorithm combining detector and tracker. EURASIP Journal on Image and Video Processing 2020, 1 (28 Apr 2020), 17.