Automatic Construction of a Recurrent Neural Network based Classifier for Vehicle Passage Detection

by Evgeny Burnaev, et al.

Recurrent Neural Networks (RNNs) are extensively used for time-series modeling and prediction. We propose an approach for the automatic construction of a binary classifier based on Long Short-Term Memory RNNs (LSTM-RNNs) that detects a vehicle passage through a checkpoint. The classifier takes as input multidimensional signals from various sensors installed at the checkpoint. The results demonstrate that the previous approach of handcrafting a classifier as a set of deterministic rules can be successfully replaced by automatic RNN training on appropriately labelled data.




1 Introduction

Paper [1] describes an Automatic Vehicle Classifier (AVC) for toll roads, which is based on video classification and installed on most Russian toll roads. The Vehicle Passage Detector (VPD) is one of the most important parts of the AVC system. The VPD takes as input several binary signals from other AVC subsystems (binary detectors) and decides whether a vehicle passage is taking place, that is, whether a vehicle is located in the checkpoint area. The VPD is based on a “voting scheme”: a vehicle passage is detected if most of the binary detectors give a positive answer. This logic is augmented by a set of empirical rules, provided by a human expert, that quantify and take into account the time delays between switches of the binary signals, the properties of the sequence of these switches, and other information. These rules were extended and modified during the AVC test deployment, based on an analysis of the errors encountered.

The previous paper devoted to the VPD in the AVC [3] states that the VPD accuracy is . Since then the test dataset has been extended with new cases of detection and classification errors. Note that in the current paper we use tests that run with the trailer coupler detector disabled. The AVC version described in the previous paper provides accuracy on the new dataset if the coupler detector is disabled. The current classifier version provides accuracy without the coupler detector and with it. Compared with the previous version, the new classifier has optimized algorithms for shield and trailer coupler detection, a correlational detector, and an updated fusion method for aggregating the binary detectors.

Creating rules of this type is a painstaking job that requires a creative approach. It would be interesting to develop an automatic method in which a machine learning algorithm replaces the human expert. Such an approach could potentially also yield a higher classification quality. Therefore, in this paper we develop a method for automating AVC synthesis and minimizing human involvement.

2 Data description

The input data consists of AVC log records. Each file contains one or several vehicle passages. All the numerical experiments are conducted on a dataset consisting of log files. The system log is filled with three-dimensional signal samples, each component of which is binarized and produced by one of the following sensors: a correlational detector (sensitive to changes in video-stream images), an induction loop (mounted inside a lane and sensitive to metal), and a shield detector (detects occlusions of a shield located opposite a camera). The records also contain a frame sequence number, a manually created reference signal (labels), and the predictions of a basic classifier, which is based on human-tweaked rules [1]. A record is created and saved in a database if and only if at least one of the input signals has changed.

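| Frame | Shield | Loop | Cor | Basic clf | Ref. pass |
|-------|--------|------|-----|-----------|-----------|
| 196   | 1      | 0    | 0   | 0         | 0         |
| 201   | 1      | 1    | 0   | 0         | 0         |
| 202   | 0      | 1    | 1   | 1         | 0         |
| 208   | 1      | 1    | 1   | 1         | 1         |
| 246   | 0      | 1    | 1   | 1         | 1         |
| 266   | 1      | 1    | 1   | 1         | 0         |
| 268   | 1      | 1    | 0   | 1         | 0         |
| 269   | 1      | 0    | 0   | 0         | 0         |
| 270   | 0      | 0    | 0   | 0         | 0         |

Table 1: A log and a labeled sample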

Table 1 shows a log sample. Notation: Shield is the shield detector, Loop is the induction loop, Cor is the correlational detector, Basic clf is the basic classifier prediction, and Ref. pass is the reference passage.
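Because a record is written only when an input signal changes, a per-frame signal can be recovered by holding each record's values until the next record. The following is a minimal sketch of that expansion using the rows of Table 1. The tuple layout and the function name `expand_to_frames` are hypothetical, and the text does not say whether the classifiers consume change-only records or per-frame samples.

```python
# Hypothetical record layout: (frame, shield, loop, cor), one tuple per logged change.
records = [
    (196, 1, 0, 0),
    (201, 1, 1, 0),
    (202, 0, 1, 1),
    (208, 1, 1, 1),
    (246, 0, 1, 1),
    (266, 1, 1, 1),
    (268, 1, 1, 0),
    (269, 1, 0, 0),
    (270, 0, 0, 0),
]


def expand_to_frames(records):
    """Forward-fill change-only records into one (frame, signal) pair per frame."""
    dense = []
    for (frame, *signal), nxt in zip(records, records[1:] + [None]):
        # Hold the current values until the frame before the next record.
        last = frame if nxt is None else nxt[0] - 1
        for f in range(frame, last + 1):
            dense.append((f, tuple(signal)))
    return dense


dense = expand_to_frames(records)
```

In this sketch the last record contributes a single frame, since the log sample gives no information about how long its values persist.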

3 Quality evaluation

The specifics of the problem dictate the use of a special passage quality metric, which coincides with the standard pointwise two-class classification metric only in extreme cases. The reason is that standard metrics based on the pointwise difference between a reference signal and a predicted signal cannot estimate passage classification quality from a physically sensible viewpoint. The metric used in this paper is the Pass Quality (PQ):

$$PQ = \frac{R}{R + E},$$

where $R$ is the number of correctly detected passages, while $E$ is the sum of classification error costs over the whole test dataset. Calculating $E$ for various error types (missed passage, merged passages, etc.) is a complicated procedure described in [4]; see also table 2. Here $L$ denotes the true number of passages in the analysed test signal and $K$ denotes the number of detected passages (see details in [4]).

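| Ref. pass. | Detected pass. | Err. desc.      | Err. weight |
|------------|----------------|-----------------|-------------|
| 1          | 1              | no error        | 0           |
| 1          | 0              | missed passage  | 1           |
| 0          | 1              | false passage   | 1           |
| L          | 1              | merged passages | L           |
| 1          | K              | split passage   | K           |
| L          | K              | multiple error  | max(L, K)   |

Table 2: Classification error costs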

This quality metric does not take into account how far a detected passage is shifted from the ideal one. However, the conducted experiments show that this is not needed in applications: what matters is the sequence of correct passages and that each of them intersects the real passage in at least one instant, not the delays themselves.
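The error weights of table 2 and the PQ value can be sketched in a few lines. The sketch assumes that PQ is the ratio of the number of correctly detected passages to the sum of that number and the total error cost, which is consistent with the rows of tables 3 to 5. The function names are hypothetical, and the step that matches reference passages to detected ones (the complicated procedure of [4]) is not reproduced here.

```python
def error_cost(ref_count, det_count):
    """Cost of one group of matched passages, following table 2."""
    if ref_count == 1 and det_count == 1:
        return 0  # one reference passage, one detected passage: no error
    # missed (1, 0), false (0, 1), merged (L, 1), split (1, K), multiple (L, K)
    return max(ref_count, det_count)


def pass_quality(correct, cost_sum):
    """Assumed form of the metric: correct / (correct + cost_sum)."""
    return correct / (correct + cost_sum)


# Averaged values reported for the LSTM model in table 4.
pq_lstm = pass_quality(1751.8, 145.3)
```

Since the tables report averages over several experiments, a ratio recomputed from the averaged columns can differ from the reported PQ in the last digit.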

4 Exploratory analysis

In this section we compare results obtained with various machine learning methods: gradient tree boosting XGB (see the description of the algorithm in [5]; the implementation from [7] was used), logistic regression LR from the scikit-learn package [8], a fully connected neural network NN with one hidden layer consisting of neurons (see approaches to training such networks in [9, 10, 11, 12]), and a simple recurrent neural network SimpleRNN from the Keras package [6].

Because the task is atypical, the data were partitioned into the training and the control sets as follows: the logs for each set are selected at random and placed in a random order, but the order of frames inside every log remains unchanged.
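A minimal sketch of such a log-level split is given below. The function name `split_logs`, the `train_fraction` value and the fixed seed are assumptions made for illustration; the text does not state the proportions that were used.

```python
import random


def split_logs(logs, train_fraction=0.7, seed=0):
    """Assign whole logs to the training or control set.

    Logs are shuffled, but the frames inside each log keep their order.
    """
    order = list(range(len(logs)))
    random.Random(seed).shuffle(order)
    cut = int(round(train_fraction * len(order)))
    train = [logs[i] for i in order[:cut]]
    control = [logs[i] for i in order[cut:]]
    return train, control


# Toy data: 10 logs, each a list of (log_id, frame) pairs in frame order.
logs = [[(log_id, frame) for frame in range(5)] for log_id in range(10)]
train, control = split_logs(logs)
```

Splitting by whole logs keeps every passage intact within one set, which a frame-level random split would not guarantee.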

Results for training on the source signal without accounting for past values are provided in table 3, see lines . The obtained results imply that, due to the autocorrelation of , classification using standard methods, which does not take into account the dependency of on , gives low PQ values. The fact that the PQ values are comparable is easily explained: at each moment of time the three-dimensional input vector can take only eight distinct values, so in this case all methods readily arrive at similar decision rules.
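The limitation of a pointwise rule can be illustrated on the rows of Table 1: a classifier that sees only the current three-dimensional vector is a lookup table with at most eight entries, so it cannot separate two moments that share the same input but carry different reference labels. The sketch below lists such inputs. The function name `ambiguous_inputs` is hypothetical, and the samples are the change records of Table 1 rather than the full dataset.

```python
from collections import defaultdict

# ((shield, loop, cor), reference label), taken from Table 1.
samples = [
    ((1, 0, 0), 0),
    ((1, 1, 0), 0),
    ((0, 1, 1), 0),
    ((1, 1, 1), 1),
    ((0, 1, 1), 1),
    ((1, 1, 1), 0),
    ((1, 1, 0), 0),
    ((1, 0, 0), 0),
    ((0, 0, 0), 0),
]


def ambiguous_inputs(samples):
    """Return input vectors that occur with both reference labels."""
    labels_seen = defaultdict(set)
    for vector, label in samples:
        labels_seen[vector].add(label)
    return sorted(v for v, labels in labels_seen.items() if len(labels) > 1)


conflicts = ambiguous_inputs(samples)
```

Here the vectors (0, 1, 1) and (1, 1, 1) appear both inside and outside a reference passage, so no rule based on the current sample alone can label all of these records correctly.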

The first idea for increasing classification quality w.r.t. the PQ metric is to extend the feature space by using previous values . The results of these experiments are provided in table 3, see lines , . The optimal window size for each classifier type is selected using cross-validation. The threshold that produces a binary signal from a classifier output probability (provided by the logistic regression and the neural network) is selected by maximizing the objective function on the training set. One may see that this extension provides a significant increase in quality. The gives the best result. This result, however, is not better than that of the basic classifier.
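Extending the feature space with previous values amounts to concatenating each sample with the samples that precede it inside a window. A minimal sketch follows; the function name `window_features` and the zero padding at the start of a log are assumptions, since the text does not describe how the first samples of a log are handled.

```python
def window_features(signal, window):
    """Concatenate each sample with its `window` predecessors.

    Positions before the start of the signal are filled with zeros.
    The current sample comes first, then lag 1, lag 2, and so on.
    """
    dim = len(signal[0])
    padded = [(0,) * dim] * window + list(signal)
    rows = []
    for t in range(len(signal)):
        row = []
        for lag in range(window + 1):
            row.extend(padded[t + window - lag])
        rows.append(tuple(row))
    return rows


signal = [(1, 0, 0), (1, 1, 0), (0, 1, 1)]
features = window_features(signal, window=1)
```

With a window of size w, a three-dimensional signal yields 3(w + 1) features per sample, which any of the pointwise classifiers can then use unchanged.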

The second natural idea is to use recurrent neural networks, which have proved productive in sequence labelling and time series classification. We use the standard recurrent network SimpleRNN with one recurrent layer and one hidden layer [13]. It turns out that this model achieves a higher quality, which is pretty close to that of the basic classifier.

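| Classifier | R      | E      | PQ    |
|------------|--------|--------|-------|
|            | 3090.2 | 2802.3 | 0.532 |
|            | 3090.2 | 2802.3 | 0.532 |
|            | 3090.2 | 2802.3 | 0.532 |
|            | 1806.0 | 453.0  | 0.799 |
|            | 1794.1 | 302.6  | 0.856 |
|            | 1758.0 | 276.6  | 0.864 |
| SimpleRNN  | 1784.8 | 214.3  | 0.892 |
| Basic clf  | 1684.3 | 158.7  | 0.914 |

Table 3: Comparison of classifier models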

Note: the basic classifier uses an additional feature from the trailer coupler detector. This feature makes it possible to distinguish between vehicles moving close together and vehicles with trailers, thus increasing the basic method accuracy PQ from to . In our experiments we do not use this feature, although this additional information could potentially increase classification quality.

The primary reason for the low prediction accuracy is that all the algorithms optimize not the target quality metric but a different value, the mean squared prediction error MSE, which does not correspond well to the target metric PQ. Hereinafter by an error we mean $PQE = 1 - PQ$.
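The mismatch between the pointwise error and the passage-level metric can be seen on a toy example: two predictions that differ from the reference in exactly one frame have the same MSE, yet one of them still contains a single passage while the other splits it in two. The signals and function names below are hypothetical and serve only as an illustration.

```python
def count_passages(signal):
    """Number of maximal runs of ones in a binary signal."""
    previous = [0] + signal[:-1]
    return sum(1 for prev, cur in zip(previous, signal) if prev == 0 and cur == 1)


def mse(a, b):
    """Mean squared pointwise difference between two signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


reference = [0, 1, 1, 1, 1, 1, 0]
shifted = [0, 0, 1, 1, 1, 1, 0]  # passage starts one frame late
split = [0, 1, 1, 0, 1, 1, 0]    # one frame dropped in the middle

same_mse = mse(reference, shifted) == mse(reference, split)
```

Under table 2 the shifted prediction counts as one reference passage matched by one detected passage (weight 0), whereas the split prediction is a split passage with K = 2 (weight 2), although both have an MSE of 1/7.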

5 Classification based on RNNs

From the results of section 4 we can see that only the RNN model provides accuracy comparable to that of the baseline classifier, which was constructed manually by collecting and ensembling rules distributed in time. Thus, RNNs are a promising approach to automating the construction of classifiers and further improving the accuracy of the VPD.

5.1 RNN Architecture

The original recurrent neural network model SimpleRNN contains only one hidden layer; the output signal of each neuron is used as an input to the same neuron at the next moment of time. If an input signal has a long duration, the exploding and vanishing gradient problem appears when training the model and calculating gradients of the neural network performance function, see [16] for more details. This effect stems from the fact that the gradients of the RNN’s performance function depend on the product of the gradients of neurons in the hidden layer, calculated for all successive values of the training signal, see Fig. 1; as a consequence, this product can take large absolute values or tend to zero.
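The product mentioned above can be written out explicitly. The notation here is an assumption introduced for illustration: $h_t$ is the hidden state at time $t$, $\mathcal{L}_T$ is the performance function evaluated at time $T$, and $\gamma$ is a bound on the norm of each one-step Jacobian.

```latex
\frac{\partial \mathcal{L}_T}{\partial h_k}
  = \frac{\partial \mathcal{L}_T}{\partial h_T}
    \prod_{t=k+1}^{T} \frac{\partial h_t}{\partial h_{t-1}},
\qquad
\left\| \prod_{t=k+1}^{T} \frac{\partial h_t}{\partial h_{t-1}} \right\|
  \le \prod_{t=k+1}^{T} \left\| \frac{\partial h_t}{\partial h_{t-1}} \right\|
  \le \gamma^{\,T-k}
\quad \text{if } \left\| \frac{\partial h_t}{\partial h_{t-1}} \right\| \le \gamma \text{ for all } t .
```

When $\gamma < 1$ the bound decays exponentially in $T - k$, so the contribution of distant inputs vanishes; when the Jacobian norms exceed one, the same product can instead grow exponentially.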

One approach to avoid this effect is to use the Long Short-Term Memory Neural Network architecture (LSTM), which allows effective modelling of long range dependences in signals.

Figure 1: Long Range Dependence in RNN

In [17] the authors proposed a neural network architecture called the Gated Recurrent Unit (GRU), which is based on the same principles as the LSTM but uses a more economical parametrization and fewer operations to compute an output signal. More extensive overviews of RNN architectures can be found in [14, 15].
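The more economical parametrization can be made concrete by counting weights. In the standard formulations, a simple recurrent layer has one block of input weights, recurrent weights and biases, an LSTM layer has four such blocks (three gates and a candidate state), and a GRU layer has three (two gates and a candidate state). The sketch below uses eight units and a three-dimensional input, matching the sizes mentioned in this paper; the function name is hypothetical, and some library implementations add extra bias terms, so reported counts can differ slightly.

```python
def recurrent_params(units, inputs, blocks):
    """Weights and biases of a recurrent layer made of `blocks` equal blocks.

    Each block has units * inputs input weights, units * units recurrent
    weights and units biases.
    """
    return blocks * (units * (inputs + units) + units)


simple_rnn = recurrent_params(units=8, inputs=3, blocks=1)
lstm = recurrent_params(units=8, inputs=3, blocks=4)
gru = recurrent_params(units=8, inputs=3, blocks=3)
```

Under these assumptions the three layers have 96, 384 and 288 parameters respectively, so the GRU needs a quarter fewer parameters than the LSTM.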

5.2 Selection of RNN Architecture

To use an RNN in practice, it is necessary to search for its optimal architecture. This section describes the results of this search. In the experiments we use the following types of RNN: LSTM, GRU and SimpleRNN. All experiments are performed using the Keras framework [6], which is a wrapper around the Python libraries Theano [18] and TensorFlow [19].

For each of the RNN types we conduct a series of experiments in which we vary the network hyper-parameters, the activation function types and the window length. Since the training time of an RNN with several layers is rather long, we fix the number of hidden layers at one, limit the number of neurons in each layer to at most eight, and limit the maximum number of learning epochs to . In table 4 we provide the results (averaged over experiments) of optimal architecture selection for each RNN type. One may see that the selection of hyper-parameters, even in a limited space, significantly improves the quality of the SimpleRNN model, cf. the first experiments presented in table 3. Based on these results we decide to continue with the architecture containing LSTM layers, since it provides the highest performance.

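| Model     | R      | E     | PQ    |
|-----------|--------|-------|-------|
| SimpleRNN | 1723.0 | 161.1 | 0.915 |
| LSTM      | 1751.8 | 145.3 | 0.923 |
| GRU       | 1699.0 | 175.0 | 0.907 |
| Basic clf | 1684.3 | 158.7 | 0.914 |

Table 4: Comparison of different RNN models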

5.3 Further improvements

Since training a neural network optimizes a mean squared pointwise error, which is not appropriate for the problem considered, in this section we consider several additional ways to further improve the detection accuracy as evaluated by PQ. In particular, we consider the following tweaks: weighting the mean squared error used as the performance function when training the neural network; smoothing the input signal with a morphological filter [20]; adding a penalty on the derivative of the neural network output when calculating the performance function; optimizing the threshold used to binarize the output signal according to the target quality criterion PQ on the training set; and applying the morphological filter to the neural network output signal.
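Two of these tweaks, the weighted mean squared error and the penalty on the output derivative, can be combined in a single performance function. The sketch below is one possible form and not necessarily the one used in the experiments: the per-sample weights, the squared first difference as a discrete derivative, and the names `penalized_loss` and `smooth_coef` are all assumptions.

```python
def penalized_loss(target, output, weights, smooth_coef):
    """Weighted MSE plus a penalty on the squared first difference of the output."""
    n = len(target)
    weighted_mse = sum(
        w * (o - t) ** 2 for t, o, w in zip(target, output, weights)
    ) / n
    # Discrete derivative of the output: difference between neighbouring samples.
    roughness = sum((b - a) ** 2 for a, b in zip(output, output[1:])) / (n - 1)
    return weighted_mse + smooth_coef * roughness


target = [0, 1, 1, 0]
output = [0, 1, 0, 0]
plain = penalized_loss(target, output, weights=[1, 1, 1, 1], smooth_coef=0.0)
smooth = penalized_loss(target, output, weights=[1, 1, 1, 1], smooth_coef=0.5)
```

The derivative term makes rapid switching of the output more expensive, which discourages the short spurious gaps that split one passage into several.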

It turned out that the detection accuracy can be improved by significantly expanding the structure of the LSTM neural network. In fig. 2 we show the modified architecture we use: one input LSTM layer and two hidden Dense layers. However, if we do not use the Dropout transformation, then although the standard error MSE decreases on the validation sample to , the target error PQE increases on the test set. In turn, if we use the Dropout transformation before each Dense layer, then MSE increases to , but at the same time the target error PQE decreases by about .

We can also achieve an additional significant improvement in detection accuracy by selecting the threshold for the output signal via cross-validation on the training set and then applying the morphological filter to the neural network output binarized with the selected threshold.
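The post-processing chain described here (threshold, then morphological filter) can be sketched for a one-dimensional binary signal as follows. This is an illustration under stated assumptions: the flat structuring element of width 2 * radius + 1, the zero padding at the borders, and all function names are hypothetical, and the `score` argument stands in for the PQ computation, which is not reproduced here.

```python
def binarize(probs, threshold):
    """Turn output probabilities into a binary signal."""
    return [1 if p >= threshold else 0 for p in probs]


def _window_op(signal, radius, op):
    # Apply `op` over a sliding window, treating values outside the signal as 0.
    padded = [0] * radius + list(signal) + [0] * radius
    return [op(padded[i:i + 2 * radius + 1]) for i in range(len(signal))]


def dilate(signal, radius):
    return _window_op(signal, radius, max)


def erode(signal, radius):
    return _window_op(signal, radius, min)


def opening(signal, radius):
    """Remove runs of ones that are shorter than the window."""
    return dilate(erode(signal, radius), radius)


def closing(signal, radius):
    """Fill gaps between ones that are shorter than the window."""
    padded = [0] * radius + list(signal) + [0] * radius
    closed = erode(dilate(padded, radius), radius)
    return closed[radius:len(closed) - radius]


def select_threshold(probs, candidates, score):
    """Return the candidate whose binarized output maximizes `score`."""
    return max(candidates, key=lambda th: score(binarize(probs, th)))


probs = [0.1, 0.9, 0.8, 0.2, 0.9, 0.7, 0.1]
bits = binarize(probs, 0.5)
filled = closing(bits, radius=1)
```

In this example the one-frame gap in the middle of the passage is filled by the closing, so the binarized output contains one passage instead of two.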

Figure 2: Final LSTM-RNN model

We also evaluate the importance of each input signal component by estimating its influence on the accuracy of the final model. In table 5 we provide the values of the performance criterion for models constructed using all possible combinations of input features. We can see that the best set of features is a pair .

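| Features  | R      | E     | PQ    |
|-----------|--------|-------|-------|
|           | 1714.1 | 153.9 | 0.918 |
|           | 1621.6 | 255.9 | 0.864 |
|           | 1679.5 | 200.8 | 0.893 |
|           | 1713.4 | 170.6 | 0.910 |
|           | 1738.7 | 130.8 | 0.930 |
|           | 1769.9 | 91.8  | 0.952 |
|           | 1767.6 | 93.2  | 0.950 |
| Basic clf | 1684.3 | 158.7 | 0.914 |

Table 5: Accuracy of the final LSTM-RNN model for different subsets of input signal components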
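The set of models compared in this experiment can be enumerated directly: three input components give seven non-empty feature subsets, each of which defines one model to train and evaluate. In the sketch below the feature names are hypothetical labels for the shield detector, the induction loop and the correlational detector; the training and evaluation of each model is not shown.

```python
from itertools import combinations

features = ("shield", "loop", "cor")

# All non-empty subsets, ordered by size.
subsets = [
    subset
    for size in range(1, len(features) + 1)
    for subset in combinations(features, size)
]
```

This yields three single features, three pairs and the full triple, which matches the seven model rows of table 5.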

6 Conclusions

We achieve a significantly better classification quality, equal to , using only two input features, whereas in order to achieve a detection performance PQ equal to , the original (handcrafted) classifier takes as input an additional fourth feature from the trailer coupler detector, without which the performance drops to . Thus, in this study we developed an automated approach for constructing and training a classifier that is superior in terms of VPD performance to the previously constructed classifiers. At this stage, further research is possible in several directions.

First, we can increase the classification accuracy by directly optimizing the PQ criterion when training the neural network. Such a learning algorithm can be implemented using gradient-free optimization algorithms.

Second, we can create an aggregating mechanism for calculating a final decision from outputs of different detectors.

And finally, we can implement an integrated solution that eliminates the initial pre-processing of the input data by classical image recognition methods, along with the other data processing steps that happen between the event “a vehicle is shot with a camera” and the event “a binarized input signal is produced”. In other words, we propose to use the following neural network structure, which realizes all VPD subsystems by a single stack of convolutional neural networks and RNNs:

  • On the first level we use a set of convolutional neural networks, processing images from all available cameras to extract features;

  • On the next levels, the features extracted by the set of convolutional neural networks are combined through the RNN architecture with the signals obtained from the induction loop and other devices of the AVC system;

  • Finally, an RNN-type model with a structure similar to the one shown in Fig. 2 is used for passage detection.

Acknowledgements: The work of the first author was supported by the RFBR grants 16-01-00576 A and 16-29-09649 ofi_m. The work of the other authors was conducted in IITP RAS and supported solely by the Russian Science Foundation grant (project 14-50-00150).