Road scenes analysis in adverse weather conditions by polarization-encoded images and adapted deep learning

Object detection in road scenes is necessary for the development of both autonomous vehicles and driving assistance systems. Even though deep neural networks for recognition tasks have shown great performance on conventional images, they fail to detect objects in road scenes under complex acquisition conditions. In contrast, polarization images, which characterize the light wave itself, can robustly describe important physical properties of an object even under poor illumination or strong reflections. This paper shows how a non-conventional polarimetric imaging modality overcomes classical methods for object detection, especially in adverse weather conditions. The efficiency of the proposed method is mostly due to the high power of polarimetry to discriminate any object by its reflective properties, and to the use of deep neural networks for object detection. Our goal in this work is to prove that polarimetry brings a real added value compared with RGB images for object detection. Experimental results on our own dataset, composed of road scene images taken in adverse weather conditions, show that polarimetry together with deep learning can improve the state of the art by about 20% on different detection tasks.



I Introduction

Road scene understanding is a vital task nowadays because of the development of driving assistance systems. To ensure safe navigation and correctly avoid obstacles in road scenes, robust detection is essential. As one knows, a guarantee of safety is paramount when talking about autonomous cars, because the slightest malfunction can have serious consequences for human life. Currently, in ideal cases (i.e. good weather and good visibility), road scene obstacles are well detected. Examples of such systems are Mobileye

[30] or Waymo, which achieve a high detection accuracy in such ideal cases. However, when variations of illumination or adverse weather conditions lead to unstable appearances in road scenes, most methods in the literature relying on conventional vision sensors fail to efficiently detect road objects such as vehicles or pedestrians. Several methods using conventional sensors have been developed to improve detection in road scenes. A contrast restoration approach was introduced by Hautière et al. [12], using a classical RGB camera to improve free-space detection in adverse weather conditions. In the same context, Babari et al. [2]

proposed an estimation of fog density to enhance object detection and visibility distance using conventional roadside cameras.

Even if all these methods contributed to improving object detection in road scenes, they also demonstrated the limits of classical imaging sensors. Non-conventional sensors, which bring additional features, have been introduced in the autonomous driving field to overcome the detection problems occurring with conventional systems. For instance, infrared imaging enabled Miron et al. [20] to propose an enhanced pedestrian classification system. Infrared imaging was also used by Bertozzi et al. [4] in their tetra-vision system for pedestrian detection.

Meanwhile, polarization imaging has been gaining popularity in other areas, including 3D reconstruction [21], bio-medical image analysis [22] and military applications [23]. To our knowledge, few works have attempted to use polarization for road scene object detection [8], [15].

The principle of polarization-encoded imaging is that it characterizes the light wave reflected from any object in the scene. Polarization can describe important physical properties of an object, including its surface geometric structure, the nature of its material and its roughness [29], [5], even under poor illumination or strong reflections. The polarization state of the reflected light is highly related to physical properties of the object such as its intensity, its shape and its reflective behaviour. Although polarization has been applied in several fields, this work is, to our knowledge, the first to detect road scene objects in adverse weather conditions.

Deep neural networks have demonstrated their efficiency for object detection in images. These networks not only detect objects but also do so very fast, processing several images per second. Girshick et al. [10] proposed R-CNN, a region-based convolutional neural network able to detect objects in an image. It was the first network able to localize the region containing an object while also classifying it. This network then evolved into Fast R-CNN [11] and Faster R-CNN [27], with improvements in both accuracy and processing time. Faster R-CNN runs at a frame rate of 5 fps and achieved state-of-the-art object detection accuracy on PASCAL VOC 2007 [7]. This processing time was further improved by Redmon et al. [24] with YOLO. Even though YOLO could detect objects at 45 fps, its accuracy could not match that of Faster R-CNN. After YOLO, Liu et al. [19] proposed SSD, a single-shot multibox detector that outperformed Faster R-CNN's detection accuracy on PASCAL VOC 2007 at a higher frame rate of 16 fps. More recently, Lin et al. [17] outperformed SSD's detection accuracy on PASCAL VOC 2007 with RetinaNet. RetinaNet's frame rate of 14 fps is slightly lower than SSD's, but the network achieves higher performance on small object detection. Meanwhile, Redmon et al. improved YOLO from its first version to YOLOv2 [25], which outperformed SSD's detection accuracy on PASCAL VOC 2007 at 40 fps, and recently released YOLOv3 [26]. The accuracy of all these networks, as well as their processing time, makes them a major asset for object detection in road scenes and for deployment in the field of autonomous vehicles. Most of these networks have shown their efficiency for object detection in road scenes using the KITTI dataset [9], [14].

In this paper we aim to combine the discriminative power of polarization with deep neural networks to detect road scene content even in poor illumination conditions. The idea of using deep learning is motivated by our previous work, which showed how polarimetry can contribute efficiently to road scene analysis [8] using only classical machine learning methods (DPM, HOG). Thanks to their high object detection accuracy, RetinaNet and SSD are chosen to reach our goals. Moreover, due to the lack of polarimetric road scene datasets, we built our own dataset under different weather conditions in Rouen, France. Experiments show the positive impact of combining polarimetry and deep neural networks.

II Polarization Formalism and Motivations

Before giving further details regarding the work carried out in this paper, it is important to specify a few polarimetry notions.

II-A Polarization Formalism

Polarization is a property of light waves that can oscillate with more than one orientation; it describes the geometric orientation of the wave's oscillations, perpendicular to its direction of travel. There exist three polarization states of a light wave: totally polarized, where the direction of oscillation is well determined (elliptical, linear or circular); unpolarized, where the wave has a random direction of oscillation; and partially polarized, where the wave is a combination of a totally polarized part and an unpolarized part [3]. Polarimetric imaging represents the polarization state of the light wave reflected from each point of the scene. It is mainly used to dissociate metallic objects from dielectric surfaces [6]. When an unpolarized light wave is reflected, it becomes partially linearly polarized, depending on the surface normal and on the refractive index of the material. The reflected light wave can be described by a measurable set of parameters, the linear Stokes vector S = [S0, S1, S2], where S0 is the object's total intensity whereas S1 and S2 roughly describe the amount of linearly polarized light.

From the Stokes parameters, other physical properties can be derived, such as the angle of polarization (AOP) and the degree of polarization (DOP) [1]. Polarization images are obtained with a polarizer oriented at a specific angle, placed between the scene and the sensor. To obtain the three Stokes parameters, at least three acquisitions with three different polarizer orientations are required. The polarimetric camera used in this work is of the range of Polarcam 4D Technology. It simultaneously captures four images, each obtained with a linear polarizer placed at one of four angles (0°, 45°, 90° and 135°). For each angle α, the camera measures an intensity I(α) for the scene. The relationship between the Stokes parameters and the intensities measured by the camera is given, for each α, by:

I(α) = (1/2) (S0 + S1 cos 2α + S2 sin 2α),

so that S0 = I(0°) + I(90°), S1 = I(0°) − I(90°) and S2 = I(45°) − I(135°).

The AOP and DOP can be determined from the obtained Stokes vector by:

DOP = √(S1² + S2²) / S0,   AOP = (1/2) arctan(S2 / S1).

The DOP quantifies the proportion of polarized light in a wave: it is equal to one for totally polarized light and zero for unpolarized light. The AOP is the orientation of the polarized part of the wave with respect to the plane of incidence.
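The formalism above can be sketched in a few lines. This is an illustrative implementation of the standard Stokes relations, not the camera vendor's processing pipeline; the function names are ours.

```python
import numpy as np

def stokes_from_intensities(i0, i45, i90, i135):
    """Linear Stokes parameters from intensities measured behind a
    linear polarizer at 0, 45, 90 and 135 degrees, following
    I(a) = (S0 + S1*cos(2a) + S2*sin(2a)) / 2."""
    s0 = i0 + i90      # total intensity
    s1 = i0 - i90      # horizontal vs. vertical linear component
    s2 = i45 - i135    # +45 vs. -45 degree linear component
    return s0, s1, s2

def dop_aop(s0, s1, s2, eps=1e-12):
    """Degree and angle of linear polarization from the Stokes vector."""
    dop = np.sqrt(s1 ** 2 + s2 ** 2) / (s0 + eps)  # in [0, 1]
    aop = 0.5 * np.arctan2(s2, s1)                 # in [-pi/2, pi/2]
    return dop, aop
```

For fully horizontally polarized light (I(0°) = 1, I(90°) = 0, I(45°) = I(135°) = 0.5) this yields S = (1, 1, 0), hence DOP ≈ 1 and AOP = 0, as expected.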

II-B Motivations

In our previous work [8], we demonstrated that, with a convenient fusion scheme, polarization features combined with RGB ones achieve higher performance for car detection. In that work, feature selection was performed to pick the most informative among five relevant polarization features, and AOP was found to be the most informative one. An AOP-based DPM detector (polar-based model) and a color-based DPM detector (color-based model) were then trained independently, producing different score maps. A fusion rule taking the polar-based model as a confirmation (AND-fusion) of the color-based one was applied to produce the final detection bounding boxes. Experiments proved that exploiting the complementary information provided by the polarization feature largely reduces the false alarm rate (false bounding boxes) and improves the detection accuracy.

Building on these first encouraging results, this paper shows the effect of the most recent methods, based on deep learning frameworks, on polarization-based object detection, including cars and pedestrians. This work aims to demonstrate that polarization is more than just a rich panel of color information: the physical information provided by this modality is learned by the deep architectures used here and outperforms classical detection methods.

III Methodology

Achieving strong and reliable learning requires data that are both numerous and truthful. As available polarimetric data are scarce, it was decided to acquire new polarization images in real scenarios to complement existing RGB databases, as explained below.

This work evaluates both RetinaNet and SSD, pre-trained on the MS COCO RGB dataset [18] and fine-tuned on different polarization channel combinations: the intensities related to polarization angles (I0, I45, I90), the Stokes vector (S0, S1, S2) and (S0, AOP, DOP). At a first stage, the weights of RetinaNet-50 (RetinaNet using ResNet50 [13] as a backbone) and SSD300 (SSD taking a 300x300 input, with VGG16 [28] as a backbone) are kept fixed for these new input images to evaluate the results. Fine tuning was then carried out on a database of 2730 polarization images to prove how polarization-based detection overcomes the RGB-based one.

III-A Data Acquisition

In order to diversify the images, the acquisitions were made while driving, to obtain the most realistic image scenarios possible. Following the methodology of the Berkeley Deep Drive database [31], a polarimetric camera was mounted behind the windshield at the height of the driver's eyes, so that real-time acquisitions represent what the driver actually sees in the car. By doing so, we were able to capture a large diversity of road scenes, which helped avoid over-fitting when training the network.

It is important to note that a rig made of an RGB camera placed next to the polarimetric camera was also used to acquire the images used for testing. By doing so, we are able to test our networks on the exact same scenes in two different modalities (RGB, polarimetry).

We would like to stress that the training data were acquired under sunny weather conditions during winter, whereas the testing data were taken under foggy conditions in autumn.

III-B Data Sorting and Labelling

Once enough data were collected, it was important to sort them in order to maximize the diversity of the database. Our polarimetric camera runs at 15 frames per second; knowing that acquisitions took place mostly in cities, where the speed limit is relatively low and the car had to stop regularly at road signals, it was decided to keep 1 out of every 25 frames. This achieves a trade-off: the resemblance between two successive images of the database is minimized while the number of images in the database is maximized.
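The subsampling step is simple; as a rough sketch (the function name and frame list are illustrative), at 15 fps keeping one frame in 25 retains roughly one image every 1.7 seconds of driving:

```python
def subsample(frames, step=25):
    """Keep one frame out of every `step` to limit near-duplicate images."""
    return frames[::step]

# 150 raw frames (10 s at 15 fps) reduce to 6 retained images.
kept = subsample(list(range(150)))
```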

Afterwards, each image was labelled by means of bounding boxes, using the 4 categories of objects (car, person, bike, motorbike) that are the most common in road scenes. Every object in our dataset was labelled, including semi-occluded objects such as cars behind obstacles, mostly occluded objects such as the parts of windshields corresponding to cars parked behind many others in car parks, and small objects such as cars far away. Figure 1 illustrates the level of precision of this labelling.

Fig. 1: Labelling of the objects in a car park

Our final dataset contains 2730 labelled polarization images, divided into 2221 images for the training set and 509 for the testing set. These images contain about 23K labelled objects, of which about 90% belong to the class 'car' and about 10% to the class 'person', for both the training and the testing sets. Table I sums up the number of labelled objects in each class for the training and testing sets. There are fewer than 30 objects in each of the classes 'bike' and 'motorbike', which is why these classes were not considered in the present study. In order to compare performance with detection on RGB images, there is also a labelled testing set of 509 RGB images equivalent to the polarization ones. Figure 2 shows an example from these two testing sets. As pointed out earlier, the training set only contains images taken on sunny days and the testing set only contains images taken on foggy days. With this configuration, it was possible to see whether polarization images, which characterize an object by its reflection and not only by its shape or intensity, could overcome classical image detection when the weather conditions are not optimal. The MS COCO dataset has the same configuration as our dataset, containing only a few images in adverse weather conditions, which is not enough to enable the networks to properly detect objects in such conditions. By taking only images in good weather conditions for training, we were on the same basis. We could then compare the detection results of RetinaNet-50 and SSD300 trained on the MS COCO dataset on classical images with the detection results of the same networks on polarization images after fine tuning on the polarimetric dataset.

Class name Training set Testing set
car 11687 9265
person 1488 442
bike 4 12
motorbike 21 0
TABLE I: Number of labelled objects in the database
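The headline figures above can be checked directly against the Table I counts (the dictionaries below simply transcribe the table):

```python
# Labelled-object counts per class, transcribed from Table I.
train = {"car": 11687, "person": 1488, "bike": 4, "motorbike": 21}
test = {"car": 9265, "person": 442, "bike": 12, "motorbike": 0}

n_train = sum(train.values())            # 13200 objects in the training set
n_test = sum(test.values())              # 9719 objects in the testing set
total = n_train + n_test                 # ~23K labelled objects overall
car_share_train = train["car"] / n_train # ~0.89, i.e. "about 90% cars"
```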
Fig. 2: Top left (I0, I45, I90), top right (S0, S1, S2), bottom left (S0, AOP, DOP) and bottom right the equivalent of this scene in RGB

III-C Data Encoding

Referring to the polarization formalism, the DOP and AOP values lie in the intervals [0, 1] and [−π/2, π/2] respectively. To avoid confusing the neural networks with different data formats, each parameter was normalized between 0 and 255. The polarimetric channels thus take values in the same format as RGB channels and can be processed in the same way.
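A minimal sketch of this encoding step (our own helper, not the authors' released code) is a per-channel min-max normalization to the 8-bit range:

```python
import numpy as np

def to_uint8(channel):
    """Min-max normalize a polarimetric channel to [0, 255] so it can be
    fed to a network alongside RGB-formatted inputs."""
    c = channel.astype(np.float64)
    lo, hi = c.min(), c.max()
    if hi == lo:  # constant channel: map everything to zero
        return np.zeros_like(c, dtype=np.uint8)
    return np.round(255.0 * (c - lo) / (hi - lo)).astype(np.uint8)
```

Each of the three channels of a combination (e.g. S0, AOP, DOP) would be normalized independently before being stacked into a three-channel image.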

IV Experiments

After collecting, sorting and labelling all the data needed, it was possible to start training the networks.

IV-A Experimental Setup

To recall the experimental conditions, a RetinaNet-50 model and an SSD300 model using VGG16 as a backbone are used, both pre-trained on the MS COCO dataset. In order to improve detection on polarization images and compare it with detection on classical images, these models were fine-tuned on the polarimetric database, using the MS COCO dataset as a basis. Because the polarization channel combinations presented above (i.e. (I0, I45, I90), (S0, S1, S2) and (S0, AOP, DOP)) contain different information, the networks were fine-tuned on each combination separately. The MS COCO dataset is a good basis for fine tuning as it contains different road scene classes, including the 4 classes of the polarimetric dataset. Figure 3 summarizes the experimental setup.

Fig. 3: Experimental setup

Because fine tuning requires a very low learning rate to learn the new parameters efficiently, RetinaNet-50 was trained for 50 epochs and SSD300 for 100 epochs, both with a low learning rate using the Adam optimizer [16]. It is important to note that the 50 and 100 epochs for RetinaNet-50 and SSD300 respectively were fixed to make sure the networks would converge by the end of training. The optimal weights are selected according to the loss value.

IV-B Results and Discussion

As a reminder, the networks were trained on a database containing only road scenes in sunny weather, in order to be consistent with the MS COCO dataset. Because of the low number of samples in the classes 'bike' and 'motorbike', they were skipped in this experiment. The mean average precision mAP_f for a data format f ∈ {RGB, (I0, I45, I90), (S0, S1, S2), (S0, AOP, DOP)} is given by:

mAP_f = (N_person · AP_person,f + N_car · AP_car,f) / (N_person + N_car),

where AP_person,f and AP_car,f are the average precisions for the classes 'person' and 'car' for the related data format, while N_person and N_car are the number of instances of the classes 'person' and 'car' in the testing set.
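This instance-weighted mean can be computed directly from the per-class average precisions and the testing-set counts of Table I; for example, the fine-tuned intensity combination of Table II gives a mAP of about 0.7371:

```python
def weighted_map(ap_person, ap_car, n_person=442, n_car=9265):
    """Instance-weighted mean average precision over the two retained
    classes, using the testing-set counts from Table I."""
    return (n_person * ap_person + n_car * ap_car) / (n_person + n_car)

# (I0, I45, I90) after fine tuning, per-class APs from Table II:
m = weighted_map(0.9079, 0.7290)  # ~0.7371
```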

After fine tuning, the mAP was computed for each polarization channel combination using the corresponding updated weights. For RGB, the mAP of the detections from RetinaNet-50 and SSD300 trained on the MS COCO dataset was used.

Entries Class name AP no FT AP FT
RGB person 0.8254 X
RGB car 0.6639 X
RGB mAP 0.6706 X
(I0, I45, I90) person 0.8556 0.9079
(I0, I45, I90) car 0.6064 0.7290
(I0, I45, I90) mAP 0.6177 0.7371
(S0, S1, S2) person 0.6945 0.8969
(S0, S1, S2) car 0.4114 0.7375
(S0, S1, S2) mAP 0.4243 0.7448
(S0, AOP, DOP) person 0.0166 0.3585
(S0, AOP, DOP) car 0.1265 0.6050
(S0, AOP, DOP) mAP 0.1215 0.5938
TABLE II: Comparison of detection with RetinaNet-50 before and after fine tuning. AP no FT and AP FT stand for Average Precision before and after Fine Tuning respectively.

As can be seen in Table II, without fine tuning the network fails to detect all the objects in the road scene when dealing with the polarization channel combinations. After fine tuning, RetinaNet-50 detection with two polarization channel combinations, (I0, I45, I90) and (S0, S1, S2), overcomes classical RGB detection for both car and pedestrian detection. SSD300 after fine tuning, however, overcomes classical RGB detection for both car and person detection with only one of the polarization channel combinations.

The percentage decrease of the error rate for an object o and a data format f is given by:

Δ_o,f = 100 · (AP_o,f − AP_o,RGB) / (1 − AP_o,RGB),

where AP_o,RGB is the average precision for object o with the RGB data format, while AP_o,f denotes the average precision for object o with the related data format f.

The error rate decreases by 21.90% for car detection with (S0, S1, S2) and by 47.25% for person detection with (I0, I45, I90), which are important improvements in terms of object detection.
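These percentages follow from the Table II values by applying the error-rate formula above:

```python
def error_rate_decrease(ap_f, ap_rgb):
    """Relative decrease (%) of the error rate 1 - AP when switching
    from RGB to a polarization channel combination f."""
    return 100.0 * (ap_f - ap_rgb) / (1.0 - ap_rgb)

# Fine-tuned APs vs. RGB baselines from Table II:
car = error_rate_decrease(0.7375, 0.6639)     # (S0, S1, S2), car:    ~21.9%
person = error_rate_decrease(0.9079, 0.8254)  # (I0, I45, I90), person: ~47.25%
```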

Unlike RetinaNet-50, SSD300 is known for poor detection of small objects. On top of that, data augmentation is very important for SSD300 to learn new features correctly. This option was not available in the polarization context, because standard data augmentation does not preserve the physical interpretation of the scene it represents; consequently, data augmentation was not used in the training phase, and SSD300's architecture might not be suited to properly learning the polarimetric features of objects. Polarimetric imaging could thus be a real added value, especially for small object detection and adverse weather conditions. When objects are too small to be detected from their shape, or altered due to bad visibility, they can still be characterized by their reflection, which does not change with the object's size or occlusion. By learning these new features, the network is still able to detect objects in altered conditions thanks to the polarimetric property of reflection.

(a) Detection results before fine tuning on (I0, I45, I90), (S0, S1, S2) and (S0, AOP, DOP)
(b) Detection results after fine tuning on (I0, I45, I90), (S0, S1, S2) and (S0, AOP, DOP)
Fig. 4: Detection results with RetinaNet-50

Figure 4(a) shows the results of RetinaNet-50 detection on the polarization channel combinations before fine tuning, and Figure 4(b) the detection with the same network on the same channel combinations after fine tuning. The illustrated results show that fine tuning enabled RetinaNet-50 to efficiently detect objects in polarimetric road scene images: the network successfully learned these new physical features.

The performed experiments and the obtained results show that polarimetric imaging is a real asset for object detection in road scenes. Indeed, this experimental setup showed that, even though the networks were not trained on scenes in adverse weather conditions for either the RGB or the polarimetric modality, detection on polarization images achieved better results. An illustration of these performances can be found in Figure 5. Polarimetric parameters describe an object's features through its reflection; this physical property takes over in the detection process when the shape or the intensity is difficult to perceive.

Fig. 5: Comparison of detection in foggy weather on the same scene with the different parameters. Top left (I0, I45, I90), top right (S0, S1, S2), bottom left (S0, AOP, DOP) and bottom right RGB. For the polarization channel combinations, blue boxes refer to cars and orange boxes to pedestrians; for the RGB detection, orange boxes refer to cars and purple boxes to pedestrians.

V Conclusion and Future Work

This paper shows that polarimetric imaging brings real added value to object detection in road scenes. Polarization images associated with deep networks are able to efficiently detect objects in the scene, even in adverse weather or in the presence of small objects.

In the future, the polarimetric database will be enlarged to include more objects in the underrepresented categories (bike and motorbike) and more weather conditions. Having a large dataset of polarization images and its RGB counterpart would enable fine tuning on both and a stronger comparison for object detection. It would also be interesting to perform data augmentation on this dataset to reinforce the learning while preserving the polarimetric physical meaning. A fusion scheme combining polarization channels with RGB images also merits thorough study, to enhance the average precision beyond either modality separately.


This work is supported by the ICUB project 2017, ANR program: ANR-17-CE22-0011. We also thank our colleagues from CRIANN, who provided computation resources on Myria and thereby enabled us to obtain our results efficiently and quickly.


  • [1] S. Ainouz, O. Morel, D. Fofi, S. Mosaddegh, and A. Bensrhair (2013) Adaptive processing of catadioptric images using polarization imaging: towards a pola-catadioptric model. Optical Engineering 52 (3), pp. 037001.
  • [2] R. Babari, N. Hautière, É. Dumont, N. Paparoditis, and J. Misener (2012) Visibility monitoring using conventional roadside cameras – emerging applications. Transportation Research Part C: Emerging Technologies 22, pp. 17–28.
  • [3] M. Bass, E. W. Van Stryland, D. R. Williams, and W. L. Wolfe (1995) Handbook of Optics. Vol. 2, McGraw-Hill, New York.
  • [4] M. Bertozzi, A. Broggi, M. Felisa, G. Vezzoni, and M. Del Rose (2006) Low-level pedestrian detection by means of visible and far infra-red tetra-vision. In 2006 IEEE Intelligent Vehicles Symposium, pp. 231–236.
  • [5] M. Blanchon, O. Morel, Y. Zhang, R. Seulin, N. Crombez, and D. Sidibé (2019) Outdoor scenes pixel-wise semantic segmentation using polarimetry and fully convolutional network. In 14th International Conference on Computer Vision Theory and Applications (VISAPP 2019).
  • [6] M. Born and E. Wolf (2013) Principles of Optics: Electromagnetic Theory of Propagation, Interference and Diffraction of Light. Elsevier.
  • [7] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman (2010) The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision 88 (2), pp. 303–338.
  • [8] W. Fan, S. Ainouz, F. Meriaudeau, and A. Bensrhair (2018) Polarization-based car detection. In 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 3069–3073.
  • [9] A. Geiger, P. Lenz, and R. Urtasun (2012) Are we ready for autonomous driving? The KITTI vision benchmark suite. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3354–3361.
  • [10] R. Girshick, J. Donahue, T. Darrell, and J. Malik (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587.
  • [11] R. Girshick (2015) Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448.
  • [12] N. Hautière, J. Tarel, H. Halmaoui, R. Brémond, and D. Aubert (2014) Enhanced fog detection and free-space segmentation for car navigation. Machine Vision and Applications 25 (3), pp. 667–679.
  • [13] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • [14] J. Janai, F. Güney, A. Behl, and A. Geiger (2017) Computer vision for autonomous vehicles: problems, datasets and state-of-the-art. arXiv preprint arXiv:1704.05519.
  • [15] A. Kamann, P. Held, F. Perras, P. Zaumseil, T. Brandmeier, and U. T. Schwarz (2018) Automotive radar multipath propagation in uncertain environments. In 2018 21st International Conference on Intelligent Transportation Systems (ITSC), pp. 859–864.
  • [16] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • [17] T. Lin, P. Goyal, R. B. Girshick, K. He, and P. Dollár (2017) Focal loss for dense object detection. In 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2999–3007.
  • [18] T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick (2014) Microsoft COCO: common objects in context. In ECCV, pp. 740–755.
  • [19] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu, and A. C. Berg (2016) SSD: single shot multibox detector. In European Conference on Computer Vision, pp. 21–37.
  • [20] A. Miron, A. Rogozan, S. Ainouz, A. Bensrhair, and A. Broggi (2015) An evaluation of the pedestrian classification in a multi-domain multi-modality setup. Sensors 15 (6), pp. 13851–13873.
  • [21] O. Morel, C. Stolz, F. Meriaudeau, and P. Gorria (2006) Active lighting applied to three-dimensional reconstruction of specular metallic surfaces by polarization imaging. Applied Optics 45 (17), pp. 4062–4068.
  • [22] T. Novikova, J. Rehbinder, J. Vizet, A. Pierangelo, R. Ossikovski, A. Nazac, A. Benali, and P. Validire (2018) Mueller polarimetry as a tool for optical biopsy of tissue. In 2018 International Conference Laser Optics (ICLO), pp. 553–553.
  • [23] A. L. Rankin and L. H. Matthies (2010) Passive sensor evaluation for unmanned ground vehicle mud detection. Journal of Field Robotics 27 (4), pp. 473–490.
  • [24] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi (2016) You only look once: unified, real-time object detection. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • [25] J. Redmon and A. Farhadi (2017) YOLO9000: better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271.
  • [26] J. Redmon and A. Farhadi (2018) YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767.
  • [27] S. Ren, K. He, R. Girshick, and J. Sun (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pp. 91–99.
  • [28] K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
  • [29] L. B. Wolff and A. G. Andreou (1995) Polarization camera sensors. Image and Vision Computing 13 (6), pp. 497–510.
  • [30] D. B. Yoffie (2014) Mobileye: the future of driverless cars. Harvard Business School Case, pp. 715–421.
  • [31] F. Yu, W. Xian, Y. Chen, F. Liu, M. Liao, V. Madhavan, and T. Darrell (2018) BDD100K: a diverse driving video database with scalable annotation tooling. arXiv preprint arXiv:1805.04687.