In the 1980s, Bajcsy introduced the concept of active perception as “a problem of intelligent control strategies applied to the data acquisition process”. This idea was later explored and termed “active vision”, with an emphasis on visual perception, by Aloimonos et al. Their studies showed that many vision problems could be solved much more efficiently by an active approach than by a passive one. Active vision was later formalized as a special case of the attention problem by Tsotsos.
Despite the advantages of being active, most vision-guided robotic systems remain passive. Most of them rely on the camera’s built-in auto-exposure algorithms [15, 26], which set the exposure by evaluating the mean brightness of an image. While these methods produce good images from a human perspective, this is not always the case for a robot. Moreover, vision algorithms are typically trained on offline image datasets, which suffer from a significant camera-sensor-specific bias. This results in less generalized models, which are sensitive to camera parameters and often fail on poorly exposed images.
Figure 1 demonstrates a failure case of an object detection algorithm when using auto-exposure. In this case, the illumination is low (50lx and 200lx). The built-in auto-exposure uses a very large shutter speed (i.e., a long exposure time) and a large voltage gain to compensate for the low-light condition. While this increases the overall brightness of the acquired image, it causes the object detection algorithm to fail: the large shutter speed results in over-exposure around the object of interest, and the large voltage gain introduces noise. This example indicates that finer control of the camera’s intrinsic parameters is needed.
In this paper, we present a novel method for the active control of camera parameters, to make robot vision more robust against variation in illumination. Specifically, we investigate object detection algorithms, as object detection is one of the basic tasks in vision-guided robots. For camera parameters, we focus on shutter speed and voltage gain. There are mainly two contributions in this work: 1) a quantitative evaluation of object detection algorithms that reveals their sensitivity to camera parameters; 2) a novel active control of camera parameters method that improves the robustness of vision algorithms.
II Related Work
How to make robot vision robust to different light conditions is still a challenging problem in the robotics community [1, 22, 23, 28, 21, 17], as a slight change in ambient illumination may produce a large difference in the appearance of objects. In the literature, there are four common approaches to achieve this goal, from different perspectives.
and inferred albedo and surface normal from neural networks. Better illumination invariance could be achieved by using these representations instead of the original image. The second approach is to use multiple instance-based models, where each instance corresponds to one light condition. Belhumeur and Kriegman proved that the set of images of an object in a fixed pose but under varying illumination forms a convex cone, and that the dimension of this illumination cone equals the number of distinct surface normals. However, algorithms based on this approach typically need a large amount of training data and have a high computational cost. The third approach is camera sensor accommodation, which dates back to the 1970s. It was proposed that sensor accommodation, the automatic control by computer over the parameters of the camera, should be an integral part of the recognition process. This idea was later applied to active fixation in the context of object recognition. The fourth one is illumination preprocessing. Preprocessing has been a common practice in object recognition pipelines, where it aims to improve the reliability of a vision system. For face recognition in particular, a study demonstrated that illumination preprocessing is helpful in handling lighting variations.
In this paper, we use the third approach to achieve the robustness of object detection algorithms to different light conditions. It was proposed that camera parameters should be optimized with respect to different metrics, like image entropy in  and gradient information in . This methodology was further summarized in . In comparison with the previous work, our method emphasizes that the control of camera parameters should be optimized by the performance of specific vision applications, i.e. object detection algorithms, rather than using a generic metric.
III Camera Parameters on the Performance of Object Detection Algorithms
As introduced in Section I, camera parameters determine the quality of perceived images, which in turn affects the performance of vision algorithms. In this section, we present our quantitative evaluation of object detection algorithms, with respect to different ambient illuminations and two camera settings, i.e. shutter speed and voltage gain.
One of the fundamental problems of common image datasets, such as [6, 9, 24, 16], is that they are image sensor biased. Most images are taken under good lighting conditions and with proper exposure. In order to evaluate the sensitivity of object detection algorithms to camera parameters, a new dataset is introduced in this paper.
This dataset contains 2240 images in total, obtained by viewing 5 different objects (bicycle, bottle, chair, pottedplant and tvmonitor) at 7 levels of illumination and with 64 camera configurations (8 shutter speeds x 8 voltage gains). Each image is in 8-bit/color RGB format and of 1280x1204 resolution. All object instances in the dataset are manually annotated with class labels and bounding boxes. Samples of this dataset can be found in Figure 2. The full dataset is published at http://jtl.lassonde.yorku.ca/software/datasets/.
To accurately measure the illumination of the scene, a Yoctopuce light sensor (http://www.yoctopuce.com/EN/products/usb-environmental-sensors/yocto-light-v3) was used. Intensity-controllable light bulbs were used to achieve different light conditions: 50lx, 200lx, 400lx, 800lx, 1600lx and 3200lx. The digital camera was a Point Grey Flea3 (model: FL3-U3-13E4C-C), which was equipped with a CMOS sensor and a programmable API. The allowed shutter speed and voltage gain ranges were 0.016ms-24.973ms and 0dB-24.014dB respectively. These permissible ranges were uniformly sampled into 8 distinct values in each dimension. The sample from a range was set as , where , leading to 8 x 8 = 64 candidate settings for the shutter/gain parameters, under which the corresponding images were acquired. The aperture was fixed at 4, and the red and blue white-balancing channels were set to 500 and 800 respectively. All other parameters were kept at default values.
III-B Evaluation Setup
Four object detection algorithms were evaluated: the Deformable Part Models (DPM), the Bag-of-Words model (BoW), the Regions with Convolutional Neural Networks (R-CNN), and the Spatial Pyramid Pooling in Deep Convolutional Networks (SPP-net). The original implementations were used (except for BoW), and no optimization or transfer learning techniques were applied.
For the DPM, the Release 5 version, as published at https://people.eecs.berkeley.edu/~rbg/latent/, was adopted. There were twenty class-specific detectors trained on the PASCAL VOC 2007 dataset, and only five of them were used for the purpose of this evaluation. The outputs of each detector were combined using non-maximum suppression with a 0.5 overlap threshold. For the BoW, we replicated the idea in  with our own implementation. To make it consistent with the other algorithms, it was trained only on the PASCAL VOC 2007 dataset. Local features were sampled densely over the images and represented by SIFT and HOG descriptors. We used a visual codebook size of 1000 and a spatial pyramid with 3 levels using 1x1, 2x2, and 4x4. For the classifiers, we used linear SVMs with a chi-squared kernel. The retraining process took two iterations. For the R-CNN and SPP-net, the neural networks were pre-trained on ImageNet and fine-tuned on the PASCAL VOC 2007 dataset. Twenty detectors were trained for the objects in the PASCAL dataset, and only five of them were used for the evaluation. The outputs of each detector were combined via non-maximum suppression with an overlap threshold of 0.5.
Given an input image, the output of each algorithm was required to be a list of predicted object instances, each represented by a bounding box, a label and a confidence score. A predicted instance is considered true if the label is correct and the bounding box overlaps no less than 50% with the ground-truth bounding box, and false otherwise.
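The matching rule above can be sketched in a few lines. This is only an illustration: it assumes that "overlap" means intersection-over-union, as in the PASCAL VOC protocol, and the `Box` type and function names are hypothetical rather than taken from the evaluated implementations.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box given by its corner coordinates (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes (assumed overlap measure)."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_label: str, pred_box: Box,
                     gt_label: str, gt_box: Box,
                     threshold: float = 0.5) -> bool:
    """A prediction counts as true if the label matches and the overlap is at least 50%."""
    return pred_label == gt_label and iou(pred_box, gt_box) >= threshold
```

Note that the comparison uses `>=`, so a prediction whose overlap is exactly 50% is still accepted, matching the "no less than 50%" wording.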
Following the methodology by Andreopoulos & Tsotsos, the evaluation procedures include:
1) Run the object detection algorithms on all the images that correspond to each combination;
2) Sort the outputs by their confidence scores and then evaluate them, using the aforementioned rule;
3) Compute the precision-recall curve from the above results;
4) Compute the average precision (AP) by sampling the precision-recall curve.
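The last three steps can be illustrated with a short sketch. The text does not state how the precision-recall curve is sampled, so the code below assumes the 11-point interpolated AP used by the PASCAL VOC 2007 protocol; the function name and arguments are hypothetical.

```python
import numpy as np


def average_precision(scores, is_true, num_gt):
    """11-point interpolated AP (assumed sampling scheme, as in PASCAL VOC 2007).

    scores  -- confidence score of each predicted instance
    is_true -- whether each prediction was judged true by the matching rule
    num_gt  -- number of ground-truth instances
    """
    # Sort predictions by descending confidence.
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = np.asarray(is_true, dtype=bool)[order]

    # Cumulative true/false positives give one precision-recall point per prediction.
    tp = np.cumsum(hits)
    fp = np.cumsum(~hits)
    recall = tp / max(num_gt, 1)
    precision = tp / np.maximum(tp + fp, 1)

    # Sample the curve at recall 0.0, 0.1, ..., 1.0, taking the best precision
    # achieved at or beyond each recall level (small tolerance for rounding).
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):
        mask = recall >= r - 1e-9
        ap += precision[mask].max() if mask.any() else 0.0
    return ap / 11.0
```

For example, three predictions ranked true, false, true against two ground-truth instances reach recall 0.5 at precision 1 and recall 1 at precision 2/3, so six sample points contribute 1 and five contribute 2/3.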
The final results are represented by performance tables. A performance table is an 8x8 matrix $P$, where $P_{ij}$ is the AP of an algorithm on all the images that correspond to the $i$-th sample of shutter speed and the $j$-th sample of voltage gain, for a given illumination. The range of AP is $[0, 1]$. Larger APs are represented in black, and smaller ones in white.
III-C Results and Discussion
Figures 3-5 show the performance tables of object detection algorithms under three light conditions. The most obvious observation is that, for a specific illumination, all algorithms only work with a subset of the shutter/gain pairs. Another observation is that the algorithms prefer a faster shutter speed and a smaller voltage gain when the scene is bright, and a slower shutter speed and a larger voltage gain when the scene is dark. However, the algorithms demonstrate different sensitivity to changes in shutter speed and voltage gain. The DPM accepts a wider range of values in the shutter/gain parameter space due to the relative illumination robustness of the underlying HOG features, while the BoW, R-CNN and SPP-net work with a narrower range of values. Also, the best-performing camera parameters, for each illumination, vary among algorithms.
By aggregating the performance tables in Figures 3-5, the results of object detection algorithms with respect to each illumination, shutter speed and voltage gain are shown in Figure 6. Results are represented by mean average precision (mAP), which is the mean of a series of APs.
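As a rough illustration of this aggregation, the sketch below stacks the performance tables of one algorithm into a three-dimensional array and averages over the remaining axes. The array layout (illumination, shutter speed, voltage gain) and the function name are assumptions made for the example.

```python
import numpy as np


def marginal_map(tables):
    """Mean AP along each factor for one algorithm.

    tables -- array of shape (n_illuminations, n_shutters, n_gains),
              i.e. one performance table per illumination level (assumed layout)
    """
    tables = np.asarray(tables, dtype=float)
    return {
        # Average over all shutter/gain pairs for each illumination level.
        "illumination": tables.mean(axis=(1, 2)),
        # Average over all illuminations and gains for each shutter speed.
        "shutter": tables.mean(axis=(0, 2)),
        # Average over all illuminations and shutter speeds for each gain.
        "gain": tables.mean(axis=(0, 1)),
    }
```

Each returned vector corresponds to one curve per algorithm: mAP as a function of illumination, of shutter speed, and of voltage gain.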
For ambient illumination, the common trend is that the performance increases, reaches a peak and then decreases as the ambient illumination goes from low to high. Note that for low illumination conditions, the DPM algorithm significantly outperforms the others. A similar pattern is also observed for shutter speed. For voltage gain, a steady performance loss is observed for the DPM as the voltage gain increases. One possible reason is that voltage gain introduces noise, which affects the results. Another possibility is that these performance transients are due to non-uniform sample representation at the given camera parameters in the original training set, in which case our method is able to uncover statistical irregularities in the training ensemble.
IV Active Control of Camera Parameters
As discussed in Section III, the camera’s intrinsic parameters have a significant impact on the performance of object detection algorithms, and the optimal shutter speed and voltage gain configurations are algorithm and ambient illumination-specific. In this section, we propose a novel active control of camera parameters method based on the evaluation results. The overall framework is shown in Figure 7.
There are mainly five components in the proposed framework: 1) create a dataset of images, by sampling the ambient illumination and camera parameters of interest; 2) evaluate the performance of vision algorithms on the created dataset, and build performance tables; 3) use a light sensor to measure the ambient illumination; 4) select the optimal combination based on the performance tables, for a given illumination reading; 5) run the selected algorithm on the image taken with the selected camera parameters.
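Steps 3) to 5) amount to a table lookup, which the sketch below illustrates. It assumes that the tables are stored per algorithm as arrays indexed by illumination level, shutter sample and gain sample, and that a reading is mapped to the nearest sampled illumination level; these choices and all names are hypothetical simplifications.

```python
import numpy as np


def select_configuration(tables, levels, reading):
    """Pick the algorithm and shutter/gain indices with the highest AP.

    tables  -- dict: algorithm name -> array (n_levels, n_shutters, n_gains)
    levels  -- sampled illumination levels, in lux
    reading -- current light-sensor reading, in lux
    """
    # Map the reading to the nearest sampled illumination level (assumption).
    level_idx = int(np.argmin(np.abs(np.asarray(levels, dtype=float) - reading)))

    best = None
    for name, table in tables.items():
        plane = np.asarray(table, dtype=float)[level_idx]
        # First maximum in row-major order; ties are not resolved here.
        i, j = np.unravel_index(np.argmax(plane), plane.shape)
        candidate = (float(plane[i, j]), name, int(i), int(j))
        if best is None or candidate[0] > best[0]:
            best = candidate

    ap, name, shutter_idx, gain_idx = best
    return name, shutter_idx, gain_idx, ap
```

Because `argmax` simply returns the first maximum, this sketch does not address the case of several equally good pairs.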
The motivation of this active control of camera parameters method is mainly from the analysis of the performance behaviors of object detection algorithms. It is observed that algorithms behave differently with respect to variant illumination, shutter speed and voltage gain. We propose to systematically analyze and encode these behaviors, and to utilize these results to improve the stability and robustness of a vision system.
Despite the simplicity of the idea, there are also challenging problems to solve. The first one is the reliability of the noisy performance tables, and the second one is that there may exist multiple optimal choices.
Figure 8 shows the original performance table of the DPM on images taken with various camera configurations, at an illumination of 800lx. In this case, twelve shutter/gain pairs are optimal, all yielding the same best result. In such a situation, it is unclear which one should be selected. However, the majority of the optimal choices lie in the top-left quarter of the performance table, and only a few outliers fall outside this area.
One possible reason for the outliers is that the distribution of camera sensor parameters in the training dataset for the vision algorithms is biased. The trained object detectors are fitted to specific camera parameter combinations, for various light conditions. It may be that the unevenness of the results is due to non-uniform training sample distributions with respect to camera parameters and lighting conditions. This may highlight a major difficulty with large image datasets if their creators do not pay careful attention to the statistical characteristics of the training population with respect to its key parameters.
To solve the aforementioned issues, Gaussian smoothing is applied to the original performance tables.
The reason for smoothing is to remove outliers and reduce the possibility of multiple maxima. The Gaussian filter is used because it offers a simple way to trade off between each individual value and the local averages via the $\sigma$ parameter. In our implementation, the kernel size is 3 x 3 and the $\sigma$ value is set to 0.5, 1 or 2. For the values at the boundaries, there is not enough data to do a full smoothing operation. In such cases, we crop the Gaussian filter accordingly (zero-padding could be an alternative for handling the border effects). Note that Gaussian smoothing can have the following side-effect on the performance tables: a low value, which indicates a bad detection rate, can be raised by the high values around it, and the settings of this low detection value could then be chosen. Increasing the number of data samples or decreasing the value of $\sigma$ would help avoid this situation.
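A minimal sketch of this smoothing step is given below. It assumes that "cropping" the filter at the borders means using only the part of the 3 x 3 kernel that overlaps the table and renormalizing the remaining weights; the text does not specify the normalization, and the function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(size=3, sigma=1.0):
    """Normalized 2-D Gaussian kernel of the given (odd) size."""
    half = size // 2
    ax = np.arange(-half, half + 1)
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def smooth_table(table, size=3, sigma=1.0):
    """Smooth a performance table, cropping the kernel at the borders."""
    table = np.asarray(table, dtype=float)
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    rows, cols = table.shape
    out = np.empty_like(table)
    for r in range(rows):
        for c in range(cols):
            # Window of the table covered by the kernel, clipped to the table.
            r0, r1 = max(r - half, 0), min(r + half + 1, rows)
            c0, c1 = max(c - half, 0), min(c + half + 1, cols)
            # Matching part of the kernel, renormalized (assumption).
            k = kernel[r0 - r + half:r1 - r + half, c0 - c + half:c1 - c + half]
            out[r, c] = (k * table[r0:r1, c0:c1]).sum() / k.sum()
    return out
```

With renormalization a constant table is left unchanged, whereas zero-padding would pull the border values toward zero. A smaller `sigma` concentrates the weight on the center cell, which limits how far a low value can be raised by high values around it.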
For the purpose of this work, the values of ambient illumination, shutter speed and voltage gain are sampled at distinct values. However, the actual readings of light sensor and the camera’s intrinsic parameters are continuous. To make the proposed system accept continuous illumination measurements and output continuous camera parameters, linear interpolation is applied.
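The interpolation step could look like the sketch below. The text does not say exactly which quantities are interpolated, so the example assumes that the best shutter speed and voltage gain found at each sampled illumination level are interpolated linearly along the illumination axis. The function name is hypothetical, and the numeric values in the usage lines are made up for illustration, not measured results.

```python
import numpy as np


def interpolate_setting(reading, levels, best_shutter, best_gain):
    """Linearly interpolate camera parameters for a continuous lux reading.

    reading      -- light-sensor reading, in lux
    levels       -- sampled illumination levels, in increasing order
    best_shutter -- best shutter speed (ms) found at each level
    best_gain    -- best voltage gain (dB) found at each level
    """
    levels = np.asarray(levels, dtype=float)
    # np.interp clamps readings outside the sampled range to the end values.
    shutter = float(np.interp(reading, levels, best_shutter))
    gain = float(np.interp(reading, levels, best_gain))
    return shutter, gain


# Illustrative (made-up) optima at two neighboring illumination levels.
shutter_ms, gain_db = interpolate_setting(125.0, [50.0, 200.0], [20.0, 10.0], [12.0, 6.0])
```

A reading halfway between two sampled levels therefore yields parameters halfway between the two stored optima.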
V Experimental Results
To evaluate how well the proposed active control of camera parameters method works, an empirical experiment was conducted. Our method was compared with the conventional approach, the camera’s built-in auto-exposure algorithm. We measured the performance of object detection algorithms using these two approaches.
V-A Experimental Setup
This experiment was conducted on the dataset introduced in Section III-A. We split this dataset into two groups, training and testing. The training set was used by the active control of camera parameters system to compute the performance tables, and the test set was used for testing. This process was repeated using different combinations of the training and testing sets, and the results were averaged. The procedures were as follows:
1) Pre-compute the performance tables on the training set;
2) For each object and for each illumination, run our proposed system as described in Section IV to get the optimal pair;
3) Run each object detection algorithm on the image that corresponds to the proposed camera parameters, and on the image that is taken with auto-exposure;
4) Evaluate and compare the results (A predicted bounding box is considered correct if it overlaps no less than 50% with the ground-truth bounding box, otherwise false).
The auto-exposure method is the built-in exposure algorithm in Point Grey Flea3 cameras. This algorithm is controlled by two parameters, the optimal brightness level and the region-of-interest (ROI). It determines a proper exposure based on the mean brightness over the ROI. Both parameters were kept at their default values during the experiment.
For the active control method, there are also two parameters, the kernel size and the Gaussian $\sigma$. We used a 3x3 kernel, considering that the performance table has 8x8 resolution, and different $\sigma$ values, i.e. 0.5, 1 and 2.
V-B Results and Discussions
Table I summarizes the performance of each object detection algorithm with auto-exposure and active control. Compared with auto-exposure, active control results in significantly better performance for three object detection algorithms, the DPM, BoW and R-CNN. See Figure 9 for the comparison of these two approaches by relative increments.
The performance boost is more obvious on local feature-based algorithms (DPM and BoW) than on convolutional neural network-based algorithms (R-CNN and SPP-net). One possible reason is that local features, such as SIFT and HOG, are sensitive to the camera parameters, as pointed out in . In contrast, convolutional neural networks often contain a few pooling layers, which mitigate the effects of camera exposure.
Also, the results depend on the $\sigma$ parameter of the Gaussian smoothing operator, as the noise in the performance tables varies among algorithms. No single $\sigma$ value that consistently outperforms the others has been found. However, gives overall decent results in our experiment.
In this paper, a novel active control of camera parameters method is proposed in order to improve the robustness and adaptivity of vision-guided robotic systems, with an emphasis on object detection algorithms.
We first introduced a novel image dataset which incorporates ambient illumination and camera’s intrinsic parameters. Then, we presented our quantitative evaluation on the performance of object detection algorithms with respect to light conditions and sensor configurations. Our results reveal the sensor bias of vision algorithms, which necessitates a finer control of camera parameters for these algorithms to work in real-world applications.
Further, we proposed the active control of camera parameters method, from the perspective of algorithm performance. This approach was empirically evaluated and compared with the conventional auto-exposure method. Experimental results demonstrated the effectiveness of our proposed method in improving the robustness of vision algorithms to illumination variation.
More importantly, our work acts as a proof of principle on how to achieve illumination robustness and camera parameter optimization. The methodology can be summarized as: 1) create an image dataset by sampling the camera parameter space; 2) benchmark vision algorithms of interest on the dataset and compute performance statistics; 3) optimize camera parameters using these statistics.
VII Future Work
In the experiments, the performance tables are obtained with five objects in a simple scene. In the future, the tests should be extended to include multiple objects in different backgrounds and from different viewpoints. Also, an experiment with more than one camera would demonstrate the robustness of the method.
- Adini et al.  Yael Adini, Yael Moses, and Shimon Ullman. Face recognition: The problem of compensating for changes in illumination direction. IEEE Transactions on pattern analysis and machine intelligence, 19(7):721–732, 1997.
- Aloimonos et al.  John Aloimonos, Isaac Weiss, and Amit Bandyopadhyay. Active vision. International Journal of Computer Vision, 1(4):333–356, 1988.
- Andreopoulos and Tsotsos  Alexander Andreopoulos and John K Tsotsos. On sensor bias in experimental methods for comparing interest-point, saliency, and recognition algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(1):110–126, 2012.
- Bajcsy  Ruzena Bajcsy. Active perception vs. passive perception. In IEEE Workshop on Computer Vision Representation and Control, Bellaire, Michigan, 1985.
- Belhumeur and Kriegman  Peter N Belhumeur and David J Kriegman. What is the set of images of an object under all possible illumination conditions? International Journal of Computer Vision, 28(3):245–260, 1998.
- Bileschi  Stanley Bileschi. CBCL streetscenes challenge framework (2007). http://cbcl.mit.edu/software-datasets/streetscenes/.
- Brunnström et al.  Kjell Brunnström, Jan-Olof Eklundh, and Tomas Uhlin. Active fixation for scene exploration. International Journal of Computer Vision, 17(2):137–162, 1996.
- Dalal and Triggs  Navneet Dalal and Bill Triggs. Histograms of oriented gradients for human detection. In IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pages 886–893. IEEE, 2005.
- Everingham et al.  Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2):303–338, 2010.
- Felzenszwalb et al.  Pedro F Felzenszwalb, Ross B Girshick, David McAllester, and Deva Ramanan. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9):1627–1645, 2010.
- Girshick et al.  Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pages 580–587. IEEE, 2014.
- Han et al.  Hu Han, Shiguang Shan, Xilin Chen, and Wen Gao. A comparative study on illumination preprocessing in face recognition. Pattern Recognition, 46(6):1691–1699, 2013.
- He et al.  Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9):1904–1916, 2015.
- Healey and Kondepudy  Glenn E Healey and Raghava Kondepudy. Radiometric CCD camera calibration and noise estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(3):267–276, 1994.
- Johnson  Bruce K Johnson. Photographic exposure control system and method, January 3 1984. US Patent 4,423,936.
- Lin et al.  Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755. Springer, 2014.
- Linderoth et al.  Magnus Linderoth, Anders Robertsson, and Rolf Johansson. Color-based detection robust to varying illumination spectrum. In IEEE Workshop on Robot Vision, pages 120–125. IEEE, 2013.
- Llano et al.  Eduardo Garea Llano, Heydi Mendez Vazquez, Josef Kittler, and Kieron Messer. An illumination insensitive representation for face verification in the frequency domain. In International Conference on Pattern Recognition, volume 1, pages 215–218. IEEE, 2006.
- Lowe  David G Lowe. Distinctive image features from scale-invariant keypoints. International journal of computer vision, 60(2):91–110, 2004.
- Lu et al.  Huimin Lu, Hui Zhang, Shaowu Yang, and Zhiqiang Zheng. Camera parameters auto-adjusting technique for robust robot vision. In IEEE International Conference on Robotics and Automation, pages 1518–1523. IEEE, 2010.
- Maier et al.  Werner Maier, Michael Eschey, and Eckehard Steinbach. Image-based object detection under varying illumination in environments with specular surfaces. In IEEE International Conference on Image Processing, pages 1389–1392. IEEE, 2011.
- Osadchy and Keren  Margarita Osadchy and Daniel Keren. Image detection under varying illumination and pose. In IEEE Conference on Computer Vision, volume 2, pages 668–673. IEEE, 2001.
- Osadchy and Keren  Margarita Osadchy and Daniel Keren. Efficient detection under varying illumination conditions and image plane rotations. Computer Vision and Image Understanding, 93(3):245–259, 2004.
- Russakovsky et al.  Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
- Salton and McGill  Gerard Salton and Michael J McGill. Introduction to modern information retrieval. 1986.
- Sampat et al.  Nitin Sampat, Shyam Venkataraman, Thomas Yeh, and Robert L Kremens. System implications of implementing auto-exposure on consumer digital cameras. In Electronic Imaging’99, pages 100–107. International Society for Optics and Photonics, 1999.
- Shim et al.  Inwook Shim, Joon-Young Lee, and In So Kweon. Auto-adjusting camera exposure for outdoor robotics using gradient information. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1011–1017. IEEE, 2014.
- Tanaka et al.  Tatsuya Tanaka, Atsushi Shimada, Daisaku Arita, and Rin-ichiro Taniguchi. Object detection under varying illumination based on adaptive background modeling considering spatial locality. In Pacific-Rim Symposium on Image and Video Technology, pages 645–656. Springer, 2009.
- Tang et al.  Yichuan Tang, Ruslan Salakhutdinov, and Geoffrey Hinton. Deep lambertian networks. In International Conference on Machine Learning, 2012.
- Tenenbaum  Jay Martin Tenenbaum. Accommodation in computer vision. Technical report, DTIC Document, 1970.
- Tsotsos  John K Tsotsos. On the relative complexity of active vs. passive visual search. International Journal of Computer Vision, 7(2):127–141, 1992.
- Uijlings et al.  Jasper RR Uijlings, Koen EA van de Sande, Theo Gevers, and Arnold WM Smeulders. Selective search for object recognition. International Journal of Computer Vision, 104(2):154–171, 2013.
- Wei and Lai  Shou-Der Wei and Shang-Hong Lai. Robust face recognition under lighting variations. In International Conference on Pattern Recognition, volume 1, pages 354–357. IEEE, 2004.
- Westerhoff et al.  Jens Westerhoff, Mirko Meuter, and Anton Kummert. A generic parameter optimization workflow for camera control algorithms. In IEEE International Conference on Intelligent Transportation Systems, pages 944–949. IEEE, 2015.