A Robust Real-Time Computing-based Environment Sensing System for Intelligent Vehicle

01/27/2020 ∙ by Qiwei Xie, et al. ∙ 29

For intelligent vehicles, sensing the 3D environment is the first and a crucial step. In this paper, we build a real-time advanced driver assistance system on a low-power mobile platform. The system integrates several schemes: it combines a stereo matching algorithm with a machine-learning-based obstacle detection approach and exploits the distributed computing capability of a mobile platform with a GPU and CPUs. First, a multi-scale fast MPV (Multi-Path-Viterbi) stereo matching algorithm is proposed, which generates a robust and accurate disparity map. Then a machine learning method based on the fusion of monocular and binocular information is applied to detect obstacles. We also propose an automatic fast calibration mechanism based on Zhang's calibration method. Finally, distributed computing and a carefully designed data flow are applied to ensure the operational efficiency of the system. The experimental results show that the system achieves robust and accurate real-time environment perception for intelligent vehicles and can be used directly in commercial real-time intelligent driving applications.




1 Introduction

For autonomous driving or advanced driver assistance systems (ADAS), object detection based on 3D environment perception is one of the key components. The literature describes four major types of 3D environment perception approaches: stereo-vision-based, LiDAR-based, radar-based, and multi-sensor hybrid approaches [1, 2, 3, 4, 5].

LiDAR and radar have been applied extensively to detect obstacles in intelligent vehicles. They use laser light and radio waves, respectively, to measure the distance to objects [6, 7, 8, 9, 10]. Compared to LiDAR and radar, binocular stereo vision does not involve motion artifacts and can generate much denser depth information. Besides, neither LiDAR nor radar can detect objects smaller than a certain size. For example, for a typical LiDAR-Velodyne HDL-E, we can easily calculate that the minimum detectable height of the object is m where the LiDAR is mounted m from the ground. In other words, this system may miss objects below m. However, for vehicles traveling on highways, objects of this size may become a threat to safe driving. The binocular stereo matching algorithm can generate disparity information dense enough to detect such small objects, owing to its much higher image resolution compared to LiDAR and radar.

This paper presents a real-time ADAS based on binocular stereo vision running on a low-power mobile platform. A fast calibration system is proposed to calibrate the binocular camera. A multi-scale fast MPV (Multi-Path-Viterbi) algorithm is also proposed to fit the limited resources of the mobile platform. Real-time detection is achieved with a simple and effective recognition scheme. In addition, distributed computing across the processing units is used to speed up computation and enhance the robustness of the system. Extensive experiments verify the feasibility of the binocular driver assistance system and demonstrate its applicability on the target hardware.

1.1 Related Works

We conducted the literature review in four aspects, including fast calibration method, perception system, road detection, and distributed computing technology.

1.1.1 Fast calibration method.

Binocular camera calibration is the process of estimating the intrinsic parameters of the two monocular cameras and the extrinsic parameters of the binocular camera pair. In traditional methods, calibration algorithms are used to construct the geometric model of the optical lens [11, 12, 13]. Recently, calibration methods based on neural networks [14] have been proposed. An existing study [15] shows that Zhang's technique not only avoids the complex operations of traditional calibration methods, but also ensures higher accuracy and stability. However, Zhang's algorithm requires multiple images of a planar calibration grid. Based on Zhang's algorithm, we propose a new automatic fast calibration system.

1.1.2 Perception system.

The main hardware of our perception system is a stereo vision sensor consisting of a binocular camera and an ISP chip. We adopt the multi-scale fast MPV stereo matching algorithm, which is based on a hierarchical bi-directional Viterbi process constrained by Total Variation (TV) [2, 16]. We support this strategy with a mathematical proof showing that the proposed fast MPV method is suitable for running on our mobile platform. We also extend the original fast MPV algorithm to a multi-scale fast MPV algorithm by means of virtual nodes.

1.1.3 Road detection.

There are several road detection approaches for intelligent vehicles in the literature. They can be divided into three categories: the monocular-based method [17], the homography-transformation-based method [18], and the stereo-matching-based method. The stereo-matching-based method can be further classified into two sub-categories according to the space where the processing is performed: v-disparity space [19] and Euclidean space [20]. The former is faster than the latter but requires a flatter road [21].

This paper proposes a fully unsupervised way to detect objects by adding an extra Viterbi process on the v-disparity space. We use a fast MPV stereo matching algorithm to generate a disparity map that is robust and accurate compared with other state-of-the-art real-time stereo matching algorithms. The multi-scale fast MPV algorithm includes two parts: estimating disparity and estimating epipolar line distortion. First, we run the Viterbi process along 4 bi-directional paths [22, 23] to estimate disparity. Then, based on the Viterbi results, a convex optimization equation is derived to estimate the epipolar line distortion. Finally, the two parts are combined into an online framework for stereo matching.

Based on the result of binocular stereo matching, we also propose an image fusion technique that combines the binocular disparity map with the monocular gray map. This fusion provides the regions of interest (ROI) in the images for our system. Finally, we propose a target recognition method based on machine learning to improve the robustness and accuracy of detecting distant objects.

1.1.4 Processing unit.

In this paper, we propose a hybrid programming method based on the combination of CPU and GPU that differs from other approaches in the literature [24]. In our approach, the GPU is used to offload computation from the CPU. We take advantage of distributed computing by assigning the binocular stereo matching algorithm to the GPU and target recognition, implemented with a cascade AdaBoost machine learning method [25], to the CPU. The AdaBoost algorithm combines several weak classifiers non-randomly, assigning a weight to each, to build a strong classifier with superior performance. This improves the accuracy of the system.
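To make the weighted combination of weak classifiers concrete, the following is a minimal sketch of discrete AdaBoost with threshold stumps as weak learners. It is an illustration under stated assumptions, not the cascade used in this paper: the function names (`train_stump`, `adaboost_fit`, `adaboost_predict`), the stump weak learner, and the toy data are all hypothetical.

```python
import numpy as np

def stump_predict(X, feature, threshold, polarity):
    """Weak classifier: +1 on one side of the threshold, -1 on the other."""
    return np.where(polarity * (X[:, feature] - threshold) >= 0, 1, -1)

def train_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)  # (error, feature, threshold, polarity)
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            for polarity in (1, -1):
                pred = stump_predict(X, feature, threshold, polarity)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best

def adaboost_fit(X, y, rounds=10):
    """Discrete AdaBoost; labels y must be in {-1, +1}."""
    w = np.full(len(y), 1.0 / len(y))      # start with uniform sample weights
    model = []
    for _ in range(rounds):
        err, feature, threshold, polarity = train_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # weight of this weak classifier
        pred = stump_predict(X, feature, threshold, polarity)
        w *= np.exp(-alpha * y * pred)          # emphasise misclassified samples
        w /= w.sum()
        model.append((alpha, feature, threshold, polarity))
    return model

def adaboost_predict(model, X):
    """Strong classifier: sign of the alpha-weighted vote of all stumps."""
    score = np.zeros(len(X))
    for alpha, feature, threshold, polarity in model:
        score += alpha * stump_predict(X, feature, threshold, polarity)
    return np.where(score >= 0, 1, -1)
```

Each round re-weights the training samples so that the next weak classifier concentrates on the examples the previous ones got wrong, which is why the combination is non-random.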

CUDA is also used in our system, which is a parallel programming model and software environment launched by NVIDIA [26]. It uses a high-level language as a programming language and provides a large number of high-performance computing instructions. CUDA can fully utilize the GPU’s large-scale parallel computing capability [27].

1.2 Our Contributions

The contributions of this paper are as follows.

  • An advanced driver assistance system on mobile platforms for intelligent vehicles is proposed in this paper, which consists of the hardware design (system design and fast calibration device) and software design (algorithms and data management). The system can not only be implemented in real-time, but also be robust under different weather and lighting conditions.

  • A fast binocular calibration method has been proposed for our system and special fast calibration apparatus is designed for this purpose.

  • A multi-scale fast MPV algorithm is proposed in order to run a real-time system on mobile platforms. This method can provide dense disparity information, road detection and segmentation, and obstacle detection.

  • A machine learning classifier is trained for our object recognition system, with the corresponding ROI sets labeled manually. Our classifier is sensitive to distant objects, which improves the detection accuracy of the system.

  • Distributed computing and efficient data management provide an important foundation for the real-time operation of the system.

1.3 Organization of The Paper

The rest of the paper is organized as follows. The overview of the proposed system is introduced in Section 2. The principle of the proposed system is elaborated in Section 3. The experiment results and analysis are presented in Section 4. Conclusions are drawn in Section 5.

2 Overview of The Proposed System

In this section, we briefly introduce the whole system design, multi-scale fast MPV algorithm for stereo matching, the target recognition method based on monocular and binocular fusion, fast calibration method and the device, the hardware structure of the mobile platform, and the data management.

2.1 System Design

Our system follows the procedure shown in Figure 1. First, two images (left and right) are input into the system. The stereo matching algorithm, which comprises a structural similarity (SSIM) algorithm and a bi-directional Viterbi algorithm, computes the disparity map. From the disparity map, we calculate the histograms of disparity in the horizontal and vertical directions to obtain the road model. Then, based on the road model and the disparity map, the obstacle ROI is computed. Combining the obstacle ROI with the left image achieves the fusion detection of monocular and binocular information. Finally, we collect all the ROI sets and label them manually to train an AdaBoost classifier, which identifies whether there is an obstacle in the ROI. If the classifier detects an obstacle, a red rectangle is drawn around it in the original left image.

Figure 1: System working flow.

2.2 Stereo Matching

Our proposed disparity estimation method has the following characteristics.

  • A bi-directional Viterbi algorithm over a total of 4 paths is used to decode the matching cost space, and a hierarchical strategy is proposed to merge the 4 paths to further decrease the decoding error in [28].

  • We apply the TV constraint [29] into the Viterbi path in order to approximately model 3D plane at different orientations to achieve similar effects to Total Generalized Variation (TGV) [30] and Slanted-plane models [31].

  • We apply a fast calculation technique to find the best Viterbi path, and use a multi-scale method to greatly increase the computational speed with a small loss of accuracy.

  • We use SSIM to measure the pixel difference between the left and right images along epipolar lines.

2.3 Target Recognition

Based on the result of binocular stereo matching, we propose an image fusion technique to overlay two image products, namely binocular disparity map and monocular gray map [32, 33]. This fusion method can provide the ROI of images for our system.

We also propose a target recognition method that uses a machine learning algorithm to improve the robustness and accuracy of detecting distant objects. It has the following characteristics.

  • Based on the result of stereo matching, we locate the ROI of suspected distant objects in the disparity map, and then extract the corresponding ROI in the monocular image. This reduces the dimension of the input data and saves time in window transformation, which helps ensure real-time detection.

  • The distant objects are detected by a cascade classifier that is trained by the AdaBoost algorithm based on Local Binary Pattern (LBP) features. These features can measure and describe image local texture information with low sensitivity to illumination. In addition, the algorithm is not complicated and easy to implement. Experiments show that our method is robust and accurate on processors with relatively low performance.
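The LBP feature mentioned in the list above can be sketched as follows. This is the basic 3x3 operator with 8 neighbours, written under assumptions: the neighbour ordering, the `>=` comparison, and the function names `lbp_8_1` and `lbp_histogram` are illustrative choices rather than the exact operator configuration of this paper.

```python
import numpy as np

def lbp_8_1(gray):
    """Basic 3x3 LBP code for every interior pixel of a 2D grayscale array."""
    g = np.asarray(gray, dtype=np.int32)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    # Eight neighbours, clockwise from the top-left; neighbour k sets bit k.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes += (neighbour >= center).astype(np.int32) << bit
    return codes.astype(np.uint8)

def lbp_histogram(gray):
    """256-bin histogram of LBP codes, usable as a texture descriptor."""
    return np.bincount(lbp_8_1(gray).ravel(), minlength=256)
```

Because each bit only records whether a neighbour is at least as bright as the centre, adding a constant brightness offset to the whole patch leaves the codes unchanged, which illustrates the low sensitivity to illumination noted above.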

2.4 Fast Calibration

Following Zhang's algorithm, we design an automatic fast calibration method and a dedicated device for the system, which allow our system to be calibrated without excessive manual intervention and ensure consistent calibration results.

2.5 Hardware Structure of Mobile Platform

The system consists of binocular cameras, ISP (Image Signal Processing) module, CPUs & GPU modules, and power system. The assembly sketch is shown in Figure 2. The baseline of our binocular cameras is mm, the focal length of the lens is mm and the resolution of the sensors is pixels. Specific parameters of our system are shown in Table 1 and our development platform is shown in Figure 3.

Figure 2: Assembly sketch. Left: System Appearance. Right: installation diagram. 1. OBD power. 2. Our system.
Items Parameters
Effective measurement distance m
Horizontal field of view degree
Dynamic range DB
Resolution ratio
Baseline distance mm
Focal length mm
Data depth bits
Service voltage V
Table 1: System parameters.
Figure 3: Development platform.

To improve operational efficiency, our system adopts parallel computing. It is built around an NVIDIA Kepler "GK20a" GPU with 192 SM3.2 CUDA cores (up to 326 GFLOPS) and an NVIDIA "4-Plus-1" 2.32 GHz ARM quad-core Cortex-A15 CPU with a Cortex-A15 battery-saving shadow core. The system structure is shown in Figure 4; it consists of two cameras and a data processing and control unit. Following Moore's law, a processor can contain hundreds of millions of transistors, but in a CPU most of them are used for caches that serve single-threaded programs. This keeps the processor power consumption within a reasonable range, but it hinders further improvement of performance.

Figure 4: System structure.

Unlike a CPU, a GPU has a large number of execution units and devotes only a small share of its resources to data caching and instruction flow control. The combination of CPU and GPU can therefore provide better performance.

2.6 Data Management

Because the system must run in real time, we use distributed computing and draw on robust software partitioning techniques [34] to speed up the operations. In our system, the GPU is used only to run the fast stereo matching algorithm, programmed with CUDA following [35], which implements a higher abstraction model. The four CPU cores execute instruction scheduling, road detection, target recognition, and other computing tasks. Following [36], we use parallel computing methods to assign tasks to the CPU cores: the main core executes instruction scheduling, while road detection and target recognition run on the other three cores at the same time, as shown in Figure 5.

Figure 5: The logical architecture of CPUs and GPU.

3 The Principles of The Proposed System

In this section, we present the detailed techniques of our proposed system in eight subsections. The fast automatic stereo calibration system is introduced in Subsection 3.1. Then, the principle of the stereo matching algorithm, the fast calculation method for finding the best Viterbi path, and the multi-scale image matching approach are presented in Subsections 3.2, 3.3, and 3.4, respectively. Next, Subsections 3.5, 3.6, and 3.7 describe the road and obstacle detection method, the target recognition, and the distributed computing technology, respectively. Finally, the data flow of our system is presented in Subsection 3.8.

3.1 Fast Automatic Stereo Calibration System

Stereo calibration provides the basis for matching binocular cameras [37]. Camera calibration is a prerequisite for the binocular stereo vision system, and the accuracy of the calibration parameters plays a crucial role in subsequent processing in the system [38]. In this paper, our system is calibrated based on Zhang’s method [13].

Binocular stereo vision works as follows: two imaging devices acquire two images of the measured object from different locations, and the cameras' internal and external parameters are calculated. The distance between the object and the imaging devices is obtained from the positional deviation between corresponding pixels of the two images (referred to as "disparity"). In this way, the three-dimensional information of an object in the camera coordinate system can be acquired. A binocular stereo vision apparatus typically consists of two identical cameras placed side by side and separated by a fixed distance (the "baseline distance"). It is commonly referred to as a binocular stereo camera ("binocular camera" for short in this paper).
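As a worked illustration of how distance follows from disparity, the standard rectified pinhole stereo model gives the depth directly. The symbols $f$ (focal length in pixels), $B$ (baseline distance) and $d$ (disparity in pixels), as well as the numeric values, are assumptions chosen for illustration and are not the parameters of the system in this paper.

```latex
Z = \frac{f B}{d}, \qquad
\text{e.g. } f = 1000\ \text{px},\; B = 0.12\ \text{m},\; d = 20\ \text{px}
\;\Rightarrow\; Z = \frac{1000 \times 0.12}{20} = 6\ \text{m}.
```

Because $Z$ is inversely proportional to $d$, a one-pixel disparity error changes the estimated distance far more for distant objects than for near ones.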

To obtain accurate three-dimensional information of the measured object in the camera coordinate system, the binocular camera is usually calibrated according to Zhang's camera calibration method using a black-and-white grid calibration board, which yields a series of calibration parameters. These include the internal parameter matrices (such as the distortion parameter matrix), the external parameter matrices, and so on.

The calibration process is as follows: image pairs are captured by camera 1 and camera 2. After each image pair is captured, the position of the calibration board relative to the binocular camera is adjusted in a regular manner. In this way, the binocular camera acquires dozens of image pairs or even dozens of calibration boards. Each image pair must also meet certain conditions: for example, all the grid corners on the calibration board must lie in the field of view of both camera 1 and camera 2 and be accurately extractable by the algorithm, and the calibration board must always be flat and clean. The calibration process is shown in Figure 6.

Figure 6: The calibration process of the binocular camera.

Since the calibration process of the binocular camera is complicated and the steps are cumbersome, and it is usually operated manually, it becomes the biggest bottleneck that affects the production efficiency in the mass production process of the binocular camera. The proposed fast automatic binocular camera calibration device (hereinafter referred to as “calibration device”) solves the efficiency problem of the binocular camera calibration process and reduces the calibration time of a single binocular camera to less than minute.

The calibration workbench is shown in Figure 7. A checkerboard serves as the calibration board. Through stereo calibration, we obtain the intrinsic and extrinsic parameters and the coefficients of the distortion model. We can then calculate a remapping table that gives the mapping between pixels in the corrected image and pixels in the original image.

Figure 7: Workbench of binocular camera calibration.

Image correction includes both monocular distortion removal and binocular epipolar alignment. As shown in Figure 8, the top images (a) are the original images, in which the epipolar lines are not aligned: the same corner points in the circles do not lie on the same line in the left and right images. The bottom images (b) are the corrected images, in which the epipolar lines have been aligned: the same corner points in the circles lie on the same line in the left and right images.

Figure 8: Stereo calibration.

The undistorted image obtained by stereo calibration provides a good foundation for subsequent processing. Unless otherwise specified, the following images in this paper are all corrected images.

3.2 Stereo Matching

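In this paper, after the calibration, all images conform to the pinhole image model and the coordinate origin is at the center of the image. We define $I_L$ and $I_R$ as the rectified left and right images, and $x$ and $y$ as the image patches located at $p_L$ and $p_R$, respectively. $\mu$, $\sigma^2$ and $\sigma_{xy}$ denote the mean, variance and covariance. Approximately, $\mu$ and $\sigma$ can be viewed as estimates of the luminance and contrast, and $\sigma_{xy}$ measures the tendency of $x$ and $y$ to vary together. This paper follows [39], and the luminance, contrast and structure similarity measures are given as follows:

\[
l(x, y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \qquad
c(x, y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \qquad
s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}
\]

where $C_1$, $C_2$ and $C_3$ are constants given by $C_1 = (K_1 L)^2$, $C_2 = (K_2 L)^2$ and $C_3 = C_2 / 2$, respectively. $L$ is the dynamic range of the pixel values, and $K_1$ and $K_2$ are two scalar constants. The cost function is defined as follows:

\[
SSIM(p, d) = [l(x, y)]^{\alpha} \, [c(x, y)]^{\beta} \, [s(x, y)]^{\gamma}
\]

where $\alpha$, $\beta$ and $\gamma$ are parameters that define the relative importance of the corresponding three components, $d$ is the disparity and $p$ is the pixel coordinate in the image. We adopt a fast calculation method for $SSIM(p, d)$, as shown in Figure 9.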

Figure 9: Fast calculation method of SSIM.
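The SSIM-based matching measure can be sketched for a single pair of patches as follows. This is a direct, unoptimized computation and not the fast method of Figure 9. It assumes the common simplification of equal exponents with the third constant set to half the second, which merges the contrast and structure terms, and it assumes the default constants 0.01 and 0.03 from the SSIM literature. The names `ssim_patch` and `matching_cost`, the 5x5 window, and the use of `1 - SSIM` as a cost are illustrative assumptions.

```python
import numpy as np

def ssim_patch(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """SSIM between two equally sized patches (1.0 means identical)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

def matching_cost(left, right, row, col, disparity, half=2):
    """Cost of matching left pixel (row, col) to right pixel (row, col - disparity)."""
    lp = left[row - half:row + half + 1, col - half:col + half + 1]
    rp = right[row - half:row + half + 1,
               col - disparity - half:col - disparity + half + 1]
    return 1.0 - ssim_patch(lp, rp)   # low cost when the patches are similar
```

Evaluating `matching_cost` for every candidate disparity of a pixel yields one column of the matching cost space that the Viterbi process decodes.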

The MPV algorithm includes two parts. The first part estimates disparity by a Viterbi process. The second part is the path-merging strategy, which uses bi-directional Viterbi paths (horizontal, vertical, and two diagonal) on the matching space to provide good coverage of the image.

We introduce a total variation (TV) constraint in the Viterbi path to constrain the disparity variation. Because the TV constraint is applied to all the paths independently, planes at different orientations can be approximately modeled by at least one path. Therefore, it can model 3D objects with one or multiple slanted planes. The TV constraint is useful for smoothing non-textured areas, such as roads or car bodies, which are common in driving scenes but hard for stereo matching algorithms. In addition, we use the intensity gradient information to control the regularization level of the TV constraint and keep edges sharp.

Figure 10: Total Variation (TV) constraint.

Figure 10 shows the principle of the TV constraint in our method. Both and are the focus of the left and right camera and is the baseline. The is an object point at a plane and the coordinate of satisfies the following formula:


Denote as the disparity of , then


According to above hypothesis, we can obtain the following formula:


For example, for a plane perpendicular to the optical axis , then and . In addition, we take the ground plane as an example , then . Since ADAS focuses on the road surface and the corresponding obstacles, it is reasonable to add TV constraint, which is expressed by defining the energy on the disparity map as follows:


where is the TV constraint modified by the gradient of image . It penalizes all the disparity changes between and , where has disparity and belongs to the neighborhood of . The is the tradeoff parameters to balance the TV term and the fit term.

The stereo matching solution can be formulated as finding the disparity map that minimizes the energy function . The Viterbi algorithm can be used to approximate the optimum solution [40]. In this case, the Viterbi trellis represents a graph of disparity states for all pixels. Each node in this trellis represents a disparity assigned to a pixel and each edge represents a possible disparity change between two adjacent pixels in the same Viterbi path, as shown in Figure 11. We define the energy of a node as with pixel and disparity , as following.

Figure 11: Trellis diagram for nodes and edges in a same Viterbi path.

According to Viterbi algorithm:


where, means the connected nodes from to , and means the previous node at the same Viterbi path. In normal Viterbi algorithm, node number of is generally small and the total computational cost for one pixel is , where indicates the number of nodes. In our MPV algorithm, we set the as all the possible Viterbi nodes. This setup can keep edge sharp for outdoor scenes but it increases the computational cost to .
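A single-direction Viterbi pass along one scanline can be sketched as below, with every disparity of the previous pixel treated as a connected node. The sketch assumes a linear penalty `lam * |d - d_prev|` as a stand-in for the TV term and omits the gradient weighting. The names `viterbi_scanline` and `lam` are hypothetical, and the bi-directional merging is not included.

```python
import numpy as np

def viterbi_scanline(cost, lam):
    """Decode one scanline.

    cost[i, d] is the matching cost of pixel i at disparity d.
    Returns the disparity chosen for each pixel.
    """
    n, m = cost.shape
    d = np.arange(m)
    penalty = lam * np.abs(d[:, None] - d[None, :])   # penalty[d, d_prev]
    energy = np.empty((n, m))
    back = np.zeros((n, m), dtype=np.int64)
    energy[0] = cost[0]
    for i in range(1, n):
        # total[d, d_prev]: energy of reaching disparity d from d_prev.
        total = energy[i - 1][None, :] + penalty
        back[i] = total.argmin(axis=1)
        energy[i] = cost[i] + total.min(axis=1)
    # Backtrack from the best final node.
    path = np.empty(n, dtype=np.int64)
    path[-1] = energy[-1].argmin()
    for i in range(n - 1, 0, -1):
        path[i - 1] = back[i, path[i]]
    return path
```

Each pixel compares all m predecessors for each of its m nodes, so this direct form costs on the order of m squared operations per pixel, which motivates a faster update.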

Figure 12: Merging strategy in Viterbi.

In each layer, we apply the bi-directional Viterbi algorithm according to Equation 5. Then, we update the energy of the Viterbi nodes using the optimum energy of the two opposite directions. For the horizontal path, we use the minimum function to sharpen edges, and for the other paths we use the average function to suppress noise. After one layer is finished, the energy of its Viterbi nodes is used as the initial energy of the Viterbi nodes at the next layer. Several specific strategies can be applied to the path merging in every layer. For example, as shown in Figure 12, we apply a doubled penalty to the left Viterbi path when the disparity changes from a small value to a large one, which helps to improve the performance in occluded areas.

Figure 13: Hierarchical structure for the merging of multiple Viterbi paths.

As shown in Figure 13, we use bi-directional (horizontal, vertical, and two diagonal) Viterbi paths on the matching space to provide good coverage of the image. Horizontal directions have stronger constraints than the other directions. In our approach, we use the results of the horizontal directions as strong posterior information to calculate the optimum paths of the other directions. There are hierarchical layers in this paper, and we apply the bi-directional Viterbi algorithm in each layer. Then, we update the energy of the Viterbi nodes using the optimum energy of the two opposite directions. The Multi-Path-Viterbi algorithm is summarized in Table 2, and Figure 14 shows an example of the search paths.

Algorithm: Multi-path viterbi algorithm
Input: Previous Pixels
Output: new Viterbi energy
Step 1: Compute the left and right bi-directional Viterbi algorithm;
Step 2: Compute the up and down bi-directional Viterbi algorithm;
Step 3: Compute the right down and left up Viterbi algorithm;
Step 4: Compute the left down and right up Viterbi algorithm.
Table 2: Multi-path viterbi algorithm.
Figure 14: Hierarchical structure for the merging of multiple Viterbi paths.

3.3 Fast calculation technique to find best Viterbi path

In order to implement the algorithm on real-time system, we propose a new fast calculation technique to find the best Viterbi path. For normal Viterbi algorithm on epipolar line, if searching n branches for m disparity nodes, it needs searching. Now we can search m branches for m disparity nodes for searching. This new fast calculation technique is based on a new simplified searching path, as shown in Figure 15, which is derived from the following theorem.

Figure 15: Simplified technique for the Viterbi searching path.

The original Viterbi searching form (n term for comparison in minimum function) is Eq.(9):


We propose the following fast alternative formula, Eq.(10):


If replace as , we obtain Eq.(11):


We can get as Eq.(12):


If we add on both sides of the equation as Eq.(13):


Based on the definition , we derive the equation as Eq.(14):


Combining Eq.(13) and Eq.(14), we can obtain Eq.(15):


Integrating Eq.(15) with Eq.(11), we get Eq.(16):


Based on the above theorem, we reduce the complexity from to for normal Viterbi algorithm on epipolar.
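The kind of saving described in this subsection can be illustrated with a well-known two-sweep update for a linear transition penalty, in which each node only consults its two adjacent nodes. This sketch is offered as an illustration under the assumption of a penalty of the form `lam * |d - d_prev|` with `lam >= 0`. It is not claimed to be identical to the theorem derived above, and both function names are hypothetical.

```python
import numpy as np

def min_sum_bruteforce(prev, lam):
    """out[d] = min over d_prev of prev[d_prev] + lam * |d - d_prev|, O(m^2)."""
    d = np.arange(len(prev))
    table = prev[None, :] + lam * np.abs(d[:, None] - d[None, :])
    return table.min(axis=1)

def min_sum_two_pass(prev, lam):
    """Same result in O(m): one forward and one backward sweep."""
    out = np.array(prev, dtype=np.float64)
    for d in range(1, len(out)):             # best predecessor at or below d
        out[d] = min(out[d], out[d - 1] + lam)
    for d in range(len(out) - 2, -1, -1):    # best predecessor at or above d
        out[d] = min(out[d], out[d + 1] + lam)
    return out
```

The sweeps work because a jump of k disparity levels costs the same as k consecutive jumps of one level, so the best distant predecessor is always reachable through a chain of adjacent nodes.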

3.4 Multi-scale image matching approach

We use the multi-scale image matching approach as another fast calculation technique. Inspired by [41, 36], we adopt two main techniques: an image pyramid and a multi-scale disparity transformation. We reduce the image size by down-sampling, and the disparity from the previous layer is passed on and transformed to the next layer. We calculate the matching value over the full range only for each pixel at the top of the pyramid, while the pixels at the other layers are obtained by disparity transformation. The specific principle is as follows.

Firstly, down-sample the image to a preset scale. In this paper, we set the preset scale to three layers. That is, the image at the top of the pyramid (Layer ) is one-sixteenth of the original image (Layer , the middle layer is Layer ). Therefore, the matching range in the top layer is one quarter of the matching range of the original image.

Next, we split the image at the layer to some blocks which have the same pixels size. We implement the MPV algorithm for each block. As a result, every block has the initial disparity value. We suggest that the size of these block at the layer should not be too small, because there are mainly large feature objects.

Then, we pass these initial disparity values to the next layer. Obviously, not all pixels have initial disparity values at the layer, and only the pixels sampled to the previous layer have initial values. We continue to split images to smaller blocks than the previous layer. We suggest that the closer the image layer is to the original image, the smaller the size of the block, otherwise, more pyramid layers will need to be created. The mode of initial disparity at each small block is assigned as the initial value of the MPV algorithm. It should be noted that these initial values have to be transformed by the sampling rate.
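The propagation of initial disparities between pyramid layers can be sketched as follows. The sketch assumes a sampling rate of 2 per layer in each image dimension and, for simplicity, assigns an initial value to every pixel of the finer layer by repetition instead of only to the sampled pixels. The function names `block_mode` and `propagate_to_finer_layer` are hypothetical.

```python
import numpy as np

def block_mode(disparity, block):
    """Most frequent (non-negative integer) disparity in each block x block tile."""
    h, w = disparity.shape
    out = np.empty((h // block, w // block), dtype=np.int64)
    for by in range(h // block):
        for bx in range(w // block):
            tile = disparity[by * block:(by + 1) * block,
                             bx * block:(bx + 1) * block]
            out[by, bx] = np.bincount(tile.ravel()).argmax()
    return out

def propagate_to_finer_layer(coarse, factor=2):
    """Upsample coarse disparities and rescale them by the sampling rate."""
    fine = np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)
    return fine * factor   # a shift of d pixels becomes factor * d pixels
```

The multiplication by `factor` is the transformation by the sampling rate: a disparity measured on a half-resolution image corresponds to twice as many pixels on the next finer layer.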

In addition, the searching scope of the MPV algorithm is dynamically adjusted in each layer, so the searching scope differs between layers. For the current layer, we add some virtual nodes to support the MPV algorithm.

At last, when disparity value is transformed to the last layer, original images, we split each pixel as a block and initialize the uninitialized pixels by linear interpolation . Assume that the size of the image is

and the maximum searching scope is pixels, the maximum algorithm complexity of the original MPV algorithm is:


For the -scale MPV algorithm, the complexity of the layer is:


The complexity of the layer is:


And the complexity of the layer is:


Therefore, the total maximum algorithm complexity of the -scale MPV algorithm is:


Comparing Eq. (17) and Eq. (21), it is obvious that our improved algorithm is superior to the original one.

Figure 16: Multi-scale technique for the MPV algorithm.

Figure 16 shows the proposed multi-scale technique for the MPV algorithm. The bottom of Figure 16 demonstrates the implementation of the multi-scale technique. The four previous blocks have the same searching scope and different disparities (different colors). Each large block is split into four small blocks on the current layer with different initial disparities, and these small blocks have different searching scopes (different colors). In contrast, the small blocks have the same searching scope (same color) in the original algorithm, shown in the upper right corner. Figure 17 illustrates our implementation platform for the above fast calculation technique.

Figure 17: Toward Real-time Disparity Estimation.

3.5 Road and obstacle detection

Using the disparity map of the image, we can calculate the histogram of disparity map in horizontal and vertical directions. Then corresponds to the number of points with same disparity as at the horizontal image line in the disparity map :


where denotes the Kronecker delta. Similarly, for any pixel in , we have


Generally, we can detect the road model in and detect the obstacle in . The Radon transform is performed to detect the road. Given disparity map , the Radon transform to can be defined as:


where denotes Dirac delta function, is the distance from the line to the origin through the normal of the line that intersects the origin, and is the angle between the same normal and x-axis. provides a mapping from to a parameter space spanned by and .
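The row-wise and column-wise disparity histograms introduced at the start of this subsection can be sketched with NumPy as below. The sketch assumes an integer-valued, non-negative disparity map. The function names `v_disparity` and `u_disparity` and the array layouts are illustrative choices.

```python
import numpy as np

def v_disparity(disp, max_d):
    """hist[v, d] = number of pixels in image row v whose disparity equals d."""
    h, _ = disp.shape
    hist = np.zeros((h, max_d + 1), dtype=np.int64)
    for v in range(h):
        hist[v] = np.bincount(disp[v], minlength=max_d + 1)[:max_d + 1]
    return hist

def u_disparity(disp, max_d):
    """hist[d, u] = number of pixels in image column u whose disparity equals d."""
    return v_disparity(disp.T, max_d).T
```

On a flat road the disparity decreases steadily from the bottom image rows towards the horizon, so the road appears as a slanted line in the row-wise histogram, while upright obstacles appear as compact high-count segments.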

Viterbi searching space is built in the directly to detect the road and curb. We treat every pixel in the as a Viterbi node and the Viterbi process accumulates the value of each node to find a continuous path that has the maximum sum of value at proper straightness constraints. For road detection, the Viterbi equation is:


According to Viterbi algorithm, we have:


where is the parameter that controls the straightness of the road.
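The path search can be illustrated with a generic Viterbi-style dynamic program. The sketch below is not the paper's exact recursion: the transition penalty `smooth * abs(d - q)` is an assumed stand-in for the straightness term, and `viterbi_path`, `smooth`, and `max_step` are hypothetical names. It finds, row by row, the path with the maximum accumulated node value.

```python
def viterbi_path(scores, smooth=1.0, max_step=1):
    """Max-sum path through a 2-D score grid, one column index per row.

    scores[r][d] is the node value (for example, a histogram count).
    Moving from column q in row r-1 to column d in row r costs
    smooth * abs(d - q); jumps larger than max_step are not allowed.
    Returns (path, total) where path[r] is the chosen column in row r.
    """
    n_cols = len(scores[0])
    best = [[float(v) for v in scores[0]]]  # accumulated score per node
    back = []                               # best predecessor per node
    for r in range(1, len(scores)):
        row_best, row_back = [], []
        for d in range(n_cols):
            lo, hi = max(0, d - max_step), min(n_cols - 1, d + max_step)
            q = max(range(lo, hi + 1),
                    key=lambda p: best[-1][p] - smooth * abs(d - p))
            row_best.append(best[-1][q] - smooth * abs(d - q) + scores[r][d])
            row_back.append(q)
        best.append(row_best)
        back.append(row_back)
    # Backtrack from the best node in the last row.
    d = max(range(n_cols), key=lambda p: best[-1][p])
    path = [d]
    for row_back in reversed(back):
        d = row_back[d]
        path.append(d)
    path.reverse()
    return path, max(best[-1])


grid = [
    [5, 1, 0, 0],
    [1, 6, 1, 0],
    [0, 1, 7, 9],
    [0, 0, 1, 8],
]
path, total = viterbi_path(grid, smooth=1.0)
print(path, total)
```

A larger `smooth` value favors straighter paths at the expense of node value, which is the role the straightness parameter plays in the text.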

With the road area G and the disparity map , we can map every point in G to space using the camera parameters. Let denote the image coordinate of a pixel in the image and let u denote its disparity value, such that . Assume its coordinate is . Given the focal length and baseline length of the calibrated stereo vision system, we have according to the geometry of stereo vision.
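The stereo geometry above can be sketched with the standard pinhole triangulation relations, in which depth equals focal length times baseline divided by disparity. This is an illustration under assumptions, not the paper's own formula: the principal point `(cx, cy)`, the function name `pixel_to_camera`, and the sample numbers are hypothetical.

```python
def pixel_to_camera(x, y, disparity, focal_px, baseline_m, cx=0.0, cy=0.0):
    """Triangulate one pixel of a rectified stereo pair.

    x, y        pixel coordinates in the left image
    disparity   disparity at that pixel, in pixels (must be positive)
    focal_px    focal length in pixels
    baseline_m  distance between the two cameras, in meters
    cx, cy      principal point in pixels (assumed known from calibration)
    Returns (X, Y, Z) in meters in the left-camera frame.
    """
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    z = focal_px * baseline_m / disparity
    return (x - cx) * z / focal_px, (y - cy) * z / focal_px, z


point = pixel_to_camera(x=420.0, y=300.0, disparity=20.0,
                        focal_px=800.0, baseline_m=0.25,
                        cx=320.0, cy=240.0)
print(point)
```

Because depth is inversely proportional to disparity, a fixed disparity error costs more depth accuracy for distant points than for near ones.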

Suppose the height of the small object is and the equation of the road surface is . Then we can classify a pixel as part of a small object on the road if and only if:


3.6 Target recognition

After the road and obstacles have been detected in the disparity map, we can obtain the ROI windows [42, 43, 44] of the obstacles. We then extract the areas covered by these windows from the left image [45, 46, 47]. In this way, the obstacle candidates are provided to the target recognition system.

Monocular target recognition is realized by cascade classifiers based on the AdaBoost algorithm [48]. In general, most weak classifiers can be used to construct a cascade. The key property is that the computation time and detection rate can be adjusted [49]. We train each weak classifier with AdaBoost until the detection accuracy meets the specified constraints [50]. The cascade process [51] is shown in Figure 18. The initial classifier eliminates many negative examples with very little processing. Subsequent stages eliminate additional negatives but require additional computation. After several stages, only a few extreme negative examples remain.

Figure 18: Schematic depiction of a detection cascade.
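The early-rejection behavior of a detection cascade can be sketched as follows. This is a toy illustration only: the stage scoring functions, the thresholds, and the feature names `contrast` and `edge_density` are hypothetical and are not the trained stages used in the paper.

```python
def run_cascade(stages, sample):
    """Return (accepted, stages_evaluated) for one candidate window.

    stages: list of (score_fn, threshold) pairs, ordered from cheapest
    to most expensive. A sample is rejected as soon as one stage scores
    below its threshold, so most negatives cost only one or two stages.
    """
    for index, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(sample) < threshold:
            return False, index
    return True, len(stages)


# Hypothetical stages operating on precomputed toy features.
stages = [
    (lambda s: s["contrast"], 0.2),
    (lambda s: s["edge_density"], 0.5),
    (lambda s: 0.5 * s["contrast"] + 0.5 * s["edge_density"], 0.6),
]

flat = {"contrast": 0.05, "edge_density": 0.9}
weak = {"contrast": 0.3, "edge_density": 0.6}
textured = {"contrast": 0.8, "edge_density": 0.7}

for name, sample in [("flat", flat), ("weak", weak), ("textured", textured)]:
    print(name, run_cascade(stages, sample))
```

The design choice is that cheap stages come first: raising the number of stages lowers the false-positive rate but adds work for every window that survives the earlier stages.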

We train the cascade classifier with the LBP feature [52]. This feature acts as a texture descriptor for object detection. According to [53], we set a weight of to the central pixel. Following [54], the parameters of the LBP operator are and , where is the number of pixels and is the distance to the central pixel. The LBP is then defined as follows:


where is the value of the center pixel, is the value of the neighbor, and is the weight for each operation. The can be defined as follows:


Finally, we build the cascade as a linear combination of the selected weak classifiers, as follows.


where is a weak classifier and is a weight coefficient. Based on these weak classifiers, we can construct a high-performance cascade AdaBoost classifier that runs in real time [55].
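The LBP feature used to train the cascade can be illustrated with the basic 8-neighbor operator at radius 1. This sketch assumes the textbook form of LBP (threshold each neighbor against the center and pack the results into one byte); it does not reproduce the central-pixel weighting adopted from [53], and the function name `lbp_code` and the neighbor ordering are hypothetical choices.

```python
def lbp_code(image, r, c):
    """Basic 8-neighbor LBP code of pixel (r, c).

    image is a grayscale image stored as a list of rows. Each neighbor
    contributes one bit: 1 if its value is >= the center value, else 0.
    """
    center = image[r][c]
    # Neighbors visited clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
brighter_patch = [[v + 50 for v in row] for row in patch]

print(lbp_code(patch, 1, 1))
print(lbp_code(brighter_patch, 1, 1))
```

Adding a constant to every pixel leaves the code unchanged, which is the gray-level invariance that makes LBP attractive under changing illumination.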

3.7 Distributed computing technology

On our platform, the GPU has powerful graphics computing capabilities. The hardware calculation flow of the fast SSIM algorithm, shown in Figure 17, exploits this GPU power so that our system can process data in real time.

We divide the environment sensing task into two parts: disparity map calculation and target recognition. Distributed computation is implemented in our system so that the two parts run in parallel. We calculate the disparity map on the GPU and perform target recognition on the CPU. The distributed computation [56] process is shown in Figure 19.

Figure 19: Task segmentation based on distributed computation.

Binocular images are input to the GPU, where the disparity map is calculated. The GPU then feeds the disparity map and the left image to the ARM, on which target recognition is implemented. At the same time, the GPU computes the next disparity map [57]. In addition to target recognition, the ARM carries out system control for the rest of the time [58]. In this way, our system can achieve real-time detection.
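The overlap between the two stages can be sketched as a two-stage producer/consumer pipeline. The example below is only an analogy built on Python threads and a bounded queue: the "GPU" stage is a placeholder subtraction rather than stereo matching, and all names (`disparity_stage`, `recognition_stage`, `handoff`) are hypothetical.

```python
import queue
import threading


def disparity_stage(frames, out_queue):
    """Stand-in for the GPU stage: emits one 'disparity map' per stereo pair."""
    for frame_id, (left, right) in enumerate(frames):
        disparity = [l - r for l, r in zip(left, right)]  # placeholder work
        out_queue.put((frame_id, left, disparity))
    out_queue.put(None)  # sentinel: no more frames


def recognition_stage(in_queue, results):
    """Stand-in for the CPU stage: consumes disparity maps as they arrive."""
    while True:
        item = in_queue.get()
        if item is None:
            break
        frame_id, left, disparity = item
        results.append((frame_id, max(disparity)))


frames = [
    ([5, 7, 9], [4, 4, 4]),
    ([8, 8, 8], [7, 5, 6]),
    ([3, 6, 4], [3, 3, 3]),
]
# maxsize=1 lets the producer work on the next frame while the consumer
# is still handling the current one, without unbounded buffering.
handoff = queue.Queue(maxsize=1)
results = []
producer = threading.Thread(target=disparity_stage, args=(frames, handoff))
consumer = threading.Thread(target=recognition_stage, args=(handoff, results))
producer.start()
consumer.start()
producer.join()
consumer.join()
print(results)
```

With this structure the throughput is bounded by the slower of the two stages rather than by their sum, which is the benefit the text attributes to running disparity estimation and recognition concurrently.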

In our method, many of the complicated calculations are done by the GPU. With GPU acceleration, our stereo matching algorithm runs in less than ms. The GPU acceleration unit is shown in Figure 20: the flow chart on the left is the GPU calculation process, and the block diagram on the right is a schematic of the GPU hardware acceleration.

Figure 20: GPU acceleration flow.

3.8 Data flow in system

The distributed computation can be divided into the following steps.

  • Two images are stored in shared memory. The CPU instructs the GPU to evaluate the disparity. The GPU fetches the images from shared memory and evaluates the disparity through internal distributed processing. The GPU then pushes the disparity map into shared memory.

  • The CPU fetches the disparity map for road detection. The candidate region is then extracted from the left image and stored in shared memory.

  • The ROI is assigned to the other CPUs, where a machine learning model performs obstacle detection.

These steps are shown in Figure 21. Since our system platform TK has CPUs and GPU, all tasks except stereo matching, which is computed by the GPU, are shared among the CPUs. The data flow is shown in Figure 22.

Figure 21: Data flow of distributed computation.

The GPU frequency can be adjusted dynamically; to save power, we set it to a fixed value. The Fast-MPV algorithm runs on the GPU at this fixed frequency.

Figure 22: Data flow in task module.

Shared memory is used between processes and is managed by the system. Threads could access the same buffer by passing its memory address, but we copy the data into a new buffer instead: when more than two threads access the same buffer at the same time, they are serialized by a system lock, and acquiring that lock takes a certain amount of time (more than ms). Assuming that the frame rate of image acquisition is fps and the processing speed is fps, the maximum number of memory copies in 1 s is given by Eq. (31):


For a image, a memory copy takes less than ms, as determined by testing. Therefore, the memory copies cost less than about ms, and they are distributed across the CPUs.

4 Experiment and Analysis

In this section, we conduct four types of experiments to evaluate the performance of our system. First, we evaluate our stereo matching algorithm on the KITTI dataset. Then, using our own collected real-time driving datasets, we evaluate the detection accuracy and running time, test the entire system, and assess the stereo matching algorithm under different weather and lighting conditions. Finally, we test the system hardware performance.

4.1 Evaluation of Stereo Matching Algorithm

The performance of our proposed stereo matching algorithm is evaluated on the KITTI dataset [59]. We use the KITTI training set, which includes images, and run the evaluation with the development kit from the KITTI website. The MPV algorithm has an average error rate of and our fast-MPV algorithm has an average error rate of , both of which outperform SGBM with [60] and ELAS with [61]. These results are illustrated in Figure 23. Although the error rate of the fast-MPV algorithm is slightly higher than that of the original MPV algorithm, it is still significantly lower than those of SGBM and ELAS. At the same time, the computational cost of the fast algorithm is far lower than that of the original algorithm.

Figure 23: Error Rate of stereo matching algorithm.
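The error rate reported for stereo benchmarks of this kind is usually the fraction of valid ground-truth pixels whose estimated disparity is off by more than a threshold. The sketch below illustrates that metric under assumptions: the 3-pixel default threshold and the convention that a ground-truth value of 0 marks an invalid pixel follow common KITTI stereo practice, but the exact settings used in the paper are not stated here, and `bad_pixel_rate` is a hypothetical name rather than a function from the KITTI development kit.

```python
def bad_pixel_rate(estimated, ground_truth, threshold=3.0):
    """Fraction of valid pixels whose disparity error exceeds `threshold`.

    estimated, ground_truth: disparity maps stored as lists of rows.
    Pixels whose ground-truth disparity is <= 0 are treated as invalid
    and skipped, since the ground truth is sparse.
    """
    bad = 0
    valid = 0
    for est_row, gt_row in zip(estimated, ground_truth):
        for est, gt in zip(est_row, gt_row):
            if gt <= 0:
                continue
            valid += 1
            if abs(est - gt) > threshold:
                bad += 1
    return bad / valid if valid else 0.0


ground_truth = [
    [10, 0, 20],
    [30, 40, 0],
]
estimated = [
    [11, 5, 25],
    [30, 36, 7],
]
rate = bad_pixel_rate(estimated, ground_truth)
print(rate)
```

Averaging this rate over all images of the training set gives a single number per algorithm, which is the kind of quantity compared in Figure 23.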

The result shows that our fast binocular stereo matching algorithm is reliable. Furthermore, Figure 24 shows the disparity maps at each scale of the image pyramid, which helps explain how the proposed multi-scale fast MPV algorithm works.

Figure 24: Multi-scale disparity map.

4.2 Evaluation of System Detection Accuracy and Running Time

We further evaluate the performance of the whole system on our own collected real-time driving datasets, focusing on detection accuracy and running time. We apply our Fast-MPV method to the images captured by our vehicle and obtain the disparity map through stereo matching. The road model is then calculated from the disparity map. By comparing the road model with the pixels in the disparity map, we obtain the obstacle regions. In Figure 25, the two upper pictures are the original binocular images, the lower left picture is the disparity map, and the lower right picture is the calculated road model.

Figure 25: Top: Left image & Right image; Bottom: Disparity & Road model.

By combining the road model and the disparity map, we obtain a binary obstacle ROI image. We use it as a mask to segment the left image, which yields the fusion map. As shown in Figure 26, every ROI region is treated as a screening window that picks out the gray-level information of the ROI as an ROI image.
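The masking and cropping step can be sketched as follows. This is a simplified illustration: it assumes a single obstacle region in the mask, represents images as lists of rows, and uses hypothetical helper names (`apply_mask`, `bounding_box`, `crop`) that do not come from the paper.

```python
def apply_mask(image, mask):
    """Keep gray values where mask is 1 and zero out everything else."""
    return [[pix if keep else 0 for pix, keep in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


def bounding_box(mask):
    """Smallest inclusive (top, left, bottom, right) box covering the mask."""
    coords = [(r, c) for r, row in enumerate(mask)
              for c, value in enumerate(row) if value]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


def crop(image, box):
    """Cut the inclusive box out of the image."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


left_image = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
]
roi_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
fused = apply_mask(left_image, roi_mask)
box = bounding_box(roi_mask)
roi_image = crop(fused, box)
print(box, roi_image)
```

With several obstacles in one frame, the mask would first be split into connected regions and the same crop applied to each region.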

Figure 26: Target recognition flow.

We collect a large number of samples as ROI images and label them by hand. The cascade AdaBoost classifier is trained with different values of Stages and maxDepth, where Stages is the number of cascade stages and maxDepth is the maximum depth of a weak classifier tree. We use Gentle AdaBoost. The maximum false detection rate for each stage of the classifier is . This gives a trained AdaBoost classifier that recognizes whether there is an obstacle in an ROI image. During the experiments, we made the following findings.

  • As the number of stages increases, the detection accuracy does not improve significantly, but the detection time grows. This means that the number of stages is not the key factor affecting the detection accuracy of our classifier with LBP features, and excessive stages only lengthen the detection time.

  • As maxDepth increases, the accuracy does not increase. In Figure 27, the classifier overfits at maxDepth . At the same time, there is no significant difference in test accuracy between maxDepth and maxDepth . However, for the same number of stages, the test time grows as maxDepth increases.

  • Both the size and the growth rate of the detection window affect the detection time. Because we target obstacles meters away, those obstacles generally range in size from to in the image. In addition, we set the growth rate to , which means that there are cycles of detection from the minimum window to the maximum window. As shown in Figure 28, this parameter setting is chosen to meet the real-time requirement while keeping the detection accuracy high.

  • To balance detection quality against the real-time requirement, we set the number of stages to and maxDepth to . The accuracy is about and the processing time is about ms.
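The relation between the window size range, the growth rate, and the number of detection cycles can be illustrated with a short calculation. The numbers below (a 24-pixel minimum window, a 96-pixel maximum window, and a growth rate of 1.5) are purely hypothetical and are not the paper's settings; `window_sizes` is likewise an illustrative name.

```python
def window_sizes(min_size, max_size, growth_rate):
    """Window side lengths visited by a multi-scale scan, smallest first.

    The window is multiplied by growth_rate after every cycle until it
    no longer fits within max_size.
    """
    if growth_rate <= 1.0:
        raise ValueError("growth_rate must be greater than 1")
    sizes = []
    size = float(min_size)
    while round(size) <= max_size:
        sizes.append(round(size))
        size *= growth_rate
    return sizes


sizes = window_sizes(min_size=24, max_size=96, growth_rate=1.5)
print(sizes, len(sizes))
```

A smaller growth rate visits more window sizes between the same limits, which tends to improve localization but multiplies the number of classifier evaluations; this is the trade-off behind the findings on growth rate and window size.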

Figure 27: AdaBoost results of maxDepth & stages. Left: Stages-Accuracy; Right: Stages-Time.
Figure 28: AdaBoost results of growth rate & window size. Left: Growth rate-Time; Right: Growth rate-Accuracy.

The detection time changes only slightly with the number of obstacles, as shown in Figure 29. Since our method is intended for autonomous vehicles, which must pay attention to pedestrians, vehicles and so on, our system focuses on detecting close obstacles. In addition, we limit the pixel size of the obstacles. Therefore, the detection time is almost the same regardless of how many obstacles are present, which makes the running time of our approach robust when detecting multiple obstacles.

Figure 29: Detection time (there are five comparison groups with 0, 1, 2, 3 and 4 obstacles, respectively; each group contains 15 images).

Algorithm            Running time (ms)
UV-disparity [62]
Symmetry [63]
Our method (Haar)    18.3
Our method (LBP)     18.2
Table 3: Results of the running time.

Table 3 compares the running times of our methods and several other algorithms. The running time in our experiment is the average over images. The comparison algorithms are typical obstacle detection algorithms for intelligent vehicles. Although the detection accuracies are close, as shown in Figure 30, our approach has a clear advantage in running time.

Figure 30: Different algorithm results.

The running times of our system with Haar-like features and with LBP features are almost the same. However, we found that detection with the LBP feature is more accurate and stable than with the Haar-like feature. We therefore consider the LBP feature the better choice for our system.

4.3 Evaluation of Detection Accuracy under Different Weather and Lighting Conditions

Weather and lighting conditions affect the detection accuracy of the system because they strongly affect the disparity maps. Figure 31 illustrates the detection results (left images) and their corresponding disparity maps (right images). From top to bottom, we show the detection results under sunny, rainy, night, snowy and backlight conditions. The experimental results demonstrate that our system is robust in various scenarios.

Figure 31: The results under different weather and lighting conditions.

Rainy days and nights are the two main scenarios that reduce detection accuracy. Image quality is expected to decrease under the poor lighting conditions at night; however, the experimental results show that our stereo matching algorithm is still robust enough to detect the objects. Low brightness slightly degrades object edges, which may reduce the positional accuracy of the classifier. However, our system also performs well with fill light. On rainy days, the image is blurred by the occlusion of the rain, and in heavy rain the accuracy of our system can be greatly reduced. The snowy test data yield the second-best results, mainly because the LBP features are gray-scale invariant. The backlight result is affected by the unclear edges of objects; however, our stereo matching algorithm still provides stable positioning results. The corresponding experimental results are shown in Figure 32 and Table 4, which report the statistical results of images in each group.

Weather Condition Accuracy Rate
Light Rain
Heavy Rain
Table 4: Result of Accuracy Rate.
Figure 32: The results of the accuracy rate under different weather and lighting conditions.

Because the system is built on a mobile platform, it can easily be deployed in a variety of driving environments to provide ADAS functions. We show the actual behavior of the system when it is installed on a small passenger car. As mentioned above, we install the equipment in the middle of the windshield of the car; the system is marked by the red box in Figure 33.

Figure 33: System test. From top to bottom: bus body, car body, green belt, traffic cone, car rear.

In the system test, we choose different scenarios to test the obstacle detection function of the system, including the bus body, car body, green belt, traffic cone and car rear. The screen on the right is an external display device that is not part of the system itself; it only shows our stereo matching output during the experiment. In the experiment, the driver approaches the obstacle from a distance (from left to right in Figure 33); the test system stably detects the obstacles and sends alarms at the preset alarm distance.

The test results show that the system can detect not only standard obstacles, such as car rears and car bodies, but also non-standard obstacles, such as green belts and traffic cones. The test results under different weather conditions and test scenarios show that our system meets the functional requirements of ADAS for automatic driving in complex environments. Besides, from our dense disparity map we can also calculate dense point cloud maps, as shown in Figure 34, which can be regarded as pseudo-LiDAR representations [64].

Figure 34: Pseudo-LiDAR signal vs visual disparity map. Top: the evening scene in the rain. Bottom: Sunny afternoon scenes. Left: grey image on binocular left view. Middle: disparity map. Right: pseudo-LiDAR points.

4.4 Evaluation of System Module Performance

Finally, the system hardware performance is tested. The influence of the image acquisition frame rate on the system is shown in Table 5. As the acquisition frame rate increases, the CPU occupancy rate and the processing frame rate increase significantly, but the processing frame rate grows more slowly than the acquisition frame rate, and memory usage does not grow noticeably. Based on this experiment, we set the acquisition frame rate to fps, at which the processing frame rate is approximately equal to the acquisition frame rate and the data flow is most efficient.

Frame Rate(fps)
CPU Occupancy Rate
Memory Usage M M M
Processing Frame Rate
Table 5: The Influence of Frame Rate.

The effect of ambient temperature is shown in Table 6. Our system can operate at ambient temperatures from to C, and the full-load power does not exceed W. To protect the chip, the system automatically powers off when the chip temperature exceeds C.

Ambient Temperature (C)
Chip Temperature (C)
Full Load Power (W)
Standby Power (W)
Table 6: The Effect of ambient temperature.

The running time of the main task modules at different GPU frequencies is shown in Table 7, which indicates the operating efficiency of each task module at each GPU frequency. Since the calibration and obstacle extraction tasks do not run on the GPU, their running times do not change much. As the GPU frequency increases, the running time of the stereo matching task module decreases and the processing frame rate increases. At the same GPU frequency, the efficiency of the stereo matching task is also related to the image acquisition frame rate. In general, at a given frame rate, the higher the frequency, the shorter the processing time; at a given frequency, the higher the frame rate, the shorter the processing time.

Task Module                       Running Time (ms)
GPU Frequencies
Calibration Task (CPU)
Stereo Matching Task (GPU)
Obstacle Extraction Task (CPU)
Table 7: System Performance at Different GPU Frequencies.

We also tested the performance of obstacle extraction at different frame rates, with the ARM configured to use CPUs at a frequency of G. The results are shown in Table 8. When the processing frame rate is fps, the mean CPU occupancy rate of the obstacle extraction module is and the memory usage is ; when the processing frame rate is fps, the mean CPU occupancy rate and memory usage are and , respectively.

             10 fps                                   15 fps
             CPU occupancy rate (%)   Memory usage    CPU occupancy rate (%)   Memory usage
             17.73                    0.1             13.30                    0.1
             18.25                    0.10            17.13                    0.20
             16.78                    0.10            18.05                    0.30
             17.65                    0.30            19.55                    0.10
             17.33                    0.01            16.28                    0.01
Mean Value   17.85                    0.12            16.86                    0.14
Table 8: Obstacle Extraction Module.

5 Conclusion

A new robust real-time advanced driver assistance system based on a mobile platform is proposed in this paper, which can be applied directly to intelligent driving. There are four major innovations in the system. First, a stereo calibration system is built that automatically performs fast calibration of the binocular camera. Second, a multi-scale fast MPV algorithm is proposed that provides dense disparity information in real time for intelligent vehicles. Third, a high-performance cascade AdaBoost classifier is trained that provides target detection and recognition in real time. Fourth, a distributed computing method and an efficient data management approach are introduced, which further improve the performance of the system. Extensive experimental results show that our system not only improves the recognition rate on the benchmark database but is also applicable to commercial real-time intelligent driving. Our future work will focus on improving accuracy under extreme weather conditions.

6 Acknowledgment

This work was supported by the National Science Foundation of China under Grant 61673381, Multi-year research grant of University of Macau MYRG2017-00218-FST and MYRG2018-00111-FST.

References


  • [1] Hao Zhu, Ka-Veng Yuen, Lyudmila Mihaylova, and Henry Leung. Overview of environment perception for intelligent vehicles. IEEE Transactions on Intelligent Transportation Systems, 18(10):2584–2601, 2017.
  • [2] Qian Long, Qiwei Xie, Seiichi Mita, Kazuhisa Ishimaru, and Noriaki Shirai. A real-time dense stereo matching method for critical environment sensing in autonomous driving. In 17th International IEEE Conference on Intelligent Transportation Systems (ITSC), pages 853–860. IEEE, 2014.
  • [3] Alberto Y Hata and Denis F Wolf. Feature detection for vehicle localization in urban environments using a multilayer lidar. IEEE Transactions on Intelligent Transportation Systems, 17(2):420–429, 2015.
  • [4] Shichao Xie, Diange Yang, Kun Jiang, and Yuanxin Zhong. Pixels and 3-d points alignment method for the fusion of camera and lidar data. IEEE Transactions on Instrumentation and Measurement, 2018.
  • [5] Hongbo Gao, Bo Cheng, Jianqiang Wang, Keqiang Li, Jianhui Zhao, and Deyi Li. Object classification using cnn-based fusion of vision and lidar in autonomous vehicle environment. IEEE Transactions on Industrial Informatics, 14(9):4224–4231, 2018.
  • [6] Jianfeng Zhao, Bodong Liang, and Qiuxia Chen. The key technology toward the self-driving car. International Journal of Intelligent Unmanned Systems, 6(1):2–20, 2018.
  • [7] Jessica Van Brummelen, Marie O Brien, Dominique Gruyer, and Homayoun Najjaran. Autonomous vehicle perception: The technology of today and tomorrow. Transportation research part C: emerging technologies, 89:384–406, 2018.
  • [8] Xinyu Zhang, Hongbo Gao, Mu Guo, Guopeng Li, Yuchao Liu, and Deyi Li. A study on key technologies of unmanned driving. CAAI Transactions on Intelligence Technology, 1(1):4–13, 2016.
  • [9] Michael Montemerlo, Jan Becker, Suhrid Bhat, Hendrik Dahlkamp, Dmitri Dolgov, Scott Ettinger, Dirk Haehnel, Tim Hilden, Gabe Hoffmann, Burkhard Huhnke, et al. Junior: The stanford entry in the urban challenge. Journal of field Robotics, 25(9):569–597, 2008.
  • [10] Sören Kammel, Julius Ziegler, Benjamin Pitzer, Moritz Werling, Tobias Gindele, Daniel Jagzent, Joachim Schröder, Michael Thuy, Matthias Goebl, Felix von Hundelshausen, et al. Team annieway’s autonomous system for the 2007 darpa urban challenge. Journal of Field Robotics, 25(9):615–639, 2008.
  • [11] Juyang Weng, Paul Cohen, and Marc Herniou. Camera calibration with distortion models and accuracy evaluation. IEEE Transactions on Pattern Analysis & Machine Intelligence, (10):965–980, 1992.
  • [12] Daniel Scharstein and Richard Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International journal of computer vision, 47(1-3):7–42, 2002.
  • [13] Zhengyou Zhang. A flexible new technique for camera calibration. IEEE Transactions on pattern analysis and machine intelligence, 22, 2000.
  • [14] Lyndon N Smith and Melvyn L Smith. Automatic machine vision calibration using statistical and neural network methods. Image and Vision Computing, 23(10):887–899, 2005.
  • [15] Ping Zhao, Yong-kui Li, Li-jun Chen, and Xue-wei Bai. Camera calibration technology based on circular points for binocular stereovision system. In International Workshop on Computer Science for Environmental Engineering and EcoInformatics, pages 356–363. Springer, 2011.
  • [16] Nicolas Dey, Laure Blanc-Feraud, Christophe Zimmer, Pascal Roux, Zvi Kam, Jean-Christophe Olivo-Marin, and Josiane Zerubia. Richardson–lucy algorithm with total variation regularization for 3d confocal microscope deconvolution. Microscopy research and technique, 69(4):260–266, 2006.
  • [17] Hui Kong, Jean-Yves Audibert, and Jean Ponce. General road detection from a single image. IEEE Transactions on Image Processing, 19(8):2211–2220, 2010.
  • [18] Chunzhao Guo, Seiichi Mita, and David McAllester. Robust road detection and tracking in challenging scenarios based on markov random fields with unsupervised learning. IEEE Transactions on intelligent transportation systems, 13(3):1338–1354, 2012.
  • [19] Zhencheng Hu and Keiichi Uchimura. Uv-disparity: an efficient algorithm for stereovision based scene analysis. In IEEE Proceedings. Intelligent Vehicles Symposium, 2005., pages 48–54. IEEE, 2005.
  • [20] Sergiu Nedevschi, Radu Danescu, Dan Frentiu, Tiberiu Marita, Florin Oniga, Ciprian Pocol, Thorsten Graf, and Rolf Schmidt. High accuracy stereovision approach for obstacle detection on non-planar roads. Proc. IEEE INES, pages 211–216, 2004.
  • [21] Angel D Sappa, Rosa Herrero, Fadi Dornaika, David Gerónimo, and Antonio López. Road approximation in euclidean and v-disparity space: a comparative study. In International Conference on Computer Aided Systems Theory, pages 1105–1112. Springer, 2007.
  • [22] G David Forney. The viterbi algorithm. Proceedings of the IEEE, 61(3):268–278, 1973.
  • [23] Qiwei Xie, Qian Long, and Seiichi Mita. Integration of optical flow and multi-path-viterbi algorithm for stereo vision. International Journal of Wavelets, Multiresolution and Information Processing, 15(03):1750022, 2017.
  • [24] Jason Power, Arkaprava Basu, Junli Gu, Sooraj Puthoor, Bradford M Beckmann, Mark D Hill, Steven K Reinhardt, and David A Wood. Heterogeneous system coherence for integrated cpu-gpu systems. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, pages 457–467. ACM, 2013.
  • [25] François Fleuret. Fast binary feature selection with conditional mutual information. Journal of Machine learning research, 5(Nov):1531–1555, 2004.
  • [26] Vibhav Vineet and PJ Narayanan. Cuda cuts: Fast graph cuts on the gpu. In 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pages 1–8. IEEE, 2008.
  • [27] David Kirk et al. Nvidia cuda software and gpu parallel computing architecture. In ISMM, volume 7, pages 103–104, 2007.
  • [28] Lalit Bahl, John Cocke, Frederick Jelinek, and Josef Raviv. Optimal decoding of linear codes for minimizing symbol error rate (corresp.). IEEE Transactions on information theory, 20(2):284–287, 1974.
  • [29] Antonin Chambolle. An algorithm for total variation minimization and applications. Journal of Mathematical imaging and vision, 20(1-2):89–97, 2004.
  • [30] Rene Ranftl, Thomas Pock, and Horst Bischof. Minimizing tgv-based variational models with non-convex data terms. In International Conference on Scale Space and Variational Methods in Computer Vision, pages 282–293. Springer, 2013.
  • [31] Stan Birchfield and Carlo Tomasi. Multiway cut for stereo and motion with slanted surfaces. In Proceedings of the seventh IEEE international conference on computer vision, volume 1, pages 489–495. IEEE, 1999.
  • [32] Zheng Liu, Erik Blasch, Zhiyun Xue, Jiying Zhao, Robert Laganiere, and Wei Wu. Objective assessment of multiresolution image fusion algorithms for context enhancement in night vision: a comparative study. IEEE transactions on pattern analysis and machine intelligence, 34(1):94–109, 2011.
  • [33] Fernando Garcia, David Martin, Arturo De La Escalera, and Jose Maria Armingol. Sensor fusion methodology for vehicle detection. IEEE Intelligent Transportation Systems Magazine, 9(1):123–133, 2017.
  • [34] Simon A Spacey, Wolfram Wiesemann, Daniel Kuhn, and Wayne Luk. Robust software partitioning with multiple instantiation. INFORMS Journal on Computing, 24(3):500–515, 2012.
  • [35] Marco A Boschetti, Vittorio Maniezzo, and Francesco Strappaveccia. Using gpu computing for solving the two-dimensional guillotine cutting problem. INFORMS Journal on Computing, 28(3):540–552, 2016.
  • [36] Ketan Date and Rakesh Nagi. Level 2 reformulation linearization technique–based parallel algorithms for solving large quadratic assignment problems on graphics processing unit clusters. INFORMS Journal on Computing, 31(4):771–789, 2019.
  • [37] Kurt Konolige. Small vision systems: Hardware and implementation. In Robotics research, pages 203–212. Springer, 1998.
  • [38] Tongtong Li, Changying Liu, Yang Liu, Tianhao Wang, and Dapeng Yang. Binocular stereo vision calibration based on alternate adjustment algorithm. Optik, 173:13–20, 2018.
  • [39] Zhou Wang, Alan C Bovik, Hamid R Sheikh, Eero P Simoncelli, et al. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4):600–612, 2004.
  • [40] Tran Thai Son and Seiichi Mita. Stereo matching algorithm using a simplified trellis diagram iteratively and bi-directionally. IEICE transactions on information and systems, 89(1):314–325, 2006.
  • [41] David G Lowe. Distinctive image features from scale-invariant keypoints. International journal of computer vision, 60(2):91–110, 2004.
  • [42] Jaewoong Choi, Junyoung Lee, Dongwook Kim, Giacomo Soprani, Pietro Cerri, Alberto Broggi, and Kyongsu Yi. Environment-detection-and-mapping algorithm for autonomous driving in rural or off-road environment. IEEE Transactions on Intelligent Transportation Systems, 13(2):974–982, 2012.
  • [43] Claudio Caraffi, Stefano Cattani, and Paolo Grisleri. Off-road path and obstacle detection using decision networks and stereo vision. IEEE Transactions on Intelligent Transportation Systems, 8(4):607–618, 2007.
  • [44] Melih Altun and Mehmet Celenk. Road scene content analysis for driver assistance and autonomous driving. IEEE transactions on intelligent transportation systems, 18(12):3398–3407, 2017.
  • [45] Abdelhamid Mammeri, Tianyu Zuo, and Azzedine Boukerche. Extending the detection range of vision-based vehicular instrumentation. IEEE Transactions on Instrumentation and Measurement, 65(4):856–873, 2016.
  • [46] Jing Yuan, Huan Chen, Fengchi Sun, and Yalou Huang. Multisensor information fusion for people tracking with a mobile robot: A particle filtering approach. IEEE transactions on Instrumentation and Measurement, 64(9):2427–2442, 2015.
  • [47] Romulo Gonçalves Lins, Sidney N Givigi, and Paulo Roberto Gardel Kurka. Vision-based measurement for localization of objects in 3-d for robotic applications. IEEE Transactions on Instrumentation and Measurement, 64(11):2950–2958, 2015.
  • [48] Robert E Schapire. Explaining adaboost. In Empirical inference, pages 37–52. Springer, 2013.
  • [49] Shengcai Liao, Anil K Jain, and Stan Z Li. A fast and accurate unconstrained face detector. IEEE transactions on pattern analysis and machine intelligence, 38(2):211–223, 2015.
  • [50] Xuchun Li, Lei Wang, and Eric Sung. Adaboost with svm-based component classifiers. Engineering Applications of Artificial Intelligence, 21(5):785–795, 2008.
  • [51] Paul Viola and Michael Jones. Fast and robust classification using asymmetric adaboost and a detector cascade. In Advances in neural information processing systems, pages 1311–1318, 2002.
  • [52] Xiaoyu Wang, Tony X Han, and Shuicheng Yan. An hog-lbp human detector with partial occlusion handling. In 2009 IEEE 12th international conference on computer vision, pages 32–39. IEEE, 2009.
  • [53] Li Liu, Songyang Lao, Paul W Fieguth, Yulan Guo, Xiaogang Wang, and Matti Pietikäinen. Median robust extended local binary pattern for texture classification. IEEE Transactions on Image Processing, 25(3):1368–1381, 2016.
  • [54] Wilbert G Aguilar and Cecilio Angulo. Robust video stabilization based on motion intention for low-cost micro aerial vehicles. In 2014 IEEE 11th International Multi-Conference on Systems, Signals & Devices (SSD14), pages 1–6. IEEE, 2014.
  • [55] Weiming Hu, Jun Gao, Yanguo Wang, Ou Wu, and Stephen Maybank. Online adaboost-based parameterized methods for dynamic distributed network intrusion detection. IEEE Transactions on Cybernetics, 44(1):66–82, 2013.
  • [56] Douglas Thain, Todd Tannenbaum, and Miron Livny. Distributed computing in practice: the condor experience. Concurrency and computation: practice and experience, 17(2-4):323–356, 2005.
  • [57] Yanfeng Zhang, Qixin Gao, Lixin Gao, and Cuirong Wang. imapreduce: A distributed computing framework for iterative computation. Journal of Grid Computing, 10(1):47–68, 2012.
  • [58] Ajay D Kshemkalyani and Mukesh Singhal. Distributed computing: principles, algorithms, and systems. Cambridge University Press, 2011.
  • [59] Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? the kitti vision benchmark suite. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 3354–3361. IEEE, 2012.
  • [60] Heiko Hirschmuller. Stereo processing by semiglobal matching and mutual information. IEEE Transactions on pattern analysis and machine intelligence, 30(2):328–341, 2007.
  • [61] Andreas Geiger, Martin Roser, and Raquel Urtasun. Efficient large-scale stereo matching. In Asian conference on computer vision, pages 25–38. Springer, 2010.
  • [62] Yuan Gao, Xiao Ai, John Rarity, and Naim Dahnoun. Obstacle detection with 3d camera using uv-disparity. In International Workshop on Systems, Signal Processing and their Applications, WOSSPA, pages 239–242. IEEE, 2011.
  • [63] Bin Li, Rong-ben Wang, and Ke-you Guo. Study on machine vision based obstacle detection and recognition method for intelligent vehicle. Kung lu chiao tung ko chi= journal of highway and transportation research and development. Vol. 19, no. 4, 2002.
  • [64] Yan Wang, Wei-Lun Chao, Divyansh Garg, Bharath Hariharan, Mark Campbell, and Kilian Q Weinberger. Pseudo-lidar from visual depth estimation: Bridging the gap in 3d object detection for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8445–8453, 2019.