I Introduction
In a complicated environment, every self-driving mobile system needs to be equipped with vision sensors. Therefore, if we can implement a SLAM and navigation system that depends purely on vision sensors, we can save a significant amount of cost by excluding expensive laser and/or LiDAR sensors.
A SLAM system is not just for mapping and localization; it also needs to support robust navigation functionality. In practical applications, the question nowadays is no longer whether SLAM works but whether it works robustly. This short paper poses the problem of what the necessary components of a robust visual SLAM system are, and then introduces the A*SLAM system, which meets all of these conditions.
II Requisites for Robust Visual SLAM
A robust visual SLAM system needs to satisfy at least the following four conditions.
II-A Stereo Vision
A monocular SLAM system can only build a map up to an unknown scale. Such a map is insufficient for a mobile system to perform meaningful work in the physical world: motion control relies on metric information, and movements are expressed in metric units. Stereo vision adds metric measurements to the map based on the baseline distance of the binocular setup.
Another benefit of stereo vision is that a depth value can be computed for each pixel by fast image matching along the predefined epipolar line. The depth information can be used to initialize SLAM features, and can also be used for obstacle detection in a navigation system.
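To make the metric relationship concrete, the following is a minimal sketch (not the A*SLAM implementation) of recovering per-pixel metric depth from a rectified stereo pair: matching along the horizontal epipolar lines yields a disparity d, and depth follows as Z = fB/d, where f is the focal length in pixels and B the baseline in meters. The function name and parameters below are illustrative.

```python
import cv2
import numpy as np

def depth_from_stereo(left_gray, right_gray, focal_px, baseline_m):
    """Per-pixel metric depth Z = f * B / d from a rectified stereo pair."""
    # Semi-global block matching along the horizontal epipolar lines.
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
    depth = np.full(disparity.shape, np.inf, dtype=np.float32)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```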

II-B Wide Field of View
A wide field of view (FoV) makes it possible to integrate more spatial information into the SLAM computation. Seeing a wider area usually improves the accuracy and robustness of feature tracking. Furthermore, SLAM features near the outer FoV converge easily because they can exploit the large parallax present in the outer FoV area.
In a navigation scenario, a limited FoV degrades path-planning performance. Figure 1(a) shows a robot (blue) blocked by an obstacle (gray). The robot plans a path (red); however, because the camera FoV (green) is not wide enough to verify the planned path, the robot cannot immediately convert the path into motion.
Adopting a fisheye camera is considered the simplest way to provide a wide FoV to a visual SLAM system.
II-C Full View
The concept of full view is an extension of wide FoV, but it requires more than one camera. Figure 1(b) shows a full surrounding-view configuration using two fisheye cameras. Compared to a partial camera view, the full view provides even more robustness to the SLAM system. In a full-view system, occlusion is a far less serious problem than in a single-camera system.
Beyond the advantages mentioned above, the main reason for using the full-view configuration is that it can generate a complete map. A complete map means that a single scan gathers all 360-degree information, so the same area never needs to be scanned again. As shown in Fig. 2, to map an area between A and B, a partial-view camera needs to scan the area twice. After mapping, there exist two independent maps, A-B and B-A. When a robot starts from A and makes a turn at midpoint C, there is no guarantee that it can smoothly switch from map A-B to map B-A, due to the inevitable mapping error. For practical use, all future visual SLAM systems should adopt the full-view configuration.

II-D Illumination-Invariant Feature
A SLAM map is static: it captures only the specific lighting condition under which it was built. The environment, by contrast, is dynamic, and its visual appearance changes frequently with lighting, weather, and season. Existing popular SLAM methods, such as [1] and [2], all use pixel values to compose SLAM features. Accordingly, feature-tracking performance degrades greatly when the lighting condition differs from that at registration time.
The edge is an illumination-invariant feature. Edges are formed by abrupt color transitions on a surface or by the intersection of heterogeneous geometric structures in 3D space. As shown in Fig. 3, edges can be robustly detected at the same locations even under severe changes in lighting. Treating edges as pure geometric entities, without support from pixel values, is a very challenging problem in the SLAM community. We are proud to announce that the problem of using edges as SLAM features has now been solved.
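As a rough illustration of this invariance (a sketch using standard tools, not the authors' edge-based SLAM method), gradient-based edge detection locates largely the same contours even when an image is globally darkened, whereas raw pixel values change everywhere; the image path and darkening factor below are hypothetical.

```python
import cv2
import numpy as np

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)                     # hypothetical input image
dark = np.clip(img.astype(np.float32) * 0.4, 0, 255).astype(np.uint8)   # simulated low light

edges_bright = cv2.Canny(img, 50, 150)
edges_dark = cv2.Canny(cv2.equalizeHist(dark), 50, 150)                 # normalize contrast first

shared = np.logical_and(edges_bright > 0, edges_dark > 0).sum()
print("edge pixels detected under both lighting conditions:", shared)
```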
III A*SLAM
The A*SLAM system satisfies all four of the previously described conditions for a robust visual SLAM system. Its distinguishing features are the combination of two fisheye stereo camera sets and the use of image edges as SLAM features.
Of the dual stereo camera sets, one looks forward and the other looks backward. Each stereo camera is equipped with a pair of 180-degree fisheye lenses. In this way, the A*SLAM system covers a full environmental view. The fisheye images can also be used to generate a wide-angle depth image. Figure 4 shows an example of using our CaliCam® [3] stereo camera to generate a panoramic depth image [4].
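One common way such a fisheye stereo pair can be turned into a wide-angle depth image is to first remap each 180-degree fisheye view into a rectified view and then run standard epipolar matching, as sketched below with OpenCV's fisheye model; the intrinsics, distortion coefficients, and file name are placeholders, and this is not necessarily the CaliCam/A*SLAM pipeline.

```python
import cv2
import numpy as np

K = np.array([[350.0, 0.0, 640.0], [0.0, 350.0, 480.0], [0.0, 0.0, 1.0]])  # hypothetical intrinsics
D = np.array([0.1, -0.05, 0.01, 0.0])                                       # hypothetical fisheye distortion
R = np.eye(3)                                                                # rectification rotation
size = (1280, 960)

# New pinhole projection for the rectified view, then the remap tables.
P = cv2.fisheye.estimateNewCameraMatrixForUndistortRectify(K, D, size, R, balance=0.5)
map1, map2 = cv2.fisheye.initUndistortRectifyMap(K, D, R, P, size, cv2.CV_16SC2)

fisheye = cv2.imread("left_fisheye.png")                                     # hypothetical input
rectified = cv2.remap(fisheye, map1, map2, interpolation=cv2.INTER_LINEAR)
```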
IV Conclusion
To the best of our knowledge, A*SLAM is the only system to embrace all the challenging requisites of a robust visual SLAM system. We are actively looking for potential application domains in which our robust A*SLAM system can maximize SLAM and navigation performance.

References
- [1] R. Mur-Artal and J. D. Tardós, "ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras," T-RO, 2017.
- [2] J. Engel, V. Koltun and D. Cremers, “Direct Sparse Odometry,” PAMI, 2018.
- [3] https://github.com/astar-ai/calicam
- [4] https://github.com/astar-ai/pdi
- [5] https://youtu.be/TNOjwwURBGQ