I Introduction
Generally, a Mars exploration mission passes through three basic phases: entry, descent and landing (EDL) [1]. The landing phase ultimately determines whether a Mars rover reaches the Martian surface safely and precisely. Due to the large uncertainties and dispersions arising from the Martian environment, existing algorithms for the EDL phase cannot guarantee that a rover lands precisely on the target point. Moreover, after landing, Mars rovers are usually required to move to new target points continually in order to carry out new exploration tasks. Hence, in future Mars missions, autonomous navigation algorithms are essential for Mars rovers to avoid risky areas (such as craters, high mountains and rocks) and reach target points precisely and efficiently (Fig. 1).
Currently, one of the most significant methods for Mars navigation is visual navigation [2]. The two main modes of Mars visual navigation are blind drive and autonomous navigation with hazard avoidance (AutoNav) [3]. In blind drive, all commands for the Mars rover are determined by engineers on Earth before the mission starts, which greatly reduces the efficiency and flexibility of exploration missions. By contrast, AutoNav lets a Mars rover execute missions autonomously, and is thus better suited to the increasing future demand for rover autonomy and intelligence.
Classical algorithms for AutoNav such as Dijkstra [4, 5], A* [6, 7] and D* [8, 9] have been widely researched in the past decades. It is noteworthy that these algorithms have to search for the optimal path iteratively on cellular grid maps, which is both time consuming and memory consuming [10]. When map dimensions become large and computation resources are limited, these algorithms may fail to offer the optimal navigation policy. To overcome the dimension explosion problem, intelligent algorithms such as neural networks [11], genetic algorithms [12] and particle swarm optimization [13] have been extended to the planetary navigation problem. However, these algorithms require prior knowledge about the obstacles in maps. To provide the optimal navigation policy directly from natural Martian scenes, effective feature representation algorithms are required. That is, these algorithms must first understand deep features of the input image, such as the shape and location of obstacles, and then determine the navigation policy according to these features. In recent years,
Deep Convolutional Neural Networks (DCNNs) have received wide attention in the computer vision field for their superior feature representation capability [14]. Notably, although the training process of DCNNs consumes massive time and computation resources, it is completed offline. After training, applying DCNNs to represent deep features of images online costs little time and few computation resources. Therefore, DCNNs have been widely applied to a variety of visual tasks such as image classification [15], object detection [16], visual navigation [17] and robotic manipulation [18]. Inspired by the state-of-the-art performance of DCNNs in the computer vision field, planetary visual navigation algorithms based on deep neural networks have been researched. In [19], a 3-dimensional DCNN was designed to create a safety map for autonomous landing zone detection from terrain images. In [20], a DCNN was trained to predict a rover's position from terrain images for lunar navigation. Though these algorithms are capable of extracting deep features from raw images, they are unable to provide the optimal policy for navigation directly. To solve this problem, the Value Iteration Network (VIN) was first proposed in [21] to plan paths directly from images and was applied to the Mars visual navigation problem successfully. Then, in [22], the Memory Augmented Control Network was proposed to find the optimal path for rovers in partially observable environments. Both of these networks employ a Value Iteration Module, which takes massive time to train.
In this paper, an efficient algorithm to determine the optimal navigation policy directly from original Martian images is investigated. Specifically, a novel DCNN architecture with double branches is designed. It can represent both global and local deep features of input images and then achieve precise navigation efficiently. The main contributions of this paper are summarized as follows:

Emerging deep learning techniques (deep neural networks) are leveraged to deal with the Mars visual navigation problem.

The proposed DCNN architecture with double branches and a non-recurrent structure can find the optimal path to the target point directly from global Martian environment images; prior knowledge about risky areas in the images is not required.

Compared with the widely acknowledged VIN, the proposed DCNN architecture achieves better performance on Mars visual navigation, and its training time is reduced by 45.8%.

The accuracy and efficiency of this novel architecture are demonstrated through experiment results and analysis.
The rest of this paper is organized as follows. Section II provides preliminaries. Section III describes the novel DCNN architecture for Mars visual navigation. Experimental results and analysis are presented in Section IV, followed by discussion and conclusions in Section V.
II Preliminaries
II-A Markov Decision Process
Mars visual navigation can be formulated as a Markov Decision Process (MDP), since the next state of the Mars rover is completely determined by its current state and action. A standard MDP for sequential decision making is composed of an action space $\mathcal{A}$, a state space $\mathcal{S}$, a reward function $r$, a transition probability distribution $P$ and a policy $\pi_\theta$ with parameter $\theta$. At time step $t$, the agent obtains its state $s_t \in \mathcal{S}$ from the environment and then chooses its action $a_t \in \mathcal{A}$ satisfying the distribution $a_t \sim \pi_\theta(a_t|s_t)$. After that, its state transits into $s_{t+1}$ and the agent receives a reward $r_t = r(s_t, a_t)$ from the environment, where $s_{t+1}$ satisfies the transition probability distribution $P(s_{t+1}|s_t, a_t)$. The whole process is shown in Fig. 2. Denote the discount factor of the reward by $\gamma \in (0, 1)$. A policy $\pi_{\theta^*}$ is defined as optimal if and only if its parameter $\theta^*$ satisfies
$\theta^{*} = \arg\max_{\theta} \mathbb{E}\big[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\big|\, \pi_{\theta}\big]$.  (1)
To measure the expected accumulated reward of state $s_t$ and state-action pair $(s_t, a_t)$, the state value function and the action value function are defined respectively as
$V^{\pi}(s_t) = \mathbb{E}\big[\sum_{i=0}^{\infty} \gamma^{i} r_{t+i} \,\big|\, s_t, \pi\big]$,  (2)
$Q^{\pi}(s_t, a_t) = \mathbb{E}\big[\sum_{i=0}^{\infty} \gamma^{i} r_{t+i} \,\big|\, s_t, a_t, \pi\big]$.  (3)
The optimal value functions satisfy the Bellman optimality equation
$V^{*}(s_t) = \max_{a_t} Q^{*}(s_t, a_t)$.  (4)
By solving Eq. (4), the optimal policy is determined such that the objective of the MDP is achieved. However, since both the state value function and the action value function are unknown beforehand, Eq. (4) cannot be solved directly. Therefore, the state value function and the action value function have to be estimated in order to solve the MDP problem.
II-B Value Function Estimation
Value iteration is a typical method for value function estimation and thus for addressing the MDP problem [25]. Denote the estimated state value function at step $k$ by $V_k(s)$, the estimated action value function for each state-action pair at step $k$ by $Q_k(s, a)$, and the policy at step $k$ by $\pi_k$. Then, the value iteration process can be expressed as
$Q_k(s, a) = r(s, a) + \gamma \sum_{s'} P(s'|s, a)\, V_k(s')$, $\pi_k(s) = \arg\max_{a} Q_k(s, a)$,  (5)
$V_{k+1}(s) = \max_{a} Q_k(s, a)$.  (6)
Through iteration, the policy and value functions converge to the optima $\pi^{*}$, $V^{*}$ and $Q^{*}$ simultaneously.
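As a concrete illustration, the value iteration process above can be sketched for a small tabular MDP (a minimal NumPy sketch; the transition tensor and reward matrix are illustrative, not taken from the paper):

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, iters=50):
    """Tabular value iteration on a small MDP.
    P: transition tensor of shape (S, A, S'); R: reward matrix of shape (S, A)."""
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * P @ V          # Q_k(s,a) = r(s,a) + gamma * sum_s' P(s'|s,a) V_k(s')
        V = Q.max(axis=1)              # V_{k+1}(s) = max_a Q_k(s,a)
    policy = Q.argmax(axis=1)          # greedy policy from the final Q
    return V, policy
```

The two lines inside the loop correspond to Eqs. (5) and (6); the greedy `argmax` gives the policy once the values have converged.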
However, since it is difficult to determine explicit representations of $V_k$, $Q_k$ and $\pi_k$ (especially when the dimension of the state space is high), VIN was proposed to approximate this process successfully. Specifically, VIN is designed with a Value Iteration Module, which consists of recurrent convolutional layers [21]. As illustrated in Fig. 3, the value function layer is stacked with the reward layer and then filtered by a convolutional layer and a max-pooling layer recurrently. Through VIN, navigation information including the global environment and the target point is conveyed to each state in the final value function layer. Experiments demonstrate that this architecture performs well in navigation tasks. However, training such a recurrent convolutional neural network takes considerable time and computation resources when the number of recurrent iterations becomes large. Therefore, replacing the Value Iteration Module with a more efficient, non-recurrent architecture without losing its excellent navigation performance is the focus of this paper.
II-C Learning-Based Algorithms
Typically, there exist two learning-based approaches for training DCNNs in value function estimation: Reinforcement learning [25] and Imitation learning [27]. In Reinforcement learning, no prior knowledge is required and the agent can find the optimal policy in a complex environment by trial and error [26]. However, the training process of Reinforcement learning is computationally inefficient. In Imitation learning, when an expert dataset is given, the training process transforms into supervised learning with higher data efficiency and fitting accuracy. Considering that an expert dataset $D = \{(s_i, a_i^{*})\}_{i=1}^{N}$ for global visual navigation is available (where $a_i^{*}$ is the optimal action at state $s_i$ and $N$ is the number of samples), in this paper the Imitation learning method is applied to find the optimal navigation policy.
III Model Description
III-A Mars Visual Navigation Model
In this subsection, the formulation of Mars visual navigation as an MDP is presented. More precisely, the state $s_t$ is composed of the Martian environment image, the target point and the current position of the Mars rover at time step $t$. The action $a_t$ represents the moving direction of the Mars rover at time step $t$ (0: east, 1: south, 2: west, 3: north, 4: southeast, 5: northeast, 6: southwest, 7: northwest). After taking action $a_t$, the current location of the Mars rover changes and the state transits into $s_{t+1}$. If the Mars rover reaches the target point precisely at time $t$, a positive reward is obtained. Otherwise, the Mars rover receives a negative reward.
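The transition and reward logic of this navigation MDP can be sketched as a grid-world step function. This is a hypothetical illustration: the paper only specifies a positive reward at the target and a negative reward otherwise, so the exact magnitudes and the harsher penalty for risky cells below are assumptions.

```python
# Eight actions move the rover one cell in image coordinates
# (rows grow southward, columns grow eastward).
MOVES = {0: (0, 1), 1: (1, 0), 2: (0, -1), 3: (-1, 0),    # E, S, W, N
         4: (1, 1), 5: (-1, 1), 6: (1, -1), 7: (-1, -1)}  # SE, NE, SW, NW

def step(pos, action, target, risky, grid_size):
    """Apply one action; return (next_pos, reward, done)."""
    dr, dc = MOVES[action]
    r = min(max(pos[0] + dr, 0), grid_size - 1)   # clamp to the map
    c = min(max(pos[1] + dc, 0), grid_size - 1)
    nxt = (r, c)
    if nxt == target:
        return nxt, 1.0, True        # positive reward at the target (assumed magnitude)
    if nxt in risky:
        return nxt, -1.0, True       # negative reward in risky areas (assumed magnitude)
    return nxt, -0.01, False         # small negative step cost elsewhere (assumed)
```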
Furthermore, the output vector of the proposed DCNN is defined as the action distribution $\pi_{\theta}(\cdot|s)$ over the eight moving directions. Then the training loss is defined in cross-entropy form with an $\ell_2$ regularization term as [28]
$L(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \bar{a}_i^{*\top} \log \pi_{\theta}(\cdot|s_i) + \lambda \lVert \theta \rVert_2^2$,  (7)
where $N$ is the number of training samples, $\bar{a}_i^{*}$ is the one-hot vector [29] of the optimal action $a_i^{*}$, and $\lambda$ is the hyperparameter adjusting the effect of the $\ell_2$ norm on the loss function.
By minimizing the loss function $L(\theta)$, the optimal parameter $\theta^{*}$ of the navigation policy is determined as
$\theta^{*} = \arg\min_{\theta} L(\theta)$.  (8)
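A minimal NumPy sketch of an Eq. (7)-style loss, for illustration; the function and argument names (`logits`, `weights`, `lam`) are assumptions, not the paper's:

```python
import numpy as np

def imitation_loss(logits, expert_actions, weights, lam=1e-4):
    """Cross-entropy between the softmax policy and one-hot expert actions,
    plus an l2 penalty on the parameters (lam is the regularization weight)."""
    # numerically stable softmax over the 8 action logits
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    n = logits.shape[0]
    # one-hot dot log-probs reduces to indexing the expert action's probability
    ce = -np.log(probs[np.arange(n), expert_actions] + 1e-12).mean()
    l2 = lam * sum((w ** 2).sum() for w in weights)
    return ce + l2
```

Minimizing this over the network parameters is the supervised (imitation learning) counterpart of Eq. (8).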
III-B The Novel Deep Neural Network Architecture
In this subsection, the novel deep neural network architecture, DBNet, with double branches for deep feature representation and value function estimation is illuminated. The principal design idea of DBNet is to replace the Value Iteration Module of VIN with a non-recurrent convolutional network structure. Firstly, the preprocessing layers of DBNet compress the input Martian environment image into a feature map. Then, the global deep feature and the local deep feature are extracted from the feature map by branch one and branch two respectively. By fusing these two features, the final deep feature (value function estimation) of the Martian environment image is derived. Then, the optimal navigation policy can be determined through Eq. (5).
The diagram of DBNet is illustrated in Fig. 4, where Conv, Pool, Res, Fc and S are short for convolutional layer, max-pooling layer, residual convolutional layer, fully-connected layer and softmax layer respectively. More specific explanations of DBNet are given as follows.
(1) The preprocessing layers comprise two convolutional layers (Conv00, Conv01) and two max-pooling layers (Pool00, Pool01). After compressing the original image, the navigation policy operates area by area instead of point by point. Thus, the efficiency of visual navigation is greatly enhanced.
(2) Branch one consists of one convolutional layer (Conv10), three residual convolutional layers (Res11, Res12, Res13), four max-pooling layers (Pool10, Pool11, Pool12, Pool13) and two fully-connected layers (Fc1, Fc2). Notably, the residual convolutional layer (Fig. 5) is a kind of convolutional layer proposed in [30], which not only increases the training accuracy of convolutional neural networks with deep feature representations, but also makes them generalize well to testing data. Considering that DBNet is required to represent deep features of Martian images and achieve high precision on unknown Martian environment images, residual convolutional layers are employed in DBNet. The deep feature represented by this branch is a global guidance for the Mars rover, containing abstract information about the global Martian environment and the target point.
(3) Branch two is composed of two convolutional layers (Conv20, Conv21) and four residual convolutional layers (Res21, Res22, Res23, Res24). The deep feature represented by this branch depicts the local value distribution of the Martian environment image with respect to the target point, acting as a local guidance for the Mars rover.
(4) The final deep feature is obtained by fully connecting the global and local deep features through Fc3, with each output node corresponding to the value of one action at the current state. Hence, following Eq. (5), the optimal visual navigation policy is determined.
Compared with VIN, not only is the depth of DBNet reduced significantly (since it is non-recurrent), but both global and local information of the image is also kept and represented effectively. Detailed parameters of DBNet are demonstrated in TABLE I.
TABLE I
Preprocessing layers (A=12):
  Conv00: kernels with stride 1
  Pool00: kernels with stride 2
  Conv01: kernels with stride 1
  Pool01: kernels with stride 2
Branch one (B=10):
  Conv10: kernels with stride 1
  Pool10: kernels with stride 1
  Res11: kernels with stride 1
  Pool11: kernels with stride 2
  Res12: kernels with stride 1
  Pool12: kernels with stride 2
  Res13: kernels with stride 1
  Pool13: kernels with stride 1
  Fc1: 192 nodes
  Fc2: 10 nodes
Branch two (C=10):
  Conv20: kernels with stride 1
  Res21: kernels with stride 1
  Res22: kernels with stride 1
  Res23: kernels with stride 1
  Res24: kernels with stride 1
  Res25: kernels with stride 1
  Conv21: kernels with stride 1
Output layers:
  Fc3: 8 nodes
  S1: 8 nodes
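To make the double-branch layout concrete, the following is a minimal PyTorch sketch in the spirit of DBNet. All channel counts, kernel sizes and layer depths here are assumptions for illustration, not a reproduction of the parameters in TABLE I:

```python
import torch
import torch.nn as nn

class DBNetSketch(nn.Module):
    """Illustrative double-branch network: shared preprocessing, a global
    branch (convs + pooling + fully-connected) and a local branch (fully
    convolutional), fused by a final linear layer into 8 action values."""
    def __init__(self, n_actions=8):
        super().__init__()
        self.pre = nn.Sequential(            # compress the input image
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.branch_global = nn.Sequential(  # abstract global guidance
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10))
        self.branch_local = nn.Sequential(   # local value-distribution map
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Flatten(),
            nn.LazyLinear(10))
        self.fuse = nn.Linear(20, n_actions)  # Fc3-style fusion: one value per action

    def forward(self, x):
        f = self.pre(x)
        g = self.branch_global(f)
        l = self.branch_local(f)
        return self.fuse(torch.cat([g, l], dim=1))
```

Taking an argmax over the 8 output values then selects the moving direction, mirroring the greedy policy of Eq. (5).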
IV Experiments and Analysis
In this section, DBNet and VIN are first trained and tested on a Martian image dataset derived from HiRISE [31]. The dataset consists of 10000 high-resolution Martian images, each of which has 7 optimal trajectory samples (generated randomly). The training set and the testing set consist of 6/7 and 1/7 of the dataset respectively. Then, the navigation accuracy and training efficiency of DBNet and VIN are compared. Finally, a detailed analysis of DBNet is made through model ablation experiments. More precisely, the following questions are investigated:

Could DBNet provide the optimal navigation policy directly from original Martian environment images?

Could DBNet outperform the best existing framework, VIN, in accuracy and efficiency?

Could DBNet keep its performance after ablating some of its components?
IV-A Experiment Results on Martian Images
In this subsection, the process of training and testing DBNet and VIN on the Martian image dataset is described. The input image has 3 channels, consisting of the gray image of the original Martian environment, the edge image of the original Martian environment generated by the Canny algorithm [32], and the target image (Fig. 6). Then, the training accuracy and testing accuracy of DBNet and VIN are counted to contrast the proportion of optimal actions they take at each step. To compare the navigation performance of DBNet and VIN, the success rate on both training images and testing images is counted. It is worth noting that a navigation process is considered successful if and only if the Mars rover reaches the target point from the start point without running into any risky areas.
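Assembling the 3-channel input described above can be sketched as follows. This is a NumPy-only sketch: a simple gradient-magnitude edge map stands in for the Canny detector the paper uses, and the function name and threshold are illustrative assumptions.

```python
import numpy as np

def build_input(gray, target, threshold=0.2):
    """Stack the three input channels: the grayscale Martian image, an edge
    map (gradient-magnitude stand-in for Canny), and a target map that marks
    a single target cell with 1."""
    gy, gx = np.gradient(gray.astype(float))
    edges = (np.hypot(gx, gy) > threshold).astype(float)
    target_map = np.zeros_like(gray, dtype=float)
    target_map[target] = 1.0                      # mark the target cell
    return np.stack([gray.astype(float), edges, target_map], axis=0)
```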
As illustrated in Fig. 7, both the training loss and training error of DBNet converge faster than those of VIN. After 200 training epochs, DBNet achieves 96.4% training accuracy and 95.6% testing accuracy, outperforming VIN significantly in precision (as shown in TABLE II). Moreover, compared with VIN, the average time cost of DBNet in one training epoch is reduced by 45.8%, clearly exceeding VIN in efficiency. Finally, DBNet achieves a high success rate on both training data and testing data. Remarkably, the Martian environment images in the testing data are totally unknown to DBNet, since the training data differs from the testing data. Therefore, even if the environment is unknown beforehand, DBNet can still achieve high-precision visual navigation. By contrast, VIN exhibits a poor success rate, less than 80% on testing data. Examples of successful navigation processes are demonstrated in Fig. 8. It can be seen that the rover avoids craters of varying size precisely under the guidance of DBNet. Furthermore, the trajectories are nearly optimal. It is worth noting that no prior knowledge of the craters is provided, so DBNet has to derive deep representations of the original Martian images directly. The performance of DBNet is therefore remarkable.
Architectures  DBNet  VIN
Training accuracy  96.4%  90.0%
Testing accuracy  95.6%  89.8%
Training success rate  96.0%  81.1%
Testing success rate  93.3%  79.4%
Average time cost (each epoch)  52.8s  97.5s
IV-B Model Ablation Analysis of DBNet
In this subsection, to test whether DBNet can keep its performance after ablating some of its components, model ablation experiments are conducted. Define DBNet without branch one as B1Net. Then, derive B2Net by replacing the residual convolutional layers of B1Net with normal convolutional layers. As illustrated in Fig. 9 and TABLE III, without global deep features, the navigation accuracy and success rate of B1Net drop sharply compared with DBNet. Moreover, with only normal convolutional layers, the training cost and error of B2Net remain at high levels, unable to provide a reliable navigation policy for the Mars rover. Therefore, both the two-branch architecture and the residual convolutional layers make indispensable contributions to the final performance of DBNet.
28x28 grid map  DBNet  B1Net  B2Net
Training accuracy  96.4%  86.2%  13.8%
Testing accuracy  95.6%  85.8%  12.8%
Training success rate  96.0%  63.2%  1.1%
Testing success rate  93.3%  63.2%  1.3%
Moreover, to explore the inner mechanism of DBNet, the final value function layers of DBNet, B1Net and VIN are contrasted in a visualized way. The value function layers estimate the action value distribution of the current Martian image and target point. After visualization, locations close to the target point should be lighter (larger value) while locations far from the target point or near risky areas should be darker (smaller value). As demonstrated in Fig. 10, the value functions estimated by DBNet coincide best with the original Martian images: risky areas are darker and the lighter locations surround the target points in the value function layers generated by DBNet. By contrast, B1Net, without global deep features, cannot estimate the value function as precisely as DBNet, and VIN evidently fails to recognize risky areas of the Martian images. Therefore, DBNet indeed has a remarkable capability of representing deep features and estimating the value distribution of the current Martian environment.
V Conclusions
In this paper, a novel deep neural network architecture, DBNet, with double branches and a non-recurrent structure is designed for the Mars visual navigation problem. DBNet is able to determine the optimal navigation policy to the target point directly from original Martian environment images without any prior knowledge. Moreover, compared with the best existing architecture, VIN, DBNet achieves higher precision and efficiency. Most significantly, the average training time of DBNet is reduced by 45.8%. In future research, more effective deep neural network architectures will be explored and the robustness of the architecture will be investigated further.
VI Acknowledgement
This work was supported by the National Key Research and Development Program of China under Grant 2018YFB1003700, the Beijing Natural Science Foundation under Grant 4161001, the National Natural Science Foundation Projects of International Cooperation and Exchanges under Grant 61720106010, and by the Foundation for Innovative Research Groups of the National Natural Science Foundation of China under Grant 61621063.
References
[1] Braun, R., Manning, R.: 'Mars exploration entry, descent and landing challenges', Journal of Spacecraft and Rockets, 2007, 44 (2), pp. 310-323.
[2] Matthies, L., Maimone, M., Johnson, A., et al.: 'Computer Vision on Mars', International Journal of Computer Vision, 2007, 75 (1), pp. 67-92.
[3] Joseph, C., Arturo, R., Dave, F.: 'Global path planning on board the Mars exploration rovers', Aerospace Conference, 2007, pp. 1-11.
[4] Sakuta, M., Takanashi, S., Kubota, T.: 'An image based path planning scheme for exploration rover', IEEE International Conference on Robotics and Biomimetics, 2011, pp. 385-388.
[5] Guo, Q., Zhang, Z., Xu, Y.: 'Path-planning of automated guided vehicle based on improved Dijkstra algorithm', Control and Decision Conference, 2017, pp. 7138-7143.
[6] Chiang, C.H., Chiang, P.J., Fei, C.C., et al.: 'A comparative study of implementing Fast Marching Method and A* search for mobile robot path planning in grid environment: Effect of map resolution', IEEE Workshop on Advanced Robotics and Its Social Impacts, 2007, pp. 1-6.
[7] Jeddisaravi, K., Alitappeh, R.J., Guimaraes, F.G.: 'Multi-objective mobile robot path planning based on A* search', International Conference on Computer and Knowledge Engineering, 2017, pp. 7-12.
[8] Ferguson, D., Stentz, A.: 'Using interpolation to improve path planning: The Field D* algorithm', Journal of Field Robotics, 2006, 23 (2), pp. 79-101.
[9] Shi, J., Liu, C., Xi, H.: 'A framed-quadtree based on reversed D* path planning approach for intelligent mobile robot', Journal of Computers, 2012, 7 (2), pp. 464-469.
[10] Wooden, D.T.: 'Graph-based Path Planning for Mobile Robots', thesis, Georgia Institute of Technology, 2006.
[11] Bassil, Y.: 'Neural network model for path-planning of robotic rover systems', International Journal of Science and Technology, 2012, 2 (2), pp. 94-100.
[12] Zeng, C., Zhang, Q., Wei, X.: 'Robotic global path-planning based modified genetic algorithm and A* algorithm', International Conference on Measuring Technology and Mechatronics Automation, 2011, pp. 167-170.
[13] Kang, H.I., Lee, B., Kim, K.: 'Path planning algorithm using the particle swarm optimization and the improved Dijkstra algorithm', Workshop on Computational Intelligence and Industrial Application, 2009, 17 (4), pp. 1002-1004.
[14] Gu, J., Wang, Z., Kuen, J., et al.: 'Recent advances in convolutional neural networks', arXiv preprint arXiv:1512.07108, 2015.
[15] Krizhevsky, A., Sutskever, I., Hinton, G.E.: 'Imagenet classification with deep convolutional neural networks', Advances in Neural Information Processing Systems, 2012, pp. 1097-1105.
[16] Huang, J., Guadarrama, S., Murphy, K., et al.: 'Speed/accuracy trade-offs for modern convolutional object detectors', arXiv preprint arXiv:1611.10012, 2016.
[17] Zhu, Y., Mottaghi, R., Kolve, E., et al.: 'Target-driven visual navigation in indoor scenes using deep reinforcement learning', Proceedings of the International Conference on Robotics and Automation, 2017, pp. 3357-3364.
[18] Levine, S., Finn, C., Darrell, T., et al.: 'End-to-end training of deep visuomotor policies', Journal of Machine Learning Research, 2015, 17 (1), pp. 1334-1373.
[19] Tanner, C., Roberto, F., Richard, L., et al.: 'A deep learning approach for optical autonomous planetary relative terrain navigation', AAS/AIAA Spaceflight Mechanics Meeting, 2017, pp. 329-338.
[20] Maturana, D., Scherer, S.: '3D convolutional neural networks for landing zone detection from LiDAR', International Conference on Robotics and Automation, 2015, pp. 3471-3478.
[21] Tamar, A., Wu, Y., Thomas, G., et al.: 'Value iteration networks', Advances in Neural Information Processing Systems, 2016, pp. 2146-2154.
[22] Khan, A., Zhang, C., Atanasov, N., et al.: 'Memory augmented control networks', arXiv preprint arXiv:1709.05706, 2017.
[23] Bellman, R.: 'Dynamic programming', Princeton University Press, 1957.
[24] Bertsekas, D.P.: 'Dynamic programming and optimal control', Athena Scientific, 4th edition, 2012.
[25] Sutton, R.S., Barto, A.G.: 'Reinforcement learning: An introduction', MIT Press, 1998.
[26] Li, Y.: 'Deep reinforcement learning: An overview', arXiv preprint arXiv:1701.07274, 2017.
[27] Attia, A., Dayan, S.: 'Global overview of Imitation Learning', arXiv preprint arXiv:1801.06503, 2018.
[28] Goodfellow, I., Bengio, Y., Courville, A.: 'Deep Learning', MIT Press, 2016.
[29] Harris, D., Harris, S.: 'Digital design and computer architecture', China Machine Press, 2014.
[30] He, K., Zhang, X., Ren, S., et al.: 'Deep residual learning for image recognition', IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.
[31] McEwen, A.S., Eliason, E.M., Bergstrom, J.W., et al.: 'Mars Reconnaissance Orbiter's High Resolution Imaging Science Experiment (HiRISE)', Journal of Geophysical Research: Planets, 2007, 112 (E05S02), pp. 1-40.
[32] Canny, J.: 'A Computational Approach to Edge Detection', IEEE Transactions on Pattern Analysis and Machine Intelligence, 1986, 8 (6), pp. 679-698.