1 Introduction
Path planning is an important technique for robotic applications such as autonomous mobile robots and arm manipulation. The objective is to find an optimal or feasible path from an initial state to a goal, subject to constraints derived from a variety of factors, including the nonholonomic properties of wheeled mobile robots, collision avoidance, and the joint limits of a robotic manipulator.
These constraints and the existence of local traps mean that a large amount of computation is required to find a path. Traditional search algorithms, such as A* (Hart et al., 1968) and heuristic-based Rapidly-exploring Random Trees (RRT) (Vemula et al., 2014), rely on heuristics to reduce the amount of computation needed to find a path. Typically, these heuristics are manually crafted for a particular robot and environment. For example, to accelerate the path search in three-dimensional (3D) space (i.e., the 2D position and heading angle of the robot) under the nonholonomic constraints of car-like robots, the Hybrid A* algorithm (Montemerlo et al., 2008) combines the following manually designed heuristics: 1) a nonholonomic shortest path cost calculated by the Dubins (Dubins, 1957) or Reeds–Shepp (Reeds and Shepp, 1990) algorithms (assuming that no obstacles exist), and 2) a holonomic shortest path cost with obstacles calculated by the backward Dijkstra algorithm. Both can be computed in considerably less time than the search itself. However, it is not trivial for humans to manually craft such heuristics for each specific search problem. Furthermore, although a simple combination of heuristics is effective in simple environments, such heuristics do not perform as reliably in more complicated environments.
In recent years, Convolutional Neural Networks (CNNs), a machine-learning technique, have achieved impressive results in a variety of domains, such as vision (Noh et al., 2015; Yu and Koltun, 2016), language (Vinyals et al., 2014), audio (Hershey et al., 2016), and games (Silver et al., 2017; Mnih et al., 2015). In this paper, we propose CNN-based heuristic learning methods. As depicted in Figure 1, our convolutional networks take feature images extracted from an obstacle map, together with a goal position, as inputs, and predict the heuristic (estimated cost-to-go) value at each position in a 2D map as the output. These outputs are used as heuristics to accelerate the search in path planners. CNNs have the following advantages for learning heuristics in path planning problems, especially for robots: 1) CNNs can capture both the global structure of an environment (e.g., the road map) and local details (e.g., obstacle shapes), and 2) CNNs can generate spatially structured outputs (e.g., heuristic values in neighboring configurations tend to transition smoothly). We utilize path planning algorithms such as Backward Dijkstra and A* to generate ground-truth heuristic values. Because our CNNs predict heuristics in a fully convolutional way, both inference and training are performed efficiently over all states in an environment at the same time. We also propose a learning method that combines supervision by a planner with Temporal Difference (TD) learning to improve sampling efficiency. Similarly to ours, the approach proposed by Bhardwaj et al.
(Bhardwaj et al., 2017) learns a heuristic by imitating the Backward Dijkstra algorithm. However, it uses fully connected neural networks applied to each state feature separately, so inference and training must be performed independently for each state. Our model can learn heuristics from paths alone, rather than requiring dense cost-to-go values generated by an algorithm that searches the whole state space, the computational cost of which would be prohibitive for more complicated problems. In addition, our method predicts heuristics for all states of a 2D map at the same time using fully convolutional NNs, whereas Bhardwaj et al. (2017) predict a heuristic independently for every state using fully connected NNs, which is computationally inefficient; consequently, they had to employ the DAgger algorithm (Ross et al., 2010; Ross and Bagnell, 2014) to sample training data efficiently. Furthermore, their method relies solely on a whole-state-space search algorithm (i.e., Backward Dijkstra) to generate the ground truth, whereas we propose a more efficient ground-truth generation method that relies only on an optimal path search algorithm (i.e., the A* algorithm). This enables our method to be applied to larger-scale problems and a wider range of domains. CNNs have previously been employed to solve problems related to path planning. Wulfmeier et al. (Wulfmeier et al., 2016, 2015)
trained CNNs to produce cost maps from demonstrations (i.e., inverse reinforcement learning). Their purpose was to learn previously unknown cost functions so that planning imitates the demonstrated behavior; our objective differs in that the heuristic is learned to reduce the computational time required for planning. Path planners have also been utilized for reactive CNN policy learning in robot navigation problems. For example,
Kanezaki et al. (2018) learned a reactive CNN policy using global path planner results as supervision signals. In another study, Gao et al. (2017) utilized global planner paths as inputs to improve a CNN policy based on reinforcement learning. Value Iteration Networks (VIN) (Tamar et al., 2017) embed a differentiable planning module (i.e., value iteration) into CNNs, which can learn planners, including the mapping from observations to cost maps and the state transition probabilities, in an end-to-end fashion.
Gupta et al. (2017) applied VIN to mobile robot visual navigation problems to perform map localization and planning simultaneously within an end-to-end framework. In the VIN framework, the training objective is in principle arbitrary, but their experiments cover only imitation learning and reinforcement learning, because their goal was not to learn heuristics that speed up planners. The computational cost of value iteration becomes prohibitively large when the state space is large; as a result, VIN is limited to search problems with a small state space, e.g., a small 2D grid world. Our method does not rely on value iteration at inference time and can be applied to problems with a much larger search space. Although we limit our experiments to path-finding problems in simple 2D grid worlds, our method could also be applied to larger problems such as 3D path planning with nonholonomic constraints. In summary, the main contributions of our paper are as follows:
learning heuristics using CNNs in a fully convolutional way over states

proposing three learning methods (Backward Dijkstra (BD), Sparse, and Sparse+TD) that imitate cost-to-go values generated either by path planning algorithms or by the Temporal Difference method

demonstrating a significant reduction in search cost compared to a simple heuristic search method in our 2D grid world planning experiments
This paper is organized as follows. In Section 2, we describe our proposed framework as illustrated in Figure 1: first, a search-based path planning algorithm and its heuristic function in Subsection 2.1, then the details of our heuristic learning approach in Subsection 2.2, where we introduce three learning algorithms that differ in the characteristics of their training data. Section 3 describes the dataset, presents the implementation of the CNNs in our proposed framework, and reports and discusses the experimental results, which demonstrate the effectiveness of the proposed framework. Finally, we summarize the results and discuss future work in Section 4.
2 Proposed Framework
2.1 Preliminaries
We consider a search-based path planner on a graph as a baseline. The pseudocode of this planner is provided in Algorithm 1. A graph search begins at a start vertex. At each vertex evaluation, the planner expands the next search candidates with a successor function, which returns the successor edges and child vertices. Each candidate vertex is validated by a collision-check function, which rejects an edge that is occupied by an obstacle according to the environment. Each valid candidate is evaluated by a search score function, and all candidate vertices are pushed into a priority queue with their scores. At the next iteration, the queue pops the vertex with the highest score, and its successor vertices are evaluated and pushed into the queue again. The procedure is repeated until the search reaches the goal or the queue becomes empty.
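The loop above can be sketched in a few lines of Python. This is a minimal illustration of Algorithm 1, not the paper's exact pseudocode: the names `successors`, `is_valid`, and `score_fn` are illustrative stand-ins, and the queue here pops the lowest score (equivalent to the paper's highest-score convention with negated scores).

```python
import heapq

def best_first_search(start, goal, successors, is_valid, score_fn):
    """Generic best-first graph search (a sketch of Algorithm 1).

    successors(v) yields (edge_cost, child) pairs, is_valid(v) checks
    collisions against the environment, and score_fn(g, v) ranks
    candidates (lower score is expanded first here).
    """
    g = {start: 0.0}                       # cost-so-far per vertex
    queue = [(score_fn(0.0, start), start)]
    closed = set()
    while queue:
        _, v = heapq.heappop(queue)        # pop best-scored vertex
        if v == goal:
            return g[v]                    # reached the goal
        if v in closed:
            continue
        closed.add(v)
        for cost, child in successors(v):
            if not is_valid(child):
                continue                   # skip vertices blocked by obstacles
            new_g = g[v] + cost
            if new_g < g.get(child, float("inf")):
                g[child] = new_g
                heapq.heappush(queue, (score_fn(new_g, child), child))
    return None                            # queue exhausted: no path
```

With `score_fn = lambda g, v: g` this behaves as a Dijkstra search; plugging in a heuristic gives A* or greedy search, as defined next.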
Based on Algorithm 1, the Dijkstra search is obtained by defining the score function f(v) using the cost-so-far value g(v):

f(v) = -g(v),   (1)

where the cost-so-far g(v) is calculated by accumulating the costs c(v, v') of the edges along the shortest path to v found so far during the search (the sign is negated because the queue pops the highest score). By introducing a heuristic function h(v), the A* algorithm can be derived from the score function

f(v) = -(g(v) + h(v)),   (2)

and we define a search depending only on the heuristic as the greedy search algorithm:

f(v) = -h(v).   (3)
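For clarity, the three score functions of eqs. (1)–(3) can be written as small Python functions. This is an illustrative sketch written for a lowest-score-first priority queue; negate the values for a highest-score queue as in Algorithm 1.

```python
# Score functions corresponding to eqs. (1)-(3), for a lowest-score-first
# priority queue. g is the cost-so-far, h the heuristic value at a vertex.
def dijkstra_score(g, h):
    return g            # eq. (1): cost-so-far only

def astar_score(g, h):
    return g + h        # eq. (2): cost-so-far plus heuristic

def greedy_score(g, h):
    return h            # eq. (3): heuristic only
```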
2.2 Learning Heuristics Using Convolutional Neural Networks for Planners
Our goal is to find a more efficient heuristic function that minimizes the search cost (the number of vertices visited/examined during the search). As shown in Figure 1, our method takes as input an environment represented as a binary obstacle map, extracts feature maps from it, and then uses a CNN to predict a heuristic value at every node in the graph, which we call a heuristic map. The predicted heuristic map is used as a lookup table for querying heuristic values during a graph search based on the planner described in the previous section. Note that one can extend our method to take a continuous-valued cost map as input, where each pixel represents the cost of visiting the corresponding state.
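As an illustration of how an input representation might be assembled, the sketch below stacks the obstacle map and two simple distance features as channels (the features the paper actually uses are described in Section 3.2). The BFS-based distance here is a hypothetical stand-in for a proper distance transform, and it ignores obstacles when computing the goal distance.

```python
import numpy as np
from collections import deque

def grid_bfs_distance(sources, shape):
    """4-connected BFS distance from a set of source cells
    (illustrative stand-in for a true distance transform)."""
    dist = np.full(shape, np.inf)
    q = deque()
    for s in sources:
        dist[s] = 0.0
        q.append(s)
    while q:
        y, x = q.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < shape[0] and 0 <= nx < shape[1] and dist[ny, nx] == np.inf:
                dist[ny, nx] = dist[y, x] + 1.0
                q.append((ny, nx))
    return dist

def make_feature_maps(obstacle_map, goal):
    """Stack three input channels: the binary obstacle map,
    the distance from obstacles, and the distance from the goal."""
    obstacles = list(zip(*np.nonzero(obstacle_map)))
    d_obs = grid_bfs_distance(obstacles, obstacle_map.shape)
    d_goal = grid_bfs_distance([goal], obstacle_map.shape)
    return np.stack([obstacle_map.astype(float), d_obs, d_goal])
```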
Because the CNN in this method is fully convolutional, it can simultaneously predict a heuristic value for every node in the graph (single-shot inference). In addition, it can leverage mature GPGPU implementations such as cuDNN. We learn the heuristic map with the aid of the planner, which provides the prediction targets during CNN training. We introduce three variants of the learning algorithm.
Dense target learning with Backward Dijkstra (BD): Our CNN can be directly trained by minimizing the squared error between the prediction and the target cost-to-go value at every node. The cost-to-go of a vertex is defined as the cost accumulated along the shortest path from the vertex to the goal. The Backward Dijkstra algorithm can calculate the cost-to-go values of all valid vertices in a graph, by propagating the search of Algorithm 1 from the goal until no vertex remains to be opened. Training is performed by minimizing the loss function

L(θ) = Σ_v m(v) (h_θ(v) - h*(v))²,   (4)

where h_θ(v) denotes the heuristic value predicted by the CNN with parameters θ, h*(v) denotes the cost-to-go value map generated by the Backward Dijkstra, and m(v) is a mask that ignores invalid vertices the Backward Dijkstra search cannot visit during target value generation, e.g., areas occupied or surrounded by obstacles (Figure 2).
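A minimal NumPy sketch of this masked squared-error objective follows; the argument names are illustrative, with `pred` standing for the CNN's heuristic map, `target` for the Backward Dijkstra cost-to-go map, and `mask` zeroing out invalid vertices.

```python
import numpy as np

def masked_cost_to_go_loss(pred, target, mask):
    """Squared error between a predicted heuristic map and cost-to-go
    targets, ignoring invalid vertices (a sketch of eq. (4))."""
    diff = (pred - target) * mask   # masked residual per vertex
    return float(np.sum(diff ** 2))
```

In an actual training loop this would be the differentiable loss of a deep learning framework; the NumPy form only shows the computation.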
Sparse target learning with A* path search (Sparse): The computational time required to generate the cost-to-go targets with the Backward Dijkstra is often prohibitively long for large-scale problems (planning in larger 2D grid maps, high-dimensional problems, etc.), and can become a bottleneck for learning heuristics. We therefore also propose a learning method that relies only on target cost-to-go values at the vertices belonging to the shortest path found by the A* algorithm, given randomly sampled start and goal positions. A* is much faster than the Backward Dijkstra, which improves data collection efficiency in terms of the variety of environments. As in dense target learning, eq. (4) is used as the loss function, although the training mask is 1 only at vertices along the path.
Sparse target learning with TD error minimization (Sparse+TD): Learning with sparse target signals may result in underfitting during training, because there is no supervision signal at vertices the A* path does not visit. We propose utilizing the temporal difference (TD) learning method to compensate for the lack of supervision. A denser target estimate h̃(v) is obtained by the update

h̃(v) ← min_{v' ∈ Succ(v)} ( c(v, v') + h̃(v') ),   (5)

where h̃ is initialized with the current prediction h_θ, Succ(v) denotes the successors of v, c(v, v') is the edge cost, and the value at the goal is fixed to 0. The value can be updated by iterating this step, and the update can be implemented as a convolution with fixed kernels and biases followed by a minimum operation along the axis representing successor vertices (Tamar et al., 2017). Note that we iterate this update only during training, to obtain denser target values of the cost-to-go. With the updated cost-to-go estimate, the loss function can be written as

L(θ) = Σ_v m(v) (h_θ(v) - h*(v))² + λ Σ_v (h_θ(v) - h̃(v))²,   (6)

where m(v) is 1 at vertices along the A* path and 0 otherwise, and λ balances the weight of the TD minimization loss. The value can also be updated iteratively with multiple steps to obtain the target cost-to-go estimate; in our experiment, the number of update steps and the value of λ were fixed as hyperparameters.
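One step of this update on an 8-connected grid can be sketched with NumPy array shifts; this is an illustrative version of the operation described above, whereas the paper realizes it with fixed convolution kernels and a channel-wise minimum on the GPU. The function name and arguments are hypothetical.

```python
import numpy as np

def bellman_backup(h, obstacle_map, goal):
    """One TD/value-iteration step on an 8-connected grid:
    h(v) <- min over successors v' of (c(v, v') + h(v'))."""
    H, W = h.shape
    padded = np.pad(h, 1, constant_values=np.inf)  # out-of-grid = unreachable
    candidates = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            cost = np.hypot(dy, dx)                 # unit / sqrt(2) edge costs
            shifted = padded[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
            candidates.append(cost + shifted)       # cost via this successor
    new_h = np.minimum(h, np.min(candidates, axis=0))
    new_h[goal] = 0.0                               # boundary condition at goal
    new_h[obstacle_map.astype(bool)] = np.inf       # obstacles stay unreachable
    return new_h
```

Iterating this function propagates cost-to-go values outward from the goal, which is how denser targets can be obtained from a sparse initialization.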
3 Experimental Setup and Results
3.1 Dataset
We trained and evaluated our algorithms on a 2D grid world path planning problem with the dataset provided by Bhardwaj et al. (2017). We used seven environment types from the dataset in our experiments: Shifting gaps, Bugtrap and Forest, Forest, Gap and Forest, Single Bugtrap, Mazes, and Multiple Bugtraps. Each environment type contains a different kind of local trap. For example, the Shifting gaps type has an obstacle traversing the central section of the 2D map, separating its left and right sides; the obstacle has an opening at a vertical position that is sampled randomly during dataset generation. Simple heuristics such as the Euclidean distance may undesirably guide the search into local traps by greedily moving towards the goal without considering the position of the opening.
Each environment type consists of 800 training 2D grid maps stored as binary images (each grid cell either occupied by an obstacle or free) and 100 testing maps. Each map has a fixed dimensionality, where each grid cell indicates the existence of an obstacle. We consider each pixel in a map as a vertex and, as the planning problem, find a path from the start vertex to the goal vertex on an 8-connected grid. The cost is defined as the distance traveled along a path. Edges connected to a vertex occupied by an obstacle are considered invalid. Although we randomly sampled the start and goal positions to generate supervision during training, we fixed the start and goal positions during evaluation, in order to be compatible with the evaluation in Bhardwaj et al. (2017).
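For concreteness, a possible successor function for the 8-connected grid world just described might look as follows, with unit and √2 edge costs and obstacle cells treated as invalid. This is a sketch under the stated conventions, not the paper's implementation; names are illustrative.

```python
import math

def successors_8(v, grid):
    """Yield (edge_cost, neighbor) pairs on an 8-connected grid,
    treating cells with grid[y][x] == 1 (obstacle) as invalid."""
    y, x = v
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            ny, nx = y + dy, x + dx
            if 0 <= ny < len(grid) and 0 <= nx < len(grid[0]) and grid[ny][nx] == 0:
                # straight moves cost 1, diagonal moves cost sqrt(2)
                yield (math.hypot(dy, dx), (ny, nx))
```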
3.2 Implementation details
In the neural network architecture used in our tasks, we employed techniques such as dilated convolutions and an encoder–decoder structure to extract global and local spatial context from the 2D input maps and to output spatially consistent images. The encoder CNN applies its convolution module three times to produce feature maps with smaller spatial dimensions and larger receptive fields, taking a wider spatial context into account. The convolution module consists of three convolutions, each followed by batch normalization and a leaky ReLU. A stride of 2 is used in the first convolution, and the dilation factors of the convolution kernels (Yu and Koltun, 2016) are incremented from 1 to 3. The numbers of convolution channels of the three modules are 16, 32, and 64, respectively. The decoder CNN applies its deconvolution module three times. The deconvolution module is similar to the convolution module, except that the first convolution is replaced by a deconvolution with an upscaling factor of 2. The numbers of convolution channels of the three modules are 32, 16, and 16, respectively, except that the last convolution of the third deconvolution module produces a single-channel output: the heuristic map. The input to the CNN consists of feature maps extracted from the 2D obstacle map: 1) the obstacle map itself, 2) the distance from obstacles, and 3) the distance from the goal, each as an image, stacked as channels. The distances from obstacles and the goal are often used to construct artificial potential fields (Qureshi and Ayaz, 2017), where the goal distance serves as an attractive potential function and the obstacle distance as a repulsive potential function. We provide these features so that a simple heuristic can be learned more easily. During training, we randomly sample 32 maps from the dataset to construct a minibatch for a stochastic gradient descent step. For each map, we randomly sample start and goal positions until A* finds a valid path between them, after which we generate the cost-to-go targets as described in Section
2.2 using either the Backward Dijkstra or A*. A random image translation is applied to the inputs as data augmentation. We used Adam (Kingma and Ba, 2014) as the stochastic gradient descent algorithm. During testing, we used the greedy search algorithm of eq. (3) as the planner, which looks up the heuristic map produced by our trained CNNs.

3.3 Results
For each type of environment in the dataset, training is performed for a fixed number of epochs, taking approximately 10 hours on a single GTX 1080 Ti for the CNNs and an Intel Core i7-7700K CPU for on-the-fly planning ground-truth generation.
Figure 3 shows the training curves of the mean absolute error between the predicted heuristic values and the ground-truth cost-to-go values on the evaluation set. BD consistently produced a smaller error than Sparse and Sparse+TD because it can use ground truths for all states in an environment as its training targets. Sparse and Sparse+TD produce fairly similar results, although they can only access ground truths at states along an optimal path. Table 1 shows the results of evaluating the trained models as heuristic value estimators in a greedy path planner, for two metrics: search cost and path quality. The search cost is defined as the number of vertices expanded during the search, where a smaller number corresponds to a shorter search time. The path quality is the distance accumulated along the generated path.
BD consistently outperforms the others in both metrics, consistent with the learning-curve evaluation, whereas no significant difference is observed between Sparse and Sparse+TD. We used Sparse in the subsequent experiments because it offers a good balance between training-data generation efficiency, algorithmic simplicity, and performance.
Environment          Search cost                    Path quality
                     BD    Sparse  Sparse+TD        BD    Sparse  Sparse+TD
Shifting gap         263   351     329              322   350     353
Bugtrap and Forest   450   544     594              359   373     380
Forest               305   308     465              324   329     419
Gaps and Forest      300   357     445              337   348     350
Single Bugtrap       270   327     367              321   346     342
Mazes                401   403     941              356   370     377
Multiple Bugtraps    855   1697    1405             367   392     376
Table 2 compares our trained heuristic with a simple Euclidean distance heuristic and a heuristic trained with SaIL (Bhardwaj et al., 2017). The results quantitatively show that our method significantly outperforms the other methods in terms of search cost in all environments. Although the greedy path planner does not aim to find the optimal path, our methods produce paths whose quality closely approximates that of the optimal path (Optimal). The Euclidean heuristic also produces path quality close to the ground-truth optimum, because it leads the search aggressively towards the goal, which suffices to find near-minimum path lengths in this simple holonomic 2D path-finding problem; however, its search cost is far larger than ours. Compared to (Bhardwaj et al., 2017), our results suggest that our convolutional model predicts more effective heuristics from simply generated feature maps, without the carefully designed features fed into the simple fully connected networks of (Bhardwaj et al., 2017). Our feature extraction relies almost entirely on fully convolutional architectures.
We also compared the computational time on our local machine (GTX 1080 Ti and Intel Core i7-7700K) against the Euclidean heuristic baseline. We did not compare against the results of Bhardwaj et al. (2017) because their implementation, written in pure Python, is too slow for a fair timing comparison; our planners are written in highly optimized C++, and the CNNs for heuristic prediction run on GPUs. We measured the average planning time for 1) A* with the Euclidean heuristic, 2) greedy search with the Euclidean heuristic, and 3) greedy search with our learned heuristic (CNN inference plus planning). The learned heuristic reduces the planning time considerably relative to the Euclidean baselines. Even after adding the computational cost of the CNNs, our method is significantly faster than the baselines.
Environment          Search cost                                       Path quality
                     Greedy                       A*       SaIL        Greedy                     SaIL
                     Optimal  Euclid  Learned     Euclid               Optimal  Euclid  Learned
Shifting gap         250      37814   351         23699    505         311      314     350        331
Bugtrap and Forest   273      20367   544         35056    751         325      352     373        395
Forest               252      9205    308         24418    357         312      334     329        327
Gaps and Forest      259      12386   357         19981    8913        316      322     348        945
Single Bugtrap       236      3303    327         27797    1215        303      306     346        337
Mazes                266      12687   403         22013    1035        333      337     370        428
Multiple Bugtraps    274      19351   1697        29044    3182        325      347     392        439
As shown in Figure 4, the Euclidean heuristic often leads the search into local traps owing to its ignorance of the obstacle structure of the environment, which causes wasted search effort. One may notice that our model often produces jagged paths. This effect is attributed to the heuristic predictions of our CNN model being globally consistent but locally noisy. However, our main interest is reducing the computational cost of the path search rather than achieving optimality; moreover, the obtained feasible paths could be locally optimized or smoothed as a post-processing step.
4 Conclusion
In this paper, we proposed a novel CNN-based heuristic learning framework for fast planners. Our experiments on path-finding problems in 2D grid worlds showed that the proposed learning approaches significantly decrease the search effort compared to a hand-crafted heuristic search. Our convolutional method demonstrates a promising direction for learning heuristic functions that minimize the search cost in complicated environments. Our work can be extended to more complicated and high-dimensional search problems, such as nonholonomic path planning for mobile robots (e.g., implementing a learned heuristic as CNNs in the Hybrid A* algorithm) and robot arm motion planning (e.g., learning sampling heuristics or the distance metric in RRT).
References
 Hart et al. (1968) P. E. Hart, N. J. Nilsson, and B. Raphael. A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4(2):100–107, July 1968.
 Vemula et al. (2014) Anirudh Vemula, Sanjiban Choudhury, and Sebastian Scherer. Learning motion planning assumptions. Carnegie Mellon University Technical Report, August 2014.
 Montemerlo et al. (2008) Michael Montemerlo, Jan Becker, Suhrid Bhat, Hendrik Dahlkamp, Dmitri Dolgov, Scott Ettinger, Dirk Haehnel, Tim Hilden, Gabe Hoffmann, Burkhard Huhnke, Doug Johnston, Stefan Klumpp, Dirk Langer, Anthony Levandowski, Jesse Levinson, Julien Marcil, David Orenstein, Johannes Paefgen, Isaac Penny, Anna Petrovskaya, Mike Pflueger, Ganymed Stanek, David Stavens, Antone Vogt, and Sebastian Thrun. Junior: The stanford entry in the urban challenge. J. Field Robot., 25(9):569–597, 2008.
 Dubins (1957) Lester E Dubins. On curves of minimal length with a constraint on average curvature, and with prescribed initial and terminal positions and tangents. American Journal of Mathematics, 79:497–516, 1957.
 Reeds and Shepp (1990) J. A. Reeds and L. A. Shepp. Optimal paths for a car that goes both forwards and backwards. Pacific Journal of Mathematics, 145(2):367–393, 1990.

 Noh et al. (2015) Hyeonwoo Noh, Seunghoon Hong, and Bohyung Han. Learning deconvolution network for semantic segmentation. In Proc. IEEE Int. Conf. on Computer Vision (ICCV), pages 1520–1528, 2015.
 Yu and Koltun (2016) Fisher Yu and Vladlen Koltun. Multi-scale context aggregation by dilated convolutions. In Proc. Int. Conf. on Learning Representations (ICLR), 2016.
 Vinyals et al. (2014) Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. Show and tell: A neural image caption generator. CoRR, abs/1411.4555, 2014.
 Hershey et al. (2016) Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, R. Channing Moore, Manoj Plakal, Devin Platt, Rif A. Saurous, Bryan Seybold, Malcolm Slaney, Ron J. Weiss, and Kevin W. Wilson. CNN architectures for largescale audio classification. CoRR, abs/1609.09430, 2016.
 Silver et al. (2017) David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis. Mastering the game of go without human knowledge. Nature, 550(7676):354–359, 2017.
 Mnih et al. (2015) Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Humanlevel control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
 Bhardwaj et al. (2017) Mohak Bhardwaj, Sanjiban Choudhury, and Sebastian Scherer. Learning heuristic search via imitation. In Proc. 1st Annual Conf. on Robot Learning (CoRL), pages 271–280, 2017.
 Ross et al. (2010) Stéphane Ross, Geoffrey J. Gordon, and J. Andrew Bagnell. Noregret reductions for imitation learning and structured prediction. CoRR, abs/1011.0686, 2010.
 Ross and Bagnell (2014) Stéphane Ross and J. Andrew Bagnell. Reinforcement and imitation learning via interactive noregret learning. CoRR, abs/1406.5979, 2014.
 Wulfmeier et al. (2016) Markus Wulfmeier, Dominic Zeng Wang, and Ingmar Posner. Watch this: Scalable costfunction learning for path planning in urban environments. In Proc. IEEE Int. Conf. on Intelligent Robots and Systems (IROS), pages 2089–2095, 2016.
 Wulfmeier et al. (2015) Markus Wulfmeier, Peter Ondruska, and Ingmar Posner. Maximum entropy deep inverse reinforcement learning. In Neural Information Processing Systems Conference, Deep Reinforcement Learning Workshop, 2015.
 Kanezaki et al. (2018) Asako Kanezaki, Jirou Nitta, and Yoko Sasaki. GOSELO: goaldirected obstacle and selflocation map for robot navigation using reactive neural networks. IEEE Robotics and Automation Letters, 3(2):696–703, 2018.

 Gao et al. (2017) Wei Gao, David F. C. Hsu, Wee Sun Lee, Shengmei Shen, and Karthikk Subramanian. Intention-Net: Integrating planning and deep learning for goal-directed autonomous navigation. In Proc. 1st Annual Conf. on Robot Learning (CoRL), 2017.
 Tamar et al. (2017) Aviv Tamar, Yi Wu, Garrett Thomas, Sergey Levine, and Pieter Abbeel. Value iteration networks. In Proc. IJCAI International Joint Conference on Artificial Intelligence, 2017.
 Gupta et al. (2017) Saurabh Gupta, James Davidson, Sergey Levine, Rahul Sukthankar, and Jitendra Malik. Cognitive mapping and planning for visual navigation. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 7272–7281, 2017.
 Qureshi and Ayaz (2017) Ahmed Hussain Qureshi and Yasar Ayaz. Potential functions based sampling heuristic for optimal path planning. CoRR, abs/1704.00264, 2017.
 Kingma and Ba (2014) Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.