Combining optimal path search with task-dependent learning in a neural network

by   Tomas Kulvicius, et al.
The University of Göttingen

Finding optimal paths in connected graphs requires determining the smallest total cost for traveling along the graph's edges. This problem can be solved by several classical algorithms where, usually, costs are predefined for all edges. Conventional planning methods can, thus, normally not be used when costs need to change adaptively according to the requirements of some task. Here we show that one can define a neural network representation of path finding problems by transforming cost values into synaptic weights, which allows for online weight adaptation using network learning mechanisms. When starting with an initial activity value of one, activity propagation in this network leads to solutions that are identical to those found by the Bellman-Ford algorithm. The neural network has the same algorithmic complexity as Bellman-Ford and, in addition, we show that network learning mechanisms (such as Hebbian learning) can adapt the weights in the network, augmenting the resulting paths according to some task at hand. We demonstrate this by learning to navigate in an environment with obstacles as well as by learning to follow certain sequences of path nodes. Hence, the algorithm presented here may open up a different regime of applications where path augmentation (by learning) is directly coupled with path finding in a natural way.




1 Introduction

This study addresses the so-called single-source shortest path (SSSP) problem. Possibly the most prominent goal for any path finding problem is to determine a path between two vertices (usually called source and destination) in a graph such that the total summed cost of traveling along this path is minimised. Usually, (numerical) costs are associated with the edges that connect the vertices, and these costs are accumulated when traveling along a set of edges. In the SSSP problem, the aim is to find the shortest paths from one vertex (source) to all other vertices in the graph.

The SSSP problem has a wide range of applications, e.g., in computer networks (shortest path between two computers), social networks (shortest path between two persons or linking between two persons), trading and finance (e.g., currency exchange), multi-agent systems such as games, and task and path planning in robotics, to name a few.

The most general way to solve the SSSP problem is the Bellman-Ford (BF) algorithm ([6, 11, 5]), which can also deal with graphs that have some negative cost values. In this work, we present a neural implementation which is mathematically equivalent to the BF algorithm with an algorithmic complexity which is the same as for BF.

The neural implementation relies on the multiplication of activities (instead of the addition of costs). As a consequence, it is directly compatible with (Hebbian) network learning, which is not the case for any additively operating path planning algorithm. To demonstrate this, we use Hebbian-type 3-factor learning [17, 23, 31] to address SSSP tasks under some additional constraints.

Our paper is structured as follows. First, we will provide an overview of the state of the art methods and state our contribution with respect to that. Next, we will describe details of the BF algorithm and the proposed neural network (NN-BF). Afterwards, we will present the 3-factor learning scheme and two learning scenarios: navigation learning and sequence learning. This will be followed by a comparison between the BF algorithm and NN-BF and by examples of combining NN-BF based planning with 3-factor learning. Finally, we will conclude our study with a summary and provide an outlook for future work.

2 Related Work

2.1 State-of-the-art

There are mainly two types of approaches to solve the SSSP problem: classical algorithms and approaches based on artificial neural networks. Classical algorithms exist with different complexity and properties. The simplest and fastest algorithm is "breadth first search (BFS)" [25, 24]. However, it can only solve the SSSP problem for graphs with uniform costs, i.e., all costs equal to 1. Dijkstra's algorithm [9] as well as the Bellman-Ford (BF) algorithm [6, 11] can deal with graphs that have arbitrary costs. From an algorithmic point of view, Dijkstra's algorithm is faster than the BF algorithm; however, Dijkstra only works on graphs with positive costs, whereas the BF algorithm can also solve graphs where some of the costs are negative. Furthermore, Dijkstra is a greedy algorithm and requires a priority queue, which makes it not so well suited for parallel implementation as compared to BF.

Some heuristic search algorithms such as A* [16] and its variants (see [20, 19, 35, 15]) exist, which are faster than Dijkstra or BF, but they can only solve the single-source single-target shortest path problem.

The most general, but slowest, algorithm to solve the SSSP problem is the Floyd-Warshall algorithm [10, 39], which finds all-pairs shortest paths (APSP), i.e., all shortest paths from each vertex to all other vertices. Another way to solve the APSP problem is by using Johnson’s algorithm [18], which utilises Bellman-Ford and Dijkstra’s algorithms and is – under some conditions – faster than the Floyd-Warshall algorithm (essentially this is the case for sparse graphs).

Many different algorithms exist, which utilise artificial neural networks to solve shortest path problems. Some early approaches, which were dealing with relatively small graphs (below 100 nodes), were based on Hopfield networks [1, 28, 2] or Potts neurons and a mean field approach [14]. These approaches, however, may not always give optimal solutions or may fail to find solutions at all, especially for larger graphs.

Some other bio-inspired neural networks were proposed [12, 13, 8, 40, 32, 26, 22] for solving path planning problems. These approaches work on grid structures, where activity in the network is propagated from the source neuron to the neighbouring neurons until activity propagation within the whole network is finished. Shortest paths can then be reconstructed by following the activity gradients. The drawback of these approaches is, however, that they are specifically designed for grid structures and cannot be applied to general graphs with arbitrary costs.

Several deep learning approaches were proposed to solve shortest path problems, for example using a deep multi-layer perceptron (DMLP, [33]), fully convolutional networks [29, 3, 21], or a long short-term memory (LSTM) network. Some of these approaches also employ deep reinforcement learning [36, 27, 4]. These approaches are designed to solve path planning problems in 2D or 3D spaces and cannot deal with graphs with arbitrary costs.

In addition to this, deep learning approaches based on graph neural networks (GNNs) have been employed to solve path problems, too [34, 42, 38, 37, 43, 41], mostly in the context of relation and link predictions. While deep learning approaches may lead to a better run-time performance as compared to classical approaches due to fast inference, all deep learning based approaches need to learn their (many) synaptic weights before they are functional. This usually requires a large amount of training data. Another disadvantage of these approaches is that an optimal solution is not guaranteed, since networks intrinsically perform function approximation based on the training data. Moreover, in some cases networks may even fail to find a solution at all, especially when the training data is quite different from new cases (generalisation issue for out-of-distribution data).

2.2 Contribution

Different from the discussed deep learning approaches, we present here a novel neural network for solving the SSSP problem. To the best of our knowledge this is the first neural implementation of the Bellman-Ford algorithm, which finds optimal solutions for graphs with arbitrary positive and (some) negative costs. The advantage of this is that, different from BF, we can also directly apply neuronal learning rules that can dynamically change the activity in the network and lead to re-routing of paths according to some requirements. We show this by two cases where neuronal learning is used to structure the routing of the paths in a graph in different ways.

3 Methods

3.1 Bellman-Ford algorithm

The Bellman-Ford (BF) algorithm finds shortest paths in a weighted graph from a source node (vertex) to all other nodes [6, 11]. Shortest paths are defined in terms of minimal distances from the source node to all other nodes.

Input: list of graph edges E with costs C, source node s
Output: list of distances D, list of predecessor nodes P

for each node v do
      D(v) ← ∞                                   // Initialise all distances with infinity
      P(v) ← null                                // Initialise all predecessors with null
end for
D(s) ← 0                                         // Initialise distance of the source node with 0
for i = 1 to N − 1 do                            // Repeat N − 1 times (N: number of nodes)
     for each edge (u, v) with cost c(u, v) do   // For each edge perform relaxation
         if D(u) + c(u, v) < D(v) then
             D(v) ← D(u) + c(u, v)
             P(v) ← u
         end if
     end for
end for
return D, P                                      // Return list of distances and list of predecessors
Algorithm 1 Pseudo-code of the Bellman-Ford algorithm – version 1.

Input: list of graph edges E with costs C, source node s
Output: list of distances D, list of predecessor nodes P

for each node v do
      D(v) ← ∞                                   // Initialise all distances with infinity
      P(v) ← null                                // Initialise all predecessors with null
end for
D(s) ← 0                                         // Initialise distance of the source node with 0
for i = 1 to N − 1 do                            // Repeat N − 1 times (N: number of nodes)
     for each vertex v do                        // For each vertex perform relaxation
         for each incoming edge (u, v) of vertex v with cost c(u, v) do
              if D(u) + c(u, v) < D(v) then
                  D(v) ← D(u) + c(u, v)
                  P(v) ← u
              end if
         end for
     end for
end for
return D, P                                      // Return list of distances and list of predecessors
Algorithm 2 Pseudo-code of the Bellman-Ford algorithm – version 2.

There are two versions of the BF algorithm [5]. The pseudo-code of BF – version 1 and BF – version 2 is presented in Algorithm 1 and Algorithm 2, respectively. Both versions have the same basic algorithmic complexity (see analysis below), but BF – version 2 operates on graph nodes and not on graph edges, which allows implementing BF – version 2 in a totally asynchronous way (computations at each node are independent) [5]. (Footnote 1: In spite of this, BF – version 2 is largely ignored in the literature and users will usually be directed to BF – version 1 when doing a (web-)search.)

We denote a weighted graph as G(V, E, C) with vertices V, edges E and corresponding edge costs C. In the first version of the BF algorithm a so-called relaxation procedure (cost minimisation) is performed for all edges, whereas in the second version of the BF algorithm relaxation is performed for all nodes. The relaxation procedure is performed for a number of iterations until convergence, i.e., approximate distances are replaced with shorter distances until all distances finally converge.

The BF algorithm is guaranteed to converge if relaxation of all edges is performed N − 1 times, where N is the number of nodes (this corresponds to the worst case; see also the algorithmic complexity analysis below). After convergence, shortest paths (sequences of nodes) can be found from the list of predecessor nodes by going backwards from the target nodes to the source node.
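The edge-based procedure of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `bellman_ford` and `reconstruct_path`, the edge format `(u, v, cost)`, and the small example graph are assumptions made here for demonstration.

```python
import math

def bellman_ford(num_nodes, edges, source):
    """Edge-based Bellman-Ford (version 1).

    `edges` is a list of (u, v, cost) tuples for directed edges u -> v.
    Returns the list of distances and the list of predecessor nodes.
    """
    dist = [math.inf] * num_nodes
    pred = [None] * num_nodes
    dist[source] = 0.0
    # Relax every edge N - 1 times (worst case).
    for _ in range(num_nodes - 1):
        for u, v, cost in edges:
            if dist[u] + cost < dist[v]:
                dist[v] = dist[u] + cost
                pred[v] = u
    return dist, pred

def reconstruct_path(pred, source, target):
    """Follow predecessors backwards from target to source."""
    path = [target]
    while path[-1] != source:
        if pred[path[-1]] is None:
            return None  # target is not reachable from source
        path.append(pred[path[-1]])
    return path[::-1]

# Hypothetical example: four nodes, six edges, one negative cost.
example_edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2),
                 (1, 3, 1), (2, 3, 5), (3, 0, -1)]
dist, pred = bellman_ford(4, example_edges, source=0)
path_to_3 = reconstruct_path(pred, source=0, target=3)
```

The sketch assumes that the graph contains no negative-cost cycle; detecting such cycles would need one additional relaxation pass, which is omitted here.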

3.2 Neural implementation of the Bellman-Ford algorithm

In this section we describe our neural implementation of the Bellman-Ford (BF) algorithm. Similar to BF, the proposed neural network finds shortest paths in a weighted graph from a source node (vertex) to all other nodes; however, shortest paths here are defined in terms of maximal activations from the source neuron to all other neurons.

Input: list of neuron connections with weights W, source neuron s
Output: list of neuron activations (outputs) A, list of input nodes with maximal activity M

for each neuron j do
      A(j) ← 0                                   // Initialise all neuron activations with 0
      M(j) ← null                                // Initialise all maximal input nodes with null
end for
A(s) ← 1                                         // Initialise activation of the source neuron with 1
for k = 1 to N − 1 do                            // Repeat N − 1 times (N: number of neurons)
     for each neuron j do                        // For each neuron compute its activation
         for each input i of neuron j with weight w(i, j) do
              if A(i) · w(i, j) > A(j) then
                  A(j) ← A(i) · w(i, j)
                  M(j) ← i
              end if
         end for
     end for
      A(s) ← 1                                   // Set activation of the source neuron to 1
end for
return A, M                                      // Return list of activations and list of max inputs
Algorithm 3 Pseudo-code of the neural network algorithm.

The neural network has $N$ neurons, where $N$ corresponds to the number of vertices in a graph. Connections between neurons $i$ and $j$ ($i \neq j$) correspond to the edges, and connection weights $w_{ij}$ correspond to the (inverse) costs of edges $c_{ij}$, which are computed by

$$w_{ij} = 1 - \frac{c_{ij}}{K}, \tag{1}$$

where $K > 0$. In the Appendix we prove that the solutions obtained with the neural network are identical to the solutions obtained with BF if $K \to \infty$. However, for practical applications, even when using very large graphs, we have found that this also holds if $K$ is a sufficiently large finite number (see section 4.1.1).

Fig. 1: Example of solving a directed weighted graph with four vertices and six edges using (a) BF and (b) NN-BF. The green circle denotes the source vertex/neuron. Yellow numbers in (a) and (b) correspond to the edge and vertex numbers, respectively. Black numbers correspond to the edge/connection weights. Red numbers correspond to costs/activations from the source vertex/neuron for the respective edges/inputs. Numbers inside graph vertices/neurons correspond to costs/activations from the source to the respective nodes/neurons. Edge costs were converted to connection weights using the mapping given in the main text. Green numbers in (a) correspond to the numbers of edges and the order in which they were processed, whereas green numbers in (b) correspond to the numbers of neurons and the order in which they were processed.

We present the pseudo-code of the neural network algorithm (NN-BF) in Algorithm 3. Similar to BF – version 2 (Algorithm 2), we run NN-BF for a number of iterations until convergence, operating on the graph nodes (neurons). In every iteration we update the activation (output) of each neuron by

$$a_j = \max_i \left( a_i \, w_{ij} \right), \tag{2}$$

where $a_i$ correspond to the inputs of neuron $j$, and $w_{ij}$ to the connection weights between input neurons $i$ and output neuron $j$. To start this process, we set the activation of the source neuron to 1. This way, activity from the source neuron is propagated to all other neurons until activation has converged everywhere. After convergence, shortest paths (sequences of neurons) can be found from the list of maximal input activations by going backwards from the target neurons to the source neuron (similar to the BF algorithm).

The NN-BF algorithm is guaranteed to converge if the activity of the network is updated N − 1 times. This can be shown by the worst case scenario where all N neurons are connected in a sequence, i.e., 1 → 2 → … → N. To propagate activity from the first neuron to the last neuron in the sequence one would need to perform N − 1 updates.
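The max-product propagation described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: it assumes the cost-to-weight mapping `w = 1 - cost / K` with a large constant `K`, and the function name `nn_bf`, the connection format `(pre, post, cost)`, and the example graph are hypothetical.

```python
def nn_bf(num_neurons, connections, source, K=1e6):
    """Node-based max-product propagation (sketch of NN-BF).

    `connections` is a list of (pre, post, cost) tuples.
    Assumption: weights are obtained as w = 1 - cost / K.
    Returns activations and, for each neuron, its maximal input neuron.
    """
    # Group incoming connections per neuron and convert costs to weights.
    inputs = [[] for _ in range(num_neurons)]
    for pre, post, cost in connections:
        inputs[post].append((pre, 1.0 - cost / K))

    act = [0.0] * num_neurons
    max_in = [None] * num_neurons
    act[source] = 1.0
    for _ in range(num_neurons - 1):
        for j in range(num_neurons):
            for i, w in inputs[j]:
                if act[i] * w > act[j]:
                    act[j] = act[i] * w
                    max_in[j] = i
        act[source] = 1.0  # keep the source clamped to one
    return act, max_in

# Hypothetical example: four neurons, six connections, one negative cost.
example = [(0, 1, 4), (0, 2, 1), (2, 1, 2),
           (1, 3, 1), (2, 3, 5), (3, 0, -1)]
act, max_in = nn_bf(4, example, source=0)
# K * (1 - activation) approximates the additive path cost for large K.
approx_cost_to_3 = 1e6 * (1.0 - act[3])
```

In this sketch the list of maximal inputs plays the role of the predecessor list in BF, so paths are read out in the same backward fashion.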

A comparison of solving a simple graph using BF (version 1) and NN-BF is presented in Fig. 1. Here we used a simple directed weighted graph with four nodes and six edges with one negative cost and five positive costs. We show the resulting distances in (a) and network activations in (b) after each iteration (numbers inside the nodes/neurons of the graph/neural network). Red edges/connections denote distance/activity propagation from the source node/neuron to all other nodes/neurons. For more details please refer to the figure caption. In this case, the algorithm converged after two iterations when using BF and after one iteration when using NN-BF.

3.3 Performance evaluation of the neural planner

We evaluated the performance of the neural network and compared it against performance of the Bellman-Ford algorithm (version 1) with respect to 1) solution equality, 2) algorithmic complexity and 3) number of iterations until convergence.

For the numerical evaluation we generated random directed weighted graphs of different density, from sparse graphs (only few connections per node) to fully connected graphs (every node connects to all other nodes). To generate graphs of different density, we used a connection probability that defines whether two nodes will be connected or not. For a graph with a given number of nodes, this probability determines the connectivity rate and, thus, the expected number of edges per node and the number of edges in total.

Costs for edges were assigned randomly from specific ranges, where negative costs were assigned with a given probability and positive costs with the complementary probability. The specific parameters for each evaluation case will be provided in the results sections.
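A generator of this kind could look roughly like the sketch below. The function name `random_weighted_graph`, its parameters, and the default cost ranges are assumptions chosen for illustration; the concrete values used in the study are given in the results sections.

```python
import random

def random_weighted_graph(num_nodes, p_connect, pos_range=(1.0, 10.0),
                          neg_range=(-1.0, 0.0), p_negative=0.0, seed=None):
    """Random directed weighted graph as a list of (u, v, cost) edges.

    Each ordered pair of distinct nodes is connected with probability
    `p_connect`. A connected edge receives a cost from `neg_range` with
    probability `p_negative`, otherwise from `pos_range`.
    All parameter names and defaults are hypothetical.
    """
    rng = random.Random(seed)
    edges = []
    for u in range(num_nodes):
        for v in range(num_nodes):
            if u == v or rng.random() >= p_connect:
                continue
            low, high = neg_range if rng.random() < p_negative else pos_range
            edges.append((u, v, rng.uniform(low, high)))
    return edges

# A fully connected graph has num_nodes * (num_nodes - 1) directed edges.
dense_edges = random_weighted_graph(10, p_connect=1.0, seed=0)
```

Note that randomly placed negative costs can create negative-cost cycles, for which shortest paths are not defined; a practical generator would need to check for or avoid them.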

3.4 Combining neural planning network with neural plasticity

The neural implementation of the Bellman-Ford algorithm allows us to use neuronal learning rules in order to dynamically change connection weights between neurons in the network and, thus, the outcome of the neural planner. In this section we present a combination of the neural planner with Hebbian type synaptic plasticity ([17, 30, 31]) where we first present a learning scheme and rule, and then we present two learning scenarios, namely, navigation learning and sequence learning.

3.4.1 Learning scheme

As described above, the shortest path in the network between a source and a target neuron is given by that specific sequence of connected neurons, which leads to the largest activation at the target neuron. Hence, learning that modifies connection weights (and, hence, the resulting activations) can alter path-planning sequences.

Fig. 2: Schematic diagram of the learning architecture, showing the external input, the reward signal, the outputs of the pre-synaptic and post-synaptic neuron, and the synaptic weight between the pre- and post-synaptic neuron.

A diagram of the learning scheme is presented in Fig. 2, where we show components of the learning rule to change the synaptic weight between two neurons in the planning network. The outputs of pre-synaptic and post-synaptic neuron, and , are computed as described above using Eq.(2). Note that here neurons will also receive additional external input signaling occurrence of specific events, for instance, if an agent is at the specific location in an environment (otherwise it will be set to ). However, the network will only receive external inputs during learning but not during planning. The driving force of the learning process is the reward signal which can be positive or negative and will lead to weight increase (long-term potentiation [LTP]) or decrease (long-term depression [LTD]), respectively. Thus, the synaptic weight between pre- and post-synaptic neuron is changed according to


where is the learning rate. This learning rule represents a three-factor rule (see [23] for a discussion of three-factor learning), where here the third-factor is the reward signal . Note that in our learning scheme weights were bounded between and , i.e., .
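A three-factor update of this kind can be illustrated with a short sketch. The exact functional form is an assumption here: the sketch uses the common product form (learning rate × reward × pre-synaptic output × post-synaptic output) and assumes weight bounds of 0 and 1. The function name `three_factor_update` and its parameters are hypothetical.

```python
def three_factor_update(w, pre, post, reward, lr=0.1, w_min=0.0, w_max=1.0):
    """One reward-modulated Hebbian (three-factor) weight update.

    Assumed form: dw = lr * reward * pre * post, followed by clipping
    to [w_min, w_max]. Positive reward strengthens the synapse (LTP),
    negative reward weakens it (LTD).
    """
    w_new = w + lr * reward * pre * post
    return min(w_max, max(w_min, w_new))

# Both neurons active, positive reward: the weight increases.
w_ltp = three_factor_update(0.5, pre=1.0, post=1.0, reward=1.0)
# Both neurons active, negative reward: the weight decreases.
w_ltd = three_factor_update(0.5, pre=1.0, post=1.0, reward=-1.0)
# Inactive pre-synaptic neuron: no change regardless of reward.
w_same = three_factor_update(0.5, pre=0.0, post=1.0, reward=1.0)
```

Because the update is a product of all three factors, a weight only changes when both neurons are active and a non-zero reward is present.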

3.4.2 Navigation learning

In the first learning scenario, we considered a navigation task in 2D space, where an agent had to explore an unknown environment and plan the shortest path from a source location to a destination.

We assume a rectangular arena, see Fig. 6(a,b), iteration 0, with some locations (blue dots) and obstacles (grey boxes). The locations are represented by neurons in the planning network, whereas connections between neurons correspond to possible transitions from one location to other neighbouring locations. Note that some transitions between neighboring locations, and, thus, connections between neurons are not possible due to obstacles. The positions of locations and possible transitions between them were generated in the following way. The position of a location is defined by


where , and

is noise from a uniform distribution

. In this study, we used (), , , and .

Possible transitions between neighbouring locations, and, thus, possible connections between neurons were defined based on a distance threshold between locations, i.e., connections were only possible if the Euclidean distance between two locations . In one case (Fig. 6(a)) we used and in another case (Fig. 6(b)) we used .
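The distance-threshold rule for possible connections can be sketched as below. The function name `connect_by_distance` and the example coordinates are hypothetical, a strict "smaller than" comparison is assumed, and blocking of transitions by obstacles is not modelled in this sketch.

```python
import math

def connect_by_distance(positions, threshold):
    """Return directed connections (i, j) between all pairs of distinct
    locations whose Euclidean distance is below `threshold`."""
    connections = []
    for i, p in enumerate(positions):
        for j, q in enumerate(positions):
            if i != j and math.dist(p, q) < threshold:
                connections.append((i, j))
    return connections

# Three hypothetical locations on a line: only the first two are close enough.
locations = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
links = connect_by_distance(locations, threshold=1.5)
```

A larger threshold yields a denser planning network, which matches the use of two different threshold values for the two environments.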

Synaptic weights between neurons were updated after each transition (called iteration) from the current location (pre-synaptic neuron) to the next neighboring location (post-synaptic neuron) according to the learning rule as described above, where pre- and post-synaptic neurons obtain an external input during this transition (otherwise ). In the navigation scenario the reward signal was inversely proportional to the distance between neighbouring locations , i.e., (here we used ). Thus, smaller distances between locations lead to larger reward values and larger weight updates, and vice versa. In the navigation learning scenario, initially all weights were set to , and the learning rate was set to . The learning and planning processes were repeated many times, where the learning process was performed for 100 iterations and then a path from the source (bottom-left corner) to the destination location (top-right corner) was planned using current state of the network.

We used two navigation learning cases. In the first case, a static environment with multiple obstacles was used where obstacles were present during the whole experiment (see Fig. 6(a)), whereas in the second case a dynamic environment was simulated, where one big obstacle was used for some period of time and then was removed some time later (see Fig. 6(b)).

3.4.3 Sequence learning

In the second learning scenario, we were dealing with sequence learning, where the task was to learn to execute a predefined sequence of arbitrary events, which in our study were denoted by letters. For instance, this could correspond to learning to play a tune, where letters correspond to sounds.

In this case, we used a fully connected network (without self-connections) where neurons correspond to letters. In total we used six different events (letters; see Fig. 7(a)). Initially all weights were set to random values from a uniform distribution. Here, the planning and learning procedure is performed in epochs, where we execute the sequence based on the planner's outcome, always starting with a fixed first letter and ending with a fixed last letter, and perform weight updates for each pair in the planned sequence. Suppose that at the beginning the planner generates a sequence of three letters; then there will be two learning iterations in this epoch, i.e., one weight update for the transition from the first to the second letter and one for the transition from the second to the third letter. As in the navigation learning scenario, pre- and post-synaptic neurons will receive external input whenever a certain transition from one event to the next event happens, i.e., the neuron of the preceding event acts as the pre-synaptic neuron and the neuron of the following event as the post-synaptic neuron.

Different from the navigation scenario, here we used positive and negative rewards, i.e., a positive reward if a sequence-pair is correct and a negative reward otherwise. In the case of our previous example, this would lead to an increase of the synaptic weight between the neurons of the correct transition, and to a decrease of the synaptic weight between the neurons of the incorrect transition. The learning procedure is repeated for several learning epochs until convergence. Here we used a relatively high learning rate, which allows fast convergence to the correct sequence.
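The assignment of rewards to the pairs of a planned sequence can be sketched as follows. The reward values of +1 and -1, the function name `pair_rewards`, and the example letters are assumptions made for illustration only.

```python
def pair_rewards(planned, target, r_correct=1.0, r_wrong=-1.0):
    """Assign a reward to every consecutive pair of the planned sequence.

    A pair is considered correct if it also occurs as a consecutive
    pair in the target sequence; reward values are hypothetical.
    """
    correct_pairs = set(zip(target, target[1:]))
    return [((a, b), r_correct if (a, b) in correct_pairs else r_wrong)
            for a, b in zip(planned, planned[1:])]

# Hypothetical target sequence of six letters and a short planned sequence.
rewards = pair_rewards(planned="ABF", target="ABCDEF")
# The first pair (A, B) is correct, the second pair (B, F) is not.
```

Each returned reward would then drive one weight update for the corresponding pre- and post-synaptic neuron pair.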

We performed two sequence learning cases: in the first case we were learning a sequence consisting of all six possible events, and in the second case a shorter sequence had to be learnt.

4 Results

4.1 Comparison of BF and NN-BF

In the following we first provide a comparison between BF and NN-BF, and then we will show results for NN-BF also on network learning.

4.1.1 Solution equality

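To prove equivalence one can pairwise compare two paths A and B between some source and a target, where A is the best path under the Bellman-Ford condition and B is any other path. Path A is given by nodes with costs $a_i$ and B by nodes with costs $b_i$, $i = 1, \dots, n$. As described above, for a Bellman-Ford cost of $a_i$ we define the weight of the NN-BF connection as $1 - a_i/K$ and likewise for $b_i$. We need to show that:

$$\sum_{i=1}^{n} a_i < \sum_{i=1}^{n} b_i \;\Longleftrightarrow\; \prod_{i=1}^{n}\left(1 - \frac{a_i}{K}\right) > \prod_{i=1}^{n}\left(1 - \frac{b_i}{K}\right). \tag{5}$$

Note, in general we can assume for both sides a path length of $n$, because, if one path is shorter than the other, we can just extend it by some dummy nodes with zero cost such that the Bellman-Ford costs still add up correctly to the total.

First we analyse the simple case of $n = 2$. The general proof, below, makes use of the structural similarity to the simple case. Thus, we show:

$$\begin{aligned} a_1 + a_2 &< b_1 + b_2 \;\Longleftrightarrow \\ \left(1 - \frac{a_1}{K}\right)\left(1 - \frac{a_2}{K}\right) &> \left(1 - \frac{b_1}{K}\right)\left(1 - \frac{b_2}{K}\right). \end{aligned} \tag{6}$$

We start from the second line in Eq. 6 and rewrite it as:

$$\frac{1}{K^2}\left(K^2 - K(a_1 + a_2) + a_1 a_2\right) > \frac{1}{K^2}\left(K^2 - K(b_1 + b_2) + b_1 b_2\right). \tag{7}$$

We simplify and divide by $-K$, where $K > 0$, to:

$$a_1 + a_2 - \frac{a_1 a_2}{K} < b_1 + b_2 - \frac{b_1 b_2}{K}. \tag{8}$$

Now we let $K \to \infty$ and get:

$$a_1 + a_2 < b_1 + b_2, \tag{9}$$

which proves conjecture Eq. 6 for large enough $K$.

To generalize this we need to show that Eq. 5 holds. For this, we analyse the second inequality given by:

$$\prod_{i=1}^{n}\left(1 - \frac{a_i}{K}\right) > \prod_{i=1}^{n}\left(1 - \frac{b_i}{K}\right). \tag{10}$$

From the structure of Eq. 7 we can deduce that:

$$\prod_{i=1}^{n}\left(1 - \frac{a_i}{K}\right) = \frac{1}{K^n}\left(K^n - K^{n-1}\sum_{i} a_i + K^{n-2}\sum_{i<j} a_i a_j - \dots + (-1)^n \prod_{i} a_i\right), \tag{11}$$

and likewise for $b_i$, which allows writing out both sides of inequality Eq. 5. When doing this (not shown to save space), we see that the fore-factor $1/K^n$ can be eliminated. Then the term $K^n$ also subtracts away from both sides. After this we can divide the remaining inequality as in Eq. 8 above, here using $-K^{n-1}$ as divisor. This renders:

$$\sum_{i} a_i - \frac{1}{K}\sum_{i<j} a_i a_j + \dots \pm \frac{1}{K^{n-1}}\prod_{i} a_i \;<\; \sum_{i} b_i - \frac{1}{K}\sum_{i<j} b_i b_j + \dots \pm \frac{1}{K^{n-1}}\prod_{i} b_i, \tag{12}$$

using the correct individual signs for $\pm$. Now, for $K \to \infty$, the disturbance terms vanish on both sides and we get:

$$\sum_{i=1}^{n} a_i < \sum_{i=1}^{n} b_i, \tag{13}$$

as needed.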
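The role of the constant in this argument can be checked numerically on a small two-edge example. The sketch below assumes the cost-to-weight mapping `1 - cost / K`; the function names `path_activation` and `nn_prefers_first` and the chosen costs are hypothetical.

```python
import math

def path_activation(costs, K):
    """Activation reaching the end of a path: product of its weights,
    assuming the mapping w = 1 - cost / K."""
    return math.prod(1.0 - c / K for c in costs)

def nn_prefers_first(costs_a, costs_b, K):
    """True if the max-product rule ranks path A above path B."""
    return path_activation(costs_a, K) > path_activation(costs_b, K)

# Path A is additively cheaper (sum 6.0) than path B (sum 6.1),
# but the two sums are close, so a small K gives the wrong ranking.
path_a = [1.0, 5.0]
path_b = [3.0, 3.1]
agrees_small_K = nn_prefers_first(path_a, path_b, K=20.0)   # False
agrees_large_K = nn_prefers_first(path_a, path_b, K=100.0)  # True
```

For these particular costs the product inequality reduces to 0.1 K > 4.3, so the multiplicative and additive rankings agree exactly when K exceeds 43. This illustrates why paths with nearly equal summed costs require larger values of the constant.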

We conclude that for $K \to \infty$ the neural network is equivalent to the Bellman-Ford algorithm. The examples in the Appendix show that $K$ can indeed be unbounded, but in the following we will show by an extensive statistical evaluation that the distribution of $K$ for different realistic graph configurations is well-behaved and that a finite value of $K$ sufficed in all cases we tested.

Evidently $K$ can grow if the summed costs of paths A and B are almost equal. This can be seen easily, as in this case we can express the summed cost of B as the summed cost of A plus a small positive number. Inserting this into the inequality derived above yields a condition that can only be fulfilled with large values of $K$.

We stress the word "can" above, because a broader analysis shows that, even for similar sums, $K$ gets somewhat larger only in very few cases, as can be seen from Fig. 3.

For this we performed the following experiment. We defined two paths A and B with different lengths, where each length was taken from a fixed set of values and all combinations of the two lengths were used. We evaluated 60 instances for each path pair (in total 11760 path pairs). To obtain those, we generated the costs for all edges in both paths using Gaussian distributions, where mean and variance are chosen for each trial randomly and separately for A and B.

Fig. 3: Statistics for K computed from a large number of 2-path combinations, plotted against the contrast between the summed costs of the two paths. For sufficiently large K, NN-BF renders identical results as BF.

In Fig. 3, we plot on the horizontal axis the "contrast" between the summed costs of paths A and B. This allows comparing sums with quite different values. The vertical axis shows the value of K beyond which the neural network renders results identical to Bellman-Ford.

Note that we have truncated this plot vertically, but only a few large values of K had to be left out, all at small contrast. Hence, only for small contrast a few larger values of K are found. Note also that the asymmetry with respect to positive versus negative contrast is expected, because fewer negative costs exist in paths A and B due to our choice of the cost distributions.

Fig. 4: Analysis of K values for graphs of different sizes (500, 1,000, and 2,000 nodes), different densities (5, 10, 50 and 100%) and different cost ranges ([1, 10], [1, 100], and [1, 1000]). For some of the sparse cases (graph densities of 5 and 10%) with cost range [1, 1000], 10,000 random graphs were used, whereas in all other cases 1,000 graphs were used.

Considering only pairwise paths is of limited practical use. From a practical perspective it is better to consider different graphs with certain cost ranges and different connection densities, and to ask how likely it is that the results of BF and NN-BF are mismatched due to a possible divergence of K. To assess this, we calculated the statistics for three different graph sizes (500, 1,000, and 2,000 nodes) with randomly created costs taken from three different uniform cost distributions with intervals [1, 10], [1, 100], and [1, 1000]. We considered four different connectivity densities for each graph, with 5, 10, 50, and 100% connections. Fig. 4 shows the maximal K found; a finite K at this level suffices in all cases, and such a high value is only needed for sparse graphs and large costs.

4.1.2 Algorithmic Complexity of the Algorithms

In the following we will show that all three algorithms have the same algorithmic complexity.

Bellman-Ford (BF) – version 1 consists of two loops (see Algorithm 1), where the outer loop runs in $O(|V|)$ time and the inner loop runs in $O(|E|)$, where $|V|$ and $|E|$ denote the number of graph vertices (nodes) and edges (links), respectively. In the worst case, BF needs to be run for $|V| - 1$ iterations and in the best case only for one iteration. Thus, the worst-case algorithmic complexity of BF is $O(|V| \cdot |E|)$, whereas the best case is $O(|E|)$.

The algorithmic structure of the second version of the BF algorithm (see Algorithm 2) is the same as that of our NN-BF algorithm (Algorithm 3). Hence their complexity is the same and we provide the analysis for the NN-BF algorithm. NN-BF operates on neurons and their inputs and not on edges of the graph as in BF – version 1. Hence NN-BF is similar to BF – version 2, which operates on the nodes. The outer loop, as in BF – version 1, runs in $O(N)$, where $N$ is the number of neurons and corresponds to the number of vertices $|V|$. The second loop iterates through all neurons and the third loop iterates through all inputs of the respective neuron; they run in $O(N)$ and $O(m_j)$, respectively. Here $m_j$ corresponds to the number of inputs of a particular neuron $j$. Given the fact that $N = |V|$ and $\sum_j m_j = |E|$, we can show that the worst-case and the best-case algorithmic complexity of the NN-BF algorithm is the same as for BF – version 2, i.e., $O(|V| \cdot |E|)$ and $O(|E|)$, respectively. Hence, this is the same complexity as for BF – version 1, too.
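The counting argument behind this statement can be written out explicitly. The following is a sketch in notation assumed here ($N$ neurons, $m_j$ inputs of neuron $j$, $|V|$ vertices, $|E|$ edges), and it assumes that every neuron has at least one input so that the per-neuron loop overhead is absorbed into $O(|E|)$:

```latex
\begin{align*}
\text{cost of one NN-BF iteration}
  &= \sum_{j=1}^{N} O(m_j)
   = O\Big(\sum_{j=1}^{N} m_j\Big)
   = O(|E|), \\
\text{worst case } (N - 1 \text{ iterations})
  &: \quad O\big((|V| - 1)\,|E|\big) = O(|V| \cdot |E|), \\
\text{best case } (1 \text{ iteration})
  &: \quad O(|E|).
\end{align*}
```

The two inner loops therefore touch every connection exactly once per iteration, which is the same amount of work as one sweep over the edge list in BF – version 1.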

4.1.3 Convergence analysis

Fig. 5: Results of convergence analysis. Maximal number of iterations until convergence of Bellman-Ford – version 1 (BF) or NN-BF vs. number of edges . 500 randomly generated graphs with 5,000 nodes were analysed for each case.

The worst case scenario would be to run the algorithms for N − 1 iterations, where N is the number of graph nodes. However, in practice this will not be needed and significantly fewer iterations will usually suffice. Thus, we tested how many iterations are needed until cost convergence of the Bellman-Ford algorithm (version 1) as compared to activation convergence of the neural network. In both cases we stopped the respective algorithm as soon as its node costs or neuron activations were not changing anymore.
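The stopping criterion can be illustrated with a small sketch that counts how many sweeps over the edge list actually change a distance. The function name `sweeps_until_convergence` and the chain example are hypothetical; the sketch uses the edge-based BF variant.

```python
import math

def sweeps_until_convergence(num_nodes, edges, source):
    """Run edge-based Bellman-Ford and stop as soon as a full sweep
    changes no distance. Returns (number of changing sweeps, distances)."""
    dist = [math.inf] * num_nodes
    dist[source] = 0.0
    sweeps = 0
    for _ in range(num_nodes - 1):
        changed = False
        for u, v, cost in edges:
            if dist[u] + cost < dist[v]:
                dist[v] = dist[u] + cost
                changed = True
        if not changed:
            break  # distances are stable: converged
        sweeps += 1
    return sweeps, dist

# A chain 0 -> 1 -> 2 -> 3 with unit costs (hypothetical example).
chain = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
best_case, _ = sweeps_until_convergence(4, chain, source=0)         # edges in path order
worst_case, _ = sweeps_until_convergence(4, chain[::-1], source=0)  # edges in reverse order
```

With the edges listed in path order a single sweep settles all distances, whereas the reversed order needs N − 1 sweeps, which shows how strongly the iteration count depends on processing order and graph structure.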

For this analysis, we used 500 randomly generated graphs with 5,000 nodes, with positive costs only and several connectivity densities ranging from sparse graphs (approximately 12,500 edges) to fully connected graphs (24,995,000 edges).

The results are shown in Fig. 5, where we show the maximal number of iterations until convergence obtained from the 500 tested graphs. One can see that in the case of sparse graphs 16 iterations for BF and 15 iterations for NN-BF suffice for convergence, and for denser graphs only 3 iterations are needed. The resulting run-time per iteration on a sparse graph with 5,000 nodes and 12,500 edges is 21 and 59 for BF (version 1) and NN-BF, respectively. Run-time on a fully connected graph with 24,995,000 edges is 26 and 14 for BF and NN-BF, respectively (C++ CPU implementation on an Intel Core i9-9900 CPU, 3.10 GHz, 32.0 GB RAM). Thus, even relatively large graphs can be processed quickly.

4.2 Network learning and path planning

Note that the advantage of NN-BF is that it operates multiplicatively on activations. Hence, learning and path finding can rely on the same numerical representation. To achieve the same with BF (or any other additively operating algorithm) one would have to transform costs (for path finding) to activations (for network learning) back and forth. In the following we will show two examples of network learning combined with path finding, which shall serve as a proof of concept.

4.2.1 Navigation learning

Fig. 6: Results of navigation learning: (a) static environment and (b) dynamic environment. Blue dots correspond to locations represented by neurons, and black lines correspond to connections between neurons (paths between locations) where line thickness is proportional to the synaptic weight strength. Grey blocks denote obstacles. Green and red dots correspond to the start- and the end-point of the path marked by the red trajectory.

Results for navigation learning in a static environment are shown in Fig. 6(a), where we show the development of the network’s connectivity (black lines; line thickness is proportional to the weight strength) and the planned path (red trajectory) based on the network’s connectivity during the learning process. After 100 learning iterations, the environment was only partially explored. Therefore, the full path could not yet be planned. After 200 iterations, only a sub-optimal path was found, since the environment was still not completely explored. As the exploration and learning process proceeds, connections between different locations get stronger (note the thicker lines in the second row), and eventually the planned path converges to the shortest path (after 400 iterations).

In Fig. 6(b), we show results for navigation in a dynamic environment. Here, we first run the learning process until the path has converged (see the top row), and then remove the obstacle after 300 learning iterations. As expected, after obstacle removal, the middle part of the environment is explored and new connections between neurons are built. Eventually, as the learning proceeds, the output of the planning network converges to the shortest path (after 1200 iterations).

Due to the learning rule used, in both cases the system converges to the path with the shortest Euclidean distance.

Fig. 7: Results of sequence learning: (a) learning of the full sequence, and (b) learning of the partial sequence. Blue dots correspond to letters represented by neurons, and black lines correspond to connections between neurons where line thickness is proportional to the synaptic weight strength. Green and red dots correspond to the start- and end-point of the sequence marked by the red trajectory.

4.2.2 Sequence learning

Results for learning of the full sequence and of the partial sequence are presented in Fig. 7(a) and Fig. 7(b), respectively. As in the navigation learning example, we show the development of the network’s connectivity during the learning process. In case (a), we can see that before learning (epoch 0), due to random weight initialisation, the generated sequence is neither correct nor complete. However, after the first learning epoch, the weight of one connection was increased (note the thicker line as compared to epoch 0), whereas the weight of another connection was decreased, which then led to a different sequence. After epoch 3, a further connection weight was increased, since the corresponding transition in the previously generated sequence (epoch 2) was correct. Finally, after learning epoch 7 the network generates the correct sequence.

Similarly, when learning the partial sequence (see Fig. 7(b)), learning converges to the correct sequence already after epoch 5, since the sequence is shorter.
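The interplay of path generation and weight adaptation described above can be mimicked with a deliberately simplified sketch. It is not the learning rule used in the paper: the target sequence `"ABCDE"`, the learning rate `eta`, the weight range, and the update rule (strengthen a generated transition if it belongs to the target sequence, weaken it otherwise) are all assumptions made for illustration. The sequence is generated by max-product activity propagation from the first to the last letter.

```python
import random


def plan(weights, nodes, start, goal):
    """Max-product activity propagation; returns the best path start -> goal."""
    act = {n: 0.0 for n in nodes}
    pred = {n: None for n in nodes}
    act[start] = 1.0
    for _ in range(len(nodes) - 1):
        changed = False
        for (u, v), w in weights.items():
            if w * act[u] > act[v]:
                act[v], pred[v] = w * act[u], u
                changed = True
        if not changed:
            break
    path = [goal]
    while path[-1] != start:
        path.append(pred[path[-1]])
    return path[::-1]


def train(target, seed=0, eta=0.5, max_epochs=1000):
    """Adapt weights until the generated sequence equals the target sequence."""
    rng = random.Random(seed)
    nodes = sorted(set(target))
    # Fully connected network with random initial weights (hypothetical range).
    weights = {(u, v): rng.uniform(0.1, 0.9)
               for u in nodes for v in nodes if u != v}
    wanted = set(zip(target, target[1:]))  # transitions of the target sequence
    seq = []
    for epoch in range(max_epochs + 1):
        seq = plan(weights, nodes, target[0], target[-1])
        if seq == list(target):
            return weights, seq, epoch
        for edge in zip(seq, seq[1:]):
            if edge in wanted:             # correct transition: strengthen
                weights[edge] += eta * (1.0 - weights[edge])
            else:                          # incorrect transition: weaken
                weights[edge] *= 1.0 - eta
    return weights, seq, max_epochs


weights, seq, epochs = train("ABCDE")
print("".join(seq), "after", epochs, "epochs")
```

Because path generation and learning use the same weights, no conversion between costs and weights is needed between epochs. Weakened connections eventually drop out of the best path, so the generated sequence settles on the target.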

5 Conclusion

Finding the optimal path in a connected graph is a classical problem and, as discussed in the section on the state of the art, many different algorithms exist to address it. Hence, the question may arise why one should be interested in "yet another algorithm". The main reason for translating Bellman-Ford into a network algorithm is, in our view, the resulting possibility to directly apply network learning algorithms to a path finding problem. To do this with BF or another cost-based algorithm, one would have to switch back and forth between a cost-based and a weight-based representation of the problem. This is not needed for NN-BF. Given that BF and NN-BF have the same algorithmic complexity, the computational speed of the path-finding step is similar, too, and learning epochs can be built into NN-BF as needed. The examples shown above rely on a local learning rule that alters the weights only between directly connected nodes. Globally acting rules could, however, also be used to address different problems and, as far as we can see, there are no restrictions with respect to possible network learning mechanisms, because NN-BF is a straightforward architecture for activity propagation without any special limitations.

Hence, the NN-BF network algorithm allows for new and different applications where learned path-augmentation is directly coupled with path finding in a natural way.

6 Appendix

Fig. 3 above showed how complex the behavior of $K$ is, and three individual cases may be instructive to better appreciate this. Consider the following three cases of two paths A and B with three nodes each, where we define path costs by:

  1. Path A: 2, 2, 2
    Path B: 2, 2, 2+$\epsilon$

  2. Path A: 1, 3, 2
    Path B: 2, 2, 2+$\epsilon$

  3. Path A: 2, 2, 2
    Path B: 1, 3, 2+$\epsilon$

In all three cases the sums over A (and over B) are the same, namely $6$ and $6+\epsilon$, so that A is always the cheaper path. We are interested in what happens for small positive values of $\epsilon$. One can now calculate for Case 1 that any admissible $K$ (one that keeps all weights positive) will suffice independently of how small $\epsilon$ is. For Case 2, on the other hand, we get $K > 2 + 1/\epsilon$, while for Case 3 the $\epsilon$-dependence vanishes again, albeit with a more complex solution. For Case 3 we get $K > \frac{4\epsilon - 1 + \sqrt{4\epsilon^2 + 1}}{2\epsilon}$; for small $\epsilon$ one can ignore the $4\epsilon^2$ under the root and gets $K > 2$.

Hence, in spite of the fact that all cases have the same cost sums, only Case 2 requires large values of $K$.
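The three cases can be checked numerically with a short script. It assumes that a cost $c$ is mapped to a weight $w = 1 - c/K$ and that the activity of a path is the product of its weights. Both the transform and the names `K` and `eps` are assumptions made for this sketch. Exact rational arithmetic avoids rounding issues close to the threshold.

```python
from fractions import Fraction


def path_activity(costs, K):
    """Product of the assumed weights w = 1 - c / K along a path."""
    act = Fraction(1)
    for c in costs:
        act *= 1 - Fraction(c) / K
    return act


def prefers_A(path_a, path_b, K):
    """True if the cheaper path A ends up with the larger activity."""
    return path_activity(path_a, K) > path_activity(path_b, K)


eps = Fraction(1, 100)                     # small positive cost difference
cases = {
    1: ([2, 2, 2], [2, 2, 2 + eps]),
    2: ([1, 3, 2], [2, 2, 2 + eps]),
    3: ([2, 2, 2], [1, 3, 2 + eps]),
}

for K in (4, 101, 103):
    outcome = {k: prefers_A(a, b, Fraction(K)) for k, (a, b) in cases.items()}
    print(f"K = {K}: {outcome}")
```

With `eps = 1/100`, Case 2 selects the cheaper path only once `K` exceeds `2 + 1/eps = 102`, whereas Cases 1 and 3 already succeed for `K = 4`.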


The research leading to these results has received funding from the European Community’s H2020 Programme (Future and Emerging Technologies, FET) under grant agreement no. 899265 (ADOPD), and the Volkswagen Foundation (IDENTIFIED).


  • [1] M. K. M. Ali and F. Kamoun (1993) Neural networks for shortest path computation and routing in computer networks. IEEE Transactions on Neural Networks 4 (6), pp. 941–954. Cited by: §2.1.
  • [2] F. Araujo, B. Ribeiro, and L. Rodrigues (2001) A neural network for shortest path computation. IEEE Transactions on Neural Networks 12 (5), pp. 1067–1073. Cited by: §2.1.
  • [3] Y. Ariki and T. Narihira (2019) Fully convolutional search heuristic learning for rapid path planners. arXiv:1908.03343. Cited by: §2.1.
  • [4] A. Banino, C. Barry, B. Uria, C. Blundell, T. Lillicrap, P. Mirowski, A. Pritzel, M. J. Chadwick, T. Degris, J. Modayil, et al. (2018) Vector-based navigation using grid-like representations in artificial agents. Nature 557 (7705), pp. 429–433. Cited by: §2.1.
  • [5] J. S. Baras and G. Theodorakopoulos (2010) Path problems in networks. Synthesis Lectures on Communication Networks 3 (1), pp. 1–77. Cited by: §1, §3.1.
  • [6] R. Bellman (1958) On a routing problem. Quarterly of applied mathematics 16 (1), pp. 87–90. Cited by: §1, §2.1, §3.1.
  • [7] M. J. Bency, A. H. Qureshi, and M. C. Yip (2019) Neural path planning: Fixed time, near-optimal path generation via oracle imitation. CoRR abs/1904.11102. External Links: 1904.11102 Cited by: §2.1.
  • [8] N. Bin, C. Xiong, Z. Liming, and X. Wendong (2004) Recurrent neural network for robot path planning. In Int. Conf. on Parallel and Distributed Computing: Applications and Technologies, pp. 188–191. Cited by: §2.1.
  • [9] E. W. Dijkstra (1959) A note on two problems in connexion with graphs. Numerische Mathematik 1 (1), pp. 269–271. Cited by: §2.1.
  • [10] R. W. Floyd (1962) Algorithm 97: shortest path. Communications of the ACM 5 (6), pp. 345. Cited by: §2.1.
  • [11] L. R. Ford (1956) Network flow theory. Technical report Rand Corp Santa Monica Ca. Cited by: §1, §2.1, §3.1.
  • [12] R. Glasius, A. Komoda, and S. Gielen (1995) Neural network dynamics for path planning and obstacle avoidance. Neural Networks 8 (1), pp. 125–133. Cited by: §2.1.
  • [13] R. Glasius, A. Komoda, and S. Gielen (1996) A biologically inspired neural net for trajectory formation and obstacle avoidance. Biological Cybernetics 74 (6), pp. 511–520. Cited by: §2.1.
  • [14] J. Häkkinen, M. Lagerholm, C. Peterson, and B. Söderberg (1998) A Potts neuron approach to communication routing. Neural Computation 10 (6), pp. 1587–1599. Cited by: §2.1.
  • [15] D. D. Harabor and A. Grastien (2011) Online graph pruning for path finding on grid maps.. In AAAI, pp. 1114–1119. Cited by: §2.1.
  • [16] P. E. Hart, N. J. Nilsson, and B. Raphael (1968) A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Systems Science and Cybernetics 4 (2), pp. 100–107. Cited by: §2.1.
  • [17] D. O. Hebb (2005) The organization of behavior. New York: Wiley & Sons. Cited by: §1, §3.4.
  • [18] D. B. Johnson (1977) Efficient algorithms for shortest paths in sparse networks. Journal of the ACM (JACM) 24 (1), pp. 1–13. Cited by: §2.1.
  • [19] S. Koenig, M. Likhachev, and D. Furcy (2004) Lifelong planning A*. Artificial Intelligence 155 (1-2), pp. 93–146. Cited by: §2.1.
  • [20] R. E. Korf (1985) Depth-first iterative-deepening: An optimal admissible tree search. Artificial Intelligence 27 (1), pp. 97–109. Cited by: §2.1.
  • [21] T. Kulvicius, S. Herzog, T. Lüddecke, M. Tamosiunaite, and F. Wörgötter (2021) One-shot multi-path planning using fully convolutional networks in a comparison to other algorithms. Frontiers in Neurorobotics 14, pp. 115. Cited by: §2.1.
  • [22] T. Kulvicius, S. Herzog, M. Tamosiunaite, and F. Wörgötter (2021) Finding optimal paths using networks without learning–unifying classical approaches. IEEE Transactions on Neural Networks and Learning Systems (), pp. 1–11. Cited by: §2.1.
  • [23] L. Kuśmierz, T. Isomura, and T. Toyoizumi (2017) Learning with three factors: modulating hebbian plasticity with errors. Current opinion in neurobiology 46, pp. 170–177. Cited by: §1, §3.4.1.
  • [24] D. Merrill, M. Garland, and A. Grimshaw (2012) Scalable GPU graph traversal. Acm Sigplan Notices 47 (8), pp. 117–128. Cited by: §2.1.
  • [25] E. F. Moore (1959) The shortest path through a maze. In International Symposium on the Theory of Switching, pp. 285–292. Cited by: §2.1.
  • [26] J. Ni, L. Wu, P. Shi, and S. X. Yang (2017) A dynamic bioinspired neural network based real-time path planning method for autonomous underwater vehicles. Computational intelligence and neuroscience 2017. Cited by: §2.1.
  • [27] A. I. Panov, K. S. Yakovlev, and R. Suvorov (2018) Grid path planning with deep reinforcement learning: Preliminary results. Procedia computer science 123, pp. 347–353. Cited by: §2.1.
  • [28] D.-C. Park and S.-E. Choi (1998) A neural network based multi-destination routing algorithm for communication network. In 1998 IEEE International Joint Conference on Neural Networks, Vol. 2, pp. 1673–1678. Cited by: §2.1.
  • [29] N. Pérez-Higueras, F. Caballero, and L. Merino (2018) Learning human-aware path planning with fully convolutional networks. In 2018 IEEE Int. Conf. on Robotics and Automation (ICRA), pp. 1–6. Cited by: §2.1.
  • [30] B. Porr and F. Wörgötter (2003) Isotropic sequence order learning. Neural Computation 15 (4), pp. 831–864. Cited by: §3.4.
  • [31] B. Porr and F. Wörgötter (2007) Learning with “relevance”: using a third factor to stabilize hebbian learning. Neural computation 19 (10), pp. 2694–2719. Cited by: §1, §3.4.
  • [32] H. Qu, S. X. Yang, A. R. Willms, and Z. Yi (2009) Real-time robot path planning based on a modified pulse-coupled neural network model. IEEE Transactions on Neural Networks 20 (11), pp. 1724–1739. Cited by: §2.1.
  • [33] A. H. Qureshi, M. J. Bency, and M. C. Yip (2018) Motion planning networks. CoRR abs/1806.05767. External Links: 1806.05767 Cited by: §2.1.
  • [34] M. Schlichtkrull, T. N. Kipf, P. Bloem, R. Van Den Berg, I. Titov, and M. Welling (2018) Modeling relational data with graph convolutional networks. In European semantic web conference, pp. 593–607. Cited by: §2.1.
  • [35] X. Sun, S. Koenig, and W. Yeoh (2008) Generalized adaptive A*. In Proc. of the 7th Int. J. Conf. on Autonomous Agents and Multiagent Systems - Volume 1, AAMAS ’08, pp. 469–476. Cited by: §2.1.
  • [36] L. Tai, G. Paolo, and M. Liu (2017) Virtual-to-real deep reinforcement learning: Continuous control of mobile robots for mapless navigation. In 2017 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), pp. 31–36. Cited by: §2.1.
  • [37] K. Teru, E. Denis, and W. Hamilton (2020) Inductive relation prediction by subgraph reasoning. In International Conference on Machine Learning, pp. 9448–9457. Cited by: §2.1.
  • [38] P. Velickovic, R. Ying, M. Padovano, R. Hadsell, and C. Blundell (2020) Neural execution of graph algorithms. In International Conference on Learning Representations, pp. 1–14. Cited by: §2.1.
  • [39] S. Warshall (1962) A theorem on Boolean matrices. Journal of the ACM (JACM) 9 (1), pp. 11–12. Cited by: §2.1.
  • [40] S. X. Yang and M. Meng (2001) Neural network approaches to dynamic collision-free trajectory generation. IEEE Trans. on Systems, Man, and Cybernetics, Part B (Cybernetics) 31 (3), pp. 302–318. Cited by: §2.1.
  • [41] Y. Yang, T. Liu, Y. Wang, J. Zhou, Q. Gan, Z. Wei, Z. Zhang, Z. Huang, and D. Wipf (2021) Graph neural networks inspired by classical iterative algorithms. In International Conference on Machine Learning, pp. 1–11. Cited by: §2.1.
  • [42] M. Zhang and Y. Chen (2018) Link prediction based on graph neural networks. Advances in Neural Information Processing Systems 31, pp. 5165–5175. Cited by: §2.1.
  • [43] Z. Zhu, Z. Zhang, L. Xhonneux, and J. Tang (2021) Neural bellman-ford networks: a general graph neural network framework for link prediction. In Neural Information Processing Systems, pp. 1–15. Cited by: §2.1.