Q-learning for distributed routing in LEO satellite constellations
End-to-end routing in Low Earth Orbit (LEO) satellite constellations (LSatCs) is a complex and dynamic problem. The topology is finite in size and changes over time, but predictably; the traffic from/to Earth and transiting the space segment is highly imbalanced; and the delay is dominated by the propagation time on non-congested routes and by the queueing time at Inter-Satellite Links (ISLs) on congested routes. Traditional routing algorithms rely on excessive communication with the ground or with other satellites, and oversimplify the characterization of the path links towards the destination. We model the problem as a multi-agent Partially Observable Markov Decision Process (POMDP) in which the nodes (i.e., the satellites) interact only with nearby nodes. We propose a distributed Q-learning solution that leverages the knowledge of the neighbours and the correlation of the routing decisions of each node. We compare our results to two centralized algorithms based on the shortest path: one that aims at using the links with the highest data rate, and a genie algorithm that knows the instantaneous queueing delays at all satellites. The results of our proposal are positive on every front: (1) it experiences delays that are comparable to the benchmarks in steady-state conditions; (2) it increases the traffic load that can be supported without congestion; and (3) it can be easily implemented in an LSatC, as it does not depend on the ground segment and minimizes the signaling overhead among satellites.
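To make the idea of distributed Q-learning with neighbour knowledge concrete, here is a minimal sketch in the spirit of classic tabular Q-routing. It is not the algorithm proposed in the paper: the four-satellite ring, the fixed 5 ms link delays, the learning rate, the exploration rate, and all names (`LINKS`, `SatelliteAgent`, `train`) are assumptions made for illustration, and queueing delay is ignored. Each agent keeps only its own table and, when forwarding a packet, asks the chosen neighbour for its best remaining-delay estimate.

```python
import random

# Hypothetical 4-satellite ring: node -> {neighbour: ISL propagation delay in ms}.
LINKS = {
    0: {1: 5.0, 3: 5.0},
    1: {0: 5.0, 2: 5.0},
    2: {1: 5.0, 3: 5.0},
    3: {2: 5.0, 0: 5.0},
}


class SatelliteAgent:
    """One satellite holding a local Q-table of estimated delays (assumed design)."""

    def __init__(self, node_id, neighbours, alpha=0.5):
        self.node_id = node_id
        self.neighbours = list(neighbours)
        self.alpha = alpha  # learning rate (assumed value)
        self.q = {}         # (destination, neighbour) -> estimated delay to destination

    def estimate(self, dest):
        """Best known remaining delay from this node to dest."""
        if dest == self.node_id:
            return 0.0
        return min(self.q.get((dest, n), 0.0) for n in self.neighbours)

    def choose(self, dest, epsilon, rng):
        """Epsilon-greedy next hop: lowest estimated delay, with random exploration."""
        if rng.random() < epsilon:
            return rng.choice(self.neighbours)
        return min(self.neighbours, key=lambda n: self.q.get((dest, n), 0.0))

    def update(self, dest, neighbour, link_delay, neighbour_estimate):
        """Move Q toward (delay of the hop + the neighbour's own best estimate)."""
        old = self.q.get((dest, neighbour), 0.0)
        target = link_delay + neighbour_estimate
        self.q[(dest, neighbour)] = old + self.alpha * (target - old)


def train(links, episodes=2000, epsilon=0.2, seed=0, max_hops=20):
    """Route random packets; each hop uses only information from the next-hop neighbour."""
    rng = random.Random(seed)
    agents = {n: SatelliteAgent(n, nbrs) for n, nbrs in links.items()}
    nodes = list(links)
    for _ in range(episodes):
        src, dest = rng.sample(nodes, 2)
        node, hops = src, 0
        while node != dest and hops < max_hops:
            nxt = agents[node].choose(dest, epsilon, rng)
            agents[node].update(dest, nxt, links[node][nxt], agents[nxt].estimate(dest))
            node, hops = nxt, hops + 1
    return agents
```

Calling `agents = train(LINKS)` and then `agents[0].choose(1, 0.0, random.Random(0))` returns the greedy next hop that satellite 0 has learned for destination 1. In a more realistic setting, the fixed link delay would be replaced by a measured propagation-plus-queueing delay, which is where the congestion-related effects described in the abstract would come into play.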