Spatial Positioning Token (SPToken) for Smart Mobility

by Roman Overko et al.

We introduce a distributed ledger technology (DLT) design for smart mobility applications. The objectives of the DLT are: (i) preserving the privacy of the individuals, including General Data Protection Regulation (GDPR) compliance; (ii) enabling individuals to retain ownership of their own data; (iii) enabling consumers and regulatory agencies alike to confirm the origin, veracity, and legal ownership of data, products and services; and (iv) securing such data sets from misuse by malevolent actors. As a use case of the proposed DLT, we present a blockchain-supported distributed reinforcement learning innovation to determine an unknown distribution of traffic patterns in a city.




I Introduction

Companies such as Facebook, Google, Amazon, Waze, and Garmin are just some examples of corporations that have built successful service delivery platforms using personalised data to develop recommender systems. While products gleaned from data mining of personal information have without doubt delivered great societal value, they have also given rise to a number of ethical questions that are causing a fundamental revision of how data is collected and managed. Some of the most pressing ethical issues include:

  • preservation of individuals’ privacy (including GDPR compliance);

  • the ability for individuals to retain ownership of their own data;

  • the ability for consumers and regulatory agencies alike to confirm the origin, veracity, and legal ownership of data, products and services;

  • protection of such data against misuse by malevolent actors.

It is in this context that distributed ledger technology (DLT) has much to offer. For example, it is well known that technologies such as blockchain can alleviate, or even eliminate, some of the above concerns. Consequently, our objective in this paper is to design one such system; we are particularly interested in developing a DLT that supports the design and realization of crowdsourced collaborative recommender systems for a range of mobility applications in smart cities. As will be seen, this objective is challenging for a number of reasons. First, from the perspective of the basic distributed ledger design, we are interested in a system able to support the high-frequency micro-transactions required for the rapid exchange of information between the multitude of IoT-enabled devices found in cities. Second, as the DLT must support multiple control actions and recommendations in real time, transaction times should be fast, with low or zero transaction fees. Finally, the DLT should penalize malevolent actors who attempt to spam the system, or who lie in order to attack any recommender system based on the DLT.

It should be noted that wrapping a DLT layer around personal information will fundamentally change the business model of many companies. Many corporations currently monetize recorded personal data with no explicit reward returned to the owner of that data (other than personalized recommendations or free access to products in return for the collected data). If such data is no longer available free of charge, existing business models will surely be jeopardized. In the future, most data will be privately held rather than publicly available, and companies seeking to develop services will need to purchase this data in order to sample an unknown density. In this context, a fundamental question is how to do this at minimum cost, as quickly as possible, given some desired level of accuracy (e.g., a minimum quality of service). Given this background, a fundamental requirement is a set of tools that enables such companies to sample these large data sets, secured in a distributed ledger, in an economic manner.

A second challenge arises from the design of recommender systems itself. In many important applications, the development of complex decision making tools is inhibited by difficulties in interpreting large-scale, aggregated data sets. This difficulty stems from the fact that data sets often represent closed-loop situations, where actions taken under the influence of decision support tools (i.e. recommenders), or even due to probing of the environment as a part of the model building, affect the environment and consequently the model building itself. Recently a number of papers have appeared highlighting the problem of recommender design in closed loop [Lazer2014, Sinha_NIPS17, Shorten_IEEETech16, Bottou, jonathan, roman]. Even in cases when there is a separation between the effect of a recommender and its environment, the problem of recommender design is complex in many real world settings due to the challenge of sampling and obtaining real time data at low cost.

In this paper we bring both of the above problems together in one framework. In particular, we consider the problem of sampling an unknown density representing traffic flow in a city, constituted by secured data points, using a DLT-type architecture, without perturbing the density through the probing action. Specifically, we will use reinforcement learning (RL) [RN229] to sample the density in order to build a model of the environment. While classical RL is usually not applicable for this purpose in many smart city applications, due to its long training time and the disruptive effects of probing, we shall demonstrate how the use of DLT allows us to achieve rapid probing actions without affecting the environment, while also enabling individuals not only to retain ownership of their own data but also to be rewarded for contributing to the RL algorithm.

II Related work

Our work brings together ideas from many areas. DLT is a term that describes blockchain and a suite of related technologies. From a high-level perspective, a DLT is nothing more than a ledger held in multiple places, together with a mechanism for agreeing on the contents of the ledger, namely the consensus mechanism. Since blockchain was first introduced in Nakamoto's white paper in 2008 [nakamoto2008bitcoin], the technology has been used primarily as an immutable record-keeping tool that enables financial transactions based on peer-to-peer trust [puthal2018everything, conoscenti2016blockchain, zheng2017overview, banerjee2018blockchain, yli2016current]. Architectures such as blockchain operate a competitive consensus mechanism enabled via mining (Proof-of-Work), whereas architectures such as the IOTA Tangle [wang2018survey], based on graph structures, often operate a cooperative consensus technique. In this work, we use the IOTA DLT. Our interest in IOTA stems from the fact that its architecture is designed to facilitate high-frequency microtrading: it places a low computational and energy burden on devices, it is highly scalable, there are no transaction fees, and transactions are pseudo-anonymous [popov2017equilibria]. In terms of mobility applications, we note that several DLT architectures have already been proposed; recent examples include [carnet, towards, bc] and the references therein. To the best of our knowledge, our work is the first to use a Directed Acyclic Graph (DAG) structure, namely the IOTA Tangle, to support distributed machine learning (ML) algorithms.

In terms of ML, we borrow heavily from RL and Markov Decision Processes (MDPs), and in particular from crowdsourced ML. The literature on MDPs and RL algorithms is vast, and we simply point the reader to the recent publications [jonathan, Epperlein2018, Krumm2008, SimmonsBrowningZhangEtAl2006], in which some of this work is discussed. With specific regard to RL and mobility, some applications are presented in [work15, work18, work17, work19, work16]. As in our previous work [roman], we exploit the idea of using crowdsourced behavioural experience to augment the training of ML algorithms (see [Vaughan2018] for a recent survey of this area). As also mentioned in [roman], our work has strong links to adaptive control [n1]. The idea of augmenting offline models with adaptation is discussed extensively in the recent multiple-models, switching, and tuning paradigm [n2].

Finally, it is worth mentioning that we are ultimately interested in the design of recommender systems that account for feedback effects in smart city applications. In [florian, bei, arieh], different information is sent to different agents in an attempt to mitigate closed-loop effects. An alternative, more formal, approach is presented in [jonathan]. There, the authors attempt the identification of a smart city system from closed-loop data sets. In particular, they present a Tikhonov regularization procedure for estimating the parameters of a closed-loop Markov-Modulated Markov Chain, which consists of two Markov chains: (i) a chain whose state is visible, and whose transition probabilities are modulated by (ii) a second Markov chain whose state is hidden and whose transition probabilities can in turn also be modulated. Similar issues have drawn interest from various domains, including economics [HalVarian_NASUS17], recommender systems [Cosley2003, Sinha_NIPS17], physiology [Gollee11], and control engineering in the context of smart cities [Shorten_IEEETech16].

III A distributed ledger for crowdsourced smart mobility - SPToken

III-A Design objectives

Our intent is to design a DLT-based system for crowdsourcing in a smart mobility environment. In particular, we explore how to apply this framework in an RL setting where a third party is interested in acquiring information from vehicles in order to solve an optimization problem.

The underlying idea is to use a set of virtual vouchers, or tokens, as a proxy to indicate specific points of interest that algorithms might wish to investigate. In RL algorithms, for example, we are interested in maximizing the expected reward (relative to an objective function) for taking a specific route across a city. To make this process clearer, consider the following example. Figure 1 shows an instance of a typical scenario where two junctions are connected to one another through a road segment. At some time, a vehicle updates the ledger with some information (e.g., pollution levels, travel time) and registers the last visited intersection. Intuitively, this can be depicted as if the vehicle left a token at that junction. A new vehicle passing through this junction and heading towards the next junction along the token route can then "collect" this token and, as it passes the next junction, update the ledger with new information regarding this road link and the new position of the token. It is noteworthy that a car "deposits" the token whenever it deviates from the token route. Additionally, any new car that passes through that junction and is headed along the token route will be able to collect the token, and the procedure is repeated for a new road segment.
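The collect/deposit rule in this example can be sketched in a few lines of Python. This is an illustrative model only; the Token fields, function names, and junction labels below are hypothetical and not part of the SPToken design:

```python
from dataclasses import dataclass, field

@dataclass
class Token:
    route: list            # sequence of junctions the token should traverse
    position: int = 0      # index of the junction where the token currently sits
    ledger: list = field(default_factory=list)  # (segment, data) records

def try_advance(token, vehicle_next_junction, data):
    """Move the token one hop if the vehicle follows the token's route.

    Returns True if the token advanced (and the ledger was updated), or
    False if the vehicle deviates, i.e. the token stays deposited.
    """
    if token.position + 1 >= len(token.route):
        return False  # token already at the end of its route
    expected = token.route[token.position + 1]
    if vehicle_next_junction != expected:
        return False  # vehicle deviates: token remains at the current junction
    segment = (token.route[token.position], expected)
    token.ledger.append((segment, data))  # record data for this road segment
    token.position += 1
    return True
```

For instance, a token on route A-B-C advances when a vehicle heads to B, but stays deposited when the vehicle turns off towards some other junction.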

The concept of using tokens to mark specific points where measurements are needed perfectly conforms with a DLT-based system. In fact, it is natural to use distributed ledger transactions to update the position of the tokens and to link them to the points of interest, and associated data, using transactions (this can be done, for example, using smart sensors at various junctions linked to digital wallets, as shown in Figure 1). Of course, the design of such a network poses a number of challenges that need to be addressed:

  • Privacy: In the DLT, transactions are pseudo-anonymous (this is due to the cryptographic nature of the addressing, which is less revealing than other forms of digital payment that are uniquely associated with an individual [2019arXiv190107302F]). Thus, from a privacy perspective, the use of DLT is desirable in a smart mobility scenario.

  • Ownership: Transactions in the DLT can be encrypted by the issuer, thus allowing every agent to maintain ownership of their own data. In the aforementioned setting, the only information required to remain public is the current ownership of the tokens.

  • Microtransactions: Due to the number of vehicles in a city environment, and the need to link information to real-time conditions (such as traffic or pollution levels), fast transactions and a large data throughput are required.

  • Resilience to Misuse: The system must be resilient to attacks and misuse by malevolent actors. Typical examples include double-spending attacks, spamming the system, and writing false information to the ledger. All of these can be greatly limited by the combined use of a consensus system based on Proof-of-Work (PoW) and Proof-of-Position (PoP), described in the next section.

(a) A vehicle passes through a junction where another car has recently issued some data (indicated by a token). This makes the agent eligible to write transactions to the ledger.
(b) The same vehicle passes through the next junction. It then writes some data, relative to the traversed road link, to the ledger and deposits the token so that another vehicle will be able to collect it.
Fig. 1: The sequence to issue new data from vehicles; the marker denotes a token.

To meet all the design objectives described above, in the next section we propose Spatial Positioning Token (SPToken), a permissioned distributed ledger based on the IOTA Tangle.

III-B The Tangle and the Proof of Position

As discussed above, we are interested in building on the Tangle, a particular DLT architecture that makes use of Directed Acyclic Graphs (DAGs) to achieve consensus about the shared ledger. A DAG is a finite connected directed graph with no directed cycles; in other words, in a DAG there is no directed path that connects a vertex with itself. The IOTA Tangle is a particular instance of a DAG-based DLT [popov2017equilibria], where each vertex or site represents a transaction, and where the graph, with its topology, represents the ledger. Whenever a new vertex is added to the Tangle, it must approve a number of previous transactions (normally two); an approval is represented by a new edge added to the graph. Furthermore, in order to prevent malicious users from spamming the network, the approval step requires a small PoW. This step is less computationally intensive than its blockchain counterpart [banerjee2018blockchain] and can easily be carried out by common IoT devices, but it still introduces some delay before new transactions are added to the Tangle. Refer to Figure 2 for an illustration of this process.

The Tangle architecture has the advantage over blockchain of allowing microtransactions without any fees (as miners are not needed to reach consensus over the network [2019arXiv190107302F]), which makes it ideal in the IoT setting described in the previous section. Moreover, the Tangle fits well with the concept of multiple tokens being transferred from one location to another, as its DAG structure makes it natural to describe such a process.
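The approval mechanism described above can be sketched as a toy data structure. This is a minimal illustration only: a real IOTA node performs weighted random tip selection and a PoW step, both of which are elided here, and tips are chosen deterministically:

```python
class Tangle:
    """Toy DAG ledger: each new site approves up to two unapproved tips."""

    def __init__(self):
        self.approvals = {0: []}   # site id -> list of approved site ids (0 = genesis)
        self.approved = set()      # sites that have at least one approver
        self.next_id = 1

    def tips(self):
        # sites not yet approved by any later transaction
        return [s for s in self.approvals if s not in self.approved]

    def add_transaction(self):
        # a real node would do (weighted) random tip selection plus PoW here
        chosen = self.tips()[:2]
        site = self.next_id
        self.next_id += 1
        self.approvals[site] = chosen   # new edges of the DAG
        self.approved.update(chosen)
        return site
```

Because every new site only references earlier sites, the graph is acyclic by construction, matching the DAG definition above.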

Unlike the Tangle, in which each user has complete freedom in how to update the ledger with transactions, the SPToken network has a regulatory policy to prevent agents from adding transactions that do not carry any relevant data (since transactions are encrypted). Therefore, as a further security measure, SPToken makes use of PoP to authenticate transactions. In other words, for a transaction to be authenticated, it has to carry proof that the agent was indeed in an area where a token was available. This is achieved through PoP via special nodes called Observers. Each observer is linked to a physical sensor in a city; a sensor can be a fixed piece of infrastructure or a vehicle whose position is verified. Whenever a car passes by an observer that is in possession of a token, a short-range connection is established (e.g., via Bluetooth) and the token is transferred to the vehicle's account. To deposit the token and issue a transaction containing data, the agent needs to pass by another observer and establish a short-range connection (note that not every observer is available to establish a connection at every moment). To prevent users from hoarding tokens, unused tokens are automatically returned after a certain period of time. See Figure 1 for an illustration of this process, which ensures that vehicles have to be physically present at the locations of interest in order to issue transactions. This further authentication step makes SPToken a permissioned Tangle (similar to permissioned blockchains [puthal2018everything]), i.e., a DAG-based distributed ledger where a certain number of trusted nodes (the observers, in this case) is responsible for maintaining the consistency of the ledger (as opposed to a public one, where security is handled by a cooperative consensus mechanism [popov2017equilibria]).
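The observer-mediated token handover might be modelled roughly as follows. The class and method names (Observer, claim, expire) are hypothetical, and the short-range handshake itself is abstracted away:

```python
class Observer:
    """Toy observer node holding tokens for nearby vehicles to claim."""

    def __init__(self, junction):
        self.junction = junction
        self.tokens = {}          # token_id -> time the token arrived here

    def issue(self, token_id, now):
        self.tokens[token_id] = now

    def claim(self, token_id, vehicle, now):
        """Hand the token to a vehicle in short range (the Proof-of-Position)."""
        if token_id not in self.tokens:
            return None           # token not here: no proof can be given
        del self.tokens[token_id]
        return {"token": token_id, "holder": vehicle, "issued": now}

def expire(holding, now, ttl, home_observer):
    """Auto-return an unused token to an observer after ttl time units."""
    if now - holding["issued"] > ttl:
        home_observer.issue(holding["token"], now)
        return True
    return False
```

The timeout in `expire` corresponds to the anti-hoarding rule above: a token that is never deposited eventually returns to the observer network.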

Furthermore, an additional PoW step can be introduced into the network to ensure that multiple vehicles (each with tokens) compete with each other to write to the ledger. In this context, instead of an observer issuing a single token, a number of virtual tokens are issued to appropriate vehicles. Once each of these vehicles completes a physical PoP step (for example, traversing a segment), they then compete to write to the ledger via PoW. While a full discussion of this is beyond the scope of the present paper, it is worth noting that this procedure would make it extremely expensive for dishonest actors to write biased data to the ledger (in a manner similar to the blockchain mining mechanism).
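As a rough illustration of the kind of PoW step mentioned here, the classic hash-puzzle scheme can be sketched as follows. The payload format and difficulty are invented for the example; this is not the SPToken or IOTA PoW specification:

```python
import hashlib

def proof_of_work(payload: bytes, difficulty: int = 3) -> int:
    """Find a nonce so sha256(payload || nonce) has `difficulty` leading zero hex digits."""
    nonce = 0
    target = "0" * difficulty
    while True:
        digest = hashlib.sha256(payload + nonce.to_bytes(8, "big")).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1

def verify(payload: bytes, nonce: int, difficulty: int = 3) -> bool:
    """Checking a proof costs one hash, while producing it costs many."""
    digest = hashlib.sha256(payload + nonce.to_bytes(8, "big")).hexdigest()
    return digest.startswith("0" * difficulty)
```

The asymmetry between producing and verifying a proof is what makes spamming or writing biased data expensive for a dishonest actor, while honest verification remains cheap.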

Fig. 2: Sequence to issue a new transaction. The blue sites represent the approved transactions and the red ones describe transactions that have not been approved yet. The black edges represent approvals, whereas the dashed ones represent transactions that are performing the PoW in order to approve two unapproved sites.

IV Application example - Reinforcement learning over SPToken

Our objective now is to implement an RL strategy using the token architecture described in the previous section. Specifically, instead of using vehicles as RL agents [roman] to probe an unknown density, we use tokens passing between vehicles to effectively create virtual agents that emulate the behaviour of agents designed to probe the environment. Formally, we employ a modified version of the recently proposed model-based MDP learning algorithm called Upper Bounding the Expected Next State Value (UBEV) [ubev]. UBEV combines backward induction with maximum likelihood estimation to (i) construct optimistic empirical estimates of state transition probabilities, (ii) assign empirical immediate rewards, and (iii) compute an optimal policy. In fact, our design of the action space allows us to avoid estimating the transition probabilities, which significantly reduces the training time. Effectively, the algorithm learns only the reward function that describes the environment (e.g., traffic patterns in a city).

Since long training times are a common disadvantage of RL algorithms, we propose launching independent tokens, which act as virtual vehicles and use the same policy to explore different areas of a city. Further details of the proposed approach, together with the corresponding experimental assessment, are provided in the following sections. In particular, we experimentally assess:

  • how fast the system learns to avoid traffic jams,

  • how quickly the system returns to the shortest path policy once the traffic jams clear up, and

  • how the training time depends on the number of independent tokens.

Essentially, the UBEV algorithm [ubev] performs a standard expectation-maximization trick. Namely, it first fixes the state transition probabilities of the MDP and the expected reward estimates, and uses backward induction to design the optimal deterministic policy in feedback form which maximizes the expected reward. Next, this policy is used to "probe" the environment, and the statistics collected over the course of probing are used to update the transition probabilities by employing a standard "frequentist" maximum likelihood estimator [HTF], which simply computes the frequencies of transitioning from one state to another subject to the current action (which can be a function of the current state). Then, the optimal policy (for the updated estimates of the transition probabilities and reward) is recomputed. This procedure is treated as an episode of the training process and is iterated until convergence (as demonstrated in [ubev]).
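The probe-and-update loop just described can be sketched schematically. This is not the full UBEV algorithm (the optimistic confidence bonuses are omitted); it shows only the backward-induction planning step and the frequentist transition estimate, with hypothetical array shapes:

```python
import numpy as np

def backward_induction(P, R, H):
    """P: (S, A, S) transition tensor, R: (S, A) rewards, H: horizon.

    Returns a deterministic time-dependent policy of shape (H, S).
    """
    S, A, _ = P.shape
    V = np.zeros(S)                      # value of the terminal step is zero
    policy = np.zeros((H, S), dtype=int)
    for t in reversed(range(H)):
        Q = R + P @ V                    # Q[s, a] = R[s, a] + sum_s' P[s, a, s'] V[s']
        policy[t] = Q.argmax(axis=1)     # greedy action per state
        V = Q.max(axis=1)
    return policy

def mle_transitions(counts):
    """Frequentist estimate from visit counts of shape (S, A, S).

    Rows with no observations fall back to a uniform distribution.
    """
    S, _, _ = counts.shape
    totals = counts.sum(axis=2, keepdims=True)
    return np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / S)
```

An episode then amounts to: plan with `backward_induction`, roll the policy out to collect counts, and refresh the model with `mle_transitions` before replanning.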

IV-A Modified UBEV algorithm

We now present the Modified UBEV (MUBEV) algorithm. Recall that an MDP is a discrete stochastic model defined by a tuple (A, S, P, R), where

  • A is the set of actions, and |A| is the number of actions,

  • S is the set of states, and |S| is the number of states,

  • P(s' | s, a) is the probability of transition from state s under action a to state s',

  • R(s, a) is the reward of choosing the action a in the state s.

The trajectory of the MDP is defined as follows: it is assumed that s_{t+1} ~ P(· | s_t, a_t), i.e., the state at time t+1 is drawn from a distribution P which depends on s_t and a_t. In this case, the expected reward associated to the policy π is defined in this fashion:

    V(π) = E[ Σ_{t=1}^{H} R(s_t, a_t) ],    (1)

where the initial state s_1 is drawn from the distribution of the initial state, and the action a_t is defined as follows:

    a_t = π_t(s_t).

The goal of the MDP is to maximize the expected reward (1), and the optimal MDP policy, i.e., the policy maximizing Equation (1), is calculated through the backward induction process given by:

    V_{H+1}(s) = 0,
    V_t(s) = max_{a ∈ A} [ R(s, a) + Σ_{s' ∈ S} P(s' | s, a) V_{t+1}(s') ],
    π_t(s) = argmax_{a ∈ A} [ R(s, a) + Σ_{s' ∈ S} P(s' | s, a) V_{t+1}(s') ].
We are now in a position to present the MUBEV algorithm. Algorithm 1 is a modified version of the UBEV algorithm, adapted for use in the context of our target problem; the modifications are as follows. First, we use a specific type of action space, namely the actions 'turn left', 'turn right', 'go straight', and 'stay in the same state'. This allows us to provide the algorithm with a set of predefined transition probabilities. Specifically, for each action, each row of the corresponding transition probability matrix is all zeros except for a single one at the location of the state representing the road link to which the agent jumps from the current state when that action is taken. For example, if the action is 'turn left', then for every road link (state) there is just one "utmost left" road link, and so the probability of transitioning to the corresponding state is 1. As a result, it is not required to learn the transition probabilities, which is a significant advantage, especially for large road networks. Second, at the beginning of training there is little or no information about the reward distribution, and the algorithm explores rather than exploits. For instance, it assigns the optimal policy "randomly": if all components of the Q-function are equal, the original algorithm always selects the first component; in other words, it probes the environment without any preference in terms of the direction of exploration. In contrast, we force it to stick to the shortest path policy in this case, so that it explores the surroundings along the shortest route and gathers the corresponding reward statistics along that route. Once it "faces" a traffic jam after a certain action, it gets delayed, which in turn introduces a negative reward for that action at that state.
As a result, the reward distribution changes, and the shortest path policy is amended to avoid the jam by looking for a detour. By operating in this fashion, we sample along near-optimal trajectories, which also has practical value. Third, we launch multiple participating tokens, always starting at different (randomly sampled) origins and having the same destination. All these tokens follow the same policy, and the corresponding statistics are then used to update the expected reward; consequently, learning and adaptation happen more rapidly. Finally, we propose a stationary model of the MDP with (i) the exchange of collected reward and statistics between agents (tokens), and (ii) the contribution of such data to the recommender system.
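The fixed transition model introduced by the first modification can be illustrated directly: with actions such as 'turn left' or 'stay', every road link has exactly one successor, so each row of the transition matrix is a one-hot vector. The toy successor map below is invented for illustration:

```python
import numpy as np

def build_transitions(successor):
    """Build one-hot transition matrices from a successor map.

    successor[a][s] gives the unique next road link when action a is
    taken on road link s, so nothing about P needs to be learned.
    """
    S = len(next(iter(successor.values())))
    P = {a: np.zeros((S, S)) for a in successor}
    for a, nxt in successor.items():
        for s, s_next in enumerate(nxt):
            P[a][s, s_next] = 1.0   # probability mass 1 on the unique successor
    return P
```

Every row sums to one by construction, and the 'stay' action yields the identity matrix, which is exactly the deterministic structure that lets MUBEV skip transition-probability estimation.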

Notation for MUBEV and the Reward Function. In Algorithm 1 we have: S is the set of states, where a state corresponds to a road link (edge) in a SUMO network; A is the set of actions; |S| and |A| denote the cardinality of the finite sets S and A, respectively; H is the length of the MDP's time horizon; P is an array of predefined transition probabilities; π₀ is the shortest path policy; n is the number of MUBEV tokens; δ is the failure probability (see [ubev] for details); N(s, a, t) is the number of times action a has been taken from state s at time t; R(s, a, t) is the accumulated reward from state s under action a at time t; V_t(s) is the value function from time step t for state s; Q_t(s, a) is the Q-function for the appropriate state, action, and time [ubev]. The initial values of the elements in the arrays N, R, and V are zeros for all states, actions, and times, and the failure tolerance is scaled appropriately. r_max is the maximum reward that the agent can receive per transition, and V_max is the maximum value for next states; where convenient, these quantities are interpreted as vectors of the appropriate length. φ is the width of the confidence bound [ubev]; e is Euler's number; the normalized reward from state s under action a at time t, together with a few auxiliary variables, completes the notation. The vector of initial states of the MUBEV tokens is uniformly sampled in the range from 1 to |S| with no repeated entries. The agents (tokens) interact with the environment at each time step t, and receive a reward determined by the reward function defined in Function 1.

Function 1: The Reward Function. The listing proceeds as follows. If the agent has moved to a new state, it computes the distance reward and the time reward (applying the edge coefficient where appropriate) and returns the total reward, i.e., the distance reward plus the time reward. Otherwise (the agent jumps to the same state), it returns a penalty, either for taking an impossible action or for leaving the destination.

Concerning Function 1, it returns the total reward, i.e., the distance reward plus the time reward, at each time step. Additionally, it makes use of: the actual travel time on the edge that corresponds to the current state; a scale factor that increases the minimum travel time on an edge due to traffic uncertainties; a parameter used for faster learning of congestion; the weights of the distance and time rewards; the absolute value of the penalty given to the agent if it takes an impossible action during the learning process or when it leaves the destination; the shortest route length from the current state to the destination state; and the length of the edge that corresponds to the current state. Finally, the duration of the yellow and red phases of the traffic light signal (TLS) that controls an edge (state) is taken into account; if an edge is not controlled by a TLS, we instead employ an edge coefficient for it, computed as follows: if the length of the edge that corresponds to the state is smaller than the average edge length, one value of the coefficient is applied, and otherwise another.
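As a loose sketch of this reward structure (the exact formulas in Function 1 are not reproduced here, so the shaping, argument names, and default weights below are illustrative only):

```python
def reward(d_prev, d_new, edge_len, t_min, t_actual,
           w_d=1.0, w_t=1.0):
    """Reward for a completed transition: weighted progress toward the
    destination plus a travel-time term that turns negative under delay."""
    r_dist = (d_prev - d_new) / max(edge_len, 1e-9)   # normalised progress
    r_time = t_min / max(t_actual, 1e-9) - 1.0        # 0 in free flow, < 0 when delayed
    return w_d * r_dist + w_t * r_time

def penalty_reward(impossible, leaving_destination, penalty=10.0):
    """Penalty branch when the agent stays in the same state."""
    if impossible or leaving_destination:
        return -penalty
    return 0.0
```

Under this shaping, a free-flow hop that fully closes one edge length of remaining distance earns the maximum reward, while a congested hop on the same edge earns strictly less, which is the mechanism that pushes the policy toward a detour.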

Algorithm 1: Modified Upper Bounding the Expected Next State Value (UBEV) Algorithm - MUBEV. Each episode of the listing consists of two parts: an optimistic planning loop, which sweeps backwards over the time horizon and over all states and actions to recompute the Q-function, the value function, and the policy; and the execution of the resulting policy for one episode, during which rewards are collected via the reward function (Function 1).

V Numerical simulations

In the following application, we are interested in designing a recommender system for a community of road users. We distribute a set of MUBEV tokens so that the uncertain environment can be ascertained. These tokens are passed from vehicle to vehicle using the DLT architecture described in Section III. Specifically, in what follows, tokens are passed from one vehicle to another in a manner that emulates a vehicle probing an unknown environment. The token passing is determined both by MUBEV and by the DLT, and can be orchestrated using a cloud-based service. Cars possessing a token are permitted to compete to write data to the DLT; we refer to such vehicles as virtual MUBEV vehicles. In this way, the token passing emulates the behaviour of a real agent (vehicle) probing the environment. Once the environment has been learnt, it is communicated to the community via some messaging service.

For the experimental evaluation of our proposed approach we designed a number of complex numerical experiments, based on traffic scenarios implemented with the open source traffic simulator SUMO [sumo]. Interaction with running simulations is achieved using Python scripts and the SUMO packages TraCI and Sumolib. The general setup used in our simulations is as follows:

  • In all our experiments, we make use of the area in Barcelona, Spain shown in Figure 3.

  • A number of roads are selected as origins, destinations, and sources of congestion. Experiment 1 uses the set {Origin 1, Congestion 1, Destination 1}, while Experiments 2 and 3 use {Origin 2, Congestion 2, Destination 2}.

  • In all simulations we use a new vehicle type based on the default SUMO vehicle type, with maxspeed=118.8 km/h and impatience=0.5. To generate traffic jams, we modify the maximum speed of certain cars to be 6.12 km/h and populate the selected roads with them. When these vehicles are in possession of a token, they become virtual MUBEV vehicles.

  • Whenever required, the shortest path is calculated with SUMO using the default routing algorithm (Dijkstra).

  • We refer to a token trip as an RL episode.

Fig. 3: Realistic road network used in the experiments: between Vila de Gràcia and El Camp d’en Grassot i Gràcia Nova in Barcelona, Spain.

Concerning the design parameters of the reward function and the MUBEV algorithm, in all our experiments we fixed several of the parameters and tuned the remaining design parameters. The specific setup for each individual experiment is described in the corresponding subsection below.

V-A Experiment 1: Optimal route estimation under uncertainty

The purpose of the first experiment is to evaluate the performance of our approach for the estimation of optimal routes under uncertainty; for this, we first remind the reader of the general operation of our approach. Over a given episode, a number of tokens follow the system recommendations; when these tokens reach their destination, another set of tokens takes over, and over every consecutive episode MUBEV updates its estimates using data from each virtual MUBEV car.

For the purpose of the present discussion, we use a single token per learning episode, meaning that, over each episode, data from that token is used to update the MUBEV policy. The MUBEV token has a fixed origin-destination (OD) pair given by {Origin 1, Destination 1}, and we select the road section labeled Congestion 1 (which belongs to the shortest path for the selected OD pair) to generate a traffic jam on it at different intervals (see Figure 3). Then, over each new episode, we start the token from Origin 1 and ask it to travel to Destination 1, keeping a record of its performance in terms of travel distance (route length) and travel time regardless of its success. Additionally, a token has a maximum number of allowed links (defined by the MDP's time horizon) that it can traverse; if it does not reach its destination within this restriction, the token trip is declared incomplete. The results for this experiment are shown in Figure 4, from which we can draw some important conclusions:
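The episode bookkeeping just described (a token travelling from a fixed origin toward a fixed destination, with a cap on traversed links and a record of travel time) can be sketched as follows; the network, the random exploration policy, and the horizon value are illustrative assumptions, not the actual MUBEV implementation:

```python
import random

def run_episode(graph, origin, destination, policy, horizon):
    """Drive one token from origin toward destination.

    policy(node, graph) returns the chosen (next_node, travel_time_s).
    Returns (completed, links_traversed, travel_time_s); the trip is
    declared incomplete if the horizon is exhausted first."""
    node, travel_time = origin, 0.0
    for step in range(horizon):
        if node == destination:
            return True, step, travel_time
        nxt, w = policy(node, graph)
        node, travel_time = nxt, travel_time + w
    return node == destination, horizon, travel_time

# Illustrative network: both routes from A reach D in two links.
toy = {"A": [("B", 60), ("C", 90)],
       "B": [("D", 60)],
       "C": [("D", 60)],
       "D": []}

random.seed(0)
def random_policy(node, graph):
    # Uniform-random exploration over outgoing links.
    return random.choice(graph[node])

ok, links, t = run_episode(toy, "A", "D", random_policy, horizon=5)
# ok is True and links == 2 on this toy graph (both routes have 2 links).
```

A learning agent would additionally turn the recorded travel times into rewards and update its policy between episodes; that part is omitted here.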

  • In general, the token succeeds both in avoiding the traffic jam once congestion is created and in returning to the shortest path once congestion is removed, within a reasonably small number of episodes (see Figure 4, bottom).

  • As time passes, more information (statistics) is collected from the environment in the form of rewards, and the token becomes more likely to fully complete a trip for the given OD pair (i.e., fewer red crosses appear as the experiment progresses in Figure 4).

Fig. 4: Experiment 1: Travel time and travel distance of a virtual MUBEV car (token) during the learning process on a changing environment, using a fixed OD pair.

These two observations validate our expectations about the UBEV-based routing system: (i) it adapts to uncertain environments, and (ii) its performance improves over time. This experiment is useful for analysing the performance of a single token learning from the environment with a fixed OD pair. Note that once the environment has been learned, the recommendations gleaned from it can be made available to the wider community of vehicles. We explore this in the following experiment.

V-B Experiment 2: Route recommendations from the UBEV-based system and speedup in learning

The previous experiment is a deliberately simple demonstration of the successful use of UBEV in a mobility context. We now explore a scenario where multiple tokens, starting from different origins, are used to update the MUBEV policy over each episode. Specifically, in the next experiment, we evaluate the performance of MUBEV as a function of the number of tokens per episode, subject to a uniform geographical distribution of origins and a common destination (namely, Destination 2). Additionally, we analyse the performance of a (non-MUBEV) car trying to reach Destination 2 from the fixed Origin 2 using recommendations from a simple UBEV-based routing system. In this case, the initial recommendation is the shortest path, and subsequent recommendations come from the MUBEV recommender system for the OD pair {Origin 2, Destination 2}. If a complete route cannot be computed by the MUBEV recommender system, the most recent valid recommendation is reused. The results for this experiment are depicted in Figures 5 and 6.
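The fallback rule described above (reuse the most recent valid recommendation whenever the recommender cannot produce a complete route) can be sketched as a thin caching wrapper; the recommender interface and the flaky stand-in below are assumptions for illustration only:

```python
def recommend_with_fallback(recommender, origin, destination, cache):
    """Query a route recommender; fall back to the last valid route.

    recommender(origin, destination) returns a route (list of nodes)
    or None when no complete route can be computed. `cache` stores the
    most recent valid recommendation per OD pair."""
    route = recommender(origin, destination)
    key = (origin, destination)
    if route is not None:
        cache[key] = route          # remember the latest valid route
        return route
    return cache.get(key)           # reuse the previous recommendation

# Illustrative recommender that succeeds once, then fails.
answers = iter([["A", "B", "D"], None])
def flaky_recommender(origin, destination):
    return next(answers)

cache = {}
first = recommend_with_fallback(flaky_recommender, "A", "D", cache)
second = recommend_with_fallback(flaky_recommender, "A", "D", cache)
# Both calls yield ["A", "B", "D"]: the second reuses the cached route.
```

The same pattern also covers the very first query of the experiment, where the cache is seeded with the shortest-path route.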

Fig. 5: Experiment 2: Average travel time and travel distance of a test car using route recommendations from a UBEV-based routing system and involving multiple MUBEV tokens. Each data point corresponds to the average of 10 different realizations of the experiment, and a moving average with window size 2 was later used to smooth the resulting signals.
Fig. 6: Experiment 2: Average learning speed using multiple MUBEV tokens. Each data point corresponds to the average of 10 different realizations of the experiment.

Figure 5 shows that the number of participating tokens directly affects the convergence rate of the algorithm: as expected, the more tokens are involved, the faster the learning process. Figure 6 shows the relationship between the number of participating MUBEV cars and the number of episodes required to learn a new traffic condition (either congestion or free traffic).
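The observed speedup is consistent with a simple back-of-envelope argument, which we sketch here under assumptions of our own (i.i.d. travel-time samples with a fixed noise level; this is not the paper's formal analysis): with n tokens contributing one sample per episode, the standard error of a mean travel-time estimate after E episodes scales as noise/sqrt(n*E), so the episodes needed for a target precision shrink roughly as 1/n.

```python
import math

def episodes_for_precision(n_tokens, noise_std=20.0, target_se=2.0):
    """Episodes needed so the standard error of a mean travel-time
    estimate drops below target_se, with n_tokens samples per episode.

    SE after E episodes = noise_std / sqrt(n_tokens * E), hence
    E = (noise_std / target_se)^2 / n_tokens (rounded up)."""
    return math.ceil((noise_std / target_se) ** 2 / n_tokens)

one_token = episodes_for_precision(1)     # 100 episodes
ten_tokens = episodes_for_precision(10)   # 10 episodes
```

This toy model predicts an inverse-proportional trend; the empirical curves in Figure 6 need not match it exactly, since tokens explore different routes and rewards are not i.i.d.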

V-C Experiment 3: Comparative analysis

Finally, the third experiment compares the performance of our UBEV-based approach against a reference solution: shortest-path (SP) routing. This reference is widely used by a variety of route recommenders, which makes the comparison reasonable. For this experiment, we use two test cars, one following recommendations from the MUBEV recommender system of Experiment 2 and the other always following the SP policy, both for the OD pair {Origin 2, Destination 2} and with congestion on road link Congestion 2 during a given interval. Results for this experiment are shown in Figure 7: the performance of our UBEV-based approach is similar to SP routing under free-traffic conditions, but it clearly outperforms SP (in terms of total travel time) once a traffic jam is introduced. A route different from the SP route implies a longer travel distance (as seen in Figure 7, bottom), but this is ultimately negligible for an end user as long as the resulting travel time is shorter than with SP.
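The trade-off just described (a detour costs extra distance but saves time once the SP route is congested) reduces to simple arithmetic, illustrated below with made-up link times and a made-up slowdown factor:

```python
def route_time(route, link_times, congested=frozenset(), jam_factor=10.0):
    """Total travel time of a route; congested links are slowed down
    by jam_factor (an illustrative multiplicative penalty)."""
    return sum(link_times[link] * (jam_factor if link in congested else 1.0)
               for link in route)

# Illustrative links: the SP route uses the link that gets congested;
# the adaptive route takes a slightly longer detour around it.
link_times = {"a": 60, "jam": 60, "b": 40, "c": 40}
sp_route = ["a", "jam"]
adaptive_route = ["a", "b", "c"]

free_sp = route_time(sp_route, link_times)                 # 120 s
free_adaptive = route_time(adaptive_route, link_times)     # 140 s
jam_sp = route_time(sp_route, link_times, {"jam"})         # 660 s
jam_adaptive = route_time(adaptive_route, link_times, {"jam"})  # 140 s
```

Under free traffic the SP route wins by a small margin, while under congestion the detour is several times faster, which is the qualitative pattern visible in Figure 7.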

Fig. 7: Experiment 3: Performance of two test cars, one using recommendations from the UBEV-based routing system, and the other one always using shortest path policy. Each data point corresponds to the average of 10 different realisations of the experiment, and a moving average with window size 10 was later used to smooth the resulting signals.

VI Conclusion and Outlook

We introduced a distributed ledger technology design for smart mobility applications. The objectives of the DLT are: (i) preserving the privacy of the individuals, including General Data Protection Regulation (GDPR) compliance; (ii) enabling individuals to retain ownership of their own data; (iii) enabling consumers and regulatory agencies alike to confirm the origin, veracity, and legal ownership of data, products and services; and (iv) securing such data sets from misuse by malevolent actors. As a use case of the proposed DLT, we presented a blockchain-supported distributed RL algorithm to determine an unknown distribution of traffic patterns in a city.


This work was partially supported by SFI grant 16/IA/4610.