1. Introduction
The rapid expansion of urban traffic, combined with the slow growth of traffic resources, has led to serious and growing traffic congestion. According to a Baidu traffic report (baidu), the commuting index has risen to 1.973 during rush hours in Beijing. Traffic congestion poses a great threat to traffic safety and also brings losses to the urban economy. Fortunately, previous works have shown that reasonable traffic scheduling can improve traffic efficiency with less consumption. In (wang2017data), a data-driven optimization algorithm for bus systems is proposed, which reduces the average waiting time of citizens. A bike-sharing scheduling system (ghosh2016robust) proposes an online and robust framework to minimize the loss of customers.
However, two disadvantages still restrict further improvement of efficiency. (1) Typical bike scheduling considers only a single planning step over a short horizon. For example, Liu et al. (liu2016rebalancing) propose a hierarchical optimization model for rebalancing by exploring multi-source data, but they plan only once and do not analyze the situation after the first planning. A common example is shown in Figure 1 (a). There are three bike-sharing stations (A, B and C) without available bikes. Assume that, during the first time period, 10 customers want to ride from A to B and 15 customers want to ride away from C, and that, during the second period, 10 customers want to ride away from B. Only 10 bikes can be dispatched before the first period starts. Under a greedy strategy, the 10 bikes are assigned to C, where the immediate demand is largest, and 10 customers are served. Taking the dynamic flow into account, however, the 10 bikes should be moved to A: they first serve the 10 customers riding to B and then, once at B, serve 10 more, so that 20 customers are finally served. Therefore, if secondary planning can be carried out, the local greedy problem can be alleviated. (2) Current works (ghosh2016robust; wang2017data; liu2016rebalancing; li2018dynamic) adopt only a single mode of transport and ignore the multimodal characteristics of urban public transport. For example, in Figure 1, (b) shows the normal operation of the bus, and (c) shows that when the bus is unavailable, the system can automatically move shared bikes to replace the bus.
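The single-planning limitation in point (1) can be checked with a toy calculation. The sketch below only illustrates the arithmetic of the Figure 1 (a) example under stated assumptions: the station labels `B`, `C`, `D` and `E`, the function name `customers_served`, and the rule that a bike ridden in one period becomes available at its destination in the next period are all hypothetical and not taken from the paper.

```python
def customers_served(start_station, bikes, periods):
    """Count customers served when all dispatched bikes start at one station.

    periods is a list of time periods; each period is a list of
    (origin, destination, demand) tuples. Bikes ridden during a period
    only become available at their destination in the next period
    (an assumption made for this illustration).
    """
    stock = {start_station: bikes}
    served = 0
    for trips in periods:
        arrivals = {}
        for origin, destination, demand in trips:
            rides = min(demand, stock.get(origin, 0))
            stock[origin] = stock.get(origin, 0) - rides
            arrivals[destination] = arrivals.get(destination, 0) + rides
            served += rides
        # Bikes that arrived in this period can be used from the next one.
        for station, count in arrivals.items():
            stock[station] = stock.get(station, 0) + count
    return served


# Hypothetical demand: 10 riders A->B and 15 riders C->D in the first
# period, then 10 riders B->E in the second period.
periods = [
    [("A", "B", 10), ("C", "D", 15)],
    [("B", "E", 10)],
]

greedy_total = customers_served("C", 10, periods)     # largest immediate demand
lookahead_total = customers_served("A", 10, periods)  # accounts for later flow
print(greedy_total, lookahead_total)  # prints: 10 20
```

Placing the bikes at the station with the largest immediate demand serves 10 customers, while placing them at A serves 20, which matches the figure's conclusion.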
Joint scheduling over long periods is usually difficult. Human activities are social and uncertain, which may result in an extreme imbalance between traffic supply and demand. This increases the difficulty of traffic scheduling in three ways: (1) The prediction of future demand should be as accurate as possible, as it directly affects the subsequent optimization. (2) Traditional scheduling is complex and cannot be applied to large-scale problems. Furthermore, traditional algorithms use mixed-integer programming (MIP) for the optimization, which is only suitable for obtaining solutions in specific settings. (3) Joint scheduling depends on manually set rules, which may produce greedy and short-sighted strategies.
In this paper, we investigate multimodal transportation carefully and find that different traffic modes can be scheduled complementarily to improve efficiency. The characteristics of different modes of transport differ. For example, bike-sharing scheduling is flexible and easily affected by other modes, which makes it suitable for scheduling jointly with them. Other scheduling, such as bus scheduling, is less affected by other modes and is thus relatively fixed. To consider the interaction between different traffic systems, we design a global scheduling method that schedules both types of traffic at the same time, so that flexible traffic can be dispatched in coordination with fixed traffic to achieve global optimality. Specifically, for the most common modes, bus and bike, we propose a joint traffic scheduling framework based on reinforcement learning, named JLRLS. JLRLS takes the bike flow into consideration, which helps to avoid local greed. It also incorporates the observation information of other traffic scheduling systems, so that the reinforcement learning model can learn a joint scheduling strategy. Compared with traditional traffic scheduling methods, it has the following advantages:

We adopt reinforcement learning to learn the scheduling strategy, which is robust to the inaccuracy of demand prediction and adaptable in complex scheduling situations.

Compared with other scheduling schemes, it can take into account longer-term traffic demand, avoid local greed and achieve optimal scheduling over a long period of time.

JLRLS can realize joint scheduling among different traffic modes. When a certain traffic service is temporarily unavailable or inappropriate, more flexible traffic can be dispatched in time to meet the corresponding demand. The framework is scalable and can be applied to joint dispatch between buses on different routes. In the future, it can also be applied to other joint scheduling scenarios.
2. Overview
In this section, we define some concepts and notations used in the paper and give an overview of the framework of our model, JLRLS.
Notation  Description  

The number of stations in the cluster  
The number of agents in system  
The th time segment in the future  
The number of predicted time segments  
The longest time the passengers are willing to wait
The th station  
The feature dimension of the station in other systems  
The feature dimension of the current system  
The observations for the stations in other systems  
The environmental factors of the current system  


The vector dimension after encoding 
2.1. Preliminary
Definition 1.
Agent: An agent is a bus in a bus system or a dispatching vehicle in a bike-sharing system.
Definition 2.
Cluster: Two types of cluster are defined for the two situations. For bus systems, a cluster is the set of bus stations sharing the same route. For bike-sharing systems, a cluster is a set of similar stations that are close to each other after the clustering described in Section 3.1.2.
Definition 3.
Demand: We define two types of demand. The first is the demand for riding bikes, or for taking buses from the origin station to the terminal. The second is the demand for returning bikes, or for taking buses from the terminal to the origin station.
Definition 4.
Time segment: A time segment is a period of time with fixed length, e.g., 15 minutes.
Definition 5.
Capacity: In bus systems, capacity is the number of passengers a bus can carry. In bike-sharing systems, it is the number of bikes a dispatching vehicle can carry.
Definition 6.
Episode: Episode is defined as a certain period in a day.
2.2. Framework
We propose a general framework, the Joint Long-term Reinforcement Learning Scheduling system (JLRLS). As shown in Figure 2, our model includes demand forecasting for stations and joint dispatching based on reinforcement learning. In bike-sharing scheduling, we incorporate the information of bus stations from the bus scheduling system, so that the reinforcement learning model can learn a joint traffic scheduling strategy.
3. Method
3.1. Forecast System
Since the characteristics of bus systems and bike-sharing systems differ considerably, we propose two prediction frameworks: a bus flow forecast system and a bike flow forecast system.
3.1.1. Bus Flow Forecast System
In the bus flow forecast system, there is a relatively stable hierarchical structure: the passenger flows of the individual bus stations add up to the passenger flow of the whole bus system. As passenger flow is regular and periodic within a day, the daily total passenger flow of the bus system is also stable. We find that bus passenger flow forecasting is similar to power system consumption forecasting. Inspired by this, we propose a bus flow prediction algorithm based on hierarchical time series.
Since the time series of daily passenger flow at a bus station exhibits strong regularity, and in order to reduce computational complexity, we use a linear model to learn the time series of the past period for each station and for the whole bus system, and predict the passenger flow in the near future. Because the individual forecasts for the stations are not guaranteed to sum to the forecast for the whole bus system, hierarchical time series forecasting uses a summing matrix to transform the whole problem into a regression problem to be optimized.
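To make the role of the summing matrix concrete, the following is a minimal sketch of forecast reconciliation for a two-level hierarchy (system total over stations). It assumes ordinary least squares as the regression, which the text does not specify; the function names `summing_matrix` and `reconcile` and the example numbers are hypothetical.

```python
def summing_matrix(n_stations):
    """Build S: the first row sums all stations, the rest is the identity."""
    total_row = [1.0] * n_stations
    identity = [[1.0 if i == j else 0.0 for j in range(n_stations)]
                for i in range(n_stations)]
    return [total_row] + identity


def reconcile(total_forecast, station_forecasts):
    """Least-squares reconciliation of independent base forecasts.

    Minimises (total - sum(b))^2 + sum((y_i - b_i)^2) over station flows b.
    For this two-level summing matrix the minimiser has a closed form:
    each station forecast is shifted by gap / (n + 1), where gap is the
    difference between the total forecast and the sum of station forecasts.
    """
    n = len(station_forecasts)
    gap = total_forecast - sum(station_forecasts)
    stations = [y + gap / (n + 1) for y in station_forecasts]
    S = summing_matrix(n)
    # Multiplying by S returns coherent forecasts: total first, then stations.
    return [sum(s * b for s, b in zip(row, stations)) for row in S]


# Base forecasts disagree: the stations sum to 90 but the total says 100.
coherent = reconcile(100.0, [30.0, 40.0, 20.0])
print(coherent)  # prints: [97.5, 32.5, 42.5, 22.5]
```

After reconciliation the station forecasts sum exactly to the system forecast, which is the consistency property the hierarchy requires.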
3.1.2. Bike Flow Forecast System
Modeling bike-sharing flow is a very complex problem, because the time series in this scenario does not have strong regularity. Unlike a bus system, which has fixed routes and stable users, a bike-sharing system has many users who only use shared bikes occasionally. Therefore, we cannot reuse the method of the bus flow forecast system. For bike-sharing flow, our prediction model needs to predict the future traffic flow between each pair of stations. As there are many bike-sharing stations, we apply the method of (li2018dynamic) to group the stations and simplify the problem. The stations in each group are close to each other, and the traffic between them is more frequent than with other stations. We only consider bike movements between the stations within each group.
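Within a group, this section later splits the predicted departures of a station over its possible destinations using historical frequencies. A minimal sketch of that split is given here, assuming past trips are available as (origin, destination) pairs; the function names, the station identifiers `s1` to `s3` and the numbers are hypothetical.

```python
from collections import Counter


def destination_probabilities(past_trips, origin):
    """Estimate P(destination | origin) from relative trip frequencies."""
    counts = Counter(dest for orig, dest in past_trips if orig == origin)
    total = sum(counts.values())
    if total == 0:
        return {}  # no history for this origin
    return {dest: n / total for dest, n in counts.items()}


def predicted_flows(predicted_departures, past_trips, origin):
    """Split the predicted departures of one station over its destinations."""
    probs = destination_probabilities(past_trips, origin)
    return {dest: predicted_departures * p for dest, p in probs.items()}


# Hypothetical history inside one group of three stations.
past_trips = [("s1", "s2")] * 3 + [("s1", "s3")] * 1 + [("s2", "s1")] * 2

# Suppose the departure model predicts 8 bikes leaving s1 next period.
flows = predicted_flows(8.0, past_trips, "s1")
print(flows)  # prints: {'s2': 6.0, 's3': 2.0}
```

Restricting the trips to one group keeps the number of origin-destination pairs small, which is the simplification the grouping is meant to provide.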
Compared with traditional linear prediction algorithms, a deep learning model such as Long Short-Term Memory (LSTM) (hochreiter1997long) is more suitable for modeling the unstable and nonlinear time series of a bike-sharing station. Thus, we use an LSTM to model the bike departures of each station in the near future. Specifically, let $x = (x_1, \dots, x_T)$ denote the time sequence of a bike station. The LSTM model maps the input sequence to outputs via a sequence of hidden states by computing the following standard equations recursively from $t = 1$ to $T$:

$$
\begin{aligned}
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i) \\
f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \\
o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o) \\
h_t &= o_t \odot \tanh(c_t) \\
y_t &= W_{hy} h_t + b_y
\end{aligned}
$$

where $x_t$, $h_t$ are the input and hidden vectors of the $t$-th time step, $i_t$, $f_t$, $c_t$, $o_t$ are the activation vectors of the input gate, forget gate, memory cell and output gate, $W_{\alpha\beta}$ is the weight matrix between vectors $\alpha$ and $\beta$ (e.g., $W_{xi}$ is the weight matrix from the input $x_t$ to the input gate $i_t$), $b_\alpha$ is the bias term of $\alpha$, $\sigma$ is the sigmoid function, and $y_t$ is the prediction for the time series $x$.

Considering the movement of bikes between stations, when a bike leaves station $i$, other stations in the same group may become its destination. We therefore use observed frequencies in place of probabilities: the probability that a bike leaving station $i$ goes to station $j$ is estimated from the trips of the past period. The number of bikes predicted to move from station $i$ to station $j$ in a future period is then the product of the total number of bikes predicted to leave station $i$ and the estimated probability of moving from $i$ to $j$.
3.2. Scheduling System
After forecasting the bus flow and bike flow, we use reinforcement learning to produce the scheduling strategy, adopting a Deep Deterministic Policy Gradient (DDPG) approach (lillicrap2015continuous). The model incorporates the bike flow information to avoid local greed. It also incorporates the observation information of other traffic scheduling systems, so that the reinforcement learning model can learn a joint traffic scheduling strategy.
3.2.1. The State of Scheduling System
To describe the state clearly, we summarize it in Figure 3. The state is divided into five categories: predicted demand, station information, agent states, scheduling information of other traffic systems, and the state of the system. The predicted demand consists of a matrix and three vectors. Each matrix entry is the predicted number of bikes moving from one station to another in a given future time segment, and one vector is the encoding of this matrix. The other two vectors give, for each station, the first and the second type of demand in that time segment. Two further quantities denote station information. In bus systems, they represent the time interval of the most recent bus going forward and backward at a station, respectively. In bike-sharing systems, they are the number of available bikes and the number of available docks at the station. The agent state contains a one-hot vector indicating the station at which the current agent will be located, the occupied capacity and the remaining capacity of the current agent, and the operation being performed. In bus systems, there are three types of operations, {-1, 0, 1}: the values -1 and 1 stand for driving in the two opposite directions between the origin station and the terminal, and 0 stands for halting. In bike-sharing systems, the operation is the number of bikes loaded or unloaded: a value > 0 stands for loading, a value < 0 for unloading, and a value = 0 for not moving any bikes. The corresponding quantities are also recorded for the other agents. The observation of other traffic systems can be the bike-sharing system's observation of the bus system, or an observation of bus systems on different routes. The state of the system represents environmental factors such as weather, temperature and distance.
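As a rough illustration of how such a state could be flattened into a single observation vector for a bike-sharing agent, consider the sketch below. It is not the paper's implementation: the function and argument names, the ordering of the parts and the example sizes are all hypothetical, and the states of other agents are left out for brevity.

```python
def one_hot(index, size):
    """Return a one-hot list with a 1.0 at the given index."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def build_bike_state(encoded_flow, pickup_demand, return_demand,
                     available_bikes, available_docks,
                     agent_station, load, capacity, last_operation,
                     other_system_obs, environment):
    """Concatenate the state categories into one flat list of floats."""
    n_stations = len(available_bikes)
    # Agent state: location, occupied capacity, remaining capacity, operation.
    agent = one_hot(agent_station, n_stations) + [
        float(load), float(capacity - load), float(last_operation)]
    return (list(encoded_flow)                       # encoded flow matrix
            + list(pickup_demand) + list(return_demand)   # predicted demand
            + list(available_bikes) + list(available_docks)  # station info
            + agent
            + list(other_system_obs)                 # e.g. bus observations
            + list(environment))                     # e.g. weather, temperature


# Hypothetical cluster with 3 stations and a 4-dimensional flow encoding.
state = build_bike_state(
    encoded_flow=[0.2, 0.1, 0.5, 0.3],
    pickup_demand=[4, 0, 2], return_demand=[1, 3, 0],
    available_bikes=[0, 5, 2], available_docks=[10, 5, 8],
    agent_station=1, load=6, capacity=20, last_operation=-3,
    other_system_obs=[12.0, 7.5], environment=[1.0, 18.0])
print(len(state))  # prints: 26
```

Keeping the encoded flow as a short vector, rather than the full station-by-station matrix, is what keeps the length of this observation manageable as the cluster grows.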
3.2.2. Bus Scheduling System
State. For the bus scheduling system, we define the state as follows:

Observation for bus stations.

The state of the scheduled bus.

The state of other buses on the same route.

Observation for the system.

Observation for stations in other traffic systems.
Action. A bus has three types of action:
(1) driving toward the terminal; (2) driving toward the origin station; (3) stopping at the origin station or the terminal.
Reward. We set the reward mechanism as follows: (1) each time the bus travels from a to b, the reward is the reduced waiting time, and the punishment is related to the driving time; (2) when the bus stops driving, there are no rewards or punishments.
Stopping condition. The run stops when a passenger has waited for p time segments or when an episode is completed.
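The bus reward and stopping rule described above can be sketched as follows. The linear form of the punishment and the weight `cost_weight` are assumptions made for illustration; the text only states that the punishment is related to the driving time.

```python
def bus_reward(is_driving, reduced_waiting_time, driving_time, cost_weight=0.1):
    """Reward for one bus step.

    A driving bus earns the waiting time it saves passengers, minus a
    penalty that grows with driving time (assumed linear here).
    A halted bus receives neither reward nor punishment.
    """
    if not is_driving:
        return 0.0
    return reduced_waiting_time - cost_weight * driving_time


def bus_should_stop(longest_wait, p, episode_done):
    """Stop when some passenger has waited p time segments or the episode ends."""
    return longest_wait >= p or episode_done


print(bus_reward(True, 12.0, 20.0, cost_weight=0.5))   # prints: 2.0
print(bus_reward(False, 12.0, 20.0))                   # prints: 0.0
print(bus_should_stop(longest_wait=4, p=4, episode_done=False))  # prints: True
```

Ending the run as soon as a passenger has waited p segments gives the agent a strong signal against leaving any station unserved for too long.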
3.2.3. Bike Scheduling System
State. For the bike-sharing scheduling system, the state consists of the following five parts:

Observation for bike-sharing stations.

State of the current dispatch vehicle.

State of other dispatch vehicles in the same cluster.

Observation for the system.

Observation for stations in other traffic systems, such as the bus system.
Different from (li2018dynamic), we consider more detailed information on the flow of bikes between stations. We use a matrix to represent the predicted future flow network of bikes between stations. Using this matrix directly would result in high complexity, so we encode it into a vector that retains the bike flow between stations. The vector comprehensively describes the bike flow network and simplifies the representation of the state, which is more conducive to the convergence of the strategy and the exploration of the agent. To enable the bike-sharing system to work with the bus system, we incorporate the observation information of the bus scheduling system into reinforcement learning.
Action. An action is defined as a pair: the first component denotes the station at which the current dispatch vehicle will load or unload bikes, and the second denotes the number of bikes loaded or unloaded.
Reward. After an episode is completed, we set the reward mechanism as follows: the agent is rewarded with the number of services provided by bikes, and the punishment is related to the cost of scheduling and the number of bikes exceeding the total capacity.
Stopping condition. The run stops when an episode is completed.
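A minimal sketch of such an episode-level reward is shown below. The linear combination and the weights `cost_weight` and `overflow_weight` are assumptions for illustration; the text only says that the punishment is related to the scheduling cost and to the number of bikes exceeding capacity.

```python
def bike_episode_reward(trips_served, scheduling_cost, bikes_over_capacity,
                        cost_weight=1.0, overflow_weight=1.0):
    """Episode reward for a bike dispatch agent.

    The agent earns one unit per trip served and is penalised for the
    cost of moving bikes and for bikes placed beyond station capacity.
    """
    penalty = (cost_weight * scheduling_cost
               + overflow_weight * bikes_over_capacity)
    return trips_served - penalty


print(bike_episode_reward(50, 8.0, 2))                       # prints: 40.0
print(bike_episode_reward(50, 8.0, 2, overflow_weight=5.0))  # prints: 32.0
```

Raising `overflow_weight` discourages the agent from unloading more bikes than a station has free docks for, at the price of serving fewer trips.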
Our model has the following advantages over (li2018dynamic):

The representation of the state contains more detailed flow information between stations in the future, which is more conducive to policy convergence.

We adopt a Deep Deterministic Policy Gradient (DDPG) approach. First, it is an actor-critic network, which combines the advantages of value-based and policy-based methods. Second, by using an LSTM inside the actor network, it can take the historical information of the state into account.

We consider the interactions between different traffic systems, so as to jointly dispatch different traffic systems and improve traffic efficiency.
4. Conclusions
To provide a better travel experience, we urgently need a joint scheduling system capable of scheduling multiple modes of transportation together. We therefore propose the above topic and give our solution. To complete this research successfully, we need more resources, including but not limited to the following:

Complete query records of Baidu map App.

The bicycle histories of Baidu partners.

The routes of buses and the passenger flow at different time.

Enough GPU resources.
Multimodal scheduling is an indispensable part of a smart city. The successful development of multimodal transportation scheduling could bring many advantages, such as reducing transport times, balancing traffic flows, reducing traffic congestion and, ultimately, improving the efficiency of intelligent transportation systems. Therefore, our research topic is valuable to the smart city project. We believe that, with these resources, we can develop a more comprehensive and efficient multimodal joint scheduling system.