Fuzzy Q-Learning Based Multi-Agent System for Intelligent Traffic Control by a Game Theory Approach

This paper introduces a multi-agent approach that adjusts traffic lights according to the traffic situation in order to reduce the average delay time. In the traffic model, the lights of each intersection are controlled by an autonomous agent. Since the decision of each agent affects its neighbors, this approach creates a classical non-stationary environment. Thus, each agent not only needs to learn from past experience but also has to consider the decisions of its neighbors to cope with dynamic changes in the traffic network. Fuzzy Q-learning and Game theory are employed to form a policy based on previous experience and the decisions of neighbor agents. Simulation results illustrate the advantage of the proposed method over fixed-time, fuzzy, Q-learning, and fuzzy Q-learning control methods.




I Introduction

Urbanization, the increasing number of vehicles, and a lack of transport infrastructure have increased travel time, fuel consumption, and air pollution. As a result, urban life has become synonymous with wasted time, poorer air quality, and noise pollution. Conventional fixed-time traffic management systems cannot cope with the complexity and dynamics of large traffic networks, whereas artificial intelligence (AI) is widely employed to develop intelligent traffic systems (ITS) Kponyo et al (2016); Balaji et al (2010); Bazzan and Klügl (2014); Rida (2014). A multi-agent system is one approach to modeling ITS Roess et al (2004); Vilarinho et al (2017). This framework consists of a population of intelligent, autonomous agents that work together in an environment Schaefer et al (2016). Traffic lights Liu (2007), vehicles Adler et al (2005), and pedestrians Teknomo (2006) have been considered as agents in models of urban traffic networks. Each agent needs to learn from past experience, which is key to approximating a better decision-making policy. Both model-based Wiering (2000) and model-free Chin et al (2011) multi-agent reinforcement learning (RL) techniques are widely used in ITS research Prashanth and Bhatnagar (2011); Balaji et al (2010).

In many studies, each agent considers only its own traffic state when determining its control policy. For example, a single intersection with two phases is investigated in Abdulhai et al (2003). The length of the vehicle queue waiting at the light, which can be measured by the agent, is taken as the state. The agent decides whether to extend the green time or switch to the next phase so that the number of vehicles waiting at the light is minimized. The results show the superiority of the Q-learning agent under both uniform and constant-ratio traffic flows. In Wiering (2000), traffic lights are considered as agents that communicate with vehicles. The vehicles estimate their mean waiting time and transmit it to the traffic light, where Q-learning, a popular RL algorithm, is used to schedule the traffic signal. This study reports a 22% reduction in waiting time compared to fixed-time lights. Multi-objective reinforcement learning is used to control several traffic lights in Houli et al (2010). The optimization goals include the number of stops of a vehicle, the mean stopping time, and the queue length at the next intersection. The results indicate that multi-objective RL can effectively prevent queue spillover under congested conditions and thus avoid large-scale traffic jams. Bull et al (2004) used learning classifier systems to control the traffic lights of a network with 4 intersections. In this research, each intersection has a two-phase light, with one phase for north-south movement and one for east-west movement. The controller at each intersection obtains the optimum phase time by extracting if-then rules. The results show that traffic lights controlled by the learning classifier system perform significantly better than fixed-time lights. In Steingrover et al (2005), the learning problem is modeled so that the state is based on the sum of the waiting times of the cars. Naturally, the more car information is received, the more complicated the model and the larger the state space become; this is one of the significant problems of large networks. The adaptive control introduced in Prashanth and Bhatnagar (2011) uses function approximation to map states to schedules. A fuzzy inference engine is exploited to reduce the systematic errors of the Q-learning algorithm in Pacheco and Rossetti (2010). The results demonstrate that learning in the fuzzy framework is not only faster than Q-learning but also considerably decreases the delay at intersections. A multi-agent fuzzy approach is proposed in Iyer et al (2016), where Q-learning updates the rule base of the fuzzy inference engine. In Da Silva et al (2006), a method capable of estimating a partial model of a non-stationary environment is described. This method is applied to a network of 9 intersections. The reported results show that it performs better than both model-free and model-based methods, but it could not be generalized to larger networks.

In other studies, agents take other agents into account when determining their own control policy. For instance, coordination among agents is sought in Medina et al (2010), where each agent considers not only the number of vehicles waiting at its own intersection but also the number of vehicles stopped at adjacent intersections. RL is applied to 5 intersections under three different scenarios, and the overall results show an improvement in delay time. In Wiering (2000), RL is used to control traffic in a grid, where a form of cooperative learning simultaneously controls the traffic signals and determines optimal routes. One of the main drawbacks of this method is the high cost of communication and information exchange, especially as the number of intersections in the network increases. In Salkham et al (2008), cooperative RL tries to extract knowledge from neighbor agents for schedule learning. This method is implemented in an area of Dublin with 64 intersections.

This paper introduces a hybrid fuzzy Q-learning and Game theory method for controlling traffic lights in a multi-agent framework. It exploits the benefits of fuzzification as well as interaction with other agents. The traffic network is modeled so that each intersection is controlled by an autonomous agent that decides on the duration of the green phase. The number of vehicles on the different inputs of the intersection is measured by the corresponding agent. Each agent interacts with its neighbor agents by receiving a reward for each decision. In the proposed method, each agent fuzzifies its inputs and uses them in a fuzzy inference system for a fuzzy estimation of the traffic model states. The agent uses a Q-learning approach modified by Game theory to learn from past experience while considering its interaction with neighbor agents. To update its Q-values, the agent receives a reward proportional to its own traffic state and a reward from each neighbor agent for each decision. Both the neighbor reward and its weighting in the Q-value update are computed by fuzzy inference. The proposed method is applied to a five-intersection traffic network. The simulation results indicate that the proposed method outperforms fixed-time, fuzzy, Q-learning, and fuzzy Q-learning control methods in terms of average delay time.

This paper unfolds as follows. After this introduction, Q-learning and its fuzzy version are described in Section II. Section III is devoted to the application of Game theory in ITS. Sections IV and V present the problem statement and the proposed solution, respectively. Simulation results are given in Section VI. Finally, the paper is concluded in Section VII.

II Q-learning and fuzzy Q-learning

The objective of agents acting in dynamic environments is to make optimal decisions. If the agents are not aware of the rewards corresponding to various actions, selecting a proper action is challenging. To achieve this goal, learning adjusts the agents' action selection based on collected data. In reinforcement learning (RL), each agent tries to optimize its actions in a dynamic environment via trial and error. RL is essentially about mapping situations to actions so as to obtain the best results, i.e., the highest reward. In many cases, an action influences not only the reward of its own step but also the rewards of subsequent steps. There are model-based Wiering (2000) as well as model-free Chin et al (2011) RL techniques. In model-free RL, the agent does not need an explicit model of the environment, because its actions can be selected directly based on rewards. Q-learning is a model-free approach in which the agent has no access to the transition model Watkins and Dayan (1992); Abdoos et al (2011). Suppose that the agent is in state $s$ and performs action $a$, after which it receives reward $r$ from the environment and the environment changes to state $s'$. This experience is given by a tuple of the form $\langle s, a, r, s' \rangle$. The state-action value, which represents the expected total reward resulting from taking action $a$ in state $s$, is denoted by the Q-value $Q(s,a)$. The agent starts with random values and, after each action, receives a tuple $\langle s, a, r, s' \rangle$. For each tuple, the state-action value is updated according to the following equation:

$$Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha\left[r + \gamma \max_{a'} Q(s',a')\right] \qquad (1)$$

where $\alpha \in [0,1]$ is the learning rate of the agent. $\alpha = 1$ means that only the new information is considered, and $\alpha = 0$ means that the agent does not learn at all. $\gamma \in [0,1]$ is the discount factor, which determines the importance of future rewards. A zero value for this factor makes the agent opportunistic, meaning that it considers only the current reward. On the other hand, a value of $\gamma$ close to 1 means that the agent is willing to wait longer to achieve a large reward. Q-learning converges to the optimal values with probability one if all state-action pairs are experienced repeatedly and the learning rate decreases over time Pacheco and Rossetti (2010).

Generally, RL is useful for solving problems with low-dimensional, discrete state and action spaces. When the dimension of the state and action spaces grows, the lookup table becomes so large that the algorithm becomes very slow. Moreover, when the states or actions are continuous, a lookup table cannot be used at all. Fuzzy theory is employed to tackle this problem. If the intelligent agent has a proper fuzzy set as expert knowledge about the domain, the ambiguity can be resolved, and the agent can interpret vague objectives and an unknown environment. In practice, acting in large spaces is made easier by dispensing with the Q-value table: everything is based on quality values and fuzzy inference. The fuzzy inference system (FIS) processes the input, and the Q-learning algorithm operates on the consequent part of the rules, using the activated rules as states. The reinforcement signal of the Q-learning algorithm is built from the fuzzy logic, the environment reward signal, and a performance estimate of the current action, and the action that maximizes this signal is selected Glowaty (2005); Bonarini et al (2009). For each rule, the learning system selects one action among $J$ candidate actions. The $j$-th possible action in the $i$-th rule is denoted by $a[i,j]$ and its value by $q[i,j]$. Consider the following rules Bonarini et al (2009):

$$R_i:\ \text{if } x \text{ is } S_i \text{ then } y = a[i,1] \text{ with } q[i,1] \text{ or } \dots \text{ or } y = a[i,J] \text{ with } q[i,J] \qquad (2)$$

Learning should find the best action for each rule. If the agent selects actions that result in high values, it may learn the optimal policy. Thus, the fuzzy inference system can obtain the necessary action for each rule Bonarini et al (2009).
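The rule-based scheme of Eq. 1 and Eq. 2 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the exact algorithm of Bonarini et al (2009) or of this paper: a single scalar input $x$, triangular antecedent sets $S_i$, one shared list of candidate actions $a[i,j]$ for every rule, epsilon-greedy choice of one action per rule, a firing-strength-weighted blend of the chosen actions as the output, and a temporal-difference update in the spirit of Eq. 1 that is shared among rules by firing strength (a common fuzzy Q-learning variant). The class name `FuzzyQ`, the helper `tri`, and the toy reward in the demo are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Tri = Tuple[float, float, float]


def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership with feet a < b < c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


class FuzzyQ:
    """Hypothetical fuzzy Q-learner: rule i picks one action a[i, j] valued q[i, j]."""

    def __init__(self, sets: Sequence[Tri], actions: Sequence[float],
                 alpha: float = 0.5, gamma: float = 0.7,
                 epsilon: float = 0.1, seed: int = 0) -> None:
        self.sets = list(sets)            # antecedent S_i of rule R_i
        self.actions = list(actions)      # candidate outputs, shared by all rules here
        self.q = [[0.0] * len(self.actions) for _ in self.sets]  # q[i][j]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def firing(self, x: float) -> List[float]:
        """Normalized firing strengths of all rules for input x."""
        mu = [tri(x, *s) for s in self.sets]
        total = sum(mu)
        return [m / total for m in mu] if total > 0 else mu

    def act(self, x: float) -> Tuple[float, List[float], List[int]]:
        """Pick one action per rule (epsilon-greedy) and blend them by firing strength."""
        phi = self.firing(x)
        chosen = []
        for row in self.q:
            if self.rng.random() < self.epsilon:
                chosen.append(self.rng.randrange(len(row)))
            else:
                chosen.append(max(range(len(row)), key=row.__getitem__))
        y = sum(p * self.actions[j] for p, j in zip(phi, chosen))
        return y, phi, chosen

    def update(self, phi: List[float], chosen: List[int], r: float, x_next: float) -> None:
        """TD update toward r + gamma * max Q(x_next), split by firing strength."""
        q_now = sum(p * self.q[i][j] for i, (p, j) in enumerate(zip(phi, chosen)))
        phi_next = self.firing(x_next)
        q_next = sum(p * max(row) for p, row in zip(phi_next, self.q))
        td = r + self.gamma * q_next - q_now
        for i, (p, j) in enumerate(zip(phi, chosen)):
            self.q[i][j] += self.alpha * p * td


if __name__ == "__main__":
    # Toy task (hypothetical): the ideal output equals the input x in [0, 40].
    agent = FuzzyQ(sets=[(-1, 0, 20), (0, 20, 40), (20, 40, 41)],
                   actions=[10.0, 20.0, 30.0, 40.0])
    rng = random.Random(1)
    x = rng.uniform(0, 40)
    for _ in range(5000):
        y, phi, chosen = agent.act(x)
        x_next = rng.uniform(0, 40)
        agent.update(phi, chosen, r=-abs(y - x) / 40, x_next=x_next)
        x = x_next
    agent.epsilon = 0.0
    print([round(agent.act(v)[0], 1) for v in (5, 20, 35)])
```

The antecedent sets should cover the whole input range; otherwise no rule fires for some inputs and the blended output collapses to zero.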

III Game theory in ITS

The relation between agent-oriented environments and Game theory stems from the fact that each state of an agent-oriented environment can be viewed as a game. The profit function of the players corresponds to the current state of the environment, and the goal of the players is to move toward an equilibrium point, i.e., to reach the best decision-making policy. Several scholars have studied the application of Game theory to traffic light control Goyal and Kaushal (2017); Groot et al (2017), integrating it into the multi-agent interaction approach. Some of them cast the traffic problem as a rigorous mathematical game model Bell (2000); Chen and Ben-Akiva (1998); Alvarez et al (2008), while others modify the learning method of the agents based on Game theory Xinhai and Lunhui (2009). In Alvarez et al (2008), signalized intersections are modeled as finite controlled Markov chains, and each intersection is seen as a non-cooperative game in which each player tries to minimize its queue. The solutions are given as Nash and Stackelberg equilibria, and the simulation results indicate shorter queue lengths than adaptive control. In Bell (2000), a two-player non-cooperative game is formulated between a network user seeking a path that minimizes the expected trip cost and an adversary choosing link performance scenarios to maximize the expected trip cost. It is shown that the Nash equilibrium can be used to measure network performance. Intelligent traffic control has also been expressed as a Cournot game, where the traffic authority and the users choose their strategies simultaneously, and as a bi-level Stackelberg game, where the traffic authority is the leader and determines the signal settings in anticipation of the user reactions. In Xinhai and Lunhui (2009), Game theory is used to address coordination between agents in Q-learning-based traffic signal control. The strategies are specified as {red light time plus 4 s, red light time plus 8 s, red light time minus 4 s, red light time minus 8 s, unchanged} and the actions as {east-west straight and right turn, south-north straight and right turn, east-west left turn, south-north left turn}. An interaction model is then presented via Game theory as a four-tuple $G = \{N, S, I, U\}$. $N$ is the set of decision-makers (players). $S$ is the set of all possible strategies and actions, i.e., $S = \{S_1, \dots, S_n\}$, where $S_i$ is the set of strategies and actions of the $i$-th player. $I$ represents the information available to the agents. $U$ is the payoff function, which adopts the Q-value. The Nash equilibrium is then Xinhai and Lunhui (2009):

$$Q_i\left(s, a_i^{*}, a_{-i}^{*}\right) \ge Q_i\left(s, a_i, a_{-i}^{*}\right), \quad \forall a_i \in S_i,\ \forall i \in N \qquad (3)$$

where $a_i$ and $a_{-i}$ denote the action of the $i$-th agent and the actions of the other agents, respectively, and $a_i^{*}$ and $a_{-i}^{*}$ represent the actions at the Nash equilibrium. The renewed Q-values in distributed reinforcement Q-learning are used to build the payoff values. The Q-value function is updated as:


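$$Q_i(s,a) \leftarrow (1-\alpha)\,Q_i(s,a) + \alpha\left[r_i + \sum_{j=1}^{m} w(i,j)\,r_j + \gamma \max_{a'} Q_i(s',a')\right] \qquad (4)$$

where $\alpha$ and $\gamma$ are the learning rate and discount factor, respectively, $s$ and $a$ are the current state of the traffic environment and the current action, respectively, $s'$ is the next state, $m$ is the number of traffic signal control agents surrounding the $i$-th agent, and $Q_i(s,a)$ is the Q-value function of the $i$-th agent when it selects action $a$ in state $s$. $r_i$ is the reward function of the $i$-th agent, $r_j$ is the reward function of the $j$-th agent neighboring the $i$-th agent, and $w(i,j)$ is a weighting function that captures the effect of $r_j$ on the $i$-th agent. Specific mathematical forms for $r_j$ and $w(i,j)$ are suggested in Xinhai and Lunhui (2009). The assumption of discrete state-action spaces and the way the reward and weighting functions are determined are drawbacks of that work.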
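Eq. 3 and Eq. 4 can be illustrated with two small Python helpers. The first applies the neighbor-weighted update of Eq. 4 once to a tabular Q stored as nested dictionaries; the second enumerates, by brute force, the pure-strategy joint actions that satisfy the Nash condition of Eq. 3, given a complete payoff table built from Q-values. Both are sketches under these assumptions (discrete states and actions, payoffs supplied by the caller), and the function names are hypothetical rather than taken from Xinhai and Lunhui (2009).

```python
from itertools import product
from typing import Dict, Hashable, List, Sequence, Tuple

QTable = Dict[Hashable, Dict[Hashable, float]]


def neighbor_weighted_update(Q: QTable, s, a, s_next, r_own: float,
                             neighbor_rewards: Sequence[float],
                             weights: Sequence[float],
                             alpha: float = 0.5, gamma: float = 0.7) -> float:
    """Apply Eq. 4 once for agent i and return the new Q_i(s, a)."""
    # r_i + sum_j w(i, j) * r_j
    shaped = r_own + sum(w * r for w, r in zip(weights, neighbor_rewards))
    best_next = max(Q[s_next].values())
    Q[s][a] = (1 - alpha) * Q[s][a] + alpha * (shaped + gamma * best_next)
    return Q[s][a]


def pure_nash_equilibria(payoffs: Dict[Tuple, Tuple[float, ...]]) -> List[Tuple]:
    """Joint actions satisfying Eq. 3: no player gains by deviating alone.

    payoffs maps every joint action (a_1, ..., a_n) to (U_1, ..., U_n); here
    each U_i would be agent i's Q-value for that joint action in state s.
    """
    n = len(next(iter(payoffs)))
    action_sets = [sorted({joint[i] for joint in payoffs}) for i in range(n)]
    equilibria = []
    for joint in product(*action_sets):
        # Compare each unilateral deviation with the candidate joint action.
        if all(payoffs[joint[:i] + (alt,) + joint[i + 1:]][i] <= payoffs[joint][i]
               for i in range(n) for alt in action_sets[i]):
            equilibria.append(joint)
    return equilibria
```

Brute-force enumeration grows exponentially with the number of players, so this form only suits small neighborhoods, such as one intersection and its immediate neighbors.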

IV Problem Statement

Consider a traffic network in which the lights of each intersection are controlled by an autonomous agent without any centralized management. Sensors installed below the surface of the surrounding streets, or traffic cameras at each intersection, provide information about the traffic situation to the corresponding agent. An agent has to decide on the duration of the green light for the North-South (NS) and West-East (WE) paths. In addition, each agent interacts with its neighbor agents. The agent is expected to schedule the traffic lights optimally, in terms of average delay, based on the information received from its sensors and from its neighbor agents.

The agents may have little knowledge about the others' decisions because information is distributed. Even if an agent has prior knowledge of the others' decisions, that knowledge does not remain valid, since the other agents are also learning. Thus, the environment is dynamic, and the behavior of the other agents may change over time. The inability to predict the other agents introduces uncertainty into the problem-solving procedure. This paper seeks a decision-making algorithm for traffic light control agents that considers the neighbor agents' information in addition to the agent's own information.

V Proposed algorithm

We consider a constant duration $C$ for the green plus red phases. Thus, if the agent determines the green phase duration $g$, the red phase duration is $C - g$. Each agent receives the numbers of vehicles on the NS and WE streets from its own sensors and the green phase durations of its neighbor agents, and uses them to schedule its own green phase duration. This paper proposes an autonomous agent with the structure shown in Fig. 1 to control each intersection.

Figure 1: The proposed structure for a typical agent

The numbers of vehicles in the WE and NS streets measured by the sensors are fuzzified. Then, a fuzzy inference engine with rules of the form of Eq. 2 is employed to fire the corresponding output membership functions. Finally, defuzzification yields the duration of the green phase on the NS path, $g_{NS}$. The duration of the green phase on the other path, WE, is therefore $C - g_{NS}$. We propose that the Q-value function updated by Eq. 4 serve as the value $q[i,j]$ of each action in Eq. 2. This update equation takes the neighbor agents' decisions into account.

The $i$-th agent takes the decision of neighbor agent $j$ into account through the reward $r_j$ and a weighting function $w(i,j)$. The reward is calculated in a fuzzy manner from the average delay resulting from the agent's decision and the current traffic situation: a fuzzy inference engine fuzzifies these two inputs and gives the reward after defuzzification (see Fig. 1). The weighting function $w(i,j)$ shows the effect of $r_j$ on the decision of the $i$-th agent. This weight is also calculated by a fuzzy inference engine, which takes the agent's own green phase duration $g_i$, the neighbor agent's green phase duration $g_j$, and the number of waiting vehicles as inputs and gives $w(i,j)$. A suitable choice of the reward and weighting functions plays a significant role in agent learning. The agent with the structure in Fig. 1 runs the following algorithm:

  1. Initialize the Q-values $Q_i(s,a)$ of the $i$-th traffic signal control agent.

  2. Observe the numbers of vehicles measured by the WE and NS sensors, which form the current state $s$ of the $i$-th intersection.

  3. Select a green phase duration (action $a$) for the current state using the fuzzy inference system.

  4. Calculate the rewards $r_i$ and $r_j$ of the $i$-th and $j$-th traffic signal control agents, and the weighting function $w(i,j)$ for each neighboring agent separately.

  5. Observe the new state $s'$.

  6. Update the Q-values according to Eq. 4.

  7. Return to step 2 until the change in the Q-values becomes smaller than a threshold $\varepsilon$.
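Steps 1 to 7 can be read as the control loop sketched below in Python. To keep it short and self-contained, the fuzzy parts are abstracted away as hypothetical callables: `observe` returns a discrete state from the sensors, `select` picks an action from that state's Q-values (the FIS plays this role in the paper), and `rewards` returns $r_i$, the neighbor rewards $r_j$, and the weights $w(i,j)$. The Q-values here are tabular rather than the rule values $q[i,j]$, and the stopping test of step 7 is applied to a single update, which is a simplification.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

QTable = Dict[Hashable, Dict[Hashable, float]]


def run_agent(actions: Sequence[Hashable],
              observe: Callable[[], Hashable],
              select: Callable[[Dict[Hashable, float]], Hashable],
              rewards: Callable[[Hashable, Hashable], Tuple[float, List[float], List[float]]],
              alpha: float = 0.5, gamma: float = 0.7,
              eps: float = 1e-3, max_iters: int = 10_000) -> QTable:
    """Hypothetical skeleton of the agent loop (steps 1 to 7) with a tabular Q."""
    Q: QTable = {}

    def row(state: Hashable) -> Dict[Hashable, float]:
        # Step 1: Q-values of an unseen state start at zero.
        return Q.setdefault(state, {a: 0.0 for a in actions})

    s = observe()                                   # step 2: current state from sensors
    for _ in range(max_iters):
        a = select(row(s))                          # step 3: choose a green time
        r_i, r_js, w_js = rewards(s, a)             # step 4: own reward, neighbor rewards, weights
        s_next = observe()                          # step 5: new state
        old = row(s)[a]
        target = (r_i + sum(w * r for w, r in zip(w_js, r_js))
                  + gamma * max(row(s_next).values()))
        row(s)[a] = (1 - alpha) * old + alpha * target   # step 6: Eq. 4
        if abs(row(s)[a] - old) < eps:              # step 7: stop when Q barely changes
            break
        s = s_next
    return Q
```

For example, with a single constant state, a greedy `select`, and a constant shaped reward $r_i + \sum_j w(i,j) r_j = 1.2$, the chosen action's Q-value approaches the fixed point $1.2/(1-0.7) = 4$ before the loop stops.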

VI Simulation results

Consider a traffic network with a central intersection and four neighboring intersections. The delay at each intersection depends on its physical characteristics, the traffic light schedule, and the number of cars in the input streets. We use the traffic model given by the American Highway Capacity Manual (HCM) (Akgungor and Bullen, 1999, Eq. 20):

$$d = \frac{0.38\,C\,(1-\lambda)^2}{1-\lambda x} + 173\,x^2\left[(x-1) + \sqrt{(x-1)^2 + \frac{16x}{c}}\right] \qquad (5)$$

where $d$, $C$, $\lambda$, and $x$ are the average delay (s), cycle time (s), green ratio, and degree of saturation, respectively, with $\lambda = g/C$ and $x = v/c$, where $c$, $g$, and $v$ are the capacity (vehicles per hour), green time (s), and input volume, respectively. We use this model to calculate the average delay from the green phase duration and the number of vehicles. For more details on this equation, we refer to Akgungor and Bullen (1999).
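For readers who want to reproduce the delay computation, here is a Python sketch of the HCM-style formula in the form $d = 0.38C(1-\lambda)^2/(1-\lambda x) + 173x^2[(x-1)+\sqrt{(x-1)^2+16x/c}]$ with $\lambda = g/C$ and $x = v/c$. The function name and the input checks are assumptions of this sketch; the uniform-delay term is only defined while $\lambda x < 1$, so the function rejects inputs outside that region instead of returning a meaningless value.

```python
import math


def hcm_delay(C: float, g: float, v: float, c: float) -> float:
    """Average delay per vehicle (s) from the HCM-style delay formula.

    C: cycle time (s), g: green time (s), v: input volume (veh/h),
    c: capacity (veh/h).
    """
    if not 0 < g < C:
        raise ValueError("green time must lie strictly inside the cycle")
    if c <= 0 or v < 0:
        raise ValueError("capacity must be positive and volume non-negative")
    lam = g / C                      # green ratio
    x = v / c                        # degree of saturation
    if lam * x >= 1:
        raise ValueError("uniform-delay term undefined for lambda * x >= 1")
    uniform = 0.38 * C * (1 - lam) ** 2 / (1 - lam * x)
    incremental = 173 * x ** 2 * ((x - 1) + math.sqrt((x - 1) ** 2 + 16 * x / c))
    return uniform + incremental
```

For example, `hcm_delay(60, 30, 400, 800)` (a 60 s cycle split evenly, 400 veh/h against a capacity of 800 veh/h) gives roughly 8.03 s, and giving the approach more green time lowers the delay.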

The cycle time $C$ and the capacity $c$ are assumed to be fixed. The volume $v$ of vehicles entering each street varies within a fixed range, and $g$ is the duration of the green phase, which each agent selects using fuzzy Q-learning and interaction with adjacent agents. The traffic network simulation algorithm is as follows:

  1. The volumes of vehicles entering each intersection ($v$) are randomly generated from a discrete uniform distribution over the range of $v$ given above.

  2. The average delay is calculated by Eq. 5.

  3. Each agent decides on the green phase duration $g$.

  4. Go to step 1 until the end of the simulation time.
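The four simulation steps can be sketched as the following Python loop. Every numerical setting here is hypothetical and chosen only for illustration (five agents, $C = 60$ s, $c = 1800$ veh/h, volumes drawn from 100 to 700 veh/h); averaging the NS and WE delays weighted by volume is also an assumption of the sketch. The `policy` callable stands in for the agent's decision in step 3, and a fixed-time policy is shown as the simplest example.

```python
import math
import random
from typing import Callable, List, Tuple

# Hypothetical interface: policy(i, (v_ns, v_we), last_delay) -> new NS green time.
Policy = Callable[[int, Tuple[int, int], float], float]


def hcm_delay(C: float, g: float, v: float, c: float) -> float:
    """HCM-style average delay (s/veh), same form as Eq. 5."""
    lam, x = g / C, v / c
    uniform = 0.38 * C * (1 - lam) ** 2 / (1 - lam * x)
    incremental = 173 * x ** 2 * ((x - 1) + math.sqrt((x - 1) ** 2 + 16 * x / c))
    return uniform + incremental


def simulate(policy: Policy, steps: int, n_agents: int = 5, C: float = 60.0,
             c: float = 1800.0, v_range: Tuple[int, int] = (100, 700),
             seed: int = 0) -> List[float]:
    """Return the network-average delay at each time step."""
    rng = random.Random(seed)
    g_ns = [C / 2] * n_agents                    # start every agent at an even split
    history = []
    for _ in range(steps):
        total = 0.0
        for i in range(n_agents):
            v_ns, v_we = rng.randint(*v_range), rng.randint(*v_range)  # step 1
            d_ns = hcm_delay(C, g_ns[i], v_ns, c)                      # step 2 (NS)
            d_we = hcm_delay(C, C - g_ns[i], v_we, c)                  # step 2 (WE)
            d = (v_ns * d_ns + v_we * d_we) / (v_ns + v_we)            # volume-weighted
            total += d
            g = policy(i, (v_ns, v_we), d)                             # step 3
            if not 0 < g < C:
                raise ValueError("policy must return a green time inside (0, C)")
            g_ns[i] = g
        history.append(total / n_agents)
    return history                                                     # step 4: loop ends


def fixed_time(i: int, volumes: Tuple[int, int], delay: float) -> float:
    return 30.0  # constant green phase (C / 2 with the default C)


if __name__ == "__main__":
    h = simulate(fixed_time, steps=100)
    print(f"mean network delay: {sum(h) / len(h):.2f} s")
```

A learning agent would replace `fixed_time` with a callable that keeps its own state between calls.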

The agents have the structure shown in Fig. 1. A Mamdani FIS, with the input membership functions of Fig. 2 for the number of input vehicles and of Fig. 3 for the average delay, is used to calculate the rewards $r_i$. Centroid defuzzification over the output membership functions of Fig. 4 is used to estimate the reward value within the output range shown there.

Figure 2: Membership function of the number of vehicles entering the street for the reward FIS
Figure 3: Membership function of average delay for reward FIS
Figure 4: Membership function of output for reward FIS

The weighting function FIS has the number of vehicles, the agent's own green phase duration, and the neighbor agent's green phase duration as inputs. Fig. 2 shows the membership function for the number of vehicles, and Fig. 5 depicts the membership function for the agent's own and the neighbor's green phase durations. Centroid defuzzification over the output membership functions of Fig. 6 is applied to calculate the weights, which lie within the output range shown there.

Figure 5: Membership function of green phase duration for weighting function FIS
Figure 6: Membership function of output for weighting function FIS
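The Mamdani inference with centroid defuzzification used by the reward and weighting FISs can be sketched in Python as follows. The triangular breakpoints, linguistic labels, and rules below are hypothetical placeholders chosen only for illustration, not the membership functions of Figs. 2 to 6; the operators (minimum for AND and for implication, maximum for aggregation, centroid over a sampled output universe) are the textbook Mamdani choices. A weighting FIS would reuse the same `mamdani_centroid` function with three inputs.

```python
from typing import Dict, List, Sequence, Tuple

Tri = Tuple[float, float, float]
Rule = Tuple[Dict[str, str], str]        # ({input name: label, ...}, output label)


def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership with feet a < b < c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def mamdani_centroid(inputs: Dict[str, float],
                     input_sets: Dict[str, Dict[str, Tri]],
                     rules: Sequence[Rule],
                     output_sets: Dict[str, Tri],
                     universe: Sequence[float]) -> float:
    """Mamdani inference (min AND, min implication, max aggregation) + centroid."""
    fired = [(min(tri(inputs[name], *input_sets[name][label])
                  for name, label in antecedent.items()), out)
             for antecedent, out in rules]
    num = den = 0.0
    for y in universe:
        mu = max(min(w, tri(y, *output_sets[out])) for w, out in fired)
        num += y * mu
        den += mu
    if den == 0.0:
        raise ValueError("no rule fired for these inputs")
    return num / den


# Hypothetical reward FIS: placeholder breakpoints and rules.
VEHICLES = {"low": (-1, 0, 20), "medium": (0, 20, 40), "high": (20, 40, 41)}
DELAY = {"short": (-1, 0, 30), "medium": (0, 30, 60), "long": (30, 60, 61)}
REWARD = {"bad": (-0.01, 0.0, 0.5), "fair": (0.0, 0.5, 1.0), "good": (0.5, 1.0, 1.01)}
RULES: List[Rule] = [
    ({"delay": "short"}, "good"),
    ({"delay": "medium"}, "fair"),
    ({"vehicles": "low", "delay": "long"}, "bad"),
    ({"vehicles": "medium", "delay": "long"}, "bad"),
    ({"vehicles": "high", "delay": "long"}, "fair"),
]


def reward(vehicles: float, delay: float) -> float:
    """Crisp reward in [0, 1] from the placeholder reward FIS."""
    return mamdani_centroid({"vehicles": vehicles, "delay": delay},
                            {"vehicles": VEHICLES, "delay": DELAY},
                            RULES, REWARD, [k / 100 for k in range(101)])
```

With these placeholder rules, for example, `reward(10, 5)` exceeds `reward(10, 55)`: for the same traffic volume, a shorter delay earns a larger reward.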

Finally, the agent uses fuzzy Q-learning (Eq. 2) with the Q-value update rule of Eq. 4, where the learning rate and discount factor are set to 0.5 and 0.7, respectively. The membership functions for each measured number of vehicles are shown in Fig. 7. The output estimates the green phase duration with the membership functions shown in Fig. 8.

Figure 7: Membership function of number of vehicles for fuzzy Q-learning
Figure 8: Membership function of green phase duration for fuzzy Q-learning

The proposed method is compared with fuzzy Q-learning (using Eq. 2, where $q[i,j]$ is the Q-value updated with Eq. 1), Q-learning (using the Q-learning method with Q-values updated by Eq. 1), fuzzy control (using a traditional fuzzy inference method), and fixed-time control (with a constant green phase duration) in terms of total average delay. The average delay in each time interval is depicted in Fig. 9, and the total average delay is illustrated in Fig. 10. The results show that the proposed method considerably reduces the total average delay compared with fixed-time scheduling.

Figure 9: Delay of the proposed method, fixed time, fuzzy Q-learning, Q-learning and fuzzy in each time step
Figure 10: Average of delay for the proposed method, fixed time, fuzzy, Q-learning, fuzzy Q-learning

VII Conclusion

In this study, an intelligent method for controlling a traffic network was developed to decrease the average delay time. Each traffic light is considered as a learning agent, and a structure for the agents was proposed. Each agent learns to decide on the duration of the green phase through a fuzzy Q-learning algorithm modified by Game theory, and receives a reward from each of its neighbor agents. The rewards received from the neighbors and the corresponding weighting functions are factors in the learning algorithm, and both are computed through FISs. In addition, the number of vehicles in each street is measured and fuzzified for use in the decision-making process. The simulation results were compared with the fixed-time method and other intelligent methods, and revealed that the proposed method achieves a considerable reduction of the average delay at intersections.


  • Abdoos et al (2011) Abdoos M, Mozayani N, Bazzan AL (2011) Traffic light control in non-stationary environments based on multi agent q-learning. In: 14th International IEEE Conference on Intelligent Transportation Systems (ITSC), IEEE, pp 1580–1585
  • Abdulhai et al (2003) Abdulhai B, Pringle R, Karakoulas GJ (2003) Reinforcement learning for true adaptive traffic signal control. Journal of Transportation Engineering 129(3):278–285
  • Adler et al (2005) Adler JL, Satapathy G, Manikonda V, Bowles B, Blue VJ (2005) A multi-agent approach to cooperative traffic management and route guidance. Transportation Research Part B: Methodological 39(4):297–318
  • Akgungor and Bullen (1999) Akgungor AP, Bullen AGR (1999) Analytical delay models for signalized intersections. In: 69th ITE Annual Meeting, Nevada, USA
  • Alvarez et al (2008) Alvarez I, Poznyak A, Malo A (2008) Urban traffic control problem a game theory approach. In: 47th IEEE Conference on Decision and Control, IEEE, pp 2168–2172
  • Balaji et al (2010) Balaji P, German X, Srinivasan D (2010) Urban traffic signal control using reinforcement learning agents. IET Intelligent Transport Systems 4(3):177–188
  • Bazzan and Klügl (2014) Bazzan AL, Klügl F (2014) A review on agent-based technology for traffic and transportation. The Knowledge Engineering Review 29(03):375–403

  • Bell (2000) Bell MG (2000) A game theory approach to measuring the performance reliability of transport networks. Transportation Research Part B: Methodological 34(6):533–545
  • Bonarini et al (2009) Bonarini A, Lazaric A, Montrone F, Restelli M (2009) Reinforcement distribution in fuzzy q-learning. Fuzzy sets and systems 160(10):1420–1443
  • Bull et al (2004) Bull L, Sha’Aban J, Tomlinson A, Addison JD, Heydecker BG (2004) Towards distributed adaptive control for road traffic junction signals using learning classifier systems. In: Applications of Learning Classifier Systems, Springer, pp 276–299
  • Chen and Ben-Akiva (1998) Chen O, Ben-Akiva M (1998) Game-theoretic formulations of interaction between dynamic traffic control and dynamic traffic assignment. Transportation Research Record: Journal of the Transportation Research Board (1617):179–188
  • Chin et al (2011) Chin YK, Bolong N, Kiring A, Yang SS, Teo KTK (2011) Q-learning based traffic optimization in management of signal timing plan. International Journal of Simulation, Systems, Science and Technology 12(3):29–35
  • Da Silva et al (2006) Da Silva BC, Basso EW, Perotto FS, C Bazzan AL, Engel PM (2006) Improving reinforcement learning with context detection. In: Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems, ACM, pp 810–812
  • Glowaty (2005) Glowaty G (2005) Enhancements of fuzzy q-learning algorithm. Computer Science 7:77–87
  • Goyal and Kaushal (2017) Goyal T, Kaushal S (2017) An intelligent scheduling scheme for real-time traffic management using cooperative game theory and ahp-topsis methods for next generation telecommunication networks. Expert Systems with Applications
  • Groot et al (2017) Groot N, Zaccour G, De Schutter B (2017) Hierarchical game theory for system-optimal control: Applications of reverse stackelberg games in regulating marketing channels and traffic routing. IEEE Control Systems 37(2):129–152
  • Houli et al (2010) Houli D, Zhiheng L, Yi Z (2010) Multiobjective reinforcement learning for traffic signal control using vehicular ad hoc network. EURASIP journal on advances in signal processing 2010(1):724,035
  • Iyer et al (2016) Iyer V, Jadhav R, Mavchi U, Abraham J (2016) Intelligent traffic signal synchronization using fuzzy logic and q-learning. In: International Conference on Computing, Analytics and Security Trends (CAST), IEEE, pp 156–161
  • Kponyo et al (2016) Kponyo J, Nwizege K, Opare K, Ahmed A, Hamdoun H, Akazua L, Alshehri S, Frank H (2016) A distributed intelligent traffic system using ant colony optimization: A netlogo modeling approach. In: Systems Informatics, Modelling and Simulation (SIMS), International Conference on, IEEE, pp 11–17
  • Liu (2007) Liu Z (2007) A survey of intelligence methods in urban traffic signal control. IJCSNS International Journal of Computer Science and Network Security 7(7):105–112
  • Medina et al (2010) Medina JC, Hajbabaie A, Benekohal RF (2010) Arterial traffic control using reinforcement learning agents and information from adjacent intersections in the state and reward structure. In: Intelligent Transportation Systems (ITSC), 2010 13th International IEEE Conference on, IEEE, pp 525–530
  • Pacheco and Rossetti (2010) Pacheco JC, Rossetti RJ (2010) Agent-based traffic control: a fuzzy q-learning approach. In: 13th International IEEE Conference on Intelligent Transportation Systems (ITSC), IEEE, pp 1172–1177
  • Prashanth and Bhatnagar (2011) Prashanth L, Bhatnagar S (2011) Reinforcement learning with function approximation for traffic signal control. IEEE Transactions on Intelligent Transportation Systems 12(2):412–421
  • Rida (2014) Rida M (2014) Modeling and optimization of decision-making process during loading and unloading operations at container port. Arabian Journal for Science and Engineering 39(11):8395–8408
  • Roess et al (2004) Roess RP, Prassas ES, McShane WR (2004) Traffic engineering. Prentice Hall
  • Salkham et al (2008) Salkham A, Cunningham R, Garg A, Cahill V (2008) A collaborative reinforcement learning approach to urban traffic control optimization. In: Proceedings of IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, IEEE Computer Society, pp 560–566
  • Schaefer et al (2016) Schaefer M, Vokřínek J, Pinotti D, Tango F (2016) Multi-agent traffic simulation for development and validation of autonomic car-to-car systems. In: Autonomic Road Transport Support Systems, Springer, pp 165–180
  • Steingrover et al (2005) Steingrover M, Schouten R, Peelen S, Nijhuis E, Bakker B (2005) Reinforcement learning of traffic light controllers adapting to traffic congestion. In: BNAIC, Citeseer, pp 216–223
  • Teknomo (2006) Teknomo K (2006) Application of microscopic pedestrian simulation model. Transportation Research Part F: Traffic Psychology and Behaviour 9(1):15–27
  • Vilarinho et al (2017) Vilarinho C, Tavares JP, Rossetti RJ (2017) Intelligent traffic lights: Green time period negotiation. Transportation Research Procedia 22:325–334
  • Watkins and Dayan (1992) Watkins CJ, Dayan P (1992) Q-learning. Machine learning 8(3-4):279–292

  • Wiering (2000) Wiering M (2000) Multi-agent reinforcement learning for traffic light control. In: ICML, pp 1151–1158
  • Xinhai and Lunhui (2009) Xinhai X, Lunhui X (2009) Traffic signal control agent interaction model based on game theory and reinforcement learning. In: International Forum on Computer Science-Technology and Applications, IEEE, vol 1, pp 164–168