Military vehicles operate in a wide variety of environments and scenarios, which places a diverse set of requirements on the fleet mix. The specialized functionality of military vehicles and continually updated technologies make them difficult to reuse after a military operation (Shinkman, 2014). To reduce this waste, the US Army requires that fleets of vehicles be reusable across a large array of military mission scenarios. Modular vehicles were introduced to meet this need; they are constructed from special components called modules (Dasch and Gorsich, 2016). Modules are assumed to be a special type of component that can be easily coupled and decoupled through simple plug-in/pull-out actions on the battlefield. This property enables vehicles to be quickly assembled, disassembled and reconfigured (ADR) in the field in response to demands, as shown in Fig. 1.
Because operation strategy and fleet performance are closely connected, researchers have started to investigate the potential of modularity for boosting performance during fleet operation. Bayrak et al. proposed a mathematical model to simulate modular fleet operation in a logistic mission scenario and observed a significant reduction in operational cost after fleet modularization (Bayrak et al., 2016). In 2017, Li and Epureanu (Li and Epureanu, 2018) proposed an intelligent agent-based model for managing modular fleet operation. Agents are classified into three categories: camp, distributor, and supply. The different types of agents collaborate to produce operational decisions in real time in response to stochastic field demands (Li and Epureanu, 2017a). Later, Li and Epureanu also modeled fleet operation as a dynamic system and implemented model predictive control to manage the system dynamics. Their results show that a modular fleet is more robust than a conventional fleet to disturbance and noise from the battlefield (Li and Epureanu, 2017b).
However, most previous research focused on a single decision maker who operates (plays a game) against the environment. Unpredictability arising from the enemy’s reaction is essential and non-negligible in every form of warfare; it leads to an inability to forecast the outcome of actions and to weakly perceived causal links between events on the battlefield (Lynch, 2015). Furthermore, smart systems and artificial intelligence are playing an ever-increasing role in our daily lives, and military operations are no exception. Autonomous vehicles, especially Unmanned Aerial Vehicles, have been widely used to assist military operations (Landa, 1991; Jose and Zhuang, 2013; Evers et al., 2014). Unsurprisingly, artificial intelligence will play a significant role in the management of large-scale fleets of autonomous vehicles. Given equivalent autonomous decision-making systems, our goal is to emphasize the synergy between modularity and autonomy by simulating an attacker-defender game between a conventional fleet and a modular fleet.
The use of games to model the relationship between an attacker and a defender has a long history, starting with the work of Dresher (Dresher, 1961). Applications and research related to military OR and defense studies are rich (Kardes and Hall, 2005; Hausken and Levitin, 2009; Zhuang et al., 2010; Paulson et al., 2016). Several studies of attacker-defender games also consider resource-dependent strategies. Powell uses a game-theoretical approach to find a defender’s resource allocation strategy for protecting sets from being destroyed by a strategic adversary, i.e., a terrorist group. The defender is uncertain about the targets that the enemy is likely to strike. Given the distribution of the enemy’s behavior, a pure strategy for the defender is derived that leads to a Bayesian Nash equilibrium (Powell, 2007). Hausken and Zhuang consider a multi-period game in which a defender can allocate resources both to defend its own resources and to attack the resources of the enemy. Similarly, the attacker can use its resources either to attack the defender or to protect itself. They adopt a strategy pair that is proven to be a subgame perfect Nash equilibrium for a two-stage game and illustrate how the strategy depends on changes in the enemy’s resources (Hausken and Zhuang, 2011).
Game-theoretic techniques are popular in the existing literature on resource-dependent attacker-defender games. However, these applications mainly focus on single-period games or repeated games in which significant information from previous periods is ignored. In addition, the heavy assumptions of previous approaches, i.e., a single resource type, sequential moves, and perfect information about the enemy, make that research inapplicable to real-world military missions, where demands and the environment are stochastic and unpredictable (Shinkman, 2014; Lynch, 2015; Xu et al., 2016). Moreover, recent studies show that the performance of a modular fleet is heavily influenced by the optimality of operation decisions (Li and Epureanu, 2018, 2017a), which makes an analytical solution intractable. A new method is required to investigate the tactical advantages of fleet modularity.
In this study, we go beyond the existing literature by formulating an attacker-defender game and applying intelligent agent-based modeling techniques (Adhitya et al., 2007; Yu et al., 2009; Onggo and Karatas, 2016; Li and Epureanu, 2017a) to imitate the human-like decision-making process. We combine optimization techniques and artificial intelligence to enable each player (fleet) to make decisions in real time based on experience and to optimize those decisions accordingly. By selecting one player as a modular fleet and the other as a conventional fleet, we highlight the benefits of fleet modularity, including adaptability and unpredictability, through competition in a generalized mission scenario.
2 Game Formation
During the armed conflicts in Iraq and Afghanistan, the U.S. faced supply shortages due to exogenous supply chain disruptions (Xu et al., 2016). We borrow this example and model it as a competition between two military forces. In this game, we assume each force is composed of a fleet of vehicles, i.e., fleet red and fleet blue. The goal of fleet operation is to satisfy the supply demands that randomly appear at battlefields. Each demand specifies the due time, required materials, personnel, and the target fleet that must accomplish the supply. To satisfy a demand, the dispatched convoy, which consists of vehicles selected from the fleet, must have enough capacity and firepower to guarantee the safety of transportation. For convenience, all demands are automatically converted into attribute requirements for the convoy, i.e., fire power, water capacity, etc. We denote the demands received at time as . According to the due time of each demand, the attributes required to be satisfied at a future time can be obtained as , where represents the attributes of type to be satisfied before time . Matrix can also be created and updated to record the demands to be satisfied in the planning horizon , i.e., . Correspondingly, the attributes carried by the on-field convoy at time are .
The fleet targeted by a demand becomes the defender, whose goal is to deliver a qualified convoy to the field on time. The other fleet automatically becomes the attacker, which aims to disrupt the defender’s delivery by dispatching an attacker convoy. In other words, each demand initiates a supply task for one fleet and an attack task for the other, so the role of each player changes dynamically with the demands. The common rules of the game are assumed to be known to the players: delivery of a qualified convoy to the assigned battlefield means the defender wins and the attacker loses. Because damage from the attacker can reduce the attributes of the convoy, the defender may lose the game even if a qualified convoy is dispatched.
We refer to the resulting conflict between convoys as an event and assume that the dispatch decisions of the attacker and defender are made simultaneously, which makes this a simultaneous-move game. To simplify the problem, we assume that both fleets can simultaneously sense the demands regardless of their target. Each fleet has the same probability of being selected as the target fleet, so each fleet plays defender and attacker equally often on average, which keeps the game fair. Fig. 2 illustrates the convoy competition in a scenario with multiple battlefields.
Following the assumptions of previous research (Azaiez and Bier, 2007; Wang and Bier, 2011), both attacker and defender are modeled as rational and strategic. Based on the simulation results, we summarize 10 types of dispatch strategies for the attacker and the defender respectively, as shown in Tab. 1. For an attacker, firepower is the only attribute needed to win. Thus, we measure the attacker’s strategy as a multiple of the firepower required of the enemy. The defender needs to guarantee that the delivered convoy satisfies the demands after accounting for attribute losses from the enemy’s attack. The defender’s strategy is a combination of safety coefficients for fire power and capacity . We group all dispatch orders that fall below the requirements into strategy 1, which means the defender gives up the game to save forces once a strong enemy convoy is predicted.
In this study, the amount of damage is based on a comparison of the fire power carried by the opposing convoys. The probabilities of damage to a component of type for team red and team blue are,
where is the amount of firepower carried by convoy , and represents the damage factor for a component of type . Each component in the convoy is damaged stochastically according to the calculated probability.
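The stochastic damage step can be illustrated with a minimal sketch. It assumes the per-component damage probabilities have already been computed from the damage equations; the function name `sample_damage`, the convoy layout and the probability values are hypothetical and serve only as an illustration.

```python
import random

def sample_damage(convoy, damage_prob, rng):
    """Return, per vehicle, the list of component types that get damaged.

    convoy: list of vehicles, each a list of component-type labels.
    damage_prob: dict mapping component type -> probability of damage
                 (assumed to be precomputed; the values below are made up).
    Each component is damaged independently of the others.
    """
    damaged = []
    for vehicle in convoy:
        hit = [c for c in vehicle if rng.random() < damage_prob[c]]
        damaged.append(hit)
    return damaged

rng = random.Random(0)  # fixed seed for reproducibility
convoy = [["engine", "armor"], ["engine", "gun"]]
probs = {"engine": 0.1, "armor": 0.3, "gun": 0.2}
print(sample_damage(convoy, probs, rng))
```

Passing the random generator explicitly keeps a simulation run reproducible, which helps when comparing fleets under the same sequence of events.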
To create a fair game, we constrain the amount of supplies and assume that all damaged resources are recoverable. Thus, the amount of resources for each fleet is constant, but the condition of the resources is dynamic. Damage is penalized through a long recovery time. The recovery strategy for a damaged vehicle is to replace all damaged components with healthy ones. Once fleet modularity is considered, disassembly becomes another option for dealing with a badly damaged vehicle.
Several assumptions are also made to simplify the problem while preserving a reasonable level of fidelity:
Each fleet can accurately observe and record the damage that occurs in its convoy.
Each fleet can accurately observe the composition of the enemy’s convoy in every event, i.e., the number and types of vehicles.
A convoy returns to base immediately after its task.
Mission success is reported to both fleets once the mission is completed.
No type of vehicle damage other than attack damage is considered.
All components are recoverable.
Damage occurs independently based on the probability of damage.
Inventory status is updated every hour and is accessible to all agents.
The sequence of vehicle assembly, disassembly, and recovery is ignored.
3 Problem Statement
To explore the advanced tactics enabled by fleet modularity, we first need to consider how a human commander makes operation decisions. In simplified form, the main steps of human-like decision making (Chen and Barnes, 2013) are to 1) perceive field information, 2) analyze the enemy’s behavior based on the received information, and 3) optimally schedule on-field actions to beat the enemy. Through these steps, decision makers can adaptively adjust their dispatch strategy and operation plan based on the learned enemy behavior, i.e., what kind of strategy the enemy might adopt in a specific situation. How to imitate this human-like decision-making process for a modular fleet while guaranteeing decision optimality and efficiency remains to be solved.
Another challenge is the management of a highly diverse inventory. We define a damaged vehicle as a vehicle with one or more damaged components, which generates numerous types of vehicle damage. For example, a single vehicle with 5 different components has types of vehicle damage, each of which may call for a distinct recovery strategy. Furthermore, given the limited working capacity and inventory status, it remains challenging to schedule operations and select recovery strategies effectively and efficiently in reaction to stochastically arriving demands while maintaining reasonably healthy stock levels.
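To give a sense of how quickly damage patterns multiply, the following sketch enumerates them under the assumption that each component is either healthy or damaged and that the components are distinguishable. The function name `damage_patterns` is hypothetical.

```python
from itertools import product

def damage_patterns(n_components):
    """Enumerate all patterns with at least one damaged component.

    Assumes each component is either healthy (0) or damaged (1).
    """
    return [p for p in product((0, 1), repeat=n_components) if any(p)]

patterns = damage_patterns(5)
print(len(patterns))  # 2**5 - 1 = 31 patterns under this assumption
```

Under the same assumption the count grows as 2^n - 1, so a fleet with several vehicle types quickly produces far more damage states than can be tracked one by one.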
4 Agent-Based Model
Considering the computational load and the battlefield decision-making process, we present an agent-based model that automatically yields adaptive tactics and plans operational actions in real time. To simplify the notation, we describe the approach from the standpoint of fleet blue and propose an approach to beat the enemy, i.e., fleet red. Three types of agent models are created to perform different functions. The decision-making process is then achieved through the cooperation of the three types of agents. Their interconnections are shown in Fig. 3.
Inference Agent: analyze enemy’s historical behaviors, forecast enemy’s future actions.
Dispatch Agent: optimize dispatch order based on inference.
Base Agent: optimally plan for the operation actions to satisfy the dispatch order.
4.1 Inference Agent
Because moves are simultaneous, it is critical to forecast the enemy’s actions in order to counter them. As combat resources and workshop capacity are limited, it is possible to get cues from the enemy’s historical dispatch actions. For example, if the enemy dispatched a sizable convoy in the recent past, it is reasonable to conclude that the enemy is not capable of assembling a strong force in the near future. Meanwhile, existing damage in the enemy’s resources can also be analyzed by comparing the fire power of dispatched convoys in historical events. The amount of damage is also useful for the decision maker in inferring the enemy’s available forces.
The information that can be used for inference is very limited, including demand records , our previous dispatches , and the enemy’s previous dispatches . Dispatch decisions depend on the optimization algorithm, the inference and the personality of the commander, which introduces considerable nonlinearity into the modeling of the decision-making process. Meanwhile, the commander needs to adjust its strategy after learning from the enemy, which requires a prediction model that is capable of taking in new information from outside and making corrections when needed. We adopt techniques from artificial intelligence to solve this problem.
Compared to a standard neural network, a recurrent neural network (RNN) can memorize a certain period of historical data and analyze its influence on the future. Long short-term memory (LSTM) networks are a popular type of RNN; they are capable of learning long-term dependencies without the vanishing gradient problem of plain RNNs and are widely used in natural language processing. In this study, we implement an LSTM as our predictive model to capture the correlations in the enemy’s sequential decisions. The model is widely used in forecasting based on sequential data, including stock market prices (Chen et al., 2015; Di Persio and Honchar, 2016; Fischer and Krauss, 2017), traffic (Ma et al., 2015; Zhao et al., 2017), etc. We model the inference of the enemy’s strategy as a classification problem, where each class corresponds to a strategy. The input of each training example is the record of each event, including the enemy’s dispatched convoy, our dispatched convoy and the received demand, which forms time-series data recording all the information during the review horizon . The output of each training example is the actual dispatch strategy adopted by the enemy. The architecture of the LSTM used in this study is shown in Fig. 4.
The state of the LSTM at time is described by the input gate , forget gate , output gate , cell state and activation at the hidden layer . The forward propagation can be described by
where, are weights and biases to be obtained through training.
is the sigmoid function. is the number of hidden layer. is the softmax function.
records the estimated probability of each class based on inputs from training sample and weights of the model, i.e.,
. The loss function is represented by a cross entropy equation:
where is a binary indicator (0 or 1), with value if class label is the correct classification and otherwise. Training the model therefore means minimizing the total cross entropy over the training set to find the best model parameters through backward propagation (Hecht-Nielsen, 1992), i.e.,
Thus, the enemy’s behavior can be forecast through Eq. 14.
Based on the enemy’s predicted strategy , we can calculate the enemy’s likely dispatch order by using the upper bound of that strategy in Tab. 1.
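The forward pass, softmax output and cross-entropy loss described in this subsection can be sketched in plain Python. This is a minimal illustration that assumes the standard LSTM cell formulation, which may differ in detail from the equations used in the study; all names (`lstm_step`, `predict_strategy`, the gate key `"g"` for the candidate cell update) and the toy dimensions are hypothetical, and the weights are left untrained.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, x, U, h, b):
    """Return W x + U h + b for list-of-lists matrices and list vectors."""
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(u * hi for u, hi in zip(U_row, h))
        + bias
        for W_row, U_row, bias in zip(W, U, b)
    ]

def lstm_step(x, h_prev, c_prev, params):
    """One forward step of a standard LSTM cell.

    params maps a gate name ("i", "f", "o", "g") to a (W, U, b) tuple.
    """
    pre = {name: affine(W, x, U, h_prev, b) for name, (W, U, b) in params.items()}
    i = [sigmoid(z) for z in pre["i"]]      # input gate
    f = [sigmoid(z) for z in pre["f"]]      # forget gate
    o = [sigmoid(z) for z in pre["o"]]      # output gate
    g = [math.tanh(z) for z in pre["g"]]    # candidate cell update
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def predict_strategy(sequence, params, W_out, b_out):
    """Run the LSTM over an event sequence and return class probabilities."""
    n_hidden = len(params["i"][2])
    h = [0.0] * n_hidden
    c = [0.0] * n_hidden
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    logits = [sum(w * hk for w, hk in zip(row, h)) + bk
              for row, bk in zip(W_out, b_out)]
    return softmax(logits)

def cross_entropy(probs, true_class):
    """Cross-entropy loss for a one-hot label."""
    return -math.log(probs[true_class])

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

# Toy dimensions: 4 input features per event, 3 hidden units, 10 strategy classes.
n_in, n_hidden, n_classes = 4, 3, 10
params = {name: (zeros(n_hidden, n_in), zeros(n_hidden, n_hidden), [0.0] * n_hidden)
          for name in ("i", "f", "o", "g")}
W_out, b_out = zeros(n_classes, n_hidden), [0.0] * n_classes
sequence = [[1.0, 0.5, 0.0, 2.0], [0.2, 0.1, 0.3, 0.4]]
probs = predict_strategy(sequence, params, W_out, b_out)
print(probs)                    # all-zero weights give a uniform distribution
print(cross_entropy(probs, 3))  # equals log(10) in the uniform case
```

In practice the weights would be fitted by backpropagation with a deep learning library; the sketch only shows how a sequence of event records is mapped to a probability for each of the 10 strategy classes.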
4.2 Dispatch Agent
The goal of convoy dispatch is to determine the attributes that our convoy needs to carry to maximize the win rate. Based on the game formation, a convoy with higher attributes, especially fire power, has a higher chance of winning. However, as resources are limited, the fewer vehicles are ordered, the higher the chance that the order can be fulfilled by the inventory planner. The convoy dispatch should therefore be carefully planned to secure the win rate of the current mission without overdrawing resources.
To avoid overuse or underutilization of available attributes, it is important to evaluate our current forces accurately before making dispatch decisions. The attributes available in the future depend on inventory status and on-base action scheduling. Because time delays exist in operations and future demands are only partially observable, it is hard to infer the actual plan that will be made in the near future, which may completely change the attributes available at dispatch time. This difficulty is one of the challenges in placing a dispatch order. Even if we ascertain that our convoy order can be fulfilled by the base, the probability that this convoy can win remains uncertain. Because the players do not know how vehicles get damaged, they must infer the damage mechanism from experience.
To resolve this problem, we decouple the estimation of event success into two parts: the feasibility of the order and the conditional success rate if the order is feasible . The probability of winning an event for convoy blue can then be calculated.
Based on the historical records, the feasibility of an order can be flagged as 0 (infeasible) or 1 (feasible) by comparing the dispatch order with the actually dispatched convoy . We say the order is feasible if , and infeasible otherwise. As optimization is used in operation planning, the relationship between the influencing factors and feasibility is complex and nonlinear. We implement a neural network model (Hagan et al., 1996; Atsalakis et al., 2018; Rezaee et al., 2018) to capture these nonlinear interconnections, as shown in Fig. 5. The output of each training example is the feasibility of the order (1 for feasible, 0 for infeasible).
With enough training, the model is capable of evaluating the feasibility of a dispatch order across diverse operation situations. To capture changes in the inventory operation strategy, we periodically retrain the model on the latest operation information. The relationship between the model inputs and the feasibility rate is described by Eq. 16,
where, records the number of vehicles and component stocks on base at time . is the changes in inventory stocks at time from unfinished actions.
4.2.2 Conditional Success Rate
Vehicle damage plays an important role in determining the success of a mission. However, it is driven by a stochastic process and varies with changes in terrain, operational preparations and soldiers’ reactions. The model must therefore be able to capture the complexity of the damage mechanism. Given this complexity and the nonlinearity in predicting success, we adopt another neural network model for success rate forecasting, as shown in Fig. 6.
Similarly, the outputs of the training set are the success reports of previous events (1 for success, 0 for failure). Given the forecasted enemy convoy attributes , the trained model yields the conditional win rate for a given dispatch order and mission requirements. The model captures changes in the damage mechanism by continuously taking in the latest combat information and results. Denoting the trained neural network model for success as , the probability of success can also be calculated.
For each dispatch order , the above approach provides a way to estimate the probability of success and feasibility based on the predicted enemy behavior, demand information and inventory status. An optimization model can then be used to seek the optimal dispatch order that maximizes the win rate or, equivalently, minimizes the failure rate, i.e., . Combining Eqs. 16 and 17, a nonlinear programming model can be formulated to seek the optimal dispatch order.
where is the decision variable that specifies the desired attributes to be carried by the convoy. Thus, the number of decision variables equals the number of attribute types (3 10). However, because the objective function is non-convex, a gradient-based approach cannot be relied on to reach the global optimum. In this study, we implement a pattern search technique to search for the optimal dispatch decisions.
As the minimized failure rate can be any value in the range of , the dispatch agent should be capable of giving up the mission once a very high failure rate is calculated. There is also a stream of literature studying risk preferences in repeated and evolutionary games (Roos and Nau, 2010; Lam and Leung, 2006; Zhang et al., 2018). We define the as a customizable parameter representing the maximum failure rate that can be tolerated, which filters the dispatch order as
Thus, a convoy is dispatched only when the commander is confident enough. In this study, risk-aversion behavior depends purely on , which is constant during operation. As future work, it would also be interesting to vary to seek an advanced fleet operation strategy, i.e., a combination of risk-prone and risk-averse behavior (Roos and Nau, 2010).
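The dispatch decision described in this subsection can be sketched as a gradient-free search followed by a risk filter. The sketch below is an illustration under stated assumptions, not the implementation used in the study: it assumes the win probability is the product of the feasibility rate and the conditional success rate, replaces the two trained neural networks with made-up logistic curves, and uses a simple compass-style pattern search. The names `failure_rate`, `pattern_search`, `dispatch_decision` and `max_failure` are hypothetical.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def failure_rate(order):
    """Toy failure rate: 1 - P(feasible) * P(success | feasible).

    The two logistic curves stand in for the trained feasibility and
    success models; their shapes are made up for illustration.
    """
    firepower = order[0]
    p_feasible = logistic(8.0 - firepower)  # larger orders are harder to fulfil
    p_success = logistic(firepower - 5.0)   # larger orders win more often
    return 1.0 - p_feasible * p_success

def pattern_search(f, x0, step=1.0, tol=1e-3, max_iter=1000):
    """Minimize f by compass (coordinate) pattern search, without gradients."""
    x = list(x0)
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for k in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[k] += delta
                ft = f(trial)
                if ft < fx:  # accept any strictly better poll point
                    x, fx, improved = trial, ft, True
        if not improved:
            step *= 0.5      # shrink the pattern when no poll point improves
    return x, fx

def dispatch_decision(order, failure, max_failure):
    """Return the order if its failure rate is tolerable, otherwise None."""
    return order if failure <= max_failure else None

best_order, best_failure = pattern_search(failure_rate, [0.0])
print(best_order, best_failure)
print(dispatch_decision(best_order, best_failure, max_failure=0.5))
```

The toy objective balances two opposing effects: a larger order is more likely to win but less likely to be fulfilled, so the search settles between the two extremes. Like any local search, pattern search can stop at a local minimum on a non-convex objective, so restarting from several initial orders is a common safeguard.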
4.3 Base Agent
Based on the behavior analysis from the inference agent and the dispatch order suggested by the dispatch agent, the base agent plans operational actions to accomplish the orders. Li and Epureanu proposed a model predictive control based approach to schedule operation actions in real time in reaction to received demands. However, they did not consider the damage that may occur during fleet operation (Li and Epureanu, 2018). In this section, we extend their research by considering possible damage during fleet operation and managing the inventory under the resulting diverse conditions. For convenience, we simplify the notation of operation actions for fleet from to in this section, as no enemy is considered.
As the resources of each player are limited and repairable, it is important to schedule operation actions properly to recover damaged resources and increase the utilization rate. It is also essential to allocate capacity properly to balance order satisfaction against damage recovery. In this section, we first model military fleet operation as a time-varying dynamical system. Then, model predictive control is proposed to manage the system dynamics and thereby achieve operation management.
4.3.1 Dynamical System
The dynamics of fleet operation arise mainly from changes in inventory stocks and remaining demands, in terms of
Unsatisfied demands, .
Although healthy vehicles and damaged vehicles are recorded in similar structures, their meanings are entirely different. For healthy stocks and damaged components, the subscript of the variable is the type of vehicle/component, and the value of the variable indicates the number. For damaged stocks, the subscript is the index of the damaged vehicle, which is assigned based on the date the vehicle was received. Binary values are used to represent the status of the vehicle, where 1 indicates that the damaged stock remains to be repaired and 0 indicates that the damaged stock has been recovered or has not been received yet. The vehicle type , the number of damaged components and healthy components in the damaged vehicle of type are also recorded as a reference for the repair. These data are time-variant, as the number and type of damaged vehicles keep changing with newly occurring damage and vehicle recovery. This data structure avoids the large number of states that would otherwise arise from diverse vehicle damage patterns, because we treat vehicles with different damage as different damaged vehicles. We create a state for each newly arrived damaged vehicle and remove the corresponding state once the damaged vehicle is recovered, i.e., when the state value changes from 1 to 0.
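The bookkeeping for damaged vehicles can be sketched as a small data structure. This is an illustrative sketch only: the class and field names (`DamagedVehicle`, `DamagedStock`, `pending`) are hypothetical, and the component labels are made up.

```python
from dataclasses import dataclass

@dataclass
class DamagedVehicle:
    index: int          # assigned in order of arrival at the base
    vehicle_type: str
    damaged: dict       # component type -> number of damaged components
    healthy: dict       # component type -> number of healthy components
    pending: int = 1    # 1: awaiting repair, 0: recovered

class DamagedStock:
    """One state per damaged vehicle, created on arrival and removed on recovery."""

    def __init__(self):
        self.records = {}
        self._next_index = 0

    def receive(self, vehicle_type, damaged, healthy):
        rec = DamagedVehicle(self._next_index, vehicle_type, damaged, healthy)
        self.records[rec.index] = rec
        self._next_index += 1
        return rec.index

    def recover(self, index):
        rec = self.records.pop(index)  # drop the state once the vehicle is recovered
        rec.pending = 0
        return rec

    def state_vector(self):
        """Binary status of every damaged vehicle currently tracked."""
        return {idx: rec.pending for idx, rec in self.records.items()}

stock = DamagedStock()
first = stock.receive("truck", {"engine": 1}, {"cargo": 2})
second = stock.receive("truck", {"cargo": 1}, {"engine": 1})
stock.recover(first)
print(stock.state_vector())
```

Indexing by arrival rather than by damage pattern keeps the number of states equal to the number of damaged vehicles on hand, instead of the number of possible damage patterns.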
Vehicle conditions are reported to the base agent as one of the inputs. We summarize all inputs to the system as
Returning healthy vehicles, ,
Returning damaged vehicles, ,
Dispatch order from dispatch agent, .
Based on the characteristics of fleet operation, the operational actions to be determined are also distinct. For conventional fleet, the operation actions include
Recovery of damaged vehicle,
Recovery of damaged component, .
The dynamics of vehicle stocks of type , component stocks of type , damaged vehicle of index , damaged components of type , and remaining attributes of type are shown by Eq. 20, 21, 22, 23, 24 respectively
By introducing fleet modularity, several additional operation actions are available, in terms of
Damaged vehicle disassembly, .
With consideration of these actions, dynamic equations become
Because of the delays in operation actions, current inventory stocks might be influenced by previously-determined actions. In other words, the current actions may impact the stock level in the future. Thus, we define the state of the system by all inventory statuses that might be influenced by current actions, , i.e.,
We use input matrices to connect the current actions at time to the inventory level at a later time . Furthermore, the damage in stocks keeps changing over time, so the matrices that connect to previous states , actions and inputs are also time-varying. Thus, the system dynamics for both fleets can be written as
Thus, a state space model can be created to record the influence from the actions at a single time point to the states in the short future, as shown in Eq. 35
4.3.2 System Control
The goal of system control is to meet the received dispatch orders on time. The decision-making process always involves predictions of the future system. For example, given several dispatch orders, one may want to know how satisfying one order influences the others. Compared to classical control methodologies, e.g., PID control, model predictive control (MPC) makes better use of future information and adapts to system changes (Li and Epureanu, 2017b). We separate this section into two parts: future state prediction and optimization of operation decisions.
Future State Prediction
Because of time delays in the operation actions, the operation decisions made at the current time have to guarantee the match between the attributes of the dispatched convoy and ordered attributes. Given
current system states
operation actions in the future
system input, ,
The future system states are predictable by iteratively substituting Eq. 35. Thus, we can express the as a function of
with being the matrix that connects the future system outputs with current system states, and being the matrices that connect the system outputs with the future operation actions and inputs respectively. Although the dynamical system changes over time, we assume it is constant at each decision-making time. The system is continually updated to ensure that the operation actions are optimized based on accurate system status.
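The idea of predicting future states by iterative substitution can be sketched with a generic linear state-space recursion. The sketch assumes the form x[k+1] = A x[k] + B u[k] + C w[k] with matrices held constant over the prediction horizon; the actual state definition in the model also tracks delayed actions, which is omitted here. The function names and the toy matrices are hypothetical.

```python
def matvec(M, v):
    """Multiply a list-of-lists matrix by a list vector."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def add(*vectors):
    """Element-wise sum of several vectors of equal length."""
    return [sum(vals) for vals in zip(*vectors)]

def predict_states(x0, actions, inputs, A, B, C):
    """Roll x[k+1] = A x[k] + B u[k] + C w[k] forward over the horizon."""
    states = []
    x = list(x0)
    for u, w in zip(actions, inputs):
        x = add(matvec(A, x), matvec(B, u), matvec(C, w))
        states.append(x)
    return states

# Toy system: state = [healthy vehicles on base, remaining ordered vehicles].
A = [[1, 0], [0, 1]]   # stocks and remaining orders carry over
B = [[-1], [-1]]       # dispatching a vehicle lowers stock and remaining order
C = [[1, 0], [0, 1]]   # inputs: [returning vehicles, newly ordered vehicles]
x0 = [10, 0]
actions = [[0], [3], [0]]          # vehicles dispatched at each step
inputs = [[0, 3], [0, 0], [2, 0]]  # an order for 3 arrives, later 2 vehicles return
print(predict_states(x0, actions, inputs, A, B, C))
# → [[10, 3], [7, 0], [9, 0]]
```

Because the recursion is linear, the predicted states are linear functions of the planned actions, which is what allows the later optimization to be written with linear constraints.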
The optimization of fleet operation is motivated by two facts: 1) a convoy with insufficient attributes runs a considerable risk of losing the mission; 2) a convoy with redundant attributes can also degrade overall fleet performance by reducing utilization. Furthermore, we consider several operational costs that may be significant in real-world fleet operation. In summary, the costs of interest are,
Attribute redundancy cost,
Attribute insufficiency cost,
ADR action cost,
Inventory holding cost,
Therefore, the cost function is shown as
where are non-negative variables created to remove the nonlinearity, which satisfies that
We record the insufficient and redundant attributes during the planning horizon as and respectively. The holding costs and actions related costs are also aggregated as and . By substituting in Eq.39, we created a mixed-integer programming model to optimize operational decisions
where are the indices of inventory stocks and remaining dispatch orders in the states respectively. Constraint (a) ensures that all operational decisions are non-negative integers; (b) indicates that inventory stocks are non-negative; (c) ensures that on-base ADR actions are always constrained by the maximum action capacity ; (d) preserves the balance between the auxiliary variables and the remaining orders to be satisfied; (e) specifies that each damaged vehicle can be recovered by only one recovery strategy. As the cost function and constraints are linear and the number of decision variables is huge, we first implement a cutting-plane method to reduce the decision space and then use an integer programming solver to obtain the solution. The time required for decision making at each time point is less than 1 second when operating 5 types of modular vehicles with a planning horizon of 12 hours.
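The role of the non-negative auxiliary variables can be illustrated on a tiny single-period example. The sketch below enumerates all integer convoy compositions by brute force as a stand-in for the cutting-plane and integer programming steps, which it does not implement; the vehicle attributes, stock levels and cost weights are made up, and the function names are hypothetical.

```python
from itertools import product

def split_mismatch(delivered, ordered):
    """Split delivered - ordered into non-negative (redundancy, insufficiency)."""
    diff = delivered - ordered
    return max(diff, 0), max(-diff, 0)

def best_convoy(order, vehicle_attrs, stock, c_ins, c_red, c_act):
    """Pick integer vehicle counts that minimize mismatch and action costs.

    order: required amount of each attribute.
    vehicle_attrs: per vehicle type, the amount of each attribute it carries.
    stock: available number of vehicles of each type.
    """
    best = None
    for counts in product(*(range(s + 1) for s in stock)):
        cost = c_act * sum(counts)
        for a, need in enumerate(order):
            delivered = sum(n * attrs[a] for n, attrs in zip(counts, vehicle_attrs))
            redundancy, insufficiency = split_mismatch(delivered, need)
            cost += c_red * redundancy + c_ins * insufficiency
        if best is None or cost < best[0]:
            best = (cost, counts)
    return best

order = [10, 6]                  # required [fire power, capacity]
vehicle_attrs = [[4, 1], [1, 3]] # two hypothetical vehicle types
stock = [3, 3]
print(best_convoy(order, vehicle_attrs, stock, c_ins=10.0, c_red=1.0, c_act=0.5))
# → (4.0, (2, 2))
```

With insufficiency priced well above redundancy, the cheapest convoy slightly overshoots the capacity requirement rather than falling short on either attribute. Enumeration is only workable for toy instances, which is why a dedicated integer programming solver is needed at realistic scale.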
5 Numerical Illustrations
In this section, we provide numerical illustrations in a generalized mission scenario to study the different impacts of modularity on fleet performance. In general, it may be difficult to estimate the parameters accurately. However, we believe that reasonable estimates for these parameters can be obtained from expert judgment and data from the existing literature. In this study, we assume the resources provided for fleet operation are constant and equal for both fleets, which can be pictured as a competition between two fleets on an isolated island. One of them is a conventional fleet; the other is a modular fleet. Initially, ten of each type of vehicle and component are provided to both fleets. Demands occur randomly at the battlefield according to a Poisson process with a time interval of 10 hours. Demands include personnel capacity, material capacity and fire power, which are generated from Gaussian distributions as shown in Eqs. 46, 47 and 48. Because of the lack of diversity in existing designs of modular vehicles, we borrow five types of modular vehicles and the six resulting modules from (Li and Epureanu, 2017b). The attributes carried by each vehicle are summarized in Tab. 2.
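The demand generation can be sketched as follows. The sketch assumes that the 10-hour interval is the mean time between arrivals, that attribute values are drawn independently, and that negative draws are clipped to zero; the Gaussian means and standard deviations are placeholders, since the actual parameters are not reproduced here. The function name `generate_demands` is hypothetical.

```python
import random

def generate_demands(horizon_hours, mean_interval, attr_params, rng):
    """Generate (arrival time, attributes) demand records.

    Inter-arrival times are exponential with the given mean, which yields
    Poisson arrivals. Attributes are Gaussian, clipped at zero (an assumption).
    """
    demands = []
    t = rng.expovariate(1.0 / mean_interval)
    while t < horizon_hours:
        attrs = {name: max(0.0, rng.gauss(mu, sigma))
                 for name, (mu, sigma) in attr_params.items()}
        demands.append((t, attrs))
        t += rng.expovariate(1.0 / mean_interval)
    return demands

# Placeholder (mean, standard deviation) pairs, not the values used in the study.
attr_params = {
    "personnel": (20.0, 5.0),
    "material": (30.0, 8.0),
    "firepower": (10.0, 3.0),
}
rng = random.Random(42)
demands = generate_demands(240.0, 10.0, attr_params, rng)
print(len(demands), demands[0])
```

Drawing the arrival times first and the attributes second keeps the two sources of randomness separate, so either can be replaced without touching the other.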
The costs of insufficiency and redundancy are set using heuristic rules. For example, a convoy usually runs a high risk of failure once the attributes of the dispatched convoy are less than ordered. Thus, the cost of attribute insufficiency is set much higher than that of attribute redundancy, i.e., . The costs of operation actions are set based on their difficulty and the time required.
We assign the times required for module assembly and disassembly as constant vectors. Vehicle assembly/disassembly time is calculated by summing the times required for its components. Similarly, for repair and reconfiguration, we first sum all the actions required to process each individual component in the vehicle. We assume that the interfaces between components are well designed to allow quick vehicle reconfiguration, with assembly and disassembly times for all types of components of 1 hour and 0.5 hour respectively. We assume that on-base ADR actions are carried out in generalized work stations; thus, the number of stations determines the available capacity. In this study, the number of available work stations for each fleet is set to 10.
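The time accounting can be illustrated with the stated per-component times of 1 hour for assembly and 0.5 hour for disassembly. In this sketch, reconfiguration is assumed to consist of removing the modules that the target vehicle does not need and adding the ones it lacks; that rule, the function names and the module labels are assumptions made for illustration.

```python
from collections import Counter

ASSEMBLY_HOURS = 1.0      # per component, as stated in the text
DISASSEMBLY_HOURS = 0.5   # per component, as stated in the text

def assembly_time(components):
    """Total time to assemble a vehicle from its components."""
    return ASSEMBLY_HOURS * len(components)

def disassembly_time(components):
    """Total time to take a vehicle apart into its components."""
    return DISASSEMBLY_HOURS * len(components)

def reconfiguration_time(current, target):
    """Time to turn one vehicle into another by swapping modules (assumed rule)."""
    cur, tgt = Counter(current), Counter(target)
    removed = sum((cur - tgt).values())  # modules the target does not need
    added = sum((tgt - cur).values())    # modules the current vehicle lacks
    return DISASSEMBLY_HOURS * removed + ASSEMBLY_HOURS * added

cargo_truck = ["chassis", "engine", "cargo"]
gun_truck = ["chassis", "engine", "gun", "armor"]
print(assembly_time(gun_truck))                     # 4.0
print(disassembly_time(cargo_truck))                # 1.5
print(reconfiguration_time(cargo_truck, gun_truck)) # 2.5
```

In this example, reconfiguring the cargo vehicle directly (2.5 hours) is faster than disassembling it fully and assembling the target from scratch (1.5 + 4.0 hours), which illustrates why shared modules make reconfiguration attractive.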
We propose a discrete event model to simulate fleet competition for three years. We separate the mission into two parts: the stochastic stage (1st year) and the learning stage (2nd and 3rd years). In the stochastic stage, the dispatch agent randomly picks a dispatch strategy from Tab. 1 and passes this decision to the base agent. First-year operations generate time-series data, including combat history, feasibility records, etc., which are important inputs for the learning model. Training of the learning model starts at the beginning of the learning stage, where the inference agent and the dispatch agent make decisions based on the enemy’s historical behavior. Learning models are also updated monthly to ensure they reflect the enemy’s latest behavior.
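The two-stage schedule can be made concrete with a small helper. This is a minimal sketch, assuming a 365-day year and twelve equal-length months; the function names are illustrative and not taken from the paper's implementation.

```python
HOURS_PER_YEAR = 365 * 24
HOURS_PER_MONTH = HOURS_PER_YEAR / 12  # assumption: equal-length months


def stage_at(time_h):
    """Return the mission stage for a simulation time given in hours."""
    # The first year is the stochastic stage; everything after is learning.
    return "stochastic" if time_h < HOURS_PER_YEAR else "learning"


def retraining_times(total_years=3):
    """Times (in hours) at which the learning models are (re)trained.

    Training starts at the beginning of the learning stage and is then
    repeated monthly until the end of the mission.
    """
    start = HOURS_PER_YEAR
    end = total_years * HOURS_PER_YEAR
    n_months = int((end - start) / HOURS_PER_MONTH)
    return [start + k * HOURS_PER_MONTH for k in range(n_months)]
```

In a discrete event simulator, the values returned by `retraining_times()` would simply be scheduled as events alongside demand arrivals and vehicle returns.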
6 Fleet comparison
In this section, we compare the performance of the modular fleet and the conventional fleet in the different stages. As one of the important metrics of fleet performance, we first compare the probability of winning based on the results from multiple simulations, which is shown in Fig. 7. According to the plot, the conventional fleet outperforms the modular fleet in the stochastic stage. However, once agent intelligence is introduced, i.e., once both fleets enter the learning stage, the modular fleet gradually wins more games. A clear separation can also be observed throughout the learning stage, which indicates a solid lead for the modular fleet. To explain these results, we first compare the attributes carried by the actually dispatched convoys of the two fleets, and then compare the estimation accuracy in the inference of the enemy, order success and feasibility.
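One simple way to estimate a monthly probability of winning from multiple simulation runs is to pool the event outcomes by month. The sketch below assumes each combat event is logged as a hypothetical `(month, won)` pair; how the original study aggregates its runs is not specified in this section.

```python
def win_probability(outcomes):
    """Estimate the per-month win probability.

    `outcomes` is an iterable of (month, won) pairs pooled across runs,
    where `won` is True if the fleet of interest won that event.
    """
    wins, totals = {}, {}
    for month, won in outcomes:
        totals[month] = totals.get(month, 0) + 1
        wins[month] = wins.get(month, 0) + (1 if won else 0)
    # Fraction of events won in each month, ordered by month.
    return {month: wins[month] / totals[month] for month in sorted(totals)}
```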
During the stochastic stage, the dispatch agents of both teams place dispatch orders based on randomly selected strategies. The strategy selection and order fulfillment procedures are also identical for both fleets. To explain the better performance of the conventional fleet, it is necessary to examine how accurately each fleet satisfies its dispatch orders. Mismatched attributes change the fleet performance dramatically: a convoy with insufficient attributes may significantly raise the failure rate and damage; a convoy with redundant attributes may slightly increase the win rate, but it can also contribute to insufficiency in the near future because resources are limited. Thus, we calculate the amounts of overused and insufficient convoy attributes in every month and compare the dispatch accuracy in Fig. 8.
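The monthly mismatch accounting can be sketched as follows. This is an illustration only: the record layout `(month, ordered, dispatched)` and the attribute names are assumptions, and the totals are plain sums over attributes without the cost weighting discussed earlier.

```python
def attribute_mismatch(ordered, dispatched):
    """Split the order/dispatch difference into insufficiency and redundancy."""
    insufficiency = {}
    redundancy = {}
    for name, wanted in ordered.items():
        sent = dispatched.get(name, 0.0)
        insufficiency[name] = max(0.0, wanted - sent)  # ordered but not delivered
        redundancy[name] = max(0.0, sent - wanted)     # delivered beyond the order
    return insufficiency, redundancy


def monthly_totals(records):
    """Accumulate insufficient and redundant attributes per month.

    `records` is an iterable of (month, ordered, dispatched) tuples.
    """
    totals = {}
    for month, ordered, dispatched in records:
        ins, red = attribute_mismatch(ordered, dispatched)
        entry = totals.setdefault(month, {"insufficient": 0.0, "redundant": 0.0})
        entry["insufficient"] += sum(ins.values())
        entry["redundant"] += sum(red.values())
    return totals
```

Keeping the two quantities separate matters here because they carry very different costs: a shortfall raises the risk of failure, while a surplus mainly depletes resources for later orders.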
Compared to the modular fleet, the conventional fleet suffers from remarkable redundancy. The higher redundancy comes from the rigidity of conventional fleet operation. The modular fleet can reconfigure itself in real time to fit the dispatch order, whereas the conventional fleet can only wait for vehicles to return from the field or from recovery. This limitation hampers the ability of the conventional fleet to satisfy the dispatch order. Once proper vehicles are scarce, the conventional fleet has to use improper vehicles with few of the desired attributes to avoid insufficiency. This rigidity in fleet operation is beneficial to the success rate during the stochastic stage, because the opponent is not aware of the unexpected additional attributes. However, once the opponent starts to study this behavior, the advantage no longer exists. From its failures in the stochastic stage, the modular fleet realizes that the conventional fleet tends to dispatch convoys with superfluous attributes. In response, it increases the attributes of its dispatch orders correspondingly. The redundant attributes are then no longer enough against a well-armed enemy convoy to keep the conventional fleet in the lead.
Besides gaining a better understanding of the enemy’s behavior, the intelligent agents also improve their understanding of the game over time. To capture these changes, we define the ability of a convoy as the maximum attributes that can be achieved in each month, and compare the ability of both fleets in Fig. 9. After entering the learning stage, both fleets raise the convoy ability in all types of attributes, especially in firepower. Once the modular fleet learns from its model that firepower is important for winning an event, combat vehicles are rapidly formed through reconfiguration to boost the available firepower.
The swift reconfiguration of the modular fleet leads to a dramatic increase in damage to the enemy in the first few months of the learning stage, as shown in Fig. 10. Although the conventional fleet tries to increase its firepower to fight back, the limitation of its vehicle structure results in a lower upper limit on convoy ability. This difference in ability makes the conventional fleet suffer higher damage from more dispatches, which forces it to operate in sub-healthy conditions for a long time.
The strategies used in the learning stage also differ between the two fleets. Fig. 11 compares the proportions of strategies adopted by each fleet. After learning the game, both fleets prefer defense strategies with a large amount of firepower and a fair amount of capacity, i.e., strategies 8 and 9. Because of the flexibility of its fleet structure, the modular fleet adapts easily to vehicle damage and to the enemy’s behavior, which leads to a better balance between different types of vehicles and allows it to perform a stronger strategy. The defense strategy selection also affects the attack strategy. Compared to the modular fleet, the conventional fleet is much more likely to give up a mission because of insufficient resources. This weakness makes the modular fleet confident in dispatching few or even no combat vehicles to win the game. As evidence, the proportion of strategy 1 used by the modular fleet is much higher than that used by the conventional fleet. Meanwhile, once a strong enemy is sensed, the modular fleet performs aggressive strategies, i.e., strategies 8, 9 and 10, more often than the conventional fleet.
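The strategy proportions compared in Fig. 11 amount to normalized counts of the strategy indices chosen during the learning stage. A minimal sketch, assuming the choices are logged as a list of integer strategy indices from Tab. 1:

```python
from collections import Counter


def strategy_proportions(history):
    """Return the fraction of dispatches that used each strategy index."""
    counts = Counter(history)
    total = sum(counts.values())
    # An empty history yields an empty result (no division takes place).
    return {strategy: n / total for strategy, n in sorted(counts.items())}
```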
To further investigate the improved performance of the modular fleet, we also compare the inference accuracy of the two fleets. We use the mean square error (MSE) between the forecasted and actual convoy attributes as the metric of inference accuracy. As can be seen from the comparison in Fig. 12, inference errors are high at the beginning of the learning stage, because the agents are trained on data from stochastic dispatch, which contributes little to forecasting the behavior of a rational competitor. As learning proceeds, more combat and operation records are generated by the trained ABM, in which the enemy’s behavior is more explainable. As a result, inference errors are significantly reduced in the following four months. However, the inference error keeps fluctuating during the rest of the learning stage, because both fleets keep checking and countering each other’s behavior.
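For reference, the MSE metric used here can be computed as below. The sketch assumes the forecasted and actual convoy attributes are given as equal-length sequences of numbers; whether the original study averages per attribute or over all attributes jointly is not stated in this section.

```python
def inference_mse(forecast, actual):
    """Mean square error between forecasted and actual convoy attributes."""
    if len(forecast) != len(actual) or len(forecast) == 0:
        raise ValueError("forecast and actual must be non-empty and equal in length")
    squared_errors = [(f - a) ** 2 for f, a in zip(forecast, actual)]
    return sum(squared_errors) / len(squared_errors)
```

For instance, a forecast of `[10.0, 20.0, 5.0]` against an actual convoy of `[12.0, 17.0, 5.0]` gives squared errors of 4, 9 and 0, and therefore an MSE of 13/3.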
The results also show that it is easier to infer the strategy of the conventional fleet than that of the modular fleet, especially in the firepower attribute. This originates from the greater freedom in decision making after fleet modularization. As a defender, a fleet usually needs to prepare a convoy with all types of attributes to satisfy the demands. With limited vehicle stocks, the decision maker of the conventional fleet has a constrained choice of strategies. The modular fleet, however, can vary its dispatch strategy through real-time vehicle reconfiguration, e.g., reconfiguring cargo vehicles into combat vehicles to switch from Strategy 4 to Strategy 8.
However, modularity also carries a significant burden, namely a higher capacity requirement. According to Fig. 13, the modular fleet always requires more machines than the conventional fleet because of the additional ADR actions. It can also be observed that machine requirements increase significantly once the learning stage begins, which results from the damage caused by smarter enemy strikes. The higher losses in the conventional fleet also shrink the difference in machine usage in the learning stage. In this study, we only test the fleet performance at a fixed capacity; studies investigating the influence of capacity can be found in (Li and Epureanu, 2017b, 2018).
In this paper, we investigate the benefits and burdens of fleet modularization by simulating an attacker-defender game between a modular fleet and a conventional fleet. We simulate the fleet competition for three years, which are divided into a stochastic stage and a learning stage. By contrasting the simulation results of the two fleets, we find that the conventional fleet stays in the lead when both fleets select strategies stochastically, while the modular fleet outperforms the conventional fleet once the intelligence of the decision maker is considered. With the additional operational flexibility from on-base ADR actions, the modular fleet exhibits better adaptability in reacting to the enemy’s actions, a higher upper limit in convoy formation, and greater unpredictability.
This research has been supported by the Automotive Research Center, a US Army Center of Excellence in Modeling and Simulation of Ground Vehicle Systems, headquartered at the University of Michigan. This support is gratefully acknowledged. The authors are solely responsible for opinions contained herein.
- Adhitya et al. (2007) Adhitya, A., Srinivasan, R., and Karimi, I. A. (2007). A model-based rescheduling framework for managing abnormal supply chain events. Computers & Chemical Engineering, 31(5):496–518.
- Atsalakis et al. (2018) Atsalakis, G. S., Atsalaki, I. G., and Zopounidis, C. (2018). Forecasting the success of a new tourism service by a neuro-fuzzy technique. European Journal of Operational Research, 268(2):716–727.
- Azaiez and Bier (2007) Azaiez, M. N. and Bier, V. M. (2007). Optimal resource allocation for security in reliability systems. European Journal of Operational Research, 181(2):773–786.
- Bayrak et al. (2016) Bayrak, A., Egilmez, M., Kaung, H., Xingyu, L., Park, J., Umpfenbach, E., Anderson, E., Gorsich, D., Hu, J., Papalambros, P., et al. (2016). A system of systems approach to the strategic feasibility of modular vehicle fleets. IEEE Transactions on Systems Man and Cybernetics.
- Chen and Barnes (2013) Chen, J. Y. and Barnes, M. J. (2013). Human-agent teaming for multi-robot control: A literature review. Technical report, Army Research Lab Aberdeen Proving Ground Md.
- Chen et al. (2015) Chen, K., Zhou, Y., and Dai, F. (2015). A lstm-based method for stock returns prediction: A case study of china stock market. In Big Data (Big Data), 2015 IEEE International Conference on, pages 2823–2824. IEEE.
- Dasch and Gorsich (2016) Dasch, J. M. and Gorsich, D. J. (2016). Survey of modular military vehicles: benefits and burdens. Technical report, Army Tank Automotive Research, Development and Engineering Center (TARDEC) Warren United States.
- Di Persio and Honchar (2016) Di Persio, L. and Honchar, O. (2016). Artificial neural networks approach to the forecast of stock market price movements. International Journal of Economics and Management Systems, 1:158–162.
- Dresher (1961) Dresher, M. (1961). Games of strategy: theory and applications. Technical report, Rand Corp Santa Monica CA.
- Evers et al. (2014) Evers, L., Barros, A. I., Monsuur, H., and Wagelmans, A. (2014). Online stochastic uav mission planning with time windows and time-sensitive targets. European Journal of Operational Research, 238(1):348–362.
- Fischer and Krauss (2017) Fischer, T. and Krauss, C. (2017). Deep learning with long short-term memory networks for financial market predictions. European Journal of Operational Research.
- Hagan et al. (1996) Hagan, M. T., Demuth, H. B., Beale, M. H., et al. (1996). Neural network design, volume 20. Pws Pub. Boston.
- Hausken and Levitin (2009) Hausken, K. and Levitin, G. (2009). Minmax defense strategy for complex multi-state systems. Reliability Engineering & System Safety, 94(2):577–587.
- Hausken and Zhuang (2011) Hausken, K. and Zhuang, J. (2011). Defending against a terrorist who accumulates resources. Military Operations Research, pages 21–39.
- Hecht-Nielsen (1992) Hecht-Nielsen, R. (1992). Theory of the backpropagation neural network. In Neural networks for perception, pages 65–93. Elsevier.
- Jose and Zhuang (2013) Jose, V. R. R. and Zhuang, J. (2013). Technology adoption, accumulation, and competition in multi-period attacker-defender games. Military Operations Research, 18(2):33–47.
- Kardes and Hall (2005) Kardes, E. and Hall, R. (2005). Survey of literature on strategic decision making in the presence of adversaries. Technical report, CREATE Homeland Security Center.
- Lam and Leung (2006) Lam, K.-m. and Leung, H.-f. (2006). Formalizing risk strategies and risk strategy equilibrium in agent interactions modeled as infinitely repeated games. In Pacific Rim International Workshop on Multi-Agents, pages 138–149. Springer.
- Landa (1991) Landa, M. d. (1991). War in the age of intelligent machines. Zone Books.
- Li and Epureanu (2017a) Li, X. and Epureanu, B. I. (2017a). Intelligent agent-based dynamic scheduling for military modular vehicle fleets. In IIE Annual Conference. Proceedings, pages 404–409. Institute of Industrial and Systems Engineers (IISE).
- Li and Epureanu (2017b) Li, X. and Epureanu, B. I. (2017b). Robustness and adaptability analysis of future military modular fleet operation system. In ASME 2017 Dynamic Systems and Control Conference, pages V002T05A003–V002T05A003. American Society of Mechanical Engineers.
- Li and Epureanu (2018) Li, X. and Epureanu, B. I. (2018). An agent-based approach for optimizing modular vehicle fleet operation. Review of International Journal of Production Economics.
- Lynch (2015) Lynch, J. (2015). On Strategic Unpredictability. https://mwi.usma.edu/2015223on-strategic-unpredictability. [Online; accessed 2 March 2015].
- Ma et al. (2015) Ma, X., Tao, Z., Wang, Y., Yu, H., and Wang, Y. (2015). Long short-term memory neural network for traffic speed prediction using remote microwave sensor data. Transportation Research Part C: Emerging Technologies, 54:187–197.
- Mikolov et al. (2010) Mikolov, T., Karafiát, M., Burget, L., Černockỳ, J., and Khudanpur, S. (2010). Recurrent neural network based language model. In Eleventh Annual Conference of the International Speech Communication Association.
- Onggo and Karatas (2016) Onggo, B. S. and Karatas, M. (2016). Test-driven simulation modelling: A case study using agent-based maritime search-operation simulation. European Journal of Operational Research, 254(2):517–531.
- Paulson et al. (2016) Paulson, E. C., Linkov, I., and Keisler, J. M. (2016). A game theoretic model for resource allocation among countermeasures with multiple attributes. European Journal of Operational Research, 252(2):610–622.
- Powell (2007) Powell, R. (2007). Defending against terrorist attacks with limited resources. American Political Science Review, 101(3):527–541.
- Rezaee et al. (2018) Rezaee, M. J., Jozmaleki, M., and Valipour, M. (2018). Integrating dynamic fuzzy c-means, data envelopment analysis and artificial neural network to online prediction performance of companies in stock exchange. Physica A: Statistical Mechanics and its Applications, 489:78–93.
- Roos and Nau (2010) Roos, P. and Nau, D. (2010). Risk preference and sequential choice in evolutionary games. Advances in Complex Systems, 13(04):559–578.
- Shinkman (2014) Shinkman, P. D. (2014). Trashed: US Gear in Afghanistan to be Sold. https://www.usnews.com/news/articles/2014/06/04/us-military-equipment-in-afghanistan-to-be-sold-scrapped. [Online; accessed 4 April 2014].
- Wang and Bier (2011) Wang, C. and Bier, V. M. (2011). Target-hardening decisions based on uncertain multiattribute terrorist utility. Decision Analysis, 8(4):286–302.
- Xu et al. (2016) Xu, J., Zhuang, J., and Liu, Z. (2016). Modeling and mitigating the effects of supply chain disruption in a defender–attacker game. Annals of Operations Research, 236(1):255–270.
- Yu et al. (2009) Yu, L., Wang, S., and Lai, K. K. (2009). An intelligent-agent-based fuzzy group decision making model for financial multicriteria decision support: The case of credit scoring. European journal of operational research, 195(3):942–959.
- Zhang et al. (2018) Zhang, J., Zhuang, J., and Jose, V. R. R. (2018). The role of risk preferences in a multi-target defender-attacker resource allocation game. Reliability Engineering & System Safety, 169:95–104.
- Zhao et al. (2017) Zhao, Z., Chen, W., Wu, X., Chen, P. C., and Liu, J. (2017). Lstm network: a deep learning approach for short-term traffic forecast. IET Intelligent Transport Systems, 11(2):68–75.
- Zhuang et al. (2010) Zhuang, J., Bier, V. M., and Alagoz, O. (2010). Modeling secrecy and deception in a multiple-period attacker–defender signaling game. European Journal of Operational Research, 203(2):409–418.