Online multiplayer games are a tremendously popular part of the video game industry which now has revenues in excess of other entertainment industries such as film and sports (Wit20). These games typically pit either individual players or teams of players against each other in an adversarial game scenario (player versus player or PVP for short) with particular goals, such as capturing a flag or a match to the death.
In particular, we will focus on the online game Battlefield 3 produced by Electronic Arts. In Battlefield 3, online players are segregated into two teams by a queuing algorithm and then both teams are placed on a military battlefield with a designated goal determined by the game type, such as capture the flag or deathmatch. Players must coordinate with their teammates towards their goal while being opposed by players on the other team. Individual player success is measured by several outcomes, including number of kills, number of deaths and a game score that is influenced by the type of game.
In games such as Battlefield 3, the complexity of the battle scenarios allows for many different paths to victory and so a primary interest is isolating common play styles and strategies employed by participants. Identifying subgroups of the player population that employ similar play styles can help game designers to tailor tutorials for newer players in a style-specific way as well as aid queuing procedures to construct teams of players with complementary styles.
From a statistical perspective, the clustering of players into groups with similar play styles can serve the additional role of dimension reduction which is an important factor given the size of most game play datasets. As we will see in Section 2, our Battlefield 3 application consists of the results from half a million PVP matches involving over a thousand players that have been designated by the company as “focal” players. The amount of data available for all players of Battlefield 3 or similar games is much larger.
Clustering has been recognized previously as an important tool for understanding player preferences and their interactions with the game (GameAnalytics2013). Drachen12GunsSwords used clustering to identify player preferences for using vehicles over direct combat in Tera and Battlefield 2: Bad Company 2. Drachen09SOMTombRaider and Sifa13TombRaiderArchtype modeled styles of game play and how they evolve in Tomb Raider: Underworld.
Thurau11WOWLevels clustered players by how their character level changed over time and Thurau10Guilds employed clustering to understand how guilds evolve over time in World of Warcraft. Holmgard13SuperMario used a hierarchical clustering method to group players based on how they differed from a “perfect” automated player in Super Mario Brothers. Nogueira14FuzzyAffective used clustering for modeling how player emotions related to game events in the survival horror game Vanish. Bauckhage14BeyongHeatmaps identified hotspots of player activities based on clusters of player trajectories in Quake III. Tychsen08Personas defined design-based clusterings of players called personas in Hitman: Blood Money.
The most common approach among these previous studies is the clustering of players directly on outcomes of matches, such as kills, deaths or the match score. In contrast, we are more interested in discovering different player styles rather than just differentiating players based on their overall ability. We define play styles as the choices that players make in terms of the roles, game type and map type that they prefer, and how they perform under those choices. Thus we will focus on clustering players based on how their choices affect the Battlefield 3 game outcome, rather than clustering on game outcomes directly.
Our approach to play styles is based upon a regression model with game score as a function of the roles, game type, and map chosen by each player in each match. We also include the rank of each player in this regression model to account for differences in player ability. We estimate both global and player-specific coefficients on each of these covariates. The player-specific coefficients can also be interpreted as measures of how well that player performs relative to the global performance across all players under specific role/game/map choices.
The set of player-specific coefficients is what defines a player’s style: how each role/game/map choice by the player relates to their team’s performance in the match. We then discover common play styles across players by employing a semi-parametric Bayesian clustering approach based on a Dirichlet process, which allows us to discover groups of players that have similar coefficients. The Dirichlet process (Ferguson74PriorDistributions), as reviewed in Muller04NonparametricBayesian, is the basis for many model-based clustering approaches (Griffin06OrderBased; Teh06DP).
Relative to k-means, our choice of a model-based Bayesian clustering procedure has the important advantage that the number of clusters, i.e. unique player styles, does not have to be pre-specified. We employ k-means clustering to provide a good initialization for our model estimation, but the number of clusters can grow (or shrink) as the algorithm proceeds based on whether extra (or fewer) clusters are needed to provide the best explanation of the observed data.
Our semi-parametric Bayesian clustering approach is also related to the DP-means procedure of Kulis12DPMeans. However, our focus is on clustering player-specific regression coefficients (that define latent player styles) rather than Gaussian means. More importantly, our Markov chain Monte Carlo implementation does not produce the “hard clustering” that we would get from DP-means. Rather, players are able to switch between clusters (play styles) in our approach.
Our “soft clustering” strategy that allows players to move between clusters accommodates two types of potential variability exhibited by players in Battlefield 3. First, certain players may have a play style that is a true hybrid between two (or more) common play styles shared by many players. Second, certain players can transition to new play styles over time as they continue to play the game and so they should be allowed to move between play style clusters as we observe more matches for that player.
Section 3 presents our regression model and our Dirichlet process clustering and estimation procedure. In Section 4, we apply our model to the large Battlefield 3 dataset described in Section 2. We first analyze a subset of our data with the static version of our model and then apply the adaptive version of our model to the larger complete dataset of Battlefield 3 matches. We conclude with a discussion in Section 5.
2 Battlefield 3 Data
Data from Battlefield 3 was provided by Electronic Arts through the Wharton Customer Analytics Initiative. Battlefield 3 is a first-person, military-themed online game that allows players to use a variety of weapons and vehicles in diverse environments across the globe, ranging from tight urban landscapes to open desert. Our data consists of 515,605 player-versus-player match logs taken from 1221 players.
In each match, online players are segregated into teams that are placed on a chosen map with a particular goal determined by the game type. There are 9 possible roles (namely assault, support, recon and engineer, along with the vehicle roles of armored land, unarmored land, helicopter, boat, and jet), 17 possible game types (e.g. conquest, rush, and death match) and 30 possible maps (e.g. Grand Bazaar, Noshahr Canals, or Operation Metro). For each match, our data contains information about each player’s chosen roles, map, and game type as well as each player’s rank (a measure of their progression).
Each player may choose more than one role in a match and they might join a match late or quit a match early. We only consider player matches for which the player played more than 5 minutes and accumulated more than 100 points. Shorter matches with fewer than 100 points tend to be the matches where a player quit early or joined late and thus did not have much time to exhibit their play style in the game.
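The inclusion rule above can be sketched as a simple filter. The `MatchLog` record and its field names are hypothetical, since the raw log format is not described here; we also read "more than" as strict inequalities on both thresholds.

```python
from dataclasses import dataclass


@dataclass
class MatchLog:
    # Hypothetical record layout; the real match logs may use different fields.
    player_id: int
    minutes_played: float
    total_score: int


def keep_match(m: MatchLog, min_minutes: float = 5.0, min_score: int = 100) -> bool:
    """Keep a match only if the player played > 5 minutes AND scored > 100 points."""
    return m.minutes_played > min_minutes and m.total_score > min_score


def filter_matches(logs):
    return [m for m in logs if keep_match(m)]


logs = [
    MatchLog(1, 12.0, 2500),  # kept
    MatchLog(1, 3.5, 900),    # dropped: likely joined late or quit early
    MatchLog(2, 20.0, 80),    # dropped: too few points to show a play style
    MatchLog(2, 5.0, 400),    # dropped: not strictly more than 5 minutes
]
kept = filter_matches(logs)
print([(m.player_id, m.total_score) for m in kept])  # [(1, 2500)]
```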
There are several different match outcomes that we can consider: total match score, score just from combat, number of kills and number of deaths. In our analysis, we focus on total score as an outcome variable since it includes points due to both combat and objectives and thus is a better indicator of how players performed overall in the match, regardless of whether their team won or lost.
3 Model and Estimation
Our approach begins with a linear regression model on each player’s total score as a function of the character rank, roles, game type, and map. The set of player-specific coefficients from this regression defines a player’s play style: how each role and game and map choice relates to their performance. We then employ a Bayesian nonparametric clustering approach to find subsets of players that all share similar play styles.
3.1 Regression Model for Game Play Data
The first component of our method is a regression model of the total score for player $i$ in each match $j$ as a function of that player’s rank as well as their chosen roles, game type, and map.
As mentioned in Section 2, we focus on the total score, $S_{ij}$, as the outcome for player $i$ in match $j$ since it combines performance from both combat and objectives, so it should be a more comprehensive measure of performance than player kills or deaths. We log-transform the score, $y_{ij} = \log S_{ij}$, so that our residual errors more closely match the assumption of being normally distributed.
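Let us first consider the following linear model,
$$ y_{ij} = \beta_0 + \beta_1\,\mathrm{rank}_{ij} + \sum_{r=1}^{9} \gamma_r R_{ijr} + \sum_{g=1}^{17} \delta_g G_{ijg} + \sum_{m=1}^{30} \lambda_m M_{ijm} + \varepsilon_{ij}, \tag{3.1} $$
where $\varepsilon_{ij} \sim N(0, \sigma^2)$. Each player’s $\mathrm{rank}_{ij}$ is a non-negative integer, ranging from 0 to 145, that measures the player’s progression in previous games up to that point. We include this covariate since players with higher ranks typically achieve higher scores due to their greater experience with game play.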
The other covariates $R_{ijr}$, $G_{ijg}$ and $M_{ijm}$ are indicator variables for the roles, game type, and map that are used by player $i$ in match $j$. For example, if player $i$ used a helicopter in match $j$, the corresponding role indicator variable would be 1; otherwise, it is zero. The coefficients on these indicators represent how players perform on average in particular roles/games/maps beyond what would be expected just based on their player rank.
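As a concrete illustration of the covariates in model (3.1), the sketch below builds one covariate vector $\mathbf{x}_{ij}$ from a rank, a set of roles, a game type and a map. The 0-based category codes and the column ordering (intercept, rank, roles, game types, maps) are our own hypothetical encoding; only the category counts come from Section 2.

```python
import math

N_ROLES, N_GAMES, N_MAPS = 9, 17, 30
P = 2 + N_ROLES + N_GAMES + N_MAPS  # intercept + rank + 56 indicators = 58


def covariate_vector(rank, roles, game, map_idx):
    """Build x_ij = (1, rank_ij, role indicators, game-type indicators, map indicators).

    `roles` is a collection because a player may use several roles in one match;
    `game` and `map_idx` are 0-based category codes (hypothetical encoding).
    """
    x = [0.0] * P
    x[0] = 1.0                                  # intercept
    x[1] = float(rank)                          # rank in 0..145
    for r in roles:
        x[2 + r] = 1.0                          # R_ijr
    x[2 + N_ROLES + game] = 1.0                 # G_ijg
    x[2 + N_ROLES + N_GAMES + map_idx] = 1.0    # M_ijm
    return x


def log_score(total_score):
    """Response y_ij = log S_ij."""
    return math.log(total_score)


def predict_global(beta, x):
    """Linear predictor of model (3.1): sum over k of beta_k * x_k."""
    return sum(b * xi for b, xi in zip(beta, x))


x = covariate_vector(rank=45, roles=[0, 6], game=2, map_idx=11)
print(P, sum(x[2:]))  # 58 4.0
```

Two roles, one game type and one map give four active indicators in this example.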
Note that the coefficients in (3.1) are not indexed by player and so represent the global effects of each covariate estimated over all players in the data. These global coefficients provide insight into the relative importance of rank, roles, game types and maps on the performance of all players. However, model (3.1) is insufficient for our goal of estimating differences between players in terms of play styles.
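We build player heterogeneity into our model by adding player-specific coefficients,
$$ y_{ij} = (\beta_0 + \beta_{0i}) + (\beta_1 + \beta_{1i})\,\mathrm{rank}_{ij} + \sum_{r=1}^{9} (\gamma_r + \gamma_{ir}) R_{ijr} + \sum_{g=1}^{17} (\delta_g + \delta_{ig}) G_{ijg} + \sum_{m=1}^{30} (\lambda_m + \lambda_{im}) M_{ijm} + \varepsilon_{ij}. \tag{3.2} $$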
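The coefficients $\beta_0$ and $\beta_1$ capture the average performance across all players and how, on average, player performance increases as their Battlefield 3 rank increases. The player-specific coefficients $\beta_{0i}$ and $\beta_{1i}$ capture how the overall performance of player $i$ differs from the average player and how player $i$ differs from average in their change in performance as their Battlefield 3 rank increases. The remaining player-specific coefficients $\gamma_{ir}$, $\delta_{ig}$ and $\lambda_{im}$ represent how the performance of player $i$ differs from the average performance of players in the specific role, map, and game types encountered in each of their matches. We denote with the vector $\boldsymbol\theta_i = (\beta_{0i}, \beta_{1i}, \gamma_{i1}, \dots, \gamma_{i9}, \delta_{i1}, \dots, \delta_{i17}, \lambda_{i1}, \dots, \lambda_{i30})$ all the player-specific coefficients for player $i$. Stacking the covariates of match $j$ into $\mathbf{x}_{ij}$ and the global coefficients into $\boldsymbol\beta$, model (3.2) can be written compactly as $y_{ij} = \mathbf{x}_{ij}^\top(\boldsymbol\beta + \boldsymbol\theta_i) + \varepsilon_{ij}$.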
A player’s performance is also affected by the play of their opponents, and so ideally we would incorporate opponent choices into our model. Unfortunately our available data only consists of the choices/performance of our set of 1221 focal players, not their match-specific opponents.
3.2 Dirichlet Process Prior for Different Play Styles
The player-specific coefficients $\boldsymbol\theta_i$ from our regression model (3.2) are what we define as the play style for each player: how that player’s rank and in-match role/game type/map choices affect their performance. Without any further modeling, we would be estimating a unique $\boldsymbol\theta_i$ (i.e. unique play style) for each player in our Battlefield 3 data.
However, we may not have that many games available for each player, and we risk over-fitting our match data with so many parameters in our model. A general solution to over-parameterized models is shrinkage, which is encouraged by employing a common prior for the player-specific parameters $\boldsymbol\theta_i$. We employ a Dirichlet process (DP) specification for our common prior, which encourages clustering of the player-specific parameters $\boldsymbol\theta_i$, allowing us to discover distinct play styles that are shared by groups of players.
In this formulation, we assume that the player-specific coefficients $\boldsymbol\theta_i$ come from a shared but unspecified distribution $G$, and then a Dirichlet process prior is specified for $G$,
$$ \boldsymbol\theta_i \mid G \overset{\text{iid}}{\sim} G, \qquad G \sim \mathrm{DP}(\alpha, G_0), \tag{3.3} $$
where $G_0$ is a prior measure that encapsulates prior beliefs about $G$ and $\alpha$ is a concentration parameter that specifies the strength of those beliefs in $G_0$. We provide details about our specification for prior parameters $G_0$ and $\alpha$ in the next subsection.
Intuitively, a Dirichlet process prior can be viewed as a less parametric alternative to traditional random effects models, such as $\boldsymbol\theta_i \sim N(\boldsymbol\mu, \Sigma)$, where the player-specific $\boldsymbol\theta_i$'s would be shrunk towards a single mean $\boldsymbol\mu$. In contrast, a Dirichlet process prior allows for an unspecified number of player-specific parameter means that are shared by subsets of players. As we will see in the next subsection, the player-specific $\boldsymbol\theta_i$ for each player is clustered together with other highly similar player-specific parameters $\boldsymbol\theta_k$ from other players.
Thus, we will be creating a data-driven grouping of players that exhibit similar play styles. This approach also allows for a subset of players to be left ungrouped from the rest of the population (i.e. players that show a unique play style compared to all other players).
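One way to see why a Dirichlet process prior produces shared play styles is the Blackwell-MacQueen (Pólya urn) representation of draws from $\mathrm{DP}(\alpha, G_0)$ with $G$ integrated out: the $i$-th player draws a fresh value from $G_0$ with probability $\alpha/(\alpha + i - 1)$ and otherwise copies the value of a uniformly chosen earlier player, which favors joining large clusters. The sketch below simulates this; the standard normal base measure, $\alpha = 1$ and the function name `polya_urn` are illustrative choices, not settings from our analysis.

```python
import numpy as np


def polya_urn(n, alpha, base_sampler, rng):
    """Sequentially draw n values from the DP(alpha, G0) marginal.

    Returns integer cluster labels and the list of unique values (atoms).
    Player i (0-based) opens a new cluster with probability alpha / (alpha + i);
    otherwise it copies a uniformly chosen earlier player's label, so an existing
    cluster is joined with probability proportional to its current size.
    """
    labels, atoms = [], []
    for i in range(n):
        if rng.random() < alpha / (alpha + i):
            atoms.append(base_sampler(rng))      # new play style drawn from G0
            labels.append(len(atoms) - 1)
        else:
            labels.append(labels[rng.integers(i)])
    return np.array(labels), atoms


rng = np.random.default_rng(1)
labels, atoms = polya_urn(1221, alpha=1.0, base_sampler=lambda r: r.normal(size=58), rng=rng)
sizes = np.bincount(labels)
print(len(atoms), sorted(sizes.tolist(), reverse=True)[:5])
```

Far fewer unique values than players are produced, and a few clusters hold most players; larger $\alpha$ yields more clusters.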
3.3 Model Estimation
We use Markov chain Monte Carlo to sample from the posterior distribution of the Bayesian semi-parametric model outlined in Sections 3.1-3.2. Specifically, we use a Gibbs sampling algorithm (GemGem84) where each parameter value is iteratively sampled based on the current values of the other parameters.
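The primary step of our Gibbs sampling algorithm is the sampling of a new value for each $\boldsymbol\theta_i$, the play style parameters of player $i$, conditional on the current values of the play styles of other players (which we collectively denote as $\boldsymbol\theta_{-i}$) and the residual variance,
$$ p(\boldsymbol\theta_i \mid \boldsymbol\theta_{-i}, \sigma^2, \mathbf{y}) \propto p(\mathbf{y}_i \mid \boldsymbol\theta_i, \sigma^2)\; p(\boldsymbol\theta_i \mid \boldsymbol\theta_{-i}). \tag{3.4} $$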
With the Dirichlet process prior outlined in Section 3.2, the conditional posterior distribution (3.4) is a mixture of the continuous prior measure $G_0$ and point masses located at each of the unique values (i.e. clusters) in the current set of sampled play styles for other players in our data (Liu96).
We now describe this sampling step in more detail, starting from an informed set of initial values for each $\boldsymbol\theta_i$, though the Gibbs sampling algorithm can also be initialized with random starting values. We can use the ordinary least squares estimates $\hat{\boldsymbol\theta}_i$ from the regression model (3.2) as initial estimates for each player’s play style parameters $\boldsymbol\theta_i$. We then cluster these player-specific $\hat{\boldsymbol\theta}_i$'s using K-means clustering (Mac67), where we initially set the number of clusters using the heuristic $K \approx \sqrt{n/2}$, where $n$ is the number of players in our dataset (MultivariateAnalysis80).
The centers of these initial clusters are also used to estimate the mean $\boldsymbol\mu_0$ and covariance $\Sigma_0$ of a multivariate normal distribution over these starting clusters. This multivariate normal distribution, $G_0 = N(\boldsymbol\mu_0, \Sigma_0)$, is used as the prior measure from which we can sample potentially new unique play styles (new cluster centers) during our cluster refinement. We then replace each player-specific $\hat{\boldsymbol\theta}_i$ with its K-means cluster center $\mathbf{c}_k$, so that each set of player-specific parameters initially takes on only one of $K$ unique play style values.
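The initialization above can be sketched as follows, assuming the number of clusters follows the common rule of thumb $K \approx \sqrt{n/2}$ and using a plain Lloyd's-algorithm K-means written out in NumPy. The small ridge added to $\Sigma_0$ is our own assumption so that $G_0$ stays a proper normal distribution when there are fewer centers than coefficients; the function names are hypothetical.

```python
import math
import numpy as np


def choose_k(n):
    """Rule-of-thumb number of initial clusters, K ~ sqrt(n / 2) (assumed heuristic)."""
    return max(1, round(math.sqrt(n / 2)))


def kmeans(Z, K, rng, n_iter=50):
    """Plain Lloyd's algorithm: alternate nearest-center assignment and mean updates."""
    centers = Z[rng.choice(len(Z), size=K, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(K):
            members = Z[labels == k]
            if len(members):                 # leave an empty cluster's center unchanged
                centers[k] = members.mean(axis=0)
    return labels, centers


def initialize(theta_ols, rng, jitter=1e-6):
    """K-means on OLS estimates, then fit G0 = N(mu0, Sigma0) to the cluster centers."""
    K = choose_k(theta_ols.shape[0])
    labels, centers = kmeans(theta_ols, K, rng)
    mu0 = centers.mean(axis=0)
    # With fewer centers than dimensions the sample covariance is singular,
    # so a small ridge keeps Sigma0 positive definite.
    Sigma0 = np.cov(centers, rowvar=False) + jitter * np.eye(centers.shape[1])
    theta = centers[labels]                  # each player starts at its cluster center
    return theta, labels, centers, mu0, Sigma0


rng = np.random.default_rng(2)
theta_ols = np.vstack([rng.normal(loc=c, scale=0.1, size=(60, 4)) for c in (-2.0, 0.0, 2.0)])
theta, labels, centers, mu0, Sigma0 = initialize(theta_ols, rng)
print(choose_k(1221), len(np.unique(theta, axis=0)))  # 25, then at most K distinct starting styles
```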
During each step of our Gibbs sampling algorithm, we revisit the cluster assignment for each player’s play style based on the current set of clusters in the sampled play styles of all other players.
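Specifically, for the sampling of a particular player’s play style parameters $\boldsymbol\theta_i$, we examine the current partition of the play styles of all other players except player $i$. Let’s assume this current partition consists of $C$ clusters, where each cluster $c$ has $n_c^{(-i)}$ cluster members and a cluster mean of $\boldsymbol\theta^*_c$. For each cluster $c = 1, \dots, C$, we calculate the conditional posterior probability of $\boldsymbol\theta_i$ belonging to that cluster,
$$ q_{ic} \propto n_c^{(-i)} \exp\Big\{ -\frac{1}{2\sigma^2} \sum_{j} \big( y_{ij} - \mathbf{x}_{ij}^\top(\boldsymbol\beta + \boldsymbol\theta^*_c) \big)^2 \Big\}, \tag{3.5} $$
where $y_{ij}$ is the log total score of player $i$ in match $j$, $\mathbf{x}_{ij}$ are their rank, role, game type, and map covariates for match $j$ as outlined in equation (3.1), and $\boldsymbol\beta$ collects the global coefficients. The leading term of $n_c^{(-i)}$ in equation (3.5) comes from our assumed Dirichlet process prior, which gives preferential allocation towards larger clusters (WalJenDic10).
So we can see that the conditional posterior probability of $\boldsymbol\theta_i$ belonging to cluster $c$ favors clusters that represent a common play style (i.e. large $n_c^{(-i)}$) or that have a play style that predicts the observed scores for player $i$ well (i.e. small $\sum_j ( y_{ij} - \mathbf{x}_{ij}^\top(\boldsymbol\beta + \boldsymbol\theta^*_c) )^2$).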
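The weights in (3.5) are easiest to handle on the log scale. The sketch below computes the unnormalized $\log q_{ic}$ for every existing cluster at once; the Gaussian constant $(2\pi\sigma^2)^{-J/2}$ is dropped because it is identical for all clusters. The function name and the tiny two-covariate example are hypothetical.

```python
import numpy as np


def log_weights_existing(y_i, X_i, beta, centers, sizes, sigma2):
    """Unnormalized log q_ic from eq. (3.5) for every existing cluster c.

    y_i: (J,) log scores of player i; X_i: (J, p) covariates; centers: (C, p)
    cluster means theta*_c; sizes: (C,) counts n_c^(-i) excluding player i.
    """
    fitted = X_i @ (beta[:, None] + centers.T)      # (J, C) predictions per cluster
    sse = ((y_i[:, None] - fitted) ** 2).sum(axis=0)
    return np.log(sizes) - sse / (2.0 * sigma2)


X_i = np.column_stack([np.ones(5), np.arange(5.0)])
beta = np.array([8.0, 0.0])
centers = np.array([[0.0, 0.1], [1.0, 0.0]])   # two candidate play styles
y_i = X_i @ (beta + centers[0])                # player i fits style 0 exactly
lw = log_weights_existing(y_i, X_i, beta, centers, sizes=np.array([3, 50]), sigma2=0.5)
print(np.argmax(lw))  # 0
```

Here the small cluster wins (log 3 versus log 50 minus 3.3) because its fit to player $i$'s scores outweighs the size advantage of the larger cluster.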
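Through the prior measure $G_0$, our Dirichlet process prior formulation also allows for the creation of a new cluster if player $i$ exhibits a particularly unique play style that is not represented well by the existing clusters. We implement this option in our Gibbs sampling algorithm by also considering a random sample from $G_0$ as a candidate value for $\boldsymbol\theta_i$.
Specifically, we sample a new unique set of play style parameters $\boldsymbol\theta^*_{\mathrm{new}}$ from the distribution that we specified above as our prior measure $G_0$. Then the conditional posterior probability of $\boldsymbol\theta_i$ taking on this new $\boldsymbol\theta^*_{\mathrm{new}}$ is
$$ q_{i,\mathrm{new}} \propto \alpha \exp\Big\{ -\frac{1}{2\sigma^2} \sum_{j} \big( y_{ij} - \mathbf{x}_{ij}^\top(\boldsymbol\beta + \boldsymbol\theta^*_{\mathrm{new}}) \big)^2 \Big\}. \tag{3.6} $$
We see that the conditional posterior probability of creating a new unique play style for player $i$ is driven by the prior concentration $\alpha$ as well as whether that new play style predicts the observed scores for player $i$ well (i.e. small $\sum_j ( y_{ij} - \mathbf{x}_{ij}^\top(\boldsymbol\beta + \boldsymbol\theta^*_{\mathrm{new}}) )^2$). In our application to Battlefield 3 in Section 4, we hold the concentration parameter $\alpha$ fixed.
Combining all choices from equations (3.5) and (3.6) together, player $i$ is then sampled into one of the $C$ currently existing clusters or a potentially new cluster with probabilities proportional to $(q_{i1}, \dots, q_{iC}, q_{i,\mathrm{new}})$. If player $i$ is sampled into an existing cluster $c$, then their play style is set equal to the mean of that play style cluster, $\boldsymbol\theta_i = \boldsymbol\theta^*_c$. If player $i$ is sampled into the new cluster defined by $\boldsymbol\theta^*_{\mathrm{new}}$, then their play style is set equal to $\boldsymbol\theta^*_{\mathrm{new}}$.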
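The new-cluster weight (3.6) and the combined draw over all $C + 1$ options can be sketched as below, again on the log scale. Subtracting the maximum log weight before exponentiating is a standard guard against numerical underflow. The function names, $G_0 = N(\mathbf{0}, I)$, $\alpha = 1$ and the toy values are assumptions for illustration only.

```python
import numpy as np


def log_weight_new(y_i, X_i, beta, theta_new, alpha, sigma2):
    """Unnormalized log q_i,new from eq. (3.6) for a candidate theta_new drawn from G0."""
    resid = y_i - X_i @ (beta + theta_new)
    return np.log(alpha) - resid @ resid / (2.0 * sigma2)


def sample_assignment(log_w_existing, log_w_new, rng):
    """Pick a cluster index with probability proportional to exp(log weight).

    Index C (the number of existing clusters) means "open a new cluster".
    """
    log_w = np.append(np.asarray(log_w_existing, dtype=float), log_w_new)
    p = np.exp(log_w - log_w.max())
    p /= p.sum()
    return int(rng.choice(len(p), p=p)), p


rng = np.random.default_rng(3)
theta_new = rng.multivariate_normal(np.zeros(2), np.eye(2))  # candidate from G0
X_i = np.column_stack([np.ones(5), np.arange(5.0)])
beta = np.array([8.0, 0.0])
y_i = X_i @ (beta + np.array([0.0, 0.1]))
lw_new = log_weight_new(y_i, X_i, beta, theta_new, alpha=1.0, sigma2=0.5)
choice, p = sample_assignment([np.log(3), np.log(50) - 3.3], lw_new, rng)
print(choice, np.round(p, 3))
```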
During this sweep through all players, any empty clusters are removed and the cluster means are updated any time that their cluster membership changes. So we see that each iteration of our Gibbs sampler iteratively refines the current set of common play styles while also allowing the number of play styles to vary.
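The removal of empty clusters during a sweep amounts to relabeling the surviving clusters and keeping only their stored values, as in this NumPy sketch. The name `drop_empty_clusters` is hypothetical, and the sketch deliberately covers only the bookkeeping, not how the cluster means are refreshed after a membership change.

```python
import numpy as np


def drop_empty_clusters(labels, centers):
    """Relabel clusters to 0..C'-1 and keep only centers that still have members.

    labels: (n,) current cluster index of every player; centers: (C, p) cluster values.
    Returns new labels, surviving centers, and cluster sizes.
    """
    used, new_labels = np.unique(labels, return_inverse=True)
    new_labels = new_labels.reshape(-1)
    return new_labels, centers[used], np.bincount(new_labels)


labels = np.array([3, 3, 7, 0])            # clusters 1, 2, 4, 5, 6 are now empty
centers = np.arange(16.0).reshape(8, 2)
new_labels, kept, sizes = drop_empty_clusters(labels, centers)
print(new_labels.tolist(), kept.tolist(), sizes.tolist())
# [1, 1, 2, 0] [[0.0, 1.0], [6.0, 7.0], [14.0, 15.0]] [1, 2, 1]
```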
We can also sample the residual variance $\sigma^2$ from its conditional posterior distribution as part of our Gibbs sampling procedure. However, in our current implementation we instead fix $\sigma^2$ equal to a point estimate $\hat\sigma^2 = \mathrm{MSE}$, where MSE is the mean squared error from the ordinary least squares estimation of the regression model (3.1).
4 Application to Battlefield 3
As discussed in Section 2, data for the online military game Battlefield 3 was provided by Electronic Arts through the Wharton Customer Analytics Initiative. The dataset consists of 515,605 player-versus-player match logs taken from 1221 players. The global regression model (3.1) has 58 coefficients , which includes the intercept and slope on rank plus 56 coefficients for the different maps, roles and game types that can be chosen by players.
The player-specific regression model (3.2) adds 58 coefficients for each player that represent how their overall performance differs from the average player ($\beta_{0i}$), how they differ from the average player in their change in performance as their rank increases ($\beta_{1i}$), and how their performance differs from average in each role, map, and game type ($\gamma_{ir}$, $\lambda_{im}$, $\delta_{ig}$).
Thus, we need to estimate a total of 70876 coefficients (58 global coefficients plus 58 coefficients per player × 1221 players) as well as the residual variance parameter $\sigma^2$ using the Markov chain Monte Carlo procedure described in Section 3.3. Our model estimation was implemented in Matlab, where parameter initialization via ordinary least squares takes 1.5 minutes and each step of the Gibbs sampler takes approximately 3 minutes on a 64-bit laptop with 8 GB of RAM and 2.5 GHz processors. Our sparse matrix takes 20 MB of disk space and could be optimized to require less space.
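The coefficient count follows directly from the category counts in Section 2 (9 roles, 17 game types, 30 maps), as this short check shows:

```python
n_roles, n_games, n_maps = 9, 17, 30
p = 1 + 1 + n_roles + n_games + n_maps    # intercept + rank slope + indicator coefficients
n_players = 1221
n_coefficients = p + p * n_players        # global + player-specific (sigma^2 counted separately)
print(p, n_coefficients)                  # 58 70876
```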
We first provide an overall evaluation of our model fit in Section 4.1. We interpret the clusters of player-specific coefficients from our model to detect common play styles across Battlefield 3 players in Section 4.2, as well as examining hybrid players that switch between play styles in Section 4.3.
4.1 Evaluation of Model Fit
We can evaluate our model fit by examining the out-of-sample (OOS) prediction errors when using a 90-10 hold out scheme where the training set consists of a random 90% of each player’s matches and the test set is the remaining 10% of each player’s matches.
As a baseline for these evaluations, using a single global average across all players had an OOS root mean square error (RMSE) of 0.97. The initial regression model with player-specific coefficients (3.2) fit by ordinary least squares has an OOS RMSE of 0.79, which is a 19% predictive improvement. The best partition from our MCMC implementation had a similar OOS RMSE of 0.80 but with a substantially reduced number of unique play styles (only 13% of the unique coefficients compared to the OLS model).
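The evaluation scheme and the improvement figure can be reproduced in a few lines. The per-player split below is a hypothetical implementation of the 90-10 hold-out (shuffle each player's matches, hold out about 10%); the relative improvement is simply $(0.97 - 0.79)/0.97$.

```python
import math
import random


def per_player_holdout(matches_by_player, test_frac=0.1, seed=0):
    """Split each player's matches into ~90% training and ~10% test matches."""
    rng = random.Random(seed)
    train, test = {}, {}
    for pid, matches in matches_by_player.items():
        shuffled = list(matches)
        rng.shuffle(shuffled)
        n_test = round(test_frac * len(shuffled))
        test[pid], train[pid] = shuffled[:n_test], shuffled[n_test:]
    return train, test


def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def relative_improvement(baseline_rmse, model_rmse):
    return (baseline_rmse - model_rmse) / baseline_rmse


print(round(relative_improvement(0.97, 0.79), 3))  # 0.186, i.e. about 19%
```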
4.2 Common Play Styles
We now shift our attention to interpretation of the parameters of our estimated model, with our first focus being an examination of common play styles exhibited by subsets of players in our Battlefield 3 data. These common play styles are inferred as the different clusters of player-specific coefficients in the maximum a posteriori (MAP) partition found by our Gibbs sampling algorithm. This MAP partition contains 90 clusters of play styles shared by multiple players.
Figure 1 gives the distribution of cluster sizes for the 90 clusters of multiple players in the MAP partition. The largest 30 clusters contain 72% of the 1221 players in our Battlefield 3 data.
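Summaries such as the number of clusters shared by multiple players and the share of players in the largest clusters can be computed from a partition as sketched below; the toy labels are illustrative and not taken from our MAP partition.

```python
from collections import Counter


def cluster_summary(labels, k):
    """Count clusters with more than one member and the share of players in the k largest."""
    sizes = sorted(Counter(labels).values(), reverse=True)
    n_shared = sum(1 for s in sizes if s > 1)
    share_top_k = sum(sizes[:k]) / len(labels)
    return n_shared, share_top_k


labels = [0, 0, 0, 1, 1, 2, 3]   # toy partition: clusters of size 3, 2, 1, 1
n_shared, share = cluster_summary(labels, k=2)
print(n_shared, share)           # 2 shared clusters; 5 of 7 players in the two largest
```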
We now examine in more detail some of the most common play styles (i.e. largest clusters) exhibited by players in our Battlefield 3 data. In Figure 2, we visualize the four most common play styles exhibited by players in our data, i.e. the four largest clusters in the MAP partition of the player-specific coefficients across all players.
Recall that the player-specific coefficients from our linear regression model (3.2) define each player’s style: how that player’s score is affected by their rank and the roles, games, and maps encountered by that player in their Battlefield 3 matches. For example, a large positive player-specific intercept $\beta_{0i}$ indicates that player $i$ performs better than the average player overall, whereas a large positive player-specific coefficient on rank $\beta_{1i}$ indicates that player $i$ improves more quickly as they gain ranks than the average player.
The first cluster in Figure 2 corresponds to players with substantially positive player-specific intercepts, indicating overall performance that is higher than the average Battlefield 3 player. The other three clusters correspond to players who perform substantially better than the average player within specific combinations of role, game, and map.
Note that the roles, maps, and game types are inter-related since each map may support only a subset of game types and roles, e.g. if a player excels at a given map, they will often have higher scores for the roles and game types associated with that map. It is also worth noting that a subset of maps are only available as additional downloadable content and so we have less observed data for this subset of maps.
We label the first cluster in Figure 2 as the All-Stars: players with large positive player-specific intercepts, which indicate that the average game scores in their matches are substantially larger than those of the average Battlefield 3 player. The average log score of players in our data was approximately 8 whereas players in this All-Stars cluster had average log scores closer to 9. This group generally plays equally well on all roles, game types, and maps (indicated by relatively small coefficients for roles/games/maps). This cluster is also the largest found by our model, containing 11% of the players in our data. We will examine a representative member of this cluster in Figure 3.
Many clusters found by our model have particular map types as their primary determinants of the variation in their score. The second cluster in Figure 2 is labelled as the Map Specialists since this cluster has large positive coefficient values for two maps, Operation Metro and Grand Bazaar, indicating higher performance for these players than the average Battlefield 3 player in those specific map situations. This cluster contained about 4% of players in our data. In Figure 3, we show a representative member of this cluster who consistently performs well in Operation Metro.
Several common play styles found by our model have a particular game type as the main determinant of the variation in their score. The third cluster in Figure 2 is labelled as the Game Specialists, which have a high weight on rank and on two types of team death matches. This cluster also contained about 4% of players in our data. In Figure 3, we show a representative member of this cluster who consistently performs well in one type of death match.
Finally, we label many clusters as Role Specialists as they had large coefficients on a particular chosen role along with large coefficients on particular game types or maps that are well suited for that role. The fourth cluster in Figure 2 is a cluster with the largest positive coefficients on the “assault” role as well as the “team death match” game type, which indicates better performance than the average Battlefield 3 player in those game and role situations. This cluster contained about 3% of the players in our data. Figure 3 shows a representative member of this cluster, who performs significantly better than average in one team death match type as well as assault.
In Figure 3, we give examples of four player profiles, each one being a representative member of the four clusters shown in Figure 2. The All-Stars player in Figure 3 tends to perform above average across almost every role, game and map (recall that the average log score of players in our data was approximately 8 whereas this player had average log scores closer to 9). The Map Specialist player displays their best performance in certain maps, in this case the Operation Metro map, as well as certain game types associated with that map. The Game Specialist player consistently performs well in death match, which is one particular game type. The Role Specialist player in Figure 3 performs especially well in the assault role as well as the corresponding team death match game type.
Figure 3: Box plots of the distribution in performance for 4 players across a variety of different match settings. From left to right, colors indicate the distribution of log match scores for different subsets of matches for that player: overall/all matches (red), by different role (green), by different game type (cyan), and by different maps (blue). Blank spaces indicate that the player did not have a match with that particular role, game, or map. Stars indicate that the player’s average log match score is significantly different from the global average (as determined by a t-test).
We note that all four players shown in Figure 3 had player-specific coefficients that tended to remain in the same clusters throughout the iterations of the Gibbs sampler. However, some players in our Battlefield 3 data had player-specific coefficients that frequently moved between different clusters, which would indicate a change in their play style over time. We examine players with these “hybrid” play styles in Section 4.3.
4.3 Hybrid Play Styles: Players that frequently change clusters
As outlined in Section 3.3, our model estimation allows each player to “move” between different play styles by sampling their player-specific coefficients into different clusters of unique values across iterations of the Gibbs sampler. The transitions (or lack thereof) that a player makes between clusters give an indication of how well that player fits any particular common play style.
In Figure 4, we show the distribution of the number of different play styles exhibited by the players in our Battlefield 3 data.
We want to contrast players whose cluster memberships do not change very often across MCMC iterations (‘stable’ players) versus players whose cluster memberships change frequently across MCMC iterations (‘hybrid’ players). In this analysis, we define players who remain in the same cluster for over 50% of the Gibbs sampler iterations as ‘stable’. With this criterion, 28% of the players in our Battlefield 3 data are stable players that consistently display the same common play style. All four representative players that are displayed in Figure 3 are stable members of their respective play style clusters.
We define hybrid players as those that transitioned between the same two clusters at least 4 times over all iterations of the Gibbs sampler. Under this criterion, 7% of players in our Battlefield 3 data are defined as hybrid players. In some cases, hybrid players belong to similar clusters, i.e. two different clusters that both have large coefficient values on the same feature.
A small group of players (3%) transitioned between 9 different clusters of play style coefficients. Most of these players played fewer than 30 games at beginner ranks and so our model may not have enough data to estimate a more precise and stable play style for these players.
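The stable and hybrid criteria can be applied to each player's trace of cluster memberships across Gibbs iterations, as in the sketch below. It assumes cluster labels are persistent identifiers across iterations, and it reads "transitioned between the same two clusters at least 4 times" as counting moves in either direction between one pair of clusters; both are our interpretations, and the traces are made up.

```python
from collections import Counter


def is_stable(trace, threshold=0.5):
    """Stable: the player's most frequent cluster holds for more than 50% of iterations."""
    top_count = Counter(trace).most_common(1)[0][1]
    return top_count / len(trace) > threshold


def is_hybrid(trace, min_transitions=4):
    """Hybrid: at least 4 moves (either direction) between the same two clusters."""
    moves = Counter(frozenset((a, b)) for a, b in zip(trace, trace[1:]) if a != b)
    return any(count >= min_transitions for count in moves.values())


def n_styles(trace):
    """Number of distinct play-style clusters visited across iterations."""
    return len(set(trace))


traces = {
    "stable": [4, 4, 4, 4, 7, 4, 4, 4],
    "hybrid": [2, 5, 2, 5, 2, 5, 5, 2],
    "wanderer": [1, 2, 3, 4, 5, 6, 7, 8],
}
for name, tr in traces.items():
    print(name, is_stable(tr), is_hybrid(tr), n_styles(tr))
```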
5 Discussion
The complexity of game play in online multiplayer games such as Battlefield 3 has spurred a tremendous amount of interest in modeling different play styles and strategies towards victory. In particular, identifying subgroups of participants that use similar play styles would aid game developers in developing specialized tutorials for new participants as well as improving the construction of complementary teams in their online matching queues.
We contribute to this endeavor by developing a hierarchical Bayesian regression approach for Battlefield 3 players that models total game score as a function of player rank as well as the roles, game type, and map taken on by that player in each of their matches. Player-specific intercepts account for differences in overall player ability, while player-specific coefficients for the other covariates can be interpreted as the strengths or weaknesses of each player under specific role, game and map choices.
We use a Dirichlet process prior that enables the clustering of players that have similar player-specific coefficients, which allows us to discover common play styles amongst our sample of Battlefield 3 players. This flexible semi-parametric Bayesian clustering approach does not require the number of distinct play styles to be known a priori and also allows for a subset of players to be left ungrouped with their own unique play styles. These characteristics are important for this application since prior information is lacking about the landscape of Battlefield 3 play styles.
In terms of overall predictive performance, the ordinary least squares version of our regression model has 19% better predictive accuracy compared to a baseline global average across all players whereas using the Dirichlet process prior leads to similar predictive accuracy but with a substantial reduction in the number of parameters (only 13% of the unique coefficients compared to the OLS model).
We examine several of the most common play styles among Battlefield 3 players and find a set of “all-star” players that exhibit high overall performance, as well as groupings of players that perform particularly well in specific game types (“game specialists”) or specific maps (“map specialists”), as well as groupings based on performance in specific roles (e.g. healing or assault). We are also able to differentiate between players that are stable members of a particular play style and “hybrid” players that exhibit multiple play styles across their matches.
Future work will investigate whether these discovered groupings can help players improve their own performance or help matchmaking algorithms to form better teams of players with complementary skill sets. We also plan to explore using additional game features as well as experimenting with different outcome variables beyond total game score.
We wish to thank Electronic Arts and the Wharton Customer Analytics Initiative (WCAI) for their feedback and support.