A Discrete Evolutionary Model for Chess Players' Ratings

03/08/2011 ∙ by Trevor Fenner, et al. ∙ Birkbeck, University of London

The Elo system for rating chess players, also used in other games and sports, was adopted by the World Chess Federation over four decades ago. Although not without controversy, it is accepted as generally reliable and provides a method for assessing players' strengths and ranking them in official tournaments. It is generally accepted that the distribution of players' rating data is approximately normal but, to date, no stochastic model of how the distribution might have arisen has been proposed. We propose such an evolutionary stochastic model, which models the arrival of players into the rating pool, the games they play against each other, and how the results of these games affect their ratings. Using a continuous approximation to the discrete model, we derive the distribution for players' ratings at time t as a normal distribution, where the variance increases in time as a logarithmic function of t. We validate the model using published rating data from 2007 to 2010, showing that the parameters obtained from the data can be recovered through simulations of the stochastic model. The distribution of players' ratings is only approximately normal and has been shown to have a small negative skew. We show how to modify our evolutionary stochastic model to take this skewness into account, and we validate the modified model using the published official rating data.

1 Introduction

The Elo system for rating chess players [Elo86], named after its creator Arpad Elo, has been employed by the World Chess Federation for over four decades as a method for assessing players’ strengths and ranking them in official tournaments. Although not without controversy, it is accepted as generally reliable, and is also used in other games and sports such as Scrabble, Go, American football and major league basketball.

The Elo rating system is based on the model of paired comparisons [Dav88], which can be applied to the problem of ranking any set of objects for which we have a preference relation. The model is particularly useful in that a ranking can be obtained in situations where a preference exists only for some of the pairs of objects under consideration. Paired comparison models have been successfully applied to measure ability in competitive games and sports [Joe91, Gli99], the most notable example being the widely used Elo system for rating chess players.

Several extensions to the Elo system have been proposed, notably the Glicko [Gli99] and TrueSkill [HMG06] Bayesian rating systems. Both these systems estimate, in addition to the rating, the degree of uncertainty as to whether the rating represents the player’s true ability. The uncertainty allows the system to control the change made to the rating after a game has been played. In particular, if the uncertainty is low then the changes made to the rating should be smaller as the rating is already reasonably accurate, while if the uncertainty is high then the changes made to the rating should be larger.

Here we adopt the Bradley-Terry model [BT52], which provides the theoretical underpinning of Elo’s model, where the probability that a player $A$, whose strength is $s_A$, wins against a player $B$, whose strength is $s_B$, is given by the logistic function $L(s_A - s_B)$, namely

$$L(s_A - s_B) = \frac{1}{1 + e^{-(s_A - s_B)/c}}, \tag{1}$$

where $c$ is a positive scaling factor. We note that $L$ is strictly monotonically increasing, $L(0) = 1/2$, and $L(x) \to 1$ as $x \to \infty$. Moreover,

$$L(-x) = 1 - L(x). \tag{2}$$

In this paper we are interested in the distribution of ratings within the pool of players that arises as a result of the model induced by (1). We are not aware of any research in this direction, although it is generally accepted that this distribution is well approximated by a Gaussian (i.e. normal) distribution [CG96, BSMG09]. It is worth mentioning that Elo [Elo86] claimed that the distribution of ratings of established chess players was not Gaussian, and suggested the Maxwell-Boltzmann distribution as an alternative that fitted the data he used slightly better.

The rest of the paper is organised as follows. In Section 2 we review the Elo rating system, and in Section 3 we carry out some exploratory data analysis on published official chess rating data. We show that the Gaussian distribution provides a very good fit to the data, but there is a small negative skew present. In Section 4 we propose an evolutionary stochastic model, which as a first attempt assumes a symmetric distribution of ratings. The derivation of the distribution is presented in Section 5, where we prove that the resulting distribution is indeed normal, with the interesting feature that the variance increases with time in a logarithmic fashion. In Section 6 we validate the model using published rating data from January 2007 to January 2010, and in Section 7 we modify the model to allow for the skewness present in the data. With reference to this data, we show through simulation that the modified model yields a better approximation to the actual distribution. Finally, in Section 8 we give our concluding remarks.

2 Elo’s Rating System

We now summarise Elo’s rating system [Elo86] in order to set in context the evolutionary model that we present in Section 4.

The fundamental assumption of Elo’s rating system is that each player has a current playing strength. In a game played between players $A$ and $B$, with unknown strengths $s_A$ and $s_B$, the score of the game for player $A$ is denoted by $S_{AB}$, where $S_{AB}$ is 1 if $A$ wins, 0 if $A$ loses and 1/2 if the game is a draw. Its expected value is assumed to be [GJ99]

$$E(S_{AB}) = L(s_A - s_B), \tag{3}$$

where $E$ is the expectation operator and

$$L(x) = \frac{1}{1 + 10^{-x/400}}. \tag{4}$$

The Elo system attempts to estimate the strength $s_A$ of player $A$ using a calculated rating $R_A$, which is adjusted according to the results of games played by $A$. We observe that this model is related to the Bradley-Terry model for paired comparison data [BT52]; see also [Dav88].

After playing a game against player $B$, player $A$’s rating is adjusted according to the following formula (see equation (2) in [GJ99])

$$R_A' = R_A + K \left( S_{AB} - E(S_{AB}) \right), \tag{5}$$

where $K$ (known as the $K$-factor) is the maximum number of points by which a rating can be changed as a result of a single game. (A high $K$-factor gives more weight to recent results, while a low $K$-factor increases the relative influence of results from earlier games.) In the Elo system the $K$-factor is typically between 10 and 30. (There has been some controversy involving a recent proposal by the World Chess Federation to change the $K$-factor [Son09, Zul09].) For the purpose of experimentation we have fixed the $K$-factor at 20.

When using (5) to update $R_A$, $E(S_{AB})$ is estimated from (3) using the current values of $R_A$ and $R_B$ as estimates of $s_A$ and $s_B$, respectively.

Player $B$’s rating is updated similarly. We note that, after updating both $A$’s and $B$’s ratings, the sum of their ratings remains unchanged. The above method can be straightforwardly extended to the case of a player competing in a tournament, or to a number of games played over a given period.
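The update rule of this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the conventional Elo curve with base 10 and divisor 400, and the function names `expected_score` and `update_ratings` are made up for the example.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B (assumed standard Elo curve, base 10, divisor 400)."""
    return 1.0 / (1.0 + 10.0 ** (-(rating_a - rating_b) / 400.0))


def update_ratings(rating_a: float, rating_b: float, score_a: float, k: float = 20.0):
    """Return the new (rating_a, rating_b) after one game.

    score_a is 1 for a win by A, 0 for a loss and 0.5 for a draw.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # B's change is the exact negative of A's, so the sum of ratings is conserved.
    return rating_a + delta, rating_b - delta


new_a, new_b = update_ratings(2100.0, 2000.0, 1.0)
print(round(new_a, 2), round(new_b, 2))
```

Because the winner's gain equals the loser's loss, the sum of the two ratings is the same before and after the game, which is the conservation property noted above.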

3 The Distribution of Elo Rating Data

The World Chess Federation, known as FIDE, publishes a rating list several times each year. Traditionally FIDE published the rating list every three months, but from 2009 has moved to bi-monthly publication; the official rating data can be obtained from http://ratings.fide.com.

Here we are interested in the distribution of the players’ ratings. It has been confirmed by Charness and Gerchak [CG96], and by Bilalić et al. [BSMG09], that the distribution is well approximated by a Gaussian distribution. We recall that the probability density function for a Gaussian random variable $X$ takes the form

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right), \tag{6}$$

where $\mu$ is the mean and $\sigma$ is the standard deviation of $X$.

With these observations in mind, we performed some exploratory data analysis on the FIDE rating data from January 2007 to January 2010. To test the normality of the data, we binned each of the four data sets, taking the bin width to be 20 (the fixed K-factor). The resulting plots for January 2007 to January 2010 are shown on the left-hand side of Figure 1.

We then fitted a constant multiple of a Gaussian distribution to each of the four data sets, using Matlab. The plots for the fitted data for January 2007 to January 2010 are shown on the right-hand side of Figure 1. The fitted parameters, namely the multiplicative constant, the mean and the standard deviation, are shown in Table 1. Clearly, the fitted constant is an approximation to the actual number of players. Table 1 also shows $R^2$, the coefficient of determination [Mot95]. It can be seen that this is close to 1, which indicates a very good fit. For comparison, the last columns in the table show the number of players, the mean and the standard deviation computed from the actual FIDE rating data. The standard deviation computed from the data is about 7-15 Elo points greater than the fitted standard deviation.

Figure 1: Plots of the binned rating data (left) and the fitted Gaussians (right) from January 2007 to January 2010
Year  Fitted constant  Fitted mean  Fitted std. dev.  R^2     Actual players  Actual mean  Actual std. dev.
2007  75167            2096.400     151.604           0.9936  77056           2100.127     166.203
2008  84844            2077.400     166.452           0.9908  87075           2073.566     181.918
2009  97070            2034.000     183.706           0.9859  99223           2044.687     196.639
2010  107874           2007.600     202.092           0.9815  109373          2015.650     209.622
Table 1: The parameters for the fitted Gaussians in Figure 1
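The goodness-of-fit check behind Table 1 can be illustrated as follows. This is only a sketch under stated assumptions: it uses synthetic ratings rather than FIDE data, it estimates the mean and standard deviation by moments instead of the least-squares fit performed in Matlab, and it takes the expected count in a bin to be the number of players times the bin width times the density at the bin centre. All function names are illustrative.

```python
import math
import random
from collections import Counter


def gaussian_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def binned_r_squared(ratings, width=20):
    """Bin the ratings and return R^2 between bin counts and a scaled Gaussian."""
    n = len(ratings)
    mu = sum(ratings) / n
    sigma = math.sqrt(sum((r - mu) ** 2 for r in ratings) / n)
    counts = Counter(math.floor(r / width) for r in ratings)
    observed, predicted = [], []
    for b in range(min(counts), max(counts) + 1):
        centre = (b + 0.5) * width
        observed.append(counts.get(b, 0))
        # Assumed scaling: expected count = n * width * density at the bin centre.
        predicted.append(n * width * gaussian_pdf(centre, mu, sigma))
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


rng = random.Random(0)
synthetic = [rng.gauss(2050.0, 180.0) for _ in range(20000)]  # made-up ratings
print(binned_r_squared(synthetic))
```

For data that really are Gaussian the value should be close to 1. Departures such as skewness reduce it, although the coefficient alone does not say what kind of departure is present.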

It can be seen that the plots on the left-hand side of Figure 1 appear to show a small negative skew. (We note that this is in contrast to the positive skew of the Maxwell-Boltzmann distribution, suggested by Elo [Elo86].) As a next step, we therefore investigated the skewness of the data for 13 rating periods from October 2006 to September 2009. The skewness of a random variable $X$ with mean $\mu$ and standard deviation $\sigma$ is defined by

$$\gamma = E\left[ \left( \frac{X - \mu}{\sigma} \right)^3 \right].$$

Figure 2: Skewness of rating (left) and mean rating linear fit (right), for 13 periods from October 2006 to September 2009
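A sample version of the skewness can be computed directly from a list of ratings. The sketch below assumes the usual moment estimator (third central moment divided by the cube of the standard deviation, with no small-sample correction); whether the authors applied a correction is not stated, and the function name is illustrative.

```python
def sample_skewness(values):
    """Third central moment divided by the cube of the standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# A longer tail on the low side gives a negative value.
print(sample_skewness([1, 9, 10, 10]))
```

A negative value, as observed for the rating data, means that the tail below the mean is heavier than the tail above it.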

The skewness of the actual FIDE rating data is shown in the left-hand plot in Figure 2. As can be seen, it shows that there is a small negative skew, which has generally slowly increased over the period. (The increase in skewness in September 2009 is mostly due to FIDE temporarily lowering the minimum rating for new players from 1400 to 1200, and then reverting to the original policy in the following period.) The negative skew can be attributed to the slow decrease in the mean rating with the growing number of players, since it is more likely that a new player joining the pool will enter with a rating lower than the average. This can be formalised as follows.

Let be the number of players in the pool at the end of the first period and let be the mean rating of those players. We define and similarly for the second period. Then the total of the ratings of all players in the pool is for the first period and for the second period. Assuming the average rating of new players joining during the second period is , we have

yielding

(7)

We can approximate (7) by the differential equation

which has the solution

(8)

where is a constant.

The right-hand graph in Figure 2 shows the mean Elo rating plotted against the logarithm of the number of players . The linear fit shown is in good agreement with (8), with , , and . Thus the average rating is decreasing slowly as a linear function of the logarithm of the number of players in the pool. In addition, knowing would allow us to predict the rate of decrease, and also to estimate the skewness shown in the left-hand graph in Figure 2.
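The log-linear relationship can be checked roughly with the four January values in Table 1. This sketch is not the fit shown in Figure 2, which uses 13 rating periods, so its coefficients need not match those reported above; the helper `fit_line` is an ordinary least-squares routine written for the example.

```python
import math

# (number of players, mean rating) for January 2007 to January 2010, from Table 1.
data = [(77056, 2100.127), (87075, 2073.566), (99223, 2044.687), (109373, 2015.650)]


def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


xs = [math.log(n) for n, _ in data]
ys = [mean for _, mean in data]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

Under the assumption that new players enter a fixed number of points below the current mean, the magnitude of the slope can be read as a rough estimate of that number.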

As we have seen above, the Gaussian distribution is a good first approximation. We pursue this further in Section 5 after we formalise the evolutionary model for players’ ratings in Section 4. We return in Section 7 to a more general model that takes skewness into account.

4 An Evolutionary Urn Transfer Model

In our evolutionary stochastic model for rating game players, two main types of event may take place. The first event type occurs when a new player enters the system. We make two assumptions related to such an event:

  (i) that new players enter the system at a fixed rate, and

  (ii) that once players enter the system they do not leave it.

(We note that the model can be extended to allow players to leave the pool as long as the rate at which players enter the pool is greater than the rate at which they leave.)

The second event type occurs when a game is played between two players. In this case, we assume

  (iii) that the outcome of the game is either a win or loss for the first player, and

  (iv) that every game occurs between two players of fairly similar strength; in particular, we assume that the absolute value of the difference in strength between the players in any game is bounded by a fixed constant.

Assumption (iii) is often made, cf. [Gli99], to avoid including extra parameters in the model, as it is reasonable to assume that a draw is equivalent to half a win and half a loss (which is consistent with the score of a draw being 1/2, as in Section 2); see [Joe91, Hen92, GJ99] for alternative ways of dealing with draws. The basis for Assumption (iv) is that players will normally play games against players of comparable strength; for example, many tournaments are divided into separate grading sections for that reason. We note that the win probabilities given by (1) satisfy

(9)

which is consistent with Assumption (iii).

In our model, we approximate the ratings using a discrete numerical scale of values at intervals of . We use urns to store the pool of players, with each urn containing players of approximately similar strength. Let denote the average rating of all the players. Then , the th urn, where , contains those players whose rating is in the range , i.e. the players are grouped into bins of width . Thus a player with rating will be in urn number .

Players enter the system at a rate , where . After playing a game, a player may stay in the same urn or be transferred to one of the two neighbouring urns, depending on the result of the game. We now describe the urn model in detail.

We assume a countable number of urns, with being the central urn; to its left are the urns with negative subscripts and to its right are the urns with positive subscripts. We let denote the number of players in at stage of the stochastic process. Initially , , with , i.e. initially has players in it, and all other urns are empty, i.e. for .

When a player enters the system, an existing player is selected uniformly at random from the urns and the new player is put into the same urn as player , i.e. we assign the new player the same approximate rating as the selected existing player . In other words, new players enter the system according to the distribution of players currently in the system.

The stochastic process modelling the changes in rating can be viewed as a random walk [RG04], where the probabilities of players increasing, decreasing or maintaining their ratings depend on their current ratings, as explained below.

At time , , a player is chosen uniformly at random from the urns, say from , i.e. is selected with probability

(10)

where means is approximately equal to for large t. (This approximation holds since the expectation of the number of players at time is .)

As above, we assume . Then one of two things may occur:

  1. with probability , a new player is inserted into , i.e. into the same urn as the chosen player ;

  2. with probability , an opponent for the player is chosen from urns

    where .

    The probability that player is chosen from is , , where for symmetry we assume . Depending on the result of the game, player either moves to or , or remains in . The probabilities of these events are chosen so that the expected change in ’s rating is identical to that prescribed by the Elo system.

As we are working in terms of urn numbers rather than Elo ratings, we let , so is the scaling factor in terms of urn numbers. Thus, since and , the probability that player wins is , by (1). Therefore, from (5) and (2), when wins ’s new rating is given by

(11)

In order to find the new urn number for , corresponding to the rating , we first normalise (11) by subtracting and dividing by , giving

We restrict player to moving up or down by at most one urn. Moreover, we discretise the change stochastically so that the new urn number will be integral but the expected change unaffected. Hence,

(12)

We note that has to be chosen so that the probability in (12) does not exceed 1 for all , . We therefore require . For simplicity, we will choose .
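Stochastic discretisation of this kind can be sketched as follows: a fractional expected change of at most one urn is replaced by a move of exactly one urn with probability equal to the size of the expected change, and no move otherwise, so that the expectation is preserved. The function name and interface are assumptions made for the illustration.

```python
import random


def stochastic_step(expected_change, rng):
    """Return +1, -1 or 0 with mean equal to expected_change (must lie in [-1, 1])."""
    if not -1.0 <= expected_change <= 1.0:
        raise ValueError("expected change must not exceed one urn in size")
    if rng.random() < abs(expected_change):
        return 1 if expected_change > 0 else -1
    return 0


rng = random.Random(42)
samples = [stochastic_step(0.3, rng) for _ in range(100000)]
print(sum(samples) / len(samples))  # close to 0.3 on average
```

The requirement that the probability does not exceed 1 appears here as the range check on `expected_change`.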

The probability that player moves to is

i.e. the product of the probability that wins against and the corresponding discretisation probability.

Similarly, when loses we have

Again restricting to moving up or down by at most one urn, on stochastically discretising, we obtain

(13)

Therefore the probability that moves to is

i.e. the product of the probability that loses against and the corresponding discretisation probability.

Let

(14)

Then, in summary, if the selected player is from and the chosen opponent is from , ,

  1. with probability player moves to ,

  2. with probability player moves to , and

  3. with probability player stays in .

We note that is proportional to the derivative of the logistic function, viz.

This symmetric bell-shaped curve is proportional to the probability density function of the logistic distribution, with standard deviation [EHP00].

It is easy to show that, conditional on being chosen from and from , the variance of the change in rating is , whereas with the Elo system it is only ; the additional variance is due to the stochastic discretisation. It therefore follows that the unconditional variance in our model will also be increased by a factor of compared to that for the Elo system.

It is clear that, according to the Elo model, player ’s rating should be updated in a similar manner to player ’s. However, we simplify the analysis by considering each game as essentially equivalent to two “half games”, since the players are chosen randomly. It is therefore sufficient to analyse only the change to ’s rating.

(We note that, unlike the proposal in [GJ99], our evolutionary model does not take into account, for example, the fact that junior players tend to be under-rated and to improve more rapidly than older players.)
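The process described in this section can be sketched as a short simulation. This is an illustration under several assumptions, not the Matlab implementation used in the paper: the win probability is taken to be logistic in the urn difference with a scale `c`, the probability of moving up (or down) one urn is the win (or loss) probability times the rounding probability, the opponent's urn offset is drawn uniformly from `-h` to `h` inclusive, and no check is made that the opponent's urn is occupied. The names `simulate`, `move_probability`, `p`, `h` and `c` are chosen for the example.

```python
import math
import random


def logistic(x, c):
    """Logistic function with scale c (assumed form of the win probability)."""
    return 1.0 / (1.0 + math.exp(-x / c))


def move_probability(k, c):
    """Chance of moving up one urn (equally, down one urn) for urn offset k."""
    return logistic(k, c) * logistic(-k, c)


def simulate(n0, steps, p, h, c, seed=0):
    """Return the urn number of every player after the given number of steps."""
    rng = random.Random(seed)
    urns = [0] * n0                       # all initial players start in urn 0
    for _ in range(steps):
        i = rng.randrange(len(urns))      # player chosen uniformly at random
        if rng.random() < p:
            urns.append(urns[i])          # a new player joins the same urn
            continue
        k = rng.randint(-h, h)            # opponent's urn offset (assumed uniform)
        q = move_probability(k, c)
        u = rng.random()
        if u < q:
            urns[i] += 1                  # win, rounded up to the next urn
        elif u < 2 * q:
            urns[i] -= 1                  # loss, rounded down to the previous urn
    return urns


# Scale in urn units if the Elo curve uses divisor 400 and urns are 20 points wide (assumption).
c_urns = 400.0 / (20.0 * math.log(10.0))
result = simulate(n0=200, steps=20000, p=0.01, h=2, c=c_urns)
mean = sum(result) / len(result)
variance = sum((u - mean) ** 2 for u in result) / len(result)
print(len(result), variance)
```

Tracking only the selected player, as in the "half games" simplification above, keeps each step to a single urn update.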

5 Derivation of the Distribution of Players’ Ratings

Considering all possible choices for player , it follows from the above discussion that the probability that will move to is given by

(15)

and, by symmetry, that this is also the probability that will move to .

At time , , a game is played with probability , and there are then the following three possible ways that the contents of may change.

  1. The player chosen uniformly at random is selected from , and then plays an opponent from say . By (15), the probability that beats and moves to is , that loses to and moves to is , and that stays in is . Thus the net expected loss from is .

  2. The player chosen uniformly at random is selected from , and then plays an opponent from say . By (15), the probability that beats and moves to is ; so the net expected gain to is . (In all other cases the contents of do not change.)

  3. The player chosen uniformly at random is selected from , and then plays an opponent from say . By (15), the probability that loses to and moves to is ; so the net expected gain to is . (In all other cases the contents of do not change.)

If is selected from any of the other urns, the contents of do not change.

We now obtain the difference equation for the urn transfer model, by considering the expected change to , as discussed above. For integer and ,

(16)

To derive (16), we follow a mean-field theory approach, such as that in [OS01, LFLW02], replacing by its expectation , as in (10). The expected value of is equal to the previous number of players in plus the two probabilities of inserting a player into , from either or , minus the probability of moving a player from to either of the neighbouring urns, i.e. and , plus the probability of inserting a new player into .

We now take expectations in (16), and we write for . By the linearity of , we obtain

(17)

We note that (17) defines a symmetric random walk by the selected player at time , where the probability of moving right or left is proportional to , but the probability that is selected decreases over time. Thus the distribution of the players in the urns flattens asymptotically over time and the standard deviation increases, as in a diffusion process [DB03].

We will see that in our case the variance increases logarithmically with time and thus the distribution will flatten very slowly.

We now approximate our discrete model by a continuous model using a continuous function to approximate . In particular, we may approximate

and

From (17), we thus derive the partial differential equation

(18)

where

(19)

is a constant.

If we now let

we can transform (18) into

(20)

We now transform (20) into the following simple form of the standard diffusion equation (also known as the heat equation) [DB03, RG04], by making the substitution and writing for :

(21)

The initial conditions of the discrete model are , where , and for . Since

the boundary conditions for the continuous model become , where is the Dirac delta function. This yields the boundary conditions

(22)

Equation (21) with boundary conditions (22) has the following standard solution:

(23)

and we see from (6) that this is the density function of the Gaussian distribution with mean 0 and variance .

From (23) it follows that

(24)

6 Modelling the Distribution of Chess Players’ Ratings

In order to run simulations of the model that we described and analysed in Sections 4 and 5, respectively, we first need to specify or derive values for the various parameters involved.

We are assuming that and , as stated previously in Sections 2 and 4; thus . We consider the cases and , and for simplicity we assume that the urn from which the opponent is selected is chosen uniformly, i.e. . We can then compute from (14) and from (15).

Finally, we need estimates for , and . We assume, as indicated in Section 3, that the ratings are normally distributed; we relax this assumption in Section 7 to cater for some degree of skewness in the distribution. In order to validate our model, we obtain estimates for these parameters using the published official rating data from January 2007 to January 2010, as described in Section 3. Our methodology is to extract values for these parameters from this data, using the analysis in Section 5, and then run simulations of our model in order to see how closely the resulting distribution matches that obtained from the actual data.

To estimate the rate $p$ at which players enter the system from the actual rating data, we proceed in the following way. Let $n$ be the number of rated players recorded at January of a given year. Let $g$ be the number of games played and $m$ be the number of new players joining the pool of rated players during the previous year (computed as the difference between $n$ and its value for the previous January). According to the data, the rate at which players entered the system during the previous year is given by

$$p = \frac{m}{g + m}.$$

The values for these parameters from January 2007 to January 2010, calculated using the official FIDE data, are presented in Table 2. In the simulations we took the rate to be $p = 0.009553$, the average rate over the complete four-year period, as shown in the summary row. It can be seen from the table that, in reality, $p$ fluctuates somewhat, but as an approximation we assume that $p$ is constant. We can then compute the constant defined in (19).

Year     Players  Games played  New players  Rate
2006     67349
2007     77056    881089        9707         0.010897
2008     87075    1009067       10019        0.009831
2009     99223    1181206       12148        0.010180
2010     109373   1285607       10150        0.007833
Summary           4356969       42024        0.009553
Table 2: The data used to compute the rate at which players enter the system
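The rates in Table 2 are consistent with counting each game and each arrival as one event and taking the fraction of events that are arrivals. The short check below recomputes them from the table's own figures; the function name `entry_rate` is illustrative.

```python
# (year, games played, new players) from Table 2.
rows = [
    (2007, 881089, 9707),
    (2008, 1009067, 10019),
    (2009, 1181206, 12148),
    (2010, 1285607, 10150),
]


def entry_rate(games, new_players):
    """Fraction of events (games plus arrivals) that are arrivals."""
    return new_players / (games + new_players)


yearly = {year: entry_rate(games, new) for year, games, new in rows}
overall = entry_rate(sum(g for _, g, _ in rows), sum(m for _, _, m in rows))
for year, rate in yearly.items():
    print(year, round(rate, 6))
print("overall", round(overall, 6))
```

The overall figure is computed from the summed games and arrivals, not as the mean of the four yearly rates.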

Lastly we need to obtain values for and . From (6) and (24), it follows that at time the expected number of players is given by

(25)

and that , the variance of the rating distribution, is . We thus obtain

(26)

To get a single value for , we simply take the average over the years 2007 to 2010, where we compute a year-specific value for from (26) using the values of and from Table 1. Finally, we estimate using (25).

For and , the estimated values for and are presented in Table 3, where the values for are rounded to the nearest 10. We also obtained alternative estimates by replacing by in (25) and (26); the two alternatives are indicated by the first column of Table 3. The alternatives will be denoted by and , respectively.

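Using   Max. urn difference  Initial players  Steps until 2007  Steps until 2008  Steps until 2009  Steps until 2010
actual  1                    20701            5899130           6947900           8219530           9282010
fitted  1                    20242            5749450           6762420           8042210           9173160
actual  2                    20569            5912960           6961730           8233360           9295840
fitted  2                    20113            5762980           6775950           8055750           9186690
actual  3                    20373            5933480           6982250           8253880           9316360
fitted  3                    19921            5783070           6796030           8075830           9206770
Table 3: Derived initial number of players and number of steps for 2007 to 2010, using the actual and the fitted values from Table 1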

As mentioned above, we fixed at , the value obtained in Table 2. For each set of values for the parameters , and in Table 3, we ran 10 simulations of the stochastic process described in Section 4, implemented in Matlab. In each case we then fitted a Gaussian to the distribution of the number of players in the urns, again using Matlab. Each row in Table 4 was computed from the average of the 10 simulations in exactly the same way that the values in Table 1 were computed from the actual rating data. That is, , and are the values calculated from the results of the simulations, and , and are the values obtained by fitting a Gaussian distribution to the simulation results. (In order to obtain Elo ratings from the urn numbers of the players in the simulation, the urn numbers were calibrated by means of a suitable shift. This was chosen so that the means from Table 1 for each of the four years were within the range of .) It can be seen that, in each row of Table 4, all the fitted and calculated values are very close to each other. This and the fact that is so close to one gives strong confirmation of our analysis in Section 5.

We now compare the fitted and calculated parameters from Table 4 with those in Table 1. Obviously, by construction, and are very close to the corresponding values in Table 1. In addition, it can be seen that the values for and when using and are very close to the values for in Table 1, and correspondingly close to the values for in Table 1 when using and . However, the calculated standard deviation in Table 4 is consistently lower than its counterpart in Table 1. For 2007 they are very close, for 2008 they are about 10 Elo points apart, for 2009 they are about 17 points apart, while for 2010 they are about 24 points apart. Although these results are very encouraging, we will see in the next section that we can get much closer to the actual standard deviations by introducing skewness into the model.

Using   Max. urn difference  Year  Fitted constant  Fitted mean  Fitted std. dev.  R^2     Calculated players  Calculated mean  Calculated std. dev.
actual  1                    2007  77016            2092.231     164.476           0.9993  77064               2092.179         164.890
fitted  1                    2007  74895            2103.906     164.826           0.9993  74899               2104.053         164.923
actual  2                    2007  77048            2096.596     164.865           0.9994  77045               2096.646         164.933
fitted  2                    2007  75206            2090.258     164.489           0.9993  75233               2090.195         164.752
actual  3                    2007  77201            2091.796     164.679           0.9994  77228               2092.199         164.981
fitted  3                    2007  75066            2095.629     164.533           0.9993  75111               2096.049         164.893
actual  1                    2008  87049            2075.647     171.938           0.9994  87081               2075.632         172.178
fitted  1                    2008  84901            2077.914     172.079           0.9994  84962               2078.193         172.515
actual  2                    2008  87080            2084.221     172.265           0.9995  87111               2083.853         172.573
fitted  2                    2008  84829            2069.993     172.106           0.9994  84843               2070.036         172.171
actual  3                    2008  87073            2075.797     172.173           0.9994  87144               2075.890         172.676
fitted  3                    2008  84897            2070.297     171.903           0.9994  84926               2070.029         172.086
actual  1                    2009  99267            2033.206     179.812           0.9995  99324               2033.229         180.227
fitted  1                    2009  96970            2036.315     179.620           0.9995  97030               2036.540         180.000
actual  2                    2009  99121            2037.768     179.613           0.9995  99141               2037.980         179.862
fitted  2                    2009  96985            2041.688     179.421           0.9995  97043               2041.843         179.820
actual  3                    2009  99149            2032.080     179.770           0.9995  99166               2031.835         179.943
fitted  3                    2009  97138            2032.044     179.542           0.9994  97172               2031.974         179.804
actual  1                    2010  109458           2019.797     185.313           0.9995  109485              2019.818         185.528
fitted  1                    2010  107868           2026.636     186.163           0.9995  107893              2026.662         186.304
actual  2                    2010  109457           2014.156     185.612           0.9995  109469              2013.943         185.691
fitted  2                    2010  107847           2021.331     185.932           0.9995  107860              2021.283         186.047
actual  3                    2010  109373           2013.765     185.203           0.9994  109386              2013.833         185.344
fitted  3                    2010  107840           2020.426     185.611           0.9995  107865              2020.262         185.806
Table 4: Actual and fitted parameters for simulation results

7 Taking Skewness into Account

As discussed in Section 3, the actual rating data exhibits a small negative skew. We now consider modifying the urn model presented in Section 4 to take this into account. Since it is likely that a new player will enter with a rating lower than the average, we can model this skewness in a simple way by making a small change to the way in which new players are added. Instead of inserting the new player into the same urn as the chosen player , say , we put the new player into , where determines the amount of negative skew we wish to introduce.
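The modified insertion rule can be sketched as a small change to how a new player's urn is chosen. The sketch assumes the shift is a whole number of urns below the urn of a uniformly chosen existing player; the function name and the `shift` parameter are illustrative, and with urns 20 Elo points wide a shift of `s` urns corresponds to `20 * s` Elo points.

```python
import random


def insert_new_player(urns, rng, shift=0):
    """Append a new player `shift` urns below a uniformly chosen existing player."""
    chosen = urns[rng.randrange(len(urns))]
    urns.append(chosen - shift)
    return urns[-1]


rng = random.Random(1)
players = [0, 0, 0]                      # urn numbers of the existing players
print(insert_new_player(players, rng, shift=3))
```

With `shift=0` this reduces to the symmetric rule of Section 4, where the new player joins the same urn as the chosen player.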

To validate the modified stochastic process, we ran a batch of simulations in Matlab, starting the process with the actual rating data as of October 2006 and ending in January 2010. For the October 2006 starting data , and (as shown in Figure 2). From October to December 2006 the number of games played was , and the number of new players was . Using these values together with the data in Table 2, we therefore took the number of simulation steps to be and, as before, the rate at which players enter the system to be . Tables 5, 6 and 7 show the average skewness , mean rating and standard deviation over 10 simulations, for and , respectively, with varying from to . As a reference point, for the actual rating data as of January 2010, and , as in Table 1, and we computed .

It can be seen that the results are rather similar in all three tables. As is increased, the skewness becomes more negative, the mean decreases and the standard deviation increases, as expected. The closest fit to the actual skewness and the standard deviation is when is . However, the closest fit to the mean Elo rating is when is or . The suggested values for therefore correspond to a new player being rated Elo points below the average rating. This latter value is in broad agreement with the value obtained in Section 3 from Figure 2. Although this value was obtained using the entire three year period, the values for the individual years calculated from (7) are similar, being roughly in the range . These results confirm that the modified process is a reasonable model for obtaining rating data with the observed parameters, despite the discrepancy between the values for . This discrepancy is not surprising, since the modified model, as a first approximation, is clearly an oversimplification. We note that the value of seems to have very little effect on the results, although it is possible that some pattern might be noticeable if a significantly larger value for was used.

Shift (urns)  Skewness  Mean  Std. dev.
Actual data (January 2010)  -0.2284 2015.650 209.622
0 -0.0920 2105.641 186.980
1 -0.0884 2097.907 187.030
2 -0.0933 2089.894 188.557
3 -0.0995 2082.068 190.721
4 -0.1064 2074.321 193.440
5 -0.1276 2066.435 197.046
6 -0.1541 2058.940 201.387
7 -0.1881 2050.446 206.373
8 -0.2309 2043.175 211.843
9 -0.2745 2035.115 218.153
10 -0.3206 2027.404 224.623
11 -0.3742 2019.250 232.101
12 -0.4227 2012.005 239.422


Table 5: Simulation results allowing skewness, for
Shift (urns)  Skewness  Mean  Std. dev.
Actual data (January 2010)  -0.2284 2015.650 209.622
0 -0.0907 2105.510 186.788
1 -0.0917 2097.558 187.265
2 -0.0963 2089.933 188.447
3 -0.1008 2082.062 190.765
4 -0.1114 2074.058 193.490
5 -0.1305 2066.190 197.117
6 -0.1532 2058.516 201.448
7 -0.1845 2050.812 206.251
8 -0.2297 2043.222 212.007
9 -0.2744 2035.224 217.962
10 -0.3226 2027.335 224.759
11 -0.3722 2019.737 231.852
12 -0.4346 2011.701 239.575
Table 6: Simulation results allowing skewness, for
Shift (urns)  Skewness  Mean  Std. dev.
Actual data (January 2010)  -0.2284 2015.650 209.622
0 -0.0931 2105.713 186.641
1 -0.0905 2097.470 187.274
2 -0.0929 2090.024 188.341
3 -0.1014 2082.383 190.398
4 -0.1105 2074.388 193.320
5 -0.1230 2066.305 196.642
6 -0.1571 2058.612 201.401
7 -0.1860 2050.754 206.245
8 -0.2310 2042.906 211.766
9 -0.2749 2035.191 217.848
10 -0.3249 2027.141 224.830
11 -0.3716 2019.805 231.613
12 -0.4290 2011.446 239.475
Table 7: Simulation results allowing skewness, for
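The closest-fit comparison discussed above can be checked mechanically. The following is a minimal sketch in Python using the values from Table 5 only. It assumes that the unlabelled first row of the table holds the statistics of the actual rating data and that the three columns are skewness, mean and standard deviation, in that order; the names ACTUAL, SIMULATED and closest_fit are illustrative and do not come from the paper.

```python
# Values transcribed from Table 5.
# Assumption: the unlabelled first row is the actual rating data, and the
# columns are (skewness, mean, standard deviation).
ACTUAL = (-0.2284, 2015.650, 209.622)

# Row label (the varied parameter) -> simulated (skewness, mean, std. dev.)
SIMULATED = {
    0: (-0.0920, 2105.641, 186.980),
    1: (-0.0884, 2097.907, 187.030),
    2: (-0.0933, 2089.894, 188.557),
    3: (-0.0995, 2082.068, 190.721),
    4: (-0.1064, 2074.321, 193.440),
    5: (-0.1276, 2066.435, 197.046),
    6: (-0.1541, 2058.940, 201.387),
    7: (-0.1881, 2050.446, 206.373),
    8: (-0.2309, 2043.175, 211.843),
    9: (-0.2745, 2035.115, 218.153),
    10: (-0.3206, 2027.404, 224.623),
    11: (-0.3742, 2019.250, 232.101),
    12: (-0.4227, 2012.005, 239.422),
}

STATISTICS = ("skewness", "mean", "standard deviation")


def closest_fit(stat_index):
    """Return the row label whose simulated statistic is nearest the actual one."""
    return min(
        SIMULATED,
        key=lambda row: abs(SIMULATED[row][stat_index] - ACTUAL[stat_index]),
    )


if __name__ == "__main__":
    for index, name in enumerate(STATISTICS):
        row = closest_fit(index)
        gap = abs(SIMULATED[row][index] - ACTUAL[index])
        print(f"{name}: closest row {row}, absolute difference {gap:.4f}")
```

Under these assumptions, the row labelled 8 is nearest to the actual skewness and standard deviation, while rows 11 and 12 are almost tied for the mean (absolute differences of about 3.600 and 3.645 respectively).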

8 Concluding Remarks

We have constructed a stochastic evolutionary urn model that generates the distribution of players’ ratings and have validated this model using published official rating data on chess players. For the symmetric case, our analysis of the model yielded a Gaussian distribution, which has the interesting feature that the variance increases logarithmically with time. This implies that the distribution of ratings is quite stable, but has a tendency to flatten extremely slowly over time. These results were validated by simulating the model. Although the data is well approximated by a Gaussian, there is a small negative skew present in the data. An improvement can be made to the model to account for this by breaking the symmetry and putting new players into lower-numbered urns, corresponding to new players generally having lower than average ratings. The modified stochastic process was validated by simulation starting with actual rating data. Deriving the distribution for the modified process analytically remains an open problem.

Throughout the paper we have assumed that the K-factor is fixed at 20. It would be interesting to allow the K-factor to vary with players’ ratings and the number of games they have played, as suggested in [GJ99], and to see whether such a modification could shed some light on the K-factor controversy mentioned in Section 2.
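To make the role of the K-factor concrete, the following minimal Python sketch uses the standard Elo update rule, which is not necessarily the exact form used in the urn model of this paper. The function names and the thresholds in variable_k are hypothetical, chosen only to illustrate a K-factor that depends on rating and on the number of games played; they are not taken from [GJ99].

```python
def expected_score(rating, opponent_rating):
    """Standard Elo expected score of a player against an opponent."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


def update_rating(rating, opponent_rating, score, k=20.0):
    """Return the new rating after one game.

    score is 1.0 for a win, 0.5 for a draw and 0.0 for a loss.
    k defaults to the fixed K-factor of 20 assumed throughout the paper.
    """
    return rating + k * (score - expected_score(rating, opponent_rating))


def variable_k(rating, games_played):
    """Hypothetical K-factor schedule (illustrative thresholds only)."""
    if games_played < 30:
        return 40.0  # ratings of new players move quickly
    if rating < 2400:
        return 20.0
    return 10.0  # ratings of established strong players move slowly


if __name__ == "__main__":
    # Two equally rated players: the winner gains K/2 points.
    print(update_rating(2000, 2000, 1.0))
    # The same result for a newcomer under the hypothetical schedule.
    print(update_rating(2000, 2000, 1.0, k=variable_k(2000, 5)))
```

With a fixed K-factor every game moves a rating by at most K points, so a schedule such as variable_k changes how quickly different groups of players spread out across the rating range.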

References

  • [BSMG09] M. Bilalić, K. Smallbone, P. McLeod, and F. Gobet. Why are (the best) women so good at chess? Participation rates and gender differences in intellectual domains. Proceedings of the Royal Society of London, Series B, 276:1161–1165, 2009.
  • [BT52] R.A. Bradley and M.E. Terry. Rank analysis of incomplete block designs, I. The method of paired comparisons. Biometrika, 39:324–345, 1952.
  • [CG96] N. Charness and Y. Gerchak. Participation rates and maximal performance: A log-linear explanation for group differences, such as Russian and male dominance in chess. Psychological Science, 7:46–51, 1996.
  • [Dav88] H.A. David. The Method of Paired Comparisons. Hodder Arnold, London, UK, 2nd edition, 1988.
  • [DB03] K.A. Dill and S. Bromberg. Molecular Driving Forces, Statistical Thermodynamics in Chemistry and Biology. Garland Science, New York, NY, 2003.
  • [EHP00] M. Evans, N. Hastings, and B. Peacock. Statistical Distributions. John Wiley & Sons, New York, NY, 3rd edition, 2000.
  • [Elo86] A.E. Elo. The Rating of Chessplayers Past & Present. ARCO Publishing, New York, NY, 2nd edition, 1986.
  • [GJ99] M.E. Glickman and A.C. Jones. Rating the chess rating system. Chance, 12:21–28, 1999.
  • [Gli99] M.E. Glickman. Parameter estimation in large dynamic paired comparison experiments. Applied Statistics, 48:377–394, 1999.
  • [Hen92] R.J. Henery. An extension of the Thurstone-Mosteller model for chess. The Statistician, 41:559–567, 1992.
  • [HMG06] R. Herbrich, T. Minka, and T. Graepel. TrueSkill: A Bayesian skill rating system. In Proceedings of Advances in Neural Information Processing Systems (NIPS) 19, pages 569–576, Vancouver, B.C., Canada, 2006.
  • [Joe91] H. Joe. Rating systems based on paired comparison models. Statistics & Probability Letters, 11:343–347, 1991.
  • [LFLW02] M. Levene, T.I. Fenner, G. Loizou, and R. Wheeldon. A stochastic model for the evolution of the Web. Computer Networks, 39:277–287, 2002.
  • [Mot95] H. Motulsky. Intuitive Biostatistics. Oxford University Press, Oxford, 1995.
  • [OS01] M. Opper and D. Saad, editors. Advanced Mean Field Methods; Theory and Practice. MIT Press, Cambridge, MA, 2001.
  • [RG04] J. Rudnick and G. Gaspari. Elements of the Random Walk, An Introduction for Advanced Students and Researchers. Cambridge University Press, Cambridge, UK, 2004.
  • [Son09] J. Sonas. Ratings summit in Athens. See http://www.chessbase.com/newsdetail.asp?newsid=5527, June 2009.
  • [Zul09] D. Zult. On the increase of the K-factor (Update). See www.chessvibes.com/reports/on-the-increase-of-the-k-factor, May 2009.