Log In Sign Up

Estimating the concentration parameter of a von Mises distribution: a systematic simulation benchmark

by   Guillaume Marrelec, et al.

In directional statistics, the von Mises distribution is a key element in the analysis of circular data. While there is a general agreement regarding the estimation of its location parameter μ, several methods have been proposed to estimate the concentration parameter κ. We here provide a thorough evaluation of the behavior of 12 such estimators for datasets of size N ranging from 2 to 8 192 generated with a κ ranging from 0 to 100. We provide detailed results as well as a global analysis of the results, showing that (1) for a given κ, most estimators have behaviors that are very similar for large datasets (N ≥ 16) and more variable for small datasets, and (2) for a given estimator, results are very similar if we consider the mean absolute error for κ≤ 1 and the mean relative absolute error for κ≥ 1.


page 1

page 2

page 3

page 4


On Uses of Mean Absolute Deviation: Shape Exploring and Distribution Function Estimation

Mean absolute deviation function is used to explore the pattern and the ...

Equivariant Estimation of the Selected Guarantee Time

Consider two independent exponential populations having different unknow...

Stochastic modeling of in vitro bactericidal potency

We provide a Galton–Watson model for the growth of a bacterial populatio...

Inference for spherical location under high concentration

Motivated by the fact that many circular or spherical data are highly co...

An Estimation and Analysis Framework for the Rasch Model

The Rasch model is widely used for item response analysis in application...

1 Introduction

In directional statistics, the von Mises distribution is a key element in the analysis of circular data. Its distribution is given by


where is the location parameter, the concentration parameter, and the incomplete Bessel integral of the first kind,


The von Mises distribution is a unimodal distribution with its mode at .

corresponds to the uniform distribution, and the larger

, the more concentrated the distribution is around its mode. The von Mises distribution is a convenient way to model unimodal circular or directional data in many fields of science (Fisher, 1995) and naturally arises in a wide variety of cases (Mardia and Jupp, 2000, §3.5.4)

. It can be obtained by conditioning a bivariate normal distribution on the circle, is close to the wrapped normal distribution, and tends to a normal distribution when

. Also, it is a maximum-entropy distribution and can be characterized by the fact that the maximum likelihood of its location parameter is the sample mean direction. As such, it is an analogue on the circle of the Gaussian distribution. It also appears as the equilibrium distribution of a von Mises process.

While there is a general agreement regarding the estimation of (Mardia and Jupp, 2000, §12.4.1), estimation of has generated more research, and several methods have been proposed so far. For a partial review, see, e.g., Mardia and Jupp (2000, §12.4.2). Here, we provide a systematic benchmark of existing estimators. More specifically, we considered 12 estimators: seven frequentist estimators, including the maximum-likelihood estimator (Mardia and Jupp, 2000, §5.3.1), the marginal maximum-likelihood estimator (Schou, 1978), two estimators with bias correction (Best and Fisher, 1981), two estimators based on the circular median (Lenth, 1981; Ko, 1992), and an estimator based on the normal approximation (Abeyasekera and Collett, 1982); and five Bayesian estimators (Dowe et al., 1996), including three maximum a posteriori (MAP) estimators and two minimum message length (MML) estimators. We assessed the behavior of these 12 estimators for datasets of size ranging from 2 to 8 192 generated with a ranging from 0 to 100. We provide detailed results as well as a more global view of the estimators’ performance. Our two main findings are that (1) for a given , most estimators have behaviors that are very similar for large datasets () and much more variable for small datasets, and (2) for a given estimator, results are very similar if one consider the mean absolute error for and the mean relative absolute error for .

The outline of the manuscript is the following. In Section 2, we provide a quick description of the 12 estimators used here. The simulation study itself is detailed in Section 3. Further issues are raised in the discussion.

2 Overview of existing estimators

We here quickly review existing approaches, including frequentist (Section 2.1) and Bayesian (Section 2.2) estimators. All approaches start from a sample of independent and identically distributed (i.i.d.) realization of a distribution with and unknown.

2.1 Frequentist estimators

Most frequentist estimators of relie on the likelihood of the parameters. Letting be the sample circular mean of the data,


the likelihood can be expressed from Equation (1) and the properties of as


The maximum-likelihood estimator is the value of that cancels the derivative of . Setting


it is the solution of (Mardia and Jupp, 2000, §5.3.1)


A marginal maximum-likelihood estimate was also proposed. It is based on the expression of the density of with respect to Lebesgue measure (Watson and Williams, 1956; Mardia, 1975),


The maximum of this expression depends on the relationship between and (Schou, 1978)

  • For , ;

  • For , is the solution of

  • For , there is no maximum (as it would correspond to ).

Using both a calculation based on a Taylor expansion around of from Equation (6) and a simulation study, Best and Fisher (1981) showed that the (regular) maximum-likelihood estimator of Equation (6) can be strongly biased for small and . They proposed to correct for this bias using an approximate expansion, leading to


They also proposed a jackknife correction of the bias,


By analogy with the median absolute deviation, Lenth (1981) proposed an estimator based on the median


where is the circular median (Mardia and Jupp, 2000, §2.2.2). Ko (1992) improved this approach by replacing the expectation and average of Equation (6) with median and sampling median (see also Ducharme and Milasevic, 1990), yielding an estimator such that


where is any estimator of that is standardized bias (SB) robust, such as the circular median or the least median square (LMS) estimator.

Finally, Abeyasekera and Collett (1982) made use of the fact that a distribution is well approximated by the normal distribution for large and therefore proposed as estimator of

the unbiased estimator for

, leading to the so-called linear estimator for


where is the linear mean, that is, the usual arithmetic mean

Note that there is an issue induced by the discontinuity at 0, which is dealt with by using a technical trick. They also proposed an improved estimator where the bias is corrected by jackknifing the linear estimator.

2.2 Bayesian estimators

In a Bayesian setting, all the information that can be inferred about the parameters and from a dataset is summarized in the posterior distribution

, which, according to Bayes’ theorem, can be expressed as


where “” relates two expressions that are proportional. is the data likelihood, already expressed in Equation (4). is the prior distribution, which translates the information we have about the parameters before we have the data. A common approach is to assume no prior dependence between parameters, so that . is classically set as a noninformative uniform distribution on the circle, . As for , Dowe et al. (1996) considered the following two prior distributions


The posterior distribution then yields


They then proposed to use the maximum a posteriori (MAP) estimators


For , they also proposed to consider estimating as the MAP of the distribution of , leading to


Finally, they proposed two minimum message length (MML) estimators. Minimizing the message length is equivalent to maximizing


where is the determinant of the Fisher matrix for the von Mises distribution. The resulting estimators for and are





2.3 Existing simulation studies

We found three existing simulations studies assessing the relative performance of estimators of the concentration parameter: Best and Fisher (1981), Abeyasekera and Collett (1982), and Dowe et al. (1996). Their scope and methodology are summarized in Table 1. Best and Fisher (1981) found that and were similar and better than for small values of and . was less biased than and was less median biased. However, the bias remained for small and . Abeyasekera and Collett (1982) found that was better than , and ; that was better than for small values of and worse when was large; and, finally, that and had similar behaviors. Finally, Dowe et al. (1996) found that Bayesian methods outperformed frequentist methods, with a limited effect of the prior distribution on the behavior.

Article Estimators Measures of fit
Best and Fisher (1981) qualitative on sampling features
, qualitative on sampling features
100 , qualitative on sampling features
, qualitative on sampling features
100 , , qualitative on sampling features
Abeyasekera and Collett (1982) 1 000 , , , sample bias and MSE
Dowe et al. (1996) 1 000 , , , MAE, MSE, MKL
1 000 , , , MAE, MSE, MKL
our simulation 1 000 see Table 2 MAE, MRAE
Table 1: Existing simulation. Summary of scope and methodology. is the sample size and

the number of samples used to compute mean errors. MAE: mean absolute error; MSE: mean squared error; MKL: mean Kullback–Leibler divergence; MRAE: mean relative absolute error.

3 Simulation study

3.1 Data

We generated samples from von Mises distributions. More specifically, for each of the values of , we generated “maximal” datasets. Each dataset was composed of i.i.d. realizations of a distribution with uniformly distributed on the circle and with (i.e., ). Each sample was generated using Berens (2009)’s CircStat. This procedure therefore generated a total of maximal datasets of size . From each maximal dataset, we then extracted datasets composed of the first data points with . This gave us a total of datasets on which inference was performed. On each dataset, estimation of was performed by application of each of the estimators mentioned above (see also Table 2 for a summary). We therefore ended with a set of estimates.

Estimator Equation Identification
(6) jML
(8 mML
(9) BF1
(10) BF2
(11) median-1
(12) median-2
(13) linear
(17) BayesEst-2-jMAP-km
(17) BayesEst-3-jMAP-km
(18) BayesEst-3-jMAP-xy
(20) MML-2
(21) MML-3
Table 2: Simulation study. Summary of estimators used.

3.2 Evaluation

To assess the behavior of the various estimations, we proceeded as follows. For estimation , obtained by application of estimator to the first data points of simulation associated with the th value of , we first considered the absolute error of the estimation compared to the true value


We then summarized the results over the simulations by computing the mean absolute error (MAE), defined as a sampling average


In the same fashion, we also considered the relative error, where the above error of Equation (22) is normalized by the true value


and the mean relative absolute error (MRAE) as its sampling average over the simulations


3.3 Results

Detailed results can be found in the supplemental material, in the form of summary statistics (supplemental material, Tables 1–12) and graphs (supplemental material, Figures 1–6). We here focus on some key features of the behavior: computation time, algorithmic failures, the global trend as a function of , the behavior for large samples (), and the small-sample behavior.

Computation time.

The detailed computation times can be found in the supplemental material, Tables 1–12, columns 3 and 4. For all estimators, the actual value of seemed to have very limited influence on the computation time. Similarly, for most estimators (jML, mML, BF1, linear, MML-3, BayesEst-2-jMAP-km, BayesEst-3-jMAP-km, and BayesEst-3-jMAP-xy), the computation time was not a function of the sampling size; it was of the order of 40 ms for BF1 and of 20 ms for the others (see Table 3). Three estimators had computation times that were a function of data size: BF2, median1, and median2 (see Figure 1). For BF2, the computation time was roughly linear, with a doubling of the data size corresponding to a doubling of the computation time. For median1 and median2, the computation time was roughly constant up to and then increased exponentially.

Estimator Time (mean std dev)
jML ms
mMLl ms
BF1 ms
linear ms
MML-2 ms
MML-3 ms
BayesEst-2-jMAP-km ms
BayesEst-3-jMAP-km ms
BayesEst-3-jMAP-xy ms
Table 3: Simulation study. Computation time for estimators that were not influenced by sample size. Mean standard deviation across values of , , and simulations.
Figure 1: Simulation study. Computation time for estimators that were influenced by sample size. Boxplots (median and percentile interval) across values of and simulations.


We first report cases where methods failed to compute an estimate (see Table 4). This happened for median-2 (corresponding to ) and linear (corresponding to ). By definition, of Equation (13) is not defined for . Also, the second median estimator, , was not always defined, as they were cases for which the integral of Equation (12) was smaller than for all (and, in particular for ). Cases corresponding to failures were removed to compute the summary statistics and .

median2 linear
median2 linear
Table 4: Simulation study. Summary of number of failures encountered for all and . No failure was encountered for jML, mML, BF1, BF2, median1, MML-2, MML-3, BayesEst-2-km, BayesEst-3-km, and BayesEst-3-xy.

Evolution as a function of .

Graphs corresponding to and for all estimators gathered by value of can be found in Figure 2. Globally, most estimators improved as the sample size increased. One clear exception was BF2 for small and low . In that particular case, the method tended to estimate as equal to 0, leading to an error of the order of , with a correction toward a more normal trend as the data size increased. This behavior was also observed to a much lesser extend for the maximum message length estimators (MML-2 and MML-3) and the Bayesian estimators (BayesEst-2-jMAP-km, BayesEst-3-jMAP-km, and BayesEst-3-jMAP-xy).

Figure 2: Simulation study. Average error for all values of and all estimators.

Large-sample behavior.

We observed quite homogeneous results for large datasets (), both within- and between estimators. First, the mean absolute error roughly decreased linearly in log-log coordinates for most estimators. The two exceptions were median1 for low , where the estimator seemed to reach a plateau independent of , and linear, for which data samples of increasing size did not seem to improve estimation. Also, the performance of a given method for different values of were quite similar when one considered the mean absolute error for and the mean relative absolute error for . The main exception was median1 due to the difference in behavior for low values versus high values of mentioned earlier in the paragraph. For the other estimators, the slope of as a function of was found to be around , corresponding to a decrease of mean (relative) absolute error in . Finally, most estimators behaved quite similarly in this large-sample regime, at the exception of median1, linear and, to a lesser extend, median2.

To quantify these large-sample behaviors, we fitted to each curve a model of linear regression, either


for , or


for . The results are summarized in Figure 3, confirming generally homogeneous behaviors (at the exception of median1, median2 and linear) that tend to depend more on the value of than on the specifics of the estimation method.

(a) estimated slope (b) estimated intercept
(c) predicted value for (d) predicted value for
(e) standard deviation of residual noise
Figure 3: Simulation study.

Result of linear regression analysis (

) for either Equation (26) (for ) or Equation (27) (for ). (a) Estimated slope . (b) Estimated intercept . (c) Predicted values of (for ) or (for ) at . (d) Predicted values of (for ) or (for ) at . (e) Standard deviation of residual noise.

Small sample behavior.

Behaviors were much more variable for small sample sizes, with both a variability between methods and, for a given method, between values of . A majority of estimators (jML, mML, BF1, median1, median2, MML-2, MML-3) tended to have vey large average errors for very small sample sizes (). It was found that these extremely large values were due to a few very large errors. For instance, in the case of MML-2, and , all but one of the 1 000 repetitions had errors smaller than 1, the remaining error being equal to .

To quantify this behavior, we modeled deviation from the linear trend as an exponential decay, either


for or


for . is the maximum departure of the mean (relative) absolute error from the linear trend and corresponds to the increase in mean (relative) absolute error at . A positive value indicates a mean (relative) absolute error that is larger than the linear expectation (and therefore a decreased performance), while a negative value indicates a smaller mean (relative) absolute error (and therefore an increased performance). is such that the departure from the linear trend is divided by 10 for . The larger the value of , the slower the decay. The results are summarized in Figure 4.

Regarding , we found several groups of estimators. A first group (jML, mML, BF1, median1, median2) had large positive values of regardless of . A second group (MML-1 and MML-2) had large positive values of only for , and small negative values for . A third group (linear, BayesEst-2-km, BayesEst-3-km, BayesEst-3-xy) had low (positive and negative) values of for all values of . A last estimator (mML) had small positive values of for and negative, small yet larger values of for .

Regarding , all values were found to be in the range . All estimators with large values of (corresponding to the first two groups mentioned earlier) had values in the range , indicating a very fast exponential decay — for practical purposes, it mostly affected datasets of size .

(a) estimated decay amplitude (b) estimated decay time
(c) standard deviation of residual noise
Figure 4: Simulation study. Result of nonlinear regression analysis for either Equation (28) (for ) or Equation (29) (for ). (a) Estimated decay amplitude . (b) Estimated decay time . (c) Standard deviation of residual noise (bottom left).

4 Discussion

In the present manuscript, we assessed the behavior of 12 estimators of the concentration parameter of a von Mises distribution. We used synthetic datasets of size ranging from 2 to 8 192 generated with a ranging from 0 to 100. We provided detailed results as well as a more global view of the estimators’ performance. We found that, for a given estimator, results were very similar across values of if we considered the mean absolute error for and the mean relative absolute error for . We also found that, for a given , most estimators had similar behaviors for large datasets (), while the behaviors differed more strongly for small datasets.

The fact that the mean absolute error for was similar to the mean relative absolute error for has two consequences. It first shows that, for , error increases linearly as a function of . Larger values of are therefore expected to lead to larger estimation errors. While it also means that smaller values of tend to yield smaller estimation errors, this behavior cannot be used to our advantage for , where it would mean that error vanishes as . By contrast, it is the mean absolute error that remains stable for , i.e., error reaches a plateau when decreases below .

For large datasets, the common behavior of most estimators was found to be that the mean absolute error roughly decreased linearly in log-log coordinates, with a slope around , corresponding to a decrease of mean (relative) absolute error in . This is in line with the general estimation theory, where errors are often found to decrease as .

In the light of our results, we are able to give general practical guidelines regarding the use of the estimators tested here. First, we would not recommend the use of median-1, median-2 and linear unless there is evidence that they might perform well in the specific context of interest. For large datasets (), all other estimators perform in a similar fashion and there is no obvious reason to recommend one or the other. In this context, BF2’s computational burden is a disadvantage with no compensation in terms of performance. By contrast, for small datasets, BF2 was found to be the estimator with the best behavior, still at the price of a high computational cost as well as a consistent underestimation of the concentration parameter when it was small but different from 0. For instance, for , it estimated as 0 regardless of its actual value (supplemental material, Table 5). The Bayesian estimators, which only performed a little worse than BF2 but had computational costs similar to the other estimators, could prove valuable trade-offs.

Importantly, all estimators considered here require a sample of independent and identically distributed (i.i.d.) realization of a distribution with and

unknown. Our simulation study faithfully respected this requirement. As a consequence, it did not explore the behavior of estimators with respect to the presence of outliers or model misspecification. Another consequence of the fact that we did not depart from the von Mises model is that we did not consider estimators assuming that one or several observations are identified as outliers (e.g., Winsorized estimate of

Fisher, 1982). Also, some of the estimators proposed in the literature depend on parameters (see, e.g., Lenth, 1981; Kato and Eguchi, 2016). Since the behavior of these estimators strongly depends on the choice of the parameters, they were not incorporated in the analysis.

As mentioned earlier, there were cases where some of the estimators were not defined. This happened for the linear estimator and , since this estimator requires at least 4 data points. The median-based estimator also failed sometimes, in particular for small values of , as Equation (12) had no solution (the integral was smaller than 0.5 for all values of ). In such cases, we could have decided to set the estimator to 0. However, we believe that such problem is symptomatic of a deeper problem whose investigation goes beyond the scope of the present manuscript. So we just took the estimator as is.

As a final note, the routines calculating the various estimators were written by us and were not optimized. As such, the computation times presented here should only serve as a rough indicator of the time required to apply each of them. In particular, it is possible that optimal coding of BF2 with a compiled language could greatly improve its computation time.


  • Abeyasekera and Collett (1982) Abeyasekera, S. and D. Collett (1982). On the estimation of the parameters of the von Mises distribution. Communications in Statistics – Theory and Methods 11(18), 2083–2090.
  • Berens (2009) Berens, P. (2009). CircStat: a MATLAB toolbox for circular statistics. Journal of Statistical Software 31(10), 21 pages.
  • Best and Fisher (1981) Best, D. J. and N. I. Fisher (1981). The bias of the von Mises–Fisher concentration parameters. Communications in Statistics – Simulation and Computation B10(5), 493–502.
  • Dowe et al. (1996) Dowe, D. L., J. J. Oliver, R. A. Baxter, and C. S. Wallace (1996). Bayesian estimation of the von Mises concentration parameter. In K. M. Hanson and R. N. Silver (Eds.), Maximum Entropy and Bayesian Methods. Fundamental Theories of Physics, Volume 79 of An International Book Series on The Fundamental Theories of Physics: Their Clarification, Development and Application, pp. 51–60. Springer, Dordrecht.
  • Ducharme and Milasevic (1990) Ducharme, G. R. and P. Milasevic (1990). Estimating the concentration of the Langevin distribution. The Canadian Journal of Statistics / La Revue Canadienne de Statistique 18(2), 163–169.
  • Fisher (1982) Fisher, N. I. (1982). Robust estimation of the concentration parameter of Fisher’s distribution on the sphere. Applied Statistics 31, 152–154.
  • Fisher (1995) Fisher, N. I. (1995). Statistical Analysis of Circular Data. Cambridge University Press.
  • Kato and Eguchi (2016) Kato, S. and S. Eguchi (2016). Robust estimation og location and concentration parameters for the von Mises–Fisher distribution. Statistical Papers 57(1), 205–234.
  • Ko (1992) Ko, D. (1992). Robust estimation of the concentration parameter of the von Mises–fisher distribution. The Annals of Statistics 20(2), 917–928.
  • Lenth (1981) Lenth, R. V. (1981). Robust measure of location for directional data. Technometrics 23(1), 77–81.
  • Mardia (1975) Mardia, K. V. (1975). Distribution theory for the von Mises-Fisher distribution and its application. In G. P. Patil, S. Kotz, and J. K. Ord (Eds.), A Modern Course on Statistical Distributions in Scientific Work. volume 1 – Models and Structures, Volume 17 of NATO Advanced Study Institutes Series. Series C: Mathematical and Physical Sciences, pp. 113–130. D. Reidel Publishing Company, Dordrecht, Holland.
  • Mardia and Jupp (2000) Mardia, K. V. and P. E. Jupp (2000). Directional Statistics.

    Wiley Series in Probability and Statistics. Wiley, Chichester.

  • Schou (1978) Schou, G. (1978). Estimation of the concentration parameter in von Mises-Fisher distributions. Biometrika 65(2), 369–377.
  • Watson and Williams (1956) Watson, G. S. and E. J. Williams (1956). On the construction of significance tests on the circle and the sphere. Biometrika 43(3/4), 344–352.