Efficient posterior sampling for Bayesian Poisson regression

by   Laura D'Angelo, et al.
Università di Padova

Poisson log-linear models are ubiquitous in many applications and are among the most popular approaches for parametric count regression. In the Bayesian context, however, there is a lack of specific computational tools for efficient sampling from the posterior distribution of the parameters, and standard algorithms, such as random walk Metropolis-Hastings or Hamiltonian Monte Carlo, are typically used. Herein, we develop an efficient Metropolis-Hastings algorithm and an importance sampler to simulate from the posterior distribution of the parameters of Poisson log-linear models under conditional Gaussian priors, with superior performance with respect to the state-of-the-art alternatives. The key for both algorithms is the introduction of a proposal density based on a Gaussian approximation of the posterior distribution of the parameters. Specifically, our result leverages the negative binomial approximation of the Poisson likelihood and the successful Pólya-gamma data augmentation scheme. Simulations show that the time per independent sample of the proposed samplers is competitive with that obtained using Hamiltonian Monte Carlo sampling, with the Metropolis-Hastings algorithm showing superior performance in all scenarios considered.




1 Introduction

Poisson log-linear models are common in statistics and represent one of the most popular choices to model how the distribution of count data varies with predictors. A typical assumption is that, under an independent sample of counts $y_1, \dots, y_n$, the probability mass function of the generic $y_i$, conditionally on a $p$-dimensional vector of covariates $x_i$, is

$$\Pr(Y_i = y_i \mid x_i) = \frac{\lambda_i^{y_i}}{y_i!}\, e^{-\lambda_i}, \qquad \log \lambda_i = x_i^\top \beta, \tag{1}$$

where $\beta$ is a $p$-dimensional vector of unknown coefficients. Linking the linear predictor $x_i^\top \beta$ and the parameter $\lambda_i$ with the logarithm represents the most natural choice, as the logarithm is the canonical link for the Poisson family (Nelder and Wedderburn, 1972). This model has broad application in several fields, including medicine and epidemiology (Frome, 1983; Frome and Checkoway, 1985; Hutchinson and Holtman, 2005), manufacturing process control (Lambert, 1992), analysis of accident rates (Joshua and Garber, 1990; Miaou, 1994), and crowd counting (Chan and Vasconcelos, 2009), among others.

In the Bayesian context, model (1) does not enjoy any conjugacy property and, thus, regardless of the prior used, the posterior distribution of $\beta$ is not available in closed form. Consequently, inference is conducted using Markov chain Monte Carlo (MCMC) methods, which obtain a sample from the posterior distribution of the parameters. Several approaches have focused on how to easily obtain the posterior distribution of the coefficients of Poisson models without requiring complex tuning strategies or long computation times. In the context of count-valued time series, Frühwirth-Schnatter and Wagner (2006) proposed a formulation of the model based on two levels of data augmentation, to derive an efficient approximate Gibbs sampler. Frühwirth-Schnatter et al. (2009) exploited a data augmentation strategy to simplify the computation of hierarchical models for count and multinomial data. Data augmentation strategies have also been employed in the case of models for multivariate dependent count data (Karlis and Meligkotsidou, 2005; Bradley et al., 2018). However, the simplest Poisson regression in (1) still lacks a specific and efficient algorithm to sample from the posterior distribution of the parameters for any prior choice, making the Metropolis-Hastings (Hastings, 1970) or Hamiltonian Monte Carlo (HMC) (Neal, 2011) algorithms the only available options.

On the other hand, several efficient computational strategies for binary regression models have been proposed in the literature. Using the probit link, Albert and Chib (1993) proposed an efficient data augmentation based on a latent Gaussian variable, while the more recent contribution by Polson et al. (2013) exploited the canonical logit link, introducing an efficient Pólya-gamma data augmentation scheme. Leveraging the approach of Polson et al. (2013), we propose a novel approximation of the posterior distribution that can be exploited as the proposal distribution of a Metropolis-Hastings algorithm or as the importance density of an importance sampler for Poisson log-linear models with conditional Gaussian prior distributions on the regression parameters. By conditional Gaussian prior, we refer to a possibly hierarchical prior with conditional distribution $\beta \mid b, B \sim N_p(b, B)$, with $b$ and/or $B$ random. Examples include straightforward Gaussian prior distributions with $b$ and $B$ fixed using prior information, and scale mixtures of Gaussians where $b$ is set to zero and the variance has a suitable hierarchical representation, such as the Bayesian lasso prior (Park and Casella, 2008), the horseshoe prior, and its extensions (Carvalho et al., 2010; Piironen and Vehtari, 2017).

More specifically, we introduce an approximation of the posterior density that exploits the negative binomial convergence to the Poisson distribution. Thanks to this result, we are able to leverage the Pólya-gamma data augmentation scheme of Polson et al. (2013) to derive an efficient sampling scheme. In the next section, we introduce and discuss the proposed algorithms, starting from the definition of an approximate posterior distribution whose sampling can be performed straightforwardly. Sampling from this approximate posterior is then used as the proposal density for the Metropolis-Hastings algorithm or the importance sampler. The performance of the proposed algorithms in terms of computational efficiency is compared with that of state-of-the-art methods in a simulation study. The paper concludes with two illustrative applications.

2 Efficient posterior sampling strategies

2.1 Approximate posterior distribution

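Assume $y = (y_1, \dots, y_n)$ is an independent sample of counts from model (1). We introduce an approximation of the posterior density which exploits the negative binomial convergence to the Poisson distribution, i.e., we approximate the $i$-th contribution to the likelihood function with $\tilde{L}_{r_i}(\beta; y_i)$, where

$$\tilde{L}_{r_i}(\beta; y_i) = \binom{r_i + y_i - 1}{y_i} \left(\frac{r_i}{r_i + \lambda_i}\right)^{r_i} \left(\frac{\lambda_i}{r_i + \lambda_i}\right)^{y_i}, \tag{2}$$

which corresponds to the probability mass function of a negative binomial random variable with parameter $r_i$, the number of failures until the experiment is stopped, and success probability $\lambda_i / (r_i + \lambda_i)$. As $r_i$ approaches infinity, this quantity converges to a Poisson likelihood.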
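The convergence can be checked numerically. The following minimal sketch is not part of the original analysis: it evaluates the negative binomial probability mass function in the parametrization described above (number of failures r, success probability λ/(r + λ)) and compares it with the Poisson one. The function names and the values of λ and r are illustrative assumptions.

```python
import math


def poisson_logpmf(y, lam):
    """Log of the Poisson pmf with mean lam, evaluated at the count y."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def negbin_logpmf(y, lam, r):
    """Log of the negative binomial pmf with r failures and
    success probability lam / (r + lam), so that the mean equals lam."""
    log_binom = math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
    return (log_binom
            + r * math.log(r / (r + lam))
            + y * math.log(lam / (r + lam)))


def max_abs_diff(lam, r, y_max=50):
    """Largest absolute gap between the two pmfs over y = 0, ..., y_max."""
    return max(
        abs(math.exp(negbin_logpmf(y, lam, r)) - math.exp(poisson_logpmf(y, lam)))
        for y in range(y_max + 1)
    )


if __name__ == "__main__":
    # The gap shrinks as r grows (illustrative values, lam = 3).
    for r in (1.0, 10.0, 100.0, 1000.0, 10000.0):
        print(r, max_abs_diff(3.0, r))
```

In this toy check the discrepancy decreases as r increases, which is the trade-off discussed next: a larger r gives a better approximation at a higher sampling cost.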

Following Polson et al. (2013), we rewrite each $i$-th contribution to the approximate likelihood (2) by introducing augmented Pólya-gamma random variables $\omega_i$, i.e.

$$\tilde{L}_{r_i}(\beta, \omega_i; y_i) \propto \exp\left\{ \frac{y_i - r_i}{2}\, \eta_i - \frac{\omega_i \eta_i^2}{2} \right\} p(\omega_i \mid y_i + r_i, 0), \qquad \eta_i = x_i^\top \beta - \log r_i,$$

where $p(\omega_i \mid y_i + r_i, 0)$ denotes the density of a Pólya-gamma with parameters $(y_i + r_i, 0)$.

In what follows, we assume that prior knowledge about the unknown parameters is represented by a conditionally Gaussian prior, i.e. $\beta \mid b, B \sim N_p(b, B)$, with a possible hierarchical representation for the parameters $b$ and $B$. Examples include default informative Gaussian priors with fixed $b$ and $B$, or scale mixtures of Gaussians where $b$ is set to zero and the variance has a suitable hierarchical representation (Park and Casella, 2008; Carvalho et al., 2010; Piironen and Vehtari, 2017).

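The approximate posterior based on the conditionally Gaussian prior and approximate likelihood is consistent with the successful Gibbs sampler of Polson et al. (2013); i.e., sampling from the approximate posterior is equivalent to sampling iteratively from the following full conditionals

$$\omega_i \mid \beta \sim \mathrm{PG}(y_i + r_i,\; x_i^\top \beta - \log r_i), \quad i = 1, \dots, n, \qquad \beta \mid y, \omega \sim N_p(m_\omega, V_\omega), \tag{3}$$

where $V_\omega = (X^\top \Omega X + B^{-1})^{-1}$ and $m_\omega = V_\omega (X^\top \kappa + B^{-1} b)$, with $\Omega = \mathrm{diag}(\omega_1, \dots, \omega_n)$ and $\kappa = (\kappa_1, \dots, \kappa_n)^\top$, $\kappa_i = (y_i - r_i)/2 + \omega_i \log r_i$. Here $X$ denotes the $n \times p$ design matrix with rows $x_i^\top$.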
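For completeness, the step linking the negative binomial likelihood to a Gaussian full conditional for the coefficients can be written out as follows. This is a sketch based on the integral identity of Polson et al. (2013), not a reproduction of the original displays; the shorthand $\eta_i = x_i^\top \beta - \log r_i$ and $c_i = (y_i - r_i)/2$ is introduced here only for convenience.

```latex
\begin{align*}
\left(\frac{\lambda_i}{r_i + \lambda_i}\right)^{y_i}
\left(\frac{r_i}{r_i + \lambda_i}\right)^{r_i}
  &= \frac{(e^{\eta_i})^{y_i}}{(1 + e^{\eta_i})^{y_i + r_i}},
  \qquad \eta_i = x_i^\top \beta - \log r_i, \\
  &= 2^{-(y_i + r_i)}\, e^{c_i \eta_i}
     \int_0^\infty e^{-\omega_i \eta_i^2 / 2}\,
     p(\omega_i \mid y_i + r_i, 0)\, d\omega_i,
  \qquad c_i = \frac{y_i - r_i}{2}, \\
c_i \eta_i - \frac{\omega_i \eta_i^2}{2}
  &= (c_i + \omega_i \log r_i)\, x_i^\top \beta
     - \frac{\omega_i}{2}\, (x_i^\top \beta)^2 + \text{const}.
\end{align*}
```

Conditionally on the Pólya-gamma variables, the exponent is therefore quadratic in the coefficients, with precision contribution $\omega_i x_i x_i^\top$ and linear contribution $(c_i + \omega_i \log r_i)\, x_i$ from each observation. Combining these terms with a Gaussian prior yields a Gaussian full conditional, and the $\log r_i$ offset explains why the linear term depends on $\omega_i$.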

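The adherence of this approximate posterior to the true posterior highly depends on the values of $r_i$, with larger values of $r_i$ resulting in better approximations. However, when employing this result in posterior sampling, large values of $r_i$ imply longer computation time due to the computational cost of sampling Pólya-gamma random variables with large parameters. Although the specific choice of $r_i$ remains an open point, discussed later in Section 2.4, in the context of MCMC sampling we propose to reduce the computational burden related to the sampling of Pólya-gamma random variables by marginalizing the Gaussian distribution in (3) with respect to the related Pólya-gamma density conditioned on $\beta^{(t-1)}$, the last available sampled value. Since this marginalization is not available in closed form, we introduce a second level of approximation of the true posterior. Specifically, we introduce a density $q(\beta \mid \beta^{(t-1)})$ that depends on $\beta^{(t-1)}$, defined as the first-order Taylor expansion of the marginalized Gaussian distribution, i.e.

$$q(\beta \mid \beta^{(t-1)}) = N_p(\tilde{m}, \tilde{V}),$$

where $\tilde{V} = (X^\top \tilde{\Omega} X + B^{-1})^{-1}$, $\tilde{m} = \tilde{V} (X^\top \tilde{\kappa} + B^{-1} b)$, $\tilde{\Omega} = \mathrm{diag}(\tilde{\omega}_1, \dots, \tilde{\omega}_n)$, $\tilde{\kappa}_i = (y_i - r_i)/2 + \tilde{\omega}_i \log r_i$, and for each $i$ the conditional expectation $\tilde{\omega}_i = E(\omega_i \mid \beta^{(t-1)})$ of each $\omega_i$ is simply

$$\tilde{\omega}_i = \frac{y_i + r_i}{2\, \eta_i^{(t-1)}} \tanh\left( \frac{\eta_i^{(t-1)}}{2} \right), \qquad \eta_i^{(t-1)} = x_i^\top \beta^{(t-1)} - \log r_i,$$

or equivalently

$$\tilde{\omega}_i = \frac{y_i + r_i}{2\, \eta_i^{(t-1)}} \cdot \frac{e^{\eta_i^{(t-1)}} - 1}{e^{\eta_i^{(t-1)}} + 1}.$$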
The above construction is eventually used as the building block of efficient Metropolis-Hastings and importance sampling algorithms, as described in the following sections.

2.2 Metropolis-Hastings sampler

We employ the above sampling mechanism as the proposal density in a Metropolis-Hastings algorithm. Accordingly, at each iteration of the MCMC sampler, an additional step that accepts or rejects the proposed draw is introduced. Specifically, we assume that, conditionally on the current state of the chain $\beta^{(t-1)}$, a new value $\beta^*$ is sampled from (5). Then, the acceptance probability

$$\alpha(\beta^*, \beta^{(t-1)}) = \min\left\{ 1, \frac{\pi(\beta^* \mid y)\, q(\beta^{(t-1)} \mid \beta^*)}{\pi(\beta^{(t-1)} \mid y)\, q(\beta^* \mid \beta^{(t-1)})} \right\} \tag{6}$$

is evaluated to decide whether to accept or reject the proposed $\beta^*$, where $\pi(\cdot \mid y)$ is the exact posterior distribution.

To compute the acceptance probability in (6), the forward and backward transition densities $q(\beta^* \mid \beta^{(t-1)})$ and $q(\beta^{(t-1)} \mid \beta^*)$ must be computed. In this respect, approximation (4) is particularly useful: without it, it would be necessary to compute the marginal density where the Pólya-gamma random variables are integrated out. However, the marginalization with respect to the Pólya-gamma density does not lead to a closed-form expression, so without the approximation the Metropolis-Hastings algorithm could not be defined.

Clearly, for increasing $r_i$ the proposal density (5) is closer to the true full conditional distribution; hence, the related acceptance rate will be higher, and the Metropolis-Hastings algorithm will be similar to a Gibbs sampler. On the other hand, setting this parameter to get a lower acceptance rate can result in smaller autocorrelation, and hence better mixing (Robert and Casella, 2010). We discuss an approach to choose $r_i$ that balances these two extremes in Section 2.4.
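As an illustration of the accept/reject step, the following minimal sketch runs a Metropolis-Hastings chain for a Poisson regression with a single coefficient and a Gaussian prior. It builds the Gaussian proposal by plugging the Pólya-gamma mean, b/(2c) tanh(c/2) for a PG(b, c) variable, into the Gaussian full conditional, which is one reading of the construction in Section 2.1 and not the bpr implementation. The toy data, the fixed value r = 100 (the paper instead tunes it as in Section 2.4), and all function names are assumptions made for this example.

```python
import math
import random

# Toy data (hypothetical): one covariate, no intercept.
X_TOY = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
Y_TOY = [0, 1, 1, 2, 3, 4]


def pg_mean(b, c):
    """Mean of a Polya-gamma PG(b, c) variable: b / (2c) * tanh(c / 2)."""
    if abs(c) < 1e-8:
        return b / 4.0  # limit of the expression as c -> 0
    return b / (2.0 * c) * math.tanh(c / 2.0)


def proposal_params(beta, x, y, r, prior_mean, prior_var):
    """Mean and variance of the Gaussian proposal given the current beta."""
    log_r = math.log(r)
    precision = 1.0 / prior_var
    linear = prior_mean / prior_var
    for xi, yi in zip(x, y):
        eta = xi * beta - log_r
        w = pg_mean(yi + r, eta)  # plug-in value for the latent variable
        precision += w * xi * xi
        linear += xi * ((yi - r) / 2.0 + w * log_r)
    var = 1.0 / precision
    return linear * var, var


def log_normal_pdf(z, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (z - mean) ** 2 / var)


def log_posterior(beta, x, y, prior_mean, prior_var):
    """Exact log posterior of the Poisson model, up to an additive constant."""
    loglik = sum(yi * xi * beta - math.exp(xi * beta) for xi, yi in zip(x, y))
    return loglik + log_normal_pdf(beta, prior_mean, prior_var)


def mh_sampler(x, y, n_iter=4000, r=100.0, prior_mean=0.0, prior_var=10.0, seed=1):
    rng = random.Random(seed)
    beta, accepted, chain = 0.0, 0, []
    for _ in range(n_iter):
        m_fwd, v_fwd = proposal_params(beta, x, y, r, prior_mean, prior_var)
        prop = rng.gauss(m_fwd, math.sqrt(v_fwd))
        m_bwd, v_bwd = proposal_params(prop, x, y, r, prior_mean, prior_var)
        # Log of the Metropolis-Hastings ratio: target ratio times
        # backward/forward proposal density ratio.
        log_alpha = (log_posterior(prop, x, y, prior_mean, prior_var)
                     - log_posterior(beta, x, y, prior_mean, prior_var)
                     + log_normal_pdf(beta, m_bwd, v_bwd)
                     - log_normal_pdf(prop, m_fwd, v_fwd))
        if log_alpha >= 0.0 or rng.random() < math.exp(log_alpha):
            beta = prop
            accepted += 1
        chain.append(beta)
    return chain, accepted / n_iter


if __name__ == "__main__":
    chain, rate = mh_sampler(X_TOY, Y_TOY)
    kept = chain[1000:]
    print("posterior mean:", sum(kept) / len(kept), "acceptance rate:", rate)
```

Because the target in the ratio is the exact Poisson posterior, the chain remains valid whatever the quality of the Gaussian proposal; the value of r only affects how often proposals are accepted and how far they move.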

2.3 Adaptive importance sampler

The sampling mechanism (5) can also be exploited within the context of importance sampling, where the posterior expectation of a function $h(\beta)$ of the parameter, $E\{h(\beta) \mid y\}$, is evaluated via Monte Carlo integration without direct sampling from $\pi(\beta \mid y)$. To this end, the general approach is to define an importance density $q(\beta)$ that is used to sample values $\beta^{(1)}, \dots, \beta^{(T)}$, which are eventually averaged to obtain an approximation of $E\{h(\beta) \mid y\}$ through

$$\hat{E}\{h(\beta) \mid y\} = \frac{\sum_{t=1}^{T} w_t\, h(\beta^{(t)})}{\sum_{t=1}^{T} w_t}$$

with weights

$$w_t = \frac{\pi(\beta^{(t)} \mid y)}{q(\beta^{(t)})}.$$

The efficiency of this algorithm is determined by the ability of the importance density to sample values relevant to the target density. To improve this aspect, we modify the original algorithm and, at each iteration, we simulate values from (5), updating the importance density with (4). Thus, the importance density is adaptively updated based on the previously extracted value and the weights become

$$w_t = \frac{\pi(\beta^{(t)} \mid y)}{q(\beta^{(t)} \mid \beta^{(t-1)})}.$$
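The weighted average and a standard diagnostic for the quality of the weights can be sketched as follows. This is a generic self-normalized importance sampling helper, not code from the bpr package; it works with log-weights for numerical stability, and the function names are assumptions. The effective sample size shown is the usual one based on normalized weights, one over the sum of their squares.

```python
import math


def normalized_weights(log_weights):
    """Turn unnormalized log-weights into weights that sum to one."""
    shift = max(log_weights)  # subtract the maximum to avoid overflow
    raw = [math.exp(lw - shift) for lw in log_weights]
    total = sum(raw)
    return [w / total for w in raw]


def snis_estimate(values, log_weights):
    """Self-normalized importance sampling estimate of E[h(beta) | y],
    where values[t] = h(beta_t) and log_weights[t] = log pi(beta_t) - log q(beta_t)."""
    weights = normalized_weights(log_weights)
    return sum(w * v for w, v in zip(weights, values))


def effective_sample_size(log_weights):
    """Effective sample size 1 / sum(w_bar^2); lies between 1 and len(log_weights)."""
    weights = normalized_weights(log_weights)
    return 1.0 / sum(w * w for w in weights)


if __name__ == "__main__":
    # Two draws, the first carrying three times the weight of the second.
    print(snis_estimate([1.0, 3.0], [math.log(3.0), 0.0]))
    print(effective_sample_size([0.0, 0.0, 0.0, 0.0]))
```

Since only ratios of weights enter the estimate, the posterior needs to be known only up to its normalizing constant.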

2.4 Tuning parameters

The values of the parameters $r_i$, $i = 1, \dots, n$, have to be tuned to balance the trade-off between acceptance rate and autocorrelation in the Metropolis-Hastings algorithm, and to control the mixing of the weights in the importance sampler. However, tuning $n$ parameters is not practical, especially for moderate to large $n$. A first simple solution sets all parameters equal to a single value $r$; however, in our experience, this resulted in a low effective sample size for some of the sampled chains.

As an alternative strategy, we choose to tune instead the distance of the proposal density from the target posterior. As the expression of the posterior distribution is unknown, we control the distance between the Poisson and negative binomial likelihoods. Based on Teerapabolarn (2012), we consider the upper bound of the relative error between the Poisson and negative binomial cumulative distribution functions. This result is particularly useful owing to its simplicity, which makes it possible to derive analytically the parameters that bound the error at a specified value. Specifically, the bound applies to a Poisson random variable with mean $\lambda$ and a negative binomial random variable with parameter $r$ and success probability as defined in Section 2.1.

Hence, by setting an upper bound for the distance between the Poisson and negative binomial distributions, all the values of the parameters $r_i$ can be automatically derived to obtain a proposal density whose distance from the target posterior is constant for every $i$, even for heterogeneous data. The resulting equation for $r_i$ is solved through the Lambert-W function (Lambert, 1758), which can be computed numerically using standard libraries; this solution is referred to as (7). Hence, in the algorithm, at the beginning of each iteration, the values $r_i$ are computed according to (7) conditionally on the current value of $\beta$.

3 Numerical illustrations

3.1 Synthetic data

We conducted a simulation study under various settings to compare the efficiency of the proposed Metropolis-Hastings and importance samplers with that of state-of-the-art methods. We focused on the Hamiltonian Monte Carlo approach, as implemented in the Stan software (Stan Development Team, 2021), because the Metropolis-Hastings algorithm with a standard random walk proposal would require, unlike the proposed approaches, the tuning of parameters, which becomes cumbersome for moderate to large $p$. The proposed methods are implemented in the R package bpr, which is written in C++ through the Rcpp package (Eddelbuettel and Francois, 2011) and is available at https://github.com/laura-dangelo/bpr.

Data were generated from a Poisson log-linear model for several combinations of sample size $n$ and number of covariates $p$. Specifically, for each combination of $n$ and $p$, we consider 50 independent $n$-dimensional vectors of counts where each $y_i$ ($i = 1, \dots, n$) is sampled from a Poisson distribution with mean $\lambda_i = \exp(x_i^\top \beta)$, with a common parameter $\beta$. The covariates were generated from continuous or discrete/categorical random variables under constraints that include the continuous variables having mean zero and variance one. Reproducible scripts to generate the synthetic data are available at github.com/laura-dangelo/bpr/simulation and as Supplementary Materials.

Two prior distributions for the coefficients were assumed, namely a vanilla Gaussian prior with independent components and the more complex horseshoe prior (Carvalho et al., 2010), which allows for the following conditionally Gaussian representation

$$\beta_j \mid \lambda_j, \tau \sim N(0, \lambda_j^2 \tau^2), \qquad \lambda_j \sim C^{+}(0, 1),$$

for $j = 1, \dots, p$, where the $\lambda_j$ are local scale parameters and $C^{+}(0, 1)$ is the standard half-Cauchy distribution. To implement the samplers under the horseshoe prior, we followed Makalic and Schmidt (2016), and fixed the global shrinkage parameter $\tau$ to the “optimal value” of van der Pas et al. (2017), which depends on the number of non-zero parameters.
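To make the conditionally Gaussian structure of the horseshoe prior concrete, the sketch below draws coefficients from the prior itself: a half-Cauchy local scale for each coefficient, followed by a Gaussian draw whose standard deviation is the local scale times the global shrinkage parameter. It only illustrates the prior and is not the posterior sampler of Makalic and Schmidt (2016); the function name and the value of the global parameter are assumptions.

```python
import math
import random


def sample_horseshoe_prior(p, tau, rng):
    """Draw p coefficients from a horseshoe prior with global scale tau."""
    draws = []
    for _ in range(p):
        # Standard half-Cauchy draw: absolute value of a standard Cauchy,
        # obtained by inverting the Cauchy distribution function.
        local_scale = abs(math.tan(math.pi * (rng.random() - 0.5)))
        # Conditionally on the scales, the coefficient is Gaussian.
        draws.append(rng.gauss(0.0, local_scale * tau))
    return draws


if __name__ == "__main__":
    rng = random.Random(0)
    beta = sample_horseshoe_prior(10, 0.1, rng)
    print(beta)
```

Most draws are shrunk close to zero while the heavy tails of the half-Cauchy occasionally produce large coefficients, which is the behaviour that motivates using this prior for sparse regression.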

Each method introduced in Section 2 was run for 10000 iterations, with the first 5000 discarded as burn-in. The convergence of each algorithm was assessed by graphical inspection of the trace plots of the resulting chains. The convergence was satisfactory for all simulations and comparable for all algorithms, as no systematic bias was found in the posterior mean of the estimated parameters.

To assess the efficiency of the proposed methods, we used a proxy of the time per independent sample, which is estimated as the total time (in seconds) necessary to simulate the entire chain, over the effective sample size of the resulting chain. For the proposed adaptive importance sampler, an estimate of the effective sample size was obtained using the quantity $1 / \sum_{t=1}^{T} \bar{w}_t^2$, where $\bar{w}_t$ denotes the normalized weights, which takes values between 1 and $T$ (Robert and Casella, 2010). Notably, the burn-in samples were removed from the chains before computing the effective sample size. Thus, the obtained times per independent sample do not represent exactly the number of seconds necessary to generate one independent sample; rather, they represent an overestimate. Nonetheless, this approach provides a robust and fair comparison between the different competing algorithms. The experiment was run on a Linux machine with 8 GB DDR4 2400 MHz RAM, CPU Intel i7-7700HQ 3.8 GHz, running R 4.1.1.

Figure 1: Time per independent sample (in logarithmic scale) for the three algorithms. For each combination of $n$ and $p$, the boxplots represent the distribution of the (log) time (in seconds) over the effective sample size using a Gaussian prior, over 50 replications.

Figures 1 and 2 show, for each combination of $n$ and $p$, the distribution of the median time per independent sample for the three algorithms computed on the 50 replications under the Gaussian and horseshoe priors, respectively. The plots are presented in the logarithmic scale for clarity.

For the Gaussian prior, the performance of the proposed algorithms is better than that of the HMC implemented in Stan for small values of the dimension $p$. For larger $p$, instead, the performance of HMC is quite competitive with respect to the importance sampler and broadly comparable to the proposed efficient Metropolis-Hastings algorithm. Notably, the differences are less evident with increasing sample size.

Figure 2: Time per independent sample (in logarithmic scale) for the three algorithms. For each combination of $n$ and $p$, the boxplots represent the distribution of the (log) time (in seconds) over the effective sample size using the horseshoe prior, over 50 replications.

For the horseshoe prior, the proposed Metropolis-Hastings algorithm shows consistently superior performance with respect to the HMC sampler implemented in Stan for each sample size $n$ and number of covariates $p$. The performance of the importance sampler remains competitive. As previously observed for the Gaussian prior, the differences are less evident for increasing sample size.

3.2 Spike train data

Herein, we illustrate the proposed sampling method on data of brain activity in mice in response to visual stimulation. This type of data is relatively new, and it arises from the observation of brain activity through calcium imaging. The novelty of this technique is that it allows brain activity to be analyzed at the neuronal level and the associations between a stimulus and the cells’ response to be studied. The data set was generated using a small subset of data from the Allen Brain Observatory (Allen Institute for Brain Science, 2016), which is an extensive survey of the physiological activity of neurons in the mouse visual cortex. In the original data set, the fluorescent calcium trace, which is a proxy of neuronal activity, is recorded for each neuron under different experimental conditions. From these traces, it is of interest to detect and analyze the activations of neurons, which correspond to transient spikes of the intracellular calcium level. We applied the method of Jewell et al. (2019), as described in de Vries et al. (2020), to extract and count the activations of each neuron, to understand how they are affected by the experimental conditions and the location of the neurons in the brain. In the context of neural studies, this approach is usually referred to as “encoding models”, as the interest is to predict the activity of a population of neurons in response to a given visual stimulus and other experimental conditions. Paninski et al. (2007) reviewed some methods commonly employed in encoding problems. Generalized linear models, which are flexible and yet interpretable, are one of the fundamental tools for investigating the response of neurons to external factors. In particular, the authors assert that a Poisson distribution is a plausible assumption to model spike counts; hence, we regressed the estimated number of activations of each neuron on several continuous and categorical covariates available from the study.

Figure 3: Coefficients of the regression on the calcium imaging data set: posterior density, with the posterior mean and 95% credible interval (colored dot and segment).

The covariates are the depth of the neuron, the area of the visual cortex where the neuron is located (factor with 6 levels), the cre transgenic mouse line (factor with 13 levels), and the type of visual stimulation (factor with 4 levels). The depth of the neurons is discretized to 22 levels, ranging from 175 to 625 microns; thus, we could obtain a data set having a full factorial design with 5 replications for each available covariate combination. Moreover, we included a quadratic term of the depth to improve the fit. The resulting data set consists of 920 observations on 23 variables.

We ran the proposed Metropolis-Hastings algorithm for 9000 iterations, discarding the first 5000 as burn-in. The computation time was 98 seconds. The posterior estimates of the coefficients of the dummies on three categorical variables are shown in Figure 3; for the numeric covariate depth, posterior means and 95% credible intervals were obtained for both the linear and the quadratic term. Given these estimates of the coefficients, the number of spikes increased with the largest depths. Moreover, as shown in Figure 3, the response of neurons is heterogeneous across the cre-lines and, consistent with the results of de Vries et al. (2020), we found that the mean response is lower for the VISam, VISpm and VISrl areas.

3.3 Betting data

Poisson regression models have been widely adopted in sports analytics, where the response variable is the match score or total number of points. Modelling match scores in association football has recently gained considerable interest owing to the popularity of the betting market, and several modelling approaches have been proposed. Classical methods use only the information of the teams playing (Dixon and Coles, 1997; Petretta et al., 2021), while other authors have explored the possibility of introducing additional information, such as historical data and bookmakers’ odds (Egidi et al., 2018; Groll et al., 2019). In general, to describe the number of goals scored by each team in a match, the Poisson distribution is considered a valid assumption (Maher, 1982; Lee, 1997). Herein, we considered data of match scores from the Italian Serie A 2020-2021 season, which are publicly available at http://www.football-data.co.uk. We considered the number of goals as the variable of interest, and, as covariates, we included the fixed effects of the team, several betting odds (for different betting types and bookmakers), and an indicator of whether the team is playing at home. The resulting data set has 760 observations on 102 variables.

Figure 4: Betting data: posterior mean and 90% credible interval of the coefficients, obtained using a Gaussian (red circles) and horseshoe (blue triangles) prior. Filled points indicate that the credible interval does not contain zero. Only the names of the non-zero coefficients are shown.

We used the proposed Metropolis-Hastings algorithm with both a Gaussian prior centered at zero and a horseshoe prior distribution on the parameters, to compare the results and analyze the variables that the two priors select. The results are depicted in Figure 4. For each explanatory variable, the posterior mean and credible interval were obtained using the two priors. The filled symbols indicate that the credible interval does not contain zero, showing that the shrinkage induced by the horseshoe prior selects only a few variables compared to the informative Gaussian prior. Moreover, the horseshoe prior induces a marked reduction in the width of all credible intervals.

A more formal comparison between the two models is obtained using the conditional predictive ordinate (CPO) statistics (Geisser, 1993; Gelfand et al., 1992; Gelfand and Dey, 1994), which is defined, for $i = 1, \dots, n$, as $\mathrm{CPO}_i = p(y_i \mid y_{-i})$, where $y_{-i}$ is the vector of observed data omitting the $i$-th value. Figure 5 shows the boxplots of the resulting CPO statistics for the two models. The graph does not highlight any fundamental difference in the predictive capacity of the two models, implying that the horseshoe prior yields a more parsimonious model with a similar fit. This is also confirmed by the logarithm of the pseudo-marginal likelihood, $\mathrm{LPML} = \sum_{i=1}^{n} \log \mathrm{CPO}_i$, which is commonly used as a summary of the CPOs (Ibrahim et al., 2014). It is equal to -1172.055 and to -1167.521 for the models based on the Gaussian prior and the horseshoe, respectively.
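A common way to estimate the CPO from posterior draws is the harmonic mean of the likelihood contributions evaluated at each draw (Gelfand and Dey, 1994). The sketch below applies this estimator to a Poisson log-linear model; whether the reported values were computed exactly in this way is not stated, so this should be read as an assumption, and the function names are hypothetical.

```python
import math


def poisson_logpmf(y, lam):
    """Log of the Poisson pmf with mean lam, evaluated at the count y."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def log_cpo(y_i, x_i, beta_draws):
    """Harmonic-mean estimate of log CPO_i from posterior draws of beta.

    log CPO_i = -log( (1/T) * sum_t 1 / p(y_i | beta_t) ), computed with a
    log-sum-exp shift for numerical stability.
    """
    neg_loglik = []
    for beta in beta_draws:
        linear_predictor = sum(xj * bj for xj, bj in zip(x_i, beta))
        neg_loglik.append(-poisson_logpmf(y_i, math.exp(linear_predictor)))
    shift = max(neg_loglik)
    mean_inverse = sum(math.exp(v - shift) for v in neg_loglik) / len(neg_loglik)
    return -(shift + math.log(mean_inverse))


def lpml(y, X, beta_draws):
    """Log pseudo-marginal likelihood: sum of the log CPO values."""
    return sum(log_cpo(y_i, x_i, beta_draws) for y_i, x_i in zip(y, X))


if __name__ == "__main__":
    # Two observations, one covariate, three (identical) hypothetical draws.
    draws = [[0.0], [0.0], [0.0]]
    print(lpml([2, 0], [[1.0], [1.0]], draws))
```

Larger values of the log pseudo-marginal likelihood indicate better leave-one-out predictive performance, which is how the two values reported above are compared.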

Figure 5: Betting data: distribution of the CPO under the two prior distributions.

4 Discussion

Motivated by the lack of specific computational tools for efficient sampling from the posterior distribution of regression parameters in Poisson log-linear models, we introduced an approximate posterior distribution used as a building block for Metropolis-Hastings and importance sampling algorithms. The performance of the proposed solutions, in terms of mixing and computation time, was comparable or superior to that of the efficient Stan implementation of HMC in all scenarios considered, particularly when a hierarchical prior is assumed. The ease of application of our methods is further enhanced by their availability via the R package bpr, which obtains the posterior distribution of several quantities of interest without the need for coding and with minimal tuning.


  • J. H. Albert and S. Chib (1993) Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association 88 (422), pp. 669–679. Cited by: §1.
  • Allen Institute for Brain Science (2016) Allen brain observatory. Note: http://observatory.brain-map.org/visualcoding Cited by: §3.2.
  • J. R. Bradley, S. H. Holan, and C. K. Wikle (2018) Computationally efficient multivariate spatio-temporal models for high-dimensional count-valued data (with discussion). Bayesian Analysis 13 (1), pp. 253–310. External Links: Document Cited by: §1.
  • C. M. Carvalho, N. G. Polson, and J. G. Scott (2010) The horseshoe estimator for sparse signals. Biometrika 97 (2), pp. 465–480. Cited by: §1, §2.1, §3.1.
  • A. B. Chan and N. Vasconcelos (2009) Bayesian Poisson regression for crowd counting. In 2009 IEEE 12th International Conference on Computer Vision, pp. 545–551. Cited by: §1.
  • S. de Vries, J. Lecoq, M. Buice, P. Groblewski, G. Ocker, M. Oliver, D. Feng, N. Cain, P. Ledochowitsch, D. Millman, K. Roll, M. Garrett, T. Keenan, C. Kuan, S. Mihalas, S. Olsen, C. Thompson, W. Wakeman, J. Waters, and C. Koch (2020) A large-scale standardized physiological survey reveals functional organization of the mouse visual cortex. Nature neuroscience 23 (1), pp. 138–151. Cited by: §3.2, §3.2.
  • M. J. Dixon and S. G. Coles (1997) Modelling association football scores and inefficiencies in the football betting market. Journal of the Royal Statistical Society: Series C (Applied Statistics) 46 (2), pp. 265–280. Cited by: §3.3.
  • D. Eddelbuettel and R. Francois (2011) Rcpp: seamless R and C++ integration. Journal of Statistical Software, Articles 40 (8), pp. 1–18. External Links: ISSN 1548-7660 Cited by: §3.1.
  • L. Egidi, F. Pauli, and N. Torelli (2018) Combining historical data and bookmakers’ odds in modelling football scores. Statistical Modelling 18 (5-6), pp. 436–459. Cited by: §3.3.
  • E. L. Frome (1983) The analysis of rates using Poisson regression models. Biometrics 39 (3), pp. 665–674. Cited by: §1.
  • E. L. Frome and H. Checkoway (1985) Use of Poisson regression models in estimating incidence rates and ratios. American Journal of Epidemiology 121 (2), pp. 309–323. Cited by: §1.
  • S. Frühwirth-Schnatter, R. Frühwirth, L. Held, and H. Rue (2009) Improved auxiliary mixture sampling for hierarchical models of non-Gaussian data. Stat Comput 19 (479). Cited by: §1.
  • S. Frühwirth-Schnatter and H. Wagner (2006) Auxiliary mixture sampling for parameter-driven models of time series of counts with applications to state space modelling. Biometrika 93 (4), pp. 827–841. Cited by: §1.
  • S. Geisser (1993) Predictive inference. Chapman and Hall/CRC. Cited by: §3.3.
  • A. Gelfand, D. Dey, and H. Chang (1992) Model determination using predictive distributions with implementation via sampling-based-methods (with discussion). In Bayesian Statistics 4, Cited by: §3.3.
  • A. E. Gelfand and D. K. Dey (1994) Bayesian model choice: asymptotics and exact calculations. Journal of the Royal Statistical Society. Series B (Methodological) 56 (3), pp. 501–514. Cited by: §3.3.
  • A. Groll, C. Ley, G. Schauberger, and H. V. Eetvelde (2019) A hybrid random forest to predict soccer matches in international tournaments. Journal of Quantitative Analysis in Sports 15 (4), pp. 271–287. Cited by: §3.3.
  • W. K. Hastings (1970) Monte Carlo sampling methods using Markov chains and their applications. Cited by: §1.
  • M. K. Hutchinson and M. C. Holtman (2005) Analysis of count data using Poisson regression. Research in Nursing & Health 28 (5), pp. 408–418. Cited by: §1.
  • J. G. Ibrahim, M. Chen, and D. Sinha (2014) Bayesian survival analysis. In Wiley StatsRef: Statistics Reference Online, pp. . External Links: ISBN 9781118445112 Cited by: §3.3.
  • S. W. Jewell, T. D. Hocking, P. Fearnhead, and D. M. Witten (2019) Fast nonconvex deconvolution of calcium imaging data. Biostatistics 21 (4), pp. 709–726. Cited by: §3.2.
  • S. C. Joshua and N. J. Garber (1990) Estimating truck accident rate and involvements using linear and Poisson regression models. Transportation Planning and Technology 15 (1), pp. 41–58. Cited by: §1.
  • D. Karlis and L. Meligkotsidou (2005) Multivariate Poisson regression with covariance structure. Stat Comput 15, pp. 255–265. Cited by: §1.
  • D. Lambert (1992) Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 34 (1), pp. 1–14. Cited by: §1.
  • J. H. Lambert (1758) Observations variae in mathesin puram. Acta Helvitica, physico-mathematico-anatomico-botanico-medica 3, pp. 128–168. Cited by: §2.4.
  • A. J. Lee (1997) Modeling scores in the premier league: is Manchester United really the best?. Chance 10 (1), pp. 15–19. Cited by: §3.3.
  • M. J. Maher (1982) Modelling association football scores. Statistica Neerlandica 36 (3), pp. 109–118. Cited by: §3.3.
  • E. Makalic and D. F. Schmidt (2016) A simple sampler for the horseshoe estimator. IEEE Signal Processing Letters 23 (1), pp. 179–182. Cited by: §3.1.
  • S. Miaou (1994) The relationship between truck accidents and geometric design of road sections: Poisson versus negative binomial regressions. Accident Analysis & Prevention 26 (4), pp. 471–482. Cited by: §1.
  • R. M. Neal (2011) MCMC using hamiltonian dynamics. Handbook of Markov chain Monte Carlo 2 (11), pp. 2. Cited by: §1.
  • J. A. Nelder and R. W. M. Wedderburn (1972) Generalized linear models. Journal of the Royal Statistical Society. Series A (General) 135 (3), pp. 370–384. Cited by: §1.
  • L. Paninski, J. Pillow, and J. Lewi (2007) Statistical models for neural encoding, decoding, and optimal stimulus design. In Computational Neuroscience: Theoretical Insights into Brain Function, P. Cisek, T. Drew, and J. F. Kalaska (Eds.), Progress in Brain Research, Vol. 165, pp. 493–507. Cited by: §3.2.
  • T. Park and G. Casella (2008) The Bayesian lasso. Journal of the American Statistical Association 103 (482), pp. 681–686. Cited by: §1, §2.1.
  • M. Petretta, L. Schiavon, and J. Diquigiovanni (2021) Mar-co: a new dependence structure to model match outcomes in football. External Links: 2103.07272 Cited by: §3.3.
  • J. Piironen and A. Vehtari (2017) Sparsity information and regularization in the horseshoe and other shrinkage priors. Electronic Journal of Statistics 11 (2), pp. 5018 – 5051. Cited by: §1, §2.1.
  • N. G. Polson, J. G. Scott, and J. Windle (2013) Bayesian inference for logistic models using Pólya-gamma latent variables. Journal of the American Statistical Association 108 (504), pp. 1339–1349. Cited by: §1, §1, §2.1, §2.1.
  • C. Robert and G. Casella (2010) Introducing Monte Carlo methods with R. Springer. Cited by: §2.2, §3.1.
  • Stan Development Team (2021) Stan modeling language users guide and reference manual. Note: url: http://mc-stan.org/ Cited by: §3.1.
  • K. Teerapabolarn (2012) The least upper bound on the Poisson-negative binomial relative error. Communications in Statistics - Theory and Methods 41 (10), pp. 1833–1838. Cited by: §2.4.
  • S. van der Pas, B. Szabó, and A. van der Vaart (2017) Adaptive posterior contraction rates for the horseshoe. Electronic Journal of Statistics 11 (2), pp. 3196 – 3225. Cited by: §3.1.