1 Introduction
Prolific geotagging of recorded data has allowed for exploration and development of rich frameworks for spatial modeling that permit quantification of the nature of dependence for a chosen response on a specified underlying spatial domain of reference. The type of referencing is broadly divided into (a) areal and, (b) point. Areal referencing involves use of county, zipcode, block or censustract information, while point referencing relies on stereographic projective coordinates, i.e. latitudes and longitudes coupled with projection on a sphere, . A projection on is generally asymptomatic to the earth, with a purpose of providing subjective basis to the spatial analysis of chosen response. Focusing on areally referenced data generating processes, generally the response along with covariates is recorded at an individual level for multiple individuals residing at each areal coordinate. Consequently variation in response can be divided into two sources viz., within and between
areal coordinates. An ideal model should jointly specify components that account and characterize both sources of variation. Explicitly, this translates to equal importance being given to modeling individual level covariates along with accounting for additional noise introduced by the variability across areal coordinates, when creating a modeling framework. In this paper we aspire to create such a modeling framework for a widely used and well celebrated family of probability distributions.
The exponential dispersion (ED) models Jørgensen (1997), have provided a flexible modeling framework for a variety of data generating processes over the last century. Flexibility of such models lie in their ability to account for a variety of perturbations or deviations from the simple linear model framework viz., discreteness, overdispersion, zeroinflation etc. A comprehensive framework is provided by the generalized linear models (GLMs) which incorporate these probability densities (Nelder and Wedderburn (1972), McCullagh (2018)) within a modeling framework. With each mentioned perturbation, enrichment of the GLM framework proceeds by relaxing unnecessary restrictions, for example requirement of normality); introducing components with a purpose of specifying variation from a specific source, for example frailty models in survival analysis (see Hougaard (1995), Banerjee et al. (2007) and Ibrahim et al. (2014)) or providing additional probability mass at chosen points in the support, for example zeroinflation (Hall (2000), Agarwal et al. (2002), Wang et al. (2015) and, Zhang (2013a)). They are typically specified using a link function that relates the natural parameter to a linear function of chosen covariates. In this paper we concentrate on a particular generalization of GLMs namely the double generalized linear models (DGLMs) Pregibon (1984)
which do not require the dispersion to be fixed across observations. DGLMs have been studied in the context of modeling insurance losses and have proven to be effective, in allowing the investigator to examine the meanvariance relationship through separate models for mean and dispersion (
Jørgensen and Paes De Souza (1994)), Smyth and Jørgensen (2002)) under the assumption of a compound Poissongamma (CPg) response. The CPg density is a particular member of the ED family, in particular they belong to a more widely known class, the Tweedie ED models (Tweedie (1984)). They feature a positive probability mass at zero along with continuous positive support. Tweedie CPg GLMs have seen widespread application in a variety of fields including but not limited to modeling precipitation, economic donations, ecological biomass and insurance premiums (Dunn (2003), Smyth (1996), Shono (2008), Foster and Bravington (2013), Lauderdale (2012), Dons et al. (2016), Zhang (2013b), Yang et al. (2018)). To mention other well studied members of Tweedie ED family: Gaussian, Poisson, Gamma and Inverse Gaussian. It provides an additional index parameter to distinguish between these members.Spatial models have previously not been studied under a DGLM framework. In this paper we focus on introducing a spatial effect for the mean, while the DGLM allows for a varying dispersion. Specification of a spatial effect involves specifying a neighborhood structure through a undirected graph. A graph is defined as a pair , consisting of vertices and edges between the vertices. If two vertices share an edge we refer to them as neighbors. The edges contain no information regarding direction producing an undirected graph. With respect to the model a graph specifies interactions between specified vertices included in the model. The first row of figure (1) shows examples of undirected graphs. In particular (1b) features a fully connected graph, i.e. starting from any vertex one can traverse the graph to reach any other vertex. Figure (1c) shows a complex graph with 3 fully connected components which are again connected with each other. For every such graph we can create two matrices that quantify and express information in these figures namely, an adjacency and a degree matrix. If denotes set cardinality, then adjacency and degree matrices are of order . For the adjacency matrix each row is specific to a vertex containing zeros for nonneighbors and ones for neighbors, while the degree matrix is a diagonal matrix containing the number of neighbors for respective vertices in its diagonal entries. Consequently, the adjacency matrix contains neighborhood information, while the degree matrix contains information regarding sparsity of the graph. Such matrices have seen widespread use to specify spatial interactions, with each vertex being a areal coordinate or a point reference (see Cressie (1993), Besag (1974)). Edges in such scenarios are defined if two areal coordinate shares a boundaries. While working with zipcodes we consider the centroid for each zip code and term two zip codes as neighbors if associated polygons share boundaries. The second row of figure (1) shows this procedure pictorially. It is evident that based on the choice of areal referencing the associated graph can prove to be fairly complicated.
The proposed approach in this paper estimates said spatial effects by adopting a penalized estimation approach. In particular we penalize the norm induced by the graph Laplacian , which is defined as the difference of the degree matrix and adjacency matrix . We emphasize on joint modeling dependence of mean on covariates and the spatial domain of reference and dispersion on chosen covariates. We then proceed to develop a coordinate descent algorithm that models all mentioned model parameters (including the index parameter). Once such a procedure concludes, inference on estimated spatial effects proceeds via krigging
, i.e. spatial interpolation (see
Cressie (1993), Stein (2012)). The primary assumption behind this is that the latent unobserved surface, over , being estimated is smooth and, observed at discrete areal locations or coordinates. This also facilitates prediction at chosen unobserved coordinates (see Finley et al. (2017)). Including spatial effects within any modeling framework introduces overdispersion into the model accounting for variability offered by the underlying spatial domain of reference. Naturally not specifying this translates to bias in estimated model coefficients for mean and dispersion models.In section (2) we formally introduce the Tweedie ED family and DGLMs in that context. We then proceed to fit proposed spatial DGLM by formulating it as an optimization problem, section (3) discusses details regarding its formulation. This includes a proposed algorithm that can be used for fitting these models for members in the family. We refrain from introducing unnecessary restrictions on the proposed model framework, which results in a statement of minimal necessary criteria required on probability densities and link functions for said algorithm to be applicable. Details and challenges for individual members of the Tweedie family are also considered, demonstrating that suggested approach is applicable to all members of the family. Section (4) considers evaluating performance of suggested approach in comparison to stateoftheart penalties like the ridge penalty to showcase efficacy of proposed approach. Section (5) considers fitting a spatial DGLM to automobile collision insurance losses using proposed approach for the state of Connecticut, in the year 2008. Finally, section (6) concludes the paper and discusses possibilities for future research in hindsight of proposed developments.
2 Tweedie ED family and DGLMs
We begin with some notations, let , , be the response, mean and dispersion respectively and , and
denote observed covariate vectors for
th observation at the th vertex, where and , with being the total number of observations. To make notations less cumbersome, when convenient, subscripts are omitted in following discussion.Tweedie densities belong to the more general ED family whose probability density function is given by,
(1) 
where , are the natural or, canonical and dispersion parameter respectively and the cumulant function. The Tweedie family specifies an additional index parameter , distinguishing between included members from the ED family. If we denote the mean by and variance by, , then and . Using , and we write . Probability density function in eq. (1) can also be expressed using unitdeviance ; table (1) lists out closed forms for some commonly used members in the Tweedie ED family. A special feature of the ED family is a relationship between mean and variance, that is specified through a variance function unique to each member. For members of the Tweedie ED family, . The value of and therefore expression uniquely determines the member in question.
DGLMs are a generalization of GLMs that provide a more general framework by allowing the dispersion to vary across observations. For appropriate choice of link functions and a DGLM is formulated as
(2) 
Additionally, a graphical DGLM starts by specifying additional unobserved parameters , with associated dependence structure being explained by graph . Henceforth, we shall refer to as . This specification extends model in eq. (2) to
(3) 
We denote , as covariate vectors corresponding to mean and dispersion respectively while, is a vector consisting of zeros, except for 1 in one entry indicating a particular edge in the graph. Another advantage of specifying is observed through joint modeling of and . Specification of the dependence structure using the graph for affects the quality of estimation for through bias adjustment within the mean model. Denoting the inverse maps for , and the function composition , the negative loglikelihood for the family can be expressed as
(4)  
Tweedie EDMs  

Normal  0  1  
Poisson  1  1  
Poissongamma  –  
Gamma  2  
Inverse Gaussian  3 
As is evidenced from table (1) depending on , dispersion can be constant, for example Poisson models; in that case we choose resulting in a simple GLM for . Furthermore, depending on the density may lack a closedform expression for associated normalizing factor, resulting in complex model structure for instance CPg models. While dealing with CPg densities we operate under . Explicitly, CPg densities have the following form
(5a)  
where we have,  
(5b) 
Expression in eq. (5b) is obtained by marginalizing out the associated random Poisson count from their joint density followed by equating resulting expression with eq. (1). This results in an infinite sum representation for , which was identified as an example of the generalized Bessel function, lacking a closed form. This translates to a lack of normalizing factor for the density, and the need for an approximation to the infinite sum (in eq. 5b) to make the density computationally viable. A fairly accurate approximation for the infinite sum is obtained through locating an approximate mode for terms at , followed by selecting an appropriate radius of inclusion for terms around , for every (for details see section 4, Dunn and Smyth (2005)). Modified saddlepoint approximations provide an analytically simpler approach (see Dunn and Smyth (2018)) by approximating densities for the ED family using,
(6) 
for Tweedie EDs . For instance, using eqs. (5a), (5b) and (3) with , we have the following negative loglikelihood,
(7a)  
(7b)  
equivalently under modified saddlepoint approximation,  
(7c)  
(7d) 
where is an indicator function. Naturally, choice of can be made based on chosen method of approximation. Evidently, is chosen to be small for saddle point approximations.
3 Optimization Problem
Denoting , model in eqn. (3) is fitted by solving the optimization problem,
(8a)  
Here is an additional index parameter associated with model selection. The penalty function is  
(8b)  
where is a vector having the same dimension as . 
Penalty function in its most general form, consists of two parts (a) first part consisting of , denotes a ridge penalization on , (b) second part consists of , which penalizes a graph derivative, the Laplacian in this case, promoting clustering on the graph. Note that if required, entries of can be made zero to not penalize a particular component of . For instance, if we choose to only penalize we have,
(9) 
additionally, choosing to have a ridge penalty on and results in,
(10) 
where is partitioned appropriately. Furthermore, depending on the nature of covariates, need not be as sparse as in eq. (10). Interdependence between covariates for mean and dispersion models can be penalized in the diagonal blocks. While offdiagonal blocks can be used to allow for penalizing dependence of groups of covariates on the graph, dependence of covariates across the mean and dispersion models can also be accommodated. We shall solve the optimization problem using configurations in eq. (9) followed by discussing necessary modifications required to extend formulated solution for the configuration in eq. (10).
Keeping fixed, we partition , where . If , , i.e. , are possibly including intercepts, let us denote and gradient vectors as and and associated Hessians of order and by and respectively. and are partitioned according to . The optimization problem in eq. (8a, 8b) is solved by iteratively solving three subproblems in , and ,
(11a)  
(11b)  
(11c)  
where 
, are constants.Note that using in eqs. (9) and (10) automatically adjusts penalization for optimization problems in eqs. (11a) and (11b) respectively. Under configuration in eq. (9) solution to the problems are,
(12a)  
(12b) 
with
are partitioned matrices of order . There exists such that,
The last inequality is proved in Appendix A. There exists constant such that,
(13) 
The second equality is obtained by substituting solution in eq. (12b). Positive or negative definiteness of in eq. (13), facilitates choice of . Under ridge penalization for configuration in eq. (10), the solution for eqs. (11a) and (11b) are,
(14a)  
(14b) 
Following similar logic for proof shown in Appendix A, and choosing , such that and are positive semidefinite (p.s.d) respectively, ensures . Finally we update by maximizing the profile likelihood. Proceedings are concisely summarized in algorithm (1), accompanied by theorem (1) (proof shown in Appendix A) proving convergence for proposed solutions.
Theorem (1) shows that under proper choice of scaling constants and , the objective function is guaranteed to decrease for all . The choice of a convergence criteria for the algorithm is based on the objective function , equivalently this implies , i.e., for arbitrarily small the difference in Euclidean norm for estimates can be made arbitrarily small.
Theorem 1.
Attempting to include separate intercepts for overall graph and model coefficients, and respectively, results in confounding, i.e. only is estimable but they are not estimable separately. As an alternative, considering a sufficiently large graph (i.e. a graph with large number of edges ), one could include a groupwise intercept for prescribed groups of edges in the graph, for example, all strongly connected components. Given the nature of we can penalize intercepts for neighboring groups to behave similarly, coupled with a ridge penalization to control their magnitude. However, rather than solving the estimability issue, this reduces the extent of it by confounding the intercept for chosen baseline group with true model intercept, .
In the ensuing discussion we start by deriving necessary conditions required for link functions and to satisfy algorithm (1), then we proceed to provide examples for different members of the Tweedie ED family under commonly used link functions. To maintain brevity we postpone necessary details to Appendix A. Unless explicitly stated, the link functions are general and not necessarily canonical. Observe that existence of a global minima is ensured by convexity of the likelihood function for member in question under specified link functions. Also, the chosen penalty function is convex, since is positivesemi definite for . The conditions for theorem (2) remain fairly general, i.e. commonly used link functions, both canonical and general satisfy these necessary conditions. Next, we present illustrations for members of Tweedie ED family (listed in table (1)).
Theorem 2.
If the negative loglikelihood is convex under chosen link functions , with inverses , , which are

monotonic and invertible, implying and exist,

smooth inverses, implying and are at least twice continuously differentiable,
with and, terms , , and in eqs. (16a–e) are welldefined in a neighborhood of the minima for objective function , proposed algorithm (1) produces estimates that satisfy conclusions of theorem (1).
Remark 1.
Convexity of the negative loglikelihood ensures uniqueness of and , given that proposed penalty is convex.
Remark 2.
For simplicity one could assume that and
are cumulative distribution functions of some random variable. In that case it is sufficient to have a differentiable probability density.
3.1 Illustrative Examples
Algorithm 1 is applicable to all listed members (see table (1)) of the Tweedie ED family. Referring to the theorem (2) (see proof in Appendix A), quantities necessary for specifying the algorithm are, , , and . However, while providing details for different members of the Tweedie ED family under applicable link functions and , examining the existence and specifying , , and , , should be sufficient.
1. Normal:
The normal or Gaussian distribution is a member of Tweedie ED family with index parameter
. From table (1), , , and resulting in the likelihood, . Commonly, link functions for is taken to be identity, i.e. which is also the canonical link and for dispersion , . It follows that , , and . Additionally, and .2. Poisson: Poisson densities inherently feature a constant dispersion, as seen from table (1) and are obtained by setting . Also, for Poisson, , , and resulting in the likelihood, . The canonical link function is , however, other commonly used choices are and . Therefore, or or identity respectively based on listed choices, while . Based on choice of the link function, (a) , , (b) ,
Comments
There are no comments yet.