Spatial Tweedie exponential dispersion models

03/12/2020
by   Aritra Halder, et al.
University of Connecticut
0

This paper proposes a general modeling framework that allows for uncertainty quantification at the individual covariate level and spatial referencing, operating withing a double generalized linear model (DGLM). DGLMs provide a general modeling framework allowing dispersion to depend in a link-linear fashion on chosen covariates. We focus on working with Tweedie exponential dispersion models while considering DGLMs, the reason being their recent wide-spread use for modeling mixed response types. Adopting a regularization based approach, we suggest a class of flexible convex penalties derived from an un-directed graph that facilitates estimation of the unobserved spatial effect. Developments are concisely showcased by proposing a co-ordinate descent algorithm that jointly explains variation from covariates in mean and dispersion through estimation of respective model coefficients while estimating the unobserved spatial effect. Simulations performed show that proposed approach is superior to competitors like the ridge and un-penalized versions. Finally, a real data application is considered while modeling insurance losses arising from automobile collisions in the state of Connecticut, USA for the year 2008.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

12/27/2019

Spatial risk estimation in Tweedie compound Poisson double generalized linear models

Tweedie exponential dispersion family constitutes a fairly rich sub-clas...
08/15/2019

Geographically-dependent individual-level models for infectious diseases transmission

Infectious disease models can be of great use for understanding the unde...
11/06/2019

Iterative Estimation of Mixed Exponential Random Graph Models with Nodal Random Effects

The presence of unobserved node specific heterogeneity in Exponential Ra...
01/26/2019

On doubly robust estimation for logistic partially linear models

Consider a logistic partially linear model, in which the logit of the me...
01/11/2021

A Degradation Performance Model With Mixed-type Covariates and Latent Heterogeneity

Successful modeling of degradation performance data is essential for acc...
05/22/2019

Restricted Spatial Regression Methods: Implications for Inference

The issue of spatial confounding between the spatial random effect and t...
06/14/2020

High-resolution Bayesian mapping of landslide hazard with unobserved trigger event

Statistical models for landslide hazard enable mapping of risk factors a...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Prolific geo-tagging of recorded data has allowed for exploration and development of rich frameworks for spatial modeling that permit quantification of the nature of dependence for a chosen response on a specified underlying spatial domain of reference. The type of referencing is broadly divided into (a) areal and, (b) point. Areal referencing involves use of county, zip-code, block or census-tract information, while point referencing relies on stereo-graphic projective co-ordinates, i.e. latitudes and longitudes coupled with projection on a sphere, . A projection on is generally asymptomatic to the earth, with a purpose of providing subjective basis to the spatial analysis of chosen response. Focusing on areally referenced data generating processes, generally the response along with covariates is recorded at an individual level for multiple individuals residing at each areal co-ordinate. Consequently variation in response can be divided into two sources viz., within and between

areal co-ordinates. An ideal model should jointly specify components that account and characterize both sources of variation. Explicitly, this translates to equal importance being given to modeling individual level covariates along with accounting for additional noise introduced by the variability across areal co-ordinates, when creating a modeling framework. In this paper we aspire to create such a modeling framework for a widely used and well celebrated family of probability distributions.

The exponential dispersion (ED) models Jørgensen (1997), have provided a flexible modeling framework for a variety of data generating processes over the last century. Flexibility of such models lie in their ability to account for a variety of perturbations or deviations from the simple linear model framework viz., discreteness, over-dispersion, zero-inflation etc. A comprehensive framework is provided by the generalized linear models (GLMs) which incorporate these probability densities (Nelder and Wedderburn (1972), McCullagh (2018)) within a modeling framework. With each mentioned perturbation, enrichment of the GLM framework proceeds by relaxing unnecessary restrictions, for example requirement of normality); introducing components with a purpose of specifying variation from a specific source, for example frailty models in survival analysis (see Hougaard (1995), Banerjee et al. (2007) and Ibrahim et al. (2014)) or providing additional probability mass at chosen points in the support, for example zero-inflation (Hall (2000), Agarwal et al. (2002), Wang et al. (2015) and, Zhang (2013a)). They are typically specified using a link function that relates the natural parameter to a linear function of chosen covariates. In this paper we concentrate on a particular generalization of GLMs namely the double generalized linear models (DGLMs) Pregibon (1984)

which do not require the dispersion to be fixed across observations. DGLMs have been studied in the context of modeling insurance losses and have proven to be effective, in allowing the investigator to examine the mean-variance relationship through separate models for mean and dispersion (

Jørgensen and Paes De Souza (1994)), Smyth and Jørgensen (2002)) under the assumption of a compound Poisson-gamma (CP-g) response. The CP-g density is a particular member of the ED family, in particular they belong to a more widely known class, the Tweedie ED models (Tweedie (1984)). They feature a positive probability mass at zero along with continuous positive support. Tweedie CP-g GLMs have seen widespread application in a variety of fields including but not limited to modeling precipitation, economic donations, ecological biomass and insurance premiums (Dunn (2003), Smyth (1996), Shono (2008), Foster and Bravington (2013), Lauderdale (2012), Dons et al. (2016), Zhang (2013b), Yang et al. (2018)). To mention other well studied members of Tweedie ED family: Gaussian, Poisson, Gamma and Inverse Gaussian. It provides an additional index parameter to distinguish between these members.

Spatial models have previously not been studied under a DGLM framework. In this paper we focus on introducing a spatial effect for the mean, while the DGLM allows for a varying dispersion. Specification of a spatial effect involves specifying a neighborhood structure through a undirected graph. A graph is defined as a pair , consisting of vertices and edges between the vertices. If two vertices share an edge we refer to them as neighbors. The edges contain no information regarding direction producing an undirected graph. With respect to the model a graph specifies interactions between specified vertices included in the model. The first row of figure (1) shows examples of undirected graphs. In particular (1b) features a fully connected graph, i.e. starting from any vertex one can traverse the graph to reach any other vertex. Figure (1c) shows a complex graph with 3 fully connected components which are again connected with each other. For every such graph we can create two matrices that quantify and express information in these figures namely, an adjacency and a degree matrix. If denotes set cardinality, then adjacency and degree matrices are of order . For the adjacency matrix each row is specific to a vertex containing zeros for non-neighbors and ones for neighbors, while the degree matrix is a diagonal matrix containing the number of neighbors for respective vertices in its diagonal entries. Consequently, the adjacency matrix contains neighborhood information, while the degree matrix contains information regarding sparsity of the graph. Such matrices have seen widespread use to specify spatial interactions, with each vertex being a areal co-ordinate or a point reference (see Cressie (1993), Besag (1974)). Edges in such scenarios are defined if two areal co-ordinate shares a boundaries. While working with zip-codes we consider the centroid for each zip code and term two zip codes as neighbors if associated polygons share boundaries. The second row of figure (1) shows this procedure pictorially. It is evident that based on the choice of areal referencing the associated graph can prove to be fairly complicated.

The proposed approach in this paper estimates said spatial effects by adopting a penalized estimation approach. In particular we penalize the norm induced by the graph Laplacian , which is defined as the difference of the degree matrix and adjacency matrix . We emphasize on joint modeling dependence of mean on covariates and the spatial domain of reference and dispersion on chosen covariates. We then proceed to develop a co-ordinate descent algorithm that models all mentioned model parameters (including the index parameter). Once such a procedure concludes, inference on estimated spatial effects proceeds via krigging

, i.e. spatial interpolation (see

Cressie (1993), Stein (2012)). The primary assumption behind this is that the latent unobserved surface, over , being estimated is smooth and, observed at discrete areal locations or co-ordinates. This also facilitates prediction at chosen unobserved co-ordinates (see Finley et al. (2017)). Including spatial effects within any modeling framework introduces over-dispersion into the model accounting for variability offered by the underlying spatial domain of reference. Naturally not specifying this translates to bias in estimated model coefficients for mean and dispersion models.

(a)
(b)
(c)
(d)
(e)
(f)
Figure 1: Figures showing (a) a graph with 5 vertices (b) a fully connected graph with 5 vertices (c) a graph with 15 vertices with 3 fully connected sub-graphs, each with 5 vertices. Spatial plots showing the (d) 8 counties with zip codes, (e) construction of adjacency based on 8 counties, (f) construction of adjacency based on 282 zip-codes for state of Connecticut.

In section (2) we formally introduce the Tweedie ED family and DGLMs in that context. We then proceed to fit proposed spatial DGLM by formulating it as an optimization problem, section (3) discusses details regarding its formulation. This includes a proposed algorithm that can be used for fitting these models for members in the family. We refrain from introducing unnecessary restrictions on the proposed model framework, which results in a statement of minimal necessary criteria required on probability densities and link functions for said algorithm to be applicable. Details and challenges for individual members of the Tweedie family are also considered, demonstrating that suggested approach is applicable to all members of the family. Section (4) considers evaluating performance of suggested approach in comparison to state-of-the-art penalties like the ridge penalty to showcase efficacy of proposed approach. Section (5) considers fitting a spatial DGLM to automobile collision insurance losses using proposed approach for the state of Connecticut, in the year 2008. Finally, section (6) concludes the paper and discusses possibilities for future research in hindsight of proposed developments.

2 Tweedie ED family and DGLMs

We begin with some notations, let , , be the response, mean and dispersion respectively and , and

denote observed covariate vectors for

-th observation at the -th vertex, where and , with being the total number of observations. To make notations less cumbersome, when convenient, subscripts are omitted in following discussion.

Tweedie densities belong to the more general ED family whose probability density function is given by,

(1)

where , are the natural or, canonical and dispersion parameter respectively and the cumulant function. The Tweedie family specifies an additional index parameter , distinguishing between included members from the ED family. If we denote the mean by and variance by, , then and . Using , and we write . Probability density function in eq. (1) can also be expressed using unit-deviance ; table (1) lists out closed forms for some commonly used members in the Tweedie ED family. A special feature of the ED family is a relationship between mean and variance, that is specified through a variance function unique to each member. For members of the Tweedie ED family, . The value of and therefore expression uniquely determines the member in question.

DGLMs are a generalization of GLMs that provide a more general framework by allowing the dispersion to vary across observations. For appropriate choice of link functions and a DGLM is formulated as

(2)

Additionally, a graphical DGLM starts by specifying additional unobserved parameters , with associated dependence structure being explained by graph . Henceforth, we shall refer to as . This specification extends model in eq. (2) to

(3)

We denote , as covariate vectors corresponding to mean and dispersion respectively while, is a vector consisting of zeros, except for 1 in one entry indicating a particular edge in the graph. Another advantage of specifying is observed through joint modeling of and . Specification of the dependence structure using the graph for affects the quality of estimation for through bias adjustment within the mean model. Denoting the inverse maps for , and the function composition , the negative log-likelihood for the family can be expressed as

(4)
Tweedie EDMs
Normal 0 1
Poisson 1 1
Poisson-gamma
Gamma 2
Inverse Gaussian 3
Table 1: Commonly used members of the Tweedie family of distributions with their index parameter (), variance function (), cumulant function (), canonical parameter (), dispersion (), deviance (), normalizing constant (), support (), and respective parameter spaces for mean () and the natural parameter ().

As is evidenced from table (1) depending on , dispersion can be constant, for example Poisson models; in that case we choose resulting in a simple GLM for . Furthermore, depending on the density may lack a closed-form expression for associated normalizing factor, resulting in complex model structure for instance CP-g models. While dealing with CP-g densities we operate under . Explicitly, CP-g densities have the following form

(5a)
where we have,
(5b)

Expression in eq. (5b) is obtained by marginalizing out the associated random Poisson count from their joint density followed by equating resulting expression with eq. (1). This results in an infinite sum representation for , which was identified as an example of the generalized Bessel function, lacking a closed form. This translates to a lack of normalizing factor for the density, and the need for an approximation to the infinite sum (in eq. 5b) to make the density computationally viable. A fairly accurate approximation for the infinite sum is obtained through locating an approximate mode for terms at , followed by selecting an appropriate radius of inclusion for terms around , for every (for details see section 4, Dunn and Smyth (2005)). Modified saddle-point approximations provide an analytically simpler approach (see Dunn and Smyth (2018)) by approximating densities for the ED family using,

(6)

for Tweedie EDs . For instance, using eqs. (5a), (5b) and (3) with , we have the following negative log-likelihood,

(7a)
(7b)
equivalently under modified saddle-point approximation,
(7c)
(7d)

where is an indicator function. Naturally, choice of can be made based on chosen method of approximation. Evidently, is chosen to be small for saddle point approximations.

3 Optimization Problem

Denoting , model in eqn. (3) is fitted by solving the optimization problem,

(8a)
Here is an additional index parameter associated with model selection. The penalty function is
(8b)
where is a vector having the same dimension as .

Penalty function in its most general form, consists of two parts (a) first part consisting of , denotes a ridge penalization on , (b) second part consists of , which penalizes a graph derivative, the Laplacian in this case, promoting clustering on the graph. Note that if required, entries of can be made zero to not penalize a particular component of . For instance, if we choose to only penalize we have,

(9)

additionally, choosing to have a ridge penalty on and results in,

(10)

where is partitioned appropriately. Furthermore, depending on the nature of covariates, need not be as sparse as in eq. (10). Interdependence between covariates for mean and dispersion models can be penalized in the diagonal blocks. While off-diagonal blocks can be used to allow for penalizing dependence of groups of covariates on the graph, dependence of covariates across the mean and dispersion models can also be accommodated. We shall solve the optimization problem using configurations in eq. (9) followed by discussing necessary modifications required to extend formulated solution for the configuration in eq. (10).

Keeping fixed, we partition , where . If , , i.e. , are possibly including intercepts, let us denote and gradient vectors as and and associated Hessians of order and by and respectively. and are partitioned according to . The optimization problem in eq. (8a, 8b) is solved by iteratively solving three sub-problems in , and ,

(11a)
(11b)
(11c)
where

, are constants.Note that using in eqs. (9) and (10) automatically adjusts penalization for optimization problems in eqs. (11a) and (11b) respectively. Under configuration in eq. (9) solution to the problems are,

(12a)
(12b)

with

are partitioned matrices of order . There exists such that,

The last inequality is proved in Appendix A. There exists constant such that,

(13)

The second equality is obtained by substituting solution in eq. (12b). Positive or negative definiteness of in eq. (13), facilitates choice of . Under ridge penalization for configuration in eq. (10), the solution for eqs. (11a) and (11b) are,

(14a)
(14b)

Following similar logic for proof shown in Appendix A, and choosing , such that and are positive semi-definite (p.s.d) respectively, ensures . Finally we update by maximizing the profile likelihood. Proceedings are concisely summarized in algorithm (1), accompanied by theorem (1) (proof shown in Appendix A) proving convergence for proposed solutions. Theorem (1) shows that under proper choice of scaling constants and , the objective function is guaranteed to decrease for all . The choice of a convergence criteria for the algorithm is based on the objective function , equivalently this implies , i.e., for arbitrarily small the difference in Euclidean norm for estimates can be made arbitrarily small.

Initialize coefficients , index parameter and set Laplacian for graph;
Compute initial and set ; repeat
       Compute and from eqs. (16a) and (16b);
       – compute scaling ;
       if ridge penalty on  then
             Update using eq. (14a);
            
      else
             Update using eq. (12a);
            
       end if
      Update in eq. (16c);
       Compute and from eqs. (16d) and (16e);
       – compute scaling ;
       if ridge penalty on  then
             Update using eq. (14b);
            
      else
             Update using eq. (12b);
            
       end if
      Update ;
       Compute ;
       Set , ;
      
until Convergence of , i.e. ;
Return , .
Algorithm 1 Algorithm for fitting a graphical CP-g DGLM through penalization of the graph Laplacian, accompanied by a ridge penalty for respective model effects.
Theorem 1.

For appropriately chosen constants and , under configuration in eq. (9) and , sequence from eqs. (12a) and (12b) satisfy

Furthermore, for solutions in eqs. (14a) and (14b) under configuration in eq. (10), we have

where is the Euclidean norm.

Attempting to include separate intercepts for overall graph and model coefficients, and respectively, results in confounding, i.e. only is estimable but they are not estimable separately. As an alternative, considering a sufficiently large graph (i.e. a graph with large number of edges ), one could include a group-wise intercept for prescribed groups of edges in the graph, for example, all strongly connected components. Given the nature of we can penalize intercepts for neighboring groups to behave similarly, coupled with a ridge penalization to control their magnitude. However, rather than solving the estimability issue, this reduces the extent of it by confounding the intercept for chosen baseline group with true model intercept, .

In the ensuing discussion we start by deriving necessary conditions required for link functions and to satisfy algorithm (1), then we proceed to provide examples for different members of the Tweedie ED family under commonly used link functions. To maintain brevity we postpone necessary details to Appendix A. Unless explicitly stated, the link functions are general and not necessarily canonical. Observe that existence of a global minima is ensured by convexity of the likelihood function for member in question under specified link functions. Also, the chosen penalty function is convex, since is positive-semi definite for . The conditions for theorem (2) remain fairly general, i.e. commonly used link functions, both canonical and general satisfy these necessary conditions. Next, we present illustrations for members of Tweedie ED family (listed in table (1)).

Theorem 2.

If the negative log-likelihood is convex under chosen link functions , with inverses , , which are

  • monotonic and invertible, implying and exist,

  • smooth inverses, implying and are at least twice continuously differentiable,

with and, terms , , and in eqs. (16a–e) are well-defined in a neighborhood of the minima for objective function , proposed algorithm (1) produces estimates that satisfy conclusions of theorem (1).

Remark 1.

Convexity of the negative log-likelihood ensures uniqueness of and , given that proposed penalty is convex.

Remark 2.

For simplicity one could assume that and

are cumulative distribution functions of some random variable. In that case it is sufficient to have a differentiable probability density.

3.1 Illustrative Examples

Algorithm 1 is applicable to all listed members (see table (1)) of the Tweedie ED family. Referring to the theorem (2) (see proof in Appendix A), quantities necessary for specifying the algorithm are, , , and . However, while providing details for different members of the Tweedie ED family under applicable link functions and , examining the existence and specifying , , and , , should be sufficient.

1. Normal:

The normal or Gaussian distribution is a member of Tweedie ED family with index parameter

. From table (1), , , and resulting in the likelihood, . Commonly, link functions for is taken to be identity, i.e. which is also the canonical link and for dispersion , . It follows that , , and . Additionally, and .

2. Poisson: Poisson densities inherently feature a constant dispersion, as seen from table (1) and are obtained by setting . Also, for Poisson, , , and resulting in the likelihood, . The canonical link function is , however, other commonly used choices are and . Therefore, or or identity respectively based on listed choices, while . Based on choice of the link function, (a) , , (b) ,