## 1 Introduction

Spatial autoregressive models as well as conditional autoregressive models require the definition of a suitable spatial dependence structure via the so-called spatial weights matrix (see, e.g., Ver Hoef et al. 2018; Elhorst 2010

). This matrix, like an adjacency matrix in graphical models, defines how the locations are connected, meaning which locations are possibly dependent and by which extend. The spatial autoregressive term is then classically modelled as multiple of the product of this predefined weighting matrix and the vector of observations. Of course, estimated spatial autoregressive coefficients, therefore, depend on the choice of this matrix. Thus, all coefficients as well as inference on these parameters should always be done conditional on the definition of the weighting scheme. Hence, several papers analyze the impact of misspecified weighting matrices, e.g.

Stakhovych and Bijmolt (2009). From this perspective, the current practice is quite unsatisfactory. Moreover, the entire matrix of spatial weights can not be estimated by classical approaches like the maximum likelihood method (cf. Lee 2004), generalized method of moments (cf.

Kelejian and Prucha 1999), or least squares procedures (cf. Kelejian and Prucha 1998), because of the curse of dimensionality. Besides, classical ordinary least squares estimators are biased in the presence of spatial dependence. In addition, we often observe anisotropic dependencies with unobserved exogenous effects (e.g.

Deng 2008; Fassò 2013), such the classical parametric approach might cause issues because of misspecified dependencies.In this paper, we propose a penalized regression approach to estimate the entire spatial weighting matrix of a dependent spatiotemporal process as well as an unknown number of change points, which may occur at different time points depending on the location. In current literature, there are only a few papers proposing lasso-type estimation procedures for similar spatiotemporal models. Thus, the majority of studies applying methods of spatial econometrics still assume that the spatial weights matrix is known a-priori. Pioneering works on estimating spatial dependence structures can be found in Bhattacharjee and Jensen-Butler (2013); Lam et al. (2013); Ahrens and Bhattacharjee (2015). Whereas Bhattacharjee and Jensen-Butler (2013) propose a two-step estimation procedure, which is based on a spatial autocovariance matrix estimated in the first step, Lam et al. (2013); Ahrens and Bhattacharjee (2015) consider (adaptive) lasso-type estimation procedures. Moreover, Lam and Souza (2016) focus on a spatiotemporal model setup with exogenous regressors, where the matrix of spatial weights has a block diagonal structure. Contrary to these approaches, we account for structural beaks, which may occur in the temporal dimension. This leads to a flexible modelling approach, but a high number of parameters to be estimated. We, therefore, propose a two-step adaptive lasso estimation approach to estimate the spatial dependence and change points in the local means simultaneously. It is important to note that these change points are not necessarily the same for all spatial locations, but if there is a change in one location, it is affecting all other locations as well. Hence, the major issue for this setting is to distinguish between fluctuations due to the spatial dependence or due to a structural break.

The considered spatiotemporal autoregressive model is presented in the following Section 2. In this section, we particularly focus on the number of parameters and some specific issues for this setting. In the subsequent Section 3, we propose a suitable two-step lasso approach. To evaluate the performance of the proposed procedure, a series of Monte Carlo simulations is presented in the ensuing section. Eventually, we illustrate the procedure by an empirical example of Berlin condominium prices from 1995-2014. Section 6 concludes the paper.

## 2 Spatiotemporal Model

We consider that the data is generated by an univariate spatiotemporal process , where is the temporal domain and is the spatial domain of the process. The most common way is to consider discrete time points, such that the temporal domain can be defined as subset of all integers . However, our approach is also suitable for continuous spatiotemporal series, i.e., . The spatial domain can either be defined as subset of the -dimensional real numbers, , or as subset of the -dimensional integers . For the first case, continuous spatial processes and point processes are covered, e.g., data of ground-level measurement stations (), financial or economic data of certain regions, states, municipalities (), or atmospheric or satellite data (). In contrast, the second case covers all kind of spatial lattice data, like images (), CT scans (), or raster data (). Moreover, the locations can also be understood as vector of characteristics associated with . For economic studies, like for instance on regional labor markets, these characteristics could comprise the region’s population, gross domestic product, area etc. However, note that should have positive volume (see Cressie 1993).

Further, we assume that we observe the process for time points at a constant set of locations . For a convenient notation, we denote the time point as an index in the following sections, i.e., the vectors of observations of the spatiotemporal random process are given by for . Furthermore, we introduce the -dimensional matrix . We additionally consider that there might be an unknown number of structural breaks in the data. These breaks may occur at different time points for each location and they can be of different magnitude. However, we assume that the changes only affect the mean level of the data. Thus, at time point , the considered spatial autoregressive model is specified as

(1) |

where the vector denotes an independent and identically distributed random error. The mean levels are defined by . Thus, is given by the set . Since the mean level can change at different points of time differently for all locations, there are possible structural breaks. However, we assume that there is only a reasonable number of unknown change points. We will go into more detail on that in the following Section 3. The spatial dependence is assumed to be defined by the -dimensional spatial weighting matrix , which is constant over time. Moreover, the diagonal elements of the matrix are assumed to be zero in order to prevent observations influencing themselves.

In spatial econometrics, the classical approach is to replace the unknown spatial dependence structure by a linear combination or , where it assumed that is a predefined, non-stochastic weighting matrix describing the dependence. One might get an insight on the structure of by looking at the spatial covariogram or semivariogram (see Cressie and Wikle 2011). In practice, however, the true underlying matrix can not easily be assessed, and is, therefore, estimated by maximizing a certain goodness-of-fit measure, like the log-likelihood, in-sample fits, information criteria, or cross validations over certain classical weighting schemes. For spatial autoregressive panels it is quite likely to find the true weighting matrix by this approach, if the true one is under the candidate schemes (see Anselin et al. 1996). Other papers, like Stakhovych and Bijmolt (2009), analyze how large is the effect of misspecified weighting matrices, as one might think that missing spatial links could be catched up by other linkages.

In contrast to this classical approach, we estimate the entire spatial weighting matrix by a penalized regression. Simultaneously, the change points and mean shifts are estimated. However, this yields another issue, namely that the shifts in the mean can hardly be distinguished from changes due to a positive spatial dependence. Obviously, the model (1) can be rewritten as

(2) |

where stands for the

-dimensional identity matrix. Thus, a change

at location does not only effect the observed value at location , but also all other locations via the spatial dependence implied by . Denote the inverse matrix by . Having a closer look on the expected valuewe see that the model can easily estimated if either the spatial dependence or the mean levels are known. However, if both parameters have to be estimated, one can only distinguish between or by the fact that the spatial dependence structure is constant over time, whereas the mean levels depend on . It is important to note that the actual mean , although we refer to as (local) mean level, since changes in the mean can only due to . For illustration of the interdependence, we depict a simulated random field as time-series plots in Figure 1. There are two change points at and , where the mean level drawn as solid red line changes only at the first location. Due to the spatial dependence, the observed level in the first location is, however, higher than the red line and the change ripples out to other locations, as it can be seen by the overall mean level depicted by the blue curve. Thus, all mean changes and the spatial dependence structure must be estimated simultaneously, i.e., there are parameters to be estimated. We therefore propose a two-step lasso approach, which is described in more detail in the next section.

The spatiotemporal autoregressive model is well-defined and stationary, if the series

(3) |

converges. This is the case, if and only if

where is the spectral radius. Moreover, if , then there exists a norm , such that . In spatial statistics, the weighting matrix is often row-standardized, such that and for all . The row-standardization, however, also implies that the elements are less than or equal to one and that all absolute row sums are equal to one. Lam and Souza (2016) consider the necessary condition that .

## 3 Two-Step Lasso Estimation

The motivation behind the two step approach is to initially separate the change point detection problem from the spatial dependence. Thus, we initially estimate a set of candidate change points for each location . In the first step, we consider only the univariate time series , i.e., is the -th column of . Even ignoring the spatial dependence, can be estimated without a loss of information, as the spatial dependence is not time-varying and there is no additional source of temporal dependence. In addition, the number of estimated parameters in the full model is reduced, if there are less candidate change points than time points , i.e., the cumulated cardinalities are less than . Then, these sets of candidate change points are passed to the full model to estimate and simultaneously. In the second step, we suppose that for all locations the mean level equals , if . As the set of all estimated change points in the second step is always a subset of , i.e., , it is important that we do not underestimate the number of possible change points in the first step.

### 3.1 First step: candidate change points

To find all candidate change points for location , consider the following linear models for , where

and is a lower triangular matrix and denotes the indicator function on the set . The vector of coefficients is used to find the candidate change points. However, this specific model setup leads to a case where the number of estimated parameters is exactly the number of observations . Any solution using the full matrix will, therefore, exactly fit the residuals to the observed values and will not provide a meaningful interpretation. As a consequence, we need to be able to set at least some of the elements of to zero. Thus, we propose to use a penalized regression approach. In the recent literature, it has been shown that certain types of penalized regressions can be used to estimate change points. For instance, Chan et al. (2014) use a group-lasso approach to estimate the structural breaks in autoregressive time series. In contrast, Lee et al. (2018)

utilize a SCAD-type penalty to provide an estimator for change points with the oracle property in quantile regression.

For our setup, we propose to use the adaptive lasso penalty, which is introduced by Zou (2006). The underlying differences in can be large due to change points and they are 0 whenever no change point is present, so we need our model to be at least nearly unbiased for large values in . Moreover, and more importantly, the approach must ensure that the correct entries are consistently set to zero. However, standard lasso approaches does not necessarily exhibit these properties, as shown in Zou (2006). The authors suggest a pre-estimation step for

, which can be done by an ordinary least squares (OLS) or ridge-regression. The estimated coefficients of this pre-estimation step are then employed as weighting factors

for . To be precise, the estimates are given by(4) |

where the observed process is , the -norm is denoted by , and is the Hadamard product. The elements of the weights vectors are specified as , where is a tuning parameter influencing the thresholding function. The estimates of the pre-estimation step are denoted by . This convex optimization problem can be solved by corresponding optimization approaches such as the LARS-algorithm of Efron et al. (2004). Given an appropriate choice of the penalization parameter , the estimation for does exhibit the oracle-properties, i.e., consistency in variable selection as well as asymptotic normality with expectation zero in the differences . This grants that the true features are selected.

In the above-mentioned pre-estimation step, the parameters are obtained by ridge-regression, as the number of parameters coincide with the number observations or time points. Via the parameter the impact of larger estimated values on the penalty term can be strengthened or diminished. For our setting, we suggest setting to get the direct impact of each parameter estimation.

To select , an -fold cross-validation (CV) should be performed over an exponentially decaying grid of possible values of . This approach is implemented, for instance, in the R-package `glmnet`

using a coordinate descent approach (see Friedman et al. 2010). For instance, the assumed can be chosen, such that the root-mean-squared error of the CV is minimal. However, we want the resulting model to be very parsimonious, i.e., it should not model any variation created by , so the penalty parameter is chosen as the largest value

, for which the corresponding mean-squared-error does not significantly differ from the smallest mean-squared-error by one standard deviation. These two models do not perform significantly different, but the model with the higher

will always induce a sparsity, which is greater than or equal to the model with the smaller value of .Eventually, the zero elements in the estimated can be interpreted as candidate change points, as fitted values are given by the cumulated sums of the estimated coefficients. Thus, each estimated , which is not set to zero, represents a change point at time , whereas zero coefficients indicate the continuation of the mean in the subsequent periods. This concludes the first step of our approach.

### 3.2 Second step: estimation of the full model

In the second step, we estimate the full model, i.e., both the changes in the mean as well as the spatial dependence . For this estimation, we propose to use a further adaptive lasso approach. In particular, (1) can be rewritten in a linear form as

(5) |

where , , and is a lower triangular, block diagonal matrix. The vectorization operator is denoted by and is the Kronecker product. To be precise, is the direct sum of triangular matrices of dimension . The vector of coefficients is defined as , where . In addition, we assume that for and all . That is, the coefficients are forced to be zero for all time points of each location , where no change is expected, according to the results of the first step.

Beside the mean-level term , we account for possible spatial dependence by . The coefficients are the vectorized spatial weights, i.e., . It is worth noting that the diagonal elements of are assumed to be zero, such that we avoid self-dependencies and the number of estimated parameters in is reduced.

Eventually, the estimated coefficients of the full model are given by the convex optimization problem

The observed vector is denoted by . The weights of the regularization term, and correspond to the penalty weights estimated in the pre-estimation of the adaptive lasso. It can either be done by ridge regression or OLS, if the number of parameters is smaller than the number of observations. Of course, this depends on the number of candidate change points estimated in the first step. More precisely, the weights are specified analogously to the weights in the first step, i.e., . For our settings, we minimized the objective function by a coordinate descent algorithm implemented in the R package `glmnet`

(cf. Friedman et al. 2010).

In contrast to the approach of Lam and Souza (2016); Lam et al. (2013), the condition that is more flexible, but it does not ensure that the estimated process is well-defined in terms of the non-singularity of . However, the approach can easily be implemented and works very well for moderate levels of spatial dependence. If the spatial process is, however, close to non-stationarity, i.e., is close to one, a constrained lasso approach must be implemented for the second step. More precisely, we observe a good estimation performance for constrained lasso approaches following the idea of James et al. (2012, 2018). In this paper, we, however, focus on the simpler approach with a convex objective function, which have great practical advantages, while leading to good results for the majority of degrees of spatial dependence.

Moreover, it is not feasible to model negative spatial dependence, which, however, rarely occurs in practice (cf. Cressie, 1993). On the contrary, least squares estimators under the presence of spatial dependence are not generally sign-consistent (see Lam et al., 2013). Thus, this restriction on positive spatial dependence warrants sign-consistency. In the following section, we focus on the performance of the proposed estimation scheme for different magnitudes of spatial dependence. In particular, we focus on moderate levels of spatial dependence, i.e., .

## 4 Monte Carlo Simulations

In this section, we evaluate the performance of the lasso estimators by a series of Monte Carlo simulations. First, we analyze (a) how well the algorithm detects the existence of spatially dependent observations, (b) how well the algorithms detects spatially independent observations, and (c) how well the changes in the mean level are captured by the approach. We perform these simulations for different specifications of the true underlying spatial dependence . More precisely, we consider classical contiguity schemes, which should also be modeled well by classical approaches if is correctly specified, as well as random patterns, and block structures in the spatial dependence. Secondly, we analyze how the performance differs between different levels of spatial dependence. Generally, one would expect that the lasso estimators perform worse for small spatial dependence, as the resulting effects are weaker. Moreover, if is close to one, we could be faced to difficulties as the maximum of the objective function is close to regions, where the model is not well-defined. Thus, we evaluate the performance for weak (), moderate (), and large spatial dependence ().

The spatiotemporal process is simulated on a spatial lattice , i.e., , for discrete time points with . To define the spatial dependence, we consider the three row-standardized weighting schemes illustrated in the first row in Figure 2. First, we consider a Queen’s contiguity matrix depicted on the left hand side. For this scheme, dependencies cannot be found in areas, which are far away. It is also assumed that if there is a connection from area to , then there must be a feedback from to to as well. This specific structure is quite commonly used in empirical research, like for instance in Büttner (1999); Ross et al. (2007); Tsai et al. (2009). Secondly, a randomly sampled spatial dependence structure

is considered, meaning a link between two locations exists with a probability of 20 percent. Thus, there is not necessarily a connection from

to , if and are connected. Note that this matrix is also row-standardized, such that . Thirdly, we assume that only a few locations are dependent in . Moreover, these connected locations should appear close together. We, therefore, randomly assign 3 blocks of minimal size up to maximal size with positive weights. The blocks are not necessarily quadratic. In contrast to the contiguity matrix, the dependent observations can but must not occur in the closest neighborhood. All spatial weighting matrices , which have random features, are sampled again for every iteration of the simulation. Eventually, with .Besides, we simulate at least one change point within the time period. To get realistic conditions, we divide the locations into two groups, so that the change points do not occur for all locations at the same point of time. To be precise, the first group containing 10 locations has two changes in the mean at and , whereas only one structural break occurs at for the second group, that is, the remaining 15 locations. Consequently, there are three mean changes at equidistant time points, namely at , , and . Regarding the locations of the first group, the zero-mean level increases to at and decreases again to at . In contrast, the mean level is for and for for the second group locations. The residuals

are independently sampled normally distributed random variables with zero mean and unit variance. We simulate the process for

replications. For each replication, we apply the estimation procedure described in Section 3.To evaluate the performance of our model, we consider several criteria, which are described in more detail below. First, we compute the specificity and sensitivity of all elements of , but neglecting the diagonal elements. The specificity provides information about the amount of correctly identified missing links, i.e, the proportion of zero entries in the estimated matrix and the true underlying matrix . Let be the set of zero entries in a matrix , i.e., . Note that the complement of is . Thus, the specificity is given as ratio between the cardinalities of and , that is,

Surely, using only this criterion cannot yield meaningful results, as naive methods like setting all weights to 0 leads to a specificity of one, even though such a strategy creates a lot of false identifications for links, which were in fact not zero. Hence, the sensitivity

should also be considered to assess the performance for correctly identified positive weights.

To get an insight on the goodness of the estimated weights , we compute the average bias of the non-diagonal entries

It is important that the diagonal entries are excluded to avoid an unjustified improving of our results.

Furthermore, we examine the average bias and mean absolute errors for the estimated mean levels . To be precise, the average bias is defined as

0.925 | 0.910 | 0.943 | 0.911 | 0.888 | 0.944 | ||

0.278 | 0.183 | 0.186 | 0.446 | 0.283 | 0.246 | ||

0.001 | -0.001 | 0.001 | 0.002 | -0.001 | 0.000 | ||

-0.273 | -0.422 | -0.101 | -0.056 | -0.318 | -0.131 | ||

0.934 | 0.928 | 0.946 | 0.961 | 0.955 | 0.972 | ||

0.905 | 0.843 | 0.919 | 0.909 | 0.829 | 0.900 | ||

0.622 | 0.420 | 0.409 | 0.789 | 0.554 | 0.607 | ||

0.006 | 0.002 | 0.002 | 0.008 | 0.002 | 0.003 | ||

0.813 | 0.133 | 0.015 | 1.175 | 0.314 | 0.206 | ||

0.902 | 0.878 | 0.933 | 0.935 | 0.920 | 0.958 | ||

0.904 | 0.757 | 0.894 | 0.919 | 0.695 | 0.879 | ||

0.781 | 0.594 | 0.607 | 0.874 | 0.712 | 0.790 | ||

0.005 | 0.005 | 0.004 | 0.006 | 0.008 | 0.004 | ||

0.967 | 0.882 | 0.316 | 1.395 | 1.415 | 0.535 | ||

0.905 | 0.877 | 0.923 | 0.937 | 0.926 | 0.950 |

All results are reported in Table 1. First of all, it is possible to estimate the underlying spatial dependence for spatiotemporal autoregressive models, even under the presence of an undefined number of structural breaks. This is a big advantage compared to classical approaches, where it assumed that structure of the spatial dependence can be predefined by a certain spatial weights matrix. Generally, the proposed penalized regression method show a good performance for estimating this matrix . However, the performance slightly depends on the type of the true underlying spatial dependence structure. For matrices like the Queen’s contiguity matrix, where we have symmetric links, the algorithm can easily detect the relevant links. On average, 78.9 percent of all true links were detected for and . For matrices , which do not have symmetric entries, the performance is more difficult to evaluate. In particular, the algorithm rather estimates an undirected dependence, i.e., symmetric entries, than the true directed or one-way dependence. Hence, we observe high detection rates, sensitivities, for positive links in and , but moderate specificities, i.e., correctly eliminated links. However, this does not affect the average fitting of the weights, as one might see comparing the average bias and mean absolute errors, which is roughly for all cases.

The tendency to estimate symmetric spatial dependence can also be seen for the example matrices of one replication shown in Figure 2, where we depicted the estimates in the second row. Even though the algorithm detects the block structure in

, it also erroneously estimates its counterpart on the mirrored side of the main diagonal. Nevertheless, the weight of the wrongly classified link is usually weaker than the true one. Visually inspecting the estimated weighting matrices, one can also see why the performance is worse for the randomly sampled weights, i.e.,

, compared to the other matrices. The majority of the nonzero links are found by the algorithm, but it also estimates positive weights on the mirrored side. Thus, the specificity is on a moderate level and the mean absolute errors are slightly higher.The estimation precision of the local mean levels depends on the number of time points as well as on the extend of the spatial dependence. At the first glance, the direction of increasing precisions might seem counterintuitive and should, therefore, be analyzed in more detail. First, we observe that the estimation bias increases with increasing spatial dependence, i.e., with increasing . Due to higher values of , spatial spill over effects, measured by , become more severe. Hence, is inflated tremendously when increasing . As the distinction between and local means is the core difficulty of the estimation, the estimation precision weakens when is increasing. Secondly, we observe that a growing number of time points does not necessarily lead to an increasing in the estimation precision of . While the goodness of the estimation increases for the case of , both other cases shows a decreasing in estimation precision. This surprising effect can be explained by the fact that an increase in always corresponds to an multiple increase in the number of parameters, namely -times more parameters. The algorithm is potentially able to detect change points at each time point, so all time points are included as possible regressor. Hence, increasing time horizon does not improve the relation between observations and parameters and could, therefore, have a negative impact on the estimation of .

Like for the estimated spatial weights, we also show the resulting estimated mean level for one replication in Figure 3. In particular, we plot the true mean levels as black line as well as the estimates as red line. The underlying weighting scheme is a Queen’s contiguity matrix. Hence, the process in each plot is effected by its direct neighbors, meaning we kept the correct positions of the spatial locations in Figure 3. In general, the resulting mean level is estimated quite precisely. If we observe a deviation from the true level , e.g. , the difference is usually captured by the spatial dependence . For instance, location has an erroneously estimated change point after the first 50 observations, as the direct neighbors 14 and 15 exhibit a change point exactly at . As a consequence, the algorithm does not distinguished correctly between and . If the dependence is estimated better, e.g. for or , the estimated spatial mean level is much closer to the true level . In summary, it can be stated that the proposed estimation procedure can capture the dynamics of quite well, but it, of course, depends on the precision of the estimation of .

## 5 Real Data Example

Housing prices generally depend on the location of the property and the surrounding housing prices. In regions, which are more attractive and provide a better infrastructure, the prices for housing are of course higher than in regions, which are less developed. In larger cities, the spatial dependence between housing prices of different districts/quarters is not always only due to spatial proximity, but also due to good public transport connections, similar life styles, or cultural offers. All these effects are only hardly to measure or anticipate, as it would be necessary for classical spatial econometric models. Moreover, we typically observe that the price levels fluctuate over time. These fluctuations might be also due to changes in the legal framework, like changes of the property taxes or real-estate transfer taxes.

In the last section, we focus on these effects of regional housing prices. In particular, we analyze the condominium prices in all Berlin postal code regions from January 1995 to December 2014. The data are from the Berlin committee of valuation experts for real estate (*Gutachterausschuss*), which is informed about the prices for each real estate transaction. To create a panel data set with equidistant time points, we pooled all contracts for a specific area within one month. That means, we observe 240 monthly prices over a period of 20 years. The prices are given in Euro per , where it was already accounted for the currency changeover from Deutsche Mark to Euro in 2002. Regarding the spatial resolution, we define the locations by the respective ZIP-Codes, where only the first three digits are used. In total, there are 24 different unique 3-digit ZIP-Codes for the city of Berlin, which are constant over the whole time period. All ZIP-Codes are corresponding to exactly one unique self-contained area, except for the ZIP-Code 140xx consisting of two distinct spatial polygons. Unfortunately, there are more than 120 months with no valid contract data for the ZIP-code area 136xx, so we exclude this area from our analysis. To gain numerical stability, the remaing 23 time series are transformed by a normal probability integral transformation, as for instance done by Uniejewski et al. (2018) for electricity prices. This ensures that the underlying series is normally distributed for our two-pass method, i.e., it does not exhibit extreme price spikes, e.g. due to sparse data for several months and/or areas. After the calculation of the relevant features, it is always possible to re-transform the series into the original price domain.

To estimate and the mean changes implied by , we use the proposed method described in Section 3. We, however, add one further assumption, namely that no change in occurs within the last 5 percent of the time points, i.e., all observations in 2014. For instance, the algorithm would have only one time instant to estimate , two time instants for , and so on. Hence, this restriction guarantees that there are at least 13 time points to estimate each mean level .

Figure 4 depicts the real estate prices after applying the normal transformation for all spatial locations as time-series plots. Obviously, the prices show different patterns depending on the ZIP code area. Whereas the mean stays constant over the considered time period for some locations, e.g. 102xx, 103xx, or 134xx, there is a clear decrease and increase of the prices for other locations, like 125xx, 130xx, or 120xx. In particular, we observe declining prices for the first 8-10 years of our study and raising prices in the recent 10 years. The estimated coefficients are reported in Figure 4 as red lines. It is worth noting at this point that these estimated coefficients do not correspond to the estimated mean levels depicted by the blue curves. Hence, should be interpreted as regional baseline prices, or regional mean, without accounting for spillover effects from neighboring areas. Deviations of the overall mean can, therefore, be associated with higher/lower prices spilling over from other ZIP code areas, which are for some reasons connected, e.g. because of spatial proximity, similar life style, culture etc.

Further insights on these connections can be gained from the estimated spatial weighting matrix , which is shown in Figure 5. The entries represent how a location is affected the locations , where darker colors represent stronger influences. Links, which are estimated to be zero, are not colored. Furthermore, we highlight all mutual connections, i.e., influence and vice versa, with an asterisk. These two-way relations must not be necessarily equal, meaning and are possibly different. Moreover, there are various links, which are estimated to be directed or one-sided, e.g. 141xx to 140xx.

To give an impression on the location of highly influencing or affected areas, we show the total number of outgoing and incoming links in Figures 6 and 7. For the regions with the largest number of outgoing/incoming links, we additionally plot the weights as arrows, where the line width corresponds to estimated value of the weight. It is not surprising that the region influencing more areas than all others has the ZIP code 107xx, which covers large parts of Berlin-Mitte and Berlin-Charlottenburg. This area is basically the city center of the former West-Berlin. It is affecting nearly all urban areas of the city. There are no links to Berlin-Köpenick (125xx) and Berlin-Steglitz (141xx), and Berlin-Pankow (131xx), which are rather outer districts with independent quarter centers. On the contrary, the region, which is influenced the most by other districts, is located in the north of Berlin. The postal code 134xx covers mostly areas of Berlin-Reinickendorf including the airport Berlin-Tegel (TXL). This region is also characterized by forests and waters as well as exclusive residential areas.

In summary, we can state that the algorithm captured the structure of the data in a considerably well manner. In addition, we gain new insights into spatial relationships, which could not be detected by standard distance-based models. In particular, we show that the prices of condominiums do not only depend on direct neighbors, but also areas, which are farther away. This dependence is also not diminishing with an increasing distance, how it is often assumed in spatial econometrics. Thus, the proposed modelling approach is able to capture other latent factors, like socio-economic, cultural, life-style factors. Moreover, the estimated regional mean levels show that there are severe change points and temporal dynamics in the real-estate prices. In particular, the temporal pattern differs between the locations, such that the changes does not necessarily occur at the same points of time for each area.

## 6 Conclusion

A common and well-known issue in spatial econometrics is to find a suitable spatial weighting matrix. This matrix defines the structure of the spatial dependence and is usually unknown when applying the model to real data. Although one might get insights on the spatial dependence structure by the spatial covariogram for instance, irregular dependencies would still be undetected. Because of the curse of dimensionality, the spatial weights are typically not estimated, but replaced by predefined weights multiplied with unknown scalar(s) to be estimated.

We addressed this important issue of spatial econometrics by proposing a penalized regression approach to estimate the entire spatial weights matrix. In addition, we account for possible structural breaks within the panel data, which effect both the mean level of the location, where the change point occurred, as well as mean level of all other locations. The proposed estimation procedure is a two-step approach, where the dimension of the parameter space is reduced in a first step. Further, the full model including the spatial dependence and the changes in the mean level are estimated simultaneously. Although only a simple convex optimization must be done to obtain the parameter estimates, the global minimum might be close to regions where the objective function is not well defined for spatial dependence close to non stationarity. Further research should, therefore, also focus on constrained lasso estimators. The constraint could be based on the spectral radius or any other matrix norm of the estimated weighting matrix.

We analyzed the performance of this approach by an extensive simulation study. Generally, the procedure works very well for small, medium, and large spatial dependence. Both sensitivity and specificity for detecting spatially dependent locations are reasonable large and changes at different time points for different locations can be detected. However, the estimation procedure tends to estimate symmetric spatial dependence structures, where the respective weights are not necessarily equal. More precisely, we mean symmetry in the positive entries, i.e., if a weight then also should be positive. Thus, the approach works better for data with this kind of symmetric spatial dependence.

Finally, we illustrated the proposed two-step lasso approach by an empirical example of Berlin real estate prices. On the one hand side, we see that the spatial dependence differs from classically applied dependence schemes, like contiguity matrices. In particular, the postal code areas, which we considered as spatial units, depend not only on direct neighbors, but also on areas, which are farther away but possibly have a good public transport connections and similar life styles etc. On the other hand, the mean level of the prices changes over time, which is modeled by the several change points. The regional means, which should be interpreted as prices without accounting for different prices in the neighboring areas, are different for all spatial locations. For some areas, the price level is more or less constant over time; for other areas, we observe declining prices in the first period of our study and rising prices in the second one. Moreover, it is possible to distinct between regional mean levels and overall mean levels, meaning one can see if the natural prices are increased or decreased due to differing price levels in surrounding areas.

## References

- Ahrens and Bhattacharjee (2015) Ahrens, A. and Bhattacharjee, A. (2015). Two-step lasso estimation of the spatial weights matrix. Econometrics, 3(1):128–155.
- Anselin et al. (1996) Anselin, L., Bera, A. K., Florax, R., and Yoon, M. J. (1996). Simple diagnostic tests for spatial dependence. Regional Science and Urban Economics, 26(1):77–104.
- Bhattacharjee and Jensen-Butler (2013) Bhattacharjee, A. and Jensen-Butler, C. (2013). Estimation of the spatial weights matrix under structural constraints. Regional Science and Urban Economics, 43(4):617–634.
- Büttner (1999) Büttner, T. (1999). Determinants of tax rates in local capital income taxation: a theoretical model and evidence from germany. FinanzArchiv/Public Finance Analysis, pages 363–388.
- Chan et al. (2014) Chan, N. H., Yau, C. Y., and Zhang, R.-M. (2014). Group lasso for structural break time series. Journal of the American Statistical Association, 109(506):590–599.
- Cressie (1993) Cressie, N. (1993). Statistics for Spatial Data. Wiley.
- Cressie and Wikle (2011) Cressie, N. and Wikle, C. K. (2011). Statistics for Spatio-Temporal Data. Wiley.
- Deng (2008) Deng, M. (2008). An anisotropic model for spatial processes. Geographical Analysis, 40(1):26–51.
- Efron et al. (2004) Efron, B., Hastie, T., Johnstone, I., Tibshirani, R., et al. (2004). Least angle regression. The Annals of Statistics, 32(2):407–499.
- Elhorst (2010) Elhorst, J. P. (2010). Applied Spatial Econometrics: Raising the Bar. Spatial Economic Analysis, 5(1):9–28.
- Fassò (2013) Fassò, A. (2013). Statistical assessment of air quality interventions. Stochastic environmental research and risk assessment, 27(7):1651–1660.
- Friedman et al. (2010) Friedman, J., Hastie, T., and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1–22.
- James et al. (2012) James, G. M., Paulson, C., and Rusmevichientong, P. (2012). The constrained lasso. In Refereed Conference Proceedings, volume 31, pages 4945–4950. Citeseer.
- James et al. (2018) James, G. M., Paulson, C., and Rusmevichientong, P. (2018). Penalized and constrained optimization: An application to high-dimensional website advertising. Discussion paper.
- Kelejian and Prucha (1998) Kelejian, H. H. and Prucha, I. R. (1998). A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. The Journal of Real Estate Finance and Economics, 17(1):99–121.
- Kelejian and Prucha (1999) Kelejian, H. H. and Prucha, I. R. (1999). A Generalized Moments Estimator for the Autoregressive Parameter in a Spatial Model. International Economic Review, 40:509–533.
- Lam and Souza (2016) Lam, C. and Souza, P. C. (2016). Detection and estimation of block structure in spatial weight matrix. Econometric Reviews, 35(8-10):1347–1376.
- Lam et al. (2013) Lam, C., Souza, P. C., et al. (2013). Regularization for spatial panel time series using the adaptive lasso. Journal of Regional Science.
- Lee (2004) Lee, L.-F. (2004). Asymptotic Distributions of Quasi-Maximum Likelihood Estimators for Spatial Autoregressive Models. Econometrica, 72(6):1899–1925.
- Lee et al. (2018) Lee, S., Liao, Y., Seo, M. H., and Shin, Y. (2018). Oracle estimation of a change point in high-dimensional quantile regression. Journal of the American Statistical Association, 0(0):1–11.
- Ross et al. (2007) Ross, Z., Jerrett, M., Ito, K., Tempalski, B., and Thurston, G. D. (2007). A land use regression for predicting fine particulate matter concentrations in the new york city region. Atmospheric Environment, 41(11):2255–2269.
- Stakhovych and Bijmolt (2009) Stakhovych, S. and Bijmolt, T. H. (2009). Specification of spatial models: A simulation study on weights matrices. Papers in Regional Science, 88(2):389–408.
- Tsai et al. (2009) Tsai, P.-J., Lin, M.-L., Chu, C.-M., and Perng, C.-H. (2009). Spatial autocorrelation analysis of health care hotspots in taiwan in 2006. BMC Public Health, 9(1):464.
- Uniejewski et al. (2018) Uniejewski, B., Weron, R., and Ziel, F. (2018). Variance stabilizing transformations for electricity spot price forecasting. IEEE Transactions on Power Systems, 33(2):2219–2229.
- Ver Hoef et al. (2018) Ver Hoef, J. M., Hanks, E. M., and Hooten, M. B. (2018). On the relationship between conditional (CAR) and simultaneous (SAR) autoregressive models. Spatial Statistics, 25:68–85.
- Zou (2006) Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American statistical association, 101(476):1418–1429.