1 Introduction
Local projections introduced by Jordà (2005) provide a statistical framework that accounts for the relationship between an exogenous variable and an endogenous variable, measured at different time points. Typical applications of local projections include impulse response analyses and direct (noniterative) forecasting (Stock and Watson, 2007). A local projection has several advantages over standard methods, such as vector autoregressions (VARs). First, it does not impose a strong assumption on the datagenerating process, making it robust to misspecification. Second, it can easily deal with asymmetric and/or statedependent impulses. On the other hand, local projections do have several disadvantages. First, when using a local projection, the exogenous variable must be identified beforehand. Second, a local projection is statistically less efficient than other methods, and typically obtains a wiggly impulse response function, (e.g., Ramey, 2016
). In an impulse response analysis, a shape of an estimated impulse response function is of concern.Therefore, if an estimated impulse response function is wiggly and has wide confidence/credible intervals, it is difficult to interpret the result, and one might reject or accept a hypothesis wrongly. In this study, we address the second disadvantage of local projection.
In order to improve the statistical efficiency, we develop a fully Bayesian approach that can be used to estimate local projections using roughness penalty priors as well as Bspline basis expansions. The proposed priors, which are adapted from Bayesian splines (Lang and Brezger, 2004), are generated from an intrinsic Gaussian Markov random field; that is, they induce randomwalk behavior on a sequence of parameters. By incorporating such priorinduced smoothness, we can use information contained in successive observations to enhance the statistical efficiency of an inference. We compare the proposed approach with the existing approaches through a series of Monte Carlo experiments. The approach is applied to an analysis of monetary policy shocks in the United States to show how the roughness penalty priors successively smooth impulse responses and improve the statistical efficiency in terms of predictive accuracy. Furthermore, we show that such improvements are almost entirely owing to the roughness penalty priors, and not to the Bspline expansions.
There are three strands of studies related to this work. First, Barnichon and Matthes (forthcoming) approximate a moving average representation of a time series using values from Gaussian basis functions. Their approximation is simpler, but rather coarser than ours. As a result, their estimated impulse responses may be excessively smoothed and vulnerable to model misspecification. Second, to smooth an impulse response estimate, MirandaAgrippino and Ricco (2017) penalize the estimate based on deviations from an impulse response derived from an estimated VAR. However, their approach seems not to work well in cases with asymmetric and/or statedependent impulse responses (e.g., RieraCrichton et al., 2015). Furthermore, their approach uses a same data twice. This shortcoming can be resolved if a time series is long enough to be split into training and estimation samples, but it is not the general situation in macroeconomic studies. In contrast, our approach does not require a reference model, being free from these problems. Third the most relevant studies are those of Barnichon and Brownlees (forthcoming) and ElShagi (forthcoming), who develop maximum likelihood methods using roughness penalties. Although our approach can be regarded as a Bayesian counterpart of their works, it confers additional benefits. In our approach, smoothing parameters are inferred from priors, rather than being prefixed, which means we can systematically consider uncertainty in the smoothness of an impulse response. Barnichon and Brownlees (forthcoming) choose a smoothing parameter via crossvalidation, while ElShagi (forthcoming) determine smoothing parameters on the basis of some information criteria. Our approach is more flexible than Barnichon and Brownlees (forthcoming) : they let a single parameter control the smoothness of all parameter sequences, whereas we can assign different smoothing parameters to individual sequences.
The rest of the paper is organized as follows. Section 2 introduces the model, the priors and the posterior simulation. Section 3 conduct a set of Monte Carlo experiments and report the result. Section 4 demonstrates our approach in an analysis of the macroeconomic effects of monetary policy shocks in the United States. Section 5 concludes.
2 Proposed Approach
We consider two classes of local projections: those with and without Bspline expansions.
2.1 Local Projection without Bspline expansions
2.1.1 Model
We begin by describing a local projection (Jordà, 2005). Although we consider only time series data, an extension to panel data is straightforward. A model for an individual observation is given by
where is a set of projection points such that , is an endogenous variable observed at period , is an intercept, is an exogenous variable observed at period , are covariates, which may include lags of the endogenous and exogenous variables, and are unknown coefficients, and is a residual. The model allows asymmetric and/or statedependent impulse responses, as in RieraCrichton et al. (2015), Auerbach and Gorodnichenko (2013), and Ramey and Zubuairy (forthcoming). Precise definition of and depends on whether the model is used for an impulse response analysis or forecasting. For the former task, denotes a response observed periods after shock occurs at ; for the latter, is one among predictors observed at , and is an periodahead target observation. The following text focuses on the impulse response analysis.
We seek to infer a sequence , representing an impulse response of to , namely, . The model can be represented as
(1) 
where is a vector of regressors, and is a vector of corresponding parameters. For notational convenience, we reindex the coefficient vector as . Stacking these over the projection dimension yields a representation resembling a seemingly unrelated regression (SUR): for ,
(2) 
where is a covariance matrix, and denotes the Kronecker product.
Rearranging the above representation, we express the model in matrix notation as
(3) 
where
denotes a multivariate normal distribution with mean
and covariance . Letting denote the data, the likelihood takes a standard form:(4)  
(5) 
where is a matrix composed of the realized residuals.
2.1.2 Bayesian Inference
This section first discusses priors on the subsets of and then assembles them into a prior on , followed by a description of the priors on the other parameters. Lastly, the posterior simulation method is discussed.
We introduce a class of roughness penalty priors for , . Our prior construction is motivated by Lang and Brezger (2004). The prior induces an thorder randomwalk behavior on a sequence of parameters . When , the relationship between and successive parameters is represented by
for , where and are global and local smoothing parameters, respectively. Controlling local smoothness is potentially beneficial, because an impulse response function often has both strongly bent and smooth areas, for example, fastgrowing responses just after an occurrence of shock, and virtually flat responses after convergence to a longrun equilibrium. In some applications, without the adaptation for local smoothness, an estimated impulse response might be oversmoothed in some areas and undersmoothed in others. A prior on is an improper normal prior generated by an intrinsic Gaussian Markov random field (Rue and Held, 2005), and the smoothing parameters are inferred from gamma priors, unlike in existing approaches (MirandaAgrippino and Ricco, 2017) and Barnichon and Matthes (forthcoming). The hierarchy of the prior takes the form
(7) 
(8) 
where , is an by difference matrix of order ,
denotes a gamma distribution with shape
and rate (and, thus, with meanand variance
), and , , , andare prefixed hyperparameters. Assembling the priors on the subsets of
, the prior density for conditional on and is represented as(9) 
where is a byzero matrix with th diagonal element is replaced by one. In turn, we employ an inverse Wishart prior for with scale and degrees of freedom, . In what follows, the above prior is referred to as an adaptive roughness penalty (ARP) prior. As a special case, the same prior with all local smoothing parameters set to one is called a nonadaptive roughness penalty (NRP) prior. For the NRP prior, 9 can be rewritten for computational convenien
A posterior simulation is conducted using the Markov chain Monte Carlo (MCMC) algorithm. Because all of the conditional posterior densities are standard, we can construct a block Gibbs sampler. Each sampling block is specified as follows.
Sampling
The conditional posterior density of is given by the multivariate normal distribution:
(10)  
(11) 
This block is computationally demanding, and contains are two bottlenecks. The first is calculation of the prior precision matrix, which involves repeated highdimensional matrix multiplications, Eq. (9). Computational cost declines significantly by treating as a sparse matrix. The second bottleneck is the inversion of . For speed and numerical stability, we apply the algorithm described in Section 2 of Rue (2001) (see Algorithm 1), which exploits a banded structure of ; inverting of a lowertriangular Cholesky root of , (denoted by ), is faster and more numerically stable than inverting itself.
Step 1. Sample .
Step 2. Solve to obtain .
Step 3. Solve to obtain .
Step 4. Solve to obtain .
Step 5. Set .
Sampling and
The conditional posteriors of the smoothing parameters for , , are specified as the following gamma distributions: for ,
(12) 
(13) 
Sampling
The conditional posterior of is the inverse Wishart distribution
(14) 
2.2 Local projection with Bspline expansions
We consider a local projection with Bspline expansions as an additional smoothing device. We intend to infer a smooth function that represents an impulse response of to , namely, , and to approximate using a Bspline basis function expansion over a projection horizon ^{1}^{1}1See, for example, Eilers and Marx (1996) for a detailed description of Bsplines.
where is a number of knots, is a vector of coefficients, and is a vector of Bspline basis functions. We define approximations of the other coefficientsin a similar fashion. Given the approximation, the model is represented as
where is a vector of regressors, and is a vector of corresponding parameters. We reindex for expositional convenience. Letting , the model can be expressed as
Stacking these over the projection dimension yields a representation à la SUR:
for . Rearranging the above representation delivers
(15) 
We construct prior and posterior simulator for the model are in a fasion similar to the model without Bspline expansions. Given the same priors for and , a prior on is constructed as
(16) 
The conditional posterior density of is derived as the multivariate normal distribution
(17)  
(18) 
As in a model without Bspline expansions, this sampling block presents a major computational burden. On the one hand, the prior precision matrix can be calculated easily by virtue of its block diagonal structure (16) (unless the number of covariates is not extremely large). On the other hand, the quantity in (18) involves highdimensional matrix multiplication, and cannot be compressed as in (11), eventually making the posterior simulation more demanding than the previous case. Sampling distributions of the other parameters are derived analogously to those of a model without Bspline expansions, in (12), (13) and (14).
2.3 Comparison with the approaches
Recent frequentist approaches to estimate smooth impulse responses tie closely to ours, in that their objective functions have forms similar to the posterior densities we present (i.e., the sum of a Gaussian likelihood and a roughness penalty term). From this perspective, ElShagi (forthcoming) is a frequentist version of a model using NRP priors and no Bspline. ElShagi’s (forthcoming) estimator can be written in our notation as
Similarly, Barnichon and Brownlees (forthcoming) is a frequentist counterpart to our approach with both Bspline expansions and roughness penalty priors. Their objective function is written in our notation as
Barnichon and Brownlees’s (forthcoming) approach bears only one (scholar) smoothing parameter , rendering it less flexible than ours. Our fully Bayesian approach differs from other contemporary frequentist approaches in our treatment of smoothing parameters. ElShagi (forthcoming) determines on the basis of the finite sample corrected Bayesian information criterion modified with the effective loss of degree of freedom. Barnichon and Brownlees (forthcoming) propose to select using fold cross validation. In contrast, our proposed approach infers smoothing parameters using priors, letting us systematically consider uncertainty in the smoothness of an impulse response.
3 Simulation Study
We conducted Monte Carlo simulations to investigate performance of our proposed approaches. We considered six specifications consisting of the combination of three priors, each with/without Bspline expansions. As with the NRP and ARP priors, we considered a weakly informative independent standard normal prior, . Using this prior enables us, to some degree, to discriminate between the smoothing and shrinkage roles of the roughness penalty priors. A specification with the normal prior and no Bspline can be thought of as a Bayesian version of a standard local projection (Jordà, 2005). As mentioned in the previous section, a local projection with the NRP prior and no Bspline corresponds to ElShagi (forthcoming), whereas a local projection with the NRP prior and Bspline corresponds to a Bayesian version of Barnichon and Brownlees (forthcoming).
First, we considered a linear data generating processes (DGPs) specified by this moving average representation:
where is an endogenous variable, is an exogenous variable, and is a measurement error. A set of parameters represents an impulse response. True parameter values are defined as a convex curve:
where
denotes a uniform distribution with support
, and governs where the peak of the impulse response is located. Covariates are a constant, and four lags of and . We fixed the length of the impulse response , and the effective sample size . Hyperparameters are , , , and . We choose the order of the difference matrix as , implying that the sequences of parameters to be inferred are induced to straight lines. We use the Bspline basis with equidistant knots ranging from towith unitary increments. We set the degree of Bspline to three. We generate 500 sets of synthetic data. Gibbs samplering obtained 40,000 posterior draws, after discarding the initial 10,000. Each chain is initialized to an ordinary least square estimate.
We compared approaches on the basis of four performance measures: mean squared errors (MSE), coverage probability (Coverage), lengths of credible intervals (Length), and computational speed (Speed). MSE is the sum of mean squared errors,
where denotes a posterior mean estimate of in the th experiment, denotes the corresponding true value, and is a total number of experiments. Coverage is the arithmetic mean of the probability that the true value is within 90% credible interval:
where and are posterior 5 and 95 percentile estimates of in the th experiment. Length denotes the arithmetic mean of the lengths of 90% credible interval,
Speed is the mean computational time of posterior simulations in seconds. ^{2}^{2}2All programs were written in Matlab 2016a (64 bit), and executed on an Ubuntu Desktop 16.04 LPS (64 bit), running on Intel Xeon E52607 v3 processors (2.6GHz).
Table 1 reports results of the first experiment. With regard to MSE, the NRP and ARP priors outperform the normal prior, while the NRP and ARP priors are comparable. Measures of Length and Coverage reveal a biasvariance tradeoff: NRP and ARP priors tend toward shorter Length than the normal prior, but Coverage tends to warsen. Speed depends on either prior specification or whether a Bspline is used. The difference attributable to the former is not notably large, but the use of a Bspline does impose a significant computational burden. When Bspline expansions are employed, approximately 95% of computational time during each MCMC cycle is spent calculating , in particular quantity . This bottleneck is a simple matrixmatrix multiplication which is executed via a builtin mathematical routine of Matlab, so switching to a compiled language such as Fortran and C/C++ would not totally resolve the problem.
The marginal effects of the Bspline function expansion is unsurprising, considering that response variables can appear as functional data observed on an equally spaced grid.
^{3}^{3}3From this point of view, local projections are similar to functional data models such as Guo (2002); Morris and Carroll (2006). Panel (a) in Figure 1 displays Bspline basis functions on a fine grid (2,401 points), and simulated functional data. This situation is presumed in functional data analysis. In a local projection, however, observation pointsare sparse and invariant, as demonstrated in panel (b). As is evident from there, Bspline expansions merely allocate observed information to the fixed grids, rather than interpolate neighboring information. In our case, observed information for a single grid point is allocated to neighboring grid points with weights {1/6, 3/2, 1/6}.
^{4}^{4}4When the degree of Bspline is increased to 5, the weight set becomes {1/120, 13/60, 33/60, 13/60, 1/120}. The added weights are too small to affect the estimate . Therefore, using a Bspline does indeed smooth estimates of impulse responses, but its effectiveness is limited.We fortified our results by considering nonlinear DGPs characterized by asymmetric and statedependent impulse responses, respectively. The asymmetric DGP is specified by
while the statedependent DGP is specified by
where denotes the indicator function. For both cases, two sets of parameters are independently generated in the same way as the linear DGP. We set . Other settings are exactly the same as those in the experiment using linear DGP. For computational reasons, we did not consider a model with Bspline expansions. Table 2 presents the result, in which we verified the result of the first experiment. With regard to MSE and Length, the NRP and ARP priors consistently improve accuracy versus the normal prior. Coverage values obtained using the NRP and ARP priors are shorter than those using the normal prior.
We investigated the sensitivity of inference to the choice of hyperparameter . We considered for models with the NRP prior and no Bspline. Table 3 shows the results, wherein two items are noteworthy. First, as long as is set between 0.1 and 0.0001, an estimation using the NRP prior is more efficient than one using the normal prior. Second, a biasvariance tradeoff is again evident: as increases (i.e., shrinkage toward 0), an estimator becomes more efficient but less robust. Recent studies of Bsplines propose several methods for choosing the hyperparameter systematically (Jullion and Lambert, 2007; Klein and Kneib, 2016; Ventrucci and Rue, 2016), but applying these ideas in our context remains for future research.
4 Application
To demonstrate our model, we applied our approach to an analysis of monetary policy in the United States. We use monetary policy shocks compiled by Coibion et al. (2017) which is an update of Romer and Romer (2004)^{5}^{5}5The time series of monetary policy shocks is from Yuriy Gorodnichenko’s website (https://eml.berkeley.edu//~ygorodni/index.htm). . For the covariates and the response, we consider the following three macroeconomic variables, downloaded from the Federal Reserve Economic Data (FRED), maintained by the Federal Reserve Bank of St. Louis: the industrial production index (FRED mnemonic: INDPRO), the consumer price index for all urban consumers: all items (CPIAUCSL), and the effective federal funds rate (FEDFUNDS). We also treated all three as response variables. We included lags of monetary policy shock as covariates. All data are monthly and spans from March 1969 to December 2008. The range is limited by the availability of data for monetary policy shocks. Industrial production and the inflation rate are seasonally adjusted, and included as annualized monthtomonth percentage changes (logdifference multiplied by 1,200). We included the time trend and up to four lags oi covariates. We choose hyperparameters as in the previous section. Gibbs sampler obtained a total of 40,000 posterior draws, after discarding the first 10,000.
Figures 2–4 show posterior estimates of the impulse responses of the macroeconomic variables to monetary policy shocks under different specifications. Comparing the first and second columns in each figure, irrespective of response variable, reveals that the roughness penalty priors successfully penalize the roughness of the impulse response functions. Thus, we obtain economically plausible, smoothed estimates, and can interpret the shape of the impulse response easily, recognizing the underlining reponse. Comparing the first and second rows reveals that use of the Bspline exerts no significant effect on the shape of the impulse response.
We then compared the fitness of these estimates based on the deviance information criterion (DIC) (Spiegelhalter et al., 2002) and the Watanabe–Akaike information criterion (WAIC) (Watanabe, 2010). The upper portion of Table 4 reports on both criteria for different specifications (reported values are on the deviance scale; the smaller, the better). Specifications including the roughness penalty priors outperform the normal prior in predictive accuracy regardless of the fitness measure,but the use of a Bspline does not always yield improvement. Generally both Bspline and the roughness penalty prior enhance fitness, but almost all improvements originate from the latter. This finding supports our simulation results. When Bspline is not used, the posterior simulation takes 26 minutes to generate 50,000 draws; when it is used, the same simulation takes 66 hours. For many applications, the gain in efficiency from use of a Bspline would not justify the higher computational cost.
5 Conclusion
This study developed a fully Bayesian approach to estimateing local projections using roughness penalty priors. It is also considered a specification involving a Bspline basis function expansion. Monte Carlo experiments demonstrated that both Bsplines and the roughness penalty priors improve statistical efficiency, however, almost all the improvements originatefrom the latters. Applying our proposed method to an analysis of monetary policy in the United States showed that the roughness penalty priors successfully smooth posterior estimates of the impulse response functions, and can improve predictive accuracy of local projections.
This study addresses with one of the two disadvantages of local projections, compared with the standard statistical framework that includes vector autoregressions, namely that of less statistical efficiency. The other disadvantage that the exogenous variable is identified exante can be resolved by a twostage regression approach, as in Aikman et al. (2016). Constructing a Bayesian counterpart of this line of approach is conceptually straightforward, but computationally challenging. In addition, it is potentially beneficial to develop more robust approaches than ours: for example, a choice of hyperparameters, heteroskedasticity and autocorrelations in errors, knot selection, and so on. This study provides a first step for further developments of Bayesian local projections.
References
 (1)
 Aikman et al. (2016) Aikman, D., O. Bush, and A. M. Taylor (2016) “Monetary Versus Macroprudential Policies: Causal Impacts of Interest Rates and Credit Controls in the Era of the UK Radcliffe Report,” NBER Working Paper 22380.
 Auerbach and Gorodnichenko (2013) Auerbach, A. J. and Y. Gorodnichenko (2013) “Fiscal Multipliers in Recession and Expansion,” in A. Alesina and F. Giavazzi eds. Fiscal Policy after the Financial Crisis: University of Chicago Press, 63–98.
 Barnichon and Matthes (forthcoming) Barnichon, R. and C. Matthes (forthcoming) “Functional Approximations of Impulse Responses,” Journal of Monetary Economics.
 Barnichon and Brownlees (forthcoming) Barnichon, R. and C. Brownlees (forthcoming) “Impulse Response Estimation By Smooth Local Projections,” Review of Economics and Statistics.
 Coibion et al. (2017) Coibion, O., Y. Gorodnichenko, L. Kueng, and J. Silvia (2017) “Innocent Bystanders? Monetary Policy and Inequality,” Journal of Monetary Economics, Vol. 88, No. 1, 70–89.
 Eilers and Marx (1996) Eilers, P. H. and B. D. Marx (1996) “Flexible Smoothing with Bsplines and Penalties,” Statistical Science, Vol. 11, No. 2, 89–121.
 ElShagi (forthcoming) ElShagi, M. (forthcoming) “A Simple Estimator for Smooth Local Projections,” Applied Economics Letters.
 Guo (2002) Guo, W. (2002) “Functional mixed effects models,” Biometrics, Vol. 58, No. 1, 121–128.
 Jordà (2005) Jordà, Ò. (2005) “Estimation and Inference of Impulse Responses Local Projections,” American Economic Review, Vol. 95, No. 1, 161–182.
 Jullion and Lambert (2007) Jullion, A. and P. Lambert (2007) “Robust Specification of the Roughness Penalty Prior Distribution in Spatially Ddaptive Bayesian Psplines Models,” Computational Statistics and Data Analysis, Vol. 51, No. 5, 2542–2558.
 Klein and Kneib (2016) Klein, N. and T. Kneib (2016) “Scaledependent Priors for Variance Parameters in Structured Additive Distributional Regression,” Bayesian Analysis, Vol. 11, No. 4, 1071–1106.
 Lang and Brezger (2004) Lang, S. and A. Brezger (2004) “Bayesian Psplines,” Journal of Computational and Graphical Statistics, Vol. 13, No. 1, 183–212.
 MirandaAgrippino and Ricco (2017) MirandaAgrippino, S. and G. Ricco (2017) “The Transmission of Monetary Policy Shocks,” Staff Working Paper 657, Bank of England.
 Morris and Carroll (2006) Morris, J. S. and R. J. Carroll (2006) “Waveletbased functional mixed models,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 68, No. 2, 179–199.
 Ramey (2016) Ramey, V. A. (2016) “Macroeconomic Shocks and Their Propagation,” in J. B. Taylor and H. Uhlig eds. Handbook of Macroeconomics, Vol. 2A: Elsevier, Chap. 2, 71–162.
 Ramey and Zubuairy (forthcoming) Ramey, V. A. and S. Zubuairy (forthcoming) “Government Spending Multipliers in Good Times and in Bad: Evidence from U.S. Historical Data,” Journal of Political Economy.
 RieraCrichton et al. (2015) RieraCrichton, D., C. A. Vegh, and G. Vuletin (2015) “Procyclical and Countercyclical Fiscal Multipliers: Evidence from OECD Countries,” Journal of International Money and Finance, Vol. 52, 15–31.
 Romer and Romer (2004) Romer, C. D. and D. H. Romer (2004) “A New Measure of Monetary Shocks: Derivation and Implications,” American Economic Review, Vol. 94, No. 1, 1055–1084.
 Rue (2001) Rue, H. (2001) “Fast Sampling of Gaussian Markov Random Fields,” Journal of the Royal Statistical Society, Series B, Vol. 63, No. 2, 325–338.
 Rue and Held (2005) Rue, H. and L. Held (2005) Gaussian Markov Random Fields: Theory and Applications: CRC press.
 Spiegelhalter et al. (2002) Spiegelhalter, D. J., N. G. Best, B. P. Carlin, and A. Van Der Linde (2002) “Bayesian Measures of Model Complexity and fit,” Journal of the Royal Statistical Society, Series B, Vol. 64, No. 4, 583–639.
 Stock and Watson (2007) Stock, J. H. and M. W. Watson (2007) “Why Has US Inflation Become Harder to Forecast?” Journal of Money, Credit and Banking, Vol. 39, No. s1, 3–33.
 Ventrucci and Rue (2016) Ventrucci, M. and H. Rue (2016) “Penalized complexity priors for degrees of freedom in Bayesian Psplines,” Statistical Modelling, Vol. 16, No. 6, 429–453.

Watanabe (2010)
Watanabe, S. (2010) “Asymptotic Equivalence of Bayes Cross Validation and
Widely Applicable Information Criterion in Singular Learning Theory,”
Journal of Machine Learning Research
, Vol. 11, No. Dec, 3571–3594.