1. Introduction
Derivatives with early exercise features are popular, with American- and Bermudan-style options being the most common types. Nonetheless, pricing these options is a difficult problem in the absence of closed-form solutions, even in the simplest case of valuing American options on a single asset. Researchers have thus developed various numerical pricing methods that largely fall into two categories: lattice-based and simulation-based approaches.
In the lattice-based approach, pricing is performed on a dense lattice in the state space by valuing the options at each point of the lattice using suitable boundary conditions and the mathematical relations among neighboring points. Examples include the finite difference scheme (Brennan and Schwartz, 1977), the binomial tree (Cox et al., 1979), and its multidimensional generalizations (Boyle, 1988; Boyle et al., 1989; He, 1990). These methods are known to work well in low-dimensional problems. However, they become impractical in higher-dimensional settings, mainly because the lattice size grows exponentially as the number of state variables increases. This phenomenon is commonly referred to as the curse of dimensionality.
In the simulation-based approach, the state variables are randomly drawn from the underlying asset processes, which sidesteps the computational challenge above. The price is calculated as the average of the option values over simulated paths, each of which represents a future realization of the state variables under the risk-neutral measure. This entails finding the optimal exercise rules, for which a group of simulation-based methods propose various approaches to estimating the continuation values as conditional expectations. Equipped with stopping-time rules, they calculate the option price by solving a dynamic programming problem whose Bellman equation is essentially a comparison between the continuation values and the exercise values.
The randomized tree method (Broadie and Glasserman, 1997) estimates the continuation value at each node of the tree as the average discounted option value of its children. This nonparametric approach is of the most generic type, but its use is limited in scope because the tree size still grows exponentially in the number of exercise times. The stochastic mesh method (Broadie and Glasserman, 2004) overcomes this issue by using a mesh structure in which all the states at the next exercise time are the children of any state at the current exercise time. The conditional expectation is computed as a weighted average of the children, where the weights are determined by likelihood ratios. Regression-based methods (Carriere, 1996; Tsitsiklis and Van Roy, 2001; Longstaff and Schwartz, 2001) use regression techniques to estimate the continuation values from the simulated paths. Such regression approaches are computationally tractable, as they are linear not only in the number of simulated paths, but also in the number of exercise times. (This implicitly assumes that the number of regressors is constant, which is reasonable because the number of regressors is usually much smaller than the number of Monte Carlo paths.) Fu et al. (2001) and Glasserman (2013) provide excellent reviews of the implementation and comparison of simulation-based methods. Among these variations, the least squares Monte Carlo (LSM) algorithm proposed by Longstaff and Schwartz (2001) is standard practice for valuing options with early exercise features because of its simplicity and efficiency, which makes it a strong choice from a practical standpoint.
Simulation-based estimators suffer from two main sources of bias: low and high. Low bias is related to a suboptimal exercise decision owing to various approximations in the method; for this reason, it is also called suboptimal bias. For example, a regression with finitely many basis functions cannot fully represent the conditional payoff function, and the price estimated from finitely many simulation paths contains noise. As a result, the exercise policies are suboptimal and therefore lead to a lower option price.
High bias is due to the positive correlation between exercise decisions and future payoffs; the algorithm is more likely to continue (exercise) precisely when future payoffs are higher (lower). This results from sharing the simulated paths between the exercise policy estimation and the payoff valuation. In simple terms, high bias is overfitting from the undesirable use of future information; for this reason, it is called look-ahead or foresight bias. These two sources of bias are opposite in nature as well as in direction. Low bias is intrinsic to numerical methods, whereas high bias is extrinsic and removable. The standard technique for eliminating high bias is to calculate the exercise criteria using an independent set of Monte Carlo paths, thereby eliminating the correlation.
The simulation estimators in the literature are typically either low-biased or high-biased. Since there can be no unbiased simulation estimator, the idea of Broadie and Glasserman (1997) is to construct low- and high-biased estimators to form a confidence interval for the true option price. The LSM estimator, on the contrary, has elements of both low and high bias. (Glasserman (2013) calls this an interleaving estimator, as it alternates the elements of low and high bias in pricing.) Rather than aiming to raise accuracy by letting these two biases partially offset, such a construction primarily retains the computational efficiency of the original formulation. Indeed, Longstaff and Schwartz (2001) claim that the look-ahead bias of the LSM estimator is negligible, presenting as supporting evidence a single-asset put option case tested with an independent simulation set. (The option has 50 exercise times in one year; the price is calculated with 100,000 Monte Carlo paths.) In this regard, the LSM estimator has been considered to be low-biased. However, the claim does not necessarily generalize to a broad class of examples. Although look-ahead bias is asymptotically zero in a theoretical setting, it can be material in practice. For example, it tends to be more pronounced in multi-state cases given the same simulation effort. While an in-depth analysis is provided later, the intuition is that overfitting occurs in the least squares regression when the number of explanatory variables is large. These are exactly the circumstances under which a simulation-based method is inevitable because of the curse of dimensionality. As such, the LSM algorithm has been the industry standard for pricing callable bonds and structured notes whose coupons have complicated structures that depend on other underlying assets such as equity prices, foreign exchange rates, and benchmark interest rate swap rates. Multi-factor models are required for such underlying assets, as well as for yield curves with a term structure. Therefore, it is important to understand the magnitude of look-ahead bias in the LSM estimator and to adopt an efficient algorithm for removing this bias in practice.
In terms of machine learning theory, look-ahead bias is overfitting caused by using the same dataset for both training (i.e., the estimation of the exercise policy) and testing (i.e., the valuation of the options). This is undesirable in machine learning applications, and various cross-validation techniques are used to address the problem. In this context, using an independent set of paths for the out-of-sample prediction is the holdout method, one of the simplest cross-validation techniques. While this approach successfully removes look-ahead bias, its main disadvantage is the computational burden from doubling the simulation effort. (For some stochastic differential equations, such as stochastic volatility models, Monte Carlo simulation depends on the time-discretized Euler scheme. Unlike its European counterpart, the LSM method can require extensive storage when the number of exercise times is large because the whole path history has to be stored for the backward valuation. Thus, storage can limit the number of simulated paths.)
In this article, we present an efficient approach for removing look-ahead bias, namely the leave-one-out LSM (LOOLSM) algorithm. LOOLSM is based on leave-one-out cross-validation (LOOCV), a special type of $k$-fold cross-validation where $k$ equals the number of sample points. When making a prediction for a sample, LOOCV trains the model with all the data except that sample, thereby avoiding overfitting. The leave-one-out regression can be efficiently calculated by subtracting analytic correction terms from the original full regression. Therefore, this simple idea addresses the main drawback of the holdout method and makes the LOOLSM estimator truly low-biased. Furthermore, we can explicitly capture look-ahead bias, from which we examine its asymptotic behavior both theoretically and empirically.
Previous work along this line is limited. The low estimator of Broadie and Glasserman (1997) is constructed with the self-excluded expectation, which is a trivial version of the leave-one-out regression. Regarding bias correction, Fries (2008) formulates high bias as the price of an option on the Monte Carlo error and derives analytic correction terms under a Gaussian error assumption. This approach is built upon a fundamentally different setup from ours, as we explain in Section 3.1. Carriere (1996) discusses the asymptotic behavior of the bias empirically. We analyze this specifically for the look-ahead bias in LSM; indeed, the author's observation is consistent with our findings.
The rest of the paper is organized as follows. In Section 2, we describe the LSM pricing framework and introduce the LOOLSM algorithm. In Section 3, we define look-ahead bias and analyze its asymptotic behavior in LSM. In Section 4, we present the numerical results for several examples and demonstrate how the methods compare. Finally, Section 5 concludes.
2. The LOOLSM Algorithm
2.1. The LSM Algorithm
We start the section by introducing some notation that we use in the rest of the paper:
- $X_t$ denotes the vector of Markovian state variables at time $t$. (State variables can be augmented to satisfy the Markovian property.)
- $V_t(x)$ denotes the option price function of the state $x$ at time $t$, discounted to the present time.
- $h_t(x)$ denotes the discounted option payout function of the state $x$ at time $t$.
- $C_t(x)$ denotes the discounted option price function of the state $x$ at time $t$, conditional on the option not being exercised at $t$.
- $V_t(X_t)$ is $V_t$ evaluated at the specific state $X_t$. We define $h_t(X_t)$ and $C_t(X_t)$ similarly.
For example, $h_t(X_t) = e^{-rt}\,(X_t - K)^+$ for a single-stock call option struck at $K$ when the risk-free rate is $r$. For the numerical pricing, we always work with a finite set of possible exercise times $t_1 < \cdots < t_M$, as we necessarily discretize the time dimension for any continuous exercise cases. (It is customary to assume that the present time $t_0$ is not an exercise time.) As they are the only times we consider, we simply write $V_k$ for $V_{t_k}$, and likewise $h_k$, $C_k$.
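For instance, the discounted call payout just described can be written as follows (a trivial but illustrative sketch; the function name is ours):

```python
import numpy as np

def discounted_call_payout(S, K, r, t):
    """h_t(S) = exp(-r*t) * max(S - K, 0): discounted payout of a call struck at K."""
    return np.exp(-r * t) * np.maximum(S - K, 0.0)
```
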
The valuation of options with early exercise features can be formulated as a maximization problem of the expected future payoffs over all possible choices of stopping times $\tau \in \{t_1, \dots, t_M\}$:
$$V_0 = \sup_{\tau} E\left[\, h_\tau(X_\tau) \,\right].$$
$C_k(x)$ is commonly referred to as the continuation value or hold value in the literature. It is the expected next-step option value,
$$C_k(x) = E\left[\, V_{k+1}(X_{k+1}) \mid X_k = x \,\right].$$
The main difficulty of pricing Bermudan options with simulation methods lies in obtaining the optimal exercise policy (i.e., the estimation of $C_k$) from the simulated paths. This is because the Monte Carlo path generation goes forward in time, whereas the dynamic programming for pricing works backward by construction.
Suppose we have a method to estimate the continuation value $C_k$ by $\hat C_k$. Then, the option value can be calculated via a dynamic optimization problem whose backward induction step is given as
$$\hat V_k = \begin{cases} h_k & \text{if } h_k \ge \hat C_k, \\ \hat V_{k+1} & \text{otherwise.} \end{cases} \qquad (1)$$
We write $\hat C_k$ to indicate that it is an estimated continuation value function from a specific simulation set, as opposed to the true value function $C_k$. Likewise, we write $\hat V_k$ to indicate a dependency on the simulation set. The backward induction step is calculated pathwise in simulation methods. Therefore, one can understand $h_k$ and $\hat C_k$ in (1) as the values of the corresponding functions evaluated on the simulated path.
Equation (1) effectively means that the option is exercised at $t_k$ if $h_k \ge \hat C_k$ and continued otherwise. For consistency, we adopt conventions ensuring that the option is always continued at $t_0$ and always exercised at $t_M$. In the final step, the option price is calculated as
$$V_0 = E\left[\, \hat V_1 \,\right], \qquad (2)$$
where the expectation is taken over all simulated paths under the asset price dynamics, conditional on the initial condition.
The only missing piece is how to calculate $\hat C_k$. The idea of Longstaff and Schwartz (2001) is to calculate it using the least squares regression of the pathwise option values at the next exercise time on functions of the simulated state variables at the current exercise time. Namely, we first obtain the coefficients $\beta$ by regressing $\hat V_{k+1}$ on the basis functions evaluated at $X_k$, then define $\hat C_k(x) = \sum_{j=1}^{p} \beta_j f_{k,j}(x)$. Here, $\{f_{k,j}\}_{j=1}^{p}$ is the set of regressors, a finite number of basis functions at time $t_k$, and $\beta$ is the length-$p$ vector of regression coefficients. For the intercept, we assume $f_{k,1} \equiv 1$.
To see how this is formulated in the simulation setting, suppose that we generate $N$ Monte Carlo paths and concatenate the related pathwise quantities vertically. We introduce the following notation:
- The $N$-by-$p$ matrix $F$ is the simulation result of the regressors at time $t_k$, each row corresponding to a sample path. Define the hat matrix $H = F(F^\top F)^{-1}F^\top$ (an $N$-by-$N$ matrix).
- The length-$N$ column vector $y$ contains the pathwise option values $\hat V_{k+1}$ at time $t_{k+1}$.
- The length-$N$ column vectors $\hat c$ and $h$ contain the continuation values $\hat C_k$ and the option payouts $h_k$, respectively, at time $t_k$.
Although the exercise time index is not specified for notational simplicity, it is clear that the above quantities are specific to a particular exercise time.
The continuation value vector is calculated as
$$\hat c = F\beta = F\,(F^\top F)^{-1} F^\top y = H y.$$
Here, $F\beta$ is a matrix-vector multiplication. Now, we obtain the price by recursively running Equation (1) with $\hat c$ and $h$.
In Longstaff and Schwartz (2001), the regressions are run only with in-the-money samples. Although the LOOLSM estimator can be defined in this setup, we instead use all the samples. (According to Glasserman (2013), using only in-the-money samples can be inferior in some cases. To the best of our knowledge, it is also customary in practice to implement the LSM algorithm with all the samples for the regression.) Using the same number of samples in every regression has another benefit when we analyze the asymptotic behavior of look-ahead bias (see Section 3).
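To make the recursion concrete, here is a minimal sketch of the LSM backward induction in Python (the array layout, function name, and basis choice are illustrative, not the paper's implementation):

```python
import numpy as np

def lsm_price(S, h, basis):
    """Sketch of the LSM backward induction with pre-discounted payouts.

    S     : (N, M) simulated states at exercise times t_1, ..., t_M
    h     : (N, M) discounted exercise payouts h_k on each path
    basis : maps a state column (N,) to the regressor matrix F (N, p)
    """
    N, M = S.shape
    V = h[:, -1].copy()                      # always exercise at maturity t_M
    for k in range(M - 2, -1, -1):           # backward over t_{M-1}, ..., t_1
        F = basis(S[:, k])                   # N-by-p regressor matrix
        beta, *_ = np.linalg.lstsq(F, V, rcond=None)
        C_hat = F @ beta                     # estimated continuation values
        exercise = h[:, k] >= C_hat          # pathwise comparison, Equation (1)
        V = np.where(exercise, h[:, k], V)
    return V.mean()                          # Equation (2): average over paths
```

For a single-stock option, for example, one might take `basis = lambda x: np.column_stack([np.ones_like(x), x, x**2])`.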
2.2. The LOOLSM Algorithm
At the heart of look-ahead bias lies the fact that the future payoff of a sample path is included in the continuation value regression from which the exercise decision is made for that same path. This context therefore naturally invites the application of the LOOCV method. Figure 1 illustrates this idea with a stylized example.
The prediction value $\hat c^{\,loo}_i$ with the leave-one-out regression differs from $\hat c_i$ by an analytic correction term:
$$\hat c^{\,loo} = \hat c - \frac{\eta}{\mathbf{1} - \eta} \circ e, \qquad (3)$$
where $\mathbf{1}$ is the column vector of ones and the arithmetic operations between vectors are conducted elementwise. Here, $e = y - \hat c$ is the estimation error and $\eta$ is the diagonal vector of the hat matrix $H$. Regarding the components of $\eta$, we can show that $1/N \le \eta_i < 1$ if $F^\top F$ is nonsingular. See Appendix A for the proofs. Therefore, the error after correction is larger than the original error in absolute terms since
$$\left| \frac{e_i}{1 - \eta_i} \right| \ge |e_i|.$$
In other words, the regression error for LSM is smaller than that for LOOLSM because of overfitting.
The extra computation required for the LOOLSM algorithm is minimal; $\eta$ can be efficiently computed as the row sums of the elementwise product of the two matrices $F$ and $F(F^\top F)^{-1}$:
$$\eta = \left( F \circ F(F^\top F)^{-1} \right) \mathbf{1}_p .$$
As the transpose of the latter matrix has already been computed for the regression coefficients $\beta$, this only adds an $O(Np)$ operation.
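The correction in Equation (3) can be checked numerically against a brute-force leave-one-out regression; a sketch with simulated data (the data-generating model is illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
N, p = 200, 4
F = np.column_stack([np.ones(N), rng.standard_normal((N, p - 1))])  # intercept included
y = F @ rng.standard_normal(p) + 0.5 * rng.standard_normal(N)

FtF_inv = np.linalg.inv(F.T @ F)
beta = FtF_inv @ (F.T @ y)
c_hat = F @ beta                              # full-sample predictions
e = y - c_hat                                 # estimation errors
eta = np.einsum("ij,ij->i", F, F @ FtF_inv)   # diag of hat matrix: row sums of F * F(F'F)^{-1}
c_loo = c_hat - eta / (1.0 - eta) * e         # Equation (3) correction

# brute force: refit with sample i removed, then predict at row i
c_loo_bf = np.empty(N)
for i in range(N):
    mask = np.arange(N) != i
    b_i, *_ = np.linalg.lstsq(F[mask], y[mask], rcond=None)
    c_loo_bf[i] = F[i] @ b_i

assert np.allclose(c_loo, c_loo_bf)           # closed form matches brute force
```
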
3. Analysis of Look-Ahead Bias with the LOOLSM Estimator
3.1. Look-Ahead Bias
In this section, we present a formal definition of the look-ahead bias of simulation estimators. The backward induction step in Equation (1) can be written as a single equation:
$$\hat V_k = \mathbf{1}\{h_k \ge \hat C_k\}\, h_k + \mathbf{1}\{h_k < \hat C_k\}\, \hat V_{k+1},$$
where $\mathbf{1}\{\cdot\}$ is the indicator function.
We take the expectation of the above equation with respect to all possible simulation paths, conditional on the current state $X_k$. For simpler notation, we write this conditional expectation as $E[\,\cdot \mid X_k\,]$. Despite a slight abuse of notation, it is a consistent generalization of Equation (2).
First assume that $E[\hat V_{k+1} \mid X_k] = C_k(X_k)$. If we denote the continuation probability by $q = P(\,h_k < \hat C_k \mid X_k\,)$, we have
$$E[\hat V_k \mid X_k] = \operatorname{Cov}\!\left( \mathbf{1}\{h_k < \hat C_k\},\, \hat V_{k+1} \mid X_k \right) + \left( q\, C_k(X_k) + (1-q)\, h_k(X_k) \right),$$
where $q\, C_k + (1-q)\, h_k \le \max(C_k, h_k)$ is a suboptimal estimator.
Look-ahead bias is defined as the first conditional covariance term:
$$\operatorname{Cov}\!\left( \mathbf{1}\{h_k < \hat C_k\},\, \hat V_{k+1} \mid X_k \right). \qquad (4)$$
In other words, it is the covariance between the exercise decision and the future payoff. It is always positive for LSM because the estimator $\hat C_k$ is tilted toward the realized values of $\hat V_{k+1}$. On the contrary, it is zero for LOOLSM since, on each path, the exercise decision and the future payoff are independent.
Except in the first backward induction step, $E[\hat V_{k+1} \mid X_k]$ is not necessarily equal to $C_k(X_k)$ since $\hat V_{k+1}$ may be biased, depending on how the high and low biases above offset in the previous steps. Therefore, we do not attempt to formulate how the bias accumulates inductively at each step. Rather, we measure the overall look-ahead bias as the price difference between the LSM and LOOLSM estimators.
3.2. Asymptotic Behavior
Carriere (1996) predicts that the high bias decays at the rate of $O(1/N)$ as the number of paths $N$ increases. (This is based on an alternative formulation of Equation (1), in which the backward induction is $\hat V_k = \max(h_k, \hat C_k)$. Longstaff and Schwartz (2001) report that such a formulation typically has significant upward bias. Carriere (1996) measures the bias as the difference between the high-biased estimator and the exact price obtained by a lattice method.)
Indeed, we find a similar pattern when we estimate look-ahead bias as the difference between the LSM and LOOLSM estimators. While we present the empirical results in Section 4.4, here we attempt to provide a theoretical justification.
Instead of using the dummy index $i$, we can view the components $\hat c_i$ and $\eta_i$ as functions of the $i$th simulated state, and hence $\hat c^{\,loo}_i$ is also understood as such a function. Here, note that $\hat c^{\,loo}_i$ is independent of the $i$th future payoff $y_i$ because the $i$th row is removed from the computation of the leave-one-out coefficients.
Before we state the main theorem of the section, we make two assumptions. It is undesirable for $F^\top F$ to have very small or zero eigenvalues, because that is when the regression becomes highly unstable or fails. When $N$ is large and the population covariance of the regressors is invertible, however, this is extremely unlikely to happen because of the central limit theorem. (In practice, one can also avoid this problem through regularization, for example, ridge regression.) Therefore, we first assume that

Assumption 1. The smallest eigenvalue of the normalized covariance $F^\top F / N$ is bounded below by some constant $\lambda > 0$.
Another complication arises when the option and continuation values can be arbitrarily close with non-negligible probability, in which case the backward induction becomes unstable. For this reason, we further assume that

Assumption 2. For any compact set and any small $\epsilon > 0$, the probability of a state falling within the $\epsilon$-neighborhood of the exercise boundary exists and is uniformly bounded in proportion to $\epsilon$.

This rather strong assumption limits the scope of the payoff functions and regressors; however, we believe it is satisfied for realistic examples. Note that these are similar to the assumptions made for convergence, but with some differences (for example, see Stentoft (2004) for reference).
Theorem 2. Under Assumptions 1 and 2, the following statements hold:

Let $\|\cdot\|$ denote the Euclidean norm. Then,

Under suitable regularity conditions for the payoff and the regressors, the expected bias, defined as the difference between the LSM and LOOLSM estimators, satisfies the stated decay bound for any admissible choice of the constant.
We prove Theorem 2 in Appendix C. Essentially, the proof shows that look-ahead bias mainly comes from a small neighborhood of the exercise boundary whose volume shrinks with the neighborhood size, whereas the expected bias size within it is approximately held constant. For large simulations, when the bias correction almost always happens at most once on each path, the result can be extended to the overall bias. Then, by choosing a very small constant, the theorem implies that any realistic bias should decay at a rate close to $O(1/N)$. On the contrary, the proof provides no clue as to how large the simulation has to be before we observe the expected decay.
4. Numerical Results
In this section, we present some Bermudan option examples to demonstrate how the LOOLSM algorithm works in comparison with other methods. We start with single-asset put options, then move on to best-of options on two assets and basket options on four assets, in increasing order of dimension.
We assume that the asset prices follow geometric Brownian motions:
$$\frac{dS^{(j)}_t}{S^{(j)}_t} = (r - q_j)\, dt + \sigma_j\, dW^{(j)}_t,$$
where $r$ is the risk-free rate, $q_j$ is the dividend yield, $\sigma_j$ is the volatility, and the $W^{(j)}_t$'s are standard Brownian motions correlated by $dW^{(i)}_t\, dW^{(j)}_t = \rho_{ij}\, dt$.
The choice of geometric Brownian motion for the price dynamics has some advantages. For example, exact simulation is possible and easily implemented. Moreover, it is a standard choice in the literature, and exact prices are available to use as benchmarks. More complicated SDEs requiring the Euler scheme may exhibit another kind of Monte Carlo bias resulting from the time discretization.
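Since the transition distribution is lognormal, the paths can be sampled exactly at the exercise dates without Euler discretization. A minimal sketch for correlated assets (the function and parameter names are illustrative):

```python
import numpy as np

def simulate_gbm(S0, r, q, sigma, corr, times, n_paths, rng):
    """Exact simulation of correlated geometric Brownian motions.

    Returns an array of shape (n_paths, len(times), n_assets) with the
    asset prices at the given exercise dates.
    """
    S0 = np.atleast_1d(np.asarray(S0, dtype=float))
    q = np.atleast_1d(np.asarray(q, dtype=float))
    sigma = np.atleast_1d(np.asarray(sigma, dtype=float))
    d = S0.size
    L = np.linalg.cholesky(np.atleast_2d(corr))   # correlates the normal increments
    dt = np.diff(np.concatenate(([0.0], np.asarray(times, dtype=float))))
    log_s = np.tile(np.log(S0), (n_paths, 1))
    out = np.empty((n_paths, len(dt), d))
    for k, step in enumerate(dt):
        z = rng.standard_normal((n_paths, d)) @ L.T
        log_s = log_s + (r - q - 0.5 * sigma**2) * step + sigma * np.sqrt(step) * z
        out[:, k, :] = np.exp(log_s)              # lognormal transition, no Euler bias
    return out
```

Antithetic variates, as used below, amount to reusing the negated normal increments for a second half of the paths.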
We run independent Monte Carlo simulation sets to compute the price offset from the exact value and its standard deviation, where the price estimator is obtained from each simulation set. All simulation sets use the same number of paths, except in Section 4.4, where the number of paths and the number of regressors are varied. We use antithetic random variates to reduce variance.
We compare the following three regressionbased methods:
- LSM: Simulated paths are used for both the exercise policy and pricing.
- Holdout: The exercise policy function is calculated using an independent set of paths of the same size. It is then applied to the original paths for pricing.
- LOOLSM: Simulated paths are used for both the exercise policy and pricing, with LOOCV.
By using the same set of paths for pricing across methods, we can better control the variation in the comparison of prices. For example, all three methods produce the same European option prices by construction.
One adjustment we make to the backward induction step is to exercise only if the option payout is strictly positive. Estimated conditional expectations can take negative values, which is merely an artifact, as the true continuation values are nonnegative for all the examples in this section. This is also in the spirit of the original LSM method, where the exercise decision is made only for in-the-money samples.
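The three methods above differ only in how the continuation value used in each backward step is computed. A sketch of one backward step with an optional LOOCV correction (a hypothetical helper, not the authors' code), including the strictly-positive-payout adjustment:

```python
import numpy as np

def backward_step(F, h_k, V_next, leave_one_out=False):
    """One backward induction step; leave_one_out=True gives the LOOLSM rule."""
    FtF_inv = np.linalg.inv(F.T @ F)
    beta = FtF_inv @ (F.T @ V_next)
    C_hat = F @ beta                                      # LSM continuation values
    if leave_one_out:
        eta = np.einsum("ij,ij->i", F, F @ FtF_inv)       # hat matrix diagonal
        C_hat = C_hat - eta / (1.0 - eta) * (V_next - C_hat)  # Equation (3)
    # exercise only for strictly positive payouts (the adjustment above)
    exercise = (h_k > 0) & (h_k >= C_hat)
    return np.where(exercise, h_k, V_next)
```
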
4.1. Single-Stock Put Option
We start with a Bermudan equity put option, which is simple yet a good base case. We use the same parameters as in Feng and Lin (2013) for strikes $K = 80$, 90, 100, 110, and 120. The exact Bermudan price is available for one strike in their paper, and we use the binomial tree method to calculate "exact" prices for the other strikes. Borrowing the notation from the previous sections, the option parameters are specified as follows.
We use the following basis functions for the regression:
We present the results in two formats. Table 1 reports the price offsets and standard deviations. As expected, the LOOLSM method produces similar results to the holdout method. The LSM results are slightly higher than those of the other methods, implying that look-ahead bias is small. This finding is consistent with the claim of Longstaff and Schwartz (2001) as well as with the observation that the LSM price is usually low-biased. Table 2 highlights look-ahead bias, for which we report the difference between the holdout/LOOLSM prices and the LSM price. The LOOLSM bias correction has a smaller standard deviation than that of the holdout method, as it directly removes the bias embedded in the LSM algorithm.
Table 1. Price offsets from the exact values, with standard deviations in parentheses.

K     Exact (Bermudan)   LSM             Holdout         LOOLSM          Exact (European)   LSM (European)
80    0.856              0.002 (0.014)   0.003 (0.014)   0.003 (0.014)   0.843              0.002 (0.015)
90    2.786              0.002 (0.019)   0.004 (0.019)   0.003 (0.018)   2.714              0.002 (0.024)
100   6.585              0.001 (0.020)   0.003 (0.020)   0.003 (0.020)   6.330              0.000 (0.029)
110   12.486             0.009 (0.024)   0.011 (0.023)   0.012 (0.024)   11.804             0.001 (0.026)
120   20.278             0.014 (0.033)   0.014 (0.033)   0.016 (0.033)   18.839             0.003 (0.018)
Table 2. Look-ahead bias, reported as the difference between the holdout/LOOLSM prices and the LSM price, with standard deviations in parentheses.

K     Holdout            LOOLSM
80    0.0013 (0.0026)    0.0011 (0.0005)
90    0.0017 (0.0035)    0.0014 (0.0007)
100   0.0025 (0.0072)    0.0024 (0.0014)
110   0.0021 (0.0088)    0.0024 (0.0011)
120   0.0003 (0.0086)    0.0022 (0.0013)
4.2. Best-of Option
The next example is a best-of call option on two assets (also called the max or rainbow option), an option on the maximum of the asset prices,
This example is introduced in Glasserman (2013); we use the same parameters, with the initial asset prices 90, 100, and 110. For regressors, we use the polynomials up to degree 3 and the payoff function,
We obtain the exact Bermudan option prices from Glasserman (2013) and compute the exact European option prices from the analytic solutions expressed in terms of the bivariate cumulative normal distribution (Rubinstein, 1991). Table 3 shows the results. The LOOLSM method still works similarly to the holdout method in the two-asset case. The difference from the LSM price becomes clearer, but the LSM price is still biased low.

Table 3. Price offsets from the exact values, with standard deviations in parentheses.

S(0)  Exact (Bermudan)   LSM             Holdout         LOOLSM          Exact (European)   LSM (European)
90    8.080              0.025 (0.055)   0.041 (0.056)   0.040 (0.054)   6.655              0.011 (0.062)
100   13.900             0.034 (0.060)   0.050 (0.062)   0.052 (0.058)   11.196             0.011 (0.078)
110   21.340             0.035 (0.065)   0.057 (0.068)   0.054 (0.064)   16.929             0.013 (0.096)
4.3. Basket Option
Next, we present a Bermudan call option on a basket of four stocks. The discounted exercise payoff is given as
We use the parameters introduced by Krekel et al. (2004) to analyze basket option pricing methods,
The exact prices of the European basket options for the same parameter set are obtained from Choi (2018). Because the underlying assets pay no dividends, it is optimal for the option holder not to exercise the call until maturity, so the European option price is equal to the Bermudan price.
We use the polynomials up to degree 2 and the payoff function as regressors,
Table 4 reports the results for a wide range of strikes, $K = 60$, 80, 100, 120, and 140. The LOOLSM and holdout methods still produce similar biases and errors. However, look-ahead bias is more pronounced in this four-asset case, and the LSM algorithm consistently produces higher prices across all strike levels.
Table 4. Price offsets from the exact values, with standard deviations in parentheses.

K     Exact    LSM             Holdout         LOOLSM          European (LSM)
60    47.481   0.233 (0.223)   0.205 (0.213)   0.209 (0.196)   0.012 (0.309)
80    36.352   0.230 (0.255)   0.174 (0.244)   0.158 (0.235)   0.012 (0.316)
100   28.007   0.235 (0.237)   0.117 (0.238)   0.109 (0.231)   0.012 (0.309)
120   21.763   0.226 (0.236)   0.084 (0.245)   0.080 (0.229)   0.013 (0.293)
140   17.066   0.213 (0.224)   0.086 (0.222)   0.075 (0.223)   0.015 (0.275)
4.4. Asymptotic Behavior
In this section, we analyze empirically how look-ahead bias, measured as the difference between the LSM and LOOLSM estimators, behaves as the number of Monte Carlo paths $N$ and the number of regressors change. As discussed in Section 3.2, we aim to check whether the bias is $O(1/N)$.
Here is how we design the experiment to test the relationship with $N$. In total, we generate 7,200,000 Monte Carlo paths. For each value of $N$ considered, we split the paths into chunks of $N$ paths, each of which comprises one simulation set. Then, we run the LSM and LOOLSM algorithms for each simulation set separately, thereby obtaining $7{,}200{,}000/N$ prices, and we report the statistics as a function of $N$. By using the same paths for the different values of $N$, we control the variability from the Monte Carlo simulation as much as possible, leaving only the bias that derives from the simulation size.
The results of the experiment are summarized in Figure 2. The left plots demonstrate how the LSM and LOOLSM prices converge as a function of $N$. They show only one parameter set (e.g., a chosen strike) for each example, but other choices exhibit the same patterns. The right plots show the log-log relationship between $N$ and look-ahead bias, with linear regression lines. A slope close to $-1$ for all strikes means that the bias decreases at the rate of $O(1/N)$. This is indeed the case for the single-asset and best-of two-asset examples, where the bias is already small. On the contrary, the basket option example decays more slowly, with a flatter slope. It may thus require larger simulations to exhibit similar asymptotic behavior. Indeed, the decay is more concave than in the other examples (see the 60 or 80 strikes), and the slopes are closer to $-1$ when tested with larger values of $N$.
To test the relationship with the number of regressors, we run the following experiment for the single-stock Bermudan put option case. Consider an extended set of basis functions,
For each number of regressors, we use the corresponding leading basis functions of the extended set to run both the LSM and the LOOLSM methods on the same simulation paths. We use the same parameter set as in Section 4.1, but with a fixed strike and a fixed number of paths.
The results can be found in Figure 3. The relationship between look-ahead bias and the two variables is generally consistent with the discussion in Section 3 and Theorem 2. In particular, the bottom plot clearly shows the proportionality between the bias and the number of regressors. We further believe that the rate of change in the bias with respect to the basis set has to do with the choice of basis functions; in particular, it ought to be related to how useful a basis function is in estimating the continuation values. This can potentially be a future research topic.
5. Conclusion
This article presents a new, efficient approach for removing the look-ahead bias of the LSM algorithm (Longstaff and Schwartz, 2001). It is natural to apply the leave-one-out method, a well-known cross-validation technique in machine learning, in this context. The resulting LOOLSM estimator can be implemented with little extra computational cost. We validate this approach with several examples. In particular, we demonstrate that the LSM price can be biased high for multi-asset options and that the LOOLSM algorithm effectively eliminates look-ahead bias. Finally, we discuss the asymptotic behavior of look-ahead bias, measured as the difference between the LSM and LOOLSM estimators. We uncover an interesting connection of the bias decay not only to the number of Monte Carlo paths, but also to the number of regressors.
References
 Boyle (1988) Phelim P Boyle. A Lattice Framework for Option Pricing with Two State Variables. The Journal of Financial and Quantitative Analysis, 23(1):1–12, 1988. doi: 10.2307/2331019.
 Boyle et al. (1989) Phelim P Boyle, Jeremy Evnine, and Stephen Gibbs. Numerical Evaluation of Multivariate Contingent Claims. The Review of Financial Studies, 2(2):241–250, 1989.
 Brennan and Schwartz (1977) Michael J Brennan and Eduardo S Schwartz. The Valuation of American Put Options. The Journal of Finance, 32(2):449–462, 1977. doi: 10.2307/2326779.
 Broadie and Glasserman (1997) Mark Broadie and Paul Glasserman. Pricing American-style securities using simulation. Journal of Economic Dynamics and Control, 21(8–9):1323–1352, 1997. doi: 10.1016/S0165-1889(97)00029-8.
 Broadie and Glasserman (2004) Mark Broadie and Paul Glasserman. A stochastic mesh method for pricing highdimensional American options. Journal of Computational Finance, 7(4):35–72, 2004. doi: 10.21314/JCF.2004.117.
 Carriere (1996) Jacques F Carriere. Valuation of the early-exercise price for options using simulations and nonparametric regression. Insurance: Mathematics and Economics, 19(1):19–30, 1996. doi: 10.1016/S0167-6687(96)00004-2.
 Choi (2018) Jaehyuk Choi. Sum of all Black–Scholes–Merton models: An efficient pricing method for spread, basket, and Asian options. Journal of Futures Markets, 38(6):627–644, 2018. doi: 10.1002/fut.21909.
 Cox et al. (1979) John C Cox, Stephen A Ross, and Mark Rubinstein. Option pricing: A simplified approach. Journal of Financial Economics, 7(3):229–263, 1979. doi: 10.1016/0304-405X(79)90015-1.
 Feng and Lin (2013) Liming Feng and Xiong Lin. Pricing Bermudan Options in Lévy Process Models. SIAM Journal on Financial Mathematics, 4(1):474–493, 2013. doi: 10.1137/120881063.
 Fries (2008) Christian P Fries. Foresight Bias and Suboptimality Correction in Monte-Carlo Pricing of Options with Early Exercise. In Progress in Industrial Mathematics at ECMI 2006, pages 645–649. Springer, 2008. doi: 10.1007/978-3-540-71992-2_107. URL http://christianfries.de/finmath/foresightbias/.
 Fu et al. (2001) Michael C Fu, Scott B Laprise, Dilip B Madan, Yi Su, and Rongwen Wu. Pricing American options: a comparison of Monte Carlo simulation approaches. Journal of Computational Finance, 4(3):39–88, 2001. doi: 10.21314/JCF.2001.066.
 Glasserman (2013) Paul Glasserman. Monte Carlo Methods in Financial Engineering, volume 53. Springer Science & Business Media, 2013.
 He (1990) Hua He. Convergence from Discrete to Continuous-Time Contingent Claims Prices. The Review of Financial Studies, 3(4):523–546, 1990.
 Krekel et al. (2004) Martin Krekel, Johan de Kock, Ralf Korn, and TinKwai Man. An analysis of pricing methods for basket options. Wilmott Magazine, 2004(7):82–89, 2004.
 Longstaff and Schwartz (2001) Francis A Longstaff and Eduardo S Schwartz. Valuing American Options by Simulation: A Simple Least-Squares Approach. The Review of Financial Studies, 14(1):113–147, 2001. doi: 10.1093/rfs/14.1.113.
 Mohammadi (2016) Mohammad Mohammadi. On the bounds for diagonal and off-diagonal elements of the hat matrix in the linear regression model. REVSTAT–Statistical Journal, 14(1):75–87, 2016.
 Rubinstein (1991) Mark Rubinstein. Somewhere Over the Rainbow. Risk, 1991(11):63–66, 1991.
 Sherman and Morrison (1950) Jack Sherman and Winifred J Morrison. Adjustment of an Inverse Matrix Corresponding to a Change in One Element of a Given Matrix. The Annals of Mathematical Statistics, 21(1):124–127, 1950. doi: 10.1214/aoms/1177729893.
 Stentoft (2004) Lars Stentoft. Convergence of the Least Squares Monte Carlo Approach to American Option Valuation. Management Science, 50(9):1193–1203, 2004. doi: 10.1287/mnsc.1030.0155.

 Tsitsiklis and Van Roy (2001) John N Tsitsiklis and Benjamin Van Roy. Regression methods for pricing complex American-style options. IEEE Transactions on Neural Networks, 12(4):694–703, 2001. doi: 10.1109/72.935083.
Appendix A Derivation of LOOCV
The least squares regression of $y$ on $X$ is given as
$$\hat{\beta} = (X^\top X)^{-1} X^\top y.$$
We use the following notations:

$x_k^\top$ is the $k$-th row vector of the sample matrix $X$,

$y_k$ is the $k$-th component of the column vector $y$,

$X_{-k}$ and $y_{-k}$ are $X$ and $y$ with the $k$-th row removed, respectively,

$A = X^\top X$ and $A_{-k} = X_{-k}^\top X_{-k} = A - x_k x_k^\top$,

$h$ is the diagonal vector of the hat matrix $H = X A^{-1} X^\top$ and $h_k = x_k^\top A^{-1} x_k$ is the $k$-th component of $h$.
The leave-one-out regression calculates the coefficients $\hat{\beta}_{-k}$ from $X_{-k}$ and $y_{-k}$ instead. It is straightforward to show that
$$\hat{\beta}_{-k} = A_{-k}^{-1} X_{-k}^\top y_{-k} = \left(A - x_k x_k^\top\right)^{-1} \left(X^\top y - x_k y_k\right).$$
By applying the Sherman–Morrison formula [Sherman and Morrison, 1950] to $A_{-k}^{-1} = \left(A - x_k x_k^\top\right)^{-1}$, we obtain
$$A_{-k}^{-1} = A^{-1} + \frac{A^{-1} x_k x_k^\top A^{-1}}{1 - h_k}.$$
Therefore,
$$\hat{\beta}_{-k} = \hat{\beta} - \frac{A^{-1} x_k \left(y_k - x_k^\top \hat{\beta}\right)}{1 - h_k}$$
and
$$y_k - x_k^\top \hat{\beta}_{-k} = \frac{y_k - x_k^\top \hat{\beta}}{1 - h_k}.$$
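The closed-form leave-one-out residual above can be checked numerically. The following sketch (illustrative only; the random data and matrix sizes are arbitrary, not from the paper) compares $\left(y_k - x_k^\top \hat{\beta}\right)/(1 - h_k)$ against brute-force refitting with the $k$-th row removed:

```python
import numpy as np

rng = np.random.default_rng(0)
N, r = 50, 4
X = rng.standard_normal((N, r))
y = X @ rng.standard_normal(r) + rng.standard_normal(N)

A_inv = np.linalg.inv(X.T @ X)
beta = A_inv @ X.T @ y
h = np.einsum("ij,jk,ik->i", X, A_inv, X)  # diagonal of the hat matrix
resid = y - X @ beta

# Leave-one-out residuals via the closed form e_k / (1 - h_k)
loo_closed = resid / (1 - h)

# Brute-force check: refit the regression N times with one row removed
loo_brute = np.empty(N)
for k in range(N):
    mask = np.arange(N) != k
    beta_k, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    loo_brute[k] = y[k] - X[k] @ beta_k

print(np.allclose(loo_closed, loo_brute))  # True
```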
We can switch the perspective and still find a similar formula. If we apply the Sherman–Morrison formula to $A^{-1} = \left(A_{-k} + x_k x_k^\top\right)^{-1}$ instead, we have
$$A^{-1} = A_{-k}^{-1} - \frac{A_{-k}^{-1} x_k x_k^\top A_{-k}^{-1}}{1 + h_{-k}},$$
where $h_{-k} = x_k^\top A_{-k}^{-1} x_k$. We can proceed similarly to obtain
$$y_k - x_k^\top \hat{\beta} = \frac{y_k - x_k^\top \hat{\beta}_{-k}}{1 + h_{-k}}.$$
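Comparing the two residual formulas shows that the two leverage quantities are linked by $1 - h_k = 1/(1 + h_{-k})$. A quick numerical check of this link (arbitrary random data; $X$ is assumed to have full column rank):

```python
import numpy as np

rng = np.random.default_rng(1)
N, r = 30, 3
X = rng.standard_normal((N, r))

A_inv = np.linalg.inv(X.T @ X)
h = np.einsum("ij,jk,ik->i", X, A_inv, X)  # h_k = x_k^T A^{-1} x_k

# h_{-k} = x_k^T A_{-k}^{-1} x_k, with the k-th row removed from X
h_minus = np.empty(N)
for k in range(N):
    mask = np.arange(N) != k
    Ak_inv = np.linalg.inv(X[mask].T @ X[mask])
    h_minus[k] = X[k] @ Ak_inv @ X[k]

# The two leverage measures satisfy 1 - h_k = 1 / (1 + h_{-k})
print(np.allclose(1 - h, 1 / (1 + h_minus)))  # True
```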
We next show that
$$0 \le h_k \le 1 \quad \text{and} \quad \sum_{k=1}^{N} h_k = r$$
by using singular value decomposition (SVD). It is possible to decompose an $N$ by $r$ matrix $X$ into full and reduced SVDs,
$$X = U \Sigma V^\top = U_1 \Sigma_1 V^\top,$$
assuming that $N \ge r$. In the full SVD, $U$ is an $N$ by $N$ orthogonal matrix ($U U^\top = U^\top U = I_N$), $V$ is an $r$ by $r$ orthogonal matrix ($V V^\top = V^\top V = I_r$), and $\Sigma$ is a diagonal $N$ by $r$ matrix with the singular values on the diagonal. In the reduced SVD, $U_1$ is the submatrix of $U$ consisting of the first $r$ columns and $\Sigma_1$ is the square submatrix of $\Sigma$ consisting of the first $r$ rows, with the zero rows at the bottom truncated.
Let $u_k^\top$ and $u_{1,k}^\top$ be the $k$-th row vectors of $U$ and $U_1$, respectively. By using the reduced SVD, the hat matrix and its diagonal elements are expressed as
$$H = X A^{-1} X^\top = U_1 U_1^\top \quad \text{and} \quad h_k = \left\| u_{1,k} \right\|^2,$$
where $\| \cdot \|$ is the Euclidean vector norm. Since $\left\| u_{1,k} \right\| \le \left\| u_k \right\| = 1$ from $U U^\top = I_N$, it follows that $0 \le h_k \le 1$.
Moreover, the sum of the $h_k$'s is
$$\sum_{k=1}^{N} h_k = \operatorname{tr}\left(U_1 U_1^\top\right) = \operatorname{tr}\left(U_1^\top U_1\right) = \operatorname{tr}\left(I_r\right) = r.$$
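Both properties, the reduced-SVD form of the hat matrix and the bounds on its diagonal, can be verified numerically. A minimal sketch (random full-rank $X$; sizes arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
N, r = 40, 5
X = rng.standard_normal((N, r))

# Reduced SVD: X = U1 @ diag(s) @ Vt, with U1 of shape (N, r)
U1, s, Vt = np.linalg.svd(X, full_matrices=False)

H = X @ np.linalg.inv(X.T @ X) @ X.T
print(np.allclose(H, U1 @ U1.T))     # hat matrix equals U1 U1^T
h = np.diag(H)
print(np.all((h >= 0) & (h <= 1)))   # 0 <= h_k <= 1
print(np.isclose(h.sum(), r))        # sum of leverages equals r
```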
If the regression includes the intercept (i.e., the first column of $X$ consists of 1's), the lower bound can be even tighter. If $\tilde{X}$ is the submatrix of $X$ with the 1's column removed and $\bar{X}$ is $\tilde{X}$ with each column demeaned,
$$H = \frac{1}{N} \mathbf{1} \mathbf{1}^\top + \bar{X} \left( \bar{X}^\top \bar{X} \right)^{-1} \bar{X}^\top,$$
and the same conclusion is drawn for the diagonal of the hat matrix of $\bar{X}$ as long as $\bar{X}$ has full column rank $r - 1$. Therefore,
$$\frac{1}{N} \le h_k \le 1 \quad \text{and} \quad \mathrm{E}\left[\, h_k \,\right] = \frac{r}{N},$$
where the expectation is over $k$. See Mohammadi [2016] and others for references.
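A numerical sanity check of the tighter bound (hypothetical design matrix with an intercept column; the sizes and random data are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
N, r = 25, 4
# First column of 1's plays the role of the intercept
X = np.column_stack([np.ones(N), rng.standard_normal((N, r - 1))])

h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
print(np.all(h >= 1 / N - 1e-12))   # intercept tightens the lower bound to 1/N
print(np.isclose(h.mean(), r / N))  # average leverage is r/N
```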
Appendix B Comparison with Foresight Bias in Fries [2008]
In Fries [2008], foresight bias is defined as the impact on the option value of the Monte Carlo error in the estimation of the continuation values. Let us denote the exercise value by $H$, the true continuation value by $C$, and the estimation error by $\epsilon$, so that the estimated continuation value is $C + \epsilon$. The resulting bias can be decomposed into two sources:
$$\mathrm{E}\left[\max(H, C + \epsilon)\right] - \max(H, C) = \operatorname{Cov}\left(\epsilon,\; \mathbf{1}_{\{C + \epsilon > H\}}\right) + \left( \mathrm{E}\left[ H\,\mathbf{1}_{\{C + \epsilon \le H\}} + C\,\mathbf{1}_{\{C + \epsilon > H\}} \right] - \max(H, C) \right).$$
In this expression, the covariance term is foresight bias while the remaining term is suboptimality bias. From the Gaussian error assumption $\epsilon \sim N(0, \sigma^2)$, the foresight bias term can be calculated analytically as
$$\operatorname{Cov}\left(\epsilon,\; \mathbf{1}_{\{C + \epsilon > H\}}\right) = \sigma\, \phi\!\left( \frac{C - H}{\sigma} \right),$$
where $\phi(x) = e^{-x^2/2} / \sqrt{2\pi}$ is the probability density function of the standard normal distribution.
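As a sanity check on the Gaussian-error calculation, the covariance term can be estimated by simulation and compared with $\sigma\,\phi\!\left((C - H)/\sigma\right)$, where $H$ denotes the exercise value and $C$ the true continuation value (the node values below are hypothetical, chosen only for illustration):

```python
import math
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical node values: H = exercise value, C = true continuation value
H, C, sigma = 1.0, 1.1, 0.3
m = C - H

eps = sigma * rng.standard_normal(2_000_000)  # Gaussian estimation error
# Cov(eps, indicator) = E[eps * 1{C+eps > H}] since E[eps] = 0
foresight_mc = np.mean(eps * (C + eps > H))

phi = math.exp(-0.5 * (m / sigma) ** 2) / math.sqrt(2 * math.pi)
foresight_exact = sigma * phi

print(abs(foresight_mc - foresight_exact) < 1e-3)  # True (up to MC noise)
```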
This is closely related to the definition in Equation (4), but with a crucial difference: the error $\epsilon_k$ in Equation (4) is the deviation of the individual samples from the true continuation value, whereas the $\epsilon$ above is the deviation of the resulting estimator. If we denote the Monte Carlo error of the LOOLSM estimator by $\eta_k$, then, from Appendix A, the total Monte Carlo error of the LSM estimator is
$$\epsilon = h_k\, \epsilon_k + (1 - h_k)\, \eta_k.$$
Therefore, the total Monte Carlo error is a weighted average of the two independent error terms, $\epsilon_k$ and $\eta_k$. As we work with Equation (1), our lookahead bias term correctly captures the contribution of $\epsilon_k$ to the exercise decision. On the contrary, the foresight bias term in Fries [2008] includes the contribution from $\eta_k$ as well, since it defines the bias term by using the convexity of the maximum function of the total Monte Carlo error $\epsilon$. Such a term is not the source of high bias in the original LSM formulation.
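The weighted-average structure underlying this decomposition follows from the leave-one-out identities of Appendix A: the fitted value at sample $k$ equals $h_k\, y_k + (1 - h_k)$ times the leave-one-out fit. A sketch with arbitrary random data (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(5)
N, r = 30, 3
X = rng.standard_normal((N, r))
y = X @ rng.standard_normal(r) + rng.standard_normal(N)

A_inv = np.linalg.inv(X.T @ X)
beta = A_inv @ X.T @ y
h = np.einsum("ij,jk,ik->i", X, A_inv, X)  # leverages h_k

# Leave-one-out fitted values, computed by brute-force refitting
fit_loo = np.empty(N)
for k in range(N):
    mask = np.arange(N) != k
    beta_k, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    fit_loo[k] = X[k] @ beta_k

# LSM fit = h_k-weighted average of the own sample and the leave-one-out fit
print(np.allclose(X @ beta, h * y + (1 - h) * fit_loo))  # True
```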
Appendix C Proof of Theorem 2
Proof.
(1) Since the matrix in question is real and symmetric, there exist an orthogonal matrix $Q$ and a diagonal matrix $\Lambda$ such that it equals $Q \Lambda Q^\top$ by the spectral theorem. In particular, the diagonal entries of $\Lambda$ are the eigenvalues. Let $\| \cdot \|$ be the induced matrix norm. Then, the norm equals the largest eigenvalue in absolute value, from which the claim follows.
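A quick numerical illustration of the spectral-norm fact used here (random symmetric matrix; not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(6)
M = rng.standard_normal((5, 5))
S = (M + M.T) / 2  # real symmetric matrix

eigvals = np.linalg.eigvalsh(S)
# The induced 2-norm of a symmetric matrix is its largest absolute eigenvalue
print(np.isclose(np.linalg.norm(S, 2), np.abs(eigvals).max()))  # True
```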
(2) We first show Equation (6). This has the following interpretation: the first term collects the overfitting scenarios in which the LSM and LOOLSM algorithms disagree, and the second term is the price difference that results. The decomposition follows from Equation (1).
Therefore, the difference splits over two disagreement events: in the first, the LSM algorithm continues when it should have exercised; in the second, it exercises when it should have continued.
The first term can be transformed by using Equation (5) and the relation between the LSM and LOOLSM continuation estimates, and the second term follows likewise. Finally, the two terms can be combined into the bound stated in the theorem, which completes the proof.