# An Efficient Approach for Removing Look-ahead Bias in the Least Square Monte Carlo Algorithm: Leave-One-Out

The least square Monte Carlo (LSM) algorithm proposed by Longstaff and Schwartz [2001] is the most widely used method for pricing options with early exercise features. The LSM estimator contains look-ahead bias, and the conventional technique of removing it necessitates an independent set of simulations. This study proposes a new approach for efficiently eliminating look-ahead bias by using the leave-one-out method, a well-known cross-validation technique for machine learning applications. The leave-one-out LSM (LOOLSM) method is illustrated with examples, including multi-asset options whose LSM price is biased high. The asymptotic behavior of look-ahead bias is also discussed with the LOOLSM approach.


## 1. Introduction

Derivatives with early exercise features are popular, with American- and Bermudan-style options being the most common types. Nonetheless, the pricing of these options is a difficult problem in the absence of closed-form solutions, even in the simplest case of valuing American options on a single asset. Researchers have thus developed various numerical methods for pricing that largely fall into two categories: the lattice-based and simulation-based approaches.

In the lattice-based approach, pricing is performed on a dense lattice in the state space by valuing the options at each point of the lattice using suitable boundary conditions and the mathematical relations among neighboring points. Examples include the finite difference scheme (Brennan and Schwartz, 1977), the binomial tree (Cox et al., 1979), and its multidimensional generalizations (Boyle, 1988; Boyle et al., 1989; He, 1990). These methods are known to work well in low-dimensional problems. However, they become impractical in higher-dimensional settings, mainly because the lattice size grows exponentially as the number of state variables increases. This phenomenon is commonly referred to as the curse of dimensionality.

In the simulation-based approach, the state variables are randomly drawn from the underlying asset processes, which sidesteps the computational challenges described above. The price is calculated as the average of the option values over simulated paths, each of which represents a future realization of the state variables under the risk-neutral measure. This entails finding the optimal exercise rules, for which a group of simulation-based methods propose various approaches for estimating the continuation values as conditional expectations. Equipped with stopping time rules, they calculate the option price by solving a dynamic programming problem whose Bellman equation is essentially the comparison between the continuation values and exercise values.

The randomized tree method (Broadie and Glasserman, 1997) estimates the continuation value at each node of the tree as the average discounted option value of its children. This non-parametric approach is of the most generic type, but its use is limited in scope because the tree size still grows exponentially in the number of exercise times. The stochastic mesh method (Broadie and Glasserman, 2004) overcomes this issue by using a mesh structure in which all the states at the next exercise time are the children of any state at the current exercise time. The conditional expectation is computed as a weighted average of the children, where the weights are determined by likelihood ratios. Regression-based methods (Carriere, 1996; Tsitsiklis and Van Roy, 2001; Longstaff and Schwartz, 2001) use regression techniques to estimate the continuation values from the simulated paths. Such regression approaches are computationally tractable, as they are linear not only in the number of simulated paths, but also in the number of exercise times. (This implicitly assumes that the number of regressors is constant, which is reasonable because it is usually much smaller than the number of Monte Carlo paths.) Fu et al. (2001) and Glasserman (2013) provide an excellent review of the implementation and comparison of simulation-based methods. Among these variations, the least square Monte Carlo (LSM) algorithm proposed by Longstaff and Schwartz (2001) is standard practice for valuing options with early exercise features because of its simplicity and efficiency, which makes it a strong choice from a practical standpoint.

Simulation-based estimators suffer from two main sources of bias: low and high. Low bias is related to a suboptimal exercise decision owing to various approximations in the method. For this reason, it is also called suboptimal bias. For example, regressing with finite basis functions cannot fully represent the conditional payoff function, and the price estimated from finite simulation paths contains noise. As a result, the exercise policies are suboptimal and therefore lead to a lower option price.

High bias is due to the positive correlation between exercise decisions and future payoffs; the algorithm is more likely to continue (exercise) precisely when future payoffs are higher (lower). This results from sharing the simulated paths for the exercise policy and payoff valuation. In simple terms, high bias is overfitting from the undesirable use of future information. For this reason, it is called look-ahead or foresight bias. These two sources of bias are opposite in nature as well as in direction. Low bias is intrinsic to numerical methods, whereas high bias is extrinsic and removable. The standard technique of eliminating high bias is to calculate the exercise criteria by using an independent set of Monte Carlo paths, thereby eliminating the correlation.

The simulation estimators in the literature are typically either low-biased or high-biased. Since there can be no unbiased simulation estimator, the idea of Broadie and Glasserman (1997) is to construct low- and high-biased estimators to form a confidence interval for the true option price. The LSM estimator, on the contrary, has elements of both low and high biases. (Glasserman (2013) calls this an interleaving estimator, as it alternates between elements of low and high bias in pricing.) Rather than aiming to raise accuracy by letting these two biases partially offset, such a construction primarily retains the computational efficiency of the original formulation. Indeed, Longstaff and Schwartz (2001) claim that the look-ahead bias of the LSM estimator is negligible, presenting as supporting evidence a single-asset put option case tested with an independent simulation set (an option with 50 exercise times in one year, priced with 100,000 Monte Carlo paths). In this regard, the LSM estimator has been considered to be low-biased.

However, the claim does not necessarily generalize to a broad class of examples. Although look-ahead bias is asymptotically zero in a theoretical setting, it can be material in practice. For example, it tends to be more pronounced in multi-state cases given the same simulation effort. While an in-depth analysis is provided later, the intuition is that overfitting occurs in the least square regression with a large number of explanatory variables. These are the exact circumstances under which the simulation-based method is inevitable because of the curse of dimensionality. As such, the LSM algorithm has been the industry standard for pricing callable bonds and structured notes whose coupons have complicated structures that depend on other underlying assets such as equity prices, foreign exchange rates, and benchmark interest swap rates. Multi-factor models are required for such underlying assets as well as for yield curves with a term structure. Therefore, it is important to understand the magnitude of look-ahead bias in the LSM estimator and to adopt an efficient algorithm for removing this bias for practical purposes.

In terms of machine learning theory, look-ahead bias is overfitting caused by using the same dataset for both training (i.e., the estimation of the exercise policy) and testing (i.e., the valuation of the options). This is undesirable in machine learning applications, and various cross-validation techniques are used to address the problem. In this context, using an independent set of paths for the out-of-sample prediction is the hold-out method, one of the simplest cross-validation techniques. While this approach successfully removes look-ahead bias, its main disadvantage is the computational burden of doubling the simulation effort. (For some stochastic differential equations, such as stochastic volatility models, Monte Carlo simulation depends on the time-discretized Euler scheme. Moreover, the LSM method can exhaust storage when the number of exercise times is large, because the whole path history must be stored for the backward valuation, unlike in the European case; storage can thus limit the number of simulated paths.)

In this article, we present an efficient approach for removing look-ahead bias, namely the leave-one-out LSM (LOOLSM) algorithm. LOOLSM is based on leave-one-out cross-validation (LOOCV), a special type of k-fold cross-validation where k equals the number of sample points. When making a prediction for a sample, LOOCV trains the model with all the data except that sample, thereby avoiding overfitting. The leave-one-out regression can be efficiently calculated by subtracting analytic correction terms from the original full regression. Therefore, this simple idea can address the main drawback of the hold-out method and make the LOOLSM estimator truly low-biased. Furthermore, we can explicitly capture look-ahead bias, from which we examine its asymptotic behavior both theoretically and empirically.

Previous work along this line is limited. The low estimator of Broadie and Glasserman (1997) is constructed with the self-excluded expectation, which is a trivial version of the leave-one-out regression. Regarding the bias correction, Fries (2008) formulates high bias as the price of the option on the Monte Carlo error and derives the analytic correction terms from the Gaussian error assumption. This approach is built upon a fundamentally different set-up from ours, as we explain in Section 3.1. Carriere (1996) discusses the asymptotic behavior of bias empirically. We analyze this for look-ahead bias in LSM specifically; indeed, the author’s observation is consistent with our findings.

The rest of the paper is organized as follows. In Section 2, we describe the LSM pricing framework and introduce the LOOLSM algorithm. In Section 3, we define look-ahead bias and analyze the asymptotic behavior of such bias in LSM. In Section 4, we present the numerical results for several examples and demonstrate how they compare with other methods. Finally, Section 5 concludes.

## 2. The LOOLSM Algorithm

### 2.1. The LSM Algorithm

We start the section by introducing some notation that we use in the rest of the paper:

• $S(t)$ denotes the vector of Markovian state variables at time $t$. (State variables can be augmented to satisfy the Markovian property.)

• $V_t(s)$ denotes the option price function of the state $s$ at time $t$, discounted to the present time.

• $Z_t(s)$ denotes the discounted option payout function of the state $s$ at time $t$.

• $C_t(s)$ denotes the discounted option price function of the state $s$ at time $t$, conditional on the option not being exercised at $t$.

• $V_t(S)$ is $V_t$ evaluated at the specific state $S(t)$. We define $Z_t(S)$ and $C_t(S)$ similarly.

For example, $Z_t(s) = e^{-rt}\max(s - K, 0)$ for a single-stock call option struck at $K$ when the risk-free rate is $r$. For the numerical pricing, we always work with a finite set of possible exercise times $t_1 < \cdots < t_L$, as we necessarily discretize the time dimension for any continuous exercise cases. (It is customary to assume that the present time $t_0 = 0$ is not an exercise time.) As these are the only times we consider, we simply write $V_{t_l}$ as $V_l$ for $l = 1, \dots, L$, and likewise $Z_l$, $C_l$.

The valuation of options with early exercise features can be formulated as a maximization problem of the expected future payoffs over all possible choices of stopping times $\tau$ taking values in the exercise times:

 V_0(s) = \max_{\tau} E[\, Z_\tau(S) \mid S(0) = s \,].

$C_l(s)$ is commonly referred to as the continuation value or hold value in the literature. It is the expected next-step option value,

 C_l(s) = E[\, V_{l+1}(S) \mid S(t_l) = s \,].

The main difficulty of pricing Bermudan options with simulation methods lies in obtaining the optimal exercise policy (i.e., the estimation of $C_l$) from the simulated paths. This is because the Monte Carlo path generation goes forward in time, whereas the dynamic programming for pricing works backward by construction.

Suppose we have a method to estimate the continuation value $C_l(s)$. Then, the option value can be calculated via a dynamic optimization problem whose backward induction step is given as

 \hat{V}_l(S) = \begin{cases} Z_l(S) & \text{if } Z_l(S) > \hat{C}_l(S), \\ \hat{V}_{l+1}(S) & \text{if } Z_l(S) \le \hat{C}_l(S). \end{cases} \qquad (1)

We write $C_l$ as $\hat{C}_l$ to indicate that it is an estimated continuation value function from a specific simulation set, as opposed to the true value function $C_l$. Likewise, we write $\hat{V}_l$ to indicate a dependency on the simulation set. The backward induction step is calculated pathwise in simulation methods. Therefore, one can understand $\hat{V}_l(S)$ and $\hat{C}_l(S)$ in Equation (1) as the values of $\hat{V}_l$ and $\hat{C}_l$ evaluated on the simulated path $S$.

Equation (1) effectively means that the option is exercised at $t_l$ if $Z_l(S) > \hat{C}_l(S)$ and continued otherwise. For consistency, we assume $\hat{C}_0 = \infty$ and $\hat{C}_L = -\infty$ to ensure that the option is always continued at $t_0$ and always exercised at $t_L$. In the final step, the option price is calculated as

 \hat{V}_0(s) = E[\, \hat{V}_0(S) \mid S(0) = s \,], \qquad (2)

where the expectation is taken over all simulation paths under the asset price dynamics, conditional on the initial condition.

The only missing piece is how to calculate $\hat{C}_l$. Longstaff and Schwartz (2001)'s idea is to calculate it using the least squares regression of the pathwise option values at the next exercise time on functions of the simulated state variables at the current exercise time. Namely, we first run the regression with the simulated data, then define $\hat{C}_l(s) = x_l(s)\,\beta$. Here, $x_l(s)$ is the set of regressors, a finite number of basis functions evaluated at time $t_l$, and $\beta$ is the length-$M$ vector of regression coefficients. For the intercept, we assume that the first basis function is the constant 1.

To see how this is formulated in the simulation setting, suppose that we generate $N$ Monte Carlo paths and concatenate the related pathwise quantities vertically. We introduce the following notation:

• The $N$ by $M$ matrix $X$ is the simulation result of the regressors at time $t_l$, each row corresponding to a sample. Define $\Sigma = X^\top X$ (an $M$ by $M$ matrix).

• The length-$N$ column vector $v$ contains the pathwise option values $\hat{V}_{l+1}(S)$ at time $t_{l+1}$.

• The length-$N$ column vectors $c$ and $z$ contain the continuation value $\hat{C}_l(S)$ and option payout $Z_l(S)$, respectively, at time $t_l$.

Although the exercise time index is not specified for notational simplicity, it is clear that the above quantities are specific to a particular exercise time.

The continuation value is calculated as

 c = X\beta = Hv, \quad \text{where } \beta = \Sigma^{-1} X^\top v, \quad H = X \Sigma^{-1} X^\top.

Here, $Hv$ is a matrix-vector multiplication. Now, we obtain the price by recursively running Equation (1) with $c$ and $z$.

In Longstaff and Schwartz (2001), the regressions are run only with in-the-money samples. Although the LOOLSM estimator can be defined in this set-up, we instead use all the samples. (According to Glasserman (2013), using only in-the-money samples can be inferior in some cases, and to the best of our knowledge, it is also customary in practice to run the regression with all the samples. Using the same number of samples in every regression has the further benefit of simplifying the analysis of the asymptotic behavior of look-ahead bias; see Section 3.)
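To make the backward induction concrete, here is a minimal sketch of the plain LSM loop in Python with NumPy. The function name and interface are ours, and payouts are assumed to be pre-discounted to time 0, consistent with the notation of this section; the leave-one-out correction of the next section is deliberately omitted.

```python
import numpy as np

def lsm_price(S, Z, basis):
    """Plain LSM estimate with all-sample regression.

    S     : (N, L) array of simulated states at exercise times t_1..t_L
    Z     : (N, L) array of exercise payouts, discounted to time 0
    basis : function mapping a length-N state vector to an (N, M)
            regressor matrix X whose first column is the constant 1
    """
    v = Z[:, -1].copy()                  # always exercise at t_L
    L = Z.shape[1]
    for l in range(L - 2, -1, -1):       # backward induction: t_{L-1}, ..., t_1
        X = basis(S[:, l])
        beta, *_ = np.linalg.lstsq(X, v, rcond=None)
        c = X @ beta                     # estimated continuation values
        v = np.where(Z[:, l] > c, Z[:, l], v)   # Equation (1), pathwise
    return v.mean()                      # Equation (2): average over paths
```

Because the same paths feed both the regression and the payoff vector $v$, this estimator carries the look-ahead bias discussed in Section 3.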

### 2.2. The LOOLSM Algorithm

At the heart of look-ahead bias lies the fact that the future payoff of a sample path is included in the continuation value regression, from which the exercise decision is made for the sample path. Therefore, this context naturally gives rise to the application of the LOOCV method. Figure 1 illustrates this idea with a stylized example.

The prediction value $c'$ from the leave-one-out regression differs from $c$ by an analytic correction term:

 c' = c - \frac{h \cdot e}{1 - h} \quad \text{for } e = v - c, \qquad (3)

where $1$ is the column vector of ones and the arithmetic operations between vectors are conducted element-wise. Here, $e$ is the estimation error and $h$ is the diagonal vector of the hat matrix $H$. Regarding the components of $h$, we can show that $0 \le h_j < 1$ if $\Sigma_{-j}$ is non-singular. See Appendix A for the proofs. Therefore, the error after the correction is larger than the original error in absolute terms, since

 e' = e + \frac{h \cdot e}{1 - h} = \frac{e}{1 - h}.

In other words, the regression error for LSM is smaller than that for LOOLSM because of overfitting.

The extra computation required for the LOOLSM algorithm is minimal; $h$ can be efficiently computed as the row sum of the element-wise multiplication of two matrices:

 h = \mathrm{rowsum}(X \circ X\Sigma^{-1}).

As the transpose of the latter matrix, $\Sigma^{-1}X^\top$, has already been computed for the regression coefficients $\beta$, this only adds an $O(NM)$ operation.
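The correction in Equation (3) and the row-sum trick condense into a few lines of NumPy. The sketch below (function name ours) computes both the full-sample fit and the leave-one-out predictions, and checks them against brute-force regressions that exclude each sample in turn.

```python
import numpy as np

def loo_continuation(X, v):
    """Full-sample fit c = Hv and leave-one-out prediction
    c' = c - h*e/(1-h) with e = v - c, as in Equation (3)."""
    Sigma_inv = np.linalg.inv(X.T @ X)
    c = X @ (Sigma_inv @ (X.T @ v))           # beta = Sigma^{-1} X^T v
    h = np.sum(X * (X @ Sigma_inv), axis=1)   # h = rowsum(X o X Sigma^{-1})
    e = v - c
    return c, c - h * e / (1.0 - h)

# sanity check: c'[j] equals the prediction of a regression that drops row j
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
v = rng.normal(size=50)
_, c_loo = loo_continuation(X, v)
for j in range(50):
    keep = np.arange(50) != j
    beta_j, *_ = np.linalg.lstsq(X[keep], v[keep], rcond=None)
    assert abs(c_loo[j] - X[j] @ beta_j) < 1e-8
```

The extra cost beyond the original regression is the element-wise product and row sum, i.e., $O(NM)$, in line with the text.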

## 3. Analysis of Look-ahead Bias with the LOOLSM Estimator

### 3.1. Definition of Look-ahead Bias

In this section, we present a formal definition of the look-ahead bias of the simulation estimators. The backward induction step in Equation (1) can be written as a single equation:

 \hat{V}_l(S) = I[\, Z_l(S) \le \hat{C}_l(S) \,]\, \hat{V}_{l+1}(S) + \left(1 - I[\, Z_l(S) \le \hat{C}_l(S) \,]\right) Z_l(S),

where $I[\cdot]$ is the indicator function.

We take the expectation of the above equation with respect to all possible simulation paths, conditional on $S(t_l) = s$. For simpler notation, we introduce the conditional expectation notation

 g_l(s) = E[\, g(S) \mid S(t_l) = s \,] \quad \text{for any function } g(\cdot).

Despite a slight abuse of notation, it is a consistent generalization of Equation (2).

First, assume that $\hat{V}_{(l+1)\,l}(s) = C_l(s)$. If we denote the continuation probability by $p = E[\, I[Z_l(s) \le \hat{C}_l(S)] \mid S(t_l) = s \,]$, we have

 \hat{V}_l(s) = E[\, I[Z_l(s) \le \hat{C}_l(S)]\, \hat{V}_{l+1}(S) \mid S(t_l) = s \,] + (1 - p)\, Z_l(s)
             = \mathrm{Cov}(\, I[Z_l(s) \le \hat{C}_l(S)],\, \hat{V}_{l+1}(S) \mid S(t_l) = s \,) + p\, C_l(s) + (1 - p)\, Z_l(s)
             = \mathrm{Cov}(\, I[Z_l(s) \le \hat{C}_l(S)],\, \hat{V}_{l+1}(S) \mid S(t_l) = s \,) + \hat{V}'_l(s),

where $\hat{V}'_l(s)$ is a suboptimal estimator,

 \hat{V}'_l(s) = p\, C_l(s) + (1 - p)\, Z_l(s) \le \max(C_l(s), Z_l(s)) = V_l(s).

Look-ahead bias is defined as the first conditional covariance term:

 B_l(s) = \hat{V}_l(s) - \hat{V}'_l(s) = \mathrm{Cov}(\, I[Z_l(s) \le \hat{C}_l(S)],\, \hat{V}_{l+1}(S) \mid S(t_l) = s \,). \qquad (4)

In other words, it is the covariance between the exercise decision and the future payoff. It is always positive for LSM because the estimator $\hat{C}_l(S)$ is tilted toward $\hat{V}_{l+1}(S)$. On the contrary, it is zero for LOOLSM, as the exercise indicator and $\hat{V}_{l+1}(S)$ are independent.

Except in the first backward induction step ($l = L - 1$), $\hat{V}_{(l+1)\,l}(s)$ is not necessarily equal to $C_l(s)$, since $\hat{V}_{l+1}$ may be biased depending on how the high and low biases above offset in previous steps. Therefore, we do not attempt to formulate how the bias accumulates inductively in each step. Rather, we measure overall look-ahead bias as the price difference between the LSM and LOOLSM estimators.

Equation (4) differs from the foresight bias of Fries (2008), who defines it as the value of the option on the Monte Carlo error in the estimation of the continuation values:

 B^{\text{Fries}}_l(s) = \mathrm{Cov}(\, I[Z_l(s) \le \hat{C}_l(S)],\, \hat{C}_l(S) \mid S(t_l) = s \,).

In Appendix B, we discuss in detail how these two definitions are related.

### 3.2. Asymptotic Behavior

Carriere (1996) predicts that the high bias decays at the rate of $1/N$ as $N$ increases:

 \text{Bias} = \frac{a}{N} + O\!\left(\frac{1}{N^2}\right) \quad \text{for a constant } a.

(This prediction is based on an alternative formulation to Equation (1), whose backward induction takes the maximum of the exercise and estimated continuation values; Longstaff and Schwartz (2001) report that such a formulation typically has significant upward bias. Carriere (1996) measures the bias as the difference between the high-biased estimator and the exact price obtained by a lattice method.)

Indeed, we find a similar pattern when we estimate look-ahead bias as the difference between the LSM and LOOLSM estimators. While we present the empirical results in Section 4.4, here we attempt to provide a theoretical justification.

Equation (3) is equivalent to

 c = c' + \frac{h' \cdot e'}{1 + h'} = c' + h \cdot e', \quad \text{where } h'_j = x_j \Sigma_{-j}^{-1} x_j^\top, \qquad (5)

where $x_j$ is the $j$-th row vector of $X$ and $\Sigma_{-j} = \Sigma - x_j^\top x_j$.

Instead of using the dummy index $j$, we can view $x_j$ and $\Sigma_{-j}$ as functions $x(s)$ and $\Sigma(s)$, respectively, of the $j$-th simulated state $s$. As such, $h'_j$ is also understood as a function of $s$:

 h'(s) = x(s)\, \Sigma(s)^{-1} x(s)^\top.

Here, note that $\Sigma(s)$ is independent from $x(s)$, because the row $x(s)$ is removed from the computation of $\Sigma(s)$.

Before we state the main theorem of the section, we make two assumptions. It is undesirable for $\Sigma$ to have very small or zero eigenvalues, because that is when the regression becomes highly unstable or fails. When working with a large $N$ and an invertible population covariance of the regressors, however, this is extremely unlikely to happen because of the central limit theorem. (In practice, one can also avoid this problem through regularization, for example, ridge regression.) Therefore, we first assume the following:

###### Assumption 1.

The smallest eigenvalue of the normalized covariance $\Sigma/N$ is at least $\lambda$ for some $\lambda > 0$.

Another complication arises when the option and continuation values can be arbitrarily close with non-negligible probability, in which case the backward induction becomes unstable. For this reason, we further assume the following:

###### Assumption 2.

For any compact set and small $\epsilon > 0$, the probability of the state being in an $\epsilon$-neighborhood of the exercise boundary exists and is uniformly bounded.

This rather strong assumption limits the scope of the payoff functions and regressors; however, we believe it is satisfied for realistic examples. Note that these assumptions are similar to those made for convergence, but with some differences (for example, see Stentoft (2004) for reference).

###### Theorem 2.

Under Assumptions 1 and 2,

1. Let $\|\cdot\|$ be the Euclidean norm in $\mathbb{R}^M$. Then,

 h'(s) < \frac{\|x(s)\|^2}{\lambda (N - 1)}.

2. Under suitable regularity conditions, the expected bias, defined as the difference between the LSM and LOOLSM estimators, satisfies

 E[B_l(s)] \sim \epsilon + O\!\left(\frac{M}{N}\right)

for any $\epsilon > 0$.
We prove Theorem 2 in Appendix C. Essentially, the proof shows that look-ahead bias mainly comes from a small neighborhood of the exercise boundary, whereas the expected bias size there is approximately constant. For a large $N$, when the bias correction almost always happens at most once on each path, the result can be extended to the overall bias. Then, by choosing a very small $\epsilon$, the theorem implies that any realistic bias should decay at least at the rate of $M/N$. On the contrary, the proof provides no clue as to how small $\epsilon$ has to be before we observe the expected decay.

## 4. Numerical Results

In this section, we present some Bermudan option examples to demonstrate how the LOOLSM algorithm works in comparison with other methods. We start with single-asset put options and then move on to best-of options on two assets and basket options on four assets, in increasing order of dimension.

We assume that the asset prices follow geometric Brownian motion:

 \frac{dS_i(t)}{S_i(t)} = (r - q_i)\, dt + \sigma_i\, dW_i(t),

where $r$ is the risk-free rate, $q_i$ is the dividend yield, $\sigma_i$ is the volatility, and the $W_i(t)$'s are standard Brownian motions correlated by

 dW_i(t)\, dW_j(t) = \rho_{ij}\, dt \quad (\rho_{ii} = 1).

The choice of geometric Brownian motion for the price dynamics has some advantages. For example, exact simulation is possible and easily implemented. Moreover, it is a standard choice in the literature, and we can find exact prices to use as benchmarks. More complicated SDEs requiring the Euler scheme may exhibit another kind of Monte Carlo bias resulting from time discretization.
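Because the dynamics above admit exact simulation, the paths can be drawn without an Euler scheme. The sketch below (all names ours) simulates correlated geometric Brownian motions by applying a Cholesky factor of the correlation matrix to independent normal increments.

```python
import numpy as np

def simulate_gbm(S0, r, q, sigma, corr, times, n_paths, rng):
    """Exact simulation of correlated geometric Brownian motions:
    S_i(t) = S_i(0) exp((r - q_i - sigma_i^2/2) t + sigma_i W_i(t)).
    Returns an array of shape (n_paths, len(times), n_assets)."""
    S0, q, sigma = (np.asarray(a, dtype=float) for a in (S0, q, sigma))
    chol = np.linalg.cholesky(np.asarray(corr, dtype=float))
    times = np.asarray(times, dtype=float)
    dt = np.diff(np.concatenate([[0.0], times]))
    # independent normals -> correlated Brownian increments -> cumulated paths
    Z = rng.standard_normal((n_paths, len(times), len(S0)))
    dW = np.sqrt(dt)[None, :, None] * (Z @ chol.T)
    W = np.cumsum(dW, axis=1)
    drift = (r - q - 0.5 * sigma**2)[None, None, :] * times[None, :, None]
    return S0 * np.exp(drift + sigma * W)
```

Exact sampling here avoids the time-discretization bias that an Euler scheme would introduce.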

We run $n$ independent Monte Carlo simulations to compute the price offset from the exact value and the standard deviation:

 \text{Price Offset} = E[\hat{V}_0] - V_0^{\text{Exact}}, \qquad \text{Standard Deviation} = \sqrt{\frac{n}{n-1}\left( E[\hat{V}_0^2] - E[\hat{V}_0]^2 \right)},

where $\hat{V}_0$ denotes the price estimator from each simulation. We run $n$ sets of simulations with $N$ paths each, except in Section 4.4, where $n$ and $N$ are varied. We use antithetic variates to reduce variance.
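The antithetic-variates step can be sketched as follows (function name ours): each normal draw is paired with its negation, which preserves the marginal distribution while typically reducing the variance of payoffs that are monotone in the driving noise.

```python
import numpy as np

def antithetic_normals(n_pairs, shape, rng):
    """Return 2*n_pairs antithetic standard normal draws of the given shape."""
    z = rng.standard_normal((n_pairs,) + shape)
    return np.concatenate([z, -z], axis=0)  # sample mean is zero by construction
```

Feeding these draws to the path generator yields paired paths whose pricing errors partially cancel.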

We compare the following three regression-based methods:

• LSM: Simulated paths are used for both the exercise policy and pricing.

• Hold-out: The exercise policy function is calculated by using an independent set of paths of the same size. Then, it is applied to the original paths for pricing.

• LOOLSM: Simulated paths are used for both the exercise policy and pricing with LOOCV.

By using the same set of paths for pricing across methods, we can better control the variation in the comparison of prices. For example, all three methods produce the same European option prices by construction.

One adjustment we make to the backward induction step is to exercise only if the option payout is strictly positive. Conditional expectations can assume negative values, which is merely an artifact, as the continuation values are nonnegative for all the examples in this section. This is also in the spirit of the original LSM method, where the exercise decision is made only for the in-the-money samples.

### 4.1. Single-stock Put Option

We start with a Bermudan equity put option, which is simple yet a good base case. We use the same parameters as in Feng and Lin (2013) for a range of strikes including $K = 120$. The exact Bermudan price is available in their paper for one of the strikes, and we use the binomial tree method to calculate "exact" prices for the other strikes. Borrowing the notation from the previous sections,

 Z(S) = e^{-rt}\max(K - S_1, 0),
 S_1(0) = 100, \quad \sigma_1 = 20\%, \quad r = 5\%, \quad q_1 = 2\%, \quad t_l = \frac{l}{5} \ (l = 1, 2, \dots, 5).

We use the following basis functions for the regression:

 X(S) = (1, S_1, S_1^2, S_1^3, Z(S_1)) \quad (M = 5).
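For concreteness, this regressor matrix can be built as follows (function name and default arguments ours; the defaults follow the listed rate, with an illustrative strike and time):

```python
import numpy as np

def put_basis(S1, K=100.0, r=0.05, t=1.0):
    """Regressors (1, S_1, S_1^2, S_1^3, Z(S_1)) for the single-stock put,
    where Z is the exercise payoff discounted at rate r to time 0."""
    Z = np.exp(-r * t) * np.maximum(K - S1, 0.0)
    return np.column_stack([np.ones_like(S1), S1, S1**2, S1**3, Z])
```

The payoff itself enters as the last basis function, which helps the regression capture the kink at the strike.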

We present the results in two formats. Table 1 reports the price offset and standard deviation. As expected, LOOLSM produces results similar to those of the hold-out method. The LSM results are slightly higher than those of the other methods, implying that the look-ahead bias is small. This finding is consistent with the claim of Longstaff and Schwartz (2001) as well as with the observation that the LSM price is usually low-biased. Table 2 highlights the look-ahead bias, for which we report the difference between the hold-out/LOOLSM prices and the LSM price. The LOOLSM bias correction has a smaller standard deviation than that of the hold-out method, as it directly removes the bias embedded in the LSM algorithm.

### 4.2. Best-of Option

The next example is a best-of call option on two assets (also called a max or rainbow option), an option on the maximum of the asset prices,

 Z(S) = e^{-rt}\max(\max(S_1, S_2) - K, 0).

This example is introduced in Glasserman (2013) with the parameters

 K = 100, \quad \sigma_i = 20\%, \quad r = 5\%, \quad q_i = 10\%, \quad \rho_{i \ne j} = 0, \quad t_l = \frac{l}{3} \ (l = 1, 2, \dots, 9),

with several initial asset price levels including 110. For regressors, we use the polynomials up to degree 3 and the payoff function,

 X(S) = (1, S_1, S_2, S_1^2, S_1S_2, S_2^2, S_1^3, S_1^2S_2, S_1S_2^2, S_2^3, Z(S)) \quad (M = 11).

We obtain the exact Bermudan option prices from Glasserman (2013) and compute the exact European option prices from the analytic solutions expressed in terms of the bivariate cumulative normal distribution (Rubinstein, 1991). Table 3 shows the results. LOOLSM still works similarly to the hold-out method in the two-asset case. The difference from the LSM price becomes clearer, but the LSM price is still biased low.

### 4.3. Basket Option

Next, we present a Bermudan call option on a basket of four stocks. The discounted exercise payoff is given as

 Z(S) = e^{-rt}\max\!\left(\frac{S_1 + S_2 + S_3 + S_4}{4} - K,\, 0\right).

We use the parameters introduced by Krekel et al. (2004) to analyze basket option pricing methods,

 S_i(0) = 100, \quad \sigma_i = 40\%, \quad r = q_i = 0, \quad \rho_{i \ne j} = 0.5, \quad t_l = \frac{l}{2} \ (l = 1, 2, \dots, 10).

The exact prices of the European basket options for the same parameter set are obtained from Choi (2018). Because the underlying assets pay no dividends, the optimal exercise policy for the option holder is not to exercise the option until maturity, so the European option price is equal to the Bermudan price.

We use the polynomials up to degree 2 and the payoff function as regressors,

 X(S) = (1,\, S_i,\, S_i^2,\, S_iS_j,\, Z(S)) \quad \text{for } 1 \le i \le 4 \text{ and } 1 \le i < j \le 4.

Table 4 reports the results for a wide range of strikes including 140. The LOOLSM and hold-out methods still result in similar biases and errors. However, look-ahead bias is more pronounced in this four-asset case, and the LSM algorithm consistently produces higher prices across all strike levels.

### 4.4. Asymptotic Behavior

In this section, we analyze empirically how look-ahead bias, measured as the difference between the LSM and LOOLSM estimators, behaves as the number of Monte Carlo paths $N$ and the number of regressors $M$ change. Following the discussion in Section 3.2, we aim to check whether the bias decays at the rate predicted by Theorem 2.

Here is how we design the experiment to test the relationship with $N$. In total, we generate 7,200,000 Monte Carlo paths. For each value of $N$, we split the paths into chunks of $N$ paths, each of which comprises one simulation set. Then, we run the LSM and LOOLSM algorithms for each simulation set separately, thereby obtaining $7{,}200{,}000/N$ prices, and report the statistics as a function of $N$. By using the same paths for the different $N$'s, we control the variability from the Monte Carlo simulation as much as possible, leaving only the bias that derives from the simulation size.

The results of the experiment are summarized in Figure 2. The left plots demonstrate how the LSM and LOOLSM prices converge as functions of $N$. Only one parameter set (e.g., a chosen strike) is shown for each example, but other choices exhibit the same patterns. The right plots show the log-log relationship between $N$ and the look-ahead bias, with linear regression lines. A slope close to 1 for all strikes means that the bias decreases at the rate of $1/N$. This is indeed the case for the single-asset and best-of two-asset examples, where the bias is already small. On the contrary, the basket option example decays more slowly, with a shallower slope. It may thus require larger simulations to exhibit similar asymptotic behavior. Indeed, the decay is more concave than in the other examples (see the 60 or 80 strikes), and the slopes are closer to 1 when tested with larger $N$.

To test the relationship with $M$, we run the following experiment for the single-stock Bermudan put option case. Consider an extended set of basis functions,

 X_{\text{ext}}(S) = (1, S_1, S_1^2, \dots, S_1^{18}, Z(S_1)).

For each $M$, we use the first $M$ basis functions of $X_{\text{ext}}$ to run both the LSM and the LOOLSM methods on the same simulation paths. We run this with the same parameter set as in Section 4.1, but with a fixed strike and $N$.

The results can be found in Figure 3. The relationship between look-ahead bias and the variables $N$ and $M$ is generally consistent with the discussion in Section 3 and Theorem 2. In particular, the bottom plot clearly shows the proportionality between the bias and $M$. We further believe the rate of change in the bias with respect to $M$ has to do with the choice of basis functions; in particular, it ought to be related to how useful each basis function is in estimating the continuation values. This can potentially be a future research topic.

## 5. Conclusion

This article presents a new, efficient approach for removing the look-ahead bias of the LSM algorithm (Longstaff and Schwartz, 2001). It is natural in this context to apply the leave-one-out method, a well-known cross-validation technique in machine learning. The resulting LOOLSM estimator can be implemented with little extra computational cost. We validate this approach with several examples. In particular, we demonstrate that the LSM price can be biased high for multi-asset options and that the LOOLSM algorithm can effectively eliminate the look-ahead bias. Finally, we discuss the asymptotic behavior of the look-ahead bias, measured as the difference between the LSM and LOOLSM estimators. We uncover an interesting connection between the bias decay and not only the number of Monte Carlo paths, but also the number of regressors.

## References

• Boyle (1988) Phelim P Boyle. A Lattice Framework for Option Pricing with Two State Variables. The Journal of Financial and Quantitative Analysis, 23(1):1–12, 1988. doi: 10.2307/2331019.
• Boyle et al. (1989) Phelim P Boyle, Jeremy Evnine, and Stephen Gibbs. Numerical Evaluation of Multivariate Contingent Claims. The Review of Financial Studies, 2(2):241–250, 1989.
• Brennan and Schwartz (1977) Michael J Brennan and Eduardo S Schwartz. The Valuation of American Put Options. The Journal of Finance, 32(2):449–462, 1977. doi: 10.2307/2326779.
• Broadie and Glasserman (1997) Mark Broadie and Paul Glasserman. Pricing American-style securities using simulation. Journal of Economic Dynamics and Control, 21(8-9):1323–1352, 1997.
• Broadie and Glasserman (2004) Mark Broadie and Paul Glasserman. A stochastic mesh method for pricing high-dimensional American options. Journal of Computational Finance, 7(4):35–72, 2004.
• Carriere (1996) Jacques F Carriere. Valuation of the early-exercise price for options using simulations and nonparametric regression. Insurance: Mathematics and Economics, 19(1):19–30, 1996.
• Choi (2018) Jaehyuk Choi. Sum of all Black-Scholes-Merton models: An efficient pricing method for spread, basket, and Asian options. Journal of Futures Markets, 38(6):627–644, 2018. doi: 10.1002/fut.21909.
• Cox et al. (1979) John C Cox, Stephen A Ross, and Mark Rubinstein. Option pricing: A simplified approach. Journal of Financial Economics, 7(3):229–263, 1979.
• Feng and Lin (2013) Liming Feng and Xiong Lin. Pricing Bermudan Options in Lévy Process Models. SIAM Journal on Financial Mathematics, 4(1):474–493, 2013. doi: 10.1137/120881063.
• Fries (2008) Christian P Fries. Foresight Bias and Suboptimality Correction in Monte-Carlo Pricing of Options with Early Exercise. In Progress in Industrial Mathematics at ECMI 2006, pages 645–649. Springer, 2008.
• Fu et al. (2001) Michael C Fu, Scott B Laprise, Dilip B Madan, Yi Su, and Rongwen Wu. Pricing American options: a comparison of Monte Carlo simulation approaches. Journal of Computational Finance, 4(3):39–88, 2001.
• Glasserman (2013) Paul Glasserman. Monte Carlo Methods in Financial Engineering, volume 53. Springer Science & Business Media, 2013.
• He (1990) Hua He. Convergence from Discrete- to Continuous-Time Contingent Claims Prices. The Review of Financial Studies, 3(4):523–546, 1990.
• Krekel et al. (2004) Martin Krekel, Johan de Kock, Ralf Korn, and Tin-Kwai Man. An analysis of pricing methods for basket options. Wilmott Magazine, 2004(7):82–89, 2004.
• Longstaff and Schwartz (2001) Francis A Longstaff and Eduardo S Schwartz. Valuing American Options by Simulation: A Simple Least-Squares Approach. The Review of Financial Studies, 14(1):113–147, 2001. doi: 10.1093/rfs/14.1.113.
• Mohammadi (2016) Mohammad Mohammadi. On the bounds for diagonal and off-diagonal elements of the hat matrix in the linear regression model. Revstat–Statistical Journal, 14(1):75–87, 2016.
• Rubinstein (1991) Mark Rubinstein. Somewhere Over the Rainbow. Risk, 1991(11):63–66, 1991.
• Sherman and Morrison (1950) Jack Sherman and Winifred J Morrison. Adjustment of an Inverse Matrix Corresponding to a Change in One Element of a Given Matrix. The Annals of Mathematical Statistics, 21(1):124–127, 1950.
• Stentoft (2004) Lars Stentoft. Convergence of the Least Squares Monte Carlo Approach to American Option Valuation. Management Science, 50(9):1193–1203, 2004.
• Tsitsiklis and Van Roy (2001) John N Tsitsiklis and Benjamin Van Roy. Regression methods for pricing complex American-style options. IEEE Transactions on Neural Networks, 12(4):694–703, 2001. doi: 10.1109/72.935083.

## Appendix A Derivation of LOOCV

The least squares regression of $y$ on $X$ is given as

$$ \hat{y} = X\hat{\beta} = Hy \quad\text{where}\quad \hat{\beta} = \Sigma^{-1}X^\top y,\quad H = X\Sigma^{-1}X^\top,\quad\text{and}\quad \Sigma = X^\top X. $$

We use the following notations:

• $x_j$ is the $j$-th row vector of the sample matrix $X$,

• $y_j$ is the $j$-th component of the column vector $y$,

• $X_{-j}$ and $y_{-j}$ are $X$ and $y$ with the $j$-th row removed, respectively,

• $\Sigma_{-j} = X_{-j}^\top X_{-j}$, and $e = y - \hat{y}$ is the residual vector with $j$-th component $e_j$,

• $h$ is the diagonal vector of $H$ and $h_j$ is the $j$-th component of $h$.

The leave-one-out regression calculates the coefficients from $X_{-j}$ and $y_{-j}$ instead. It is straightforward to show that

$$ h_j = x_j\Sigma^{-1}x_j^\top, \qquad \Sigma_{-j} = \Sigma - x_j^\top x_j, \qquad X_{-j}^\top y_{-j} = X^\top y - x_j^\top y_j. $$

By applying the Sherman–Morrison formula [Sherman and Morrison, 1950] to $\Sigma_{-j}$, we obtain

$$ \Sigma_{-j}^{-1} = \Sigma^{-1} + \frac{\Sigma^{-1}x_j^\top x_j\Sigma^{-1}}{1-h_j}. $$

Therefore,

$$ \hat{\beta}_{-j} = \Sigma_{-j}^{-1}X_{-j}^\top y_{-j} = \left(\Sigma^{-1} + \frac{\Sigma^{-1}x_j^\top x_j\Sigma^{-1}}{1-h_j}\right)\left(X^\top y - x_j^\top y_j\right) = \hat{\beta} - \frac{\Sigma^{-1}x_j^\top e_j}{1-h_j} $$

and

$$ \hat{y}' = \hat{y} - \frac{h\cdot e}{1-h}, \qquad e' = \frac{e}{1-h}, $$

where $\hat{y}'_j = x_j\hat{\beta}_{-j}$, $e'_j = y_j - \hat{y}'_j$, and the operations on the vectors $h$ and $e$ are applied componentwise.
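These closed-form updates are easy to verify numerically. The following sketch (with arbitrary generated data, unrelated to the option-pricing setting) checks the leave-one-out residuals $e' = e/(1-h)$ against explicit refits with one row removed:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 200, 5
X = rng.standard_normal((N, M))
y = rng.standard_normal(N)

# Closed-form leave-one-out residuals e' = e / (1 - h).
Sigma_inv = np.linalg.inv(X.T @ X)
h = np.einsum("ij,jk,ik->i", X, Sigma_inv, X)   # h_j = x_j Sigma^{-1} x_j^T
e = y - X @ (Sigma_inv @ (X.T @ y))             # in-sample residuals
e_loo = e / (1.0 - h)

# Brute force: refit with row j removed, then predict row j.
e_brute = np.empty(N)
for j in range(N):
    Xj, yj = np.delete(X, j, axis=0), np.delete(y, j)
    beta_j, *_ = np.linalg.lstsq(Xj, yj, rcond=None)
    e_brute[j] = y[j] - X[j] @ beta_j
```

The closed form replaces $N$ refits with a single regression plus an elementwise division, which is the source of the LOOLSM method's efficiency.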

We can switch perspectives and obtain a similar formula. If we instead apply the Sherman–Morrison formula to $\Sigma = \Sigma_{-j} + x_j^\top x_j$, we have

$$ \Sigma^{-1} = \Sigma_{-j}^{-1} - \frac{\Sigma_{-j}^{-1}x_j^\top x_j\Sigma_{-j}^{-1}}{1+h'_j}, $$

where $h'_j = x_j\Sigma_{-j}^{-1}x_j^\top$. We can proceed similarly to obtain

$$ \hat{y} = \hat{y}' + \frac{h'\cdot e'}{1+h'}, \qquad h = \frac{h'}{1+h'}. $$

We next show that $0 \le h_j \le 1$ by using the singular value decomposition (SVD). An $N$ by $M$ matrix $X$ can be decomposed into full and reduced SVDs:

$$ X = U\Lambda V^\top = \bar{U}\bar{\Lambda}V^\top, $$

assuming that $N \ge M$. In the full SVD, $U$ is an $N$ by $N$ orthogonal matrix ($U^\top U = UU^\top = I_N$), $V$ is an $M$ by $M$ orthogonal matrix ($V^\top V = VV^\top = I_M$), and $\Lambda$ is an $N$ by $M$ diagonal matrix with the singular values on the diagonal. In the reduced SVD, $\bar{U}$ is the sub-matrix of $U$ consisting of the first $M$ columns and $\bar{\Lambda}$ is the square sub-matrix of $\Lambda$ consisting of the first $M$ rows, with the zero rows at the bottom truncated.

Let $u_j$ and $\bar{u}_j$ be the $j$-th row vectors of $U$ and $\bar{U}$, respectively. Using the reduced SVD, the hat matrix and its diagonal elements are expressed as

$$ H = X\Sigma^{-1}X^\top = \bar{U}\bar{U}^\top, \qquad h_j = \|\bar{u}_j\|^2, $$

where $\|\cdot\|$ is the Euclidean vector norm. Since $\|u_j\| = 1$ from $UU^\top = I_N$, it follows that

$$ 0 \le h_j = \|\bar{u}_j\|^2 \le \|u_j\|^2 = 1. $$

Moreover, the sum of the $h_j$'s is

$$ \sum_{j=1}^N h_j = \operatorname{tr}(H) = \operatorname{tr}\left(\bar{U}^\top\bar{U}\right) = \operatorname{tr}(I_M) = M. $$

If the regression includes an intercept (i.e., the first column of $X$ consists of 1's), the lower bound can be made tighter. If $X_1$ is the sub-matrix of $X$ with the column of 1's removed,

$$ H = \frac{\mathbf{1}\mathbf{1}^\top}{N} + H_1, \quad\text{where}\quad H_1 = X_1\left(X_1^\top X_1\right)^{-1}X_1^\top, $$

and the same conclusion is drawn for the diagonal of $H_1$ as long as $X_1$ has full column rank. Therefore,

$$ \frac{1}{N} \le h_j \le 1 \quad\text{and}\quad \frac{1}{N} \le E(h_j) = \frac{M}{N}, $$

where the expectation is over $j$. See Mohammadi [2016] and the references therein.
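The diagonal bounds and the trace identity can be checked numerically through the reduced SVD. A minimal sketch with arbitrary Gaussian regressors and an intercept column:

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 500, 6
X = np.column_stack([np.ones(N), rng.standard_normal((N, M - 1))])

# Reduced SVD: X = U_bar @ diag(s) @ Vt, with U_bar of shape (N, M).
U_bar, s, Vt = np.linalg.svd(X, full_matrices=False)
h = np.sum(U_bar**2, axis=1)    # h_j = ||u_bar_j||^2

assert np.all(h >= 1.0 / N - 1e-12)   # lower bound with intercept
assert np.all(h <= 1.0 + 1e-12)       # upper bound
assert np.isclose(h.sum(), M)         # tr(H) = M
```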

## Appendix B Comparison with Foresight Bias in Fries [2008]

In Fries [2008], foresight bias is defined as the option value gained from the Monte Carlo error in the estimation of the continuation values. Let us denote this bias by $B_l^{\text{Fries}}$ and the Monte Carlo error by $\xi$. By simply writing $Z_l(S)$ and $C_l(S)$ as $Z$ and $C$, respectively, and letting $d = Z - C$, $B_l^{\text{Fries}}$ can be decomposed into two sources of bias:

$$ \begin{aligned} B_l^{\text{Fries}} &= E\left[\max(Z, C+\xi)\right] - \max(Z, C) \\ &= Z + E\left[I[\xi\ge d]\,(\xi-d)\right] - \max(Z, C) \\ &= \underbrace{\operatorname{Cov}(I[\xi\ge d],\,\xi)}_{A} + \underbrace{Z - d\,E\left[I[\xi\ge d]\right] - \max(Z, C)}_{B}. \end{aligned} $$

In this expression, the covariance term $A$ is the foresight bias, while the term $B$ is the suboptimality bias. Under the Gaussian error assumption $\xi \sim N(0, \sigma^2)$, the foresight bias term can be calculated analytically as $A = \sigma\,\varphi(d/\sigma)$, where $\varphi$ is the probability density function of the standard normal distribution.
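The closed form for the covariance term can be confirmed by simulation. A minimal sketch, with arbitrary illustration values of $\sigma$ and $d$:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, d = 0.3, 0.1                     # arbitrary illustration values
xi = sigma * rng.standard_normal(1_000_000)

# Foresight bias term A = Cov(I[xi >= d], xi); for xi ~ N(0, sigma^2)
# it equals sigma * phi(d / sigma), with phi the standard normal pdf.
indicator = (xi >= d).astype(float)
cov_term = np.mean(indicator * xi) - np.mean(indicator) * np.mean(xi)
closed_form = sigma * np.exp(-(d / sigma) ** 2 / 2) / np.sqrt(2 * np.pi)
```

With a million samples the Monte Carlo estimate and the closed form agree to roughly three decimal places.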

This is closely related to the definition in Equation (4), but with a crucial difference. Using the same notations, Equation (4) can be rewritten as

$$ B_l = \operatorname{Cov}(I[\xi\ge d],\,\eta), $$

where $\eta = \hat{V}_{l+1}(S) - C_l(s)$. This differs from $B_l^{\text{Fries}}$ in that $\eta$ is the deviation of the samples, whereas $\xi$ is that of the resulting estimator. If we denote the Monte Carlo error of the LOOLSM estimator by $\delta = \hat{C}'_l(S) - C_l(s)$,

$$ \begin{aligned} \xi &= \hat{C}_l(S) - C_l(s) \\ &= \delta + h(s)\left(\hat{V}_{l+1}(S) - \left(C_l(s) + \delta\right)\right) \qquad\text{(from Equation (5))} \\ &= (1-h(s))\,\delta + h(s)\,\eta. \end{aligned} $$

Therefore, the total Monte Carlo error $\xi$ is a weighted average of the two independent error terms $\delta$ and $\eta$. As we work with Equation (1), our look-ahead bias term correctly captures the contribution of $\eta$ to the exercise decision. By contrast, the foresight bias term in Fries [2008] also includes the contribution from $\delta$, since it defines the bias through the convexity of the maximum function applied to the total Monte Carlo error $\xi$. Such a term is not the source of the high bias in the original LSM formulation.

## Appendix C Proof of Theorem 2


###### Proof.

(1) Since $\Sigma(s)$ is a real symmetric matrix, by the spectral theorem there exist an orthogonal matrix $Q$ and a diagonal matrix $D$ such that $\Sigma(s) = QDQ^\top$. In particular, the diagonal entries of $D$ are the eigenvalues of $\Sigma(s)$. Let $\|\cdot\|$ denote the induced matrix norm. Then,

$$ h'(s) = x(s)\Sigma(s)^{-1}x(s)^\top \le \|x(s)\|^2\left\|\Sigma(s)^{-1}\right\| = \|x(s)\|^2\left\|D^{-1}\right\| < \frac{\|x(s)\|^2}{\lambda(N-1)}. $$

(2) We first show that

$$ B_l(S) = I\!\left[\,0 < \frac{Z_l(S) - \hat{C}'_l(S)}{\hat{V}_{l+1}(S) - Z_l(S)} < h'(S)\right]\left|\hat{V}_{l+1}(S) - Z_l(S)\right|. $$

This has the following interpretation: the first term contains the overfitting scenarios in which the LSM and LOOLSM algorithms disagree, and the second term is the resulting price difference. From Equation (1),

$$ \begin{aligned} \hat{V}^{\textsc{lsm}}_l(S) &= I\left[Z_l(S) < \hat{C}_l(S)\right]\left(\hat{V}_{l+1}(S) - Z_l(S)\right) + Z_l(S), \\ \hat{V}^{\textsc{loolsm}}_l(S) &= I\left[Z_l(S) < \hat{C}'_l(S)\right]\left(\hat{V}_{l+1}(S) - Z_l(S)\right) + Z_l(S). \end{aligned} $$

Therefore, we have

$$ \begin{aligned} B_l(S) &= \hat{V}^{\textsc{lsm}}_l(S) - \hat{V}^{\textsc{loolsm}}_l(S) \\ &= \left(I\left[Z_l(S) < \hat{C}_l(S)\right] - I\left[Z_l(S) < \hat{C}'_l(S)\right]\right)\left(\hat{V}_{l+1}(S) - Z_l(S)\right) \\ &= \left(I[D_1(S)] - I[D_2(S)]\right)\left(\hat{V}_{l+1}(S) - Z_l(S)\right), \end{aligned} $$

where $D_1$ and $D_2$ are defined as

$$ \begin{aligned} D_1 &: \left(Z_l(S) < \hat{C}_l(S)\right)\wedge\left(Z_l(S) > \hat{C}'_l(S)\right), \\ D_2 &: \left(Z_l(S) > \hat{C}_l(S)\right)\wedge\left(Z_l(S) < \hat{C}'_l(S)\right). \end{aligned} $$

In other words, $D_1$ means that the LSM algorithm continues when it should have exercised, and $D_2$ means that it exercises when it should have continued. $D_1$ can be transformed as follows:

$$ D_1 \iff 0 < Z_l(S) - \hat{C}'_l(S) < h'(S)\left(\hat{V}_{l+1}(S) - Z_l(S)\right) $$

from Equation (5) and the relation $h = h'/(1+h')$. Likewise,

$$ D_2 \iff 0 > Z_l(S) - \hat{C}'_l(S) > h'(S)\left(\hat{V}_{l+1}(S) - Z_l(S)\right). $$

Finally, $D_1$ and $D_2$ can be combined into

$$ D_3 = D_1\cup D_2 :\quad 0 < \frac{Z_l(S) - \hat{C}'_l(S)}{\hat{V}_{l+1}(S) - Z_l(S)} < h'(S). $$