Sample Design for Medicaid and Healthcare Audits

09/05/2018 ∙ by Michelle Norris, et al. ∙ Sacramento State

We develop several tools for the determination of sample size and design for Medicaid and healthcare audits. The goal of these audits is to examine a population of claims submitted by a healthcare provider for reimbursement by a third party payer to determine the total amount of money which is erroneously claimed. For large audit populations, conclusions about the total amount of reimbursement claimed erroneously are often based on sample data. Often, sample size determination must be made in the absence of pilot study data, and existing methods for doing so typically rely on restrictive assumptions. These include the 'all-or-nothing errors' assumption, which holds that the error in a claim is either the entire claim amount or none of it. Under the all-or-nothing errors assumption, Roberts (1978) has derived estimates of the variances needed for sample size calculations under simple expansion and ratio estimation. Some audit populations, however, will contain claims which are partially in error. We broaden existing methodology to handle this scenario by proposing an error model which allows for partial errors by modeling the line-item error mechanism. We use this model to derive estimates of the variances needed for sample size determination under simple expansion and ratio estimation in the presence of partial errors. In the absence of certain error-rate parameter estimates needed to implement our method, we show that conservative sample sizes can be determined using the claim data alone. We further show that, under all-or-nothing errors, ratio estimation will tend to outperform simple expansion and that optimal stratification is independent of the population error rate under ratio estimation. The proposed sample design methods are illustrated on three simulated audit populations.




1 Background and Motivation

According to the Medicaid program website (Centers for Medicare and Medicaid Services (2018a)),

“Medicaid provides health coverage to millions of Americans, including eligible low-income adults, children, pregnant women, elderly adults and people with disabilities…The program is funded jointly by states and the federal government.”

In 2016, $566 billion in Medicaid payments were disbursed to healthcare providers such as pharmacies, medical offices, and school districts in the US (Centers for Medicare and Medicaid Services (2018b)). In California, MediCal is the name for the Medicaid program, and the California State Controller’s Office is charged with conducting audits to ensure that MediCal funds paid to organizations conform to the requirements of the MediCal program and are of the appropriate amount.

In planning a MediCal audit, auditors typically have access to a population of MediCal claims which they are charged with auditing for correctness. For example, if the audited organization is a medical clinic, a single claim may represent a single visit by a single patient, and the population may contain a million claims from a three-year period. The population may account for tens of millions of dollars in disbursed MediCal payments. Because a complete examination of all claims is not feasible, auditors typically select a sample of claims, then, based on documentation, determine the appropriate amount of MediCal reimbursement that should have been paid for each claim in the sample. There are three possible outcomes for each sampled/audited claim:

  1. None of the amount claimed is disallowed, and the entire claimed amount is deemed allowable for reimbursement (as shown in lines 1 and 5 in Table 1).

  2. The entire amount claimed is deemed disallowed, and none is deemed allowable for reimbursement (lines 3 and 6 in Table 1).

  3. A portion of the total amount claimed is deemed disallowed and only the remaining portion is allowable for reimbursement (lines 2 and 4 in Table 1). This case is also called a partial payment or partial error.

A common assumption in the existing literature on audit sample design is the ‘all-or-nothing error assumption’ which states that the error/disallowed amount in a claim equals the entire claim amount or zero. The all-or-nothing error assumption precludes the possibility of partial errors but greatly simplifies theoretic calculations.

While the claim amounts are known for the entire population prior to an audit, the disallowed amounts are only known after the audit and only for the sampled claims. We will use both the terms ‘disallowed amount’ and ‘error amount’ to refer to the portion of a claim total that is not allowable for reimbursement.

Line  Patient ID  Date of Service  Claimed Amount                  Disallowed/Error Amount
                                   (known for entire population)   (only known for sampled claims)
1     33457       Jan 15, 2017     $52.50                          $0
2     31415       March 10, 2017   $78.90                          $30.00
3     44478       Oct 27, 2016     $25.90                          $25.90
4     67841       May 5, 2016      $105.00                         $50.00
5     55112       Nov 20, 2016     $125.00                         $0
6     98765       May 1, 2016      $66.00                          $66.00
Table 1: Portion of Hypothetical Data for a MediCal Audit

The total disallowed amount found in the sample is extrapolated from the sample to the population, and the audited organization is required to pay that amount or some related amount back to the MediCal fund. Clearly, maintaining a small margin of error in estimating the total disallowed amount is of interest to all parties. Thus, it is important to design audit samples which estimate the total disallowed amount with a reasonable margin of error while minimizing the sample size. In addition, since a pilot sample is typically an inconvenience to the organization being audited, audit samples must frequently be designed with little to no information about the population of disallowed amounts – making it difficult to determine an appropriate sampling plan.

The text Statistical Auditing by Roberts (1978) likely contains the most comprehensive treatment of sample design issues for audit populations. In particular, Roberts derives estimates of the population variances needed for sample size determination under both simple expansion and ratio estimation under the all-or-nothing errors assumption. His estimates do not require data from pilot samples. However, they do require estimating the error rate, defined as the proportion of claims in the population containing some error amount or disallowed amount. He uses a Bernoulli generative model to derive his estimates. King and Madansky (2013) also propose a Bernoulli model to estimate the variance of the disallowed amounts under simple expansion under the all-or-nothing errors assumption but arrive at a slightly different estimate. In this paper, we review and reconcile these two estimators. In addition, since all currently available methods of determining sample size depend on estimating the error rate or the variance of the population of disallowed values, we also propose a method for determining a conservative sample size which is based solely on the claimed values and does not require any additional information about the population of disallowed values except the all-or-nothing errors assumption.

Realistically, partial errors do occur in some audit populations, so generalizing existing results and deriving new results that apply to more general error models is desirable. We note that Neter and Loebbecke (1977) do consider more general error models in their empirical study, but our research has revealed little theoretical work on sample design under more general error models which can handle partial errors. One exception is the penny sampling method proposed by Edwards et al. (2015), which treats each penny in the total audited amount as a sampling unit and uses the inversion of a hypothesis test for a binomial proportion to obtain exact confidence intervals. One limitation of this work, however, is that it cannot be used for populations where underpayment to the MediCal provider is a possibility, i.e. penny sampling can only be used with populations where all errors are overpayments to the provider. Another exception is Liu et al. (2005), who consider a partial-error model which assumes a quasi-uniform distribution of the partial error amount for each claim. Liu et al. use this model to derive optimal strata breakpoints under ratio estimation in the audit setting. However, the model of Liu et al. does not accurately model the line-item error mechanism which generates partial errors in healthcare audit populations. Consequently, we develop a novel partial error model based on the underlying line-item errors and use it to extend Roberts’ results on sample size for all-or-nothing error populations to audit populations with partial errors. Under our line-item error model, we additionally show that the resulting variance estimates, which depend on two possibly unknown error-rate parameters, can be maximized to obtain a conservative sample size for audits where estimates of the required error-rate parameters are not available.

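We also consider the question of choosing between the simple expansion and ratio estimators in simple random sampling with all-or-nothing errors. The general advice on p.157 of Cochran (1977) is to use the ratio estimator instead of simple expansion when $x$, the claim amount, and $y$, the disallowed amount, satisfy:

$$\rho > \frac{1}{2}\,\frac{S_x/\bar{X}}{S_y/\bar{Y}}, \qquad (1)$$

where $\rho$ is the population correlation between $x$ and $y$, and $S_x/\bar{X}$ and $S_y/\bar{Y}$ are their respective coefficients of variation.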

We specialize this inequality to the audit population setting, and derive a formula for the probability that ratio estimation will outperform simple expansion. Since our formula only relies on the error rate and parameters for the claim population, it can be used for sample planning prior to collecting any information about the population of disallowed amounts.
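As a rough illustration of how Cochran's rule can be checked when paired claim and disallowed amounts happen to be available (for example, from a past audit of a similar provider), the sketch below evaluates the condition in its standard form, $\rho > \frac{1}{2}(S_x/\bar{X})/(S_y/\bar{Y})$. The function name `ratio_preferred` and the use of the six hypothetical rows of Table 1 as input are assumptions made for this example only.

```python
from statistics import mean, stdev

def ratio_preferred(x, y):
    """Cochran's large-sample rule: True if rho > 0.5 * CV(x) / CV(y)."""
    mx, my = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)
    if mx == 0 or my == 0 or sx == 0 or sy == 0:
        raise ValueError("x and y must both vary and have non-zero means")
    # Sample covariance and correlation between claimed and disallowed amounts.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    rho = cov / (sx * sy)
    return rho > 0.5 * (sx / mx) / (sy / my)

# Hypothetical claimed and disallowed amounts (the six rows of Table 1).
claimed = [52.50, 78.90, 25.90, 105.00, 125.00, 66.00]
disallowed = [0.00, 30.00, 25.90, 50.00, 0.00, 66.00]
print(ratio_preferred(claimed, disallowed))
```

Six claims are far too few to support a real design decision; the point is only to show which quantities enter the rule.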

We note that although we discuss our results in the context of MediCal/Medicaid audits, they are more generally applicable to any type of healthcare audit where 1) the sampling unit consists of an invoice which is composed of one or more line-item charges; 2) either the entire invoice amount or individual line-item amounts may be in error; and 3) line-item errors are either all-or-nothing errors or a pre-audit estimate of the amounts of all line-item partial errors is available.

2 Notation and Estimators

We now summarize some notation and the two common estimators used to extrapolate the total disallowed amount in an audit.

$N$: the population size
$x_1, \dots, x_N$: the population of known claimed amounts
$y_1, \dots, y_N$: the population of unknown disallowed/error amounts
$n$: the sample size
a simple random sample of $n$ claims drawn without replacement
the disallowed values corresponding to the sampled claims

The rest of this paper is organized as follows. In Section 3, we give the sample size formula of interest. In Section 4, we discuss issues pertaining to sample size determination under the simple expansion estimator. In particular, we review the existing binomial generative model for audit populations; reconcile the estimators of the variance under all-or-nothing errors proposed by Roberts (1978) and King and Madansky (2013); propose a new partial-error model and extend the procedure for estimating variance to the proposed partial error model; and maximize the variance under all-or-nothing or partial errors to obtain conservative sample sizes that do not require pilot study information. In Section 5, we consider the ratio estimator. We start with a criterion for deciding between simple expansion and ratio estimation; review the estimator for the variance of the ratio estimator under simple random sampling proposed by Roberts (1978); extend this estimate of variance under the proposed partial-error model; maximize the variance and derive a procedure for calculating a conservative sample size; and finish with comments about optimal stratification under ratio estimation. In Section 6, we apply the sample design tools developed in this paper to three simulated audit populations. We offer some concluding remarks and avenues for further research in Section 7.

3 Sample Size Formula

Under simple expansion, the large-sample $100(1-\alpha)\%$ confidence level margin of error of the estimated total $\hat{Y} = N\bar{y}$ is

$$E = z_{1-\alpha/2}\sqrt{V(\hat{Y})}, \qquad (4)$$

where $z_{1-\alpha/2}$ denotes the $100(1-\alpha/2)$th percentile of the standard normal distribution. Substituting equation (3), the variance of the simple expansion estimator, $V(\hat{Y}) = N^2 (1 - n/N) S_y^2 / n$, into equation (4) and solving for $n$, we obtain the following sample size formula under simple expansion

$$n = \frac{N^2 z_{1-\alpha/2}^2 S_y^2}{E^2 + N z_{1-\alpha/2}^2 S_y^2}. \qquad (5)$$

The sample size formula will give the sample size required to attain a chosen margin of error and confidence level provided that the variance of disallowed amounts, $S_y^2$, is known. However, $S_y^2$ is typically not known in the planning stages of an audit. One could obtain an estimate of $S_y^2$ using a pilot sample, but this is an inconvenience to an audited organization since they would have to pull records twice: once for the pilot sample and again for the actual full audit. In the next section, we propose a generative model for audit populations which permits estimation of $S_y^2$ in cases where the error rate can be approximated. Under ratio estimation, the sample size formula is equation (5) with $S_d^2$, the population variance of the residuals $d_i = y_i - R x_i$ with $R = \sum y_i / \sum x_i$, substituted for $S_y^2$. We propose methods for estimating $S_d^2$ during the planning stages of an audit in Section 5.2.
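The sample size calculation can be sketched in a few lines. The code below assumes the usual finite-population form of the simple expansion variance, $N^2 (1 - n/N) S_y^2 / n$, solves the margin-of-error equation for $n$, and rounds up. The function names and the planning values (10,000 claims, a variance of 2,500 and a target margin of $50,000) are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_simple_expansion(N, var_y, margin, confidence=0.95):
    """Smallest n whose large-sample margin of error for the total is <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (N**2 * z**2 * var_y) / (margin**2 + N * z**2 * var_y)
    return min(N, max(1, math.ceil(n)))

def margin_of_error(N, var_y, n, confidence=0.95):
    """Margin of error z * N * sqrt((1 - n/N) * var_y / n) for the estimated total."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * N * math.sqrt((1 - n / N) * var_y / n)

# Hypothetical planning values: 10,000 claims, variance 2,500, margin of $50,000.
n = sample_size_simple_expansion(10_000, 2500.0, 50_000.0)
print(n, round(margin_of_error(10_000, 2500.0, n), 2))
```

Checking the achieved margin at the returned $n$ is a cheap sanity test when the formula is reimplemented by hand.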

4 The All-or-Nothing Errors Model

Roberts (1978) and King and Madansky (2013) both formulate estimates of the variance of the disallowed amounts, $S_y^2$, for audit populations with all-or-nothing errors. They assume the audit population was generated in such a way that the entire claim amount is in error with probability $p$ or none of the claim amount is in error with probability $1-p$. They additionally assume errors are made independently from claim to claim. More formally, letting $x_i$ be the value of the $i$th claim in the population, $I_i$ be an error indicator variable, and $Y_i$ the error/disallowed value of the $i$th claim for $i = 1, \dots, N$ and $0 \le p \le 1$, they propose the audit population is generated as follows:

$$Y_i = x_i I_i, \qquad I_i \overset{\text{iid}}{\sim} \text{Bernoulli}(p). \qquad (6)$$

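Although they use the same generative model, Roberts (1978) and King and Madansky (2013) propose different estimates of the variance of the disallowed amounts, $S_y^2$. Roberts uses the expected value of the population variance of the generated disallowed amounts $Y_1, \dots, Y_N$, where the expectation is taken over all potential audit populations. We denote Roberts' estimate as $E(S_Y^2)$, where $S_Y^2$ is the population variance of $Y_1, \dots, Y_N$. This estimate can be computed using the formula in equation (7).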
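To make the idea of an expected population variance concrete, the sketch below assumes the all-or-nothing model in which each disallowed amount equals the claim amount with probability $p$ and zero otherwise, independently across claims, and assumes the population variance uses the $N-1$ divisor. Under those assumptions the expectation can be written as $p S_x^2 + p(1-p)(\bar{x}^2 - S_x^2/N)$, a rearrangement that may differ in appearance from equation (7). The code compares this closed form with a Monte Carlo average; all function names and the claim values are hypothetical.

```python
import random

def pop_var(values):
    """Population variance with the N - 1 divisor."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def expected_variance_all_or_nothing(x, p):
    """E[S_Y^2] when Y_i = x_i * I_i and the I_i are independent Bernoulli(p)."""
    N = len(x)
    xbar = sum(x) / N
    s2x = pop_var(x)
    return p * s2x + p * (1 - p) * (xbar**2 - s2x / N)

def simulated_variance(x, p, reps=4000, seed=1):
    """Monte Carlo average of S_Y^2 over simulated audit populations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        y = [xi if rng.random() < p else 0.0 for xi in x]
        total += pop_var(y)
    return total / reps

# Hypothetical population of 120 claims built from six claim amounts.
claims = [52.50, 78.90, 25.90, 105.00, 125.00, 66.00] * 20
print(expected_variance_all_or_nothing(claims, 0.3), simulated_variance(claims, 0.3))
```

The two printed numbers should agree to within simulation error, which is a useful check before plugging the closed form into a sample size formula.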


On the other hand, King and Madansky (2013) use the total variance of $Y$, where $Y$ is defined to be a random draw from the random vector $(Y_1, \dots, Y_N)$, to estimate $S_y^2$. Thus, $Y$ can be interpreted as a random draw from a random audit population. $\text{Var}(Y)$ can be found using iterated expectations as shown in the proof of Theorem 4.1.

Theorem 4.1 (Total Expected Value and Variance)

Under the model for given in equation (6):


  1. (8)

The federal Office of Inspector General’s RAT-STATS software also uses the total variance of $Y$, a random draw from a random audit population, to estimate the variance of the disallowed amounts (RAT-STATS Companion Manual, Rev 5/2010, p. 4-9). The total variance, however, represents the variation in $Y$ as both the audit population and the sample from it vary. We would argue that the audit population is fixed but unknown, so including variation due to a varying audit population in our estimation of the variance is not conceptually satisfying. In addition, since the Roberts estimate will minimize the mean square prediction error, we prefer it over the total variance.

The two proposed estimators of are related by the following inequality:

However, if the population size, is large relative to and , then the term in will be small relative to so that

i.e. the proposed estimators will be roughly equal. This has been the case in several audit populations we have reviewed.
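The closeness of the two estimators for large populations can be checked numerically. The sketch below rests on two assumptions made for this illustration: Roberts' estimate is taken to be the expected population variance with the $N-1$ divisor under the all-or-nothing model, and the total variance is taken to be the variance of a single random draw from a randomly generated population. Under these assumptions a short calculation gives a gap of $p^2 S_x^2 / N$ between them, which shrinks as $N$ grows. The function name and claim values are hypothetical.

```python
def roberts_and_total_variance(x, p):
    """Return (expected population variance, total variance) for error rate p."""
    N = len(x)
    xbar = sum(x) / N
    ss = sum((v - xbar) ** 2 for v in x)
    s2x = ss / (N - 1)
    roberts = p * s2x + p * (1 - p) * (xbar**2 - s2x / N)
    total = p * (ss / N) + p * (1 - p) * xbar**2
    return roberts, total

base = [52.50, 78.90, 25.90, 105.00, 125.00, 66.00]  # hypothetical claim amounts
for copies in (1, 10, 1000):
    x = base * copies
    r, t = roberts_and_total_variance(x, 0.2)
    # Population size, both estimates, and their relative difference.
    print(len(x), round(r, 3), round(t, 3), (r - t) / t)
```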

4.0.1 Estimating

The formula for Roberts' estimate, $E(S_Y^2)$, in equation (7) only depends on the known population of claimed amounts and the error rate, $p$. So we can use $E(S_Y^2)$ if an estimate of $p$ is available from a past survey or a pilot survey, then substitute the result into equation (5) to determine the sample size needed to achieve a given margin of error and confidence level.

If an estimate of $p$ is not available, we can obtain a conservative sample size by maximizing $E(S_Y^2)$ as a function of $p$. Taking the derivative of $E(S_Y^2)$ with respect to $p$ and setting it equal to 0 gives:


Solving equation (9), we obtain


In order to maximize over , we must check and . Since and , the maximum value of is . The sample size obtained by substituting for in equation (5) will be the maximum sample size needed for a specified margin of error and confidence level over all possible error rates, .
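A minimal sketch of this maximization follows. It assumes the expected population variance under all-or-nothing errors can be written as the quadratic $a p + b p^2$ with $a = S_x^2 (N-1)/N + \bar{x}^2$ and $b = S_x^2/N - \bar{x}^2$ (an assumption based on the $N-1$ divisor, not necessarily the form used in equations (7) to (10)). It compares the stationary point with the endpoints $p = 0$ and $p = 1$, then feeds the largest value into the usual simple expansion sample size formula. Function names are hypothetical.

```python
import math
from statistics import NormalDist

def worst_case_variance(x):
    """Maximize a*p + b*p**2 over 0 <= p <= 1; return (p, variance)."""
    N = len(x)
    xbar = sum(x) / N
    s2x = sum((v - xbar) ** 2 for v in x) / (N - 1)
    a = s2x * (N - 1) / N + xbar**2   # coefficient of p
    b = s2x / N - xbar**2             # coefficient of p**2
    f = lambda p: a * p + b * p * p
    candidates = [0.0, 1.0]           # endpoints are always checked
    if b < 0:
        p_star = -a / (2 * b)         # stationary point of the quadratic
        if 0.0 < p_star < 1.0:
            candidates.append(p_star)
    p_best = max(candidates, key=f)
    return p_best, f(p_best)

def conservative_sample_size(x, margin, confidence=0.95):
    """Sample size for the total using the worst-case variance over all p."""
    N = len(x)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    _, v = worst_case_variance(x)
    return min(N, math.ceil(N**2 * z**2 * v / (margin**2 + N * z**2 * v)))

print(worst_case_variance([10.0, 10.0, 10.0, 10.0]))
```

When the claim amounts are highly variable relative to their mean, the stationary point can fall outside the unit interval, in which case the endpoint $p = 1$ gives the maximum.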

4.1 Partial Payments

Thus far, we have considered a model with all-or-nothing errors. We now wish to consider sample size determination under simple expansion when there are partial payments in the population, i.e. only a portion of the amount claimed is deemed allowable and the remaining portion is disallowable. Liu et al. (2005) proposed the partial payment model in equation (11).


for and where and is the proportion of claims in the population having an error. This model assumes a uniform distribution over all potential error amounts below the average partial error and a uniform distribution over all error amounts above the average partial error amount. However, in MediCal audits, the partial error amount of a claim typically arises from fixed, discrete amounts corresponding to errors in underlying line-item charges. For example, Table 2 shows the detailed line-item charges for a single MediCal claim for a fictitious patient. The claim consists of three line items, one for each billable service provided by the medical provider to the patient on his/her June 1, 2017 visit.

Patient ID  Date of Service  Procedure              Claimed amount
1234        June 1, 2017     Office Visit, Level 4  $45.00
1234        June 1, 2017     Blood Test             $6.00
1234        June 1, 2017     X-ray                  $17.00
Total                                               $68.00

Table 2: Line-item charges for a single MediCal claim for a fictitious patient

All-or-nothing errors can occur for any line item. It is also possible for a line item to be partially in error. Partial line-item errors occur when a billed procedure is downgraded to a lower level of service. For example, if MediCal was billed for a level 4 office visit, but documentation about the patient’s condition does not substantiate a level 4 office visit (based on the complexity of the case) then the procedure may be downgraded by the auditor to a level 3 office visit. The amount reimbursable by MediCal will also be adjusted, say from $45.00 to $40.00, resulting in a partial error of $5.00 for that line item.

We propose a partial error model which models the error/disallowed amount of a claim as the sum of the line-item disallowed amounts in that claim. We further assume that errors occur independently from line to line with the same probability on each line. In order to define the line-item model, we introduce some notation:

$L_i$: the number of lines in claim $i$ for $i = 1, \dots, N$
$x_{ij}$: the claimed amount for line $j$ of claim $i$
$y_{ij}$: the error/disallowed amount for line $j$ of claim $i$
$m_{ij}$: the most probable error amount for line $j$ of claim $i$
$m_i$: the sum of the most probable error amounts for claim $i$
$q$: the probability of a line-item error

The most probable error amount for line $j$ of claim $i$, $m_{ij}$, will be the full line-item claimed amount, $x_{ij}$, for all-or-nothing line items and may be taken as the amount associated with one level of service below that which was claimed for downgradable line items (unless some auxiliary information suggests a better alternative). We can express the proposed partial error model as follows:


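Letting $J_{ij}$ be a line-item error indicator variable and recalling that $I_i$ is the claim-level error indicator, the claim-level error/disallowed amount can be expressed as the sum of a term representing the entire amount of the claim for a claim-level error plus the sum of the line-item errors if there is no claim-level error, as shown in equation (13).

$$Y_i = x_i I_i + (1 - I_i) \sum_{j} m_{ij} J_{ij}, \qquad (13)$$

where $x_i$ is the total claimed amount of claim $i$ and $m_{ij}$ is the most probable error amount for line $j$ of claim $i$.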
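The line-item mechanism just described can be simulated directly. The sketch below is one reading of the model, not a definitive implementation: with probability `p_claim` the whole claim is disallowed; otherwise each line is in error independently with probability `p_line` and contributes its most probable error amount. The data layout (a claim as a list of `(claimed_amount, probable_error)` pairs), the parameter names, and the $5.00 downgrade error on the office visit are assumptions for illustration.

```python
import random

def simulate_disallowed(claims, p_claim, p_line, rng):
    """Draw one disallowed amount per claim under the assumed line-item model.

    claims: list of claims, each a list of (claimed_amount, probable_error) pairs.
    """
    disallowed = []
    for lines in claims:
        if rng.random() < p_claim:
            # Claim-level error: the whole claim is disallowed.
            disallowed.append(sum(amount for amount, _ in lines))
        else:
            # Otherwise each line errs independently by its probable error amount.
            disallowed.append(sum(err for _, err in lines if rng.random() < p_line))
    return disallowed

# A three-line claim: office visit ($45.00, downgrade error of $5.00),
# blood test ($6.00) and x-ray ($17.00), the last two all-or-nothing.
claim = [(45.00, 5.00), (6.00, 6.00), (17.00, 17.00)]
rng = random.Random(42)
print(simulate_disallowed([claim] * 5, p_claim=0.1, p_line=0.2, rng=rng))
```

Repeating the draw many times and averaging the population variance of the simulated amounts gives a Monte Carlo check on any closed-form variance estimate derived from the model.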


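Using this model, we extend Roberts’ estimate of the variance of the disallowed amounts under all-or-nothing errors to allow for partial errors. Letting $S_Y^2$ denote the population variance of the disallowed amounts $Y_1, \dots, Y_N$ generated under this model, we propose $E(S_Y^2)$ as an estimate of this variance. We assume that the vectors of claim-level and line-item error indicator variables are independent.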


We now derive and to substitute back into equation (14).


Substituting (15) and (16) into (14) and simplifying gives:


In the case where line item errors are all-or-nothing, for all and so equation (17) simplifies to:

We note that all quantities in equation (17) are known from the claim data available prior to the audit except the claim-level error rate, $p$, and the line-item error rate, $q$. Thus, equation (17) can be used to estimate the variance of the disallowed amounts if estimates of $p$ and $q$ are available from past surveys or pilot study data. We address situations where estimates of these two parameters are not available in the next section.

4.2 Conservative Sample Size

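Since the variance estimate $E(S_Y^2)$ is a polynomial in the claim-level error rate $p$ and the line-item error rate $q$, we can maximize it over $0 \le p \le 1$ and $0 \le q \le 1$ to determine a conservative sample size which will be sufficient for any combination of $(p, q)$. To simplify notation, we define: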

then the formula for given in equation (17) can be written:


Taking the partial derivatives of , we obtain:


Setting the partial derivatives equal to 0 and solving for results in the following cubic equation


Thus, setting the partials equal to 0 will yield at most three critical points. We also check for possible maxima on the boundaries $p \in \{0, 1\}$ and $q \in \{0, 1\}$ by separately maximizing equations (22)-(25).


Examining the boundary equations, we observe that the variance estimate $E(S_Y^2)$ is either a constant or a quadratic function on each boundary and, hence, is easily maximized on any boundary. The conservative sample size is determined by taking $E(S_Y^2)$ to be its maximum value over any real-valued critical points that fall in $[0,1] \times [0,1]$ and over the maxima from the four boundaries.
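As a numerical cross-check on the analytic maximization, a plain grid search over the unit square can be used instead of solving the cubic; this swaps in a simpler technique and is not the procedure described above. The sketch assumes the line-item model in which a claim is wholly disallowed with probability $p$ and otherwise each line errs independently with probability $q$ by its most probable error amount. It uses the identity $E(S_Y^2) = \frac{1}{N}\sum_i \text{Var}(Y_i) + \frac{1}{N-1}\sum_i (\mu_i - \bar{\mu})^2$, with $\mu_i = E(Y_i)$, which holds for independent $Y_i$. The data layout and function names are hypothetical.

```python
def expected_variance(claims, p, q):
    """E[S_Y^2] under the assumed line-item model.

    claims: list of claims, each a list of (claimed_amount, probable_error) pairs.
    """
    N = len(claims)
    means, variances = [], []
    for lines in claims:
        x = sum(a for a, _ in lines)        # claim total
        m = sum(e for _, e in lines)        # sum of probable line errors
        m2 = sum(e * e for _, e in lines)   # sum of squared probable line errors
        mu = p * x + (1 - p) * q * m
        second = p * x * x + (1 - p) * (q * (1 - q) * m2 + q * q * m * m)
        means.append(mu)
        variances.append(second - mu * mu)
    mu_bar = sum(means) / N
    between = sum((mu - mu_bar) ** 2 for mu in means) / (N - 1)
    return sum(variances) / N + between

def conservative_variance(claims, steps=50):
    """Grid search over (p, q) in [0, 1]^2; returns (variance, p, q)."""
    grid = [k / steps for k in range(steps + 1)]
    return max((expected_variance(claims, p, q), p, q) for p in grid for q in grid)

claims = [[(45.0, 5.0), (6.0, 6.0), (17.0, 17.0)], [(30.0, 30.0)], [(80.0, 10.0), (20.0, 20.0)]]
print(conservative_variance(claims))
```

The grid includes the four boundaries, so boundary maxima are covered automatically; a finer grid or a local optimizer can refine the result if needed.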

5 Ratio Estimation

In this section, we show that, in the all-or-nothing errors case, ratio estimation is expected to outperform simple expansion for any audit population, provided the assumptions are met for the use of ratio estimation. We then review the estimator of the variance, , needed under ratio estimation which was proposed in Roberts (1978) under the all-or-nothing error assumption. This proposed estimator of depends on the error rate , and we observe that maximizes the estimated value of . Thus, in cases where is unknown, a conservative sample size can be computed in the all-or-nothing errors case. We comment on stratification under ratio estimation. Finally, we derive an estimate of the variance for the line-item partial errors model and show that a conservative sample size can be computed under this model.

5.1 Choosing Between Ratio Estimation and Simple Expansion

We now derive a method for determining whether ratio estimation or simple expansion will be more efficient for extrapolating data from an audit sample. Rearranging the criterion (in inequality (1)) for choosing between these two estimators gives


Under the binomial generative model, we have and . Making these substitutions into inequality (26) and simplifying, we obtain:


The probability that this inequality holds will represent our confidence that the ratio estimator will have smaller variance than the simple expansion estimator. In order to compute this probability, we determine the distribution of the random quantity in the inequality. Often MediCal claim data consist of only a few distinct values, each of which is repeated a large number of times. Suppose there are $K$ distinct claim total values. For $k = 1, \dots, K$, let $A_k$ be the set of subscripts of claims having the $k$th distinct value and $N_k$ be the number of elements in $A_k$. Then the criterion for choosing between ratio estimation and simple expansion becomes:


Recall that the claim-level error indicators are independent and identically distributed Bernoulli random variables. Thus, for each distinct claim value, the sum of the error indicators over the claims sharing that value will be approximately normally distributed by the Central Limit Theorem if the number of such claims is large. In this case, the random quantity in the criterion will be approximately normally distributed since it is a linear combination of these approximately normal and independent sums. Additionally, using linear operator properties of the mean and variance, its mean and variance can be shown to be:


Let $Z$ represent the standard normal variate. If the number of claims sharing each distinct claim value is large,

where the last line is true since the numerator of the right side of line (5.1) is negative. The last line implies that ratio estimation is always favored to outperform simple expansion in any claim population provided we can assume the random quantity in the criterion is approximately normally distributed. Examining equation (5.1), we see that as the error rate $p \to 0$, the probability ratio estimation is preferred approaches 0.5, and as $p \to 1$, the probability that ratio estimation is preferred approaches 1. If normality is not reasonable, a Monte Carlo estimate of the probability that inequality (26) is true would give a more accurate estimate of our confidence that ratio estimation will outperform simple expansion.
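A Monte Carlo estimate of this probability might look like the sketch below. It assumes all-or-nothing errors generated independently with error rate `p`, and it counts a simulated population as favoring ratio estimation when the residual sum of squares $\sum (y_i - R x_i)^2$, with $R = \sum y_i / \sum x_i$, is smaller than $\sum (y_i - \bar{y})^2$. In large samples this comparison is equivalent to Cochran's correlation rule. The function name and the claim population are hypothetical.

```python
import random

def prob_ratio_beats_expansion(x, p, reps=1000, seed=0):
    """Share of simulated audit populations in which ratio estimation has smaller variance."""
    rng = random.Random(seed)
    N = len(x)
    sum_x = sum(x)
    wins = 0
    for _ in range(reps):
        # All-or-nothing errors: each claim is wholly disallowed with probability p.
        y = [xi if rng.random() < p else 0.0 for xi in x]
        ratio = sum(y) / sum_x
        ybar = sum(y) / N
        ss_y = sum((yi - ybar) ** 2 for yi in y)
        ss_d = sum((yi - ratio * xi) ** 2 for xi, yi in zip(x, y))
        wins += ss_d < ss_y
    return wins / reps

# Hypothetical population: ten distinct claim values, each repeated 20 times.
population = [float(v) for v in range(1, 11)] * 20
for rate in (0.05, 0.5):
    print(rate, prob_ratio_beats_expansion(population, rate))
```

Running the function over a range of plausible error rates before the audit gives a sense of how robust the choice of estimator is.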

As noted by Neter and Loebbecke (1977) and Edwards (2011), ratio-estimator-based confidence intervals can fail to attain the nominal confidence level when applied to audit populations even if the standard large-sample criteria for using ratio estimation are met. The excess zeros and skewness often found in audit populations require one to check normality assumptions under either estimator to ensure nominal confidence levels are likely to be met with the proposed sample size. This can be done through Monte Carlo simulation under a range of potential error rates prior to starting an audit.
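One way such a coverage check could be set up is sketched below for the simple expansion estimator; a ratio-estimator version would follow the same pattern. The code generates a single hypothetical audit population under all-or-nothing errors, repeatedly draws simple random samples without replacement, and records how often the large-sample interval $N\bar{y} \pm z\,N\sqrt{(1 - n/N)s^2/n}$ contains the true total. The function name, population, error rate and sample size are assumptions for illustration.

```python
import math
import random
from statistics import NormalDist

def coverage_simple_expansion(x, p, n, reps=1000, confidence=0.95, seed=0):
    """Estimated coverage of the large-sample interval for the total disallowed amount."""
    rng = random.Random(seed)
    N = len(x)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # One hypothetical audit population with all-or-nothing errors.
    y = [xi if rng.random() < p else 0.0 for xi in x]
    true_total = sum(y)
    hits = 0
    for _ in range(reps):
        sample = rng.sample(y, n)           # simple random sample without replacement
        ybar = sum(sample) / n
        s2 = sum((v - ybar) ** 2 for v in sample) / (n - 1)
        half_width = z * N * math.sqrt((1 - n / N) * s2 / n)
        hits += abs(N * ybar - true_total) <= half_width
    return hits / reps

# Hypothetical population of 1,000 claims with values 1 to 10.
population = [float(v) for v in range(1, 11)] * 100
print(coverage_simple_expansion(population, p=0.5, n=100))
```

Lower error rates produce more zeros and more skewness, so the same check at small `p` and small `n` is where shortfalls from the nominal level are most likely to appear.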

5.2 Estimating

Assuming all-or-nothing errors, Roberts (1978)