The Ratcliff diffusion decision model (DDM) ratcliff1978theory,ratcliff2008diffusion is the most prominent evidence accumulation model for jointly modelling binary decision and associated response times. It assumes noisy information uptake; this noise is characterized by the Wiener process. The DDM is popular because (a) it utilizes a lot of information from the collected data, which in turn allows us to draw conclusions about the underlying cognitive processes that are assumed to underlie the empirical response time distribution voss2008fast and (b) it provides a good fit to data across different empirical domains [e.g.,]forstmann2016sequential.
However, this efficient use of information comes at a rather high computational cost. The probability density function (PDF) of the DDM is the density of first-passage times, that is, the time it takes for the Wiener process to first cross a fixed boundary. The computation of the PDF of the first-passage times is computationally expensive because it contains an infinite sum. Of course computers cannot actually evaluate an infinite sum, so several papers have developed approximation methods to this density function voss2008fast, navarro2009fast, blurton2012fast, gondan2014even. The different approximation methods provide different truncation rules that guarantee the value of the approximated density function does not differ from its true value by more than a pre-specified criterion, .
The analytic formulation of the diffusion model PDF originates from feller1968introduction, who provides its seminal derivation. In this derivation from first principles (i.e., the Wiener process), there is a step that requires taking a limit. feller1968introduction provides two different – but equivalent – limiting processes that yield two different – but equal – forms of the density function. Each of these forms contains an infinite sum, and they are known individually as the “large-time” density function and the “small-time” density function navarro2009fast. Even though the two distinct forms of the DDM density function are mathematically equivalent, they can produce slightly different results when calculated numerically with a fixed and limited precision.
The original truncations of the “large-time” and “small-time” density functions provided by navarro2009fast depend on precalculating the number of terms for the truncated versions of the aforementioned infinite sums. As suggested by their names, these density function approximation methods have contrasting performance across the domain of response times. The “large-time” approximation method is efficient for large response times, and the “small-time” approximation method is efficient for small response times. Contrarily, the “large-time” approximation method can be inefficient for small response times and vice-versa for the “small-time” approximation method; these inefficiencies may lead to inaccuracies in the approximations themselves. To make up for these shortcomings, navarro2009fast proposed a mechanism for choosing the more efficient and accurate approximation method between the “large-time” and “small-time” given a particular observation (i.e., pair of choice and associated response time) and set of parameter values. As shown in a set of benchmark analyses later in this manuscript, approximation methods that combine both the “large-time” and “small-time” density functions are not only faster on average but also more stable when used in an optimization setting.
gondan2014even introduced a new precalculation of the number of terms in the “small-time” truncated sum. Since this new precalculation performs the same role as that of navarro2009fast, we can implement it in the choosing mechanism provided by navarro2009fast. We include this combination of approximation methods in both our analysis and software package as it has not yet been implemented in existing software.
The main contribution of the present work is a novel approximation method to the “small-time” density function which – in contrast to the existing approximation methods – does not rely on precalculating the number of terms required in the truncated sum. By avoiding this precalculation, the new approximation methods relies on fewer overall computations when compared to the currently available approximation methods. We also provide a heuristic that allows the combination of the new “small-time” approximation method with the “large-time” approximation method of navarro2009fast. In a series of benchmark analyses, we show that the new combined approximation method is both faster than existing approximation methods and at the same time maintains the same high numerical accuracy, even when used for fitting real empirical data.
The existing literature navarro2009fast,gondan2014even discusses density function approximation methods to the variant of the DDM that maintains a constant drift rate across trials (i.e., the drift rate parameter is constant across experimental trials). However, it is common to allow the drift rate to vary across trials (i.e., include the parameter ), and this produces slightly different density functions. More specifically, the inclusion of across-trial variability in the drift rate is one of the big contributions of ratcliff1978theory and is what allows the diffusion model to predict slow errors [e.g.,]donkin2018response. Conveniently, the infinite sums in either version of the density function (“large-time” or “small-time”) is the same regardless of the inclusion of variability in the drift rate. We can exploit this sameness to apply the known approximation methods to the variable drift rate variants of the DDM density functions, being careful to ensure the desired error tolerance is scaled appropriately. As there is only one other implementation of a density function approximation method that includes variability in the drift rate rtdists, we include these variants both in our benchmark analyses and software package.
For both consistency and convenience in performing the benchmark analyses, we built the software package, fddm, that contains all of the available implementations of the DDM density function approximation methods. All of these implementations are available through the function dfddm() in the fddm package for the programming language R R. However, for computational efficiency, all approximation methods are implemented in pure C++, a relatively low-level programming language known for comparatively fast numerical computations. We used the R package Rcpp Rcpp1, Rcpp2, Rcpp3 to provide an R language user interface to the C++ implementations. fddm is freely available from CRAN111https://cran.r-project.org/package=fddm and comes with detailed documentation and model fitting examples.
Packaging all of the available implementations in a single software package is not only convenient for running benchmark analyses, but it also avoids any inconsistencies that may arise in the benchmark results from using implementations from different sources. The benchmark tests recorded the implementations’ evaluation time given a predefined error tolerance, , that remained consistent throughout the testing. Our analysis shows that the fastest and most stable approximation method for the DDM density functions of those currently available is our implementation of the “large-time” approximation method from navarro2009fast combined with the “small-time” approximation method formally introduced in this paper; we use this approximation method as the default for the dfddm() function. For reference, we also compare our implementations to existing DDM density function approximation implementations in R rtdists, RWiener and show that the default implementation in dfddm() is the fastest of those.
The remainder of this paper is structured as follows: Section 3 will detail all of the approximations to the two infinite sums discussed in the literature. In particular, this section will highlight the differences across the density functions and their approximation methods. Additionally, we will present a novel method for approximating the “small-time” density function, and this method is faster than the comparable approximation methods from the literature given a standard accuracy. In Section 4, we will discuss how we designed the computer code for the implementations of the density function approximation methods; we translate existing code into C++ and offer our optimizations for the existing approximation methods. The results of running benchmark tests on these implementations are discussed in Section 5, and we also include supplemental analyses to complement certain facets of the benchmark results in the Supplemental Materials document. Concluding remarks containing implications for how the field should use these approximation methods and implementations are found in Section 6. All additional mathematical content, such as proofs, is located in the Appendix at the end of this paper. Moreover, the source code for reproducing all analyses reported in the paper can be found on GitHub.222In the folder paper_analysis of: https://github.com/rtdists/fddm
2 The DDM Density Functions
As mentioned in the introduction, the density function of the DDM is the density of the first-passage times of the underlying Wiener process. The DDM density function is defined for data given as a tuple , where is a binary choice and is the associated response time. In the fddm software, we consider the DDM variant with six parameters: , the drift rate; , the inter-trial variability in the drift rate333The parameter in this paper is named sv in the fddm software.; , the threshold separation; , the relative starting point in the diffusion process; , the non-decision time; and , the diffusion coefficient of the underlying Wiener process. In the following subsections we will first explain our simplifications regarding the parameters for expositional purposes, and then we will explicitly provide the density functions using notation which closely resembles that of the recent literature navarro2009fast, gondan2014even.
2.1 Our Interpretation of the Density Functions
For clarity in exposition, we make some simplifications to the six-parameter variant of the DDM that we mentioned above. Although the fddm software includes all six of the aforementioned parameters, our subsequent discussions of the density functions and their properties will only detail the simplified versions that we will discuss imminently. Specifically, we exclude some of the inputs and parameters from our formulations because they are not critical to the mathematical understanding of the density functions. Ultimately, we will denote the density function as by the end of this subsection.
2.1.1 The Two Thresholds of the DDM
The DDM depends on the underlying Wiener process crossing one of two thresholds that are denoted by the binary choice in the data tuple. As the density function of the DDM is the density of the Wiener first-passage times across these two thresholds, we define the full density of the DDM as . However, the PDF is actually calculated in a piecewise manner such that there is a different density function that corresponds to each threshold, for and for , and this is described in the following equation:
Conveniently, there is a simple relationship between and :
Thus, we can write the full density function of the DDM as
Technically, and are defective density functions when considered individually as only WFPT integrates to unity, and and usually do not. We are interested, however, in calculating these defective density functions because they are more useful since it is straightforward to obtain the full PDF from either of these two pieces. Following the precedent set by the literature navarro2009fast, gondan2014even, we will work with the density function at the lower threshold and refer to this simply as either the density function or for the remainder of this paper.
2.1.2 The Underlying Wiener Process
As a reminder, the DDM provides the density function of the first-passage times of the Wiener process, a biased random walk in continuous time. This underlying Wiener process has a drift rate (i.e., the tendency toward one of the thresholds) and a diffusion coefficient (i.e., the amount of noise in the diffusion process). The diffusion coefficient is denoted and only scales the other parameters (i.e., , , and ), so we fix to both simplify our mathematical expressions and follow the precedent set by the recent literature navarro2009fast, blurton2012fast, gondan2014even, blurton2017first.444The default value in the fddm software is as well. Note that other formulations in the literature fix (most notably ratcliff1978theory), so the necessary care must be taken to rescale the parameters when comparing the results of different formulations.
For each trial in our parameterization of the DDM density functions, the drift rate of the underlying Wiener process can be viewed as a draw from normal distribution with mean
and variance. To be explicit, the parameter does not have any effect on noise within the trial as that noise is solely controlled by the underlying Wiener process (i.e., ). Rather, the parameter controls the inter-trial variability of the drift rate of the DDM by allowing the drift rate to change across trials. Letting can predict slow errors (i.e., different average response times between upper and lower responses in the absence of a response bias), a data pattern that earlier evidence accumulation models could not predict ratcliff1978theory. Setting removes all variance from the distribution and thus eliminates any inter-trial variability in the drift rate (i.e., the drift rate of the underlying Wiener process will always be precisely ). It can be verified that setting in either of the four-parameter variable drift rate variants of the density functions, Equations (6) and (7), will simplify to the three-parameter constant drift rate variants of the density functions, Equations (5) and (4), respectively.
2.1.3 Additional Notes
Furthermore, we exclude from our equations because it only serves as an additive shift in the data . That is, we can remap , and we will use this remapping in place of for the remainder of this paper.
Finally, our formulation of the density function uses the parameter as the relative starting point of the diffusion process. Other formulations may use the parameter as the absolute starting point of the diffusion process. As is a scaled version of , the relation between the two parameters is .
2.2 The Density Function Equations
Following the exposition in the previous subsection, we will refer to our interpretation of the DDM density function as either or depending on whether we are including inter-trial variability in the drift rate or not, respectively. Using the previous derivations from the literature feller1968introduction, there are three different versions of the probability density function: “large-time” with constant drift rate, , Equation (4); “small-time” with constant drift rate, , Equation (5); and “small-time” with variable drift rate across trials, , Equation (6). We will introduce another variant of the “large-time” density function that allows for a variable drift rate across trials, , Equation (7).
These density functions are named with a timescale label (“small” vs. “large”) because of their supposed advantageous computational performance for response times on the timescale of their respective labels. The categorization of a “small” and “large” response time depends on the other parameters, but in practice the threshold between these two labels is on the scale of 500ms. In this section, we will present the aforementioned variants of the density functions; each of the density function approximation methods covered later in this paper is related to one of these four variants.
2.2.1 Constant Drift Rate Density Functions
We begin by considering the variant with three parameters, , , and (i.e., ). This variant is known as the constant drift rate density function because the drift rate is held constant across trials due to the lack of inter-trial variability. There are two forms of the constant drift rate DDM variant: the “large-time” and “small-time” density functions. Both forms are found in the literature, and we present our formulations of them here. The “large-time” constant drift rate density function is
The “small-time” constant drift rate density function is
2.2.2 Variable Drift Rate Density Functions
Next we expand the three-parameter variant to the four-parameter variant in order to allow the drift rate to vary across trials (i.e., ). This variant is known as the variable drift rate density function because the drift rate is allowed to vary across trials. There is currently only the “small-time” variable drift rate density function available in the literature, so we will present our formulation of that. In addition, we will introduce its “large-time” counterpart by using the equality of the two constant drift rate density functions. The “small-time” variable drift rate density function is
Notice that both the terms and indices in the infinite sums are identical between the two “small-time” models, Equations (5) and (6), with only the multiplicative scalar differing. Exploiting these identical summations and using the equality of the two constant drift rate density functions Equations (4) and (5), we can write a “large-time” version of the density function that includes inter-trial variability in the drift rate (i.e., ). Detailed steps to arrive at this density function are presented in Appendix A.1. The “large-time” variable drift rate density function is
2.2.3 Converting Between the Constant and Variable Drift Rate Variants
Since the summation for each time scale (“small-time” and “large-time”) is the same regardless of including variability in the drift rate, it follows that there exists a term such that the density function for the constant drift rate variant multiplied by yields the density function for the variable drift rate variant. That is, from the above equations. Note that works for converting both the “large-time” and “small-time” constant drift rate densities to variable drift rate densities. Although we do not use this term in our implementations of the approximation methods, it may be useful in adapting preexisting approximation methods to output the density with variable drift rate. It is pertinent to note, however, that this conversion may produce inaccurate numerical approximations in certain cases. For more details and an example, see the Validity Vignette on the fddm CRAN page555https://cran.r-project.org/package=fddm/vignettes/validity.html#den-ke. The multiplicative term is given below:
3 Approximations to the DDM Density Functions
In order to approximate either the “large-time” or the “small-time” density functions, we must choose how to truncate the terms in the infinite sum so that the overall error in the approximation is less than the desired level of accuracy, . To guarantee this precision, the existing approximation methods first calculate , the number of terms required in the truncated sum so that the overall error of the approximation is less than . Fortunately, the infinite sums for each time scale are identical whether or not inter-trial variability is included; thus any truncation method is equally applicable to the constant drift rate and variable drift rate density functions, scaling the desired error tolerance appropriately.
The literature provides three methods for calculating . navarro2009fast introduce the first two methods, one for the “large-time” approximation and one for the “small-time” approximation. To maximize the efficiency of both approximations, navarro2009fast suggest combining the “large-time” and “small-time” approximations by minimizing the overall number of terms required in one of the summations. In this “combined-time” framework, the two approximations are compared by their precalculation of , and the approximation that requires fewer terms is selected. The third method for calculating is provided by gondan2014even and is another method to determine the number of terms required in the “small-time” truncated sum. This newer method can also be used in the “combined-time” framework of navarro2009fast by directly substituting the original method of precalculating with the new method.
In contrast to the existing approximation methods, we can take an alternative approach by simply adding terms to the truncated “small-time” summation until the error of the approximation is guaranteed to be less than the given error tolerance. This way we avoid the precalculation of , decreasing the number of computations that the approximation incurs. This approximation method was suggested by gondan2014even and is only valid in the “small-time” density function because the terms in this summation can be treated as an alternating and absolutely decreasing convergent sequence. This fact and a formal proof of this approximation method’s validity are provided in Appendix A.3.2. For ease of notation, we will refer to this approximation method as SWSE for the duration of this paper (an acronym for Stop When Small Enough). Moreover, we can implement a simple heuristic approach to use this approximation method in the “combined-time” framework of navarro2009fast, despite not precalculating .
To make matters even more complicated, there exist two mathematically equivalent formulations for the “small-time” summation. We can pair each of these with a different method for calculating the number of terms in the truncation, yielding double the number of available “small-time” approximation methods. The mathematical equivalence of these two sequences is shown in Appendix A.2, and so we will refrain from adding more detail here. The two different styles of summation are shown below in Equation (9), named according to the year they were published by gondan2014even and blurton2017first.
Ultimately, there is one approximation method to the “large-time” density function, and there are three methods to approximate the “small-time” density function. However, by using both styles of summation we double the number of “small-time” approximation methods to six. Then combining those six “small-time” approximation methods with the single “large-time” approximation method yields six “combined-time” approximation methods. In total, there are thirteen approximation methods for the DDM density functions, and we will discuss each of them in the following subsections.
3.1 Large Time
We will begin our discussion of the many density function approximations by temporarily restricting ourselves to the “large-time” density function, Equation (4). The approximation of the “large-time” density function is ultimately a truncation of the infinite summation contained within it, as this infinite sum is technically incalculable because the analytic solution is still unknown. Since it can be shown that this infinite sum converges (see Appendix A.3.1), the most straightforward way to truncate this infinite sum is to only include the first terms666Note that the term with index evaluates to , so we exclude this term from the summation notation to mitigate combinatorial confusion. (i.e., the indices ). Substituting for in Equations (4) and (7) yields the truncated version of the “large-time” density functions:
Figure 1 shows an example of the terms in the “large-time” truncated sum, which display a decaying behavior. This decaying nature of the summation terms makes the summation itself converge and, thus, allows it to be approximated with sufficient accuracy.
There is only one available method for calculating , the number of terms required in the truncated “large-time” summation. navarro2009fast provide this method, and given the desired precision, , we define:
where is the ceiling function; that is, the argument is rounded up to the nearest integer that is greater than or equal to the argument. Using this calculation, it is straightforward to evaluate the rest of the density functions from Equation (10). We will label this approximation method as .
3.2 Small Time
The approximation of the “small-time” density function is also a truncation of the infinite summation contained within it, as this infinite sum is also technically incalculable because the analytic solution is still unknown. Since it can be shown that this infinite sum converges (see Appendix A.3.2), the most straightforward way to truncate this infinite sum is to include only the first terms. However, as there are two summation styles for the “small-time” density functions, we must define how to count the first terms for each summation style.
The style summation extends to infinity in both the positive and negative directions, so we consider the first terms in both directions. More precisely, we include the terms whose indices are no greater than in absolute value (i.e., the indices ). Note that indicates the floor function; that is, the argument is rounded down to the nearest integer that is less than or equal to the argument. In the formulation provided by gondan2014even, can only be odd so that the total number of terms is always exactly . In the case of navarro2009fast, can also be even, resulting in terms for the style truncated summation. Substituting for in Equations (5) and (6) yields the truncated version of the “small-time” density functions with the summation style:
In contrast, the style “small-time” summation only extends to infinity in the positive direction. Thus, it is straightforward to count the first terms in the summation (i.e., the indices ). First, we use the summation style from Equation (9) in place of the summation style that is currently used in Equations (5) and (6). Then we can substitute for in these equations to yield the truncated version of the “small-time” density functions with the summation style:
Figure 2 shows an example of the terms in the “small-time” truncated sum using the summation style. Here, the terms appear to only show a general decay as the index gets further from zero. This decaying behavior is sufficient to demonstrate that the summation itself converges and, thus, we can approximate it with the given accuracy. However, rearranging the terms into the style (i.e., order the indices as) as shown in Figure 3 reveals a distinct oscillatory behavior that allows for an even simpler approximation while maintaining the desired precision. This simpler approximation method is described in Section 3.2.3.
In addition to the two equivalent styles of summation from Equation (9), there are three different methods to truncate the sum. Because of the multiple methods we will separate them into three further subsections, one per truncation method, that will each contain the two equivalent summation styles. Ultimately, we have six distinct approximation methods for the “small-time” density function: , , , , , and .
First, navarro2009fast provided a method for calculating the required number of terms in the truncated “small-time” summation. Given a desired precision , they define
where is the ceiling function, as mentioned previously. From this calculation, the evaluation of the density function approximations can proceed with either summation style or , Equations (12) and (13).
After navarro2009fast published their paper, gondan2014even introduced another method for calculating the required number of terms in the truncated “small-time” summation. It is important to note, however, that gondan2014even provided the number of required pairs of terms in the summation style, and not the number of required individual terms. As we want the number of individual terms, we adapt their formula in Equation (15). Given a desired precision , we define
3.2.3 SWSE (Stop When Small Enough)
Since both of the “small-time” summation styles can be treated as convergent absolutely decreasing series, we can use our SWSE method of truncating the sum. This approximation method was mentioned by gondan2014even, but they opted to only formally present their novel calculation of for the “small-time” truncated summation. The general idea of the SWSE approximation method is to take advantage of the alternating and decreasing nature of the terms in the infinite sum to bound the approximation’s error. After arranging the terms as shown by gondan2014even, we can apply the alternating series test to place an upper bound on the truncation error after including a certain number of terms in the summation. This upper bound is the absolute value of the next term in the sequence; thus we can truncate the infinite sum once one of its terms is absolutely less than the desired precision, . The validity of this approximation method is proven in Appendix A.3.2. As with the two other “small-time” approximation methods, either style of summation ( or ) can be used which results in two more approximation methods for the “small-time” density functions.
3.3 Combining Small and Large Time
navarro2009fast suggest a strategy that uses a combination of the “large-time” and “small-time” approximations, based on whichever approximation is more efficient. Unfortunately, the mechanism that they provide is a quasi-nondeterministic function so we slightly change its practical implementation, as described in Section 4.1. Nonetheless, the concept is logical, and we provide six possible “combined-time” approximation methods based on their mechanism as there are six “small-time” approximation methods and one “large-time” approximation method. These approximation methods largely depend on the precalculation and subsequent comparison of and , the required number of terms in the truncated versions of the “large-time” and “small-time” summations, respectively. After precalculation, either the “large-time” or “small-time“ approximation is used, based on whichever approximation requires fewer terms. The exceptions to this original mechanism are the two “combined-time” approximation methods that leverage the SWSE “small-time” approximation method, as these only require the precalculation of and not .
First, we can use the method for calculating as provided by navarro2009fast as the representation of efficiency for the “small-time” approximation. By precalculating both and , we can then determine which timescale is more efficient. Given a desired precision , they define
Although this approximation method has not been explored in the literature, we can implement the more recent method of calculating from gondan2014even in the same manner to that of navarro2009fast. Precalculating and comparing and allows us to determine which timescale is more efficient. Given a desired precision , we define
Similarly to the previous approximation methods, the density function approximation can proceed with either summation style or , Equations (12) and (13), yielding two more “combined-time” approximation methods.
3.3.3 SWSE (Stop When Small Enough)
The previous “combined-time” approximation methods have both relied on precalculating the required number of terms in both the “large-time” and “small-time” versions of the truncated sum, and , respectively. This raises an issue with our SWSE approximation method because it does not precalculate the required number of terms in the “small-time” truncated sum, . However, we can still use it with the “large-time” approximation method of navarro2009fast by implementing a heuristic switching mechanism. This mechanism will still precalculate , but rather than comparing with a precalculation of , it instead compares to a user-defined value, .777argument max_terms_large in function dfddm() If , then the navarro2009fast “large-time” approximation method will be used; if , then the SWSE “small-time” approximation method will be used. As with the other “combined-time” approximation methods, we can use both “small-time” summation styles ( or ) to yield an additional two “combined-time” approximation methods.
In addition to introducing a novel approximation method for the DDM density function, we wish to determine the fastest such approximation method. While there have been several implementations of these approximation methods, they have largely been in different programming languages voss2008fast, navarro2009fast, gondan2014even, rtdists, RWiener. In order to investigate the relative speeds of the proposed approximation methods, we translated them into C++ so that their efficiencies could be directly compared using the dfddm() function from the fddm software.
Each approximation method from the literature was provided with an implementation in a programming language, namely R gondan2014even and MATLAB navarro2009fast. Even though we wanted to use the implementations in R, we translated them to C++ since it is known to be a comparatively faster programming language. To maintain the user interface via R, we accessed the implementations using the R package Rcpp Rcpp1, Rcpp2, Rcpp3. Since we were able to exploit the minimal overhead of function calls in C++, we wrote the code using a modular structure. This modular construction has three main advantages: first, it allow the user to select exactly which approximation method to use via one flexible interface, function dfddm(); second, it makes the code more interpretable by other researchers; and third, the code is easily adaptable to future additions.
For example, this modularity in the code makes it simple to switch between the two summation styles ( and , as the user only needs to change one function parameter that indicates which summation sub-function should be used in the approximation. Not only does this modularity make for simpler user commands to compare different approximation methods, but it also allows for the straightforward inclusion of future summation styles to the software.
4.1 Obtaining for Multiple Observations
Our only major deviation from the original code implementations was the way in which we calculated
, the number of terms required in the truncated sum. Every implementation assumes an input of binary choices and the associated response times as a vector of tuples, so let that vector have length. Denote the output of the function that determines the number of terms required for the infinite sum by for the response time in the input vector. In the existing implementations navarro2009fast, gondan2014even, every would be calculated and the greatest value would be used for every response time input, that is . However this creates a function such that the output for a particular input can be different depending on the other inputs.
Consider a case where only one response time is input and it requires terms in the density function approximation. Now include an additional response time that requires terms in the approximation. In the original code implementation, the density function would use terms for both inputs in the density function approximation. Since the additional terms are necessarily nonzero (although may be very small in absolute value), this means that the approximation for the first response time will be necessarily different depending on whether or not the second response time input is included in the call to the density function. Using more terms than required in the truncated sum only makes the the approximation more accurate; however, we believe that this quasi-nondeterministic behavior is potentially detrimental to the reproducibility of results.
To avoid this undesirable behavior we simply do not take the maximum of the values and instead use each for its respective response time. Using this method ensures that the approximation is not affected by other extraneous inputs. In the example provided, the density function would use terms for the first input and terms for the second input. There is no increase in computational complexity by using this individual method instead of the maximum method because both methods only calculate each once.
5 Benchmark Testing
As one of our goals is to determine the fastest implementation to accurately approximate the density function, we performed benchmark tests to record the relative speed of each implementation. We compared these benchmark times across the thirteen implementations discussed in Section 3 in addition to including three R
functions from the literature that are commonly used for calculating the same PDF gondan2014even, RWiener, rtdists. To determine which implementations are the most computationally efficient, we test them in two environments. First, we collect benchmark data on the speeds of the implementations across a predefined parameter space. Second, we collect benchmark data on the speeds of the implementations when using them for fitting real-world data using the maximum likelihood method and a standard gradient-descent optimization algorithm; this test is perhaps of more practical value than the first test as the DDM is widely used in parameter estimation.
Since all implementations we are testing are in R, we use the microbenchmark function from the microbenchmark R package microbenchmark to measure their speeds. This function is a common way to collect data on the runtime of functions that can be called from within R since it not only performs warm up trials for each implementation but also mitigates the impact of computational noise (e.g., housekeeping of the operating system) on the results by repeating each function call several times in a randomized order.
Before we benchmark the implementations, we must first decide how to choose the default value for our switching mechanism. In line with how we are benchmarking the implementations themselves, we run benchmark tests on the “combined-time” implementation using different values of in the two environments discussed earlier in this section.
The remainder of this section is split into three subsections. First, we will go over the methods for running the different styles of benchmark tests. Second, we will discuss how we chose the default value of . Third, we present the benchmark results for the implementations by themselves and also in a data fitting scenario. An extended set of benchmark analyses is provided in the Supplemental Materials document on GitHub.888The Supplemental Materials document can be found in the folder paper_analysis of: https://github.com/rtdists/fddm
The methods that we discuss here are applicable to both of the following subsections; the major difference between the benchmark tests in the two subsections is the implementations fed into the benchmark tests. This subsection is broken down into two further subsections: the first part explains the method for benchmarking the implementations on a predefined parameter space, and the second part explains the method for benchmarking the implementations in the data fitting scenario.
5.1.1 Predefined Parameter Space
The first type of benchmark testing simply evaluates a selection of implementations in a predefined parameter space in identical computing environments. The idea behind this type of benchmark testing is to measure the relative speeds of the implementations themselves as accurately as possible, without any distractors such as a data fitting routine or a different programming language. Implementations are selected not only based on what we want to benchmark but also the available options.
The key input to this first type of benchmark testing is the parameter space. We use two different parameter spaces in various parts of our benchmark testing, and these can be found fully defined in Table 1 and Table 2. Table 2 contains response times that range from seconds to seconds. As the DDM is meant to be used for decisions on the timescale of about seconds or less, this parameter space is likely the more practical of the two. However, we also consider an expanded range of response times in Table 1, because the DDM can be applied to situations where larger response times can occasionally occur. To ensure that we cover these use cases, the expanded set of response times in Table 1 ranges from seconds to seconds.
An important distinction in the benchmarking methods is how to input the response times from the parameter space into the implementation. As each implementation will accept one or more response times, we have the choice to either input all the response times as a vector or input the response times individually. The most practical option is to input all of the response times as a vector since this is likely how a researcher would input the data and also how it would be done in a data fitting scenario. In addition, inputting all of the response times as a vector allows the implementations to handle vectorization and avoids the computationally expensive repeated call of the PDF functions through R (i.e., the overhead of calling a function in R compared to, say C++, is considerable). By inputting the response times individually, however, we are able to get a more granular parameter space in the output; this enables us to see how varying the response time can impact the benchmark data.
For our benchmark tests, we visit each point in the predefined parameter space and record the execution times for each implementation 10,000 times inside of the microbenchmark function. Of these 10,000 benchmark data, we only record the median in an effort to eliminate the influence of computational noise on the results. For expositional purposes, we will simply use the term “benchmark times” instead of “median benchmark times” when discussing the results of the benchmark tests.
The methods used in the following subsections will follow this template, but we will elucidate the selection of implementations, the selection of parameter space, and whether or not the response times from the parameter space were input as a vector.
5.1.2 Data Fitting
The second type of benchmark testing measures the speed of optimization routines that use a selection of implementations in the underlying likelihood functions. This type of benchmark testing is perhaps more practical than the first type as the DDM is commonly used for parameter estimation on a set of experimental results. Again, implementations are selected based on what we want to benchmark and the available options. For example, the implementation in RWiener and the implementation provided by gondan2014even do not include inter-trial variability in the drift rate; thus we cannot include them in benchmark testing that pertains to .
The optimization routine that we use is the R function nlminb that uses the maximum likelihood method and a standard gradient-descent optimization algorithm. Since nlminb minimizes the likelihood function instead of maximizing it, we define a family of likelihood functions that return the negative sum of the log-likelihoods of the data. In particular, each likelihood function in this family uses a different implementation to evaluate the log-likelihoods of the data. This way we can use the same optimization process for each implementation, and the only difference in benchmark data arises from the underlying implementations that approximate the DDM density function.
As encountering local optima is a known problem for deterministic optimization algorithms, we run nlminb with eleven different sets of initial values. These sets of initial values are consistent throughout all of our analyses, regardless of which underlying density function approximation is being used in the optimization. The full set of initial values can be found in the supplemental materials, and there are two restrictions to these initial values that we must address. First, the parameter must be less than the response time, so we set the initial values for to be strictly less than the minimum response time for to each individual in the input data frame. Second, we must place a lower bound on that is necessarily greater than zero because the optimization algorithm occasionally evaluates the log-likelihood functions (and thus the underlying density function approximations) using values of equal to its bounds. In the case where optimization algorithm evaluates using , both the density function from the rtdists package and every implementation from dfddm() do not evaluate. In common use this is not an issue because very small values of do not make any sense with regard to the psychological interpretation of the parameter, but this issue can arise in an exploratory optimization environment.
Since nlminb is a deterministic optimization algorithm, we record various results from its optimization process: the number of calls to the likelihood function, the convergence code (i.e., either 0 for success or 1 for failure), and the value of the minimized likelihood function (known as the “objective”). We also place upper and lower bounds on the fitted parameter values to be consistent with our interpretation of the DDM. For each data set, we collected benchmark data from five runs of the data fitting routine inside of the microbenchmark function. From these five runs, we again record only the median time in an effort to eliminate the influence of computational noise; again, we will use the term “benchmark times” instead of “median benchmark times” when discussing the results of the benchmark tests.
The data used for this fitting came from trueblood2018impact, and it contained response time data for 37 individuals. Here we considered only the data from their accuracy condition which provided 200 observations per participant. We used the parameter-fitting criteria described above to fit the parameters , , , and separately for each of the 37 individuals included in the data. We estimated one for each stimulus class that maps onto each of the two response boundaries (i.e., “upper” and “lower”), so that we estimated two drift rates and in total six free parameters per participant.
5.2 Determining the Default Behavior of our Heuristic Switching Mechanism,
In Section 3.3.3, we introduced the idea of a heuristic mechanism that would switch between use of the navarro2009fast “large-time” approximation and the SWSE “small-time” approximation. The main issue with combining these two approximations is that the SWSE approximation does not precalculate the number of terms, which is the basis of previous switching mechanisms. Instead, we use a user-defined value to determine which approximation will be used given the other parameter values. More precisely, we precalculate and then compare this value to . If , then the navarro2009fast “large-time” approximation will be used; if , then the SWSE “small-time” approximation will be used.
The naturally ensuing problem is to find the optimal value of . Although the optimum value for switching depends on the other parameters input to the model, we fix for all inputs because we treat it as a simple, heuristic value that works well enough for most practical cases. The remainder of this section will detail the methods and results that we used to justify our choice of default value, .
In order to find the best default value for , we consider several candidate values. Note that it only makes sense for to be a non-negative integer because the only possible values for are non-negative integers. Based on and calculations at the dfddm() default error tolerance of , we tested the integers using both the and summation styles; this yields sixteen candidate implementations. Notice that indicates that the SWSE “small-time” approximation will always be used.
5.2.1 Predefined Parameter Space
First, we collected benchmark data on the speeds of the sixteen candidate “combined-time” SWSE implementations across a parameter space with response times ranging from to seconds. The response times were input as a vector to the implementations, and this parameter space can be found in Table 1. Whereas this range may seem quite large, we wanted to cover as many use cases as possible when choosing the default for our heuristic method.
|t||0.001, 0.1, 1, 2, 3, 4, 5, 10, 30|
|a||0.25, 0.5, 1, 2.5, 5|
|v||-5, -2, 0, 2, 5|
|w||0.2, 0.5, 0.8|
|0, 0.5, 1, 1.5|
Note. Parameter is fixed to for consistency with the other reported benchmarks.
We show the results of the benchmark tests on the predefined parameter space in Figure 4. Immediately apparent is that there is little difference between the two summation styles, and in these benchmark data, so we will not distinguish between the two in our further analysis of the optimal value for . We also see that is a problematic value because it is essentially just a slower version of the SWSE “small-time” implementation, since it does not use the navarro2009fast “large-time” approximation yet still calculates . Consistent with the results reported below in Section 5.3.1, this variant has a long tail because the SWSE “small-time” implementation is slow for large effective response times.
It appears that relying more on the navarro2009fast “large-time” approximation (i.e., higher value of ) increases the length of the tail in the benchmark results. This is likely because the switching mechanism uses the “large-time” approximation when the “small-time” approximation is more efficient. The value appears to be the fastest using these benchmark data. Setting means that we almost exclusively use the SWSE “small-time” approximation, except when the navarro2009fast “large-time” approximation is exceptionally efficient.
5.2.2 Data Fitting
Second, we used the sixteen candidate “combined-time” SWSE implementations in a simple data fitting scenario. This analysis will not only help to determine the practical performance of each candidate value of , but it will also highlight any potential issues with certain candidate values.
We show the results of the benchmark tests on this data fitting scenario in Figure 5. Similarly to Figure 4 above, there is little difference between the and summation styles. Moreover, we again see the shortcomings of (i.e., just the SWSE “small-time” implementation) because it struggles with large response times. Specifically, we see that in many cases the optimization algorithm does not return the maximum-likelihood estimate and/or reports optimization failures. Since large response times often arise in a typical optimization routine, we should certainly pick a default value to avoid the shortcomings of the SWSE “small-time” implementation. We notice that the benchmark times stay relatively consistent across the other values of , but the tails of the distribution of benchmark times generally elongate with a higher value of . When considering the stability of the implementations for , only are slightly less stable than the other values. Taken together, the two benchmarks results show that is both the fastest option as well as the most stable option. Consequently, we use as the default value going forward.
5.3 Benchmarking all Implementations
Now that we have established the default behavior of our “combined-time” SWSE implementation, we can proceed to compare it against other implementations. In order to determine the most efficient and stable implementation, we will begin by performing benchmark tests on predefined parameter spaces. Following that, we will again use a simple data fitting scenario to test the speeds of the implementations in a practical setting.
5.3.1 Predefined Parameter Spaces
First, we collected benchmark data on the speeds of all of the implementations discussed in this paper. We measured their speeds across a practical parameter space with response times ranging from to seconds. This parameter space is perhaps more practical than the other one that has a larger range of response times because the DDM is meant to be applied to situations where the response times are two seconds or faster. The response times from this parameter space were input as a vector to the implementations, and the parameter space can be found in Table 2.
|t||0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0|
|a||0.5, 1, 1.5, 2, 2.5, 3, 3.5|
|v||-5, -2, 0, 2, 5|
|w||0.3, 0.4, 0.5, 0.6, 0.7|
|0, 1, 2, 3.5|
Note. The parameter is fixed to because this parameter must be greater than zero for the density function from the RWiener package.
Figure 6 shows the distribution of the benchmark times for each implementation. Immediately we notice that the three existing implementations from the literature are significantly slower than the impementations in dfddm(). Of the implementations in dfddm(), there are the three timescales to discuss. First, there is only one “large-time” implementation (), and that appears to be quite slow. Second, there are three “small-time” implementations (), and our implementation of the SWSE approximation outperforms the others. Third, there are three “combined-time” implementations (), and again the combination of the SWSE approximation with the “large-time” approximation provided by navarro2009fast outperforms the others. Overall, the “small-time” implementations appear faster than the “combined-time” ones because all of the response times input were under two seconds. Further analysis of this feature will be discussed imminently.
One surprising result in Figure 6 is that our implementation of the “small-time” approximation provided by navarro2009fast is slightly faster than our implementation of the more recent “small-time” approximation proposed by gondan2014even. This difference in speed is likely because the number of terms required by the navarro2009fast approximation is sometimes smaller than that of the gondan2014even approximation. As shown in Figure 7, uses fewer terms overall; however, typically uses fewer terms for smaller response times, which likely explains the minimal difference across the implementations of the “combined-time” approximations that use these two approximation methods. We also considered that the calculation of could be faster than that of (compare Equation (15) with Equation (14)). However, we got contradictory benchmark data for the execution times of these two calculations from different computers. Please see the Supplementary Materials document on GitHub for more information about this topic.
To investigate why the “small-time” implementations appear to outperform the “combined-time” implementations, we perform another round of benchmark testing. For these tests, we will input the response times individually across the larger parameter space, found in Table 1; this will allow us to see the speed of each implementation as the response time varies across the parameter space. For this set of benchmark results only, we will not consider the response time by itself, but rather we will use the effective response time, . Since the parameters and always exist within the infinite sums as a quotient (regardless of timescale), we consider the quantity as a proxy for the response time.
As we do not wish to clutter the results, we consider only six implementations for this testing that are representative of the currently available implementations. We include all three of the currently available implementations from the literature: the implementations from the rtdists and RWiener packages, as well as the implementation provided by gondan2014even that has been included in dfddm(). In addition to the currently available implementations, we also include the fastest implementation of each timescale in dfddm(): our implementation of the “large-time” approximation given by navarro2009fast, the SWSE “small-time” implementation, and our “combined-time” implementation that uses both the SWSE “small-time” approximation and the “large-time” approximation provided by navarro2009fast.
Figure 8 shows that the preconceived notion of the two timescales is accurate. The “small-time” implementations perform much better for smaller effective response times and struggle when handling larger effective response times. For the “large-time” implementation, the opposite is true; it struggles greatly for smaller effective response times yet is very efficient for larger effective response times. We plotted the benchmark results in the same style for varying the other DDM parameters (, , and ), but those results were uninteresting and are included in the Supplemental Materials document on GitHub for completeness.
5.3.2 Data Fitting
Second, we test the suitable implementations in a simple data fitting routine to estimate the commonly used parameters , , , , and . While we can use every implementation in dfddm(), we cannot use the implementation from the RWiener R package nor the R implementation from gondan2014even because neither of these two implementations includes in their estimation of the DDM density function.999While we could use the conversion multiplier to convert from the constant drift rate density function to the variable drift rate density function, this often leads to inaccurate density estimates. See the Validity Vignette in the fddm CRAN repository at https://cran.r-project.org/package=fddm/vignettes/validity.html#den-ke for more information. Thus we test all of the implementations in dfddm() and the implementation in rtdists.
Figure 9 shows the benchmark times and notable convergence issues in the data fitting. Similarly to the results shown in Figure 6, the implementations in dfddm() are faster than the currently available implementation in rtdists. However, the magnitude of the difference is a lot smaller; whereas in Figure 6 the benchmark time for the rtdists implementation is about five times that of the available options in dfddm(), the benchmark time for the rtdists implementation is only about twice that of the available options in dfddm().
Of the implementations in dfddm(), there are again three timescales to consider. First, the “large-time” implementation alone appears quite erratic by showing many unsuccessful optimization attempts. Second, the SWSE “small-time” implementation appears to be slightly faster than that of gondan2014even or navarro2009fast, but it is comparatively more problematic with many unsuccessful optimization attempts. Third, the combination of the SWSE “small-time” approximation and the “large-time” approximation from navarro2009fast is the fastest and most stable implementation for data fitting, outperforming the other implementations both in terms of benchmark time and successful optimizations.
In addition to the raw computation speed of each implementation, we also recorded the number of function evaluations that the optimization process had to make in order to find the optimum parameter estimates. In general, fewer function calls suggest a more simply shaped parameter space, making it easier for the optimization routine to quickly and accurately determine the optimum parameter estimates. In Figure 10, we see that most of the implementations used required approximately the same number of function calls. The SWSE “small-time” implementation and our implementation of the “large-time” approximation given by navarro2009fast all showed issues in benchmark time as well as function evaluations; however, these two implementations are the most effective when combined.
We performed benchmark tests to determine the fastest approximation method for the density function of the DDM. In addition to directly comparing the existing approximation methods for speed, we also implemented a novel approximation method and compared it to the existing ones. There are three main results to discuss.
First, we implemented two different styles of summation for the “small-time” algorithms, and . There is not much to distinguish these two summation styles in the benchmark results; however, we note that the style appears to marginally outperform the style for most implementations. Whereas the relative performance of these two “small-time” summation styles may vary slightly depending on the implementation and computer used for the calculation, the style uses at most as many terms in the truncated summation as the style (see Section 3.2 for details). For these reasons, the style is the recommended and default “small-time” summation style for the fddm package.
Second, a “combined-time” approximation is better to use in a general setting than either a pure “large-time” or a pure “small-time” approximation. It handles both the small and large effective response times that arise during a standard optimization routine, and this is useful for two reasons. First, it eliminates the inefficiencies of a pure “small-time” approximation when encountering large effective response times, and vice-versa for a pure “large-time” approximation. Combining these two approximations yields an overall more efficient algorithm with benchmark results tightly clustered around the median. Second, it eliminates the inaccuracies and convergence issues that arise in a data fitting scenario from evaluating the PDF for small or large effective response times using a pure “large-time” or “small-time” approximation, respectively. As data fitting is a common use for evaluating the DDM density function, this increased stability is paramount to any implementation.
Lastly, our implementations of these density function approximations are noticeably faster than the current standards found in R packages (i.e., rtdists and RWiener) and unpackaged code from the literature (i.e., from gondan2014even). In particular, our implementation of our novel “combined-time” approximation method is the fastest currently available way to reliably approximate the DDM density functions. Most of the increase in speed for this implementation comes from the use of the faster C++
programming language. However, the SWSE “small-time” approximation method further increased the speed, and using the “large-time” approximation method from navarro2009fast removed the outliers in the tail of the benchmark results.
Although our implementation of our novel approximation method is faster than the currently available implementations in terms of execution time for a set of fixed, selected response times, the benchmark times across the implementations are more similar for data fitting (Figure 9 compared to Figure 6). This decreased difference in the benchmark results is due to the overhead of running an optimization routine in R. Despite its impressive vectorization, R has a lot of overhead when calling functions. Since an optimization routine will make many function calls (e.g., to the likelihood function), the overhead of these function calls makes up a significant part of the overall benchmark time for each approximation. If speed is of the utmost importance, we recommend avoiding R in favor of a programming language that has less overhead in function calls (e.g., pure C++).
Overall, our results show that the implementation that combines the SWSE “small-time” approximation method with the “large-time” approximation method from navarro2009fast is both the fastest and most consistent across broad parameter spaces. If the density function is only being evaluated for small effective response times, then the implementation of the SWSE “small-time” approximation method may be the fastest option. However, since a very common use for evaluating the DDM density function is data fitting, an optimization function would likely encounter convergence issues when exploring parameter values because the pure “small-time” approximation methods, especially the SWSE method, have shown to be problematic when being evaluated at large effective response times. In sum, we suggest researchers interested in fitting the DDM use the default implementation from the fddm package for the most robust and efficient approximation to the DDM density function.
Appendix A Mathematical Proofs
This section will present the various mathematical proofs that we have referenced throughout this paper.
a.1 Derivation of the “Large-Time” Density Function with Variable Drift Rate
In Section 3 3 Approximations to the DDM Density Functions, we provided a version of the “large-time” density function that allowed the drift rate to vary across trials; we will present the steps of this derivation here. We begin by writing the “small-time” density function that allows for inter-trial variability, Equation (6) from Section 3:
Next, we manipulate the infinite sum into the form given by navarro2009fast. Note that in their original paper, navarro2009fast scale prior to including it in the sum. They use , which explains why we have additional terms compared to their formulation. This manipulation gives
Since the “large-time” and “small-time” density functions are equivalent in their constant drift rate versions, we can equate them. The multiplicative term outside the infinite sum is the same in both functions, thus their infinite sums must be equal as well. This equivalence lets us substitute the infinite sum from the expression above with the infinite sum from the “large-time” density function from navarro2009fast:
a.2 Equivalence of “Small-Time” Summation Styles
Let the style terms be denoted as . Similarly, let the style terms be denoted as . Instead of the usual ordering, consider the style terms as the series ; we denote this ordering by . Our goal is to show that this reordering of the style terms produces exactly the same series as the natural ordering of the style terms.
We need to show that the sets and contain precisely the same elements in the same order. We also need to show that the sets and also contain precisely the same elements in the same order, but we will handle this later.
First, consider even values of in . Notice that in this case. Since is even, we can write , for . Thus we have , and the value of each style term with even has precisely one corresponding term of equal value in the style with nonnegative index . As the mapping is order-preserving (i.e., monotone), the order-preserving identity map completes the bijection from the style terms with even index to the style terms with nonnegative index .
Next, consider odd values of in . Notice that in this case. Since is odd, is even; then we can write , for . Thus we have , and the value of each style term with odd has precisely one corresponding term of equal value in the style with negative index . As the mapping (or equivalently, ) is order-preserving, the order-preserving identity map completes the bijection from the style terms with odd index to the style terms with negative index .
Now we will prove the equality of the squared terms in the exponentials. Notice that
Since we have just shown that the sets and are equivalent, we can index both styles of terms with the set of natural numbers (e.g., use the style indexing). Then it follows that the sets and are also equivalent because (1) multiplying by a constant (i.e., ), (2) the square function, and (3) the exponential function are all order-preserving bijections for all nonnegative real numbers.
Since both styles of summation produce precisely the same terms, they are equivalent given the ordering defined above (i.e., ).
a.3 Summations as Convergent Decreasing Series
This section will prove that both the “large-time” and “small-time” infinite summations can be treated as convergent decreasing series.
In Section 3.1, we mentioned that the infinite sum in the “large-time” density functions, Equations 4 and 7, converges. Note that the infinite sum is identical in both the constant drift rate and variable drift rate variants of the density function, so this argument applies to both Equation 4 and Equation 7.
The sum in question is
We will prove that the sum is absolutely convergent and, thus, convergent. Upon taking the absolute value inside the sum, we note that and the exponential are unaffected as they are strictly positive anyway. Since , we can eliminate the term and write the absolute sum:
We will proceed by using the integral test for convergence. Let , a nonnegative function. Note that , so is monotonically decreasing for . For practical purposes101010As shown in this paper, the “large-time” approximation performs poorly for small response times. When combined with the SWSE “small-time” approximation, the usual transition point from using the “small-time” to the “large-time” approximation occurs on the scale of seconds., we would expect and ; substituting these values gives that monotonically decreases for , which is satisfactory for our needs.
Thus, we can proceed with the integral test:
Using the substitution (and so and ), we have
which is finite. Thus, the absolute sum converges and so does the original sum.
In Section 3, we reintroduced a method for truncating the infinite sum from Equation (5); the idea was essentially to stop adding terms to the truncated sum once the terms were smaller than the allowed error tolerance, . Note that for practical purposes of evaluating the infinite sum, we rescale the given error tolerance by the reciprocal of the multiplicative term in front of the infinite sum: . The legitimacy of this algorithm hinges on the alternating series test111111The alternating series test is also known as Leibniz’s rule, Leibniz’s test, or the Leibniz criterion., which places an upper bound on the truncation error of this sum. Thus to prove the validity of this method, we must show that the terms in the summation (1) alternate, (2) monotonically decrease in absolute value, and (3) approach zero in the limit. Since we just demonstrated that the two summation styles ( and ) are equivalent, we will only present the full argument for the style summation as defined below:
Let . Consider the terms of the sum in the order .
First we show that the sign of consecutive terms alternates. Notice that the sign of each term only depends on , and recall that . The term is positive, and then the rest of the terms alternate signs because . It remains to show that decreases monotonically and approaches zero in the limit as .
Next we show that the absolute value of the series decreases monotonically after a certain number of terms. For large values of , the exponential term will dominate the linear term. However, it is important to note that for some values of , , and the value of the terms in the summation may actually increase in absolute value before decreasing. To find this critical point we define as the smallest index in the series such that all terms after the are monotonically decreasing in absolute value according to the ordering from the previous paragraph. Then we have
where both of the infinite sums on the right side of the equation (lines two and three) will combine to satisfy the conditions of the alternating series test. It remains to calculate the value of and provide conditions for its existence.
We treat the individual terms of the summation as a function of , then take the derivative of this function with respect to in order to determine the index at which the terms shift from increasing to decreasing. Define