 # Inverse Sampling of Degenerate Datasets from a Linear Regression Line

When linear regression establishes a relationship between a (dependent) scalar response and one or multiple independent variables, datasets showing distinct graphical trends can nevertheless share identical statistical properties. Advanced statistical approaches, such as neural networks and machine learning methods, are often necessary to process, characterize, and analyze these degenerate datasets. Conversely, the accurate creation of purposely degenerate datasets is essential for testing new models in the research and education of applied statistics. In this light, the present study characterizes the famous Anscombe datasets and provides a general algorithm for creating multiple paired datasets of identical statistical properties.


## I Introduction

Originally termed least-squares fitting, the linear regression method is one of the most widely used analysis tools for investigating trends among variables in various disciplines. Legendre and Gauss initially formulated the regression method from the late 18th to early 19th centuries to understand observed datasets of astronomical phenomena. The modern statistical characteristics of the regression were first established by Galton's work describing biological phenomena [1, 2, 3], followed by Yule's and Pearson's early mathematical formulations. When a linear relationship of a paired dataset provides two fitting coefficients, i.e., the intercept and the slope, the goodness of the regression is often evaluated by the coefficient of determination, denoted as R². Although these three outputs provide a good understanding of how the independent variable x is quantitatively correlated to the response variable y, linear regression's inherent problem resides in its statistical degeneracy, such that multiple datasets can have indistinguishable statistical properties.

A quartet of visually distinct graphs having identical regression statistics was investigated by Anscombe, who emphasized the equal significance of graphical visualization and quantitative statistics [7, 8]. The noticeable heterogeneity of his graph patterns conversely emphasizes the significance of data degeneracy. Nevertheless, his data generation method has been only partially studied, and to the best of our knowledge, the full mechanism is still unknown [11, 10], even though each dataset has only 11 pairs.

In principle, simple linear regression between two variables can be easily extended to multiple and non-linear regressions, which include several variables and their power-wise products, respectively. Regardless of the regression type, a regression method uses a single matrix to relate the input(s) and output(s), and the matrix elements consist of, in general, various products of input variables. To investigate relationships between highly correlated data, multiple matrices can be inserted between the input and output layers, and their elements can be calculated using various non-linear functions. Neural networks and machine learning [13, 14, 15] are some of the advanced methods in this category of data exploration.

Once a relationship is made, as either an empirical equation or a matrix form, the range of input variables often limits the applicability of the regression, leaving infinite degrees of degeneracy: there can be many combinations of input variables that provide the same output results. For preliminary tests of any new, advanced regression algorithm, it is necessary to have a data generator that can create manyfold datasets satisfying the same statistical constraints. In this light, this work revisits linear regression fundamentals, analyzes Anscombe's quartet data, and provides a possible algorithm to inversely create degenerate datasets of distinct values with predetermined statistical parameters.

## II Linear Regression Theory

We consider a linear model, such as

 y = β0 + β1x + ϵ (1)

where x and y are vectors of N (observed) elements for the independent and response (dependent) variables, respectively; ϵ is a vector of randomly distributed errors of zero mean and finite variance; and β0 and β1 are regression or fitting parameters, the so-called y-intercept and slope, respectively. Here, we define the regression function, such as

 Y = β0 + β1x (2)

that most closely fits the paired data (x, y) of size N. Here, statistically meaningful properties include the mean and variance of x, i.e., x̄ and σx², respectively; those of y, i.e., ȳ and σy², respectively; and the β1 parameter for the N paired points. The goodness of the regression is estimated using the coefficient of determination, denoted as R², defined as

 R² = ∑k(Yk − ȳ)² / ∑k(yk − ȳ)² = β1²Sxx/Syy = β1²σx²/σy² (3)

and

 β1 = Sxy/Sxx = σxy/σx² (4)

where σxy is the covariance between x and y; Sxx and Syy are the sums of squared residuals, i.e., Sxx = ∑k(xk − x̄)² and Syy = ∑k(yk − ȳ)², respectively; and Sxy is the sum of residual products, i.e., Sxy = ∑k(xk − x̄)(yk − ȳ). The magnitude and the sign of β1 are given as those of σxy/σx² and σxy, respectively. Given a paired dataset, the linear regression process indicates the calculation of the β0 and β1 values that minimize the error ϵ, and is often straightforward using various spreadsheet programs or numerical/statistical packages, such as Microsoft Excel, Google Sheets, MATLAB/Octave, Python, and R. In applied statistics disciplines, it is also important to generate manyfold datasets that accurately satisfy predetermined statistical properties for various testing and training purposes.
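As a concrete check of Eqs. (3) and (4), the three regression outputs can be computed in a few lines of plain Python; the helper name below is ours, not from any particular package.

```python
def regression_stats(x, y):
    """Least-squares line y = beta0 + beta1*x and its R^2, per Eqs. (3)-(4)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)        # sum of squared x-residuals
    syy = sum((yi - ybar) ** 2 for yi in y)        # sum of squared y-residuals
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    beta1 = sxy / sxx                              # Eq. (4)
    beta0 = ybar - beta1 * xbar                    # line passes through (xbar, ybar)
    r2 = beta1 ** 2 * sxx / syy                    # Eq. (3)
    return beta0, beta1, r2

# Points lying exactly on Y = 3 + 0.5x give R^2 = 1.
b0, b1, r2 = regression_stats([4.0, 9.0, 14.0], [5.0, 7.5, 10.0])
```

Any of the spreadsheet or statistical packages listed above returns the same three numbers for the same data.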

### Revisit to Anscombe’s Quartet

Anscombe’s original quartet, i.e., datasets I–IV, listed in Table 1, is visualized in Fig. 1, representing graphically distinct patterns of y with respect to x. A brief analysis of the quartet is as follows. Fig. 1(a) shows an apparently linear trend of dataset I, typical in studies of various disciplines. Fig. 1(b) (of circular symbols) shows a parabolic, concave-down trend of dataset II, having its peak at x = 11. Most points in datasets I and II are closely located near the linear trend line. On the other hand, Fig. 1(c) has a noticeable outlier above a linear line that passes through the vicinity of the remaining 10 points. Fig. 1(d) has a bimodal distribution of the 11 data points, i.e., a group of 10 points at one x-coordinate and one outlier away from the group. Interestingly, the four datasets of the distinct patterns have identical statistical properties, summarized in Table 2. In each dataset, the sample size is equally N = 11; the mean and variance of x are x̄ = 9.0 and σx² = 11.0, respectively; and those of y are ȳ ≈ 7.50 and σy² ≈ 4.13, respectively. (In Anscombe’s original work, the sums of squares Sxx and Syy are reported instead of the variances, as 110.0 and 41.25, respectively.) The regression statistics provide the same values of β0 = 3.0, β1 = 0.50, and R² ≈ 0.67, with acceptable errors. To the best of our knowledge, how Anscombe generated the data quartet has not been well explained in the literature. Because our goal is to create multiple degenerate datasets of the same statistical properties, we here investigate the characteristics of Anscombe’s quartet data in detail.
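These summary statistics can be verified directly. The script below (ours, standard library only) lists Anscombe's published values and reproduces the shared statistics of Table 2 within his rounding.

```python
import statistics

# Anscombe's published quartet: datasets I-III share the same x-vector.
anscombe = {
    "I":   ([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
            [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  ([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
            [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": ([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summary(x, y):
    """Means, sample variances, slope, and R^2 per Eqs. (3)-(4)."""
    n, xbar, ybar = len(x), statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    beta1 = sxy / sxx
    return xbar, sxx / (n - 1), ybar, syy / (n - 1), beta1, beta1 ** 2 * sxx / syy

stats = {name: summary(*xy) for name, xy in anscombe.items()}
```

All four rows agree on x̄ = 9, σx² = 11, ȳ ≈ 7.50, σy² ≈ 4.13, β1 ≈ 0.50, and R² ≈ 0.67.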

For a better understanding, we first sorted the datasets in Table 1 in ascending order of x and made Table 3. Note that the sequences of data pairs do not change the statistical results of the linear regression. For example, even if the first two points of dataset I in Table 1, i.e., (10, 8.04) and (8, 6.95), are exchanged to (8, 6.95) and (10, 8.04), the regression statistics of β0, β1, and R² remain invariant. In Table 3, it is noticed that datasets I–III have an evenly distributed x from 4 to 14 with a fixed interval of 1, and the outliers of datasets III and IV are located near the end of the regression line in x. Now, we explain how the x and y vectors were possibly generated, keeping the statistical constraints discussed above.

Figure 1: Plots of Anscombe’s four datasets with the linear regression line of Y = 3.0 + 0.50x, with the shape functions discussed in Section II.

### Constraints Applied

The 11 components of x in datasets I–III can be represented as xk = 3 + k for k = 1, 2, ⋯, N with N = 11, so that the k-th component is described as xk = xk−1 + 1, and the mean of x is calculated as

 x̄ = 3 + (N + 1)/2 (5)

that provides x̄ = 9 for N = 11. Now, one can use a more flexible relationship between xk and xk−1 by having an arbitrary interval a, such as

 xk = xk−1 + a  for k = 1, 2, ⋯, N (6)

and then the two parameters a and x0 can be determined by the preset constraints of x̄ and σx², such as

 a = σx√(6/(Nm)) (7)
 x0 = x̄ − am (8)

where m = (N + 1)/2 is the mid-point index. Here, we restrict ourselves to odd-N cases for simplicity. Substitution of N = 11, m = 6 (obtained from N), and σx² = 11 into Eqs. (7) and (8) results in a = 1 and x0 = 3, as shown in Table 3. On the other hand, dataset IV has a special set of x, containing only two values, denoted as xa and xb. Because the sequential indices of x do not influence any statistical analysis, the mean of x is written as

 x̄ = ((N − 1)xa + xb)/N (9)

and further

 (N − 1)δxa + δxb = 0 (10)

where δxi = xi − x̄ for i = a, b. Anscombe used the fixed value of Sxx = 110, which is represented below, using δxa and δxb, as

 Sxx = (N − 1)δxa² + δxb² = (N/(N − 1))δxb² (11)

using Eq. (10). Finally, we obtain (for N = 11)

 (xa, xb) = (9 ± 1, 9 ∓ 10) = (10, −1) or (8, 19) (12)

where the latter case of (xa, xb) = (8, 19) was chosen in Anscombe’s original work.
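Both x-constructions can be sketched in a few lines: Eqs. (7)–(8) for the evenly spaced x of datasets I–III and Eqs. (9)–(12) for the two-valued x of dataset IV. Function names below are our own.

```python
import math

def even_x(n, xbar, var_x):
    """Evenly spaced x-vector per Eqs. (7)-(8); odd N assumed, as in the text."""
    m = (n + 1) // 2                               # mid-point index
    a = math.sqrt(6.0 * var_x / (n * m))           # Eq. (7)
    x0 = xbar - a * m                              # Eq. (8)
    return [x0 + a * k for k in range(1, n + 1)]

def two_value_x(n, xbar, sxx):
    """The two (xa, xb) solutions of Eqs. (9)-(12) for a two-valued x-vector."""
    dxb = math.sqrt(sxx * (n - 1) / n)             # from Eq. (11)
    out = []
    for s in (+1, -1):
        xb = xbar + s * dxb
        xa = xbar - s * dxb / (n - 1)              # Eq. (10)
        out.append((xa, xb))
    return out

x_even = even_x(11, 9.0, 11.0)                     # Anscombe: 4, 5, ..., 14
x_pairs = two_value_x(11, 9.0, 110.0)              # (8, 19) and (10, -1)
```

With Anscombe's constraints, `even_x` reproduces x = 4, ..., 14 and `two_value_x` returns both degenerate pairs of Eq. (12).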

When a paired dataset is fitted on a straight line, the goodness of the linear regression is often estimated using the coefficient of determination of Eq. (3). Alternatively, the slope coefficient β1 can be set as the last constraint, in addition to x̄, σx², ȳ, and σy², requiring the minimum sample size of N = 3 to fully implement the six statistical constraints. In the next section, we discuss how to generate the three data points that hold the six statistical constraints.

### A minimum data set of three components

Let’s consider three consecutive values of x, with the predetermined constraints of x̄ and σx², to have

 xk = x0 + a·k  for k = −1, 0, 1 (13)

where x0 = x̄ and a = σx, so that Sxx = 2σx² and δxk = a·k. The following equations are obtained for y of three components, such as

 δy1 + δy2 + δy3 = 0 (14)
 δy1² + δy2² + δy3² = 2σy² (15)
 δx1(δy1 − δy3) = 2β1σx² (16)

using δyk = yk − ȳ and δxk = xk − x̄. The analytic solutions of δyk for k = 1 to 3 are obtained as

 δy2 = ±(2/√3)√(σy² − B1²) (17)
 δy1 = −δy2/2 − B1 (18)
 δy3 = −δy2/2 + B1 (19)

where B1 = β1σx²/a = β1σx.

Table 4 shows two sets of solutions, denoted as y(+) and y(−), while both satisfy all of the constraints indicated above. This degeneracy is due to the squared feature of the variance in Eq. (15). Fig. 2 shows the two sets of y versus x with N = 3, following the same Anscombe constraints, except for the sample size. Even with this smallest sample size for a linear regression, two possible cases of degenerate y’s co-exist, having identical statistical properties. A trivial case is that if σy² = B1², then δy2 = 0 and δy1 = −δy3 = −B1, so that y(+) and y(−) become identical, and y2 will be located on the regression line.

Figure 2: Three points satisfying the given linear regression, and the x and y means and variances.
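The two degenerate triplets of Eqs. (17)–(19) can be generated directly. The sketch below (our own helper, not the authors' code) uses Anscombe's constraints with N = 3: x̄ = 9, σx² = 11, ȳ = 7.5, σy² = 4.127, and β1 = 0.5.

```python
import math

def three_points(xbar, var_x, ybar, var_y, beta1):
    """Both N = 3 solutions of Eqs. (13)-(19) for the given six constraints."""
    a = math.sqrt(var_x)                   # Eq. (13): Sxx = 2a^2 = 2*var_x
    x = [xbar - a, xbar, xbar + a]
    b1c = beta1 * var_x / a                # B1 = beta1*sigma_x^2/a
    sets = []
    for sign in (+1, -1):
        dy2 = sign * 2.0 / math.sqrt(3.0) * math.sqrt(var_y - b1c ** 2)  # Eq. (17)
        dy1 = -0.5 * dy2 - b1c             # Eq. (18)
        dy3 = -0.5 * dy2 + b1c             # Eq. (19)
        sets.append([ybar + d for d in (dy1, dy2, dy3)])
    return x, sets

x3, (y_plus, y_minus) = three_points(9.0, 11.0, 7.5, 4.127, 0.5)
```

The two branches share identical means, variances, and slope, yet differ point by point, which is the degeneracy shown in Fig. 2.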

### Generation of degenerate datasets with constraints

#### Satisfying three constraints using three arbitrary points

After the x vector of size N is determined with the constraints of x̄ and σx², the other three constraints should be satisfied by the y vector, which include a finite ȳ and σy², alternately represented as

 ∑k δyk = 0 (20)

and

 ∑k δyk² = Syy = (N − 1)σy² (21)

respectively; and finally β1, defined as the ratio of the covariance between x and y to the variance of x, i.e., β1 = σxy/σx², such as

 ∑k δxk·δyk = (N − 1)σxy = (N − 1)σx²β1 (22)

Because the x vector is generated independently, all the constraints for an arbitrary N are satisfied by the creation of the y vector, having a degree of freedom of N − 3. In our approach, we generate an initial y vector (as a function of the x vector), having a specific pattern near the preset regression line of Eq. (2). Then we select the minimum, maximum, and mid-point of the x vector and adjust the three values of the corresponding y-components to satisfy the constraints of Eqs. (20)–(22). Assume that we already have a sorted x vector, i.e., xk ≤ xk+1 for k = 1, ⋯, N − 1, and have decided all yk, except y1, ym, and yN, where m is theoretically any index between 1 and N, i.e., 1 < m < N. For simplicity, the index of the mid-point can be used, such as

 m = (N + mod(N, 2))/2 = { N/2 if N is even; (N + 1)/2 if N is odd } (23)

where mod(N, 2) is the remainder when N is divided by 2, or simply

 m = floor[(N + 1)/2] (24)

which rounds off (N + 1)/2, especially for even N. The above three equations can be rewritten as

 δy1 + δyN = −∑′k δyk (25)
 δy1² + δyN² = −∑′k δyk² + Syy (26)
 δx1δy1 + δxNδyN = −∑′k δxkδyk + Sxxβ1 (27)

where ∑′k is defined as a summation over k, except the first and last indices. Combining Eqs. (25) and (27), we represent δy1 and δyN as linear functions of δym, such as

 δy1 = a1 + b1δym (28)
 δyN = aN + bNδym (29)

where

 a1 = [β1(N − 1)σx² + ∑′k,k≠m (δxN − δxk)·δyk] / (δx1 − δxN) (30)
 aN = [β1(N − 1)σx² + ∑′k,k≠m (δx1 − δxk)·δyk] / (δxN − δx1) (31)
 b1 = (δxN − δxm)/(δx1 − δxN) (32)
 bN = (δx1 − δxm)/(δxN − δx1) (33)

and substitute Eqs. (28) and (29) into (26) to derive a quadratic equation for δym, such as

 δym = −B ± √(s′yy + B² − C²) (34)

where

 s′yy = [(N − 1)σy² − ∑′k,k≠m δyk²] / (1 + b1² + bN²) (35)
 B = (a1b1 + aNbN)/(1 + b1² + bN²) (36)
 C² = (a1² + aN²)/(1 + b1² + bN²) (37)

In this case, two sets of δym, and hence of y, are generated, depending on the sign of the square-root term in Eq. (34). Furthermore, there is no mandatory condition that the first and last points should be included to meet the constraints. Instead, three arbitrary points within a dataset, e.g., yi, yj, and yk, can be selected as long as they are distinct, i.e., i ≠ j ≠ k. Nevertheless, if the x-positions of the three points are closely located, then large differences between their y-values are expected.
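The update of Eqs. (25)–(37) can be written out directly. The sketch below (our own notation, zero-based indexing) re-solves y at the first, mid-point, and last positions of a sorted x-vector so that the full dataset matches preset ȳ, σy², and β1; the `sign` argument selects the branch of Eq. (34) and hence one of the two degenerate solutions.

```python
import math

def adjust_three(x, y, ybar, var_y, beta1, sign=+1):
    """Re-solve y[0], y[m], y[-1] per Eqs. (25)-(37); other y's are kept."""
    n = len(x)
    m = (n + 1) // 2 - 1                            # zero-based mid-point, Eq. (24)
    xbar = sum(x) / n
    dx = [xi - xbar for xi in x]
    keep = [k for k in range(n) if k not in (0, m, n - 1)]
    dy = [yi - ybar for yi in y]                    # deviations from the target mean
    s1 = sum(dy[k] for k in keep)
    s2 = sum(dx[k] * dy[k] for k in keep)
    s3 = sum(dy[k] ** 2 for k in keep)
    sxx = sum(d * d for d in dx)
    syy = (n - 1) * var_y
    d1, dm, dn = dx[0], dx[m], dx[n - 1]
    a1 = (sxx * beta1 - s2 + dn * s1) / (d1 - dn)   # Eq. (30)
    an = (sxx * beta1 - s2 + d1 * s1) / (dn - d1)   # Eq. (31)
    b1 = (dn - dm) / (d1 - dn)                      # Eq. (32)
    bn = (d1 - dm) / (dn - d1)                      # Eq. (33)
    norm = 1.0 + b1 ** 2 + bn ** 2
    syy_p = (syy - s3) / norm                       # Eq. (35)
    bb = (a1 * b1 + an * bn) / norm                 # Eq. (36)
    cc = (a1 ** 2 + an ** 2) / norm                 # Eq. (37)
    dym = -bb + sign * math.sqrt(syy_p + bb ** 2 - cc)  # Eq. (34)
    out = list(y)
    out[0] = ybar + a1 + b1 * dym                   # Eq. (28)
    out[m] = ybar + dym
    out[n - 1] = ybar + an + bn * dym               # Eq. (29)
    return out

x = [float(v) for v in range(4, 15)]                # mean 9, variance 11
y0 = [3.0 + 0.5 * xi + 0.3 * (-1) ** k for k, xi in enumerate(x)]  # zigzag shape
y_up = adjust_three(x, y0, 7.5, 4.127, 0.5, +1)
y_dn = adjust_three(x, y0, 7.5, 4.127, 0.5, -1)
```

Both outputs keep eight of the initial points untouched and satisfy all three y-constraints exactly, while differing from each other.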

#### Selection of a shape function

In the previous sections, we discussed how to determine the components of the y vector, especially three points, assuming that the rest of the points are already properly located near the given regression line of Eq. (2). The predetermination of N − 3 points is to have equal numbers of constraints (3) and unknown points (3). Because N − 3 is also the degree of freedom of the pre-positioned y values, the number of patterns that the elements of y can make is theoretically infinite. In addition, having σy² as one of the constraints doubles the degeneracy of the created dataset. Here, we consider a function that determines the initial distribution of the N points, among which y1, ym, and yN are updated to meet the three statistical constraints (ȳ, σy², and β1). We name this function a shape function and discuss how shape functions are used in Anscombe’s datasets in the following.

##### Random distribution

In Fig. 1(a) for dataset I, five points (of indices 2, 5, 7, 8, and 11 in Table 3) are located very close to the regression line of Y = 3.0 + 0.50x: among the rest, half of them are above the regression line, and the other half are below it. The subset consisting of the closest five pairs of 2, 5, 7, 8, and 11 has a regression line of approximately Y = 3.24 + 0.47x with R² ≈ 0.996. In this case, the shape function of dataset I is a linear line, similar to the predetermined regression line plus random biases, such as

 fI(xk) = Ỹ(xk) + ηk(0, s) (38)

where Ỹ(xk) approximates the predetermined regression line Y(xk) of Eq. (2), and ηk(0, s) is a random vector, having normally distributed random components with zero mean and a finite variance, denoted as s². For dataset I, ηk should consist of 11 components, selected from a population of normally distributed random numbers, with a mean of zero and a standard deviation of s.

In Fig. 1(b), the parabolic pattern of dataset II is best fitted using a quadratic shape function

 fII(x) = q(x) = q0 + α(x − x*)² (39)

where x* is the x-position at the extremum of q(x) and α is the coefficient of the quadratic term. From Table 3, it is straightforward to find that x* = 11. By trial and error, we found α = −0.126 with q0 = 9.26. In addition, the flipped (degenerate) shape function is obtained, such as

 f*II(x) = 2Y(x) − q(x) = 5.74 + 0.126(x − 7)² (40)

and plotted using star symbols. In this case, the three constants of α, q0, and x* should be simultaneously determined to satisfy the three constraints. Eqs. (20), (21), and (22) are satisfied as follows:

 α = β1(N − 1)σx² / ∑k(xk − x̄)(xk − x*)² (41)
 q0 = ȳ − (α/N)∑k(xk − x*)² (42)
 σy² = α²∑k(xk − x*)⁴/(N − 1) − N(q0 − ȳ)²/(N − 1) ≡ σ*y² (43)

indicating that α, q0, and σ*y² are sequentially calculated as functions of x*, x̄, and ȳ with the predetermined constraint β1. Therefore, x* can be obtained by plotting

 Δσ² = σ*y² − σy² (44)

with respect to x* and graphically finding the roots of Δσ² = 0, as shown in Fig. 3. Here, α and q0 are calculated using Eqs. (41) and (42), respectively, using the visually found x* = 11. Similarly, for x* = 7, we obtained α = +0.126 and q0 = 5.74. These two parameter sets of α, q0, and x* are used to plot fII and f*II, shown in Fig. 1(b).

Figure 3: The difference between the calculated and predetermined variance of y, i.e., Δσ² (= σ*y² − σy²), as a function of x*. Two values of x* for Δσ² = 0 are found at approximately 7 and 11.
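The graphical root search of Eq. (44) can be replaced by a simple bisection. The sketch below (our own helpers) evaluates Eqs. (41)–(43) for a trial x* and locates the two roots near 7 and 11 under Anscombe's constraints.

```python
def quad_params(xs, x, beta1, ybar, var_y):
    """alpha and q0 from Eqs. (41)-(42), plus the variance mismatch of Eq. (44)."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    alpha = beta1 * sxx / sum((xi - xbar) * (xi - xs) ** 2 for xi in x)  # Eq. (41)
    q0 = ybar - alpha * sum((xi - xs) ** 2 for xi in x) / n              # Eq. (42)
    var_star = (alpha ** 2 * sum((xi - xs) ** 4 for xi in x)
                - n * (q0 - ybar) ** 2) / (n - 1)                        # Eq. (43)
    return alpha, q0, var_star - var_y                                   # Eq. (44)

def bisect(f, lo, hi, iters=80):
    """Plain bisection; assumes f changes sign on [lo, hi]."""
    flo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if flo * fmid <= 0.0:
            hi = mid
        else:
            lo, flo = mid, fmid
    return 0.5 * (lo + hi)

x = [float(v) for v in range(4, 15)]            # Anscombe's x-vector
dvar = lambda xs: quad_params(xs, x, 0.5, 7.5, 4.127)[2]
x_star_1 = bisect(dvar, 7.0, 8.0)               # root near 7 (flipped branch)
x_star_2 = bisect(dvar, 10.0, 11.0)             # root near 11 (dataset II)
```

At the second root the recovered parameters are α ≈ −0.126 and q0 ≈ 9.26, matching the trial-and-error values quoted above.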

In Fig. 1(c), a subset of 10 points, excluding the outlier at (13, 12.74), is aligned on a straight line, of which the regression line is calculated as

 f = β′0 + β′1x → fIII (45)

where β′0 ≈ 4.01, β′1 ≈ 0.345, and R′² ≈ 1.00, which can be considered as the shape function of dataset III. Compared to the given regression line of Eq. (2), fIII has a higher intercept and a gentler slope; so does the subset line of dataset I. A condition can be suggested, such as (β′0 − β0)(β′1 − β1) < 0, so that if the slope is steeper than β1, i.e., β′1 > β1, then the intercept is located below β0, and vice versa. After locating N points on or near the linear shape function of Eq. (45), one arbitrary point, such as (13, 12.74) in dataset III, can be made an outlier by changing its y value. In theory, relocating an outlier position cannot fully satisfy the three constraints. Instead, this outlier can be included as one of the three points used for the degenerate dataset’s creation by replacing the mid-point, i.e., ym. Furthermore, the first and last points can also be replaced by any two distinct points, if needed.

In Fig. 1(d) of dataset IV, a group of ten points is located at the same x-position, i.e., x = 8. These ten points have a mean of 7.0 and a variance of 1.527, which increase to the preset values of ȳ ≈ 7.50 and σy² ≈ 4.13, respectively, by including the outlier of (19, 12.50) that already satisfies y = Y(x). At x = 8, y can be modeled as normally distributed random numbers of zero mean and finite variance s², similar to Eq. (38), such as

 fIV(xk) = { Y(x8) + ηk(0, s) if k ≠ 11;  3 + 0.5x11 if k = 11 } (46)

Because the last point is fixed, any three points at x = 8 should be (randomly) selected and updated to meet the given constraints.

#### General algorithm

If a paired dataset shows a monotonic variation of y with respect to x, or vice versa, then the above-mentioned algorithms can be generalized and used to create degenerate datasets of the same constraints, as follows:

1. determine the six statistical parameters N, x̄, σx², ȳ, σy², and β1, and calculate β0;

2. make an x vector of N components, having the predetermined x̄ and σx²;

3. use a shape function to initialize the y-components near the given regression line, Y(x), of Eq. (2);

4. update the y-values of any three points, as needed, using Eqs. (28), (29), and (34) to satisfy the three constraints of ȳ, σy², and β1;

5. confirm that the generated dataset is characterized by the six parameters listed above.
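Steps 1–5 can be sketched end-to-end. This is our own minimal implementation, not the authors' code: x follows Eqs. (7)–(8), the initial y follows the random shape function of Eq. (38), and the first, mid, and last y-values are re-solved via Eqs. (28)–(37). The '+' branch of Eq. (34) is taken, and the noise level is assumed small enough that the discriminant stays non-negative.

```python
import math
import random

def make_dataset(n, xbar, var_x, ybar, var_y, beta1, spread=0.1, seed=7):
    assert n % 2 == 1                         # odd N, as in the text
    rng = random.Random(seed)
    m = (n + 1) // 2                          # mid-point index, Eq. (24)
    a = math.sqrt(6.0 * var_x / (n * m))      # Eq. (7)
    x = [xbar - a * m + a * k for k in range(1, n + 1)]   # Eqs. (6) and (8)
    beta0 = ybar - beta1 * xbar               # step 1
    y = [beta0 + beta1 * xi + rng.gauss(0.0, spread) for xi in x]   # Eq. (38)
    # step 4: re-solve y at the first, mid, and last indices (zero-based)
    dx = [xi - xbar for xi in x]
    keep = [k for k in range(n) if k not in (0, m - 1, n - 1)]
    dy = [yi - ybar for yi in y]
    s1 = sum(dy[k] for k in keep)
    s2 = sum(dx[k] * dy[k] for k in keep)
    s3 = sum(dy[k] ** 2 for k in keep)
    sxx, syy = (n - 1) * var_x, (n - 1) * var_y
    d1, dm, dn = dx[0], dx[m - 1], dx[n - 1]
    a1 = (sxx * beta1 - s2 + dn * s1) / (d1 - dn)         # Eq. (30)
    an = (sxx * beta1 - s2 + d1 * s1) / (dn - d1)         # Eq. (31)
    b1 = (dn - dm) / (d1 - dn)                            # Eq. (32)
    bn = (d1 - dm) / (dn - d1)                            # Eq. (33)
    norm = 1.0 + b1 ** 2 + bn ** 2
    bb = (a1 * b1 + an * bn) / norm                       # Eq. (36)
    cc = (a1 ** 2 + an ** 2) / norm                       # Eq. (37)
    dym = -bb + math.sqrt((syy - s3) / norm + bb ** 2 - cc)   # Eq. (34)
    y[0] = ybar + a1 + b1 * dym                           # Eq. (28)
    y[m - 1] = ybar + dym
    y[n - 1] = ybar + an + bn * dym                       # Eq. (29)
    return x, y

x, y = make_dataset(11, 9.0, 11.0, 7.5, 4.127, 0.5)
```

Every call with a different seed yields a different dataset carrying exactly the same six parameters, which is the inverse sampling this paper describes.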

In general, when nC constraints are implemented, N − nC points are determined using a shape function, and the rest of the points can be determined by analytically solving the constraint equations. The availability of analytic solutions becomes limited as the number of constraints increases. In this case, the problem can be described as a linear regression with constraint functions of

 gα(y†1, …, y†β, …, y†nC) = 0  for α, β = 1, …, nC (47)

where y†β is one of the nC components of y, chosen to satisfy the nC constraints. A general root-finding algorithm can be used to numerically find the y†β values.

## III Results and Discussions

### Inverse Sampling of Degenerate Datasets

The creation of one of the degenerate paired datasets requires six constraints: N, x̄, σx², ȳ, σy², and β1. For a given sample size N, the means and standard deviations of x and y determine their central locations and degrees of spread. If more than three points are considered, the degrees of degeneracy are theoretically infinite, and therefore one can create as many degenerate datasets as needed, disregarding their graphical similarities or dissimilarities. If a trend line is made by a linear regression of multiple datasets of the same size, then it is highly probable that the degenerate datasets created from the calculated trend line do not include the original datasets used for the linear regression, but instead can include unexpected forms of meaningful datasets.

In statistical physics, the importance sampling technique is frequently used for efficient Monte Carlo simulations [18, 19], which indicates sampling from only specific distributions that over-weigh the important region. In a microcanonical ensemble of a thermodynamic system, the sampling of particles’ positions and velocities is under a constant total energy E. Once particle positions are determined, a specific value of the kinetic energy is calculated as the total energy subtracted by the position-dependent potential energy, such as K = E − U, and particle velocities are randomly assigned and carefully adjusted to maintain the kinetic energy. There are many distinct configurations of velocities in the phase space that give the same kinetic energy value. This specific sampling is called inverse sampling, which is analogous to the present work that inversely calculates and samples datasets of specific statistical constraints.

### Paired datasets generated using shape functions

In this section, we create a few paired datasets having the same statistical properties as Anscombe’s quartet. A test shape function we employed is a fourth-order polynomial with respect to x, such as

 f(x) = Y(x) + f0(x − h1)(x − h2)(x − h3)(x − h4) (48)

where h1, h2, h3, and h4 are chosen slightly away from the (integer) x values; and f0 is a weight factor of the shape function, which is either 0 or ±√2 × 10⁻². The non-zero magnitude of f0 was selected by trial and error. If f0 = 0, then the predetermined regression line becomes the shape function itself. Linear regressions using the shape function are summarized in Fig. 4, as discussed below.

1. The shape functions of positive, negative, and zero f0 values are made and shown in Figs. 4(a), (b), and (c), respectively. Figs. 4(a) and (b) show smooth shape functions with opposite signs, and Fig. 4(c) shows the predetermined linear regression line as the shape function, so that the initial points are located on the regression line. Fig. 4(d) combines all the datasets and shape functions generated in (a)–(c) and shows the overall trend.

2. In each of Figs. 4(a)–(c), points initially on the shape function (filled circles) are randomly relocated to new neighboring positions (hollow diamonds) above or below the original positions.

3. Three vertical positions of yk for k = 1, m, and N are adjusted to two distinct groups (hollow circles and rectangles), both of which satisfy the given constraints using the algorithm discussed above.
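For reference, Eq. (48) can be written out directly. The root positions h1–h4 below are our own illustrative choices (the text only specifies that they sit slightly off the x values), with f0 = +√2 × 10⁻² as in Fig. 4(a).

```python
import math

def shape(x, f0, roots):
    """Fourth-order shape function of Eq. (48) around the line Y = 3 + 0.5x."""
    prod = 1.0
    for h in roots:
        prod *= (x - h)
    return 3.0 + 0.5 * x + f0 * prod

f0 = math.sqrt(2.0) * 1e-2
roots = (4.3, 7.6, 10.4, 13.7)        # assumed root positions
initial_y = [shape(float(k), f0, roots) for k in range(4, 15)]
```

At each root hi the shape function crosses the regression line, and f0 = 0 reduces f(x) to the line itself; the three-point update described above then restores the exact constraints.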

Since there are, in principle, an infinite number of available shape functions and multiple ways to locate data points near the shape functions, identifying data patterns seems to be a challenging task, even if the trend seems to follow a noticeable shape. Nevertheless, this work provides an in-depth analysis of Anscombe’s original work in terms of data creation, and a straightforward algorithm to reproduce his work, as well as to perform the inverse sampling of regressible datasets.

Figure 4: Created data using the shape function of Eq. (48) with (a) f0 = +√2 × 10⁻², (b) f0 = −√2 × 10⁻², and (c) f0 = 0: filled circles are points on the shape function; blank diamonds are randomly deviated from the shape functions; three filled diamonds are replaced by either hollow squares or circles, adjusted to meet the statistical constraints listed in Table 2; and (d) a collection of all hollow symbols in (a)–(c), where specific patterns become unnoticeable.

### Further Considerations

While x is predetermined independently, the three effective constraints provide the same number of closure equations, reducing the degrees of freedom of y from N to N − 3. It is worth noting that the degeneracy originates not only from the number of elements, but also from the squared form of σy², i.e., the expected value of (y − ȳ)², or the second central moment. Even if we have only three points (N = 3), two datasets are generated due to the intrinsic degenerate characteristics of the variance, as previously shown in Fig. 2. After N − 3 points are decided for a paired sample of size N, there are still two degrees of degeneracy, due to the squared feature of variances.

In addition to the three constraints discussed above, one can include more constraints, such as, but not limited to, the least magnitude (instead of the square) of errors, linear regression with perpendicular offsets, the Kolmogorov–Smirnov statistic, the Lagrange multiplier (LM) statistic, standardized skewness, standardized kurtosis, and the D-statistic. Each of these statistics can be used as an additional constraint to quantitatively identify statistical similarities. Adding or replacing some of the above-mentioned constraints to the standard constraints will create too many distinct datasets to visually compare. However, if the created datasets are tested for various indices, they can easily be classified into several groups of similarities.

Table 5 shows the third and fourth moments of the z-scores of x and y, denoted as μ3 and μ4, respectively. The j-th moment of a z-score z is defined as the mean value of z^j, i.e., μj = ⟨z^j⟩, and the third and fourth moments are called the standardized skewness and kurtosis, respectively. Because the x’s of datasets I–III are equally evenly distributed, their means, variances, skewnesses, and kurtoses are identical, which is not observed in dataset IV. The y-skewnesses of datasets I and II have negative values, indicating longer tails of y-values below ȳ. The larger magnitude of dataset II’s skewness than dataset I’s indicates that the quadratic shape function of dataset II locates more data points farther below the regression line than those of dataset I. The two largest y-skewnesses, of datasets III and IV, are ascribed to their outliers. The y-kurtosis values of the four datasets show a similar trend to that of the skewnesses, increasing from dataset I to dataset IV and following the trend of the absolute y-skewnesses. Including higher-order moments will require the same number of additional constraints to make multiple datasets statistically identical within the range of constraints applied.
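The z-score moments are easy to reproduce. The sketch below (our helper, standard library only) computes μ3 and μ4 for the published y-values of dataset I, whose skewness is slightly negative and whose kurtosis is well below 3.

```python
import statistics

def z_moment(values, j):
    """j-th moment of the z-scores: mu_j = mean(z^j), with sample (N-1) std."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return sum(((v - mean) / std) ** j for v in values) / len(values)

# Anscombe's dataset I y-values, as published
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
skew_y1 = z_moment(y1, 3)     # standardized skewness, mu_3
kurt_y1 = z_moment(y1, 4)     # standardized kurtosis, mu_4
```

Here the z-scores use the sample standard deviation (N − 1 denominator) while μj averages over N, matching the definition above.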

## IV Conclusion

Testing the similarity or identicalness of two datasets from distinct origins is a ubiquitously important issue in statistics, applicable to various studies. When a paired dataset is linearly regressed, the trend line indicates the degree of correlation of how the response variable y depends on the independent variable x, assuming that one is a cause and the other is an effect, or vice versa. In reality, it is rare to have two or more visually different datasets that provide an identical regression equation. On the other hand, a given regression equation can interpret or explain a number of datasets from various sources. Here, we recognized that a robust method to sample many degenerate datasets satisfying given constraints is of great necessity, not only in advanced data sciences and applications, but also in applied statistics education at the college and graduate levels. In this work, we presented an algorithm to sample many degenerate datasets having the identical six constraints used for a linear regression. Our method is extendable to an arbitrary number of constraints, including higher-order statistical moments, to create statistically closer datasets.

## References

•  Frank J. Massey. The Kolmogorov-Smirnov Test for Goodness of Fit. Journal of the American Statistical Association, 46(253):68–78, 1951.
•  Francis Galton. Kinship and Correlation. Statistical Science, 4(2):419–431, 1989.
•  Peter J. Ireland, Andrew D. Bragg, and Lance R. Collins. The Effect of Reynolds Number on Inertial Particle Dynamics in Isotropic Turbulence. Part 2. Simulations with Gravitational Effects. Journal of Fluid Mechanics, 796:659–711, 2016.
•  G. Udny Yule. On the Theory of Correlation. Journal of the Royal Statistical Society, 60(4):812–854, 1897.
•  Karl Pearson. The Law Of Ancestral Heredity. Biometrika, 2(2):211–228, 1903.
•  F. J. Anscombe. Graphs in Statistical Analysis. The American Statistician, 27(1):17–21, 1973.
•  R. Dennis Cook and Sanford Weisberg. Graphs in Statistical Analysis: Is the Medium the Message? The American Statistician, 53(1):29–37, 1999.
•  Guillaume A. Rousselet, Cyril R. Pernet, and Rand R. Wilcox. Beyond differences in means: robust graphical methods to compare two groups in neuroscience. European Journal of Neuroscience, 46(2):1738–1748, 2017.
•  R. Dennis Cook. Detection of Influential Observation in Linear Regression. Technometrics, 19(1):15–18, 1977.
•  Lori L. Murray and John G. Wilson. Generating data sets for teaching the importance of regression analysis. Decision Sciences Journal of Innovative Education, 19(2):157–166, 2021.
•  Edward Schneider and Corey Dineen. Adding a dimension to Anscombe’s quartet: Open source, 3-D data visualization. Proceedings of the American Society for Information Science and Technology, 50(1):1–3, 2013.
•  Reginald Smith. A mutual information approach to calculating nonlinearity. Stat, 4(1):291–303, 2015.
•  Ahmad Hosseinzadeh, Mansour Baziar, Hossein Alidadi, John L. Zhou, Ali Altaee, Ali Asghar Najafpoor, and Salman Jafarpour. Application of artificial neural network and multiple linear regression in modeling nutrient recovery in vermicompost under different conditions. Bioresource Technology, 303:122926, 2020.
•  Faezehossadat Khademi, Mahmoud Akbari, Sayed Mohammadmehdi Jamal, and Mehdi Nikoo. Multiple linear regression, artificial neural network, and fuzzy logic prediction of 28 days compressive strength of concrete. Frontiers of Structural and Civil Engineering, 11(1):90–99, 2017.
•  C. L. Lin, J. F. Wang, C. Y. Chen, C. W. Chen, and C. W. Yen. Improving the generalization performance of RBF neural networks using a linear regression technique. Expert Systems with Applications, 36(10):12049–12053, 2009.
•  Noam Shoresh and Bang Wong. Data exploration. Nature Methods, 9(1):5–5, 2012.
•  Max Halperin. On Inverse Estimation in Linear Regression. Technometrics, 12(4):727–736, 1970.
•  Michael P. Allen and Dominic J. Tildesley. Computer Simulation of Liquids, volume 1. Oxford University Press, 2017.
•  Jim C. Chen and Albert S. Kim. Monte Carlo Simulation of Colloidal Membrane Filtration: Principal Issues for Modeling. Advances in Colloid and Interface Science, 119(1):35–53, 2006.
•  Harold W. Schranz, Sture Nordholm, and Gunnar Nyman. An efficient microcanonical sampling procedure for molecular systems. The Journal of Chemical Physics, 94(2):1487–1498, 1991.
•  Jorge H. B. Sampaio. An iterative procedure for perpendicular offsets linear least squares fitting with extension to multiple linear regression. Applied Mathematics and Computation, 176(1):91–98, 2006.
•  T. S. Breusch and A. R. Pagan. A Simple Test for Heteroscedasticity and Random Coefficient Variation. Econometrica, 47(5):1287, 1979.
•  Xavier X. Sala-I-Martin. I Just Ran Two Million Regressions. The American Economic Review, 87(2):178–183, 1997.