# Scientific versus statistical modelling: a unifying approach

This paper addresses two fundamental features of quantities modelled and analysed in statistical science: their dimensions (e.g. time) and measurement scales (units). Examples show that subtle issues can arise when dimensions and measurement scales are ignored. Special difficulties arise when the models involve transcendental functions. One transcendental function important in statistics is the logarithm, which is used in likelihood calculations and is the singular case in the family of Box–Cox algebraic transformations. Yet neither the argument of the logarithm nor its value can have units of measurement. Physical scientists have long recognized that dimension/scale difficulties can be side-stepped by nondimensionalizing the model; after all, models of natural phenomena cannot depend on the units by which the phenomena are measured, and the celebrated Buckingham Pi theorem is a consequence of this fact. The paper reviews that theorem, recognizing that the statistical invariance principle arose with similar aspirations. However, the potential relationship between the theorem and statistical invariance was not investigated until very recently. The main result of the paper is an exploration of that link, which leads to an extension of the Pi theorem that puts it in a stochastic framework and thus quantifies uncertainties in deterministic physical models.


## 1 Introduction

Many important discoveries in science have been expressed as deterministic models derived from scientific principles, such as the famous laws of physics. Interestingly, such models have also been studied as an abstract class, without reference to the specific applications that led to their creation. One such abstract approach is formulated in terms of the scales of measurement appearing in the model. This approach is the celebrated work of Buckingham and, later, Bridgman (see Section 7).

Another abstract approach is in statistics, stemming from Karl Pearson's establishment of mathematical statistics in the latter part of the nineteenth century (Magnello, 2009). Although statistical models, just like scientific models, had already been developed for specific contexts, Pearson recognized the benefit of studying models more abstractly, as mathematical objects devoid of demanding contextual complexities. In doing so, he recognized the need to incorporate model uncertainty expressed probabilistically, the need to define desirable model properties and to determine the conditions under which these occur, and finally, for applications, the need for practical tools to implement models that possess those properties. Pearson is credited with hundreds of publications over his lifetime, many in statistics. Karl Pearson's work paved the way for Fisher, Neyman, Egon Pearson, Wald, de Finetti, Savage, Lindley and many others to develop statistics as a scientific discipline in its own right. One consequence of this work is the formulation of the invariance principle under transformations of scales, proposed by Hunt and Stein in unpublished work (Lehmann and Romano, 2010, Chapter 6) and discussed here in Section 6.

The link between the work on scales of measurement and the statistical invariance principle does not seem to have been recognized until the work of Shen and his coinvestigators (Shen et al., 2014; Shen, 2015). Their work will be extended in this paper, which will critically review issues surrounding the topics of dimensions and measurement scales. These issues have assumed added importance due to the large size of datasets, with the ensuing reliance on machine learning methods and the growing development of artificial intelligence. Using abstract concepts to implement and analyze models in contexts where human intelligence cannot play a role leads to challenging new issues.

To begin, statisticians often write a symbol like $X$ and mean a number to be manipulated in a formal analysis in equations, models and transformations. In contrast, scientists will see the symbol as representing some specific aspect of a natural phenomenon or process to be characterized through a combination of basic principles and empirical analysis. The latter would lead to the specification of one or more "dimensions" of $X$, e.g. length. That would then lead to the need to specify an appropriate "scale" for $X$, e.g. categorical, ordinal, interval or ratio, depending on how the characterization is to be done. Finally, for interval and ratio scales, $X$ would have some associated units of measurement, depending on the nature and resolution of the device making the measurement. How all of these parts of $X$ fit together is the subject addressed in the realms of measurement theory and dimensional analysis (DA). While much has been written in this area by nonstatisticians, surprisingly little has been written by statisticians, exceptions being found in Finney (1977) and Hand (1996). Hand considers the much broader area of measurement theory, studying what things can be measured and how numbers can be assigned to measurements. These broad considerations are beyond the scope of this paper. Our paper focusses on the concept of modelling and the importance of the functions and scales that we choose. However, we note that, due to the character of computation, in the end our scales are always discrete and our functions are always algebraic, i.e. solutions of polynomial equations. These facts do not diminish the importance of our considerations.

This paper considers issues that arise specifically when $X$ lies on an interval scale, with values on the entire real line, and when $X$ lies on a ratio scale, that is, with non-negative values and with $0$ having a meaning of "nothingness". A good example to keep in mind is the dimension of "temperature"; it can be measured on the interval scale of Celsius in units of degrees Celsius (°C) or on the ratio scale of Kelvin in units of kelvin (K), with $0$ K being taken as an absolute zero, unlike $0$°C. We will see why, in developing statistical models, dimensions, scales and units cannot be ignored. In fact, Sections 2, 3, and 4 provide examples of results that range from meaningless (e.g. in the calculation of the maximum likelihood estimate) to incoherent (in least squares modelling). These examples are discussed within a review of basic concepts in dimensional analysis, scales and units. In particular we explore the subjects of quantity calculus and the role of units in dimensional homogeneity.

We next review the basic elements of deterministic scientific modelling and their relevance to statistical modelling (see Section 5). In the review, we see the celebrated contributions of the engineering scientist Edgar Buckingham (Buckingham, 1914), the physical scientist Percy Bridgman (Bridgman, 1931) and the social scientist Luce (Luce, 1959).

The paper’s major contribution lies in its connecting that work in deterministic modelling with the corresponding work on stochastic modelling in statistics embraced by the invariance principle that appeared in unpublished work of Hunt and Stein, more than half a century ago. A connection was recognized in Shen et al. (2014) and Shen (2015), although our approach to creating that linkage is different and more general (Section 6). In fact, in the most general version of our approach, we propose Bayesian modelling by letting the quantities include uncertain population parameters as well as random effects in items sampled from the population of interest. Uncertainty quantification thus becomes a natural byproduct of modelling natural phenomena in the physical and social sciences. Overall, the result is a unified approach that combines the uncertainty of statistics with the determinism of the classical fields of modeling natural phenomena.

In summary, the paper reviews the relationship between dimensional analysis and statistical modelling. To begin with, Section 2 presents examples of the problems that can arise when a statistician ignores the units of measurement. Overcoming these difficulties requires a knowledge of quantity calculus, the subject of Section 3; this is the algebra of units of measurement and dimensional homogeneity – the latter ensures, for example, that the units on both sides of an equation match. Statisticians often transform variables, e.g. by the Box–Cox transformation; in Section 4 we see some issues that arise in doing so. Sometimes the scales are changed unconsciously, e.g. when a Gaussian distribution on $(-\infty, \infty)$ is adopted as an approximation to a distribution on $[0, \infty)$ – this can matter a lot when populations with responses on a ratio scale are being compared. It turns out that when restricted by the need for dimensional homogeneity, the class of models relating the $X$'s is also restricted; that topic is explored in Section 5. Section 6 brings us to the main contributions of the paper through the application of the invariance principle. The extended invariance principle is applied in Section 7; that section incorporates what statisticians call parameters into modelling and finally links statistical modelling with scientific modelling. The paper wraps up with some concluding remarks in Section 8.

## 2 The unconscious statistician

We present three examples that illustrate some of the concepts we’ll be exploring. The first example illustrates how we often make meaningless statements. The second example, involving the likelihood function, illustrates the problems associated with taking logarithms of quantities with units. The third example, about linear regression, illustrates the importance of the invariance of a model under transformations and the usefulness of having a model defined in terms of unitless quantities.

###### Example 1.

Ignoring scale and units of measurement when creating models can lead to difficulties; we cannot ignore the distinction between numbers and measurements. Consider a Poisson random variable $X$. The claim is often made that the expected value and variance of $X$ are equal. But if $X$ has units, as it did when the distribution was first introduced in 1898 to model the number of horse kick deaths in a year in the Prussian army (Härdle and Vogt, 2015), then clearly the expectation and variance will have different units and therefore cannot be equated.

###### Example 2.

Consider a random variable representing length in millimetres, distributed as $N(\mu, \sigma^2)$ and independently measured $n$ times to yield data $y_1, \ldots, y_n$. Assume, as is common, that $\mu$ is so large relative to $\sigma$ that there is a negligible chance that any of the $y_i$'s are negative (we return to this common assumption in Section 4).

Then the maximum likelihood estimate (MLE) of $\mu$ is easily shown to be the sample average $\bar{y}$ and the MLE of $\sigma^2$ is the maximizer of

$$L(\sigma^2) = (\sigma^2)^{-n/2} \exp\{-n\tilde{\sigma}^2/(2\sigma^2)\},$$

where $\tilde{\sigma}^2 = \sum_{i=1}^n (y_i - \bar{y})^2/n$ has units mm$^2$ and the units of measurement of $\sigma^2$ are also mm$^2$. The MLE of $\sigma^2$ is easily found by directly differentiating $L$ with respect to $\sigma^2$ and setting the result equal to zero. The MLE is $\hat{\sigma}^2 = \tilde{\sigma}^2$ mm$^2$. We note that, by any sensible definition of unit arithmetic, $\exp\{-n\tilde{\sigma}^2/(2\sigma^2)\}$ is unitless and so the units of $L(\sigma^2)$ are mm$^{-n}$.

Of course, the computation would be simpler if we were to maximize the logarithm of $L$, as statisticians commonly do. But this causes some conceptual problems. Using one of the basic properties of the logarithm that flows from its definition, that $\log(ab) = \log(a) + \log(b)$ for any positive $a$ and $b$, we see that the log of $L(\sigma^2)$ is equal to

$$\ell(\sigma^2) = \log\left[(\sigma^2)^{-n/2} \exp\{-n\tilde{\sigma}^2/(2\sigma^2)\}\right] = -\frac{n}{2}\left[\log(\sigma^2) + \tilde{\sigma}^2/\sigma^2\right]. \tag{2.1}$$

But to be meaningful, each of the two terms $\log(\sigma^2)$ and $\tilde{\sigma}^2/\sigma^2$ must have the same units of measurement. So, since $\tilde{\sigma}^2/\sigma^2$ is unitless, $\log(\sigma^2)$ must be unitless. But $\sigma^2$ has units mm$^2$, and it is unsettling to have the units disappear simply by taking the logarithm. The problem of logarithms and units is discussed further in Subsection 4.3, indicating that calculating the logarithm of the likelihood is, in general, not sensible.

An alternative approach is to recognize that the MLE can be calculated by maximizing functions other than the likelihood. In our example, we can scale the normal likelihood by dividing it by a reference normal likelihood with $\sigma^2$ set to a substantively meaningful $\sigma_0^2$. We would then calculate the MLEs of $\mu$ and $\sigma^2$ by maximizing this scaled likelihood. This leads us again to $\bar{y}$ as the MLE of $\mu$, but now the MLE of $\sigma^2$ is found by maximizing the unitless $L^*$:

$$\frac{L(\sigma^2)}{L(\sigma_0^2)} \equiv L^*(\sigma^2/\sigma_0^2) = \left(\frac{\sigma^2}{\sigma_0^2}\right)^{-n/2} \exp\left\{-\frac{n\tilde{\sigma}^2}{2\sigma_0^2}\left[\left(\frac{\sigma^2}{\sigma_0^2}\right)^{-1} - 1\right]\right\}.$$

We can now maximize this ratio as a function of the unitless $\sigma^2/\sigma_0^2$ by taking logarithms, differentiating with respect to $\sigma^2/\sigma_0^2$, setting the result equal to 0 and solving; the maximizer is $\tilde{\sigma}^2/\sigma_0^2$, and so $\hat{\sigma}^2 = \tilde{\sigma}^2$ mm$^2$.
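The claim that the unitless scaled likelihood recovers the usual MLE is easy to check numerically. The sketch below is illustrative only: the sample size, mean and spread of the simulated lengths are our own assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample of lengths in mm (mean, spread and sample size
# are assumptions made purely for illustration).
y = rng.normal(loc=500.0, scale=20.0, size=1000)
n = len(y)
sigma2_tilde = np.mean((y - y.mean()) ** 2)   # units mm^2

sigma2_0 = 1.0   # reference variance (1 mm^2) makes the ratio unitless

def log_L_star(r):
    # log of the unitless scaled likelihood L*(r), with r = sigma^2 / sigma2_0
    return -0.5 * n * (np.log(r) + (sigma2_tilde / sigma2_0) * (1.0 / r - 1.0))

# maximize over a grid of unitless ratios around sigma2_tilde / sigma2_0
r = np.linspace(0.5, 2.0, 20001) * (sigma2_tilde / sigma2_0)
r_hat = r[np.argmax(log_L_star(r))]
sigma2_hat = r_hat * sigma2_0   # convert the unitless maximizer back to mm^2

# the maximizer of the unitless ratio reproduces the usual MLE
assert abs(sigma2_hat - sigma2_tilde) / sigma2_tilde < 1e-3
```

Note that the logarithm is applied only to the unitless ratio, so no units are lost in the computation.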

###### Example 3.

In this example, things don't go so well for an unconscious statistician who ignores units. Here, the data follow a model that relates $Y$, a length, to $t$, a time:

$$Y_i = 1 + \theta t_i + \epsilon_i, \qquad i = 1, \ldots, 2n.$$

Here the $\epsilon_i$'s are independent and identically distributed as $N(0, \sigma^2)$ for a known $\sigma$. Suppose that $t_i = 1$ hour for $i = 1, \ldots, n$ while $t_i = 2$ hours for $i = n+1, \ldots, 2n$. Let $\bar{Y}_1 = \sum_{i=1}^n Y_i/n$ and $\bar{Y}_2 = \sum_{i=n+1}^{2n} Y_i/n$. An analysis might go as follows when two statisticians A and B get involved.

First they both compute the likelihood and learn that the maximum likelihood estimate is found by minimizing the sum of squares $L(\theta)$:

$$L(\theta) = \sum_{i=1}^{2n} \left[Y_i - 1 - \theta t_i\right]^2.$$

Setting

$$\frac{dL(\theta)}{d\theta} = \sum_{i=1}^{2n} -2 t_i \left[Y_i - 1 - \theta t_i\right] = 0,$$

they find the MLE of $\theta$ to be

$$\hat{\theta} = \frac{\sum_{i=1}^{2n} t_i (Y_i - 1)}{\sum_{i=1}^{2n} t_i^2} = \frac{n\bar{Y}_1 + 2n\bar{Y}_2 - 3n}{5n} = \frac{\bar{Y}_1 + 2\bar{Y}_2 - 3}{5}.$$

Then for prediction at $t = 1$ hour, they get

$$\hat{Y} = 1 + \hat{\theta} \times 1 = 1 + \frac{\bar{Y}_1 + 2\bar{Y}_2 - 3}{5}.$$

Suppose that $\bar{Y}_1 = 1$ foot, or 12 inches, and $\bar{Y}_2 = 3$ feet, or 36 inches. Statistician A uses feet and predicts $Y$ at time $t = 1$ hour to be

$$\hat{Y}_A = 1 + \frac{1 + 2 \times 3 - 3}{5} = 1.8 \text{ feet} = 21.6 \text{ inches}.$$

But Statistician B uses inches and predicts $Y$ at $t = 1$ hour to be

$$\hat{Y}_B = 1 + \frac{12 + 2 \times 36 - 3}{5} = 17.2 \text{ inches}.$$

What has gone wrong here? The problem is that the stated model implicitly depends on the units of measure. For instance, the numerical value of the expectation of $Y$ when $t = 0$ is equal to 1, no matter what the units of $Y$: when $t = 0$, Statistician A expects $Y$ to equal 1 foot while Statistician B expects $Y$ to equal 1 inch. In technical terms, we would say that this model is not invariant under scalar transformations. Invariance is important when defining a model that involves units. However, one could simply avoid the whole problem of units in model formulation by constructing the relationship between $Y$ and $t$ so that there are no units. This is exactly the goal of the Buckingham Pi theorem, presented in Subsection 5.1.
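The conflicting calculations above are easy to reproduce. In the short sketch below, the numbers carry no units at all, which is precisely how the trouble arises:

```python
# The model Y = 1 + theta*t is fitted twice with the same data,
# once with lengths recorded in feet and once in inches.

def mle_prediction(y_bar_1, y_bar_2):
    """theta_hat = (Ybar1 + 2*Ybar2 - 3)/5; returns the prediction at t = 1 hour."""
    theta_hat = (y_bar_1 + 2.0 * y_bar_2 - 3.0) / 5.0
    return 1.0 + theta_hat

pred_feet = mle_prediction(1.0, 3.0)       # A: Ybar1 = 1 ft, Ybar2 = 3 ft
pred_inches = mle_prediction(12.0, 36.0)   # B: same data, expressed in inches

# 21.6 inches versus 17.2 inches: the model is not scale invariant
assert abs(pred_feet * 12.0 - 21.6) < 1e-9
assert abs(pred_inches - 17.2) < 1e-9
```

The discrepancy traces back to the unitless constant 1 in the model, whose meaning silently changes with the chosen unit of length.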

## 3 Dimensional analysis

Key to unifying the work on scales of measurement and the statistical invariance principle is dimensional analysis (DA), a subject taught in the physical sciences but rarely in statistics. Dimensional analysis has a long history, beginning with early discussions of dimension and measurement (Fourier, 1822). Since DA is key to the description of a natural phenomenon, DA lies at the root of scientific modelling. A phenomenon's description begins with the phenomenon's features, each of which has a dimension, e.g. 'mass' ($M$) in physics or 'utility' ($U$) in economics. Each dimension is assigned a scale, e.g. 'categorical', 'ordinal', 'ratio', or 'interval', a choice that might be dictated by practical as well as intrinsic considerations. Once the scales are chosen, each feature is mapped into a point on its scale. For a quantitative scale, the mapping will be made by measurement or counting; for a qualitative scale, by assignment of classification. Units of measurement may be assigned as appropriate for quantitative scales, depending on the metric chosen. For example, temperature might be measured on the Fahrenheit scale or on the Celsius scale. This paper will be restricted to quantitative features, more specifically those features on ratio and interval scales.

### 3.1 Dimensional homogeneity

One tenet of working with measured quantities is that units in an expression or equation must "match up"; relationships among measurable quantities require dimensional homogeneity. To check the validity of comparative statements about, say, $X$ and $Y$, such as $X < Y$ or $X = Y$, $X$ and $Y$ must have the same dimension, such as time. In addition, to add $X$ to $Y$, $X$ and $Y$ must also be on the same scale and expressed in the same units of measurement.

To discuss this explicitly, we use a standard notation (JCGM, 2012) and write a measured quantity $X$ as $X = \{X\}\,[X]$, where $\{X\}$ is the numerical part of $X$ and $[X]$ is the unit of measure. For instance, 12 feet $= \{12\}$ [feet]. Here $[X]$ represents the units of measurement for a dimension in a specific sense. But $[\,\cdot\,]$ can also represent units of measurement in a generic sense. For instance, for the dimension length, denoted $L$, $[L]$ serves as information that the dimension has some units of length.

To develop an algebra for measured quantities, for a function $f$ we must say what we mean by $\{f(X)\}$ (usually easy) and $[f(X)]$ (sometimes challenging). The path is clear for a simple function. For example, consider $f(x) = x^2$. Clearly we must have $[X^2] = [X]^2$, yielding, say, $(3 \text{ inches})^2 = \{9\}$ inches$^2$. But what if $f$ is a more complex function? This issue will be discussed in general in Subsection 4.2 and in detail for the logarithm in Subsection 4.3.

For simple functions, the manipulation of both numbers and units is governed by an algebra of rules referred to as quantity calculus. This set of rules states that quantities $X = \{x\}[u]$ and $Y = \{y\}[v]$

• can be added, subtracted or compared if and only if $[u] = [v]$;

• can always be multiplied to get $XY = \{xy\}[uv]$;

• can always be divided when $\{y\} \neq 0$ to get $X/Y = \{z\}[w]$ where $\{z\} = \{x\}/\{y\}$ and $[w] = [u]/[v]$;

and that a quantity $X$

• can be raised to a power that is a rational fraction $p/q$, provided that the result is not an imaginary number, to get $X^{p/q} = \{x^{p/q}\}[u^{p/q}]$.

Thus it makes sense to transform an ozone concentration of $\{x\}$ parts per million (ppm) as $\{x^{1/2}\}$ ppm$^{1/2}$, since ozone is measured on a ratio scale with a true origin of 0 and hence $\{x\}$ must be non-negative (Dou et al., 2007).

These rules can be applied iteratively a finite number of times to get expressions that are combinations of products of quantities raised to powers, along with sums and rational functions of such expressions.
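The rules above are mechanical enough to encode directly. Below is a minimal, illustrative Python sketch, not from the paper: the class name and the representation of units as a dictionary of rational exponents are our own devices.

```python
from fractions import Fraction

class Quantity:
    """A measured quantity {x}[u]: a numerical part plus units kept as
    a dict mapping unit name -> rational exponent, e.g. {"mm": 2}."""

    def __init__(self, value, units=None):
        self.value = value
        self.units = {k: Fraction(e) for k, e in (units or {}).items() if e != 0}

    def __add__(self, other):
        # addition (and subtraction, comparison) only for matching units
        if self.units != other.units:
            raise ValueError("cannot add quantities with different units")
        return Quantity(self.value + other.value, self.units)

    def __mul__(self, other):
        # multiplication is always permitted; unit exponents add
        u = dict(self.units)
        for k, e in other.units.items():
            u[k] = u.get(k, Fraction(0)) + e
        return Quantity(self.value * other.value, u)

    def __truediv__(self, other):
        # division when the divisor's numerical part is nonzero
        inv = Quantity(1.0 / other.value, {k: -e for k, e in other.units.items()})
        return self * inv

    def __pow__(self, p):
        # rational powers act on the numerical part and the units alike
        p = Fraction(p)
        return Quantity(self.value ** float(p),
                        {k: e * p for k, e in self.units.items()})

x = Quantity(3.0, {"mm": 1})
area = x * x      # {9.0}[mm^2]
ratio = x / x     # unitless: units == {}
```

Attempting `x + Quantity(1.0, {"s": 1})` raises an error, mirroring the first rule; squaring and square-rooting track exponents exactly as in the fourth.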

This subsection concludes with examples that demonstrate the use of dimensional homogeneity and quantity calculus.

###### Example 4.

This example concerns a structural engineering model for lumber strength now called the "Canadian model" (Foschi and Yao, 1986). Here $\alpha(t)$ is dimensionless and represents the somewhat abstract quantity of the damage accumulated in a piece of lumber by time $t$. When $\alpha(t) = 1$, the piece of lumber breaks. This is the only time at which $\alpha$ is observed. The Canadian model posits that

$$\dot{\alpha}(t) = a\left[\tau(t) - \sigma_0\tau_s\right]_+^b + c\left[\tau(t) - \sigma_0\tau_s\right]_+^n\,\alpha(t) \tag{3.1}$$

where $a$, $b$, $c$ and $n$ are log-normally distributed random effects for an individual specimen of lumber; $\tau(t)$, measured in pounds per square inch (psi), is the stress applied to the specimen cumulative to time $t$; $\tau_s$ (in psi) is the specimen's short term breaking strength if it had experienced a stress pattern increasing at a fixed, known rate (in psi per unit of time); and $\sigma_0$ is the unitless stress ratio threshold. The expression $[x]_+$ is equal to $x$ if $x$ is non-negative and is equal to 0 otherwise. Let $T$ denote the random time to failure for the specimen under the specified stress history curve, meaning $\alpha(T) = 1$.

As has been noted (Köhler and Svensson, 2002; Hoffmeyer and Sørensen, 2007; Zhai et al., 2012; Wong and Zidek, 2018), this model is not dimensionally homogeneous. In particular, the units associated with the two terms on the right hand side of the model involve random powers, $b$ and $n$, leading to random units, respectively (psi)$^b$ and (psi)$^n$. As noted by Wong and Zidek (2018), the coefficients $a$ and $c$ in (3.1) cannot involve these random powers and so cannot compensate to make the model dimensionally homogeneous.

Rescaling is a formal way of addressing this problem. Zhai et al. (2012) rescale the stress by setting $\pi(t) = \tau(t)/\tau_s$. They let $\mu$ denote a population mean carrying the units of time and write a modified (3.1) as the dimensionally homogeneous model

$$\mu\dot{\alpha}(t) = a^*\left[\pi(t) - \sigma_0\right]_+^b + c^*\left[\pi(t) - \sigma_0\right]_+^n\,\alpha(t).$$

In contrast, Wong and Zidek (2018) propose the modified dimensionally homogeneous model

$$\begin{aligned} \mu\dot{\alpha}(t) &= \left[(\tilde{a}\tau_s)\left(\tau(t)/\tau_s - \sigma_0\right)_+\right]^b + \left[(\tilde{c}\tau_s)\left(\tau(t)/\tau_s - \sigma_0\right)_+\right]^n\,\alpha(t) \\ &= \left[(\tilde{a}\tau_s)\left(\pi(t) - \sigma_0\right)_+\right]^b + \left[(\tilde{c}\tau_s)\left(\pi(t) - \sigma_0\right)_+\right]^n\,\alpha(t), \end{aligned}$$

where $\tilde{a}$ and $\tilde{c}$ are now random effects with units equal to Force$^{-1}$Length$^2$, so that $\tilde{a}\tau_s$ and $\tilde{c}\tau_s$ are unitless.

We see that there may be several ways to non-dimensionalize a model. Another method, widely used in the physical sciences, involves normalizing by the standard units specified by the Système International d'Unités (SIU), units such as metres or kilograms. So when a non-negative quantity like absolute temperature $\theta$ has an associated SIU, the kelvin (K), $\theta$ can be converted to a unitless quantity by first expressing $\theta$ in SIUs and then using quantity calculus to rescale it as $\theta/(1 \text{ K})$.

###### Example 5.

The units of parameters in a relationship can be determined by dimensional homogeneity analysis, as we see in the following simple example from Gibbings (2011). Here the model relates the area $a$ of a square to the length $l$ of its edge when measurement error is ignored:

$$a - \alpha_0 l^2 = 0. \tag{3.2}$$

The additional quantity $\alpha_0$ is a length-to-area conversion factor, playing the key role of ensuring that the dimensional homogeneity criterion is satisfied. A noteworthy feature of this model is the relative roles the two quantities $l$ and $a$ play in modelling this fundamental relationship: $l$ is naturally seen as primary while $a$ is derivable from $l$ and hence secondary. We see this as well in their units of measurement. Using $L$ to represent the generic symbol for length, we see that $l$ has units of $[L]$ and $a$, units of $[L^2]$. The key idea is that a model with a multiplicity of quantities may well be characterized by just a small subset of these quantities that are designated as primary, both in terms of their numerical size $\{\cdot\}$ as well as their units $[\,\cdot\,]$. These two complementary features of a quantity, providing dimension and units of measure, play dual roles in the model, a fact often overlooked by statistical modellers.

### 3.2 The problem of scales.

The choice of scale restricts the choice of units of measurement, and these units dictate the type of model that may be used. Thus we need to study scales in the context of model building and hence in the context of quantity calculus. In his celebrated paper, Stevens (1946) starts by proposing four major scales for measurements or observations: categorical, ordinal, interval and ratio. This taxonomy is based on the notion of permissible transformations, as is the work of our Section 6. However, our work is aimed at modelling while Stevens' work is aimed at statistical analysis. Stevens allows permutations as the transformations of data on all four scales, allows strictly increasing transformations for data on the ordinal, ratio and interval scales, allows scalar transformations ($x \mapsto cx$ with $c > 0$) for data on the ratio and interval scales, and allows linear transformations ($x \mapsto a + bx$ with $b > 0$) for data on the interval scale.

Stevens created his taxonomy as a basis for classifying the family of all statistical procedures for their applicability in any given situation (Stevens, 1951). For example, according to Velleman and Wilkinson (1993), Luce (1959) points out that for measurements made on a ratio scale, the geometric mean would be appropriate for estimating the central tendency of a population distribution; in contrast, when measurements are made on an interval scale, the arithmetic mean would be appropriate. The work of Stevens seems to be well-accepted in the social sciences; Ward (2017) calls his work monumental. But Stevens' work is not widely recognized in statistics. Velleman and Wilkinson (1993) review the work of Stevens with an eye on potential applications in the then emerging statistical area of artificial intelligence (AI), hoping to automate data analysis. They claim that "Unfortunately, the use of Steven's categories in selecting or recommending statistical analysis methods is inappropriate and can often be wrong". They describe alternative scale taxonomies that have been proposed for statistics, notably by Mosteller and Tukey (Mosteller and Tukey, 1977). A common concern centres on the inadequacy of an automaton for selecting the statistical method in an AI application. Even the choice of scale itself will depend on the nature of the inquiry and thus is something to be determined by humans. For example, length might be observed on a relatively uninformative ordinal scale, were that sufficient for the intended goal of a scientific inquiry, rather than on the seemingly more natural ratio scale.

## 4 Transforming quantities

In statistical modelling, statisticians often transform quantities and the scales on which the data are measured without realizing the difficulties that can arise. For example, 'height' lies on a ratio scale since it has a true 0 – a height cannot be below 0. Approximating the distribution of 'height' by a Gaussian distribution may unconsciously take height from its ratio scale to an interval scale. Such a transformation may seem innocuous, merely the approximation of one distribution by another, but it changes how quantities should be compared: the sizes of two quantities on a ratio scale must be compared via their ratio, not their difference, whereas the opposite is true on an interval scale, where differences are used.

The scale of a quantity may also be changed by unwittingly applying a transformation that requires the quantity to have no units of measurement. One such transformation, an important one in statistics, is the logarithm. We argue in Sections 4.2 and 4.3 that the argument of the logarithm must be unitless.

The scale of a measurement may be transformed for a variety of reasons. The transformation can be relatively simple, such as a rescaling, where we know how to transform both the numerical part of $X$ and $X$'s units of measurement. When the transformation is complex, the scale itself might change. For instance, if $X$ is measured on a ratio scale, then the logarithm of $X$ will be on an interval scale. When the scales themselves change, how are the units of measurement transformed?

Transformations are important. For one thing, they can enhance the interpretability of a statistical analysis if chosen in a thoughtful way. For instance, in environmental epidemiology, the relative risk of an environmental hazard is defined as the estimated increase in the number of adverse health outcomes due to a given increase in the hazard, all on a ratio scale. Policy makers can then assess hypothetical risk reductions of various specified percentages.

###### Example 6.

We now present a classic rescaling example that illustrates the complexities involved when creating scientific scales. Liquids contain both hydrogen and hydroxide ions. In pure water these ions appear in equal numbers. But the water becomes acidic when there are relatively more hydrogen ions and basic when there are relatively more hydroxide ions. Thus acidity is measured by the concentration of these ions. The customary measurement is in terms of the hydrogen ion concentration, denoted $[\mathrm{H}^+]$ and measured in the Système International d'Unités (SIU) unit of one mole of ions per litre of liquid. These units are denoted $c$ and thus, in our notation, $[\mathrm{H}^+] = \{[\mathrm{H}^+]\}\,c$. However, for substantive reasons, the pH index is now used to characterize the acidity of a liquid. The index is defined by $\mathrm{pH} = -\log_{10}\{[\mathrm{H}^+]/c\}$. Distilled water has a pH of about 7 while lemon juice has a level of about 2. Note that $[\mathrm{H}^+]$ lies on a ratio scale while pH lies on an interval scale – the transformation has changed the scale of measurement.
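A minimal sketch of the computation makes the nondimensionalization step explicit; the concentrations used below are illustrative round numbers of the right order for distilled water and lemon juice, not values from the text.

```python
import math

c = 1.0  # reference concentration: one mole of ions per litre

def pH(hydrogen_ion_concentration_mol_per_litre):
    # dividing by c first makes the argument of log10 unitless,
    # so the transcendental function is applied to a pure number
    return -math.log10(hydrogen_ion_concentration_mol_per_litre / c)

assert abs(pH(1e-7) - 7.0) < 1e-9   # roughly distilled water
assert abs(pH(1e-2) - 2.0) < 1e-9   # roughly lemon juice
```

Written this way, the division by $c$ is not a cosmetic detail: it is the step that removes the units before the logarithm is taken.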

Observe that in Example 6, the units of measurement in $[\mathrm{H}^+]$ were eliminated, through division by $c$, before transforming by the transcendental function $\log_{10}$. That raises the question: do we need to eliminate units before applying the logarithm? This question and the logarithmic transformation in science have fueled vigorous debate for over six decades (Matta et al., 2010). We highlight and resolve some of that debate below in Subsection 4.3.

However we begin with an even simpler situation seen in the next subsection, where we study the issues that may arise when interval scales are superimposed on ratio scales.

### 4.1 Scales within scales

This subsection concerns a perhaps unconscious switch in a statistical analysis from a ratio scale, which lies on $[0, \infty)$, to an interval scale, which lies on $(-\infty, \infty)$, when approximating a distribution. This switch occurs when, for example, the Gaussian distribution is used to model the relative frequency histogram of data, e.g. human heights, measured on a ratio scale. This switch is ubiquitous and seen in most elementary statistics textbooks, where an assumed Gaussian sampling distribution model leads to the sample average as a measurement of the population average, instead of the geometric mean, which should have been used (Luce, 1959). That same switch is made in such things as regression analysis and the design of experiments. The seductive simplicity has also led to the widespread use of the Gaussian process in spatial statistics and machine learning, despite the light tails of the Gaussian distribution.

The justification of the widespread use of the Gaussian approximation may well lie in the belief that the natural origin of the ratio scale lies well below the range of values likely to be found in a scientific study. This may well be the explanation of the reliance on the interval scales of Celsius and Fahrenheit, on planet Earth at least, since one would not expect to see temperatures anywhere near the true origin of temperature, 0 K on the Kelvin scale, which corresponds to $-273.15$°C on the Celsius interval scale. We would note in passing that these two interval scales for temperature also illustrate the statistical invariance principle (see Subsection 4.3); each scale is a positive affine transformation of the other.
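The affine relationship between the two interval scales is easy to make concrete. In the hypothetical snippet below, the conversions are the standard ones, and the assertions check the property that matters for interval scales: ordering is preserved and differences transform by the positive slope alone, with the intercept cancelling.

```python
def celsius_to_fahrenheit(temp_c):
    # positive affine map between the two interval scales: F = 1.8*C + 32
    return 1.8 * temp_c + 32.0

def fahrenheit_to_celsius(temp_f):
    return (temp_f - 32.0) / 1.8

c1, c2 = 10.0, 25.0  # arbitrary illustrative temperatures in Celsius
f1, f2 = celsius_to_fahrenheit(c1), celsius_to_fahrenheit(c2)

assert f2 > f1                                  # ordering preserved
assert abs((f2 - f1) - 1.8 * (c2 - c1)) < 1e-9  # differences rescale by the slope
```

This is why differences, not ratios, are the meaningful comparisons on an interval scale: the intercept (32 here) makes ratios of raw values depend on the choice of scale.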

We illustrate the difficulties that can arise when an interval–scale is misused in a hypothetical experiment where measurements are made on a ratio–scale, with serious consequences.

###### Example 7.

Researchers wanted to estimate a treatment effect on a population characteristic measured on a ratio scale. More specifically, the researchers suspected that the benefit of a treatment on two groups A and B would differ. In the experiment, two independent random samples from groups A and B were selected to provide baseline control information about the two populations, with sample average responses denoted $\bar{x}_A$ and $\bar{x}_B$. New, independent random samples from groups A and B were given the treatment, with sample averages of the post-treatment responses denoted $\bar{y}_A$ and $\bar{y}_B$. The researchers found that the two differences $\bar{y}_A - \bar{x}_A$ and $\bar{y}_B - \bar{x}_B$, both in kg, were essentially equal, and concluded that the difference in treatment effect between groups A and B was negligible, in other words that the impact of the treatment is the same for the two groups. However, on reanalysis, taking account of the fact that responses were measured on a ratio scale, the researchers correctly assessed change via unitless ratios: for population A, the assessment of change was $\bar{y}_A/\bar{x}_A = 0.8$, or a 20% reduction, and for population B, $\bar{y}_B/\bar{x}_B = 0.9$, or a 10% reduction. This indicates a substantial change for population A, double the change for population B. Switching scales in the analysis led to an incorrect conclusion.
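A toy numerical version of the example makes the point concrete. The group means below are made up, chosen only so that the absolute changes coincide while the relative changes are 20% and 10% as described above.

```python
# Hypothetical baseline and post-treatment group means in kg (assumed
# numbers for illustration; the text reports no actual values).
baseline = {"A": 50.0, "B": 100.0}
post = {"A": 40.0, "B": 90.0}

differences = {g: post[g] - baseline[g] for g in baseline}  # interval-style comparison
ratios = {g: post[g] / baseline[g] for g in baseline}       # ratio-style comparison

# differences suggest identical treatment effects ...
assert differences["A"] == differences["B"]
# ... while the unitless ratios reveal A changed twice as much as B
assert abs(ratios["A"] - 0.8) < 1e-12
assert abs(ratios["B"] - 0.9) < 1e-12
```

The difference-based comparison also carries units (kg), whereas the ratios are unitless, which is precisely why the latter are the appropriate comparison on a ratio scale.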

Remark. The justification above for the switch from a ratio to an interval scale can be turned into a simple approximation that may help with the interpretation of the data. To elaborate, suppose interest lies in comparing two values $x_1$ and $x_2$ of $X$ that lie on a ratio scale, with $x_1, x_2 \geq x_0$ for a known $x_0 > 0$. Interest lies in the relative size of these quantities, i.e. in $x_1/x_2$. It is easily seen that an approximation to $x_1/x_2$ may be found through a Taylor expansion involving the differences $x_1 - x_0$ and $x_2 - x_0$, provided the latter are small enough.
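One way to sketch the expansion, under the assumption that both values lie close to the known bound $x_0$: write $x_j = x_0(1 + \delta_j)$ for $j = 1, 2$, with the $\delta_j = (x_j - x_0)/x_0$ small. Then

```latex
\frac{x_1}{x_2}
  = \frac{1+\delta_1}{1+\delta_2}
  = (1+\delta_1)\bigl(1-\delta_2+\delta_2^{2}-\cdots\bigr)
  \approx 1+\delta_1-\delta_2
  = 1+\frac{x_1-x_2}{x_0},
```

so, to first order, the unitless ratio is one plus the difference $x_1 - x_2$ rescaled by $x_0$; the error is of second order in the $\delta_j$.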

Now we turn to other issues that arise when the data are subjected to transformations more complex than mere rescaling, but first we review an important distinction between the two types of functions used to make such transformations.

### 4.2 Algebraic and transcendental functions

Modelling quantities requires describing their relationship via a functional equation

 u(X1,…,Xp)=0. (4.1)

Desirable properties of u, along with methods for calculating it, are discussed in Section 5. At a minimum, the function u must satisfy the requirement of dimensional homogeneity. We know how to calculate units when u consists of a finite sequence of permissible algebraic operations involving the Xi's, possibly combined with parameters. Such operations, which are called “algebraic”, may be formally defined in terms of roots of a polynomial equation that must satisfy the requirement of dimensional homogeneity.

Can u involve non-algebraic operations? Non-algebraic functions are called transcendental, i.e. they “transcend” an algebraic construction. Examples in the univariate case are exp(x) and ln(x) and, for a given nonnegative constant a, a^x and log_a(x). The formal definition of a non-algebraic function does not explicitly say whether or not such a function can be applied to quantities with units of measurement. Bridgman (1931) sidesteps this issue by arguing that it is moot, since valid representations of natural phenomena can always be nondimensionalized (see Subsection 5.1). But the current Wikipedia entry on the subject states “transcendental functions are notable because they make sense only when their argument is dimensionless” (Wikipedia, 2020). In the next subsection we explore this issue for a specific transcendental function of special importance in statistical science.

### 4.3 The logarithm: a transcendental function

#### 4.3.1 Does the logarithm have units?

To answer this question, first consider applying the logarithm to a unitless quantity x. It is sensible to think that its value ln x will have no units, and so we take this as fact.

But what happens if we apply the logarithm to a quantity with units? One school of thought suggests the result is a unitless quantity. The argument is based on the idea that the logarithm is the area under the curve of the function 1/u (Molyneux, 1991). In other words, define the logarithm of x/y to be the area under the function 1/u from u = y to u = x, with 0 < y ≤ x. When x and y have units, make the change of variables u = y v, so that v is unitless, and get

 ∫_y^x (1/u) du = ∫_1^{x/y} (1/(yv)) d(yv) = ∫_1^{x/y} (1/v) dv = ln(x/y).

But this rationale assumes something we do not yet know, namely that the integral leading to Equation (4.3.1) is the natural logarithm. To know that, we must check that the derivative of the logarithm is the function 1/x. That seems to force us to turn to the only available option, the logarithm's original definition as the inverse of another transcendental function, exp, at least for x > 0. In other words

 x = exp(ln x), x > 0.

The chain rule now tells us that

 1 = (d ln(x)/dx) exp(ln x).

Thus

 d ln(x)/dx = exp(−ln x) = 1/x

for any real x > 0. Now if we return to Equation (4.3.1), we see that when x has units and we define ln x to be the area under the curve from the unit of measurement to x, we get ln{x}, the logarithm of the numerical part of x. In other words, the result of applying this transcendental function to a dimensional quantity is simply that the units are lost. In short, ln x is unitless even when x has units.
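The area definition can be checked numerically; the sketch below (pure Python, midpoint Riemann sums) shows that rescaling both limits by the same unit-conversion factor leaves the area, and hence ln(x/y), unchanged:

```python
def log_as_area(y, x, n=100_000):
    """Midpoint Riemann-sum approximation of the area under 1/u from y to x."""
    h = (x - y) / n
    return sum(h / (y + (i + 0.5) * h) for i in range(n))

# The same length measured in two unit systems: scaling both the lower limit
# (the unit, y) and x by 100 leaves the area unchanged; both give ln(2.5).
area_1 = log_as_area(1.0, 2.5)
area_2 = log_as_area(100.0, 250.0)
print(area_1, area_2)
```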

#### 4.3.2 Can we take the logarithm of a dimensional quantity with units?

Molyneux (1991) sensibly argues that, since ln x has no units, x cannot have units; if x's units are simply lost, the result is meaningless. To consider this further, suppose Z is some measure of particulate air pollution on the logarithmic scale, with Z = ln X for some concentration X measured in unstated units. This measure Z appears as a covariate in a scientific model of the impact of particulate air pollution on health. Experimental data pointed to a particular numerical value of Z, but without the units of X we have no idea whether air pollution was a serious health problem. So indeed it is disturbing that the value of the function is unitless, no matter what the argument. This property of the logarithm points to the need to nondimensionalize before applying the logarithmic transformation in scientific and statistical modelling, in keeping with the theories of Buckingham, Bridgman and Luce.

One of the major routes taken in debates about the validity of applying ln to a dimensional quantity involves arguments based one way or another on Taylor expansions (see Appendix B). A key feature of these debates is the claim that the terms in the expansion have different units, thus making the expansion impossible. Key to that argument is taking the derivative of ln x when x has units. However, it is not completely clear how to differentiate a function of a quantity that has units.

Suppose we have a function f with argument x, and write x = {x}[x], where {x} denotes the numerical part of x and [x] its units. We define the derivative of f with respect to x as follows. Let Δ = {Δ}[Δ] and suppose that Δ and x have the same units, that is, [Δ] = [x]. Otherwise, we would not be able to add x and Δ in what follows. Then we define

 f′(x) ≡ lim_{{Δ}→0} [f(x+Δ) − f(x)]/Δ = lim_{{Δ}→0} [f({x+Δ}[x+Δ]) − f({x}[x])]/({Δ}[Δ]) = lim_{{Δ}→0} [f({x+Δ}[x]) − f({x}[x])]/({Δ}[x]).

For instance, for f(x) = x²,

 d(x²)/dx = lim_{{Δ}→0} ({x+Δ}²[x]² − {x}²[x]²)/({Δ}[x]) = lim_{{Δ}→0} (({x+Δ}² − {x}²)/{Δ}) × [x] = 2{x}[x] = 2x.

Using (4.3.2) to differentiate ln x, and recalling that ln x = ln{x}, we first write

 ln(x+Δ)−lnx=ln{x+Δ}−ln{x}.

So

 d(ln x)/dx = lim_{{Δ}→0} (ln{x+Δ} − ln{x})/({Δ}[x]) = (d ln{x}/d{x}) × (1/[x]) = (1/{x}) × (1/[x]) = 1/x.

Using this definition of the derivative we can carry out a Taylor series expansion about x = a to obtain

 log(x) = log(a) + ∑_{k=1}^{∞} g^{(k)}(a)(x − a)^k/k!, (4.4)

where

 g^{(k)}(a) = d^k log(x)/dx^k |_{x=a}.

Since g^{(1)}(a) = 1/a has units [x]⁻¹ while (x − a) has units [x], the first term, (x − a)/a, in the infinite summation is unitless. Differentiating again yields g^{(2)}(a) = −1/a², and once again we see that the term g^{(2)}(a)(x − a)²/2! is unitless. Continuing in this way, we see that the summation on the right side of Equation (4.4) is unitless, and so the equation satisfies dimensional homogeneity. This reasoning differs from the incorrect reasoning of Mayumi and Giampietro (2010), who argue that the logarithm cannot be applied to quantities with units because the terms in the Taylor expansion would have different units. Our reasoning also differs from that of Baiocchi (2012), who uses a different expansion to show that the logarithm cannot be applied to measurements with units, albeit without explicitly recognizing the need for the argument to be unitless. The expansion in Equation (4.4) is the same as that given in Massa et al. (2011). Although the latter do not give it in explicit form, they do use it to discredit the Taylor-expansion argument against applying ln to quantities with units.
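Numerically, each term of the expansion (4.4) depends on x and a only through the dimensionless ratio r = (x − a)/a, since g^{(k)}(a)(x − a)^k/k! = (−1)^{k−1} r^k/k. A short computation confirms that the partial sums converge to the logarithm (a minimal sketch with illustrative values):

```python
import math

def log_taylor(x, a, n_terms=60):
    """Partial sum of (4.4): every term is a function of the unitless
    ratio r = (x - a)/a alone, so the summation is unitless."""
    r = (x - a) / a
    return math.log(a) + sum((-1) ** (k - 1) * r ** k / k
                             for k in range(1, n_terms + 1))

print(log_taylor(1.2, 1.0), math.log(1.2))  # the two values agree
```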

## 5 Allowable relationships among quantities

Having explored dimensional analysis and the kinds of difficulties that can arise when scales or units are ignored, we turn to a key step toward our proposed unification of scientific and statistical modelling. We now determine how to relate quantities and hence how to specify the ‘law’ that characterizes the phenomenon which is being modelled.

But what models may be considered legitimate? Answers for the sciences, given long ago, were based on the principle that for a model to completely describe a natural phenomenon, it cannot depend on the units of measurement that might be chosen to implement it. This answer was interpreted in two different ways. In the first interpretation, the model must be non-dimensionalizable i.e. it cannot have scales of measurement and hence cannot depend on units. In the second interpretation, the model must be invariant under all allowable transformations of scales. Both of these interpretations reduce the class of allowable relationships that describe the phenomenon being modelled and place restrictions on the complexity of any experiment that might be needed to implement that relationship.

We begin by revisiting a previous example.

###### Example 8 (continues=ex:gibbons).

The standard dimensions for area and length are L² and L respectively, and the standard scales of measurement for them, as specified by the Système International d’Unités, are m² and m respectively, the latter being the standard for length. However, the relationship between the area of the square and the length of its sides represented in Equation (3.2) is fundamental, so the relationship is in no way dependent on the scales of measurement that are ultimately used, provided the constant of proportionality is appropriately specified. In other words, the dimensions themselves play no fundamental role in this relationship. Therefore it must be possible to re–express that relationship in a dimensionless form. In this case that result is expressible as

 π−1=0. (5.1)

where π is the dimensionless ratio of the area of the square to the square of the length of its sides.

The model that relates area to the length of a side is now dimensionless and hence shows that the relationship is fundamental in nature: it does not depend on the scales on which the two quantities happen to be measured. In the next subsection we see a much more general expression of the same idea.

### 5.1 Buckingham’s Pi-theorem

The section begins with Buckingham’s simple motivating example.

###### Example 9.

This example is a characterization of properties of a gas in a container, namely, a characterization of the relationship amongst the pressure (p), the volume (v), the number of moles of gas (N) and the absolute temperature (θ) of the gas. The absolute temperature reflects the kinetic energy of the system and is measured in degrees Kelvin (K), the SIU for temperature. Note that 0 K occurs when the kinetic energy is zero and 273.15 K occurs when the temperature is 0° Celsius. A fundamental relationship amongst these quantities is given by

 pv/(θN) − D = 0 (5.2)

for some constant D that does not depend on the gas. Since the units of D are (force × length)/(# moles × temperature), as expressed, the relationship in (5.2) depends on the dimensions associated with p, v, θ and N, whereas the physical phenomenon underlying the relationship does not. Buckingham gets around this by invoking a parameter R⁻¹ with units (# moles × temperature)/(force × length). He rewrites Equation (5.2) as

 pv/(RθN) − 1 = 0. (5.3)

Thus, in the notation of Equation (5.1), π = pv/(RθN), an equation Buckingham calls complete and hence non-dimensionalizable. This equation is known as the Ideal Gas Law, with R the ideal gas constant (ide, 2019).
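The completeness of Equation (5.3) is easy to verify numerically with standard textbook values (R is the ideal gas constant; v is the molar volume of an ideal gas at standard temperature and pressure):

```python
# Standard SI values for one mole of an ideal gas at STP.
R = 8.314        # J/(mol K), ideal gas constant
p = 101_325.0    # Pa (1 atm)
v = 0.022414     # m^3, molar volume at 273.15 K and 1 atm
N = 1.0          # mol
theta = 273.15   # K

pi = p * v / (N * R * theta)   # the dimensionless group of Equation (5.3)
print(pi)                      # very close to 1
```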

This example of nondimensionalizing by finding one expression, π, as in Equation (5.1), can be extended to cases where we must nondimensionalize by finding several such functions. This extension is formalized in Buckingham’s Pi-theorem. Here is a formal statement (in slightly simplified form) as given by Buckingham (1914) and discussed in a modern style by Bluman and Cole (1974).

###### Theorem 1.

Suppose X1, …, Xp are measurable quantities satisfying a defining relation

 u(X1,…,Xp)=0 (5.4)

that is dimensionally homogeneous. In addition, suppose that there are k fundamental dimensions appearing in this equation, denoted L1, …, Lk, and that the dimensions of each Xi can be expressed as a product of powers of these dimensions, [Xi] = L1^{a_{1i}} ⋯ Lk^{a_{ki}}. Then Equation (5.4) implies the existence of q = p − k dimensionless quantities π1, …, πq, each a product of powers of the Xi, and a function U such that

 U(π1,…,πq)=0. (5.5)

In this way u has been nondimensionalized. The choice of the πi is in general not unique.

The theorem is proven constructively, so we can find the πi and U. We first determine the fundamental dimensions used in u. We then use the quantities to construct two sets of variables: a set of primary variables, also called repeating variables, and a set of secondary variables, which are non-dimensional. For example, if X1 is the length of a box, X2 is its height and X3 is its width, then there is one fundamental dimension, the generic length denoted L. We can choose X1 as the primary variable and use it to define two new variables π1 = X2/X1 and π2 = X3/X1. These new variables, called secondary variables, are dimensionless. Buckingham’s theorem states that the algebraic equation relating X1, X2 and X3 can be re-written as an equation involving only π1 and π2. Note that we could have also chosen either X2 or X3 as the repeating variable.
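The constructive procedure is easy to mimic for the box example (a minimal sketch; the box dimensions are illustrative): with one fundamental dimension, the pi-groups are ratios, and they do not change when the unit of length changes.

```python
def pi_groups(x1, x2, x3):
    """x1 = length (the repeating variable), x2 = height, x3 = width."""
    return x2 / x1, x3 / x1

metres      = pi_groups(2.0, 1.0, 0.5)
centimetres = pi_groups(200.0, 100.0, 50.0)  # the same box, measured in cm
print(metres, centimetres)  # identical: (0.5, 0.25) in both unit systems
```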

We now apply the theorem’s proof to an example from fluid dynamics that appears in Gibbings (2011).

###### Example 10.

The example is a model for fluid flow around a sphere and the calculation of the drag force that results. It turns out that the model depends only on something called the coefficient of drag and on a single, complicated dimensionless number called the Reynolds number that incorporates all the relevant dimensions. Our treatment follows the one given in an online video (Elger, 2011).

To begin with we list all the relevant quantities: the drag force (F), velocity (V), viscosity (μ), fluid density (ρ) and sphere diameter (D). We thus have p = 5 quantities in the notation of Buckingham’s theorem. We next note that the dimensions of these five quantities can be expressed in terms of the three dimensions length (L), mass (M) and time (T). We treat these as the three primary dimensions, and this tells us that we need at most q = 5 − 3 = 2 dimensionless functions to define U for our model.

We first write down the dimensions of each of the five quantities in terms of L, M and T:

 [F] = ML/T²;   [V] = L/T;   [ρ] = M/L³;   [μ] = ML⁻¹T⁻¹;   [D] = L. (5.6)

We now proceed to sequentially eliminate the dimensions L, M and T in all five equations. First we use [D] = L to eliminate L. The first four equations become

 [FD⁻¹] = MT⁻²;   [VD⁻¹] = T⁻¹;   [D³ρ] = M;   [Dμ] = MT⁻¹.

We next eliminate M via [D³ρ] = M, yielding

 [FD⁻¹D⁻³ρ⁻¹] = T⁻²;   [VD⁻¹] = T⁻¹;   [DμD⁻³ρ⁻¹] = T⁻¹,

that is

 [FD⁻⁴ρ⁻¹] = T⁻²;   [VD⁻¹] = T⁻¹;   [μD⁻²ρ⁻¹] = T⁻¹.

To eliminate T, we could use [VD⁻¹] = T⁻¹ or [μD⁻²ρ⁻¹] = T⁻¹ or even, with a bit more work, [FD⁻⁴ρ⁻¹] = T⁻². We use [VD⁻¹] = T⁻¹, yielding

 [FD⁻⁴ρ⁻¹V⁻²D²] = 1   and   [μD⁻²ρ⁻¹V⁻¹D] = 1,

that is

 [FD⁻²ρ⁻¹V⁻²] = 1   and   [μD⁻¹ρ⁻¹V⁻¹] = 1.

All the dimensions are now gone on the right hand side of each of the equations derived from (5.6), so we have nondimensionalized the problem and in the process found π1 = F/(ρD²V²) and π2 = μ/(ρDV), as implied by Buckingham’s theorem.

Therefore, for some function U,

 U(F/(ρD²V²), μ/(ρDV)) = 0. (5.7)

Remarkably, we have also found the famous Reynolds number, Re = ρVD/μ, the reciprocal of the second pi-group (see for example Friedmann et al. (1968)). The Reynolds number determines the coefficient of drag, F/(ρD²V²), and this relationship is a fundamental law of fluid mechanics.
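The nondimensionalization can be checked numerically: with illustrative values (roughly, water flowing past a 1 cm sphere) the two pi-groups come out identical in SI and CGS units.

```python
def groups(F, V, rho, mu, D):
    """Return the two dimensionless groups from the construction above."""
    pi1 = F / (rho * D**2 * V**2)   # drag group
    Re  = rho * V * D / mu          # Reynolds number = 1/pi2
    return pi1, Re

# SI units: N, m/s, kg/m^3, Pa s, m (illustrative values for water).
si  = groups(F=0.002, V=1.0, rho=1000.0, mu=1.0e-3, D=0.01)
# CGS units: dyn, cm/s, g/cm^3, poise, cm -- the same physical situation.
cgs = groups(F=200.0, V=100.0, rho=1.0, mu=1.0e-2, D=1.0)
print(si, cgs)  # the dimensionless groups agree across unit systems
```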

If we knew u to begin with, we could track the series of transformations starting at (5.6) to find U. If, however, we had no specified u to begin with, we could use π1 and π2 to determine a model, that is, to find U. For instance, we could carry out experiments, make measurements and determine U from the data. In either case, we can use U to determine the coefficient of drag from the Reynolds number and in turn calculate the drag force.

A link between Buckingham’s approach and statistical modelling was recognized in the paper of Albrecht et al. (2013) and commented on in Lin and Shen (2013). But its link with the statistical invariance principle seems to have been first identified in the thesis of Shen (2015). This connection provides a valuable approach for the statistical modelling of scientific phenomena. But Shen’s approach differs from the one proposed in Section 6 of this paper: Shen starts with Buckingham’s approach, and thereby a nondimensionalized relationship amongst the variables of interest, to build a regression model. We present his illustrative example next.

###### Example 11.

(Shen, 2015) This example concerns a model for the predictive relationship between the volume of wood V in a pine tree and the tree’s height H and diameter D. The dimensions are [V] = L³, [H] = L and [D] = L. Shen chooses D as the repeating variable and calculates the pi-functions π1 = H/D and π2 = V/D³. He then applies the pi-theorem to get the dimensionless version of the relationship amongst the variables:

 π2=g(π1) (5.8)

for some function g. He correctly recognizes that (π1, π2) is the maximal invariant under the scale transformation group, although the connection to the ratio–scale of Stevens is not made explicitly. He somewhat arbitrarily chooses the class of relationships given by

 π2 = k π1^γ. (5.9)

He linearizes the model in Equation (5.9) by taking logarithms and adds a residual to get a standard regression model, amenable to standard methods of analysis. In particular the least squares estimate turns out to provide a good fit, judging by a scatterplot.

Note that application of the logarithmic transformation is justified since the pi-functions are dimensionless.
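As an illustration of this linearized pi-model, here is a sketch fitted to synthetic data (not Shen's data; the values of k, γ, the size ranges and the noise level are all assumptions made for the illustration):

```python
import math
import random

random.seed(1)

# Synthetic tree data generated from pi2 = k * pi1^gamma with lognormal noise.
k_true, gamma_true = 0.002, 1.5
data = []
for _ in range(200):
    D = random.uniform(0.2, 0.8)    # diameter, m (illustrative range)
    H = random.uniform(5.0, 25.0)   # height, m (illustrative range)
    V = k_true * D**3 * (H / D)**gamma_true * math.exp(random.gauss(0, 0.05))
    data.append((V, H, D))

# Dimensionless pi-groups, then the linearized model
# log(pi2) = log(k) + gamma * log(pi1), fitted by least squares.
x = [math.log(H / D) for V, H, D in data]
y = [math.log(V / D**3) for V, H, D in data]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
gamma_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
            sum((xi - xbar) ** 2 for xi in x)
k_hat = math.exp(ybar - gamma_hat * xbar)
print(gamma_hat, k_hat)  # close to the generating values 1.5 and 0.002
```

Note that the logarithms here are applied only to the dimensionless pi-groups, exactly as the remark above requires.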

Section 6.2 will show how Example 11 may be embedded in a stochastic framework before the Pi–theorem is applied.

### 5.2 Bridgman’s alternative

We now describe an alternative to the approach of Buckingham (1914) due to Bridgman (1931). At around the same time that Edgar Buckingham was working on his theorem, Percy William Bridgman was giving lectures at Harvard on the topic of nondimensionalization; these were incorporated in a book whose first edition was published by Yale University Press in 1922, with a second edition in 1931 (Gibbings, 2011). Bridgman thanks Buckingham for his papers but notes that their approaches differ. And so they do. For a start, Bridgman asserts his disagreement with the position that seems to underlie Buckingham’s work, that “…a dimensional formula has some esoteric significance connected with the ‘ultimate nature’ of things….” To those who espouse that point of view it becomes important to “…find the true dimensions and when they are found, it is expected that something new will be suggested about the physical properties of the system.” Instead, Bridgman takes measurement itself as the starting point in modelling and even in the collection of data: “Having obtained a sufficient array of numbers by which the different quantities are measured, we search for relations between these numbers, and if we are skillful and fortunate, we find relations which can be expressed in mathematical form.” He then characterizes measured quantities as either primary, the product of direct measurement, or secondary, such as velocity, computed from the measurements of primary ones. Finally, he sees the basic scientific issue as that of characterizing one quantity in terms of the others, as in our explication of Buckingham’s work above in terms of the function u.

Bridgman proves that the functional relationship between secondary and primary measurements, under what statistical scientists might call “equivariance” with respect to multiplicative changes of scale in the primary units, necessitates that secondary quantities be monomials in the primary ones, with possibly fractional exponents, not unlike the form of the π functions above. Thus, under the assumed differentiability of the relationship with respect to its arguments, Bridgman is able to re–derive Buckingham’s formula.

### 5.3 Beyond ratio scales

Nondimensionalization seems more difficult outside of the domain of the physical sciences. For example, the dimensions of quantities such as utility cannot be characterized by a ratio scale. And the choice of the primary dimensions is not generally so clear, although Baiocchi (2012) does provide an example in macroeconomics where time, money, goods and utility may together be sufficient to characterize all other quantities.

So a substantial body of work was devoted to extending the work of Bridgman into the domain of nonratio scales, beginning with the seminal paper of Luce (1959). To quote the paper by Aczél et al. (1986), which contains an extensive review of that work:

‘… Luce shows that the general form of a “scientific law” is greatly restricted by knowledge of the “admissible transformations” of the dependent and independent variables…’
Aczel, Roberts and Rosenbaum 1986

It seems puzzling that this principle has been little recognized, if at all, in statistical science, perhaps in part because little attention is paid there to such things as dimensions and units of measurement.

The substantial body of research that followed Luce’s publication covers a variety of scales, e.g. ordinal scales, among other things. Curiously, that work largely ignores the work of Buckingham in favor of Bridgman, even though the former preceded the latter. Also ignored is the work on statistical invariance described in Section 6, which goes back to G. Hunt and C. Stein in 1946, in unpublished but well–known work that led to optimum statistical tests of hypotheses.

To describe this important work by Luce and others, we re-express Equation (4.1) as

 Xp=u∗(X1,…,Xp−1) (5.10)

for some function u∗, thereby defining the class of all possible laws that could relate Xp to the predictors X1, …, X(p−1), before turning to an empirical assessment of the possibilities. Luce makes the strong assumption that the scale of each Xi is susceptible to an allowable transformation Ti, i.e. Xi → Ti(Xi). Furthermore, he assumes that they may be transformed independently of one another; no structural constraints are imposed. Luce assumes a function D such that

 u∗(T1(X1), …, T(p−1)(X(p−1))) = D(T1, …, T(p−1)) u∗(X1, …, X(p−1))

for all possible transformations Ti and choices of the Xi. He determines under these conditions that if the Xi, along with Xp, lie on ratio–scales, then

 u∗(X1, …, X(p−1)) ∝ ∏_{i=1}^{p−1} Xi^{αi},

where the αi’s are nondimensional constants; this is Bridgman’s result, albeit proved by Luce without assuming differentiability of u∗. If on the other hand some of the Xi are on a ratio–scale while others are on an interval–scale, and Xp is on an interval scale, then Luce proves that no such u∗ can exist except in the special case of a single predictor that is itself on an interval scale.
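Luce's multiplicative representation is easy to verify for a monomial law; in this sketch the exponents and scale changes are arbitrary illustrative choices:

```python
# A monomial law u*(x) = prod x_i^alpha_i is equivariant under independent
# rescalings T_i(x_i) = c_i * x_i, with multiplier D = prod c_i^alpha_i.
alphas = (2.0, -1.0, 0.5)   # illustrative nondimensional exponents

def u_star(xs):
    out = 1.0
    for xv, a in zip(xs, alphas):
        out *= xv ** a
    return out

x = (3.0, 4.0, 9.0)                                 # ratio-scale arguments
c = (10.0, 2.0, 4.0)                                # independent scale changes
scaled = tuple(ci * xi for ci, xi in zip(c, x))

D = 1.0
for ci, a in zip(c, alphas):
    D *= ci ** a                                    # D(T_1, ..., T_{p-1})

print(u_star(scaled), D * u_star(x))  # equal: the law transforms by D alone
```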

However, the assumption of the independence of the transformations seems unduly strong for many situations, as noted by Aczél et al. (1986), and weakening that assumption expands the number of possibilities for the role of u∗. Further work culminated in that of Paganoni (1987). To describe the latter, assume that V and W are real vector spaces and that:

1. X and P are nonempty subsets of V such that

 X + P ⊂ X and λP ⊂ P for all λ ≥ 0;

2. R is a subset of L(V), where L(V) denotes the algebra of linear operators of V into itself, and

 I ∈ R and R(P) ⊂ P for all R ∈ R;

3. if R, S ∈ R then RS ∈ R.

In the notation of Equation (5.10), let x = (X1, …, X(p−1)) ∈ X. Paganoni (1987) supposes the law u∗ is defined on the affine space X + P. Furthermore, he assumes that the law relating Xp to x satisfies the following functional equation for some functions α and β:

 u∗(Rx + P) = α(R, P) u∗(x) + β(R, P) = α(R, P) Xp + β(R, P)

where x ∈ X, R ∈ R and P ∈ P.

Paganoni (1987) then explores the solution space for this functional equation. He identifies two cases, constant and non-constant α, the first of which leads directly to a simple characterization of u∗. For the second case, where α is not constant, Paganoni (1987) presents his class of solutions in his Theorem 1. It states that the functions α and β must have one of the following forms:

(i).

α(R, P) = M(R) and β(R, P) = ψ(R), where

 ψ(RS) = M(R)ψ(S) + ψ(R) and M(RS) = M(R)M(S), for R, S ∈ R with RS ∈ R;
(ii).

α(R, P) = M(R) and β(R, P) = ψ(R) + M(R)A(P), where

 ψ(RS) = M(R)ψ(S) + ψ(R) and M(RS) = M(R)M(S), for R, S ∈ R with RS ∈ R, and M(λR) = λM(R) for all λ > 0; A(P + Q) = A(P) + A(Q), for P, Q ∈ P with P + Q ∈ P, and A(λP) = λA(P) for all λ > 0.

Paganoni (1987) goes on to seek more specific forms for the quantities above, but for brevity we omit those details. In doing so, however, the author ignores the units of measurement attached to the coordinates of x, a possible limitation of the work.

## 6 Statistical invariance

The stochastic foundation described in this section and the statistical invariance principle were developed within the frequentist (repeated sampling) paradigm. Models like u in Equation (4.1) were expressed as conditional expectations. Model uncertainty could be characterized through residual analysis and such things as the conditional variance. Furthermore, principled empirical assessments of the validity of u could be made given replicate samples.

In Subsection 6.1 the basic elements of an invariant statistical model are described. There, we define the sample space and assume that there is a group of allowable transformations of the sample space that also acts on units of measurement. We discuss the maximal invariant and its probability distribution. We thus create a general process that provides us with a parsimonious stochastic model that correctly accounts for units of measurement. In Subsection 6.2, we apply this work to random variables measured on ratio scales. In Subsection 6.3 we turn to interval scales.

### 6.1 Invariant statistical models

#### The sample space

The sample space is a fundamental building block for the repeated sampling school paradigm of statistics. Its meaning is ambiguous, however, for it can stand for: (i) the population from which items are to be drawn; (ii) the range of random row vectors of the observable properties of the sampled items that are to be repeatedly measured by the sampler; (iii) the set of such row vectors that will be randomly observed to yield the dataset for model assessment. Unless otherwise stated, we will use interpretation (ii). Finally, where inferential interest focuses on one of the Xi’s as a predictand, we label it Xp as in Equation (5.10).

The properties of the sampled items are characterized by the Xi’s, whose dimensions, e.g. length, must be scaled, e.g. by the metric scale, so they can be measured with appropriate resolution, e.g. in millimetres. The statistical invariance principle recognizes that the outcome of, say, a hypothesis test should be the same if the measurement scale were transformed, e.g. from millimetres to centimetres. This requirement was formalized as the invariance principle under an algebraic group G of allowable transformations of the sample space. To qualify as a group, G must contain the composition g2 ∘ g1 of any two of its transformations g1 and g2. Furthermore it must contain the identity transformation e, for which e(x) = x for all x, and an inverse g⁻¹ for any g in G, meaning g⁻¹ ∘ g = e.

For any x, the set G(x) = {g(x) : g ∈ G} is called the orbit of x. G is called transitive if the sample space consists of a single orbit. In that case, for every pair of elements x1 and x2, there exists a unique g ∈ G such that x2 = g(x1). Finally, the single orbit can be indexed by a nondimensionalized element x0 of the orbit; every element in the orbit can be obtained from x0 by a transformation g ∈ G.

If G is not transitive, the sample space is the union of its disjoint orbits. The maximal invariant is a function that is constant on orbits and different on different orbits, so that it indexes the orbits. Thus any function of x that is invariant under the transformations in G must depend on x only through the maximal invariant.
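The orbit and maximal-invariant ideas can be illustrated in the simplest setting, the scale group acting on positive vectors (an illustrative special case, not the paper's general framework):

```python
# Scale group G = {x -> c*x : c > 0} acting on positive vectors.
# The ratios to the first coordinate are constant on each orbit and
# differ across orbits, so they form a maximal invariant.
def maximal_invariant(x):
    return tuple(xi / x[0] for xi in x[1:])

x  = (2.0, 4.0, 6.0)
gx = tuple(10.0 * xi for xi in x)   # same orbit: x rescaled by c = 10
y  = (2.0, 5.0, 6.0)                # a different orbit

print(maximal_invariant(x) == maximal_invariant(gx))  # True: constant on orbits
print(maximal_invariant(x) == maximal_invariant(y))   # False: separates orbits
```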