Is profile likelihood a true likelihood? An argument in favor

Profile likelihood is the key tool for dealing with nuisance parameters in likelihood theory. It is often asserted, however, that profile likelihood is not a true likelihood. One implication is that likelihood theory lacks the generality of e.g. Bayesian inference, wherein marginalization is the universal tool for dealing with nuisance parameters. Here we argue that profile likelihood has as much claim to being a true likelihood as a marginal probability has to being a true probability distribution. The crucial point we argue is that a likelihood function is naturally interpreted as a maxitive possibility measure: given this, the associated theory of integration with respect to maxitive measures delivers profile likelihood as the direct analogue of marginal probability in additive measure theory. Thus, given a background likelihood function, we argue that profiling over the likelihood function is as natural (or as unnatural, as the case may be) as marginalizing over a background probability measure.


1 Introduction

Consider the opening sentence from the entry on profile likelihood in the Encyclopedia of Biostatistics (Aitkin, 2005):

The profile likelihood is not a likelihood, but a likelihood maximized over nuisance parameters given the values of the parameters of interest.

Numerous similar assertions that profile likelihood is not a ‘true’ likelihood may be found throughout the literature and various textbooks, and this is apparently the accepted viewpoint of the statistical community. Importantly, this includes the ‘pure’ likelihood literature, which generally accepts a lack of systematic methods for dealing with nuisance parameters, while still recommending profile likelihood as the most general, albeit ‘ad-hoc’, solution (see e.g. Royall, 1997; Rohde, 2014; Edwards, 1992; Pawitan, 2001). Similarly, recent monographs on characterizing statistical evidence present favorable opinions of the likelihood approach but criticize the lack of general methods for dealing with nuisance parameters (Aitkin, 2010; Evans, 2015). The various justifications given, however, appear to the present author to be rather vague and unconvincing. For example, suppose we modified the above quotation to refer to marginal probability instead of profile likelihood:

A marginal probability is not a probability, but a probability distribution integrated over nuisance variables given the values of the variables of interest.

The above would be a perfectly fine characterization of a marginal probability if the “not a probability, but” part was dropped, i.e.

A marginal probability is a probability distribution integrated over nuisance variables given the values of the variables of interest.

Simply put: the fact that a marginal probability is obtained by integrating over a ‘background’ probability distribution does not prevent the marginal probability from being a true probability. The crucial observation in the case of marginal probability is that integration over variables takes probability distributions to probability distributions.

The purpose of the present article is to point out that there is an appropriate notion of integration over variables that takes likelihood functions to likelihood functions via maximization. This notion of integration is based on the idea of idempotent analysis, wherein one replaces a standard algebraic operation such as addition in a given mathematical theory with another basic algebraic operation, defining a form of ‘idempotent addition’, to obtain a new analogous, self-consistent theory (Maslov, 1992; Kolokoltsov and Maslov, 1997). In this case one simply replaces the usual ‘addition’ operations, including the usual (Lebesgue) integration, with ‘maximization’ operations, including taking supremums, to obtain a new, ‘idempotent probability theory’. Maximization in this context is understood algebraically as an idempotent addition operation (since $\max\{a, a\} = a$), hence the terminology. While perhaps somewhat exotic at first sight, this idea finds direct applications in e.g. large deviation theory (Puhalskii, 2001) and, most relevantly, possibility theory, fuzzy set theory and pure-likelihood-based decision theory (Dubois, Moral and Prade, 1997; Cattaneo, 2013, 2017). A popular special instance of idempotent mathematics is so-called ‘tropical mathematics’ in which multiplication is also converted to a new algebraic operation, here addition (see e.g. Speyer and Sturmfels, 2009; Akian, Quadrat and Viot, 1996; Litvinov, 2007; Pachter and Sturmfels, 2004; Bernhard, 2000). That is, the basic ‘addition’ and ‘multiplication’ operations in tropical algebra are interpreted as $(\max, +)$, respectively, instead of the usual $(+, \times)$. With the introduction of a logarithmic distance in likelihood theory, multiplication of likelihoods becomes addition of log-likelihoods and we are naturally led to a ‘Tropical Bayesian’ interpretation of (log) profile likelihoods. This provides a formal foundation for the usual intuitive interpretation of (negative) log-likelihoods as ‘cost’ measures.

The present argument is not, of course, without objections. In particular, acceptance or rejection of the present interpretation depends on what one believes the key properties of likelihood should be; this is, perhaps surprisingly, not without significant controversy (Bayarri and DeGroot, 1992; Bjørnstad, 1996; Bayarri, DeGroot and Kadane, 1988). Thus we end with a discussion of various potential objections, including a discussion of some properties one might want a general notion of ‘likelihood’ to satisfy and whether the present interpretation does or does not satisfy these. Despite potential conflicts with some frequentist, evidential and/or Bayesian considerations, we believe that the present interpretation is a clear, self-consistent and suitable foundational concept for ‘pure’ likelihood theory (particularly that developed by Edwards, 1992), and/or for what we propose to call ‘Tropical Bayes’.

2 Likelihood as a possibility measure

Though apparently not well known in the statistical literature, likelihood theory is known in the wider literature on uncertainty quantification to have a natural correspondence to possibility theory rather than to probability theory (Dubois, Moral and Prade, 1997; Cattaneo, 2013, 2017). This has perhaps been obscured by the usefulness of likelihood methods as tools in probabilistic statistical inference. It is not our intention to review this wider literature in detail here (see e.g. Dubois, Moral and Prade, 1997; Cattaneo, 2013, 2017; Augustin et al., 2014; Halpern, 2017, for more), but simply to point out the implications of this correspondence. In particular, likelihood theory interpreted as a possibilistic, rather than probabilistic, theory can be summarized as:

Probability theory with addition replaced by maximization.

As indicated above, this is sometimes known as, for example, ‘idempotent measure theory’, ‘maxitive’ measure theory or ‘possibility’ theory, among other names (see e.g. Dubois, Moral and Prade, 1997; Cattaneo, 2013, 2017; Augustin et al., 2014; Halpern, 2017; Maslov, 1992; Kolokoltsov and Maslov, 1997; Puhalskii, 2001, for more). This correspondence perhaps explains the preponderance of maximization methods in likelihood theory, including the methods of maximum likelihood and profile likelihood.

The most important consequence of this perspective is that the usual Lebesgue integration with respect to an additive measure, as in probability theory, becomes, in likelihood/possibility theory, a different type of integration, defined with respect to a maxitive measure. Again, the key point is simply that addition operations (including summation and integration) are replaced by maximization operations (or taking supremums in general).

For completeness, we contrast the key axioms of possibility theory with those of probability theory. Given a set of possibilities $\Omega$, assumed to be discrete for the moment for simplicity, and two discrete sets of possibilities $A, B \subseteq \Omega$, the key axioms of elementary possibility theory are (Halpern, 2017):

$$
\begin{aligned}
\mathrm{poss}(\emptyset) &= 0 \\
\mathrm{poss}(\Omega) &= 1 \\
\mathrm{poss}(A \cup B) &= \max\{\mathrm{poss}(A), \mathrm{poss}(B)\}
\end{aligned}
\tag{2.1}
$$

which can be contrasted with those of elementary probability theory:

$$
\begin{aligned}
\mathrm{prob}(\emptyset) &= 0 \\
\mathrm{prob}(\Omega) &= 1 \\
\mathrm{prob}(A \cup B) &= \mathrm{prob}(A) + \mathrm{prob}(B)
\end{aligned}
\tag{2.2}
$$

where $A$ and $B$ are required to be disjoint in the probabilistic case, but this is not strictly required in the possibilistic case.
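The contrast between the two sets of axioms can be illustrated with a minimal sketch on a small discrete set. The weights, the event names and the helper functions `poss` and `prob` below are hypothetical and chosen only for illustration; the two events are deliberately overlapping, so that maxitivity holds while naive additivity fails.

```python
# Hypothetical plausibility weights on a discrete set of possibilities,
# scaled so that the largest weight is 1.
weights = {"a": 1.0, "b": 0.5, "c": 0.5}
omega = set(weights)


def poss(event, w=weights):
    """Maxitive measure: largest weight in the event (0 for the empty set)."""
    return max((w[x] for x in event), default=0.0)


def prob(event, w=weights):
    """Additive measure: normalized sum of the weights in the event."""
    total = sum(w.values())
    return sum(w[x] for x in event) / total


A, B = {"a", "b"}, {"b", "c"}  # overlapping events (not disjoint)

poss_union = poss(A | B)           # 1.0
poss_max = max(poss(A), poss(B))   # 1.0: maxitivity holds despite the overlap
prob_union = prob(A | B)           # 1.0
prob_sum = prob(A) + prob(B)       # 1.25: additivity needs disjoint events
```

For disjoint events, such as `{"a"}` and `{"c"}`, the additive rule is recovered for `prob`, while `poss` obeys the maximum rule in both cases.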

Given a ‘background’ or ‘starting’ likelihood measure, likelihood theory can be developed as a self-contained theory of possibility, where derived distributions are manipulated according to the first set of axioms above. This is entirely analogous to developing probability theory from a background measure, with derived distributions manipulated according to the second set of axioms. As our intention is to consider methods for obtaining derived distributions by ‘eliminating’ nuisance parameters, we need not consider here where the starting measure comes from (but see the Discussion).

To make the correspondences of interest clear in what follows, we first present probabilistic marginalization as a special case of a pushforward measure or, equivalently, as a special case of a general (not necessarily 1-1) change of variables. We then consider the possibilistic analogues.

3 Pushforward probability measures and the delta function method for general changes of variable

Given a probability measure $\mu$ over a random variable $x$ with associated density $\rho$, define the new random variable $t = T(x)$, where $T$ is a given (not necessarily 1-1) function. This variable is distributed according to the pushforward measure $T \star \mu$, i.e. $t \sim T \star \mu$.

The density of $t$, here denoted by $q$, is conveniently calculated via the delta function method, which is valid for arbitrary changes of variables (not necessarily 1-1):

$$
q(t) = [T \star \rho](t) = \int \delta(t - T(x))\, \rho(x)\, dx. \tag{3.1}
$$
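A discrete analogue of (3.1) may help fix ideas: the integral against the delta function becomes a sum over all points that map to the same value. The following is a minimal sketch only; the function name `pushforward_pmf`, the die example and the map `T` are assumptions introduced here for illustration.

```python
from collections import defaultdict


def pushforward_pmf(rho, T):
    """Discrete analogue of (3.1): q(t) = sum of rho(x) over {x : T(x) = t}."""
    q = defaultdict(float)
    for x, p in rho.items():
        q[T(x)] += p
    return dict(q)


def T(x):
    """Hypothetical non-1-1 map: values 3, 4, 5 and 6 are all sent to 3."""
    return min(x, 3)


# Hypothetical example: a fair six-sided die.
rho = {x: 1 / 6 for x in range(1, 7)}
q = pushforward_pmf(rho, T)  # mass 1/6 at t=1, 1/6 at t=2, 4/6 at t=3
```

The result is still a probability mass function (it sums to one), even though `T` is not invertible.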

As a side point, we note that this method of carrying out arbitrary transformations of variables is standard in statistical physics (see e.g. Van Kampen, 1992), but is apparently less common in statistics (see the articles Au and Tam, 1999; Khuri, 2004, aimed at highlighting this method to the statistical community).

3.1 Marginalization via the delta function method

The above means that we can interpret marginalization to a component $x_1$, say, as a special case of a (non-1-1) deterministic change of variables via:

$$
\rho(x_1) = \int \delta(x_1 - \mathrm{proj}_{X_1}(x))\, \rho(x)\, dx, \tag{3.2}
$$

where $\mathrm{proj}_{X_1}(x)$ is simply the projection of $x$ to its first coordinate. Thus marginalization can be thought of as the pushforward under the projection operator and as a special case of a general (not necessarily 1-1) change of variables $t = T(x)$.
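As a quick check of (3.2), consider the two-component case, writing the integration variable as $x = (x_1', x_2)$ (the prime is introduced here only to distinguish the integration variable from the fixed argument $x_1$). The delta function then removes the integral over the first coordinate and leaves the familiar marginalization integral:

$$
\rho(x_1) = \iint \delta(x_1 - x_1')\, \rho(x_1', x_2)\, dx_1'\, dx_2 = \int \rho(x_1, x_2)\, dx_2 .
$$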

4 Profile likelihood as marginal possibility and an extension to general changes of variable

As we have repeatedly stressed above, likelihood theory interpreted as a possibilistic, and hence maxitive, measure theory simply means that ‘addition’ operations such as the usual Lebesgue integration are replaced by ‘maximization’ operations such as taking the supremum.

Consider first then the analogue of a marginal probability density, which we will call a marginal possibility distribution and denote by $L_p$. Starting from a ‘background’ likelihood measure $L$ we ‘marginalize’ in the analogous manner to before:

$$
L_p(x_1) = \sup_x \{\delta(x_1 - \mathrm{proj}_{X_1}(x))\, L(x)\} = \sup_{\{x \mid \mathrm{proj}_{X_1}(x) = x_1\}} \{L(x)\}. \tag{4.1}
$$

This is again simply the pushforward under the projection operator, but here under a different type of ‘integration’ - i.e. the operation of taking a supremum. Of course, this is just the usual profile likelihood for $x_1$.

As above, we need not be restricted to ‘marginal’ possibility distributions: we can consider arbitrary functions $t = T(x)$ of the parameter $x$. This leads to an analogous pushforward operation of $L(x)$ to $L_p(t)$ that we denote by $\star_p$:

$$
L_p(t) = [T \star_p L](t) = \sup_x \{\delta(t - T(x))\, L(x)\} = \sup_{\{x \mid T(x) = t\}} \{L(x)\} \tag{4.2}
$$

which again corresponds to the usual definition of profile likelihood.
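The parallel between the additive and maxitive pushforwards can be sketched for a likelihood tabulated on a small grid. Everything named below (the helper `pushforward`, the grid values, the maps `proj_x1` and `total`) is a hypothetical illustration rather than part of the original development; the only difference between the two calculations is whether values sharing the same image are combined by addition or by `max`.

```python
def pushforward(f, T, combine):
    """Push the point function f (a dict x -> value) forward under the map T.

    combine(a, b) is addition in the additive case and max in the
    maxitive case, mirroring the sup over {x : T(x) = t}.
    """
    out = {}
    for x, value in f.items():
        t = T(x)
        out[t] = value if t not in out else combine(out[t], value)
    return out


def proj_x1(x):
    """Projection of x = (x1, x2) to its first coordinate."""
    return x[0]


def total(x):
    """A hypothetical non-1-1 function of the full parameter."""
    return x[0] + x[1]


# Hypothetical background likelihood over (x1, x2), scaled to have maximum 1.
L = {
    (0, 0): 0.2, (0, 1): 1.0, (0, 2): 0.1,
    (1, 0): 0.6, (1, 1): 0.6, (1, 2): 0.6,
}

profile_x1 = pushforward(L, proj_x1, max)                 # sup over x2
summed_x1 = pushforward(L, proj_x1, lambda a, b: a + b)   # additive analogue
profile_total = pushforward(L, total, max)                # general T
```

In this made-up table the profile prefers `x1 = 0` (which contains the single best-supported point), whereas the summed version prefers `x1 = 1` (which has more moderately supported points), so the two operations can order the values of the interest parameter differently.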

5 A simple example comparing marginal probability and marginal possibility

Here we consider a simple example illustrating the difference between probabilistic and possibilistic reasoning, in particular under marginalization/non-1-1 changes of variable.

Suppose you have three suspects in a crime. Through some means or another you decide on the following ‘plausibility’ distribution, where plausibility is used here as a general umbrella term for probability and/or possibility reasoning: suspect one has plausibility 0.4, while the other two suspects each have plausibility 0.3. You also know that suspect one was wearing a red hat at the time of the crime while the other two were wearing blue hats.

According to the above, under a probabilistic interpretation, the most probable perpetrator is suspect one (who wore a red hat); but the most probable hat color of the perpetrator is blue (with probability 0.3 + 0.3 = 0.6). This is a consequence of the additivity of probability theory and the non-1-1 change of variables in going from suspects to hat colors.

On the other hand, if you interpret the given plausibility numbers as a possibility distribution, then according to standard possibility theory the most possible suspect is suspect one and the most possible hat color is now red, i.e. the hat color of the most possible suspect, suspect one. Similarly, this is a consequence of the maxitivity of possibility theory.

The difference can be made more extreme given a large number of ‘other’ suspects, each with low plausibility but sharing some common property that the main suspect lacks. Again, these results are a simple consequence of how additivity and maxitivity, respectively, interact with non-1-1 changes of variable (here: person to hat color).
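The suspect example can be reproduced in a few lines. This is a sketch only: the dictionary names and the helper `by_hat` are introduced here for illustration, while the plausibility values and hat colors are those given in the text.

```python
# Plausibility of each suspect and the hat color each was wearing.
suspects = {"suspect 1": 0.4, "suspect 2": 0.3, "suspect 3": 0.3}
hat = {"suspect 1": "red", "suspect 2": "blue", "suspect 3": "blue"}


def by_hat(combine):
    """Push plausibilities from suspects to hat colors using `combine`."""
    out = {}
    for suspect, value in suspects.items():
        color = hat[suspect]
        out[color] = value if color not in out else combine(out[color], value)
    return out


prob_hat = by_hat(lambda a, b: a + b)  # additive: red 0.4, blue 0.3 + 0.3
poss_hat = by_hat(max)                 # maxitive: red 0.4, blue 0.3

most_probable_hat = max(prob_hat, key=prob_hat.get)
most_possible_hat = max(poss_hat, key=poss_hat.get)
```

Under addition the most plausible hat color is blue, while under maximization it is red, in agreement with the discussion above.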

We believe that there are reasonable situations where additivity is desirable, but also reasonable situations in which maxitivity might be preferred. This is a subject worth further debate. We note, however, that a relative probability approach to the problem of statistical evidence, such as that presented in Evans (2015), comes to similar conclusions to those of a possibility approach (Michael Evans, personal communication).

6 Strength of evidence, distances and ‘Tropical Bayes’

As noted in Evans (2015), it is perhaps less controversial to hold that likelihood gives a qualitatively reasonable relative ordering of preference for parameter values in light of data than it is to hold (e.g. Royall, 1997) that it provides a quantitative measure of relative support.

To make some progress towards addressing this distinction, we consider how to define a suitable notion of distance that respects - but is distinct from - a given qualitative ordering. Notions of statistical distance are common in the statistical literature (see e.g. Basu, Shioya and Park, 2011, and references therein); here, however, we follow the ideas developed by Tarantola (2006) of quality spaces and distances defined in these. This leads naturally to the idea of pure likelihood theory as a form of what we propose to call ‘Tropical Bayes’, where the meaning of this term is discussed below.

In particular, given the ordering induced by a likelihood function (and/or profile likelihood function):

$$
\theta_1 \text{ is preferred to } \theta_2 \text{ iff } L(\theta_1) > L(\theta_2), \tag{6.1}
$$

we can define a likelihood distance via

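$$
D_L(\theta_1, \theta_2) = \left| \log \frac{L(\theta_2)}{L(\theta_1)} \right| = \left| \log \frac{L(\theta_1)}{L(\theta_2)} \right|. \tag{6.2}
$$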

This distance has the properties of being symmetric, additive and zero iff $L(\theta_1) = L(\theta_2)$. Tarantola (2006) argues that this notion of distance is widely applicable for many types of qualitative orderings. In the present case it is, of course, just the well-known log-likelihood ratio function. We propose then that, accepting that the likelihood gives a natural qualitative preference or plausibility ordering, the log-likelihood then gives a natural distance in this ‘qualitative space’. There remains, however, a choice of logarithm base and/or a choice of arbitrary distance scale factor; thus we can’t fully remove some of the ‘qualitative’ features associated with pure likelihood theory without a further choice of reference. One natural choice might be to take the minimum distance to a fully saturated model, i.e. one which can fit the data perfectly, in which case one would be interested in how much ‘fit’ to trade-off against parsimony considerations (Edwards, 1992).
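The stated properties of the likelihood distance can be checked numerically. The sketch below assumes the natural logarithm and reads ‘additive’ as additivity along a chain of parameter values ordered by likelihood; both are assumptions made here for illustration, as are the function name and the likelihood values.

```python
import math


def likelihood_distance(L1, L2):
    """Absolute log-likelihood ratio, as in (6.2); natural log assumed."""
    return abs(math.log(L1) - math.log(L2))


# Hypothetical likelihood values with L(theta1) > L(theta2) > L(theta3).
L_theta1, L_theta2, L_theta3 = 1.0, 0.5, 0.1

d12 = likelihood_distance(L_theta1, L_theta2)
d23 = likelihood_distance(L_theta2, L_theta3)
d13 = likelihood_distance(L_theta1, L_theta3)

# The distance depends only on the likelihood ratio, so an arbitrary
# multiplicative constant in the likelihood has no effect.
d12_rescaled = likelihood_distance(3.0 * L_theta1, 3.0 * L_theta2)
```

For the ordered triple above, `d12 + d23` agrees with `d13` up to floating-point error, and swapping the arguments leaves each distance unchanged.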

Interestingly, the combination of replacing addition operations by maximization and then working in log-space (wherein multiplication becomes addition) corresponds to completing the ‘tropicalization’ of probability theory: moving from an algebraic structure in terms of $(+, \times)$ to one in terms of $(\max, +)$. This is the subject of ‘tropical algebra’, which also goes by the name ‘max-plus’ algebra, and is a popular special instance of idempotent mathematics with applications to decision theory, uncertainty quantification, statistical inference and optimization (see e.g. Speyer and Sturmfels, 2009; Akian, Quadrat and Viot, 1996; Litvinov, 2007; Pachter and Sturmfels, 2004; Bernhard, 2000, for some relevant starting points in this area). A natural interpretation of negative log-likelihood functions in this context is as ‘cost measures’; these have also been termed ‘Maslov measures’, due to their origins in Maslov’s idempotent probability theory (Akian, Quadrat and Viot, 1996; Bernhard, 2000). These analogies are explored in detail by Akian, Quadrat and Viot (1996), where the natural analogue of a random variable is a decision variable, the analogue of a Markov chain is a Bellman chain (i.e. the Bellman equation from the subject of dynamic programming) and so on.
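A small sketch of the max-plus viewpoint: in log-space, combining independent data sets is tropical ‘multiplication’ (ordinary addition), and profiling is tropical ‘addition’ (maximization). The tables `ell_1` and `ell_2`, and the helper names `t_add` and `t_mul`, are hypothetical and introduced here only to illustrate the correspondence.

```python
import math

NEG_INF = float("-inf")  # tropical "zero": the identity element for max


def t_add(a, b):
    """Tropical addition: maximization."""
    return max(a, b)


def t_mul(a, b):
    """Tropical multiplication: ordinary addition."""
    return a + b


# Hypothetical log-likelihoods from two independent data sets, over (x1, x2).
ell_1 = {(0, 0): -1.0, (0, 1): -0.2, (1, 0): -0.5, (1, 1): -3.0}
ell_2 = {(0, 0): -0.3, (0, 1): -2.0, (1, 0): -0.1, (1, 1): -0.4}

# Product of likelihoods corresponds to the tropical product of their logs.
ell = {x: t_mul(ell_1[x], ell_2[x]) for x in ell_1}

# Log profile likelihood for x1: tropical "sum" over x2.
log_profile = {}
for (x1, _x2), value in ell.items():
    log_profile[x1] = t_add(log_profile.get(x1, NEG_INF), value)

# The same profile computed on the original likelihood scale, for comparison.
profile = {}
for (x1, _x2), value in ell.items():
    profile[x1] = max(profile.get(x1, 0.0), math.exp(value))
```

Taking logs of `profile` recovers `log_profile`; negating `log_profile` gives the corresponding cost, for which maximization becomes minimization.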

Finally, however, we note that even if profile likelihood is accepted as the natural analogue of marginal probability, the evidential interpretation of profile likelihood may still have difficulties; this is discussed further below.

7 Discussion

7.1 Objections to profile likelihood

As discussed, it is frequently asserted that profile likelihood is not a true likelihood (Aitkin, 2005; Royall, 1997; Pawitan, 2001; Rohde, 2014; Evans, 2015). Common reasons include: that it is obtained from a likelihood via maximization (Aitkin, 2005), that it is not based directly on observable quantities (Royall, 1997; Pawitan, 2001; Rohde, 2014) and that it lacks particular repeated sampling properties (Royall, 1997; Cox and Barndorff-Nielsen, 1994).

None of the above objections appear to the present author to apply to the following: given a starting or ‘background’ likelihood function, profile likelihood satisfies the axioms of possibility theory, in which the basic additivity axiom of probability theory is replaced by a maxitivity axiom. Profile likelihood is simply the natural possibilistic counterpart to marginal probability, where additive integration is replaced by a maxitive analogue. We thus argue that, if marginal probability is a ‘true’ probability, then profile likelihood should likewise be considered a ‘true’ likelihood, at least when likelihood theory is interpreted in a possibilistic manner. Negative log-likelihood functions can then be naturally interpreted as cost measures in the sense of tropical mathematics.

7.2 Fixed data

Regarding the latter two objections mentioned above (observable quantities and repeated sampling properties), it is important to note that the given data must be held fixed to give a consistent background likelihood over which to profile. Given fixed data one has a fixed possibility measure and thus can consider ‘marginal’ - i.e. profile - likelihoods. In contrast, repeated sampling will produce a distribution of such possibility measures, and these may or may not have good frequentist properties. None of this is in contrast to marginal probability: changing the distribution over which we marginalize changes the resulting marginal probability. Of course, despite this caveat, profile likelihood often does have good repeated sampling properties (Royall, 1997; Cox and Barndorff-Nielsen, 1994) and also plays a key role in frequentist theory, though we do not discuss this further here. One consequence is that our conception of profile likelihood does not generally satisfy properties such as zero expectation of the associated score function (Cox and Barndorff-Nielsen, 1994; Pawitan, 2001). These are, however, properties dependent on particular repeated sampling notions such as ‘unbiasedness’, and hence more properly considered as frequentist concepts. The present approach is more suitable for those seeking a non-probabilistic ‘plausibility’ measure, as induced by data that are considered fixed once observed.

7.3 Why?

A natural question, perhaps, is why worry about whether profile likelihood is a true likelihood? One answer is that profile likelihood is a widely used tool but is often dismissed as ‘ad-hoc’ or lacking proper justification. This gives the impression that likelihood theory is lacking in comparison with e.g. Bayesian theory in terms of systematic methods for dealing with nuisance parameters. By understanding that profile likelihood does in fact have a systematic basis in terms of possibility theory, practitioners and students can better understand and reason about a widely popular and useful tool. Understanding the connection to possibilistic as opposed to probabilistic reasoning may also help explain why profile likelihood has emerged as a particularly promising method of identifiability analysis (Raue et al., 2009), where identifiability is traditionally a prerequisite for probabilistic analysis. Of course, as indicated, the price of accepting profile likelihood as a ‘true’ likelihood is an interpretation in terms of pure likelihood theory, and this makes the connections to repeated sampling properties more complicated. We see no need, however, to restrict oneself to one perspective on statistical inference - the present possibilistic view can complement other approaches such as frequentist statistics or Bayesian statistics. Furthermore, this analogy opens strong connections between likelihood theory and the optimization literature; the foundations of such connections have already been explored by e.g. Akian, Quadrat and Viot (1996) and Bernhard (2000), and provide a natural link to pure likelihood decision theory as developed by Cattaneo (2013).

7.4 Ignorance

The possibilistic interpretation of likelihood also helps understand the representation of ignorance. While probabilistic ignorance is not preserved under arbitrary changes of variables (e.g. non-1-1 transformations), even in the discrete case, possibilistic ignorance is, in the following sense: if we take the maximum likelihood over a set of possibilities, such as $\{x \mid T(x) = t\}$ for each $t$, rather than summing them, a flat ‘prior likelihood’ (Edwards, 1969, 1992) over $x$ becomes a flat prior likelihood over $t$. On the other hand, a flat prior probability over $x$ in general becomes non-flat over $t$ under non-1-1 changes of variable. Thus a profile prior likelihood has what, in many cases, may be desirable properties as a representation of prior ignorance (see the discussion in Edwards, 1969, 1992, for more on likelihood and the representation of ignorance). This difference in transformation properties was also illustrated in our simple example comparing the probabilistic and possibilistic analysis of criminal evidence. As noted there, however, the relative probabilistic approach à la Evans (2015) reaches conclusions closer to the possibilistic analysis, compared to the conclusions of the ‘absolute’ probabilistic analysis (Michael Evans, personal communication).
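The transformation behaviour of flat distributions is easy to verify on a small discrete example. The set of six values and the map `T` below are hypothetical choices made for illustration; the calculation groups the original values by their image under `T` and then combines each group by `max` or by summation.

```python
def T(x):
    """Hypothetical non-1-1 map: values 3, 4, 5 and 6 are all sent to 3."""
    return min(x, 3)


xs = range(1, 7)
flat_likelihood = {x: 1.0 for x in xs}     # flat 'prior likelihood' over x
flat_probability = {x: 1 / 6 for x in xs}  # flat prior probability over x

# Group the x values by their image t = T(x).
fibres = {}
for x in xs:
    fibres.setdefault(T(x), []).append(x)

# Maxitive pushforward: take the maximum over each group.
likelihood_t = {t: max(flat_likelihood[x] for x in group)
                for t, group in fibres.items()}

# Additive pushforward: sum over each group.
probability_t = {t: sum(flat_probability[x] for x in group)
                 for t, group in fibres.items()}
```

Here `likelihood_t` is still flat (every value equals 1), while `probability_t` places four times as much mass on `t = 3` as on `t = 1` or `t = 2`.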

7.5 Point function or set function?

Likelihood is traditionally considered a point function as opposed to a set function; this is also related to controversy over defining likelihood functions for so-called composite hypotheses (see e.g. Edwards, 1992; Royall, 1997). Authors such as Basu (2012) have argued, contra e.g. Fisher, that likelihood could be directly extended to a set function. Basu (2012) further developed the argument that this set function could be taken as additive - we are more inclined, here at least, to consider the first possibility, and reject the second. A number of other authors have also considered the question of composite hypotheses, in particular in the context of defining evidence (see e.g. Zhang and Zhang, 2013; Blume, 2013; Bickel, 2012).

We have attempted to avoid the issue of set functions/composite hypotheses somewhat by instead using the concept of a non-1-1 transformation of variables. This allows us to consider the likelihood of subsets of the full/background parameter space based on an indexing statistic, i.e. by using subsets defined via $\{x \mid T(x) = t\}$. This approach is based on what amounts to equality constraints, leaving out subsets defined via inequality constraints. It may be desirable to further relax this and simply consider likelihood directly as a set function defined via

$$
L_p(A) = \sup_{x \in A} \{L(x)\} \tag{7.1}
$$

for $A \subseteq \Omega$. This allows for subsets defined via inequality constraints, e.g. $\{x \mid T(x) \le t\}$.
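As a brief sketch, the set function (7.1) can be checked against the possibility axioms (2.1). This assumes that $L$ is non-negative, that it is scaled so that $\sup_{x \in \Omega} L(x) = 1$ (i.e. a relative likelihood), and that the supremum over the empty set is taken to be zero; these conventions are assumptions made here rather than statements from the text above.

$$
\begin{aligned}
L_p(\emptyset) &= \sup_{x \in \emptyset} \{L(x)\} = 0, \\
L_p(\Omega) &= \sup_{x \in \Omega} \{L(x)\} = 1, \\
L_p(A \cup B) &= \sup_{x \in A \cup B} \{L(x)\} = \max\Big\{ \sup_{x \in A} \{L(x)\},\ \sup_{x \in B} \{L(x)\} \Big\} = \max\{L_p(A), L_p(B)\}.
\end{aligned}
$$

The last line holds whether or not $A$ and $B$ are disjoint.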

We leave consideration of this approach to future work. Presumably, however, one could recover the present approach by considering some notion of minimal and/or extremal sets of equality constraints, e.g. by restricting attention to those inequality constraints that are active during the profiling/maximization procedure, and hence those that are reduced to binding equality constraints. The interpretation of negative log-likelihoods as cost measures may also be helpful here.

7.6 Evidence

One of the key issues to consider when deciding whether to accept profile likelihoods as ‘true’ likelihoods is whether they can play the same role that ‘full’ likelihoods play in defining evidential measures (Royall, 1997; Aitkin, 2010; Evans, 2015; Zhang and Zhang, 2013; Blume, 2013; Bickel, 2012). Mathematically, it appears clear that profile likelihood is entirely analogous to marginal probability; it is less clear whether - or under what circumstances - one should use marginal (whether maxitive or additive) measures in defining evidence. We believe that this applies equally to the Bayesian approach. A way forward from here would be to separate the questions: first accept profile likelihood as a ‘marginal’ possibility measure, and then investigate under what circumstances marginal measures can be given further evidential interpretations. We suspect that the answer may require additional concepts and/or assumptions like those used in the causal inference literature to separate spurious marginal associations from ‘true’ causation (Pearl, 2009a, b). That is, we suspect that ‘evidence’ may be better defined in causal terms than in either purely probabilistic or purely possibilistic terms. As such, the question of whether or not profile likelihood is a ‘true’ likelihood should be independent of whether it plays the role of an evidential measure, unless the definition of likelihood is itself explicitly supplemented with causal assumptions.

8 Conclusions

We have argued that profile likelihood has as much claim to being a true likelihood as a marginal probability has to being a true probability distribution. In the case of marginal probability, integration over variables takes probability distributions to probability distributions, while in the case of likelihood, maximization takes likelihood functions to likelihood functions. Maximization can be considered in this context as an alternative (idempotent) notion of integration, and a likelihood function as a maxitive possibility measure. There are some conflicts with both Bayesian and frequentist considerations, however: lack of additivity and lack of some repeated sampling properties, respectively. In our view, these conflicts are not necessarily an issue, as neither additivity nor repeated sampling properties such as unbiasedness are beyond objections. Instead we argue that the present approach gives a self-consistent theory suitable for possibilistic statistical analysis, with a well-defined method of treating nuisance parameters, and which continues in the tradition of ‘pure’ likelihood theories. The connection of profile likelihoods to evidential interpretations appears subtle (as is, we believe, the connection of marginal probabilities to evidence); our view is that this issue should be explored further in the context of formulating additional causal properties that an evidence measure should satisfy, such as those required to classify marginal correlations into ‘spurious’ and ‘true’ causal relationships. Finally, taking profile likelihood seriously as a ‘true’ likelihood leads naturally to the idea of ‘Tropical Bayesian Inference’, a subject yet to be properly explored by the statistical community.

Acknowledgements

The author would like to thank Michael Evans, Marco Cattaneo, Yudi Pawitan, Alexandre Patriota, Christian Robert and Anthony Edwards for useful comments and/or discussions.

References

• Aitkin, M. (2005). Profile Likelihood. In Encyclopedia of Biostatistics. John Wiley & Sons, Ltd.
• Aitkin, M. (2010). Statistical Inference: An Integrated Bayesian/Likelihood Approach. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. CRC Press.
• Akian, M., Quadrat, J. P. and Viot, M. (1996). Duality between probability and optimization. In Idempotency (J. Gunawardena, ed.). Cambridge University Press.
• Au, C. and Tam, J. (1999). Transforming Variables Using the Dirac Generalized Function. Am. Stat. 53 270–272.
• Augustin, T., Coolen, F. P. A., de Cooman, G. and Troffaes, M. C. M. (2014). Introduction to imprecise probabilities. John Wiley & Sons.
• Basu, D. (2012). Statistical Information and Likelihood: A Collection of Critical Essays by Dr. D. Basu. Springer Science & Business Media.
• Basu, A., Shioya, H. and Park, C. (2011). Statistical Inference: The Minimum Distance Approach. CRC Press.
• Bayarri, M., DeGroot, M. and Kadane, J. (1988). What is the likelihood function? (with discussion). In Statistical Decision Theory and Related Topics IV (S. S. Gupta and J. O. Berger, eds.). Springer, New York 1 33.
• Bayarri, M. J. and DeGroot, M. H. (1992). Difficulties and ambiguities in the definition of a likelihood function. J. It. Statist. Soc. 1 1–15.
• Bernhard, P. (2000). Max-Plus Algebra and Mathematical Fear in Dynamic Optimization. Set-Valued Analysis 8 71–84.
• Bickel, D. R. (2012). The Strength Of Statistical Evidence For Composite Hypotheses: Inference To The Best Explanation. Stat. Sin. 22 1147–1198.
• Bjørnstad, J. F. (1996). On the Generalization of the Likelihood Function and the Likelihood Principle. J. Am. Stat. Assoc. 91 791–806.
• Blume, J. D. (2013). Likelihood and Composite Hypotheses [Comment on “A Likelihood Paradigm for Clinical Trials”]. J. Stat. Theory Pract. 7 183–186.
• Cattaneo, M. E. G. (2013). Likelihood decision functions. Electron. J. Stat. 7 2924–2946.
• Cattaneo, M. E. G. V. (2017). The likelihood interpretation as the foundation of fuzzy set theory. Int. J. Approx. Reason.
• Cox, D. R. and Barndorff-Nielsen, O. E. (1994). Inference and asymptotics. Chapman and Hall, London.
• Dubois, D., Moral, S. and Prade, H. (1997). A Semantics for Possibility Theory Based on Likelihoods. J. Math. Anal. Appl. 205 359–380.
• Edwards, A. W. F. (1969). Statistical methods in scientific inference. Nature 222 1233–1237.
• Edwards, A. W. F. (1992). Likelihood, expanded ed. Johns Hopkins University Press, Baltimore.
• Evans, M. (2015). Measuring Statistical Evidence Using Relative Belief. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. CRC Press.
• Halpern, J. Y. (2017). Reasoning about Uncertainty. MIT Press.
• Khuri, A. I. (2004). Applications of Dirac’s delta function in statistics. Internat. J. Math. Ed. Sci. Tech. 35 185–195.
• Kolokoltsov, V. and Maslov, V. P. (1997). Idempotent Analysis and Its Applications. Springer Science & Business Media.
• Litvinov, G. L. (2007). Maslov dequantization, idempotent and tropical mathematics: A brief introduction. J. Math. Sci. 140 426–444.
• Maslov, V. P. (1992). Idempotent Analysis. American Mathematical Soc.
• Pachter, L. and Sturmfels, B. (2004). Tropical geometry of statistical models. Proc. Natl. Acad. Sci. U. S. A. 101 16132–16137.
• Pawitan, Y. (2001). In All Likelihood: Statistical Modelling and Inference Using Likelihood. Oxford science publications. OUP Oxford.
• Pearl, J. (2009a). Causal inference in statistics: An overview. Stat. Surv. 3 96–146.
• Pearl, J. (2009b). Causality. Cambridge University Press.
• Puhalskii, A. (2001). Large Deviations and Idempotent Probability. CRC Press.
• Raue, A., Kreutz, C., Maiwald, T., Bachmann, J., Schilling, M., Klingmüller, U. and Timmer, J. (2009). Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. Bioinformatics 25 1923–1929.
• Rohde, C. A. (2014). Introductory Statistical Inference with the Likelihood Function. Springer International Publishing.
• Royall, R. (1997). Statistical Evidence: A Likelihood Paradigm. CRC Press.
• Speyer, D. and Sturmfels, B. (2009). Tropical Mathematics. Math. Mag. 82.
• Tarantola, A. (2006). Elements for Physics: Quantities, Qualities, and Intrinsic Theories. Springer Science & Business Media.
• Van Kampen, N. G. (1992). Stochastic processes in physics and chemistry 1. Elsevier.
• Zhang, Z. and Zhang, B. (2013). A Likelihood Paradigm for Clinical Trials. J. Stat. Theory Pract. 7 157–177.